linux-kernel.vger.kernel.org archive mirror
* Add top down metrics to perf stat
@ 2016-01-16  1:12 Andi Kleen
  2016-01-16  1:12 ` [PATCH 01/11] perf, tools: Dont stop PMU parsing on alias parse error Andi Kleen
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the end.

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads
due to out-of-order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one of them a fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound point at bottlenecks in instruction
fetch/decode and in execution/memory respectively. BadSpeculation
is work thrown away due to misspeculation, and Retiring is useful
work that actually completes.

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out-of-order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.
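
For illustration, here are the level 1 formulas (as documented in the
formula patch below) boiled down to a standalone C sketch with made-up,
already-scaled counts -- not the perf stat implementation itself:

#include <stdio.h>

int main(void)
{
	/* Made-up example counts, already scaled by the sysfs unit: */
	double total_slots      = 1000000;  /* topdown-total-slots */
	double slots_issued     =  850000;  /* topdown-slots-issued */
	double slots_retired    =  500000;  /* topdown-slots-retired */
	double fetch_bubbles    =   90000;  /* topdown-fetch-bubbles */
	double recovery_bubbles =   30000;  /* topdown-recovery-bubbles */

	double retiring = slots_retired / total_slots;
	double fe_bound = fetch_bubbles / total_slots;
	double bad_spec = (slots_issued - slots_retired + recovery_bubbles) /
			  total_slots;
	double be_bound = 1.0 - retiring - fe_bound - bad_spec;

	/* Prints: retiring 50.00% frontend 9.00% bad spec 38.00% backend 3.00% */
	printf("retiring %.2f%% frontend %.2f%% bad spec %.2f%% backend %.2f%%\n",
	       100. * retiring, 100. * fe_bound,
	       100. * bad_spec, 100. * be_bound);
	return 0;
}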


Example output:

$ perf stat --topdown -I 100 ./BC1s
     0.100576098 frontend bound           retiring                 bad speculation          backend bound            
     0.100576098     8.83%                  48.93%                  35.24%                   7.00%               
     0.200800845     8.84%                  48.49%                  35.53%                   7.13%               
     0.300905983     8.73%                  48.64%                  35.58%                   7.05%            
...


 
On Hyper Threaded CPUs TopDown computes metrics per core instead of
per logical CPU. In this case perf stat automatically enables
--per-core mode, and also requires global mode (-a) and no other
filters (no cgroup mode).

One side effect is that this may require root privileges or a
kernel.perf_event_paranoid=-1 setting.

On systems without Hyper Threading it can be used per process.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11


* [PATCH 01/11] perf, tools: Dont stop PMU parsing on alias parse error
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 02/11] perf, tools: Parse an .aggr-per-core event attribute Andi Kleen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When an error happens during alias parsing, currently the complete
parsing of all attributes of the PMU is stopped. This breaks old
perf on a newer kernel that may have not-yet-known alias attributes
(such as .scale or .per-pkg).

Continue when some attribute is unparseable.

This is IMHO a stable candidate and should be backported
to older versions to avoid problems with newer kernels.

v2: Print warnings when something goes wrong.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/pmu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b597bcc..0d228d1 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -284,13 +284,12 @@ static int pmu_aliases_parse(char *dir, struct list_head *head)
 {
 	struct dirent *evt_ent;
 	DIR *event_dir;
-	int ret = 0;
 
 	event_dir = opendir(dir);
 	if (!event_dir)
 		return -EINVAL;
 
-	while (!ret && (evt_ent = readdir(event_dir))) {
+	while ((evt_ent = readdir(event_dir))) {
 		char path[PATH_MAX];
 		char *name = evt_ent->d_name;
 		FILE *file;
@@ -306,17 +305,19 @@ static int pmu_aliases_parse(char *dir, struct list_head *head)
 
 		snprintf(path, PATH_MAX, "%s/%s", dir, name);
 
-		ret = -EINVAL;
 		file = fopen(path, "r");
-		if (!file)
-			break;
+		if (!file) {
+			pr_warning("Cannot open %s\n", path);
+			continue;
+		}
 
-		ret = perf_pmu__new_alias(head, dir, name, file);
+		if (perf_pmu__new_alias(head, dir, name, file) < 0)
+			pr_warning("Cannot set up %s\n", name);
 		fclose(file);
 	}
 
 	closedir(event_dir);
-	return ret;
+	return 0;
 }
 
 /*
-- 
2.4.3


* [PATCH 02/11] perf, tools: Parse an .aggr-per-core event attribute
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
  2016-01-16  1:12 ` [PATCH 01/11] perf, tools: Dont stop PMU parsing on alias parse error Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 03/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases Andi Kleen
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add the basic code to parse an .aggr-per-core event attribute.
The attribute means that the event needs to be measured in
per core mode.
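
The attribute is a plain sysfs file next to the event alias; for
example (path illustrative), a later patch in this series exports

  /sys/devices/cpu/events/topdown-total-slots.aggr-per-core

containing "1", which this parser reads with fscanf().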

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/evsel.h        |  1 +
 tools/perf/util/parse-events.c |  1 +
 tools/perf/util/pmu.c          | 24 ++++++++++++++++++++++++
 tools/perf/util/pmu.h          |  2 ++
 4 files changed, 28 insertions(+)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8e75434..239a666 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -114,6 +114,7 @@ struct perf_evsel {
 	bool			tracking;
 	bool			per_pkg;
 	bool			precise_max;
+	bool			aggr_per_core;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4f7b0ef..9dff8c5 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1027,6 +1027,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
 		evsel->unit = info.unit;
 		evsel->scale = info.scale;
 		evsel->per_pkg = info.per_pkg;
+		evsel->aggr_per_core = info.aggr_per_core;
 		evsel->snapshot = info.snapshot;
 	}
 
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 0d228d1..c5eb1ce 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -189,6 +189,24 @@ perf_pmu__parse_per_pkg(struct perf_pmu_alias *alias, char *dir, char *name)
 	return 0;
 }
 
+static void
+perf_pmu__parse_aggr_per_core(struct perf_pmu_alias *alias, char *dir, char *name)
+{
+	char path[PATH_MAX];
+	FILE *f;
+	int flag;
+
+	snprintf(path, PATH_MAX, "%s/%s.aggr-per-core", dir, name);
+
+	f = fopen(path, "r");
+	if (f) {
+		if (fscanf(f, "%d", &flag) == 1)
+			alias->aggr_per_core = flag != 0;
+		fclose(f);
+	}
+}
+
+
 static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
 				    char *dir, char *name)
 {
@@ -238,6 +256,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 		perf_pmu__parse_scale(alias, dir, name);
 		perf_pmu__parse_per_pkg(alias, dir, name);
 		perf_pmu__parse_snapshot(alias, dir, name);
+		perf_pmu__parse_aggr_per_core(alias, dir, name);
 	}
 
 	list_add_tail(&alias->list, list);
@@ -272,6 +291,8 @@ static inline bool pmu_alias_info_file(char *name)
 		return true;
 	if (len > 9 && !strcmp(name + len - 9, ".snapshot"))
 		return true;
+	if (len > 14 && !strcmp(name + len - 14, ".aggr-per-core"))
+		return true;
 
 	return false;
 }
@@ -851,6 +872,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 	int ret;
 
 	info->per_pkg = false;
+	info->aggr_per_core = false;
 
 	/*
 	 * Mark unit and scale as not set
@@ -874,6 +896,8 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 
 		if (alias->per_pkg)
 			info->per_pkg = true;
+		if (alias->aggr_per_core)
+			info->aggr_per_core = true;
 
 		list_del(&term->list);
 		free(term);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 5d7e844..e9dde6c 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -32,6 +32,7 @@ struct perf_pmu_info {
 	double scale;
 	bool per_pkg;
 	bool snapshot;
+	bool aggr_per_core;
 };
 
 #define UNIT_MAX_LEN	31 /* max length for event unit name */
@@ -44,6 +45,7 @@ struct perf_pmu_alias {
 	double scale;
 	bool per_pkg;
 	bool snapshot;
+	bool aggr_per_core;
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
2.4.3


* [PATCH 03/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
  2016-01-16  1:12 ` [PATCH 01/11] perf, tools: Dont stop PMU parsing on alias parse error Andi Kleen
  2016-01-16  1:12 ` [PATCH 02/11] perf, tools: Parse an .aggr-per-core event attribute Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 04/11] perf, tools, stat: Avoid fractional digits for integer scales Andi Kleen
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When an event alias is used that the kernel marked as .aggr-per-core,
force --per-core mode (and also require -a and forbid cgroup or
per-thread mode). This in turn means --topdown forces --per-core mode.

This is needed for TopDown in SMT mode, because it needs to measure
all threads in a core together and merge the values to compute the correct
percentages of how the pipeline is limited.

We do this if any alias is aggr-per-core.

The main stat code does the necessary checks and forces per core mode.
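
For example (illustrative, on a CPU whose topdown events carry
.aggr-per-core), a per-process run is rejected:

  $ perf stat --topdown -p 1234
  per core event configuration requires system-wide mode (-a)

while perf stat --topdown -a works and is aggregated per core.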

v2: Rename agg-per-core to aggr-per-core
v3: Split patch into parse and use
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5a02870..d20cf3e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2086,6 +2086,7 @@ static int __cmd_report(int argc, const char **argv)
 
 int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 {
+	struct perf_evsel *counter;
 	const char * const stat_usage[] = {
 		"perf stat [<options>] [<command>]",
 		NULL
@@ -2232,6 +2233,23 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (add_default_attributes())
 		goto out;
 
+	evlist__for_each (evsel_list, counter) {
+		/* Enable per core mode if only a single event requires it. */
+		if (counter->aggr_per_core) {
+			if (stat_config.aggr_mode != AGGR_GLOBAL &&
+			    stat_config.aggr_mode != AGGR_CORE) {
+				pr_err("per core event configuration requires per core mode\n");
+				goto out;
+			}
+			stat_config.aggr_mode = AGGR_CORE;
+			if (nr_cgroups || !target__has_cpu(&target)) {
+				pr_err("per core event configuration requires system-wide mode (-a)\n");
+				goto out;
+			}
+			break;
+		}
+	}
+
 	target__validate(&target);
 
 	if (perf_evlist__create_maps(evsel_list, &target) < 0) {
-- 
2.4.3


* [PATCH 04/11] perf, tools, stat: Avoid fractional digits for integer scales
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (2 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 03/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 05/11] perf, tools, stat: Scale values by unit before metrics Andi Kleen
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When the scaling factor is a full integer, don't display fractional
digits. This avoids unnecessary .00 output for topdown metrics with
integer scale factors: a count scaled by 4 now prints as 1234 instead
of 1234.00, while fractional scales keep two digits.

v2: Remove redundant check.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d20cf3e..dca4f0d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -66,6 +66,7 @@
 #include <stdlib.h>
 #include <sys/prctl.h>
 #include <locale.h>
+#include <math.h>
 
 #define DEFAULT_SEPARATOR	" "
 #define CNTR_NOT_SUPPORTED	"<not supported>"
@@ -967,12 +968,12 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 	const char *fmt;
 
 	if (csv_output) {
-		fmt = sc != 1.0 ?  "%.2f%s" : "%.0f%s";
+		fmt = floor(sc) != sc ?  "%.2f%s" : "%.0f%s";
 	} else {
 		if (big_num)
-			fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s";
+			fmt = floor(sc) != sc ? "%'18.2f%s" : "%'18.0f%s";
 		else
-			fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s";
+			fmt = floor(sc) != sc ? "%18.2f%s" : "%18.0f%s";
 	}
 
 	aggr_printout(evsel, id, nr);
-- 
2.4.3


* [PATCH 05/11] perf, tools, stat: Scale values by unit before metrics
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (3 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 04/11] perf, tools, stat: Avoid fractional digits for integer scales Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 06/11] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Scale values by their unit before passing them to the metrics printing
functions. This is needed for TopDown, because the slots have to be
scaled by the pipeline width / SMT-ness; for example topdown-total-slots
carries a sysfs scale of 4 (2 with SMT, see the Core patch later), so
the shadow stats see slots rather than raw cycles.

For existing metrics it shouldn't make any difference, as those generally
use events that don't have any units.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/stat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 2f901d1..f3217b3 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -307,6 +307,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	struct perf_counts_values *aggr = &counter->counts->aggr;
 	struct perf_stat_evsel *ps = counter->priv;
 	u64 *count = counter->counts->aggr.values;
+	u64 val;
 	int i, ret;
 
 	aggr->val = aggr->ena = aggr->run = 0;
@@ -337,7 +338,8 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	/*
 	 * Save the full runtime - to allow normalization during printout:
 	 */
-	perf_stat__update_shadow_stats(counter, count, 0);
+	val = counter->scale * *count;
+	perf_stat__update_shadow_stats(counter, &val, 0);
 
 	return 0;
 }
-- 
2.4.3


* [PATCH 06/11] perf, tools, stat: Basic support for TopDown in perf stat
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (4 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 05/11] perf, tools, stat: Scale values by unit before metrics Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 07/11] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add basic plumbing for TopDown in perf stat.

Add a new --topdown option to enable the events. When --topdown is
specified, set up events for all topdown events supported by the
kernel. Add topdown-* as a special case to the event parser, as is
needed for all events containing a dash.

The actual code to compute the metrics is in follow-on patches.
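
With all five events available and the NMI watchdog disabled (so a
group can be used), filter_events() ends up producing a single event
string along these lines, which is then handed to parse_events():

  {topdown-total-slots,topdown-fetch-bubbles,topdown-slots-retired,topdown-recovery-bubbles,topdown-slots-issued}

Without the group it is the same list minus the braces.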

v2: Use standard sysctl read function.
v3: Move x86 specific code to arch/
v4: Enable --metric-only implicitly for topdown.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |   8 +++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/group.c       |  27 ++++++++
 tools/perf/builtin-stat.c              | 110 +++++++++++++++++++++++++++++++--
 tools/perf/util/group.h                |   7 +++
 tools/perf/util/parse-events.l         |   1 +
 6 files changed, 148 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/arch/x86/util/group.c
 create mode 100644 tools/perf/util/group.h

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 51e3c5a..9f075c5 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -196,6 +196,14 @@ Aggregate counts per physical processor for system-wide mode measurements.
 --no-aggr::
 Do not aggregate counts across all monitored CPUs.
 
+--topdown::
+
+Print top down level 1 metrics if supported by the CPU. This allows
+determining bottlenecks in the CPU pipeline for CPU bound workloads,
+by breaking them down into frontend bound, backend bound, bad speculation
+and retiring. Metrics are colored red when they cross a threshold.
+
+This enables --metric-only, unless overridden with --no-metric-only.
 
 EXAMPLES
 --------
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 4659703..4cd8a16 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,6 +3,7 @@ libperf-y += tsc.o
 libperf-y += pmu.o
 libperf-y += kvm-stat.o
 libperf-y += perf_regs.o
+libperf-y += group.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/group.c
new file mode 100644
index 0000000..f3039b5
--- /dev/null
+++ b/tools/perf/arch/x86/util/group.c
@@ -0,0 +1,27 @@
+#include <stdio.h>
+#include "api/fs/fs.h"
+#include "util/group.h"
+
+/*
+ * Check whether we can use a group for top down.
+ * Without a group we may get bad results due to multiplexing.
+ */
+bool check_group(bool *warn)
+{
+	int n;
+
+	if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
+		return false;
+	if (n > 0) {
+		*warn = true;
+		return false;
+	}
+	return true;
+}
+
+void group_warn(void)
+{
+	fprintf(stderr,
+		"nmi_watchdog enabled with topdown. May give wrong results.\n"
+		"Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
+}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index dca4f0d..afea25d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,10 +59,9 @@
 #include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
-#include "util/session.h"
-#include "util/tool.h"
-#include "asm/bug.h"
+#include "util/group.h"
 
+#include <api/fs/fs.h>
 #include <stdlib.h>
 #include <sys/prctl.h>
 #include <locale.h>
@@ -98,6 +97,15 @@ static const char * transaction_limited_attrs = {
 	"}"
 };
 
+static const char * topdown_attrs[] = {
+	"topdown-total-slots",
+	"topdown-fetch-bubbles",
+	"topdown-slots-retired",
+	"topdown-recovery-bubbles",
+	"topdown-slots-issued",
+	NULL,
+};
+
 static struct perf_evlist	*evsel_list;
 
 static struct target target = {
@@ -112,6 +120,7 @@ static volatile pid_t		child_pid			= -1;
 static bool			null_run			=  false;
 static int			detailed_run			=  0;
 static bool			transaction_run;
+static bool			topdown_run			= false;
 static bool			big_num				=  true;
 static int			big_num_opt			=  -1;
 static const char		*csv_sep			= NULL;
@@ -124,6 +133,7 @@ static unsigned int		initial_delay			= 0;
 static unsigned int		unit_width			= 4; /* strlen("unit") */
 static bool			forever				= false;
 static bool			metric_only			= false;
+static bool			force_metric_only		= false;
 static struct timespec		ref_time;
 static struct cpu_map		*aggr_map;
 static aggr_get_id_t		aggr_get_id;
@@ -1455,6 +1465,14 @@ static int stat__set_big_num(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int enable_metric_only(const struct option *opt __maybe_unused,
+			      const char *s __maybe_unused, int unset)
+{
+	force_metric_only = true;
+	metric_only = !unset;
+	return 0;
+}
+
 static const struct option stat_options[] = {
 	OPT_BOOLEAN('T', "transaction", &transaction_run,
 		    "hardware transaction statistics"),
@@ -1513,8 +1531,10 @@ static const struct option stat_options[] = {
 		     "aggregate counts per thread", AGGR_THREAD),
 	OPT_UINTEGER('D', "delay", &initial_delay,
 		     "ms to wait before starting measurement after program start"),
-	OPT_BOOLEAN(0, "metric-only", &metric_only,
-			"Only print computed metrics. No raw values"),
+	OPT_CALLBACK_NOOPT(0, "metric-only", &metric_only, NULL,
+			"Only print computed metrics. No raw values", enable_metric_only),
+	OPT_BOOLEAN(0, "topdown", &topdown_run,
+			"measure topdown level 1 statistics"),
 	OPT_END()
 };
 
@@ -1707,12 +1727,61 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	return 0;
 }
 
+static void filter_events(const char **attr, char **str, bool use_group)
+{
+	int off = 0;
+	int i;
+	int len = 0;
+	char *s;
+
+	for (i = 0; attr[i]; i++) {
+		if (pmu_have_event("cpu", attr[i])) {
+			len += strlen(attr[i]) + 1;
+			attr[i - off] = attr[i];
+		} else
+			off++;
+	}
+	attr[i - off] = NULL;
+
+	*str = malloc(len + 1 + 2);
+	if (!*str)
+		return;
+	s = *str;
+	if (i - off == 0) {
+		*s = 0;
+		return;
+	}
+	if (use_group)
+		*s++ = '{';
+	for (i = 0; attr[i]; i++) {
+		strcpy(s, attr[i]);
+		s += strlen(s);
+		*s++ = ',';
+	}
+	if (use_group) {
+		s[-1] = '}';
+		*s = 0;
+	} else
+		s[-1] = 0;
+}
+
+__weak bool check_group(bool *warn)
+{
+	*warn = false;
+	return false;
+}
+
+__weak void group_warn(void)
+{
+}
+
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
  */
 static int add_default_attributes(void)
 {
+	int err;
 	struct perf_event_attr default_attrs[] = {
 
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
@@ -1825,7 +1894,6 @@ static int add_default_attributes(void)
 		return 0;
 
 	if (transaction_run) {
-		int err;
 		if (pmu_have_event("cpu", "cycles-ct") &&
 		    pmu_have_event("cpu", "el-start"))
 			err = parse_events(evsel_list, transaction_attrs, NULL);
@@ -1838,6 +1906,36 @@ static int add_default_attributes(void)
 		return 0;
 	}
 
+	if (topdown_run) {
+		char *str = NULL;
+		bool warn = false;
+
+		if (!force_metric_only)
+			metric_only = true;
+		filter_events(topdown_attrs, &str, check_group(&warn));
+		if (topdown_attrs[0] && str) {
+			if (warn)
+				group_warn();
+			err = parse_events(evsel_list, str, NULL);
+			if (err) {
+				fprintf(stderr,
+					"Cannot set up top down events %s: %d\n",
+					str, err);
+				free(str);
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "System does not support topdown\n");
+			return -1;
+		}
+		free(str);
+		/*
+		 * Right now combining with the other attributes breaks group
+		 * semantics.
+		 */
+		return 0;
+	}
+
 	if (!evsel_list->nr_entries) {
 		if (perf_evlist__add_default_attrs(evsel_list, default_attrs) < 0)
 			return -1;
diff --git a/tools/perf/util/group.h b/tools/perf/util/group.h
new file mode 100644
index 0000000..daad3ff
--- /dev/null
+++ b/tools/perf/util/group.h
@@ -0,0 +1,7 @@
+#ifndef GROUP_H
+#define GROUP_H 1
+
+bool check_group(bool *warn);
+void group_warn(void);
+
+#endif
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 58c5831..3e65d61 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -248,6 +248,7 @@ cycles-ct					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 cycles-t					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 mem-loads					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 mem-stores					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
+topdown-[a-z-]+					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 
 L1-dcache|l1-d|l1d|L1-data		|
 L1-icache|l1-i|l1i|L1-instruction	|
-- 
2.4.3


* [PATCH 07/11] perf, tools, stat: Add computation of TopDown formulas
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (5 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 06/11] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 08/11] perf, tools, stat: Add extra output of counter values with -v Andi Kleen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Implement the TopDown formulas in perf stat. The topdown basic metrics
reported by the kernel are collected, and the formulas are computed
and output as normal metrics.

See the kernel commit exporting the events for details on the used
metrics.

v2: Always print all metrics, only use thresholds for coloring.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 160 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.c        |   5 ++
 tools/perf/util/stat.h        |   5 ++
 3 files changed, 170 insertions(+)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 4d8f185..4b853c3 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -2,6 +2,7 @@
 #include "evsel.h"
 #include "stat.h"
 #include "color.h"
+#include "pmu.h"
 
 enum {
 	CTX_BIT_USER	= 1 << 0,
@@ -28,6 +29,11 @@ static struct stats runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
 
 struct stats walltime_nsecs_stats;
 
@@ -68,6 +74,11 @@ void perf_stat__reset_shadow_stats(void)
 		sizeof(runtime_transaction_stats));
 	memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats));
 	memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
+	memset(runtime_topdown_total_slots, 0, sizeof(runtime_topdown_total_slots));
+	memset(runtime_topdown_slots_retired, 0, sizeof(runtime_topdown_slots_retired));
+	memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued));
+	memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles));
+	memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles));
 }
 
 /*
@@ -90,6 +101,16 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 *count,
 		update_stats(&runtime_transaction_stats[ctx][cpu], count[0]);
 	else if (perf_stat_evsel__is(counter, ELISION_START))
 		update_stats(&runtime_elision_stats[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
+		update_stats(&runtime_topdown_total_slots[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
+		update_stats(&runtime_topdown_slots_issued[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
+		update_stats(&runtime_topdown_slots_retired[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
+		update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu],count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
+		update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
@@ -289,6 +310,104 @@ static void print_ll_cache_misses(int cpu,
 	out->print_metric(out->ctx, color, "%7.2f%%", "of all LL-cache hits", ratio);
 }
 
+/*
+ * High level "TopDown" CPU core pipeline bottleneck breakdown.
+ *
+ * Basic concept following
+ * Yasin, A Top Down Method for Performance analysis and Counter architecture
+ * ISPASS14
+ *
+ * The CPU pipeline is divided into 4 areas that can be bottlenecks:
+ *
+ * Frontend -> Backend -> Retiring
+ * BadSpeculation in addition means out of order execution that is thrown away
+ * (for example branch mispredictions)
+ * Frontend is instruction decoding.
+ * Backend is execution, like computation and accessing data in memory
+ * Retiring is good execution that is not directly bottlenecked
+ *
+ * The formulas are computed in slots.
+ * A slot is an entry in the pipeline, one per unit of pipeline width
+ * (for example a 4-wide pipeline has 4 slots for each cycle)
+ *
+ * Formulas:
+ * BadSpeculation = ((SlotsIssued - SlotsRetired) + RecoveryBubbles) /
+ *			TotalSlots
+ * Retiring = SlotsRetired / TotalSlots
+ * FrontendBound = FetchBubbles / TotalSlots
+ * BackendBound = 1.0 - BadSpeculation - Retiring - FrontendBound
+ *
+ * The kernel provides the mapping to the low level CPU events and any scaling
+ * needed for the CPU pipeline width, for example:
+ *
+ * TotalSlots = Cycles * 4
+ *
+ * The scaling factor is communicated in the sysfs unit.
+ *
+ * In some cases the CPU may not be able to measure all the formulas due to
+ * missing events. In this case multiple formulas are combined where possible.
+ *
+ * With SMT the slots of thread siblings need to be combined to get meaningful
+ * results. This is implemented by the kernel forcing per-core mode with
+ * the .aggr-per-core sysfs attribute.
+ *
+ * Full TopDown supports more levels to sub-divide each area: for example
+ * BackendBound into computing bound and memory bound. For now we only
+ * support Level 1 TopDown.
+ */
+
+static double td_total_slots(int ctx, int cpu)
+{
+	return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
+}
+
+static double td_bad_spec(int ctx, int cpu)
+{
+	double bad_spec = 0;
+	double total_slots;
+	double total;
+
+	total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
+		avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
+		avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
+	total_slots = td_total_slots(ctx, cpu);
+	if (total_slots)
+		bad_spec = total / total_slots;
+	return bad_spec;
+}
+
+static double td_retiring(int ctx, int cpu)
+{
+	double retiring = 0;
+	double total_slots = td_total_slots(ctx, cpu);
+	double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
+
+	if (total_slots)
+		retiring = ret_slots / total_slots;
+	return retiring;
+}
+
+static double td_fe_bound(int ctx, int cpu)
+{
+	double fe_bound = 0;
+	double total_slots = td_total_slots(ctx, cpu);
+	double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
+
+	if (total_slots)
+		fe_bound = fetch_bub / total_slots;
+	return fe_bound;
+}
+
+static double td_be_bound(int ctx, int cpu)
+{
+	double sum = (td_fe_bound(ctx, cpu) +
+		      td_bad_spec(ctx, cpu) +
+		      td_retiring(ctx, cpu));
+	if (sum == 0)
+		return 0;
+	return 1.0 - sum;
+}
+
 void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				   double avg, int cpu,
 				   struct perf_stat_output_ctx *out)
@@ -296,6 +415,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 	void *ctxp = out->ctx;
 	print_metric_t print_metric = out->print_metric;
 	double total, ratio = 0.0, total2;
+	const char *color = NULL;
 	int ctx = evsel_context(evsel);
 
 	if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
@@ -438,6 +558,46 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				     avg / ratio);
 		else
 			print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
+		double fe_bound = td_fe_bound(ctx, cpu);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(ctxp, color, "%8.2f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
+		double retiring = td_retiring(ctx, cpu);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_RED;
+		print_metric(ctxp, color, "%8.2f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
+		double bad_spec = td_bad_spec(ctx, cpu);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(ctxp, color, "%8.2f%%", "bad speculation",
+				bad_spec * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
+		double be_bound = td_be_bound(ctx, cpu);
+		const char *name = "backend bound";
+		static int have_recovery_bubbles = -1;
+
+		/* In case the CPU does not support topdown-recovery-bubbles */
+		if (have_recovery_bubbles < 0)
+			have_recovery_bubbles = pmu_have_event("cpu",
+					"topdown-recovery-bubbles");
+		if (!have_recovery_bubbles)
+			name = "backend bound/bad spec";
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		if (td_total_slots(ctx, cpu) > 0)
+			print_metric(ctxp, NULL, "%8.2f%%", name,
+					be_bound * 100.);
+		else
+			print_metric(ctxp, NULL, NULL, name, 0);
 	} else if (runtime_nsecs_stats[cpu].n != 0) {
 		char unit = 'M';
 		char unit_buf[10];
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index f3217b3..fc6039c 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -79,6 +79,11 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TRANSACTION_START,	cpu/tx-start/),
 	ID(ELISION_START,	cpu/el-start/),
 	ID(CYCLES_IN_TX_CP,	cpu/cycles-ct/),
+	ID(TOPDOWN_TOTAL_SLOTS, topdown-total-slots),
+	ID(TOPDOWN_SLOTS_ISSUED, topdown-slots-issued),
+	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
+	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
+	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
 };
 #undef ID
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index f14c0f4..4866240 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -17,6 +17,11 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TRANSACTION_START,
 	PERF_STAT_EVSEL_ID__ELISION_START,
 	PERF_STAT_EVSEL_ID__CYCLES_IN_TX_CP,
+	PERF_STAT_EVSEL_ID__TOPDOWN_TOTAL_SLOTS,
+	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_ISSUED,
+	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
 	PERF_STAT_EVSEL_ID__MAX,
 };
 
-- 
2.4.3


* [PATCH 08/11] perf, tools, stat: Add extra output of counter values with -v
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (6 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 07/11] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add debug output of the raw counter values per CPU, together with
their CPU numbers, when perf stat -v is specified. This is very
useful for debugging problems with per core counters, where normally
only the aggregated values are visible.
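
The output is one line per counter and CPU, in the form
<event name>: <cpu>: <value> <enabled time> <running time>,
for example (made-up numbers):

  topdown-total-slots: 0: 971460 1000000 1000000
  topdown-total-slots: 1: 968231 1000000 1000000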

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index afea25d..3930ee3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -309,6 +309,14 @@ static int read_counter(struct perf_evsel *counter)
 					return -1;
 				}
 			}
+
+			if (verbose) {
+				fprintf(stat_config.output,
+					"%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+						perf_evsel__name(counter),
+						cpu,
+						count->val, count->ena, count->run);
+			}
 		}
 	}
 
-- 
2.4.3


* [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (7 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 08/11] perf, tools, stat: Add extra output of counter values with -v Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-18 18:03   ` Peter Zijlstra
  2016-01-18 18:06   ` Peter Zijlstra
  2016-01-16  1:12 ` [PATCH 10/11] x86, perf: Add Top Down events to Intel Core Andi Kleen
  2016-01-16  1:12 ` [PATCH 11/11] x86, perf: Add Top Down events to Intel Atom Andi Kleen
  10 siblings, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a way to show different sysfs event attributes depending on
whether HyperThreading is on or off. This is difficult to determine
early at boot, so we just do it dynamically when the sysfs
attribute is read.
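
As an illustration, with the Core event values from the next patch
a tool reading (path illustrative)

  /sys/devices/cpu/events/topdown-total-slots

gets "event=0x3c,umask=0x0" while HT is off, but
"event=0x3c,umask=0x0,any=1" once HT is on.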

v2:
Compute HT status only once in CPU online/offline hooks.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.c | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event.h | 15 +++++++++++++++
 include/linux/perf_event.h       |  7 +++++++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1b443db..7379eb9 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1463,6 +1463,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	unsigned int cpu = (long)hcpu;
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	int i, ret = NOTIFY_OK;
+	bool ht_on;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
@@ -1482,6 +1483,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 			kfree(cpuc->kfree_on_online[i]);
 			cpuc->kfree_on_online[i] = NULL;
 		}
+		x86_pmu.ht_on = cpumask_weight(topology_sibling_cpumask(cpu)) > 1;
 		break;
 
 	case CPU_DYING:
@@ -1493,6 +1495,15 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DEAD:
 		if (x86_pmu.cpu_dead)
 			x86_pmu.cpu_dead(cpu);
+		/* Recompute HT state for all CPUs on offline */
+		ht_on = false;
+		for_each_online_cpu (cpu) {
+			if (cpumask_weight(topology_sibling_cpumask(cpu)) > 1) {
+				ht_on = true;
+				break;
+			}
+		}
+		x86_pmu.ht_on = ht_on;
 		break;
 
 	default:
@@ -1602,6 +1613,30 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 	return x86_pmu.events_sysfs_show(page, config);
 }
 
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page)
+{
+	struct perf_pmu_events_ht_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_ht_attr, attr);
+
+	/*
+	 * Report conditional events depending on Hyper-Threading.
+	 *
+	 * This is overly conservative as usually the HT special
+	 * handling is not needed if the other CPU thread is idle.
+	 *
+	 * Note this does not (cannot) handle the case when thread
+	 * siblings are invisible, for example with virtualization
+	 * if they are owned by some other guest.  The user tool
+	 * has to re-read when a thread sibling gets onlined later.
+	 */
+
+	return sprintf(page, "%s",
+			x86_pmu.ht_on ?
+			pmu_attr->event_str_ht :
+			pmu_attr->event_str_noht);
+}
+
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
 EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 7bb61e3..a7e0148 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -616,6 +616,11 @@ struct x86_pmu {
 	 * Intel host/guest support (KVM)
 	 */
 	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+	/*
+	 * Hyper Threading on?
+	 */
+	bool ht_on;
 };
 
 struct x86_perf_task_context {
@@ -661,6 +666,14 @@ static struct perf_pmu_events_attr event_attr_##v = {			\
 	.event_str	= str,						\
 };
 
+#define EVENT_ATTR_STR_HT(_name, v, noht, ht)				\
+static struct perf_pmu_events_ht_attr event_attr_##v = {		\
+	.attr		= __ATTR(_name, 0444, events_ht_sysfs_show, NULL),\
+	.id		= 0,						\
+	.event_str_noht	= noht,						\
+	.event_str_ht	= ht,						\
+}
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)
@@ -922,6 +935,8 @@ int knc_pmu_init(void);
 
 ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 			  char *page);
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
 
 static inline int is_ht_workaround_enabled(void)
 {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f9828a4..ea2d830 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1166,6 +1166,13 @@ struct perf_pmu_events_attr {
 	const char *event_str;
 };
 
+struct perf_pmu_events_ht_attr {
+	struct device_attribute attr;
+	u64 id;
+	const char *event_str_ht;
+	const char *event_str_noht;
+};
+
 ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
 			      char *page);
 
-- 
2.4.3


* [PATCH 10/11] x86, perf: Add Top Down events to Intel Core
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (8 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  2016-01-16  1:12 ` [PATCH 11/11] x86, perf: Add Top Down events to Intel Atom Andi Kleen
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add declarations for the events needed for TopDown to the
Intel big core CPUs starting with Sandy Bridge. We need
to report different values if HyperThreading is on or off.

The only thing this patch does is to export some events
in sysfs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots	  Available slots in the pipeline
topdown-slots-issued	  Slots issued into the pipeline
topdown-slots-retired	  Slots successfully retired
topdown-fetch-bubbles	  Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
			  from misspeculation

A slot is a single operation in the CPU pipeline.

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and their scaling factors (such as the pipeline width)
and perf stat then computes the formulas based on the
available metrics. This is similar to how existing
perf metrics, such as TSC metrics or IPC, are implemented.

This abstracts all CPU pipeline specific knowledge into the
kernel driver, but still avoids the need for larger scale perf
interface changes.

For HyperThreading the "any" bit is needed to get accurate
values when both threads are executing. This implies that,
for now, the events can only be collected as root or with
perf_event_paranoid=-1.

Hyper Threading also requires averaging events from both
threads together (the CPU cannot measure them independently).

In perf stat this is already done by the per core mode.  The
new .aggr-per-core attribute is added to the events, which
then forces perf stat to enable --per-core.
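
As a quick worked example of the scaling (made-up counts): with HT on
the total-slots scale is 2, so two sibling threads measuring 400 and
600 unhalted cycles in an interval give (400 + 600) * 2 = 2000 slots,
the same as (count_t0 + count_t1) / 2 * 4 for a 4-wide pipeline.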

The basic scheme is based on the following paper:
Yasin,
A Top Down Method for Performance analysis and Counter architecture
ISPASS14
(pdf available via google)

v2: Rework scaling. Fix formulas for HyperThreading.
v3: Rename agg-per-core to aggr-per-core
Always set aggr-per-core to one to get same output for HT off.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c | 68 ++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index a667078..c5a512e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -230,9 +230,59 @@ struct attribute *nhm_events_attrs[] = {
 	NULL,
 };
 
+/*
+ * TopDown events for Core.
+ *
+ * The events are all counted in slots, where a slot is one entry in
+ * the 4 wide pipeline per cycle. Some events are already reported in
+ * slots, for cycle events we multiply by the pipeline width (4).
+ *
+ * With Hyper Threading on, TopDown metrics are either summed or averaged
+ * between the threads of a core: (count_t0 + count_t1).
+ *
+ * For the average case the metric is always scaled to pipeline width,
+ * so we use factor 2 ((count_t0 + count_t1) / 2 * 4)
+ *
+ * We tell perf to aggregate per core by setting the .aggr-per-core
+ * attribute for the alias to 1.
+ */
+
+EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots,
+	"event=0x3c,umask=0x0",			/* cpu_clk_unhalted.thread */
+	"event=0x3c,umask=0x0,any=1");		/* cpu_clk_unhalted.thread_any */
+EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale, "4", "2");
+EVENT_ATTR_STR(topdown-total-slots.aggr-per-core, td_total_slots_pc, "1");
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued,
+	"event=0xe,umask=0x1");			/* uops_issued.any */
+EVENT_ATTR_STR(topdown-slots-issued.aggr-per-core, td_slots_issued_pc, "1");
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired,
+	"event=0xc2,umask=0x2");		/* uops_retired.retire_slots */
+EVENT_ATTR_STR(topdown-slots-retired.aggr-per-core, td_slots_retired_pc, "1");
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles,
+	"event=0x9c,umask=0x1");		/* idq_uops_not_delivered_core */
+EVENT_ATTR_STR(topdown-fetch-bubbles.aggr-per-core, td_fetch_bubbles_pc, "1");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
+	"event=0xd,umask=0x3,cmask=1",		/* int_misc.recovery_cycles */
+	"event=0xd,umask=0x3,cmask=1,any=1");	/* int_misc.recovery_cycles_any */
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
+	"4", "2");
+EVENT_ATTR_STR(topdown-recovery-bubbles.aggr-per-core, td_recovery_bubbles_pc, "1");
+
 struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(mem_ld_snb),
 	EVENT_PTR(mem_st_snb),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL,
 };
 
@@ -3283,6 +3333,18 @@ static struct attribute *hsw_events_attrs[] = {
 	EVENT_PTR(cycles_ct),
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL
 };
 
@@ -3622,6 +3684,12 @@ __init int intel_pmu_init(void)
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
 		intel_pmu_lbr_init_skl();
 
+		/* INT_MISC.RECOVERY_CYCLES has umask 1 in Skylake */
+		event_attr_td_recovery_bubbles.event_str_noht =
+			"event=0xd,umask=0x1,cmask=1";
+		event_attr_td_recovery_bubbles.event_str_ht =
+			"event=0xd,umask=0x1,cmask=1,any=1";
+
 		x86_pmu.event_constraints = intel_skl_event_constraints;
 		x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_skl_extra_regs;
-- 
2.4.3


* [PATCH 11/11] x86, perf: Add Top Down events to Intel Atom
  2016-01-16  1:12 Add top down metrics to perf stat Andi Kleen
                   ` (9 preceding siblings ...)
  2016-01-16  1:12 ` [PATCH 10/11] x86, perf: Add Top Down events to Intel Core Andi Kleen
@ 2016-01-16  1:12 ` Andi Kleen
  10 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add topdown event declarations to Silvermont / Airmont.
These cores do not support the full Top Down metrics, but a useful
subset (FrontendBound, Retiring, combined BackendBound/BadSpeculation).

The perf stat tool automatically handles the missing events
and combines the available metrics.
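
Concretely, since topdown-recovery-bubbles does not exist here,
BadSpeculation cannot be separated out; perf stat falls back to
printing the combined "backend bound/bad spec" column (see the
fallback in the formula patch earlier in this series).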

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c5a512e..51a0f4c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1382,6 +1382,29 @@ static __initconst const u64 atom_hw_cache_event_ids
  },
 };
 
+EVENT_ATTR_STR(topdown-total-slots, td_total_slots_slm, "event=0x3c");
+EVENT_ATTR_STR(topdown-total-slots.scale, td_total_slots_scale_slm, "2");
+/* no_alloc_cycles.not_delivered */
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles_slm,
+	       "event=0xca,umask=0x50");
+EVENT_ATTR_STR(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale_slm, "2");
+/* uops_retired.all */
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued_slm,
+	       "event=0xc2,umask=0x10");
+/* uops_retired.all */
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired_slm,
+	       "event=0xc2,umask=0x10");
+
+static struct attribute *slm_events_attrs[] = {
+	EVENT_PTR(td_total_slots_slm),
+	EVENT_PTR(td_total_slots_scale_slm),
+	EVENT_PTR(td_fetch_bubbles_slm),
+	EVENT_PTR(td_fetch_bubbles_scale_slm),
+	EVENT_PTR(td_slots_issued_slm),
+	EVENT_PTR(td_slots_retired_slm),
+	NULL
+};
+
 static struct extra_reg intel_slm_extra_regs[] __read_mostly =
 {
 	/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
@@ -3494,6 +3517,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_slm_extra_regs;
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+		x86_pmu.cpu_events = slm_events_attrs;
 		pr_cont("Silvermont events, ");
 		break;
 
-- 
2.4.3


* Re: [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status
  2016-01-16  1:12 ` [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
@ 2016-01-18 18:03   ` Peter Zijlstra
  2016-01-18 18:06   ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2016-01-18 18:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, mingo, Andi Kleen

On Fri, Jan 15, 2016 at 05:12:51PM -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a way to show different sysfs events attributes depending on
> HyperThreading is on or off. This is difficult to determine
> early at boot, so we just do it dynamically when the sysfs
> attribute is read.
> 
> v2:
> Compute HT status only once in CPU online/offline hooks.


But we still have the 'duplicate' functionality in fixup_ht_bug() as
reported last time. Please merge these two, we should not have 2
different means of detecting SMT.


* Re: [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status
  2016-01-16  1:12 ` [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
  2016-01-18 18:03   ` Peter Zijlstra
@ 2016-01-18 18:06   ` Peter Zijlstra
  2016-01-18 22:28     ` Andi Kleen
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2016-01-18 18:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, mingo, Andi Kleen

On Fri, Jan 15, 2016 at 05:12:51PM -0800, Andi Kleen wrote:
> +ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
> +			  char *page)
> +{
> +	struct perf_pmu_events_ht_attr *pmu_attr =
> +		container_of(attr, struct perf_pmu_events_ht_attr, attr);
> +
> +	/*
> +	 * Report conditional events depending on Hyper-Threading.
> +	 *
> +	 * This is overly conservative as usually the HT special
> +	 * handling is not needed if the other CPU thread is idle.
> +	 *
> +	 * Note this does not (cannot) handle the case when thread
> +	 * siblings are invisible, for example with virtualization
> +	 * if they are owned by some other guest.  The user tool
> +	 * has to re-read when a thread sibling gets onlined later.
> +	 */
> +
> +	return sprintf(page, "%s",
> +			x86_pmu.ht_on ?
> +			pmu_attr->event_str_ht :
> +			pmu_attr->event_str_noht);
> +}

So one obvious problem with this is that a concurrent hotplug operation
can toggle ht_on resulting in bogus measurements.

I'm not sure there's anything we can do about that, other than WARN in
the tool if it finds inconsistent results; does it?


* Re: [PATCH 09/11] x86, perf: Support sysfs files depending on SMT status
  2016-01-18 18:06   ` Peter Zijlstra
@ 2016-01-18 22:28     ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-01-18 22:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, acme, jolsa, eranian, linux-kernel, mingo, Andi Kleen

> So one obvious problem with this is that a concurrent hotplug operation
> can toggle ht_on resulting in bogus measurements.
> 
> I'm not sure there's anything we can do about that, other than WARN in
> the tool if it finds inconsistent results; does it?

Right. Supporting a good interface for user programs to be informed
of CPU hotplug would be useful for a lot of things, but currently
it doesn't exist and it's definitely outside the scope of this
patchkit. If such an interface existed perf stat could use it
to reconfigure itself.

It's hard to do a good sanity check unfortunately.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

