[PATCH 0/4] TopDown metrics support for Ice Lake (perf tool)

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool)
@ 2020-08-20 16:45 kan.liang
  2020-08-20 16:45 ` [PATCH 1/4] perf tools: Rename group to topdown kan.liang
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: kan.liang @ 2020-08-20 16:45 UTC (permalink / raw)
  To: acme, peterz, mingo, jolsa, namhyung, linux-kernel; +Cc: eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The kernel patches have been merged into the tip's perf/core branch.
The patch set is on top of commit 2cb5383b30d4 ("perf/x86/intel: Support
per-thread RDPMC TopDown metrics") of the tip's perf/core branch.

The changes for the perf tool include:
- Extend --topdown option to support per thread TopDown metrics
- Support sample-read topdown metric group
- Add a complete document for the TopDown usage.

Ice Lake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.

New in Icelake
- Do not require generic counters. This allows to collect TopDown always
  in addition to other events.
- Measuring TopDown per thread/process instead of only per core

For the Ice Lake implementation of performance metrics, the values in
PERF_METRICS MSR are derived from fixed counter 3. Software should start
both registers, PERF_METRICS and fixed counter 3, from zero.
Additionally, software is recommended to periodically clear both
registers in order to maintain accurate measurements. The latter is
required for certain scenarios that involve sampling metrics at high
rates. Software should always write fixed counter 3 before write to
PERF_METRICS.

IA32_PERF_GLOBAL_STATUS. OVF_PERF_METRICS[48]: If this bit is set,
it indicates that some PERF_METRICS-related counter has overflowed and
a PMI is triggered. Software has to synchronize, e.g. re-start,
PERF_METRICS as well as fixed counter 3. Otherwise, PERF_METRICS may
return invalid values.

Limitation
- To get accurate result and avoid reading the METRICS register multiple
  times, the TopDown metrics events and SLOTS event have to be in the
  same group.
- METRICS and SLOTS registers have to be cleared after each read by SW.
  That is to prevent the lose of precision.
- Cannot do sampling read SLOTS and TopDown metric events

Please refer SDM Vol3, 18.3.9.3 Performance Metrics for the details of
TopDown metrics.

Andi Kleen (2):
  perf stat: Support new per thread TopDown metrics
  perf, tools: Add documentation for topdown metrics

Kan Liang (2):
  perf tools: Rename group to topdown
  perf record: Support sample-read topdown metric group

 tools/perf/Documentation/perf-stat.txt |   7 +-
 tools/perf/Documentation/topdown.txt   | 256 +++++++++++++++++++++++++
 tools/perf/arch/x86/util/Build         |   2 +-
 tools/perf/arch/x86/util/group.c       |  28 ---
 tools/perf/arch/x86/util/topdown.c     |  63 ++++++
 tools/perf/builtin-stat.c              |  81 +++-----
 tools/perf/util/Build                  |   1 +
 tools/perf/util/group.h                |   8 -
 tools/perf/util/record.c               |   3 +-
 tools/perf/util/stat-shadow.c          |  89 +++++++++
 tools/perf/util/stat.c                 |   4 +
 tools/perf/util/stat.h                 |   8 +
 tools/perf/util/topdown.c              |  58 ++++++
 tools/perf/util/topdown.h              |  12 ++
 14 files changed, 528 insertions(+), 92 deletions(-)
 create mode 100644 tools/perf/Documentation/topdown.txt
 delete mode 100644 tools/perf/arch/x86/util/group.c
 create mode 100644 tools/perf/arch/x86/util/topdown.c
 delete mode 100644 tools/perf/util/group.h
 create mode 100644 tools/perf/util/topdown.c
 create mode 100644 tools/perf/util/topdown.h

-- 
2.17.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] perf tools: Rename group to topdown
  2020-08-20 16:45 [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) kan.liang
@ 2020-08-20 16:45 ` kan.liang
  2020-08-20 16:45 ` [PATCH 2/4] perf record: Support sample-read topdown metric group kan.liang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: kan.liang @ 2020-08-20 16:45 UTC (permalink / raw)
  To: acme, peterz, mingo, jolsa, namhyung, linux-kernel; +Cc: eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The group.h/c only include TopDown group related functions. The name
"group" is too generic and inaccurate. Use the name "topdown" to
replace it.

Move topdown related functions to a dedicated file, topdown.c.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/arch/x86/util/Build                |  2 +-
 .../perf/arch/x86/util/{group.c => topdown.c} |  2 +-
 tools/perf/builtin-stat.c                     | 51 +-----------------
 tools/perf/util/Build                         |  1 +
 tools/perf/util/topdown.c                     | 53 +++++++++++++++++++
 tools/perf/util/{group.h => topdown.h}        |  6 ++-
 6 files changed, 61 insertions(+), 54 deletions(-)
 rename tools/perf/arch/x86/util/{group.c => topdown.c} (95%)
 create mode 100644 tools/perf/util/topdown.c
 rename tools/perf/util/{group.h => topdown.h} (52%)

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 47f9c56e744f..347c39b960eb 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,7 +3,7 @@ perf-y += tsc.o
 perf-y += pmu.o
 perf-y += kvm-stat.o
 perf-y += perf_regs.o
-perf-y += group.o
+perf-y += topdown.o
 perf-y += machine.o
 perf-y += event.o
 
diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/topdown.c
similarity index 95%
rename from tools/perf/arch/x86/util/group.c
rename to tools/perf/arch/x86/util/topdown.c
index e2f8034b8973..597e963fb3e7 100644
--- a/tools/perf/arch/x86/util/group.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
 #include "api/fs/fs.h"
-#include "util/group.h"
+#include "util/topdown.h"
 
 /*
  * Check whether we can use a group for top down.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 483a28ef4ec4..5583e22ca808 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -56,7 +56,7 @@
 #include "util/cpumap.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
-#include "util/group.h"
+#include "util/topdown.h"
 #include "util/session.h"
 #include "util/tool.h"
 #include "util/string2.h"
@@ -1495,55 +1495,6 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	return 0;
 }
 
-static int topdown_filter_events(const char **attr, char **str, bool use_group)
-{
-	int off = 0;
-	int i;
-	int len = 0;
-	char *s;
-
-	for (i = 0; attr[i]; i++) {
-		if (pmu_have_event("cpu", attr[i])) {
-			len += strlen(attr[i]) + 1;
-			attr[i - off] = attr[i];
-		} else
-			off++;
-	}
-	attr[i - off] = NULL;
-
-	*str = malloc(len + 1 + 2);
-	if (!*str)
-		return -1;
-	s = *str;
-	if (i - off == 0) {
-		*s = 0;
-		return 0;
-	}
-	if (use_group)
-		*s++ = '{';
-	for (i = 0; attr[i]; i++) {
-		strcpy(s, attr[i]);
-		s += strlen(s);
-		*s++ = ',';
-	}
-	if (use_group) {
-		s[-1] = '}';
-		*s = 0;
-	} else
-		s[-1] = 0;
-	return 0;
-}
-
-__weak bool arch_topdown_check_group(bool *warn)
-{
-	*warn = false;
-	return false;
-}
-
-__weak void arch_topdown_group_warn(void)
-{
-}
-
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index cd5e41960e64..eebbd5223cae 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -101,6 +101,7 @@ perf-y += call-path.o
 perf-y += rwsem.o
 perf-y += thread-stack.o
 perf-y += spark.o
+perf-y += topdown.o
 perf-$(CONFIG_AUXTRACE) += auxtrace.o
 perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
new file mode 100644
index 000000000000..a085b3c77c27
--- /dev/null
+++ b/tools/perf/util/topdown.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include "pmu.h"
+#include "topdown.h"
+
+int topdown_filter_events(const char **attr, char **str, bool use_group)
+{
+	int off = 0;
+	int i;
+	int len = 0;
+	char *s;
+
+	for (i = 0; attr[i]; i++) {
+		if (pmu_have_event("cpu", attr[i])) {
+			len += strlen(attr[i]) + 1;
+			attr[i - off] = attr[i];
+		} else
+			off++;
+	}
+	attr[i - off] = NULL;
+
+	*str = malloc(len + 1 + 2);
+	if (!*str)
+		return -1;
+	s = *str;
+	if (i - off == 0) {
+		*s = 0;
+		return 0;
+	}
+	if (use_group)
+		*s++ = '{';
+	for (i = 0; attr[i]; i++) {
+		strcpy(s, attr[i]);
+		s += strlen(s);
+		*s++ = ',';
+	}
+	if (use_group) {
+		s[-1] = '}';
+		*s = 0;
+	} else
+		s[-1] = 0;
+	return 0;
+}
+
+__weak bool arch_topdown_check_group(bool *warn)
+{
+	*warn = false;
+	return false;
+}
+
+__weak void arch_topdown_group_warn(void)
+{
+}
diff --git a/tools/perf/util/group.h b/tools/perf/util/topdown.h
similarity index 52%
rename from tools/perf/util/group.h
rename to tools/perf/util/topdown.h
index f36c7e31780a..e3d70e95f4f1 100644
--- a/tools/perf/util/group.h
+++ b/tools/perf/util/topdown.h
@@ -1,8 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef GROUP_H
-#define GROUP_H 1
+#ifndef TOPDOWN_H
+#define TOPDOWN_H 1
 
 bool arch_topdown_check_group(bool *warn);
 void arch_topdown_group_warn(void);
 
+int topdown_filter_events(const char **attr, char **str, bool use_group);
+
 #endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/4] perf record: Support sample-read topdown metric group
  2020-08-20 16:45 [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) kan.liang
  2020-08-20 16:45 ` [PATCH 1/4] perf tools: Rename group to topdown kan.liang
@ 2020-08-20 16:45 ` kan.liang
  2020-08-20 16:45 ` [PATCH 3/4] perf stat: Support new per thread TopDown metrics kan.liang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: kan.liang @ 2020-08-20 16:45 UTC (permalink / raw)
  To: acme, peterz, mingo, jolsa, namhyung, linux-kernel; +Cc: eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the hardware TopDown metrics feature, sample-read feature should be
supported for a topdown group, e.g., sample a non-topdown event and read
a topdown metric group. But the current perf record code errors out.

For a topdown metric group, the slots event must be the leader of the
group, but the leader slots event doesn't support sampling.

To support sample-read the topdown metric group, use the 2nd event of
the group as the "leader" for the purposes of sampling.

Only the platform with Topdown metic feature supports sample-read the
topdown group. Add arch_topdown_sample_read() to indicate whether the
topdown group supports sample-read.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/arch/x86/util/topdown.c | 35 ++++++++++++++++++++++++++++++
 tools/perf/util/record.c           |  3 ++-
 tools/perf/util/topdown.c          |  5 +++++
 tools/perf/util/topdown.h          |  2 ++
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
index 597e963fb3e7..fcf1f9c17a44 100644
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
 #include "api/fs/fs.h"
+#include "util/pmu.h"
 #include "util/topdown.h"
 
 /*
@@ -26,3 +27,37 @@ void arch_topdown_group_warn(void)
 		"nmi_watchdog enabled with topdown. May give wrong results.\n"
 		"Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
 }
+
+#define TOPDOWN_SLOTS		0x0400
+
+static bool is_topdown_slots_event(struct evsel *counter)
+{
+	if (!counter->pmu_name)
+		return false;
+
+	if (strcmp(counter->pmu_name, "cpu"))
+		return false;
+
+	if (counter->core.attr.config == TOPDOWN_SLOTS)
+		return true;
+
+	return false;
+}
+
+/*
+ * Check whether a topdown group supports sample-read.
+ *
+ * Only Topdown metic supports sample-read. The slots
+ * event must be the leader of the topdown group.
+ */
+
+bool arch_topdown_sample_read(struct evsel *leader)
+{
+	if (!pmu_have_event("cpu", "slots"))
+		return false;
+
+	if (is_topdown_slots_event(leader))
+		return true;
+
+	return false;
+}
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index a4cc11592f6b..a857a13b0544 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -13,6 +13,7 @@
 #include "util/perf_api_probe.h"
 #include "record.h"
 #include "../perf-sys.h"
+#include "topdown.h"
 
 /*
  * evsel__config_leader_sampling() uses special rules for leader sampling.
@@ -23,7 +24,7 @@ static struct evsel *evsel__read_sampler(struct evsel *evsel, struct evlist *evl
 {
 	struct evsel *leader = evsel->leader;
 
-	if (evsel__is_aux_event(leader)) {
+	if (evsel__is_aux_event(leader) || arch_topdown_sample_read(leader)) {
 		evlist__for_each_entry(evlist, evsel) {
 			if (evsel->leader == leader && evsel != evsel->leader)
 				return evsel;
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
index a085b3c77c27..1081b20f9891 100644
--- a/tools/perf/util/topdown.c
+++ b/tools/perf/util/topdown.c
@@ -51,3 +51,8 @@ __weak bool arch_topdown_check_group(bool *warn)
 __weak void arch_topdown_group_warn(void)
 {
 }
+
+__weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
+{
+	return false;
+}
diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
index e3d70e95f4f1..2f0d0b887639 100644
--- a/tools/perf/util/topdown.h
+++ b/tools/perf/util/topdown.h
@@ -1,9 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef TOPDOWN_H
 #define TOPDOWN_H 1
+#include "evsel.h"
 
 bool arch_topdown_check_group(bool *warn);
 void arch_topdown_group_warn(void);
+bool arch_topdown_sample_read(struct evsel *leader);
 
 int topdown_filter_events(const char **attr, char **str, bool use_group);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/4] perf stat: Support new per thread TopDown metrics
  2020-08-20 16:45 [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) kan.liang
  2020-08-20 16:45 ` [PATCH 1/4] perf tools: Rename group to topdown kan.liang
  2020-08-20 16:45 ` [PATCH 2/4] perf record: Support sample-read topdown metric group kan.liang
@ 2020-08-20 16:45 ` kan.liang
  2020-08-20 20:05   ` Andi Kleen
  2020-08-20 16:45 ` [PATCH 4/4] perf, tools: Add documentation for topdown metrics kan.liang
  2020-08-26 15:54 ` [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) Jiri Olsa
  4 siblings, 1 reply; 8+ messages in thread
From: kan.liang @ 2020-08-20 16:45 UTC (permalink / raw)
  To: acme, peterz, mingo, jolsa, namhyung, linux-kernel; +Cc: eranian, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Icelake has support for reporting per thread TopDown metrics.
These are reported differently than the previous TopDown support,
each metric is standalone, but scaled to pipeline "slots".
We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restrictions of only being able to report information per core is
gone.

Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  7 +-
 tools/perf/builtin-stat.c              | 30 ++++++++-
 tools/perf/util/stat-shadow.c          | 89 ++++++++++++++++++++++++++
 tools/perf/util/stat.c                 |  4 ++
 tools/perf/util/stat.h                 |  8 +++
 5 files changed, 134 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index c9bfefc051fb..e803dbdc88a8 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -357,6 +357,11 @@ if the workload is actually bound by the CPU and not by something else.
 For best results it is usually a good idea to use it with interval
 mode like -I 1000, as the bottleneck of workloads can change often.
 
+This enables --metric-only, unless overridden with --no-metric-only.
+
+The following restrictions only apply to older Intel CPUs and Atom,
+on newer CPUs (IceLake and later) TopDown can be collected for any thread:
+
 The top down metrics are collected per core instead of per
 CPU thread. Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
@@ -368,8 +373,6 @@ echo 0 > /proc/sys/kernel/nmi_watchdog
 for best results. Otherwise the bottlenecks may be inconsistent
 on workload with changing phases.
 
-This enables --metric-only, unless overridden with --no-metric-only.
-
 To interpret the results it is usually needed to know on which
 CPUs the workload runs on. If needed the CPUs can be forced using
 taskset.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5583e22ca808..6290da5bd142 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -128,6 +128,15 @@ static const char * topdown_attrs[] = {
 	NULL,
 };
 
+static const char *topdown_metric_attrs[] = {
+	"slots",
+	"topdown-retiring",
+	"topdown-bad-spec",
+	"topdown-fe-bound",
+	"topdown-be-bound",
+	NULL,
+};
+
 static const char *smi_cost_attrs = {
 	"{"
 	"msr/aperf/,"
@@ -1691,6 +1700,24 @@ static int add_default_attributes(void)
 		char *str = NULL;
 		bool warn = false;
 
+		if (!force_metric_only)
+			stat_config.metric_only = true;
+
+		if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
+			pr_err("Out of memory\n");
+			return -1;
+		}
+		if (topdown_metric_attrs[0] && str) {
+			if (!stat_config.interval && !stat_config.metric_only) {
+				fprintf(stat_config.output,
+					"Topdown accuracy may decreases when measuring long period.\n"
+					"Please print the result regularly, e.g. -I1000\n");
+			}
+			goto setup_metrics;
+		}
+
+		str = NULL;
+
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
 			pr_err("top down event configuration requires --per-core mode\n");
@@ -1702,8 +1729,6 @@ static int add_default_attributes(void)
 			return -1;
 		}
 
-		if (!force_metric_only)
-			stat_config.metric_only = true;
 		if (topdown_filter_events(topdown_attrs, &str,
 				arch_topdown_check_group(&warn)) < 0) {
 			pr_err("Out of memory\n");
@@ -1712,6 +1737,7 @@ static int add_default_attributes(void)
 		if (topdown_attrs[0] && str) {
 			if (warn)
 				arch_topdown_group_warn();
+setup_metrics:
 			err = parse_events(evsel_list, str, &errinfo);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index e1ba6c1b916a..3204084161c9 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -241,6 +241,18 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
 	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
 		update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES,
 				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING))
+		update_runtime_stat(st, STAT_TOPDOWN_RETIRING,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC))
+		update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
+				    ctx, cpu, count);
 	else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
 				    ctx, cpu, count);
@@ -705,6 +717,47 @@ static double td_be_bound(int ctx, int cpu, struct runtime_stat *st)
 	return sanitize_val(1.0 - sum);
 }
 
+/*
+ * Kernel reports metrics multiplied with slots. To get back
+ * the ratios we need to recreate the sum.
+ */
+
+static double td_metric_ratio(int ctx, int cpu,
+			      enum stat_type type,
+			      struct runtime_stat *stat)
+{
+	double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu);
+	double d = runtime_stat_avg(stat, type, ctx, cpu);
+
+	if (sum)
+		return d / sum;
+	return 0;
+}
+
+/*
+ * ... but only if most of the values are actually available.
+ * We allow two missing.
+ */
+
+static bool full_td(int ctx, int cpu,
+		    struct runtime_stat *stat)
+{
+	int c = 0;
+
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu) > 0)
+		c++;
+	return c >= 2;
+}
+
 static void print_smi_cost(struct perf_stat_config *config,
 			   int cpu, struct evsel *evsel,
 			   struct perf_stat_output_ctx *out,
@@ -1071,6 +1124,42 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 					be_bound * 100.);
 		else
 			print_metric(config, ctxp, NULL, NULL, name, 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) &&
+			full_td(ctx, cpu, st)) {
+		double retiring = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_RETIRING, st);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_GREEN;
+		print_metric(config, ctxp, color, "%8.1f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double fe_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_FE_BOUND, st);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double be_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BE_BOUND, st);
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "backend bound",
+				be_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) &&
+			full_td(ctx, cpu, st)) {
+		double bad_spec = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BAD_SPEC, st);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
+				bad_spec * 100.);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, NULL,
 				evsel->name, evsel->metric_name, NULL, 1, cpu, out, st);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index cdb154381a87..bd0decd6d753 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -95,6 +95,10 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
 	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
 	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
+	ID(TOPDOWN_RETIRING, topdown-retiring),
+	ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
+	ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
+	ID(TOPDOWN_BE_BOUND, topdown-be-bound),
 	ID(SMI_NUM, msr/smi/),
 	ID(APERF, msr/aperf/),
 };
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index f8778cffd941..6c944d81d726 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -28,6 +28,10 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
 	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
 	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RETIRING,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
 	PERF_STAT_EVSEL_ID__SMI_NUM,
 	PERF_STAT_EVSEL_ID__APERF,
 	PERF_STAT_EVSEL_ID__MAX,
@@ -82,6 +86,10 @@ enum stat_type {
 	STAT_TOPDOWN_SLOTS_RETIRED,
 	STAT_TOPDOWN_FETCH_BUBBLES,
 	STAT_TOPDOWN_RECOVERY_BUBBLES,
+	STAT_TOPDOWN_RETIRING,
+	STAT_TOPDOWN_BAD_SPEC,
+	STAT_TOPDOWN_FE_BOUND,
+	STAT_TOPDOWN_BE_BOUND,
 	STAT_SMI_NUM,
 	STAT_APERF,
 	STAT_MAX
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/4] perf, tools: Add documentation for topdown metrics
  2020-08-20 16:45 [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) kan.liang
                   ` (2 preceding siblings ...)
  2020-08-20 16:45 ` [PATCH 3/4] perf stat: Support new per thread TopDown metrics kan.liang
@ 2020-08-20 16:45 ` kan.liang
  2020-08-26 15:54 ` [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) Jiri Olsa
  4 siblings, 0 replies; 8+ messages in thread
From: kan.liang @ 2020-08-20 16:45 UTC (permalink / raw)
  To: acme, peterz, mingo, jolsa, namhyung, linux-kernel; +Cc: eranian, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Add some documentation how to use the topdown metrics in ring 3.

Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/topdown.txt | 256 +++++++++++++++++++++++++++
 1 file changed, 256 insertions(+)
 create mode 100644 tools/perf/Documentation/topdown.txt

diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
new file mode 100644
index 000000000000..f08a2cfdf6e3
--- /dev/null
+++ b/tools/perf/Documentation/topdown.txt
@@ -0,0 +1,256 @@
+Using TopDown metrics in user space
+-----------------------------------
+
+Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
+methology to break down CPU pipeline execution into 4 bottlenecks:
+frontend bound, backend bound, bad speculation, retiring.
+
+For more details on Topdown see [1][5]
+
+Traditionally this was implemented by events in generic counters
+and specific formulas to compute the bottlenecks.
+
+perf stat --topdown implements this.
+
+Full Top Down includes more levels that can break down the
+bottlenecks further. This is not directly implemented in perf,
+but available in other tools that can run on top of perf,
+such as toplev[2] or vtune[3]
+
+New Topdown features in Ice Lake
+===============================
+
+With Ice Lake CPUs the TopDown metrics are directly available as
+fixed counters and do not require generic counters. This allows
+to collect TopDown always in addition to other events.
+
+% perf stat -a --topdown -I1000
+#           time             retiring      bad speculation       frontend bound        backend bound
+     1.001281330                23.0%                15.3%                29.6%                32.1%
+     2.003009005                 5.0%                 6.8%                46.6%                41.6%
+     3.004646182                 6.7%                 6.7%                46.0%                40.6%
+     4.006326375                 5.0%                 6.4%                47.6%                41.0%
+     5.007991804                 5.1%                 6.3%                46.3%                42.3%
+     6.009626773                 6.2%                 7.1%                47.3%                39.3%
+     7.011296356                 4.7%                 6.7%                46.2%                42.4%
+     8.012951831                 4.7%                 6.7%                47.5%                41.1%
+...
+
+This also enables measuring TopDown per thread/process instead
+of only per core.
+
+Using TopDown through RDPMC in applications on Ice Lake
+======================================================
+
+For more fine grained measurements it can be useful to
+access the new  directly from user space. This is more complicated,
+but drastically lowers overhead.
+
+On Ice Lake, there is a new fixed counter 3: SLOTS, which reports
+"pipeline SLOTS" (cycles multiplied by core issue width) and a
+metric register that reports slots ratios for the different bottleneck
+categories.
+
+The metrics counter is CPU model specific and is not available on older
+CPUs.
+
+Example code
+============
+
+Library functions to do the functionality described below
+is also available in libjevents [4]
+
+The application opens a group with fixed counter 3 (SLOTS) and any
+metric event, and allow user programs to read the performance counters.
+
+Fixed counter 3 is mapped to a pseudo event event=0x00, umask=04,
+so the perf_event_attr structure should be initialized with
+{ .config = 0x0400, .type = PERF_TYPE_RAW }
+The metric events are mapped to the pseudo event event=0x00, umask=0x8X.
+For example, the perf_event_attr structure can be initialized with
+{ .config = 0x8000, .type = PERF_TYPE_RAW } for Retiring metric event
+The Fixed counter 3 must be the leader of the group.
+
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/* Provide own perf_event_open stub because glibc doesn't */
+__attribute__((weak))
+int perf_event_open(struct perf_event_attr *attr, pid_t pid,
+		    int cpu, int group_fd, unsigned long flags)
+{
+	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+/* Open slots counter file descriptor for current task. */
+struct perf_event_attr slots = {
+	.type = PERF_TYPE_RAW,
+	.size = sizeof(struct perf_event_attr),
+	.config = 0x400,
+	.exclude_kernel = 1,
+};
+
+int slots_fd = perf_event_open(&slots, 0, -1, -1, 0);
+if (slots_fd < 0)
+	... error ...
+
+/*
+ * Open metrics event file descriptor for current task.
+ * Set slots event as the leader of the group.
+ */
+struct perf_event_attr metrics = {
+	.type = PERF_TYPE_RAW,
+	.size = sizeof(struct perf_event_attr),
+	.config = 0x8000,
+	.exclude_kernel = 1,
+};
+
+int metrics_fd = perf_event_open(&metrics, 0, -1, slots_fd, 0);
+if (metrics_fd < 0)
+	... error ...
+
+
+The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
+to read slots and the topdown metrics at different points of the program:
+
+#include <stdint.h>
+#include <x86intrin.h>
+
+#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
+#define RDPMC_METRIC	(1 << 29)	/* return metric counters */
+
+#define FIXED_COUNTER_SLOTS		3
+#define METRIC_COUNTER_TOPDOWN_L1	0
+
+static inline uint64_t read_slots(void)
+{
+	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
+}
+
+static inline uint64_t read_metrics(void)
+{
+	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
+}
+
+Then the program can be instrumented to read these metrics at different
+points.
+
+It's not a good idea to do this with too short code regions,
+as the parallelism and overlap in the CPU program execution will
+cause too much measurement inaccuracy. For example instrumenting
+individual basic blocks is definitely too fine grained.
+
+Decoding metrics values
+=======================
+
+The value reported by read_metrics() contains four 8 bit fields
+that represent a scaled ratio that represent the Level 1 bottleneck.
+All four fields add up to 0xff (= 100%)
+
+The binary ratios in the metric value can be converted to float ratios:
+
+#define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
+
+#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
+#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
+#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
+#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)
+
+and then converted to percent for printing.
+
+The ratios in the metric accumulate for the time when the counter
+is enabled. For measuring programs it is often useful to measure
+specific sections. For this it is needed to deltas on metrics.
+
+This can be done by scaling the metrics with the slots counter
+read at the same time.
+
+Then it's possible to take deltas of these slots counts
+measured at different points, and determine the metrics
+for that time period.
+
+	slots_a = read_slots();
+	metric_a = read_metrics();
+
+	... larger code region ...
+
+	slots_b = read_slots()
+	metric_b = read_metrics()
+
+	# compute scaled metrics for measurement a
+	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a
+	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a
+	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a
+	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a
+
+	# compute delta scaled metrics between b and a
+	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
+	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
+	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
+	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
+
+Later the individual ratios for the measurement period can be recreated
+from these counts.
+
+	slots_delta = slots_b - slots_a
+	retiring_ratio = (float)retiring_slots / slots_delta
+	bad_spec_ratio = (float)bad_spec_slots / slots_delta
+	fe_bound_ratio = (float)fe_bound_slots / slots_delta
+	be_bound_ratio = (float)be_bound_slots / slota_delta
+
+	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
+		retiring_ratio * 100.,
+		bad_spec_ratio * 100.,
+		fe_bound_ratio * 100.,
+		be_bound_ratio * 100.);
+
+Resetting metrics counters
+==========================
+
+Since the individual metrics are only 8bit they lose precision for
+short regions over time because the number of cycles covered by each
+fraction bit shrinks. So the counters need to be reset regularly.
+
+When using the kernel perf API the kernel resets on every read.
+So as long as the reading is at reasonable intervals (every few
+seconds) the precision is good.
+
+When using perf stat it is recommended to always use the -I option,
+with no longer interval than a few seconds
+
+	perf stat -I 1000 --topdown ...
+
+For user programs using RDPMC directly the counter can
+be reset explicitly using ioctl:
+
+	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
+
+This "opens" a new measurement period.
+
+A program using RDPMC for TopDown should schedule such a reset
+regularly, as in every few seconds.
+
+Limits on Ice Lake
+==================
+
+Four pseudo TopDown metric events are exposed for the end-users,
+topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
+They can be used to collect the TopDown value under the following
+rules:
+- All the TopDown metric events must be in a group with the SLOTS event.
+- The SLOTS event must be the leader of the group.
+- The PERF_FORMAT_GROUP flag must be applied for each TopDown metric
+  events
+
+The SLOTS event and the TopDown metric events can be counting members of
+a sampling read group. Since the SLOTS event must be the leader of a TopDown
+group, the second event of the group is the sampling event.
+For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
+
+
+[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
+[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
+[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
+[5] https://sites.google.com/site/analysismethods/yasin-pubs
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/4] perf stat: Support new per thread TopDown metrics
  2020-08-20 16:45 ` [PATCH 3/4] perf stat: Support new per thread TopDown metrics kan.liang
@ 2020-08-20 20:05   ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2020-08-20 20:05 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, peterz, mingo, jolsa, namhyung, linux-kernel, eranian

> +				fprintf(stat_config.output,
> +					"Topdown accuracy may decreases when measuring long period.\n"

"may decrease" ... "long periods".

(s needs to move to the end)


-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool)
  2020-08-20 16:45 [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) kan.liang
                   ` (3 preceding siblings ...)
  2020-08-20 16:45 ` [PATCH 4/4] perf, tools: Add documentation for topdown metrics kan.liang
@ 2020-08-26 15:54 ` Jiri Olsa
  2020-08-27 12:39   ` Liang, Kan
  4 siblings, 1 reply; 8+ messages in thread
From: Jiri Olsa @ 2020-08-26 15:54 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, peterz, mingo, namhyung, linux-kernel, eranian, ak

On Thu, Aug 20, 2020 at 09:45:28AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The kernel patches have been merged into the tip's perf/core branch.
> The patch set is on top of commit 2cb5383b30d4 ("perf/x86/intel: Support
> per-thread RDPMC TopDown metrics") of the tip's perf/core branch.
> 
> The changes for the perf tool include:
> - Extend --topdown option to support per thread TopDown metrics
> - Support sample-read topdown metric group
> - Add a complete document for the TopDown usage.
> 
> Ice Lake has support for measuring the level 1 TopDown metrics
> directly in hardware. This is implemented by an additional METRICS
> register, and a new Fixed Counter 3 that measures pipeline SLOTS.
> 
> New in Icelake
> - Do not require generic counters. This allows to collect TopDown always
>   in addition to other events.
> - Measuring TopDown per thread/process instead of only per core
> 
> For the Ice Lake implementation of performance metrics, the values in
> PERF_METRICS MSR are derived from fixed counter 3. Software should start
> both registers, PERF_METRICS and fixed counter 3, from zero.
> Additionally, software is recommended to periodically clear both
> registers in order to maintain accurate measurements. The latter is
> required for certain scenarios that involve sampling metrics at high
> rates. Software should always write fixed counter 3 before write to
> PERF_METRICS.
> 
> IA32_PERF_GLOBAL_STATUS. OVF_PERF_METRICS[48]: If this bit is set,
> it indicates that some PERF_METRICS-related counter has overflowed and
> a PMI is triggered. Software has to synchronize, e.g. re-start,
> PERF_METRICS as well as fixed counter 3. Otherwise, PERF_METRICS may
> return invalid values.
> 
> Limitation
> - To get accurate result and avoid reading the METRICS register multiple
>   times, the TopDown metrics events and SLOTS event have to be in the
>   same group.
> - METRICS and SLOTS registers have to be cleared after each read by SW.
>   That is to prevent the lose of precision.
> - Cannot do sampling read SLOTS and TopDown metric events
> 
> Please refer SDM Vol3, 18.3.9.3 Performance Metrics for the details of
> TopDown metrics.
> 
> Andi Kleen (2):
>   perf stat: Support new per thread TopDown metrics
>   perf, tools: Add documentation for topdown metrics
> 
> Kan Liang (2):
>   perf tools: Rename group to topdown
>   perf record: Support sample-read topdown metric group

I don't have Ice lake to actualy check, but it looks good

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool)
  2020-08-26 15:54 ` [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) Jiri Olsa
@ 2020-08-27 12:39   ` Liang, Kan
  0 siblings, 0 replies; 8+ messages in thread
From: Liang, Kan @ 2020-08-27 12:39 UTC (permalink / raw)
  To: Jiri Olsa, acme; +Cc: peterz, mingo, namhyung, linux-kernel, eranian, ak



On 8/26/2020 11:54 AM, Jiri Olsa wrote:
> On Thu, Aug 20, 2020 at 09:45:28AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The kernel patches have been merged into the tip's perf/core branch.
>> The patch set is on top of commit 2cb5383b30d4 ("perf/x86/intel: Support
>> per-thread RDPMC TopDown metrics") of the tip's perf/core branch.
>>
>> The changes for the perf tool include:
>> - Extend --topdown option to support per thread TopDown metrics
>> - Support sample-read topdown metric group
>> - Add a complete document for the TopDown usage.
>>
>> Ice Lake has support for measuring the level 1 TopDown metrics
>> directly in hardware. This is implemented by an additional METRICS
>> register, and a new Fixed Counter 3 that measures pipeline SLOTS.
>>
>> New in Icelake
>> - Do not require generic counters. This allows to collect TopDown always
>>    in addition to other events.
>> - Measuring TopDown per thread/process instead of only per core
>>
>> For the Ice Lake implementation of performance metrics, the values in
>> PERF_METRICS MSR are derived from fixed counter 3. Software should start
>> both registers, PERF_METRICS and fixed counter 3, from zero.
>> Additionally, software is recommended to periodically clear both
>> registers in order to maintain accurate measurements. The latter is
>> required for certain scenarios that involve sampling metrics at high
>> rates. Software should always write fixed counter 3 before write to
>> PERF_METRICS.
>>
>> IA32_PERF_GLOBAL_STATUS. OVF_PERF_METRICS[48]: If this bit is set,
>> it indicates that some PERF_METRICS-related counter has overflowed and
>> a PMI is triggered. Software has to synchronize, e.g. re-start,
>> PERF_METRICS as well as fixed counter 3. Otherwise, PERF_METRICS may
>> return invalid values.
>>
>> Limitation
>> - To get accurate result and avoid reading the METRICS register multiple
>>    times, the TopDown metrics events and SLOTS event have to be in the
>>    same group.
>> - METRICS and SLOTS registers have to be cleared after each read by SW.
>>    That is to prevent the lose of precision.
>> - Cannot do sampling read SLOTS and TopDown metric events
>>
>> Please refer SDM Vol3, 18.3.9.3 Performance Metrics for the details of
>> TopDown metrics.
>>
>> Andi Kleen (2):
>>    perf stat: Support new per thread TopDown metrics
>>    perf, tools: Add documentation for topdown metrics
>>
>> Kan Liang (2):
>>    perf tools: Rename group to topdown
>>    perf record: Support sample-read topdown metric group
> 
> I don't have Ice lake to actualy check, but it looks good
> 
> Acked-by: Jiri Olsa <jolsa@redhat.com>
> 


Thanks Jirka for the review.

Hi Arnaldo,

Andi has a comment regarding a grammar error in the printf message. I 
once plan to apply a fix in the next version. I'm not sure whether you 
will have more comments for this patch set.
If yes, I will wait for the comments.
If not, should I send a V2 patch set with this fixed?

On 8/20/2020 4:05 PM, Andi Kleen wrote:
 >> +				fprintf(stat_config.output,
 >> +					"Topdown accuracy may decreases when measuring long period.\n"
 > "may decrease" ... "long periods".
 >
 > (s needs to move to the end)


Thanks,
Kan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-08-27 14:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-20 16:45 [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) kan.liang
2020-08-20 16:45 ` [PATCH 1/4] perf tools: Rename group to topdown kan.liang
2020-08-20 16:45 ` [PATCH 2/4] perf record: Support sample-read topdown metric group kan.liang
2020-08-20 16:45 ` [PATCH 3/4] perf stat: Support new per thread TopDown metrics kan.liang
2020-08-20 20:05   ` Andi Kleen
2020-08-20 16:45 ` [PATCH 4/4] perf, tools: Add documentation for topdown metrics kan.liang
2020-08-26 15:54 ` [PATCH 0/4] TopDown metrics support for Ice Lake (perf tool) Jiri Olsa
2020-08-27 12:39   ` Liang, Kan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).