* [PATCH 0/5] Add perf stat default events for hybrid machines
@ 2022-06-07  1:33 zhengjun.xing
  2022-06-07  1:33 ` [PATCH 1/5] perf stat: Revert "perf stat: Add default hybrid events" zhengjun.xing
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: zhengjun.xing @ 2022-06-07  1:33 UTC (permalink / raw)
  To: acme, peterz, mingo, alexander.shishkin, jolsa
  Cc: linux-kernel, linux-perf-users, irogers, adrian.hunter, ak,
	kan.liang, zhengjun.xing

From: Zhengjun Xing <zhengjun.xing@linux.intel.com>

This patch series cleans up the existing perf stat default events and adds
support for the perf metrics Topdown of the p-core PMU in the perf stat
default. The first four patches are clean-ups and fix the "--detailed"
issue. The last patch adds support for the perf metrics Topdown; Topdown
support for the e-core PMU will be implemented separately later.

Kan Liang (4):
  perf stat: Revert "perf stat: Add default hybrid events"
  perf evsel: Add arch_evsel__hw_name()
  perf evlist: Always use arch_evlist__add_default_attrs()
  perf x86 evlist: Add default hybrid events for perf stat

Zhengjun Xing (1):
  perf stat: Add topdown metrics in the default perf stat on the hybrid
    machine

 tools/perf/arch/x86/util/evlist.c  | 64 +++++++++++++++++++++++++-----
 tools/perf/arch/x86/util/evsel.c   | 20 ++++++++++
 tools/perf/arch/x86/util/topdown.c | 51 ++++++++++++++++++++++++
 tools/perf/arch/x86/util/topdown.h |  1 +
 tools/perf/builtin-stat.c          | 50 ++++-------------------
 tools/perf/util/evlist.c           | 11 +++--
 tools/perf/util/evlist.h           |  9 ++++-
 tools/perf/util/evsel.c            |  7 +++-
 tools/perf/util/evsel.h            |  1 +
 tools/perf/util/stat-display.c     |  2 +-
 tools/perf/util/topdown.c          |  7 ++++
 tools/perf/util/topdown.h          |  3 +-
 12 files changed, 166 insertions(+), 60 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] perf stat: Revert "perf stat: Add default hybrid events"
  2022-06-07  1:33 [PATCH 0/5] Add perf stat default events for hybrid machines zhengjun.xing
@ 2022-06-07  1:33 ` zhengjun.xing
  2022-06-07  1:33 ` [PATCH 2/5] perf evsel: Add arch_evsel__hw_name() zhengjun.xing
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: zhengjun.xing @ 2022-06-07  1:33 UTC (permalink / raw)
  To: acme, peterz, mingo, alexander.shishkin, jolsa
  Cc: linux-kernel, linux-perf-users, irogers, adrian.hunter, ak,
	kan.liang, zhengjun.xing

From: Kan Liang <kan.liang@linux.intel.com>

This reverts commit ac2dc29edd21 ("perf stat: Add default hybrid
events").

Between this patch and the reverted patch, the commit 6c1912898ed2
("perf parse-events: Rename parse_events_error functions") and the
commit 07eafd4e053a ("perf parse-event: Add init and exit to
parse_event_error") cleaned up the parse_events_error_*() code. The
related change is also reverted.

The reverted patch is hard to extend to support new default events,
e.g., Topdown events, and the existing "--detailed" option on a hybrid
platform.

A new solution will be proposed in the following patch to enable the
perf stat default on a hybrid platform.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
---
 tools/perf/builtin-stat.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4ce87a8eb7d7..6ac79d95f3b5 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1685,12 +1685,6 @@ static int add_default_attributes(void)
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS	},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES		},
 
-};
-	struct perf_event_attr default_sw_attrs[] = {
-  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
-  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES	},
-  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS		},
-  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS		},
 };
 
 /*
@@ -1947,30 +1941,6 @@ static int add_default_attributes(void)
 	}
 
 	if (!evsel_list->core.nr_entries) {
-		if (perf_pmu__has_hybrid()) {
-			struct parse_events_error errinfo;
-			const char *hybrid_str = "cycles,instructions,branches,branch-misses";
-
-			if (target__has_cpu(&target))
-				default_sw_attrs[0].config = PERF_COUNT_SW_CPU_CLOCK;
-
-			if (evlist__add_default_attrs(evsel_list,
-						      default_sw_attrs) < 0) {
-				return -1;
-			}
-
-			parse_events_error__init(&errinfo);
-			err = parse_events(evsel_list, hybrid_str, &errinfo);
-			if (err) {
-				fprintf(stderr,
-					"Cannot set up hybrid events %s: %d\n",
-					hybrid_str, err);
-				parse_events_error__print(&errinfo, hybrid_str);
-			}
-			parse_events_error__exit(&errinfo);
-			return err ? -1 : 0;
-		}
-
 		if (target__has_cpu(&target))
 			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
 
-- 
2.25.1



* [PATCH 2/5] perf evsel: Add arch_evsel__hw_name()
  2022-06-07  1:33 [PATCH 0/5] Add perf stat default events for hybrid machines zhengjun.xing
  2022-06-07  1:33 ` [PATCH 1/5] perf stat: Revert "perf stat: Add default hybrid events" zhengjun.xing
@ 2022-06-07  1:33 ` zhengjun.xing
  2022-06-07  1:33 ` [PATCH 3/5] perf evlist: Always use arch_evlist__add_default_attrs() zhengjun.xing
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: zhengjun.xing @ 2022-06-07  1:33 UTC (permalink / raw)
  To: acme, peterz, mingo, alexander.shishkin, jolsa
  Cc: linux-kernel, linux-perf-users, irogers, adrian.hunter, ak,
	kan.liang, zhengjun.xing

From: Kan Liang <kan.liang@linux.intel.com>

The commit 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE") extends the two types to become PMU-aware types for
a hybrid system. However, the current evsel__hw_name() doesn't take the
PMU type into account. It mistakenly returns "unknown-hardware" for a
hardware event with a specific PMU type.

Add an arch-specific arch_evsel__hw_name() to handle the PMU-aware
hardware events.

Currently, the extended PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE are
only supported on x86, so only implement arch_evsel__hw_name() for x86
in this patch.

Nothing is changed for the other archs.
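
For reference, a minimal standalone sketch (not the perf source; the
helper names here are illustrative) of the extended config encoding that
arch_evsel__hw_name() decodes. The two constants are copied from the
kernel UAPI header introduced by commit 55bcf6ef314a:

```c
#include <assert.h>
#include <stdint.h>

/* From include/uapi/linux/perf_event.h (commit 55bcf6ef314a). */
#define PERF_PMU_TYPE_SHIFT	32
#define PERF_HW_EVENT_MASK	0xffffffffULL

/* Low 32 bits of attr.config: the generic hardware event id. */
static uint64_t config_hw_event(uint64_t config)
{
	return config & PERF_HW_EVENT_MASK;
}

/* High 32 bits of attr.config: the PMU type (0 on non-hybrid). */
static uint64_t config_pmu_type(uint64_t config)
{
	return config >> PERF_PMU_TYPE_SHIFT;
}
```

A zero PMU type is why the patch prints a plain event name on
non-hybrid platforms and a "pmu/event/" style name otherwise.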

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
---
 tools/perf/arch/x86/util/evsel.c | 20 ++++++++++++++++++++
 tools/perf/util/evsel.c          |  7 ++++++-
 tools/perf/util/evsel.h          |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 3501399cef35..f6feb61d98a0 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -61,3 +61,23 @@ bool arch_evsel__must_be_in_group(const struct evsel *evsel)
 		(strcasestr(evsel->name, "slots") ||
 		 strcasestr(evsel->name, "topdown"));
 }
+
+int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
+{
+	u64 event = evsel->core.attr.config & PERF_HW_EVENT_MASK;
+	u64 pmu = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
+	const char *event_name;
+
+	if (event < PERF_COUNT_HW_MAX && evsel__hw_names[event])
+		event_name = evsel__hw_names[event];
+	else
+		event_name = "unknown-hardware";
+
+	/* The PMU type is not required for the non-hybrid platform. */
+	if (!pmu)
+		return  scnprintf(bf, size, "%s", event_name);
+
+	return scnprintf(bf, size, "%s/%s/",
+			 evsel->pmu_name ? evsel->pmu_name : "cpu",
+			 event_name);
+}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ce499c5da8d7..782be377208f 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -593,9 +593,14 @@ static int evsel__add_modifiers(struct evsel *evsel, char *bf, size_t size)
 	return r;
 }
 
+int __weak arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
+{
+	return scnprintf(bf, size, "%s", __evsel__hw_name(evsel->core.attr.config));
+}
+
 static int evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
 {
-	int r = scnprintf(bf, size, "%s", __evsel__hw_name(evsel->core.attr.config));
+	int r = arch_evsel__hw_name(evsel, bf, size);
 	return r + evsel__add_modifiers(evsel, bf + r, size - r);
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 73ea48e94079..8dd3f04a5bdb 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -271,6 +271,7 @@ extern const char *const evsel__hw_names[PERF_COUNT_HW_MAX];
 extern const char *const evsel__sw_names[PERF_COUNT_SW_MAX];
 extern char *evsel__bpf_counter_events;
 bool evsel__match_bpf_counter_events(const char *name);
+int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size);
 
 int __evsel__hw_cache_type_op_res_name(u8 type, u8 op, u8 result, char *bf, size_t size);
 const char *evsel__name(struct evsel *evsel);
-- 
2.25.1



* [PATCH 3/5] perf evlist: Always use arch_evlist__add_default_attrs()
  2022-06-07  1:33 [PATCH 0/5] Add perf stat default events for hybrid machines zhengjun.xing
  2022-06-07  1:33 ` [PATCH 1/5] perf stat: Revert "perf stat: Add default hybrid events" zhengjun.xing
  2022-06-07  1:33 ` [PATCH 2/5] perf evsel: Add arch_evsel__hw_name() zhengjun.xing
@ 2022-06-07  1:33 ` zhengjun.xing
  2022-06-07  1:33 ` [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat zhengjun.xing
  2022-06-07  1:33 ` [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine zhengjun.xing
  4 siblings, 0 replies; 11+ messages in thread
From: zhengjun.xing @ 2022-06-07  1:33 UTC (permalink / raw)
  To: acme, peterz, mingo, alexander.shishkin, jolsa
  Cc: linux-kernel, linux-perf-users, irogers, adrian.hunter, ak,
	kan.liang, zhengjun.xing

From: Kan Liang <kan.liang@linux.intel.com>

Currently, perf stat uses evlist__add_default_attrs() to add the
generic default attrs, and arch_evlist__add_default_attrs() to add the
arch-specific default attrs, e.g., Topdown for x86.

This works well for the non-hybrid platforms. However, for a hybrid
platform, the hard-coded generic default attrs don't work.

Use arch_evlist__add_default_attrs() to replace
evlist__add_default_attrs(). arch_evlist__add_default_attrs() is
modified to invoke the same __evlist__add_default_attrs() for the
generic default attrs. No functional change.

Add an empty default_null_attrs[] to indicate the arch-specific attrs.
No functional change for the arch-specific default attrs either.
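
An illustrative model (not the perf source; names are made up for the
sketch) of the dispatch this patch sets up: every caller goes through
the arch hook, and a zero-length array such as default_null_attrs[]
signals "add the arch-specific events instead":

```c
#include <assert.h>
#include <stddef.h>

enum add_path { ADD_GENERIC, ADD_ARCH_SPECIFIC };

/*
 * Mirrors the shape of arch_evlist__add_default_attrs(): a non-empty
 * attrs array takes the common __evlist__add_default_attrs() path,
 * while an empty sentinel array routes to the arch-specific events
 * (e.g. Topdown on x86).
 */
static enum add_path pick_add_path(size_t nr_attrs)
{
	if (nr_attrs)
		return ADD_GENERIC;

	return ADD_ARCH_SPECIFIC;
}
```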

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
---
 tools/perf/arch/x86/util/evlist.c | 7 ++++++-
 tools/perf/builtin-stat.c         | 6 +++++-
 tools/perf/util/evlist.c          | 9 +++++++--
 tools/perf/util/evlist.h          | 7 +++++--
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 68f681ad54c1..777bdf182a58 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -8,8 +8,13 @@
 #define TOPDOWN_L1_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
 #define TOPDOWN_L2_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
 
-int arch_evlist__add_default_attrs(struct evlist *evlist)
+int arch_evlist__add_default_attrs(struct evlist *evlist,
+				   struct perf_event_attr *attrs,
+				   size_t nr_attrs)
 {
+	if (nr_attrs)
+		return __evlist__add_default_attrs(evlist, attrs, nr_attrs);
+
 	if (!pmu_have_event("cpu", "slots"))
 		return 0;
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6ac79d95f3b5..837c3ca91af1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1777,6 +1777,9 @@ static int add_default_attributes(void)
 	(PERF_COUNT_HW_CACHE_OP_PREFETCH	<<  8) |
 	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16)				},
 };
+
+	struct perf_event_attr default_null_attrs[] = {};
+
 	/* Set attrs if no event is selected and !null_run: */
 	if (stat_config.null_run)
 		return 0;
@@ -1958,7 +1961,8 @@ static int add_default_attributes(void)
 			return -1;
 
 		stat_config.topdown_level = TOPDOWN_MAX_LEVEL;
-		if (arch_evlist__add_default_attrs(evsel_list) < 0)
+		/* Platform specific attrs */
+		if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
 			return -1;
 	}
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 48af7d379d82..efa5f006b5c6 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -342,9 +342,14 @@ int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *a
 	return evlist__add_attrs(evlist, attrs, nr_attrs);
 }
 
-__weak int arch_evlist__add_default_attrs(struct evlist *evlist __maybe_unused)
+__weak int arch_evlist__add_default_attrs(struct evlist *evlist,
+					  struct perf_event_attr *attrs,
+					  size_t nr_attrs)
 {
-	return 0;
+	if (!nr_attrs)
+		return 0;
+
+	return __evlist__add_default_attrs(evlist, attrs, nr_attrs);
 }
 
 struct evsel *evlist__find_tracepoint_by_id(struct evlist *evlist, int id)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 1bde9ccf4e7d..129095c0fe6d 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -107,10 +107,13 @@ static inline int evlist__add_default(struct evlist *evlist)
 int __evlist__add_default_attrs(struct evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
 
+int arch_evlist__add_default_attrs(struct evlist *evlist,
+				   struct perf_event_attr *attrs,
+				   size_t nr_attrs);
+
 #define evlist__add_default_attrs(evlist, array) \
-	__evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
+	arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
 
-int arch_evlist__add_default_attrs(struct evlist *evlist);
 struct evsel *arch_evlist__leader(struct list_head *list);
 
 int evlist__add_dummy(struct evlist *evlist);
-- 
2.25.1



* [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat
  2022-06-07  1:33 [PATCH 0/5] Add perf stat default events for hybrid machines zhengjun.xing
                   ` (2 preceding siblings ...)
  2022-06-07  1:33 ` [PATCH 3/5] perf evlist: Always use arch_evlist__add_default_attrs() zhengjun.xing
@ 2022-06-07  1:33 ` zhengjun.xing
  2022-06-09  0:04   ` Namhyung Kim
  2022-06-07  1:33 ` [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine zhengjun.xing
  4 siblings, 1 reply; 11+ messages in thread
From: zhengjun.xing @ 2022-06-07  1:33 UTC (permalink / raw)
  To: acme, peterz, mingo, alexander.shishkin, jolsa
  Cc: linux-kernel, linux-perf-users, irogers, adrian.hunter, ak,
	kan.liang, zhengjun.xing

From: Kan Liang <kan.liang@linux.intel.com>

Provide a new solution to replace the reverted commit ac2dc29edd21
("perf stat: Add default hybrid events").

For the default software attrs, nothing is changed.
For the default hardware attrs, create a new evsel for each hybrid PMU.

With the new solution, adding a new default attr will no longer require
special support for the hybrid platform.

Also, the "--detailed" option is now supported on the hybrid platform.
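
The resulting counter count is simple arithmetic; a hypothetical helper
(not in perf) makes the rule explicit: software attrs stay one evsel
each, hardware attrs are duplicated per hybrid PMU (cpu_core and
cpu_atom on Alder Lake):

```c
#include <assert.h>

/*
 * Number of default evsels created by this scheme: one per software
 * attr, plus one per hardware attr per hybrid PMU.
 */
static int default_evsel_count(int nr_sw, int nr_hw, int nr_hybrid_pmus)
{
	return nr_sw + nr_hw * nr_hybrid_pmus;
}
```

For the default set (4 software events, 4 hardware events, 2 hybrid
PMUs) this gives the 12 counter lines seen in the non-detailed output.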

With the patch,

./perf stat -a -ddd sleep 1

 Performance counter stats for 'system wide':

       32,231.06 msec cpu-clock                 #   32.056 CPUs utilized
             529      context-switches          #   16.413 /sec
              32      cpu-migrations            #    0.993 /sec
              69      page-faults               #    2.141 /sec
     176,754,151      cpu_core/cycles/          #    5.484 M/sec          (41.65%)
     161,695,280      cpu_atom/cycles/          #    5.017 M/sec          (49.92%)
      48,595,992      cpu_core/instructions/    #    1.508 M/sec          (49.98%)
      32,363,337      cpu_atom/instructions/    #    1.004 M/sec          (58.26%)
      10,088,639      cpu_core/branches/        #  313.010 K/sec          (58.31%)
       6,390,582      cpu_atom/branches/        #  198.274 K/sec          (58.26%)
         846,201      cpu_core/branch-misses/   #   26.254 K/sec          (66.65%)
         676,477      cpu_atom/branch-misses/   #   20.988 K/sec          (58.27%)
      14,290,070      cpu_core/L1-dcache-loads/ #  443.363 K/sec          (66.66%)
       9,983,532      cpu_atom/L1-dcache-loads/ #  309.749 K/sec          (58.27%)
         740,725      cpu_core/L1-dcache-load-misses/ #   22.982 K/sec    (66.66%)
 <not supported>      cpu_atom/L1-dcache-load-misses/
         480,441      cpu_core/LLC-loads/       #   14.906 K/sec          (66.67%)
         326,570      cpu_atom/LLC-loads/       #   10.132 K/sec          (58.27%)
             329      cpu_core/LLC-load-misses/ #   10.208 /sec           (66.68%)
               0      cpu_atom/LLC-load-misses/ #    0.000 /sec           (58.32%)
 <not supported>      cpu_core/L1-icache-loads/
      21,982,491      cpu_atom/L1-icache-loads/ #  682.028 K/sec          (58.43%)
       4,493,189      cpu_core/L1-icache-load-misses/ #  139.406 K/sec    (33.34%)
       4,711,404      cpu_atom/L1-icache-load-misses/ #  146.176 K/sec    (50.08%)
      13,713,090      cpu_core/dTLB-loads/      #  425.462 K/sec          (33.34%)
       9,384,727      cpu_atom/dTLB-loads/      #  291.170 K/sec          (50.08%)
         157,387      cpu_core/dTLB-load-misses/ #    4.883 K/sec         (33.33%)
         108,328      cpu_atom/dTLB-load-misses/ #    3.361 K/sec         (50.08%)
 <not supported>      cpu_core/iTLB-loads/
 <not supported>      cpu_atom/iTLB-loads/
          37,655      cpu_core/iTLB-load-misses/ #    1.168 K/sec         (33.32%)
          61,661      cpu_atom/iTLB-load-misses/ #    1.913 K/sec         (50.03%)
 <not supported>      cpu_core/L1-dcache-prefetches/
 <not supported>      cpu_atom/L1-dcache-prefetches/
 <not supported>      cpu_core/L1-dcache-prefetch-misses/
 <not supported>      cpu_atom/L1-dcache-prefetch-misses/

       1.005466919 seconds time elapsed

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
---
 tools/perf/arch/x86/util/evlist.c | 52 ++++++++++++++++++++++++++++++-
 tools/perf/util/evlist.c          |  2 +-
 tools/perf/util/evlist.h          |  2 ++
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 777bdf182a58..1b3f9e1a2287 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -4,16 +4,66 @@
 #include "util/evlist.h"
 #include "util/parse-events.h"
 #include "topdown.h"
+#include "util/event.h"
+#include "util/pmu-hybrid.h"
 
 #define TOPDOWN_L1_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
 #define TOPDOWN_L2_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
 
+static int ___evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
+{
+	struct perf_cpu_map *cpus;
+	struct evsel *evsel, *n;
+	struct perf_pmu *pmu;
+	LIST_HEAD(head);
+	size_t i, j = 0;
+
+	for (i = 0; i < nr_attrs; i++)
+		event_attr_init(attrs + i);
+
+	if (!perf_pmu__has_hybrid())
+		return evlist__add_attrs(evlist, attrs, nr_attrs);
+
+	for (i = 0; i < nr_attrs; i++) {
+		if (attrs[i].type == PERF_TYPE_SOFTWARE) {
+			evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
+			if (evsel == NULL)
+				goto out_delete_partial_list;
+			j++;
+			list_add_tail(&evsel->core.node, &head);
+			continue;
+		}
+
+		perf_pmu__for_each_hybrid_pmu(pmu) {
+			evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
+			if (evsel == NULL)
+				goto out_delete_partial_list;
+			j++;
+			evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+			cpus = perf_cpu_map__get(pmu->cpus);
+			evsel->core.cpus = cpus;
+			evsel->core.own_cpus = perf_cpu_map__get(cpus);
+			evsel->pmu_name = strdup(pmu->name);
+			list_add_tail(&evsel->core.node, &head);
+		}
+	}
+
+	evlist__splice_list_tail(evlist, &head);
+
+	return 0;
+
+out_delete_partial_list:
+	__evlist__for_each_entry_safe(&head, n, evsel)
+		evsel__delete(evsel);
+	return -1;
+}
+
 int arch_evlist__add_default_attrs(struct evlist *evlist,
 				   struct perf_event_attr *attrs,
 				   size_t nr_attrs)
 {
 	if (nr_attrs)
-		return __evlist__add_default_attrs(evlist, attrs, nr_attrs);
+		return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
 
 	if (!pmu_have_event("cpu", "slots"))
 		return 0;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index efa5f006b5c6..5ff4b9504828 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -309,7 +309,7 @@ struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide)
 	return evsel;
 }
 
-static int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
+int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
 {
 	struct evsel *evsel, *n;
 	LIST_HEAD(head);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 129095c0fe6d..351ba2887a79 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -104,6 +104,8 @@ static inline int evlist__add_default(struct evlist *evlist)
 	return __evlist__add_default(evlist, true);
 }
 
+int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs);
+
 int __evlist__add_default_attrs(struct evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
 
-- 
2.25.1



* [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine
  2022-06-07  1:33 [PATCH 0/5] Add perf stat default events for hybrid machines zhengjun.xing
                   ` (3 preceding siblings ...)
  2022-06-07  1:33 ` [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat zhengjun.xing
@ 2022-06-07  1:33 ` zhengjun.xing
  2022-06-09  0:09   ` Namhyung Kim
  4 siblings, 1 reply; 11+ messages in thread
From: zhengjun.xing @ 2022-06-07  1:33 UTC (permalink / raw)
  To: acme, peterz, mingo, alexander.shishkin, jolsa
  Cc: linux-kernel, linux-perf-users, irogers, adrian.hunter, ak,
	kan.liang, zhengjun.xing

From: Zhengjun Xing <zhengjun.xing@linux.intel.com>

Topdown metrics are missing from the default perf stat on a hybrid
machine, so add them to the default perf stat for hybrid systems.

Currently, the perf metrics Topdown is supported for the p-core PMU in
the perf stat default; Topdown support for the e-core PMU will be
implemented separately later. The refactoring adds two x86-specific
functions. Widen the event name column by 7 chars, so that all metrics
after the "#" are aligned again.

The perf metrics Topdown feature is supported on the cpu_core of ADL.
The dedicated perf metrics counter and the fixed counter 3 are used for
the Topdown events. Adding the Topdown metrics doesn't trigger
multiplexing.

Before:

 # ./perf  stat  -a true

 Performance counter stats for 'system wide':

             53.70 msec cpu-clock                 #   25.736 CPUs utilized
                80      context-switches          #    1.490 K/sec
                24      cpu-migrations            #  446.951 /sec
                52      page-faults               #  968.394 /sec
         2,788,555      cpu_core/cycles/          #   51.931 M/sec
           851,129      cpu_atom/cycles/          #   15.851 M/sec
         2,974,030      cpu_core/instructions/    #   55.385 M/sec
           416,919      cpu_atom/instructions/    #    7.764 M/sec
           586,136      cpu_core/branches/        #   10.916 M/sec
            79,872      cpu_atom/branches/        #    1.487 M/sec
            14,220      cpu_core/branch-misses/   #  264.819 K/sec
             7,691      cpu_atom/branch-misses/   #  143.229 K/sec

       0.002086438 seconds time elapsed

After:

 # ./perf stat  -a true

 Performance counter stats for 'system wide':

             61.39 msec cpu-clock                        #   24.874 CPUs utilized
                76      context-switches                 #    1.238 K/sec
                24      cpu-migrations                   #  390.968 /sec
                52      page-faults                      #  847.097 /sec
         2,753,695      cpu_core/cycles/                 #   44.859 M/sec
           903,899      cpu_atom/cycles/                 #   14.725 M/sec
         2,927,529      cpu_core/instructions/           #   47.690 M/sec
           428,498      cpu_atom/instructions/           #    6.980 M/sec
           581,299      cpu_core/branches/               #    9.470 M/sec
            83,409      cpu_atom/branches/               #    1.359 M/sec
            13,641      cpu_core/branch-misses/          #  222.216 K/sec
             8,008      cpu_atom/branch-misses/          #  130.453 K/sec
        14,761,308      cpu_core/slots/                  #  240.466 M/sec
         3,288,625      cpu_core/topdown-retiring/       #     22.3% retiring
         1,323,323      cpu_core/topdown-bad-spec/       #      9.0% bad speculation
         5,477,470      cpu_core/topdown-fe-bound/       #     37.1% frontend bound
         4,679,199      cpu_core/topdown-be-bound/       #     31.7% backend bound
           646,194      cpu_core/topdown-heavy-ops/      #      4.4% heavy operations       #     17.9% light operations
         1,244,999      cpu_core/topdown-br-mispredict/  #      8.4% branch mispredict      #      0.5% machine clears
         3,891,800      cpu_core/topdown-fetch-lat/      #     26.4% fetch latency          #     10.7% fetch bandwidth
         1,879,034      cpu_core/topdown-mem-bound/      #     12.7% memory bound           #     19.0% Core bound

       0.002467839 seconds time elapsed

Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/arch/x86/util/evlist.c  | 13 ++------
 tools/perf/arch/x86/util/topdown.c | 51 ++++++++++++++++++++++++++++++
 tools/perf/arch/x86/util/topdown.h |  1 +
 tools/perf/builtin-stat.c          | 14 ++------
 tools/perf/util/stat-display.c     |  2 +-
 tools/perf/util/topdown.c          |  7 ++++
 tools/perf/util/topdown.h          |  3 +-
 7 files changed, 66 insertions(+), 25 deletions(-)

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 1b3f9e1a2287..883559064818 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -3,12 +3,9 @@
 #include "util/pmu.h"
 #include "util/evlist.h"
 #include "util/parse-events.h"
-#include "topdown.h"
 #include "util/event.h"
 #include "util/pmu-hybrid.h"
-
-#define TOPDOWN_L1_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
-#define TOPDOWN_L2_EVENTS	"{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
+#include "topdown.h"
 
 static int ___evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
 {
@@ -65,13 +62,7 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
 	if (nr_attrs)
 		return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
 
-	if (!pmu_have_event("cpu", "slots"))
-		return 0;
-
-	if (pmu_have_event("cpu", "topdown-heavy-ops"))
-		return parse_events(evlist, TOPDOWN_L2_EVENTS, NULL);
-	else
-		return parse_events(evlist, TOPDOWN_L1_EVENTS, NULL);
+	return topdown_parse_events(evlist);
 }
 
 struct evsel *arch_evlist__leader(struct list_head *list)
diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
index f81a7cfe4d63..ba66e43a6b2a 100644
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -3,9 +3,17 @@
 #include "api/fs/fs.h"
 #include "util/pmu.h"
 #include "util/topdown.h"
+#include "util/evlist.h"
+#include "util/debug.h"
+#include "util/pmu-hybrid.h"
 #include "topdown.h"
 #include "evsel.h"
 
+#define TOPDOWN_L1_EVENTS       "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
+#define TOPDOWN_L1_EVENTS_CORE  "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/}"
+#define TOPDOWN_L2_EVENTS       "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
+#define TOPDOWN_L2_EVENTS_CORE  "{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
+
 /* Check whether there is a PMU which supports the perf metrics. */
 bool topdown_sys_has_perf_metrics(void)
 {
@@ -73,3 +81,46 @@ bool arch_topdown_sample_read(struct evsel *leader)
 
 	return false;
 }
+
+const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
+{
+	const char *pmu_name = "cpu";
+
+	if (perf_pmu__has_hybrid()) {
+		if (!evlist->hybrid_pmu_name) {
+			if (warn)
+				pr_warning
+				    ("WARNING: default to use cpu_core topdown events\n");
+			evlist->hybrid_pmu_name =
+			    perf_pmu__hybrid_type_to_pmu("core");
+		}
+
+		pmu_name = evlist->hybrid_pmu_name;
+	}
+	return pmu_name;
+}
+
+int topdown_parse_events(struct evlist *evlist)
+{
+	const char *topdown_events;
+	const char *pmu_name;
+
+	if (!topdown_sys_has_perf_metrics())
+		return 0;
+
+	pmu_name = arch_get_topdown_pmu_name(evlist, false);
+
+	if (pmu_have_event(pmu_name, "topdown-heavy-ops")) {
+		if (!strcmp(pmu_name, "cpu_core"))
+			topdown_events = TOPDOWN_L2_EVENTS_CORE;
+		else
+			topdown_events = TOPDOWN_L2_EVENTS;
+	} else {
+		if (!strcmp(pmu_name, "cpu_core"))
+			topdown_events = TOPDOWN_L1_EVENTS_CORE;
+		else
+			topdown_events = TOPDOWN_L1_EVENTS;
+	}
+
+	return parse_events(evlist, topdown_events, NULL);
+}
diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
index 46bf9273e572..7eb81f042838 100644
--- a/tools/perf/arch/x86/util/topdown.h
+++ b/tools/perf/arch/x86/util/topdown.h
@@ -3,5 +3,6 @@
 #define _TOPDOWN_H 1
 
 bool topdown_sys_has_perf_metrics(void);
+int topdown_parse_events(struct evlist *evlist);
 
 #endif
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 837c3ca91af1..c6b68be78f8c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -71,6 +71,7 @@
 #include "util/bpf_counter.h"
 #include "util/iostat.h"
 #include "util/pmu-hybrid.h"
+#include "util/topdown.h"
 #include "asm/bug.h"
 
 #include <linux/time64.h>
@@ -1858,22 +1859,11 @@ static int add_default_attributes(void)
 		unsigned int max_level = 1;
 		char *str = NULL;
 		bool warn = false;
-		const char *pmu_name = "cpu";
+		const char *pmu_name = arch_get_topdown_pmu_name(evsel_list, true);
 
 		if (!force_metric_only)
 			stat_config.metric_only = true;
 
-		if (perf_pmu__has_hybrid()) {
-			if (!evsel_list->hybrid_pmu_name) {
-				pr_warning("WARNING: default to use cpu_core topdown events\n");
-				evsel_list->hybrid_pmu_name = perf_pmu__hybrid_type_to_pmu("core");
-			}
-
-			pmu_name = evsel_list->hybrid_pmu_name;
-			if (!pmu_name)
-				return -1;
-		}
-
 		if (pmu_have_event(pmu_name, topdown_metric_L2_attrs[5])) {
 			metric_attrs = topdown_metric_L2_attrs;
 			max_level = 2;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 606f09b09226..44045565c8f8 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -374,7 +374,7 @@ static void abs_printout(struct perf_stat_config *config,
 			config->csv_output ? 0 : config->unit_width,
 			evsel->unit, config->csv_sep);
 
-	fprintf(output, "%-*s", config->csv_output ? 0 : 25, evsel__name(evsel));
+	fprintf(output, "%-*s", config->csv_output ? 0 : 32, evsel__name(evsel));
 
 	print_cgroup(config, evsel);
 }
diff --git a/tools/perf/util/topdown.c b/tools/perf/util/topdown.c
index a369f84ceb6a..1090841550f7 100644
--- a/tools/perf/util/topdown.c
+++ b/tools/perf/util/topdown.c
@@ -65,3 +65,10 @@ __weak bool arch_topdown_sample_read(struct evsel *leader __maybe_unused)
 {
 	return false;
 }
+
+__weak const char *arch_get_topdown_pmu_name(struct evlist *evlist
+					     __maybe_unused,
+					     bool warn __maybe_unused)
+{
+	return "cpu";
+}
diff --git a/tools/perf/util/topdown.h b/tools/perf/util/topdown.h
index 118e75281f93..f9531528c559 100644
--- a/tools/perf/util/topdown.h
+++ b/tools/perf/util/topdown.h
@@ -2,11 +2,12 @@
 #ifndef TOPDOWN_H
 #define TOPDOWN_H 1
 #include "evsel.h"
+#include "evlist.h"
 
 bool arch_topdown_check_group(bool *warn);
 void arch_topdown_group_warn(void);
 bool arch_topdown_sample_read(struct evsel *leader);
-
+const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn);
 int topdown_filter_events(const char **attr, char **str, bool use_group,
 			  const char *pmu_name);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat
  2022-06-07  1:33 ` [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat zhengjun.xing
@ 2022-06-09  0:04   ` Namhyung Kim
  2022-06-09 12:47     ` Liang, Kan
  0 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2022-06-09  0:04 UTC (permalink / raw)
  To: Xing Zhengjun
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	alexander.shishkin, Jiri Olsa, linux-kernel, linux-perf-users,
	Ian Rogers, Adrian Hunter, Andi Kleen, Kan Liang

Hello,

On Tue, Jun 7, 2022 at 12:31 AM <zhengjun.xing@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Provide a new solution to replace the reverted commit ac2dc29edd21
> ("perf stat: Add default hybrid events").
>
> For the default software attrs, nothing is changed.
> For the default hardware attrs, create a new evsel for each hybrid pmu.
>
> With the new solution, adding a new default attr will no longer require
> special support for the hybrid platform.
>
> Also, the "--detailed" option is now supported on the hybrid platform.
>
> With the patch,
>
> ./perf stat -a -ddd sleep 1
>
>  Performance counter stats for 'system wide':
>
>        32,231.06 msec cpu-clock                 #   32.056 CPUs utilized
>              529      context-switches          #   16.413 /sec
>               32      cpu-migrations            #    0.993 /sec
>               69      page-faults               #    2.141 /sec
>      176,754,151      cpu_core/cycles/          #    5.484 M/sec          (41.65%)
>      161,695,280      cpu_atom/cycles/          #    5.017 M/sec          (49.92%)
>       48,595,992      cpu_core/instructions/    #    1.508 M/sec          (49.98%)
>       32,363,337      cpu_atom/instructions/    #    1.004 M/sec          (58.26%)
>       10,088,639      cpu_core/branches/        #  313.010 K/sec          (58.31%)
>        6,390,582      cpu_atom/branches/        #  198.274 K/sec          (58.26%)
>          846,201      cpu_core/branch-misses/   #   26.254 K/sec          (66.65%)
>          676,477      cpu_atom/branch-misses/   #   20.988 K/sec          (58.27%)
>       14,290,070      cpu_core/L1-dcache-loads/ #  443.363 K/sec          (66.66%)
>        9,983,532      cpu_atom/L1-dcache-loads/ #  309.749 K/sec          (58.27%)
>          740,725      cpu_core/L1-dcache-load-misses/ #   22.982 K/sec    (66.66%)
>  <not supported>      cpu_atom/L1-dcache-load-misses/
>          480,441      cpu_core/LLC-loads/       #   14.906 K/sec          (66.67%)
>          326,570      cpu_atom/LLC-loads/       #   10.132 K/sec          (58.27%)
>              329      cpu_core/LLC-load-misses/ #   10.208 /sec           (66.68%)
>                0      cpu_atom/LLC-load-misses/ #    0.000 /sec           (58.32%)
>  <not supported>      cpu_core/L1-icache-loads/
>       21,982,491      cpu_atom/L1-icache-loads/ #  682.028 K/sec          (58.43%)
>        4,493,189      cpu_core/L1-icache-load-misses/ #  139.406 K/sec    (33.34%)
>        4,711,404      cpu_atom/L1-icache-load-misses/ #  146.176 K/sec    (50.08%)
>       13,713,090      cpu_core/dTLB-loads/      #  425.462 K/sec          (33.34%)
>        9,384,727      cpu_atom/dTLB-loads/      #  291.170 K/sec          (50.08%)
>          157,387      cpu_core/dTLB-load-misses/ #    4.883 K/sec         (33.33%)
>          108,328      cpu_atom/dTLB-load-misses/ #    3.361 K/sec         (50.08%)
>  <not supported>      cpu_core/iTLB-loads/
>  <not supported>      cpu_atom/iTLB-loads/
>           37,655      cpu_core/iTLB-load-misses/ #    1.168 K/sec         (33.32%)
>           61,661      cpu_atom/iTLB-load-misses/ #    1.913 K/sec         (50.03%)
>  <not supported>      cpu_core/L1-dcache-prefetches/
>  <not supported>      cpu_atom/L1-dcache-prefetches/
>  <not supported>      cpu_core/L1-dcache-prefetch-misses/
>  <not supported>      cpu_atom/L1-dcache-prefetch-misses/
>
>        1.005466919 seconds time elapsed
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
> ---
>  tools/perf/arch/x86/util/evlist.c | 52 ++++++++++++++++++++++++++++++-
>  tools/perf/util/evlist.c          |  2 +-
>  tools/perf/util/evlist.h          |  2 ++
>  3 files changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> index 777bdf182a58..1b3f9e1a2287 100644
> --- a/tools/perf/arch/x86/util/evlist.c
> +++ b/tools/perf/arch/x86/util/evlist.c
> @@ -4,16 +4,66 @@
>  #include "util/evlist.h"
>  #include "util/parse-events.h"
>  #include "topdown.h"
> +#include "util/event.h"
> +#include "util/pmu-hybrid.h"
>
>  #define TOPDOWN_L1_EVENTS      "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
>  #define TOPDOWN_L2_EVENTS      "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
>
> +static int ___evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
> +{
> +       struct perf_cpu_map *cpus;
> +       struct evsel *evsel, *n;
> +       struct perf_pmu *pmu;
> +       LIST_HEAD(head);
> +       size_t i, j = 0;
> +
> +       for (i = 0; i < nr_attrs; i++)
> +               event_attr_init(attrs + i);
> +
> +       if (!perf_pmu__has_hybrid())
> +               return evlist__add_attrs(evlist, attrs, nr_attrs);
> +
> +       for (i = 0; i < nr_attrs; i++) {
> +               if (attrs[i].type == PERF_TYPE_SOFTWARE) {
> +                       evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);

Probably no need to calculate index (j) as it's updated
later when it goes to the evlist...


> +                       if (evsel == NULL)
> +                               goto out_delete_partial_list;
> +                       j++;
> +                       list_add_tail(&evsel->core.node, &head);
> +                       continue;
> +               }
> +
> +               perf_pmu__for_each_hybrid_pmu(pmu) {
> +                       evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
> +                       if (evsel == NULL)
> +                               goto out_delete_partial_list;
> +                       j++;
> +                       evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
> +                       cpus = perf_cpu_map__get(pmu->cpus);
> +                       evsel->core.cpus = cpus;
> +                       evsel->core.own_cpus = perf_cpu_map__get(cpus);
> +                       evsel->pmu_name = strdup(pmu->name);
> +                       list_add_tail(&evsel->core.node, &head);
> +               }
> +       }
> +
> +       evlist__splice_list_tail(evlist, &head);

... like here.

Thanks,
Namhyung


> +
> +       return 0;
> +
> +out_delete_partial_list:
> +       __evlist__for_each_entry_safe(&head, n, evsel)
> +               evsel__delete(evsel);
> +       return -1;
> +}


* Re: [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine
  2022-06-07  1:33 ` [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine zhengjun.xing
@ 2022-06-09  0:09   ` Namhyung Kim
  2022-06-09 10:41     ` Xing Zhengjun
  0 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2022-06-09  0:09 UTC (permalink / raw)
  To: Xing Zhengjun
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	alexander.shishkin, Jiri Olsa, linux-kernel, linux-perf-users,
	Ian Rogers, Adrian Hunter, Andi Kleen, Kan Liang

On Tue, Jun 7, 2022 at 1:08 AM <zhengjun.xing@linux.intel.com> wrote:
>
> From: Zhengjun Xing <zhengjun.xing@linux.intel.com>
>
> Topdown metrics are missing from the default perf stat on the hybrid
> machine; add them to the default perf stat for hybrid systems.
>
> Currently, the perf metrics Topdown is supported for the p-core PMU in
> the perf stat default; Topdown support for the e-core PMU will be
> implemented separately later. The refactoring adds two x86-specific
> functions. Widen the event name column by 7 chars so that all metrics
> after the "#" become aligned again.
>
> The perf metrics topdown feature is supported on the cpu_core of ADL. The
> dedicated perf metrics counter and the fixed counter 3 are used for the
> topdown events. Adding the topdown metrics doesn't trigger multiplexing.
>
> Before:
>
>  # ./perf  stat  -a true
>
>  Performance counter stats for 'system wide':
>
>              53.70 msec cpu-clock                 #   25.736 CPUs utilized
>                 80      context-switches          #    1.490 K/sec
>                 24      cpu-migrations            #  446.951 /sec
>                 52      page-faults               #  968.394 /sec
>          2,788,555      cpu_core/cycles/          #   51.931 M/sec
>            851,129      cpu_atom/cycles/          #   15.851 M/sec
>          2,974,030      cpu_core/instructions/    #   55.385 M/sec
>            416,919      cpu_atom/instructions/    #    7.764 M/sec
>            586,136      cpu_core/branches/        #   10.916 M/sec
>             79,872      cpu_atom/branches/        #    1.487 M/sec
>             14,220      cpu_core/branch-misses/   #  264.819 K/sec
>              7,691      cpu_atom/branch-misses/   #  143.229 K/sec
>
>        0.002086438 seconds time elapsed
>
> After:
>
>  # ./perf stat  -a true
>
>  Performance counter stats for 'system wide':
>
>              61.39 msec cpu-clock                        #   24.874 CPUs utilized
>                 76      context-switches                 #    1.238 K/sec
>                 24      cpu-migrations                   #  390.968 /sec
>                 52      page-faults                      #  847.097 /sec
>          2,753,695      cpu_core/cycles/                 #   44.859 M/sec
>            903,899      cpu_atom/cycles/                 #   14.725 M/sec
>          2,927,529      cpu_core/instructions/           #   47.690 M/sec
>            428,498      cpu_atom/instructions/           #    6.980 M/sec
>            581,299      cpu_core/branches/               #    9.470 M/sec
>             83,409      cpu_atom/branches/               #    1.359 M/sec
>             13,641      cpu_core/branch-misses/          #  222.216 K/sec
>              8,008      cpu_atom/branch-misses/          #  130.453 K/sec
>         14,761,308      cpu_core/slots/                  #  240.466 M/sec
>          3,288,625      cpu_core/topdown-retiring/       #     22.3% retiring
>          1,323,323      cpu_core/topdown-bad-spec/       #      9.0% bad speculation
>          5,477,470      cpu_core/topdown-fe-bound/       #     37.1% frontend bound
>          4,679,199      cpu_core/topdown-be-bound/       #     31.7% backend bound
>            646,194      cpu_core/topdown-heavy-ops/      #      4.4% heavy operations       #     17.9% light operations
>          1,244,999      cpu_core/topdown-br-mispredict/  #      8.4% branch mispredict      #      0.5% machine clears
>          3,891,800      cpu_core/topdown-fetch-lat/      #     26.4% fetch latency          #     10.7% fetch bandwidth
>          1,879,034      cpu_core/topdown-mem-bound/      #     12.7% memory bound           #     19.0% Core bound
>
>        0.002467839 seconds time elapsed
>
> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> ---
[SNIP]
> +const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
> +{
> +       const char *pmu_name = "cpu";
> +
> +       if (perf_pmu__has_hybrid()) {
> +               if (!evlist->hybrid_pmu_name) {
> +                       if (warn)
> +                               pr_warning
> +                                   ("WARNING: default to use cpu_core topdown events\n");
> +                       evlist->hybrid_pmu_name =
> +                           perf_pmu__hybrid_type_to_pmu("core");

This doesn't look good.  Please consider reducing the
indent level, e.g. by returning early:

    if (!perf_pmu__has_hybrid())
        return "cpu";

    if (!evlist->hybrid_pmu_name) {
        ...

Thanks,
Namhyung


> +               }
> +
> +               pmu_name = evlist->hybrid_pmu_name;
> +       }
> +       return pmu_name;
> +}


* Re: [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine
  2022-06-09  0:09   ` Namhyung Kim
@ 2022-06-09 10:41     ` Xing Zhengjun
  0 siblings, 0 replies; 11+ messages in thread
From: Xing Zhengjun @ 2022-06-09 10:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	alexander.shishkin, Jiri Olsa, linux-kernel, linux-perf-users,
	Ian Rogers, Adrian Hunter, Andi Kleen, Kan Liang



On 6/9/2022 8:09 AM, Namhyung Kim wrote:
> On Tue, Jun 7, 2022 at 1:08 AM <zhengjun.xing@linux.intel.com> wrote:
>>
>> From: Zhengjun Xing <zhengjun.xing@linux.intel.com>
>>
>> [SNIP]
>>
>> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
>> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
> [SNIP]
>> +const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
>> +{
>> +       const char *pmu_name = "cpu";
>> +
>> +       if (perf_pmu__has_hybrid()) {
>> +               if (!evlist->hybrid_pmu_name) {
>> +                       if (warn)
>> +                               pr_warning
>> +                                   ("WARNING: default to use cpu_core topdown events\n");
>> +                       evlist->hybrid_pmu_name =
>> +                           perf_pmu__hybrid_type_to_pmu("core");
> 
> This doesn't look good.  Please consider reducing the
> indent level like returning early as
> 
>      if (!perf_pmu__has_hybrid())
>          return "cpu";
> 
>      if (!evlist->hybrid_pmu_name) {
>          ...
> 
Thanks for the comments; I will update it in the next version.
> Thanks,
> Namhyung
> 
> 
>> +               }
>> +
>> +               pmu_name = evlist->hybrid_pmu_name;
>> +       }
>> +       return pmu_name;
>> +}

-- 
Zhengjun Xing


* Re: [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat
  2022-06-09  0:04   ` Namhyung Kim
@ 2022-06-09 12:47     ` Liang, Kan
  2022-06-09 13:51       ` Xing Zhengjun
  0 siblings, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2022-06-09 12:47 UTC (permalink / raw)
  To: Namhyung Kim, Xing Zhengjun
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	alexander.shishkin, Jiri Olsa, linux-kernel, linux-perf-users,
	Ian Rogers, Adrian Hunter, Andi Kleen



On 6/8/2022 8:04 PM, Namhyung Kim wrote:
> Hello,
> 
> On Tue, Jun 7, 2022 at 12:31 AM <zhengjun.xing@linux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Provide a new solution to replace the reverted commit ac2dc29edd21
>> ("perf stat: Add default hybrid events").
>>
>> For the default software attrs, nothing is changed.
>> For the default hardware attrs, create a new evsel for each hybrid pmu.
>>
>> With the new solution, adding a new default attr will not require the
>> special support for the hybrid platform anymore.
>>
>> Also, the "--detailed" is supported on the hybrid platform
>>
>> With the patch,
>>
>> ./perf stat -a -ddd sleep 1
>>
>> [SNIP]
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
>> ---
>>   tools/perf/arch/x86/util/evlist.c | 52 ++++++++++++++++++++++++++++++-
>>   tools/perf/util/evlist.c          |  2 +-
>>   tools/perf/util/evlist.h          |  2 ++
>>   3 files changed, 54 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
>> index 777bdf182a58..1b3f9e1a2287 100644
>> --- a/tools/perf/arch/x86/util/evlist.c
>> +++ b/tools/perf/arch/x86/util/evlist.c
>> @@ -4,16 +4,66 @@
>>   #include "util/evlist.h"
>>   #include "util/parse-events.h"
>>   #include "topdown.h"
>> +#include "util/event.h"
>> +#include "util/pmu-hybrid.h"
>>
>>   #define TOPDOWN_L1_EVENTS      "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
>>   #define TOPDOWN_L2_EVENTS      "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
>>
>> +static int ___evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
>> +{
>> +       struct perf_cpu_map *cpus;
>> +       struct evsel *evsel, *n;
>> +       struct perf_pmu *pmu;
>> +       LIST_HEAD(head);
>> +       size_t i, j = 0;
>> +
>> +       for (i = 0; i < nr_attrs; i++)
>> +               event_attr_init(attrs + i);
>> +
>> +       if (!perf_pmu__has_hybrid())
>> +               return evlist__add_attrs(evlist, attrs, nr_attrs);
>> +
>> +       for (i = 0; i < nr_attrs; i++) {
>> +               if (attrs[i].type == PERF_TYPE_SOFTWARE) {
>> +                       evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
> 
> Probably no need to calculate index (j) as it's updated
> later when it goes to the evlist...
> 
> 
>> +                       if (evsel == NULL)
>> +                               goto out_delete_partial_list;
>> +                       j++;
>> +                       list_add_tail(&evsel->core.node, &head);
>> +                       continue;
>> +               }
>> +
>> +               perf_pmu__for_each_hybrid_pmu(pmu) {
>> +                       evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
>> +                       if (evsel == NULL)
>> +                               goto out_delete_partial_list;
>> +                       j++;
>> +                       evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
>> +                       cpus = perf_cpu_map__get(pmu->cpus);
>> +                       evsel->core.cpus = cpus;
>> +                       evsel->core.own_cpus = perf_cpu_map__get(cpus);
>> +                       evsel->pmu_name = strdup(pmu->name);
>> +                       list_add_tail(&evsel->core.node, &head);
>> +               }
>> +       }
>> +
>> +       evlist__splice_list_tail(evlist, &head);
> 
> ... like here.

Yes, the index of each new evsel will be updated when it is added to the
evlist.

Zhengjun, could you please handle the patch? Just setting idx 0 for the
new evsel should be good enough.


Thanks,
Kan

> 
> Thanks,
> Namhyung
> 
> 
>> +
>> +       return 0;
>> +
>> +out_delete_partial_list:
>> +       __evlist__for_each_entry_safe(&head, n, evsel)
>> +               evsel__delete(evsel);
>> +       return -1;
>> +}


* Re: [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat
  2022-06-09 12:47     ` Liang, Kan
@ 2022-06-09 13:51       ` Xing Zhengjun
  0 siblings, 0 replies; 11+ messages in thread
From: Xing Zhengjun @ 2022-06-09 13:51 UTC (permalink / raw)
  To: Liang, Kan, Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	alexander.shishkin, Jiri Olsa, linux-kernel, linux-perf-users,
	Ian Rogers, Adrian Hunter, Andi Kleen



On 6/9/2022 8:47 PM, Liang, Kan wrote:
> 
> 
> On 6/8/2022 8:04 PM, Namhyung Kim wrote:
>> Hello,
>>
>> On Tue, Jun 7, 2022 at 12:31 AM <zhengjun.xing@linux.intel.com> wrote:
>>>
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> Provide a new solution to replace the reverted commit ac2dc29edd21
>>> ("perf stat: Add default hybrid events").
>>>
>>> For the default software attrs, nothing is changed.
>>> For the default hardware attrs, create a new evsel for each hybrid pmu.
>>>
>>> With the new solution, adding a new default attr will not require the
>>> special support for the hybrid platform anymore.
>>>
>>> Also, the "--detailed" is supported on the hybrid platform
>>>
>>> With the patch,
>>>
>>> ./perf stat -a -ddd sleep 1
>>>
>>> [SNIP]
>>>
>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
>>> ---
>>>   tools/perf/arch/x86/util/evlist.c | 52 ++++++++++++++++++++++++++++++-
>>>   tools/perf/util/evlist.c          |  2 +-
>>>   tools/perf/util/evlist.h          |  2 ++
>>>   3 files changed, 54 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
>>> index 777bdf182a58..1b3f9e1a2287 100644
>>> --- a/tools/perf/arch/x86/util/evlist.c
>>> +++ b/tools/perf/arch/x86/util/evlist.c
>>> @@ -4,16 +4,66 @@
>>>   #include "util/evlist.h"
>>>   #include "util/parse-events.h"
>>>   #include "topdown.h"
>>> +#include "util/event.h"
>>> +#include "util/pmu-hybrid.h"
>>>
>>>   #define TOPDOWN_L1_EVENTS      "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}"
>>>   #define TOPDOWN_L2_EVENTS      "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}"
>>>
>>> +static int ___evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
>>> +{
>>> +       struct perf_cpu_map *cpus;
>>> +       struct evsel *evsel, *n;
>>> +       struct perf_pmu *pmu;
>>> +       LIST_HEAD(head);
>>> +       size_t i, j = 0;
>>> +
>>> +       for (i = 0; i < nr_attrs; i++)
>>> +               event_attr_init(attrs + i);
>>> +
>>> +       if (!perf_pmu__has_hybrid())
>>> +               return evlist__add_attrs(evlist, attrs, nr_attrs);
>>> +
>>> +       for (i = 0; i < nr_attrs; i++) {
>>> +               if (attrs[i].type == PERF_TYPE_SOFTWARE) {
>>> +                       evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
>>
>> Probably no need to calculate index (j) as it's updated
>> later when it goes to the evlist...
>>
>>
>>> +                       if (evsel == NULL)
>>> +                               goto out_delete_partial_list;
>>> +                       j++;
>>> +                       list_add_tail(&evsel->core.node, &head);
>>> +                       continue;
>>> +               }
>>> +
>>> +               perf_pmu__for_each_hybrid_pmu(pmu) {
>>> +                       evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + j);
>>> +                       if (evsel == NULL)
>>> +                               goto out_delete_partial_list;
>>> +                       j++;
>>> +                       evsel->core.attr.config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
>>> +                       cpus = perf_cpu_map__get(pmu->cpus);
>>> +                       evsel->core.cpus = cpus;
>>> +                       evsel->core.own_cpus = perf_cpu_map__get(cpus);
>>> +                       evsel->pmu_name = strdup(pmu->name);
>>> +                       list_add_tail(&evsel->core.node, &head);
>>> +               }
>>> +       }
>>> +
>>> +       evlist__splice_list_tail(evlist, &head);
>>
>> ... like here.
> 
> Yes, the index of all new evsel will be updated when adding to the evlist.
> 
> Zhengjun, could you please handle the patch? Just setting idx 0 for the
> new evsel should be good enough.
> 
> 
Ok, I will update it in the new version.
> Thanks,
> Kan
> 
>>
>> Thanks,
>> Namhyung
>>
>>
>>> +
>>> +       return 0;
>>> +
>>> +out_delete_partial_list:
>>> +       __evlist__for_each_entry_safe(&head, n, evsel)
>>> +               evsel__delete(evsel);
>>> +       return -1;
>>> +}

-- 
Zhengjun Xing


end of thread, other threads:[~2022-06-09 13:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-07  1:33 [PATCH 0/5] Add perf stat default events for hybrid machines zhengjun.xing
2022-06-07  1:33 ` [PATCH 1/5] perf stat: Revert "perf stat: Add default hybrid events" zhengjun.xing
2022-06-07  1:33 ` [PATCH 2/5] perf evsel: Add arch_evsel__hw_name() zhengjun.xing
2022-06-07  1:33 ` [PATCH 3/5] perf evlist: Always use arch_evlist__add_default_attrs() zhengjun.xing
2022-06-07  1:33 ` [PATCH 4/5] perf x86 evlist: Add default hybrid events for perf stat zhengjun.xing
2022-06-09  0:04   ` Namhyung Kim
2022-06-09 12:47     ` Liang, Kan
2022-06-09 13:51       ` Xing Zhengjun
2022-06-07  1:33 ` [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine zhengjun.xing
2022-06-09  0:09   ` Namhyung Kim
2022-06-09 10:41     ` Xing Zhengjun
