* [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)
@ 2022-06-01  3:26 Ravi Bangoria
  2022-06-01  3:26 ` [PATCH v5 1/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

The kernel side of the changes has already been applied to linus/master
(except the amd-ibs.h header). This series contains the perf tool changes.

Kan, I don't have any machine with heterogeneous CPUs. It would be
helpful if you could check HEADER_PMU_CAPS on an Intel ADL machine.

v4: https://lore.kernel.org/lkml/20220523033945.1612-1-ravi.bangoria@amd.com
v4->v5:
 - Replace HEADER_HYBRID_CPU_PMU_CAPS with HEADER_PMU_CAPS instead of
   adding a new header alongside it. Special care is taken to write
   hybrid cpu pmu caps first in the header, to make sure the old perf
   tool does not break.
 - Store HEADER_CPU_PMU_CAPS capabilities in an array instead of a
   single string separated by NUL characters.
 - Include the "cpu" pmu while searching for capabilities in perf_env.
 - Rebase on acme/perf/core (9dde6cadb92b5)

Original cover letter:

IBS support has been enhanced with two new features in the upcoming
uarch: 1. the DataSrc extension and 2. the L3 Miss Filtering capability.
Both are indicated by CPUID_Fn8000001B_EAX bit 11.
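
For reference, a minimal user-space sketch (not part of this series;
assumes a GCC/Clang environment providing <cpuid.h>) to probe that
feature bit could look like:

  #include <cpuid.h>
  #include <stdio.h>

  /*
   * Illustrative only: check CPUID_Fn8000001B_EAX bit 11, which
   * advertises the DataSrc extension and L3 Miss Filtering.
   */
  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* Extended leaf 0x8000001b: IBS capabilities */
          if (!__get_cpuid(0x8000001b, &eax, &ebx, &ecx, &edx))
                  return 1;

          printf("Zen4 IBS extensions: %s\n",
                 (eax & (1U << 11)) ? "yes" : "no");
          return 0;
  }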

The DataSrc extension provides additional data source details for tagged
load/store operations. Add support for these new bits in the perf report/
script raw dump.

IBS L3 miss filtering works by tagging an instruction on IBS counter
overflow and generating an NMI if the tagged instruction causes an L3
miss. Samples without an L3 miss are discarded and the counter is reset
with a random value (1-15 for the fetch pmu and 1-127 for the op pmu).
This helps reduce sampling overhead when the user is interested only in
such samples. One use case for such filtered samples is to feed data to
a page-migration daemon in tiered-memory systems.

Add support for L3 miss filtering in IBS driver via new pmu attribute
"l3missonly". Example usage:

  # perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5
  # perf report -D

Some important points to keep in mind while using L3 miss filtering:
1. Hw internally resets the sampling period when the tagged instruction
   does not cause an L3 miss, but there is no way to reconstruct the
   aggregated sampling period when this happens.
2. L3 miss is not the actual event being counted. Rather, IBS counts
   fetches, cycles or uOps depending on the configuration. Thus the
   sampling period has no direct connection to L3 misses.

The first point causes a sampling period skew, so I've added a warning
message to perf record:

  # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
  WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
  and tagged operation does not cause L3 Miss. This causes sampling period skew.

The user can configure a smaller sampling period to get more samples
while using l3missonly.
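
For example (the period value here is illustrative only):

  # perf record -c 1000 -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5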


Ravi Bangoria (8):
  perf record ibs: Warn about sampling period skew
  perf tool: Parse pmu caps sysfs only once
  perf headers: Pass "cpu" pmu name while printing caps
  perf headers: Store pmu caps in an array of strings
  perf headers: Record non-cpu pmu capabilities
  perf/x86/ibs: Add new IBS register bits into header
  perf tool ibs: Sync amd ibs header file
  perf script ibs: Support new IBS bits in raw trace dump

 arch/x86/include/asm/amd-ibs.h                |  16 +-
 tools/arch/x86/include/asm/amd-ibs.h          |  16 +-
 .../Documentation/perf.data-file-format.txt   |  10 +-
 tools/perf/arch/x86/util/evsel.c              |  49 +++++
 tools/perf/builtin-inject.c                   |   2 +-
 tools/perf/util/amd-sample-raw.c              |  68 +++++-
 tools/perf/util/env.c                         |  62 +++++-
 tools/perf/util/env.h                         |  14 +-
 tools/perf/util/evsel.c                       |   7 +
 tools/perf/util/evsel.h                       |   1 +
 tools/perf/util/header.c                      | 196 ++++++++++--------
 tools/perf/util/header.h                      |   2 +-
 tools/perf/util/pmu.c                         |  15 +-
 tools/perf/util/pmu.h                         |   2 +
 14 files changed, 333 insertions(+), 127 deletions(-)

-- 
2.31.1



* [PATCH v5 1/8] perf record ibs: Warn about sampling period skew
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-02 20:30   ` Namhyung Kim
  2022-06-01  3:26 ` [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once Ravi Bangoria
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

When IBS L3 miss filtering is enabled, samples without an L3 miss are
discarded and the counter is reset with a random value (1-15 for the
fetch pmu and 1-127 for the op pmu). This causes a sampling period skew,
and there is no way to reconstruct the aggregated sampling period. So
print a warning at perf record time if the user sets l3missonly=1.

Ex:
  # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
  WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
  and tagged operation does not cause L3 Miss. This causes sampling period skew.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/x86/util/evsel.c | 49 ++++++++++++++++++++++++++++++++
 tools/perf/util/evsel.c          |  7 +++++
 tools/perf/util/evsel.h          |  1 +
 3 files changed, 57 insertions(+)

diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 88306183d629..fceb904902ab 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -5,6 +5,7 @@
 #include "util/env.h"
 #include "util/pmu.h"
 #include "linux/string.h"
+#include "util/debug.h"
 
 void arch_evsel__set_sample_weight(struct evsel *evsel)
 {
@@ -60,3 +61,51 @@ bool arch_evsel__must_be_in_group(const struct evsel *evsel)
 		(!strcasecmp(evsel->name, "slots") ||
 		 strcasestr(evsel->name, "topdown"));
 }
+
+static void ibs_l3miss_warn(void)
+{
+	pr_warning(
+"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n"
+"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n");
+}
+
+void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
+{
+	struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
+	static int warned_once;
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
+	if (warned_once || is_amd == -1)
+		return;
+
+	if (!is_amd) {
+		struct perf_env *env = evsel__env(evsel);
+
+		if (!perf_env__cpuid(env) || !env->cpuid ||
+		    !strstarts(env->cpuid, "AuthenticAMD")) {
+			is_amd = -1;
+			return;
+		}
+		is_amd = 1;
+	}
+
+	evsel_pmu = evsel__find_pmu(evsel);
+	if (!evsel_pmu)
+		return;
+
+	ibs_fetch_pmu = perf_pmu__find("ibs_fetch");
+	ibs_op_pmu = perf_pmu__find("ibs_op");
+
+	if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
+		if (attr->config & (1ULL << 59)) {
+			ibs_l3miss_warn();
+			warned_once = 1;
+		}
+	} else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
+		if (attr->config & (1ULL << 16)) {
+			ibs_l3miss_warn();
+			warned_once = 1;
+		}
+	}
+}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ce499c5da8d7..8fea51a9cd90 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1091,6 +1091,11 @@ void __weak arch_evsel__fixup_new_cycles(struct perf_event_attr *attr __maybe_un
 {
 }
 
+void __weak arch__post_evsel_config(struct evsel *evsel __maybe_unused,
+				    struct perf_event_attr *attr __maybe_unused)
+{
+}
+
 static void evsel__set_default_freq_period(struct record_opts *opts,
 					   struct perf_event_attr *attr)
 {
@@ -1366,6 +1371,8 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	 */
 	if (evsel__is_dummy_event(evsel))
 		evsel__reset_sample_bit(evsel, BRANCH_STACK);
+
+	arch__post_evsel_config(evsel, attr);
 }
 
 int evsel__set_filter(struct evsel *evsel, const char *filter)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 73ea48e94079..92bed8e2f7d8 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -297,6 +297,7 @@ void evsel__set_sample_id(struct evsel *evsel, bool use_sample_identifier);
 
 void arch_evsel__set_sample_weight(struct evsel *evsel);
 void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr);
+void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr);
 
 int evsel__set_filter(struct evsel *evsel, const char *filter);
 int evsel__append_tp_filter(struct evsel *evsel, const char *filter);
-- 
2.31.1



* [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
  2022-06-01  3:26 ` [PATCH v5 1/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-01 13:35   ` Liang, Kan
  2022-06-01  3:26 ` [PATCH v5 3/8] perf headers: Pass "cpu" pmu name while printing caps Ravi Bangoria
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

In addition to returning nr_caps, cache it locally in struct perf_pmu.
Similarly, cache the status of whether the caps sysfs has already been
parsed. These help to avoid parsing sysfs every time the function is
called.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/pmu.c | 15 +++++++++++----
 tools/perf/util/pmu.h |  2 ++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 9a1c7e63e663..0112e1c36418 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1890,7 +1890,11 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 	const char *sysfs = sysfs__mountpoint();
 	DIR *caps_dir;
 	struct dirent *evt_ent;
-	int nr_caps = 0;
+
+	if (pmu->caps_initialized)
+		return pmu->nr_caps;
+
+	pmu->nr_caps = 0;
 
 	if (!sysfs)
 		return -1;
@@ -1898,8 +1902,10 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 	snprintf(caps_path, PATH_MAX,
 		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
 
-	if (stat(caps_path, &st) < 0)
+	if (stat(caps_path, &st) < 0) {
+		pmu->caps_initialized = true;
 		return 0;	/* no error if caps does not exist */
+	}
 
 	caps_dir = opendir(caps_path);
 	if (!caps_dir)
@@ -1926,13 +1932,14 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 			continue;
 		}
 
-		nr_caps++;
+		pmu->nr_caps++;
 		fclose(file);
 	}
 
 	closedir(caps_dir);
 
-	return nr_caps;
+	pmu->caps_initialized = true;
+	return pmu->nr_caps;
 }
 
 void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config,
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 541889fa9f9c..4b45fd8da5a3 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -46,6 +46,8 @@ struct perf_pmu {
 	struct perf_cpu_map *cpus;
 	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
 	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+	bool caps_initialized;
+	u32 nr_caps;
 	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
 	struct list_head list;    /* ELEM */
 	struct list_head hybrid_list;
-- 
2.31.1



* [PATCH v5 3/8] perf headers: Pass "cpu" pmu name while printing caps
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
  2022-06-01  3:26 ` [PATCH v5 1/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
  2022-06-01  3:26 ` [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-01 13:35   ` Liang, Kan
  2022-06-01  3:26 ` [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings Ravi Bangoria
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

Avoid unnecessary conditional code that checks whether the pmu name is
NULL by passing the "cpu" pmu name to the printing function.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/header.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 53332da100e8..ee7ccd94e272 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2058,17 +2058,11 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char *cpu_pmu_caps,
 	char *str, buf[128];
 
 	if (!nr_caps) {
-		if (!pmu_name)
-			fprintf(fp, "# cpu pmu capabilities: not available\n");
-		else
-			fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name);
+		fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name);
 		return;
 	}
 
-	if (!pmu_name)
-		scnprintf(buf, sizeof(buf), "# cpu pmu capabilities: ");
-	else
-		scnprintf(buf, sizeof(buf), "# %s pmu capabilities: ", pmu_name);
+	scnprintf(buf, sizeof(buf), "# %s pmu capabilities: ", pmu_name);
 
 	delimiter = buf;
 
@@ -2085,7 +2079,7 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char *cpu_pmu_caps,
 static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
 {
 	print_per_cpu_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps,
-			       ff->ph->env.cpu_pmu_caps, NULL);
+			       ff->ph->env.cpu_pmu_caps, (char *)"cpu");
 }
 
 static void print_hybrid_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
-- 
2.31.1



* [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
                   ` (2 preceding siblings ...)
  2022-06-01  3:26 ` [PATCH v5 3/8] perf headers: Pass "cpu" pmu name while printing caps Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-01 13:36   ` Liang, Kan
  2022-06-02 21:37   ` Namhyung Kim
  2022-06-01  3:26 ` [PATCH v5 5/8] perf headers: Record non-cpu pmu capabilities Ravi Bangoria
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

Currently all capabilities are stored in a single string, separated by
NUL characters. Instead, store them in an array, which makes searching
for a capability easier.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/env.c    |  6 +++-
 tools/perf/util/env.h    |  4 +--
 tools/perf/util/header.c | 70 +++++++++++++++++++++++-----------------
 3 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 579e44c59914..7d3aeb2e4622 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -179,7 +179,7 @@ static void perf_env__purge_bpf(struct perf_env *env __maybe_unused)
 
 void perf_env__exit(struct perf_env *env)
 {
-	int i;
+	int i, j;
 
 	perf_env__purge_bpf(env);
 	perf_env__purge_cgroups(env);
@@ -196,6 +196,8 @@ void perf_env__exit(struct perf_env *env)
 	zfree(&env->sibling_threads);
 	zfree(&env->pmu_mappings);
 	zfree(&env->cpu);
+	for (i = 0; i < env->nr_cpu_pmu_caps; i++)
+		zfree(&env->cpu_pmu_caps[i]);
 	zfree(&env->cpu_pmu_caps);
 	zfree(&env->numa_map);
 
@@ -218,6 +220,8 @@ void perf_env__exit(struct perf_env *env)
 	zfree(&env->hybrid_nodes);
 
 	for (i = 0; i < env->nr_hybrid_cpc_nodes; i++) {
+		for (j = 0; j < env->hybrid_cpc_nodes[i].nr_cpu_pmu_caps; j++)
+			zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps[j]);
 		zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps);
 		zfree(&env->hybrid_cpc_nodes[i].pmu_name);
 	}
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a3541f98e1fc..43aab59f7322 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -46,7 +46,7 @@ struct hybrid_node {
 struct hybrid_cpc_node {
 	int		nr_cpu_pmu_caps;
 	unsigned int    max_branches;
-	char            *cpu_pmu_caps;
+	char            **cpu_pmu_caps;
 	char            *pmu_name;
 };
 
@@ -81,7 +81,7 @@ struct perf_env {
 	char			*sibling_dies;
 	char			*sibling_threads;
 	char			*pmu_mappings;
-	char			*cpu_pmu_caps;
+	char			**cpu_pmu_caps;
 	struct cpu_topology_map	*cpu;
 	struct cpu_cache_level	*caches;
 	int			 caches_cnt;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index ee7ccd94e272..a1e4ec53333d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2051,26 +2051,21 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
 		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
 }
 
-static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char *cpu_pmu_caps,
+static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
 				   char *pmu_name)
 {
-	const char *delimiter;
-	char *str, buf[128];
+	const char *delimiter = "";
+	int i;
 
 	if (!nr_caps) {
 		fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name);
 		return;
 	}
 
-	scnprintf(buf, sizeof(buf), "# %s pmu capabilities: ", pmu_name);
-
-	delimiter = buf;
-
-	str = cpu_pmu_caps;
-	while (nr_caps--) {
-		fprintf(fp, "%s%s", delimiter, str);
+	fprintf(fp, "# %s pmu capabilities: ", pmu_name);
+	for (i = 0; i < nr_caps; i++) {
+		fprintf(fp, "%s%s", delimiter, cpu_pmu_caps[i]);
 		delimiter = ", ";
-		str += strlen(str) + 1;
 	}
 
 	fprintf(fp, "\n");
@@ -3202,27 +3197,27 @@ static int process_compressed(struct feat_fd *ff,
 }
 
 static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
-				    char **cpu_pmu_caps,
+				    char ***cpu_pmu_caps,
 				    unsigned int *max_branches)
 {
-	char *name, *value;
-	struct strbuf sb;
-	u32 nr_caps;
+	int name_size, value_size;
+	char *name, *value, *ptr;
+	u32 nr_caps, i;
+
+	*nr_cpu_pmu_caps = 0;
+	*cpu_pmu_caps = NULL;
 
 	if (do_read_u32(ff, &nr_caps))
 		return -1;
 
-	if (!nr_caps) {
-		pr_debug("cpu pmu capabilities not available\n");
+	if (!nr_caps)
 		return 0;
-	}
-
-	*nr_cpu_pmu_caps = nr_caps;
 
-	if (strbuf_init(&sb, 128) < 0)
+	*cpu_pmu_caps = zalloc(sizeof(char *) * nr_caps);
+	if (!*cpu_pmu_caps)
 		return -1;
 
-	while (nr_caps--) {
+	for (i = 0; i < nr_caps; i++) {
 		name = do_read_string(ff);
 		if (!name)
 			goto error;
@@ -3231,12 +3226,16 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
 		if (!value)
 			goto free_name;
 
-		if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
+		name_size = strlen(name);
+		value_size = strlen(value);
+		ptr = zalloc(sizeof(char) * (name_size + value_size + 2));
+		if (!ptr)
 			goto free_value;
 
-		/* include a NULL character at the end */
-		if (strbuf_add(&sb, "", 1) < 0)
-			goto free_value;
+		memcpy(ptr, name, name_size);
+		ptr[name_size] = '=';
+		memcpy(ptr + name_size + 1, value, value_size);
+		(*cpu_pmu_caps)[i] = ptr;
 
 		if (!strcmp(name, "branches"))
 			*max_branches = atoi(value);
@@ -3244,7 +3243,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
 		free(value);
 		free(name);
 	}
-	*cpu_pmu_caps = strbuf_detach(&sb, NULL);
+	*nr_cpu_pmu_caps = nr_caps;
 	return 0;
 
 free_value:
@@ -3252,16 +3251,24 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
 free_name:
 	free(name);
 error:
-	strbuf_release(&sb);
+	for (; i > 0; i--)
+		free((*cpu_pmu_caps)[i - 1]);
+	free(*cpu_pmu_caps);
+	*cpu_pmu_caps = NULL;
+	*nr_cpu_pmu_caps = 0;
 	return -1;
 }
 
 static int process_cpu_pmu_caps(struct feat_fd *ff,
 				void *data __maybe_unused)
 {
-	return process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
+	int ret = process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
 					&ff->ph->env.cpu_pmu_caps,
 					&ff->ph->env.max_branches);
+
+	if (!ret && !ff->ph->env.cpu_pmu_caps)
+		pr_debug("cpu pmu capabilities not available\n");
+	return ret;
 }
 
 static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
@@ -3270,6 +3277,7 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
 	struct hybrid_cpc_node *nodes;
 	u32 nr_pmu, i;
 	int ret;
+	int j;
 
 	if (do_read_u32(ff, &nr_pmu))
 		return -1;
@@ -3297,6 +3305,8 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
 			ret = -1;
 			goto err;
 		}
+		if (!n->nr_cpu_pmu_caps)
+			pr_debug("%s pmu capabilities not available\n", n->pmu_name);
 	}
 
 	ff->ph->env.nr_hybrid_cpc_nodes = nr_pmu;
@@ -3305,6 +3315,8 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
 
 err:
 	for (i = 0; i < nr_pmu; i++) {
+		for (j = 0; j < nodes[i].nr_cpu_pmu_caps; j++)
+			free(nodes[i].cpu_pmu_caps[j]);
 		free(nodes[i].cpu_pmu_caps);
 		free(nodes[i].pmu_name);
 	}
-- 
2.31.1



* [PATCH v5 5/8] perf headers: Record non-cpu pmu capabilities
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
                   ` (3 preceding siblings ...)
  2022-06-01  3:26 ` [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-01 13:37   ` Liang, Kan
  2022-06-01  3:26 ` [PATCH v5 6/8] perf/x86/ibs: Add new IBS register bits into header Ravi Bangoria
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

PMUs advertise their capabilities via sysfs attribute files, but the
perf tool currently parses only core (cpu) or hybrid core pmu
capabilities. Add support for recording non-core pmu capabilities in
the perf.data header.

Note that the newly proposed HEADER_PMU_CAPS replaces the existing
HEADER_HYBRID_CPU_PMU_CAPS. Special care is taken for hybrid core
pmus by writing their capabilities first in the perf.data header, to
make sure a new perf.data file being read by an old perf tool does
not break.
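
With this in place, a consumer can query a recorded capability by pmu
name via the perf_env__find_pmu_cap() helper added below; illustrative
use (a later patch in this series does exactly this):

  char *val = perf_env__find_pmu_cap(env, "ibs_op", "zen4_ibs_extensions");
  if (val)
          pr_debug("ibs_op zen4_ibs_extensions=%s\n", val);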

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 .../Documentation/perf.data-file-format.txt   |  10 +-
 tools/perf/builtin-inject.c                   |   2 +-
 tools/perf/util/env.c                         |  60 ++++++-
 tools/perf/util/env.h                         |  12 +-
 tools/perf/util/header.c                      | 160 ++++++++++--------
 tools/perf/util/header.h                      |   2 +-
 6 files changed, 158 insertions(+), 88 deletions(-)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index f56d0e0fbff6..2233534e508a 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -419,18 +419,20 @@ Example:
   cpu_core cpu list : 0-15
   cpu_atom cpu list : 16-23
 
-	HEADER_HYBRID_CPU_PMU_CAPS = 31,
+	HEADER_PMU_CAPS = 31,
 
-	A list of hybrid CPU PMU capabilities.
+	List of pmu capabilities (except cpu pmu which is already
+	covered by HEADER_CPU_PMU_CAPS). Note that hybrid cpu pmu
+	capabilities are also stored here.
 
 struct {
 	u32 nr_pmu;
 	struct {
-		u32 nr_cpu_pmu_caps;
+		u32 nr_caps;
 		{
 			char	name[];
 			char	value[];
-		} [nr_cpu_pmu_caps];
+		} [nr_caps];
 		char pmu_name[];
 	} [nr_pmu];
 };
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a75bf11585b5..05bc0cfccf83 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -809,7 +809,7 @@ static bool keep_feat(int feat)
 	case HEADER_CPU_PMU_CAPS:
 	case HEADER_CLOCK_DATA:
 	case HEADER_HYBRID_TOPOLOGY:
-	case HEADER_HYBRID_CPU_PMU_CAPS:
+	case HEADER_PMU_CAPS:
 		return true;
 	/* Information that can be updated */
 	case HEADER_BUILD_ID:
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 7d3aeb2e4622..5b8cf6a421a4 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -219,13 +219,13 @@ void perf_env__exit(struct perf_env *env)
 	}
 	zfree(&env->hybrid_nodes);
 
-	for (i = 0; i < env->nr_hybrid_cpc_nodes; i++) {
-		for (j = 0; j < env->hybrid_cpc_nodes[i].nr_cpu_pmu_caps; j++)
-			zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps[j]);
-		zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps);
-		zfree(&env->hybrid_cpc_nodes[i].pmu_name);
+	for (i = 0; i < env->nr_pmus_with_caps; i++) {
+		for (j = 0; j < env->pmu_caps[i].nr_caps; j++)
+			zfree(&env->pmu_caps[i].caps[j]);
+		zfree(&env->pmu_caps[i].caps);
+		zfree(&env->pmu_caps[i].pmu_name);
 	}
-	zfree(&env->hybrid_cpc_nodes);
+	zfree(&env->pmu_caps);
 }
 
 void perf_env__init(struct perf_env *env)
@@ -531,3 +531,51 @@ int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu)
 
 	return cpu.cpu >= 0 && cpu.cpu < env->nr_numa_map ? env->numa_map[cpu.cpu] : -1;
 }
+
+char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
+			     const char *cap)
+{
+	char *cap_eq;
+	int cap_size;
+	char **ptr;
+	int i, j;
+
+	if (!pmu_name || !cap)
+		return NULL;
+
+	cap_size = strlen(cap);
+	cap_eq = zalloc(cap_size + 2);
+	if (!cap_eq)
+		return NULL;
+
+	memcpy(cap_eq, cap, cap_size);
+	cap_eq[cap_size] = '=';
+
+	if (!strcmp(pmu_name, "cpu")) {
+		for (i = 0; i < env->nr_cpu_pmu_caps; i++) {
+			if (!strncmp(env->cpu_pmu_caps[i], cap_eq, cap_size + 1)) {
+				free(cap_eq);
+				return &env->cpu_pmu_caps[i][cap_size + 1];
+			}
+		}
+		goto out;
+	}
+
+	for (i = 0; i < env->nr_pmus_with_caps; i++) {
+		if (strcmp(env->pmu_caps[i].pmu_name, pmu_name))
+			continue;
+
+		ptr = env->pmu_caps[i].caps;
+
+		for (j = 0; j < env->pmu_caps[i].nr_caps; j++) {
+			if (!strncmp(ptr[j], cap_eq, cap_size + 1)) {
+				free(cap_eq);
+				return &ptr[j][cap_size + 1];
+			}
+		}
+	}
+
+out:
+	free(cap_eq);
+	return NULL;
+}
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 43aab59f7322..4566c51f2fd9 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -43,10 +43,10 @@ struct hybrid_node {
 	char	*cpus;
 };
 
-struct hybrid_cpc_node {
-	int		nr_cpu_pmu_caps;
+struct pmu_caps {
+	int		nr_caps;
 	unsigned int    max_branches;
-	char            **cpu_pmu_caps;
+	char            **caps;
 	char            *pmu_name;
 };
 
@@ -74,7 +74,7 @@ struct perf_env {
 	int			nr_groups;
 	int			nr_cpu_pmu_caps;
 	int			nr_hybrid_nodes;
-	int			nr_hybrid_cpc_nodes;
+	int			nr_pmus_with_caps;
 	char			*cmdline;
 	const char		**cmdline_argv;
 	char			*sibling_cores;
@@ -94,7 +94,7 @@ struct perf_env {
 	struct memory_node	*memory_nodes;
 	unsigned long long	 memory_bsize;
 	struct hybrid_node	*hybrid_nodes;
-	struct hybrid_cpc_node	*hybrid_cpc_nodes;
+	struct pmu_caps		*pmu_caps;
 #ifdef HAVE_LIBBPF_SUPPORT
 	/*
 	 * bpf_info_lock protects bpf rbtrees. This is needed because the
@@ -172,4 +172,6 @@ bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node);
 struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id);
 
 int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu);
+char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
+			     const char *cap);
 #endif /* __PERF_ENV_H */
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index a1e4ec53333d..67bf897e8f89 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1512,18 +1512,13 @@ static int write_compressed(struct feat_fd *ff __maybe_unused,
 	return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len));
 }
 
-static int write_per_cpu_pmu_caps(struct feat_fd *ff, struct perf_pmu *pmu,
-				  bool write_pmu)
+static int __write_pmu_caps(struct feat_fd *ff, struct perf_pmu *pmu,
+			    bool write_pmu)
 {
 	struct perf_pmu_caps *caps = NULL;
-	int nr_caps;
 	int ret;
 
-	nr_caps = perf_pmu__caps_parse(pmu);
-	if (nr_caps < 0)
-		return nr_caps;
-
-	ret = do_write(ff, &nr_caps, sizeof(nr_caps));
+	ret = do_write(ff, &pmu->nr_caps, sizeof(pmu->nr_caps));
 	if (ret < 0)
 		return ret;
 
@@ -1550,33 +1545,60 @@ static int write_cpu_pmu_caps(struct feat_fd *ff,
 			      struct evlist *evlist __maybe_unused)
 {
 	struct perf_pmu *cpu_pmu = perf_pmu__find("cpu");
+	int ret;
 
 	if (!cpu_pmu)
 		return -ENOENT;
 
-	return write_per_cpu_pmu_caps(ff, cpu_pmu, false);
+	ret = perf_pmu__caps_parse(cpu_pmu);
+	if (ret < 0)
+		return ret;
+
+	return __write_pmu_caps(ff, cpu_pmu, false);
 }
 
-static int write_hybrid_cpu_pmu_caps(struct feat_fd *ff,
-				     struct evlist *evlist __maybe_unused)
+static int write_pmu_caps(struct feat_fd *ff,
+			  struct evlist *evlist __maybe_unused)
 {
-	struct perf_pmu *pmu;
-	u32 nr_pmu = perf_pmu__hybrid_pmu_num();
+	struct perf_pmu *pmu = NULL;
+	int nr_pmu = 0;
 	int ret;
 
-	if (nr_pmu == 0)
-		return -ENOENT;
+	while ((pmu = perf_pmu__scan(pmu))) {
+		if (!pmu->name || !strcmp(pmu->name, "cpu") ||
+		    perf_pmu__caps_parse(pmu) <= 0)
+			continue;
+		nr_pmu++;
+	}
 
 	ret = do_write(ff, &nr_pmu, sizeof(nr_pmu));
 	if (ret < 0)
 		return ret;
 
+	if (!nr_pmu)
+		return 0;
+
+	/*
+	 * Write hybrid pmu caps first to maintain compatibility with
+	 * older perf tool.
+	 */
+	pmu = NULL;
 	perf_pmu__for_each_hybrid_pmu(pmu) {
-		ret = write_per_cpu_pmu_caps(ff, pmu, true);
+		ret = __write_pmu_caps(ff, pmu, true);
 		if (ret < 0)
 			return ret;
 	}
 
+	pmu = NULL;
+	while ((pmu = perf_pmu__scan(pmu))) {
+		if (!pmu->name || !strcmp(pmu->name, "cpu") ||
+		    !pmu->nr_caps || perf_pmu__is_hybrid(pmu->name))
+			continue;
+
+		ret = __write_pmu_caps(ff, pmu, true);
+		if (ret < 0)
+			return ret;
+	}
 	return 0;
 }
 
@@ -2051,8 +2073,7 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
 		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
 }
 
-static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
-				   char *pmu_name)
+static void __print_pmu_caps(FILE *fp, int nr_caps, char **caps, char *pmu_name)
 {
 	const char *delimiter = "";
 	int i;
@@ -2064,7 +2085,7 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
 
 	fprintf(fp, "# %s pmu capabilities: ", pmu_name);
 	for (i = 0; i < nr_caps; i++) {
-		fprintf(fp, "%s%s", delimiter, cpu_pmu_caps[i]);
+		fprintf(fp, "%s%s", delimiter, caps[i]);
 		delimiter = ", ";
 	}
 
@@ -2073,19 +2094,18 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
 
 static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
 {
-	print_per_cpu_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps,
-			       ff->ph->env.cpu_pmu_caps, (char *)"cpu");
+	__print_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps,
+			 ff->ph->env.cpu_pmu_caps, (char *)"cpu");
 }
 
-static void print_hybrid_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
+static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
 {
-	struct hybrid_cpc_node *n;
+	struct pmu_caps *pmu_caps;
 
-	for (int i = 0; i < ff->ph->env.nr_hybrid_cpc_nodes; i++) {
-		n = &ff->ph->env.hybrid_cpc_nodes[i];
-		print_per_cpu_pmu_caps(fp, n->nr_cpu_pmu_caps,
-				       n->cpu_pmu_caps,
-				       n->pmu_name);
+	for (int i = 0; i < ff->ph->env.nr_pmus_with_caps; i++) {
+		pmu_caps = &ff->ph->env.pmu_caps[i];
+		__print_pmu_caps(fp, pmu_caps->nr_caps, pmu_caps->caps,
+				 pmu_caps->pmu_name);
 	}
 }
 
@@ -3196,28 +3216,27 @@ static int process_compressed(struct feat_fd *ff,
 	return 0;
 }
 
-static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
-				    char ***cpu_pmu_caps,
-				    unsigned int *max_branches)
+static int __process_pmu_caps(struct feat_fd *ff, int *nr_caps,
+			      char ***caps, unsigned int *max_branches)
 {
 	int name_size, value_size;
 	char *name, *value, *ptr;
-	u32 nr_caps, i;
+	u32 nr_pmu_caps, i;
 
-	*nr_cpu_pmu_caps = 0;
-	*cpu_pmu_caps = NULL;
+	*nr_caps = 0;
+	*caps = NULL;
 
-	if (do_read_u32(ff, &nr_caps))
+	if (do_read_u32(ff, &nr_pmu_caps))
 		return -1;
 
-	if (!nr_caps)
+	if (!nr_pmu_caps)
 		return 0;
 
-	*cpu_pmu_caps = zalloc(sizeof(char *) * nr_caps);
-	if (!*cpu_pmu_caps)
+	*caps = zalloc(sizeof(char *) * nr_pmu_caps);
+	if (!*caps)
 		return -1;
 
-	for (i = 0; i < nr_caps; i++) {
+	for (i = 0; i < nr_pmu_caps; i++) {
 		name = do_read_string(ff);
 		if (!name)
 			goto error;
@@ -3235,7 +3254,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
 		memcpy(ptr, name, name_size);
 		ptr[name_size] = '=';
 		memcpy(ptr + name_size + 1, value, value_size);
-		(*cpu_pmu_caps)[i] = ptr;
+		(*caps)[i] = ptr;
 
 		if (!strcmp(name, "branches"))
 			*max_branches = atoi(value);
@@ -3243,7 +3262,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
 		free(value);
 		free(name);
 	}
-	*nr_cpu_pmu_caps = nr_caps;
+	*nr_caps = nr_pmu_caps;
 	return 0;
 
 free_value:
@@ -3252,29 +3271,28 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
 	free(name);
 error:
 	for (; i > 0; i--)
-		free((*cpu_pmu_caps)[i - 1]);
-	free(*cpu_pmu_caps);
-	*cpu_pmu_caps = NULL;
-	*nr_cpu_pmu_caps = 0;
+		free((*caps)[i - 1]);
+	free(*caps);
+	*caps = NULL;
+	*nr_caps = 0;
 	return -1;
 }
 
 static int process_cpu_pmu_caps(struct feat_fd *ff,
 				void *data __maybe_unused)
 {
-	int ret = process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
-					&ff->ph->env.cpu_pmu_caps,
-					&ff->ph->env.max_branches);
+	int ret = __process_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
+				     &ff->ph->env.cpu_pmu_caps,
+				     &ff->ph->env.max_branches);
 
 	if (!ret && !ff->ph->env.cpu_pmu_caps)
 		pr_debug("cpu pmu capabilities not available\n");
 	return ret;
 }
 
-static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
-				       void *data __maybe_unused)
+static int process_pmu_caps(struct feat_fd *ff, void *data __maybe_unused)
 {
-	struct hybrid_cpc_node *nodes;
+	struct pmu_caps *pmu_caps;
 	u32 nr_pmu, i;
 	int ret;
 	int j;
@@ -3283,45 +3301,45 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
 		return -1;
 
 	if (!nr_pmu) {
-		pr_debug("hybrid cpu pmu capabilities not available\n");
+		pr_debug("pmu capabilities not available\n");
 		return 0;
 	}
 
-	nodes = zalloc(sizeof(*nodes) * nr_pmu);
-	if (!nodes)
+	pmu_caps = zalloc(sizeof(*pmu_caps) * nr_pmu);
+	if (!pmu_caps)
 		return -ENOMEM;
 
 	for (i = 0; i < nr_pmu; i++) {
-		struct hybrid_cpc_node *n = &nodes[i];
-
-		ret = process_per_cpu_pmu_caps(ff, &n->nr_cpu_pmu_caps,
-					       &n->cpu_pmu_caps,
-					       &n->max_branches);
+		ret = __process_pmu_caps(ff, &pmu_caps[i].nr_caps,
+					 &pmu_caps[i].caps,
+					 &pmu_caps[i].max_branches);
 		if (ret)
 			goto err;
 
-		n->pmu_name = do_read_string(ff);
-		if (!n->pmu_name) {
+		pmu_caps[i].pmu_name = do_read_string(ff);
+		if (!pmu_caps[i].pmu_name) {
 			ret = -1;
 			goto err;
 		}
-		if (!n->nr_cpu_pmu_caps)
-			pr_debug("%s pmu capabilities not available\n", n->pmu_name);
+		if (!pmu_caps[i].nr_caps) {
+			pr_debug("%s pmu capabilities not available\n",
+				 pmu_caps[i].pmu_name);
+		}
 	}
 
-	ff->ph->env.nr_hybrid_cpc_nodes = nr_pmu;
-	ff->ph->env.hybrid_cpc_nodes = nodes;
+	ff->ph->env.nr_pmus_with_caps = nr_pmu;
+	ff->ph->env.pmu_caps = pmu_caps;
 	return 0;
 
 err:
 	for (i = 0; i < nr_pmu; i++) {
-		for (j = 0; j < nodes[i].nr_cpu_pmu_caps; j++)
-			free(nodes[i].cpu_pmu_caps[j]);
-		free(nodes[i].cpu_pmu_caps);
-		free(nodes[i].pmu_name);
+		for (j = 0; j < pmu_caps[i].nr_caps; j++)
+			free(pmu_caps[i].caps[j]);
+		free(pmu_caps[i].caps);
+		free(pmu_caps[i].pmu_name);
 	}
 
-	free(nodes);
+	free(pmu_caps);
 	return ret;
 }
 
@@ -3387,7 +3405,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPR(CPU_PMU_CAPS,	cpu_pmu_caps,	false),
 	FEAT_OPR(CLOCK_DATA,	clock_data,	false),
 	FEAT_OPN(HYBRID_TOPOLOGY,	hybrid_topology,	true),
-	FEAT_OPR(HYBRID_CPU_PMU_CAPS,	hybrid_cpu_pmu_caps,	false),
+	FEAT_OPR(PMU_CAPS,	pmu_caps,	false),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 08563c1f1bff..5790bf556b90 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -46,7 +46,7 @@ enum {
 	HEADER_CPU_PMU_CAPS,
 	HEADER_CLOCK_DATA,
 	HEADER_HYBRID_TOPOLOGY,
-	HEADER_HYBRID_CPU_PMU_CAPS,
+	HEADER_PMU_CAPS,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
-- 
2.31.1



* [PATCH v5 6/8] perf/x86/ibs: Add new IBS register bits into header
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
                   ` (4 preceding siblings ...)
  2022-06-01  3:26 ` [PATCH v5 5/8] perf headers: Record non-cpu pmu capabilities Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-02 21:48   ` Namhyung Kim
  2022-06-01  3:26 ` [PATCH v5 7/8] perf tool ibs: Sync amd ibs header file Ravi Bangoria
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

IBS support has been enhanced with two new features in the upcoming
uarch: 1. the DataSrc extension and 2. L3 miss filtering. An additional
set of bits has been introduced in the IBS registers to exploit these
features. Define these new bits in the arch/x86/ header.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 arch/x86/include/asm/amd-ibs.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index aabdbb5ab920..f3eb098d63d4 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -29,7 +29,10 @@ union ibs_fetch_ctl {
 			rand_en:1,	/* 57: random tagging enable */
 			fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
 					 *      (needs IbsFetchComp) */
-			reserved:5;	/* 59-63: reserved */
+			l3_miss_only:1,	/* 59: Collect L3 miss samples only */
+			fetch_oc_miss:1,/* 60: Op cache miss for the sampled fetch */
+			fetch_l3_miss:1,/* 61: L3 cache miss for the sampled fetch */
+			reserved:2;	/* 62-63: reserved */
 	};
 };
 
@@ -38,14 +41,14 @@ union ibs_op_ctl {
 	__u64 val;
 	struct {
 		__u64	opmaxcnt:16,	/* 0-15: periodic op max. count */
-			reserved0:1,	/* 16: reserved */
+			l3_miss_only:1,	/* 16: Collect L3 miss samples only */
 			op_en:1,	/* 17: op sampling enable */
 			op_val:1,	/* 18: op sample valid */
 			cnt_ctl:1,	/* 19: periodic op counter control */
 			opmaxcnt_ext:7,	/* 20-26: upper 7 bits of periodic op maximum count */
-			reserved1:5,	/* 27-31: reserved */
+			reserved0:5,	/* 27-31: reserved */
 			opcurcnt:27,	/* 32-58: periodic op counter current count */
-			reserved2:5;	/* 59-63: reserved */
+			reserved1:5;	/* 59-63: reserved */
 	};
 };
 
@@ -71,11 +74,12 @@ union ibs_op_data {
 union ibs_op_data2 {
 	__u64 val;
 	struct {
-		__u64	data_src:3,	/* 0-2: data source */
+		__u64	data_src_lo:3,	/* 0-2: data source low */
 			reserved0:1,	/* 3: reserved */
 			rmt_node:1,	/* 4: destination node */
 			cache_hit_st:1,	/* 5: cache hit state */
-			reserved1:57;	/* 5-63: reserved */
+			data_src_hi:2,	/* 6-7: data source high */
+			reserved1:56;	/* 8-63: reserved */
 	};
 };
 
-- 
2.31.1



* [PATCH v5 7/8] perf tool ibs: Sync amd ibs header file
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
                   ` (5 preceding siblings ...)
  2022-06-01  3:26 ` [PATCH v5 6/8] perf/x86/ibs: Add new IBS register bits into header Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-01  3:26 ` [PATCH v5 8/8] perf script ibs: Support new IBS bits in raw trace dump Ravi Bangoria
  2022-06-01 14:04 ` [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Liang, Kan
  8 siblings, 0 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

IBS support has been enhanced with two new features in the upcoming
uarch: 1. the DataSrc extension and 2. L3 miss filtering. An additional
set of bits has been introduced in the IBS registers to exploit these
features. The new bits are already defined in the arch/x86/ header;
sync it with the tools header file. Also rename the existing
ibs_op_data2 field 'data_src' to 'data_src_lo'.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/arch/x86/include/asm/amd-ibs.h | 16 ++++++++++------
 tools/perf/util/amd-sample-raw.c     |  4 ++--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 765e9e752d03..9a3312e12e2e 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -29,7 +29,10 @@ union ibs_fetch_ctl {
 			rand_en:1,	/* 57: random tagging enable */
 			fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
 					 *      (needs IbsFetchComp) */
-			reserved:5;	/* 59-63: reserved */
+			l3_miss_only:1,	/* 59: Collect L3 miss samples only */
+			fetch_oc_miss:1,/* 60: Op cache miss for the sampled fetch */
+			fetch_l3_miss:1,/* 61: L3 cache miss for the sampled fetch */
+			reserved:2;	/* 62-63: reserved */
 	};
 };
 
@@ -38,14 +41,14 @@ union ibs_op_ctl {
 	__u64 val;
 	struct {
 		__u64	opmaxcnt:16,	/* 0-15: periodic op max. count */
-			reserved0:1,	/* 16: reserved */
+			l3_miss_only:1,	/* 16: Collect L3 miss samples only */
 			op_en:1,	/* 17: op sampling enable */
 			op_val:1,	/* 18: op sample valid */
 			cnt_ctl:1,	/* 19: periodic op counter control */
 			opmaxcnt_ext:7,	/* 20-26: upper 7 bits of periodic op maximum count */
-			reserved1:5,	/* 27-31: reserved */
+			reserved0:5,	/* 27-31: reserved */
 			opcurcnt:27,	/* 32-58: periodic op counter current count */
-			reserved2:5;	/* 59-63: reserved */
+			reserved1:5;	/* 59-63: reserved */
 	};
 };
 
@@ -71,11 +74,12 @@ union ibs_op_data {
 union ibs_op_data2 {
 	__u64 val;
 	struct {
-		__u64	data_src:3,	/* 0-2: data source */
+		__u64	data_src_lo:3,	/* 0-2: data source low */
 			reserved0:1,	/* 3: reserved */
 			rmt_node:1,	/* 4: destination node */
 			cache_hit_st:1,	/* 5: cache hit state */
-			reserved1:57;	/* 5-63: reserved */
+			data_src_hi:2,	/* 6-7: data source high */
+			reserved1:56;	/* 8-63: reserved */
 	};
 };
 
diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index d19d765195c5..3b623ea6ee7e 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -98,9 +98,9 @@ static void pr_ibs_op_data2(union ibs_op_data2 reg)
 	};
 
 	printf("ibs_op_data2:\t%016llx %sRmtNode %d%s\n", reg.val,
-	       reg.data_src == 2 ? (reg.cache_hit_st ? "CacheHitSt 1=O-State "
+	       reg.data_src_lo == 2 ? (reg.cache_hit_st ? "CacheHitSt 1=O-State "
 						     : "CacheHitSt 0=M-state ") : "",
-	       reg.rmt_node, data_src_str[reg.data_src]);
+	       reg.rmt_node, data_src_str[reg.data_src_lo]);
 }
 
 static void pr_ibs_op_data3(union ibs_op_data3 reg)
-- 
2.31.1



* [PATCH v5 8/8] perf script ibs: Support new IBS bits in raw trace dump
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
                   ` (6 preceding siblings ...)
  2022-06-01  3:26 ` [PATCH v5 7/8] perf tool ibs: Sync amd ibs header file Ravi Bangoria
@ 2022-06-01  3:26 ` Ravi Bangoria
  2022-06-01 14:04 ` [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Liang, Kan
  8 siblings, 0 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01  3:26 UTC (permalink / raw)
  To: acme, kan.liang
  Cc: ravi.bangoria, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, namhyung, tglx, bp, james.clark, leo.yan, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

Interpret the additional set of IBS register bits while doing a
perf report/script raw dump.

IBS op pmu ex:
  $ sudo ./perf record -c 130 -a -e ibs_op/l3missonly=1/ --raw-samples
  $ sudo ./perf report -D
  ...
  ibs_op_ctl:     0000004500070008 MaxCnt       128 L3MissOnly 1 En 1
        Val 1 CntCtl 0=cycles CurCnt        69
  ibs_op_data:    0000000000710002 CompToRetCtr     2 TagToRetCtr   113
        BrnRet 0  RipInvalid 0 BrnFuse 0 Microcode 0
  ibs_op_data2:   0000000000000002 CacheHitSt 0=M-state RmtNode 0
        DataSrc 2=A peer cache in a near CCX
  ibs_op_data3:   000000681d1700a1 LdOp 1 StOp 0 DcL1TlbMiss 0
        DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 1 DcL2TlbHit2M 0
        DcMiss 1 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0
        DcMissNoMabAlloc 1 DcLinAddrValid 1 DcPhyAddrValid 1
        DcL2TlbHit1G 0 L2Miss 1 SwPf 0 OpMemWidth  8 bytes
        OpDcMissOpenMemReqs  7 DcMissLat   104 TlbRefillLat     0

IBS Fetch pmu ex:
  $ sudo ./perf record -c 130 -a -e ibs_fetch/l3missonly=1/ --raw-samples
  $ sudo ./perf report -D
  ...
  ibs_fetch_ctl:  3c1f00c700080008 MaxCnt     128 Cnt     128 Lat   199
        En 1 Val 1 Comp 1 IcMiss 1 PhyAddrValid        1 L1TlbPgSz 4KB
        L1TlbMiss 0 L2TlbMiss 0 RandEn 0 L2Miss 1 L3MissOnly 1
        FetchOcMiss 1 FetchL3Miss 1

With the DataSrc extensions, the source of data can be decoded as one of:
 - Local L3 or other L1/L2 in CCX.
 - A peer cache in a near CCX.
 - Data returned from DRAM.
 - A peer cache in a far CCX.
 - DRAM address map with "long latency" bit set.
 - Data returned from MMIO/Config/PCI/APIC.
 - Extension Memory (S-Link, GenZ, etc - identified by the CS target
    and/or address map at DF's choice).
 - Peer Agent Memory.
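
The raw 5-bit DataSrc value is assembled from the two register fields,
as done in pr_ibs_op_data2_extended() in the patch below:

  int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;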

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/amd-sample-raw.c | 64 +++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index 3b623ea6ee7e..238305868644 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -18,6 +18,7 @@
 #include "pmu-events/pmu-events.h"
 
 static u32 cpu_family, cpu_model, ibs_fetch_type, ibs_op_type;
+static bool zen4_ibs_extensions;
 
 static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
 {
@@ -39,6 +40,7 @@ static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
 	};
 	const char *ic_miss_str = NULL;
 	const char *l1tlb_pgsz_str = NULL;
+	char l3_miss_str[sizeof(" L3MissOnly _ FetchOcMiss _ FetchL3Miss _")] = "";
 
 	if (cpu_family == 0x19 && cpu_model < 0x10) {
 		/*
@@ -53,12 +55,19 @@ static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
 		ic_miss_str = ic_miss_strs[reg.ic_miss];
 	}
 
+	if (zen4_ibs_extensions) {
+		snprintf(l3_miss_str, sizeof(l3_miss_str),
+			 " L3MissOnly %d FetchOcMiss %d FetchL3Miss %d",
+			 reg.l3_miss_only, reg.fetch_oc_miss, reg.fetch_l3_miss);
+	}
+
 	printf("ibs_fetch_ctl:\t%016llx MaxCnt %7d Cnt %7d Lat %5d En %d Val %d Comp %d%s "
-	       "PhyAddrValid %d%s L1TlbMiss %d L2TlbMiss %d RandEn %d%s\n",
+		"PhyAddrValid %d%s L1TlbMiss %d L2TlbMiss %d RandEn %d%s%s\n",
 		reg.val, reg.fetch_maxcnt << 4, reg.fetch_cnt << 4, reg.fetch_lat,
 		reg.fetch_en, reg.fetch_val, reg.fetch_comp, ic_miss_str ? : "",
 		reg.phy_addr_valid, l1tlb_pgsz_str ? : "", reg.l1tlb_miss, reg.l2tlb_miss,
-		reg.rand_en, reg.fetch_comp ? (reg.fetch_l2_miss ? " L2Miss 1" : " L2Miss 0") : "");
+		reg.rand_en, reg.fetch_comp ? (reg.fetch_l2_miss ? " L2Miss 1" : " L2Miss 0") : "",
+		l3_miss_str);
 }
 
 static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
@@ -68,9 +77,15 @@ static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
 
 static void pr_ibs_op_ctl(union ibs_op_ctl reg)
 {
-	printf("ibs_op_ctl:\t%016llx MaxCnt %9d En %d Val %d CntCtl %d=%s CurCnt %9d\n",
-	       reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, reg.op_en, reg.op_val,
-	       reg.cnt_ctl, reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
+	char l3_miss_only[sizeof(" L3MissOnly _")] = "";
+
+	if (zen4_ibs_extensions)
+		snprintf(l3_miss_only, sizeof(l3_miss_only), " L3MissOnly %d", reg.l3_miss_only);
+
+	printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d\n",
+		reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, l3_miss_only,
+		reg.op_en, reg.op_val, reg.cnt_ctl,
+		reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
 }
 
 static void pr_ibs_op_data(union ibs_op_data reg)
@@ -84,7 +99,34 @@ static void pr_ibs_op_data(union ibs_op_data reg)
 		reg.op_brn_ret, reg.op_rip_invalid, reg.op_brn_fuse, reg.op_microcode);
 }
 
-static void pr_ibs_op_data2(union ibs_op_data2 reg)
+static void pr_ibs_op_data2_extended(union ibs_op_data2 reg)
+{
+	static const char * const data_src_str[] = {
+		"",
+		" DataSrc 1=Local L3 or other L1/L2 in CCX",
+		" DataSrc 2=A peer cache in a near CCX",
+		" DataSrc 3=Data returned from DRAM",
+		" DataSrc 4=(reserved)",
+		" DataSrc 5=A peer cache in a far CCX",
+		" DataSrc 6=DRAM address map with \"long latency\" bit set",
+		" DataSrc 7=Data returned from MMIO/Config/PCI/APIC",
+		" DataSrc 8=Extension Memory (S-Link, GenZ, etc)",
+		" DataSrc 9=(reserved)",
+		" DataSrc 10=(reserved)",
+		" DataSrc 11=(reserved)",
+		" DataSrc 12=Peer Agent Memory",
+		/* 13 to 31 are reserved. Avoid printing them. */
+	};
+	int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;
+
+	printf("ibs_op_data2:\t%016llx %sRmtNode %d%s\n", reg.val,
+		(data_src == 1 || data_src == 2 || data_src == 5) ?
+			(reg.cache_hit_st ? "CacheHitSt 1=O-State " : "CacheHitSt 0=M-state ") : "",
+		reg.rmt_node,
+		data_src < (int)ARRAY_SIZE(data_src_str) ? data_src_str[data_src] : "");
+}
+
+static void pr_ibs_op_data2_default(union ibs_op_data2 reg)
 {
 	static const char * const data_src_str[] = {
 		"",
@@ -103,6 +145,13 @@ static void pr_ibs_op_data2(union ibs_op_data2 reg)
 	       reg.rmt_node, data_src_str[reg.data_src_lo]);
 }
 
+static void pr_ibs_op_data2(union ibs_op_data2 reg)
+{
+	if (zen4_ibs_extensions)
+		return pr_ibs_op_data2_extended(reg);
+	pr_ibs_op_data2_default(reg);
+}
+
 static void pr_ibs_op_data3(union ibs_op_data3 reg)
 {
 	char l2_miss_str[sizeof(" L2Miss _")] = "";
@@ -279,6 +328,9 @@ bool evlist__has_amd_ibs(struct evlist *evlist)
 		pmu_mapping += strlen(pmu_mapping) + 1 /* '\0' */;
 	}
 
+	if (perf_env__find_pmu_cap(env, "ibs_op", "zen4_ibs_extensions"))
+		zen4_ibs_extensions = 1;
+
 	if (ibs_fetch_type || ibs_op_type) {
 		if (!cpu_family)
 			parse_cpuid(env);
-- 
2.31.1



* Re: [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once
  2022-06-01  3:26 ` [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once Ravi Bangoria
@ 2022-06-01 13:35   ` Liang, Kan
  2022-06-01 13:51     ` Ravi Bangoria
  0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 13:35 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla



On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
> In addition to returning nr_caps, cache it locally in struct perf_pmu.
> Similarly, cache status of whether caps sysfs has already been parsed
> or not. These will help to avoid parsing sysfs every time the function
> gets called.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>   tools/perf/util/pmu.c | 15 +++++++++++----
>   tools/perf/util/pmu.h |  2 ++
>   2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 9a1c7e63e663..0112e1c36418 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1890,7 +1890,11 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
>   	const char *sysfs = sysfs__mountpoint();
>   	DIR *caps_dir;
>   	struct dirent *evt_ent;
> -	int nr_caps = 0;
> +
> +	if (pmu->caps_initialized)
> +		return pmu->nr_caps;
> +
> +	pmu->nr_caps = 0;
>   
>   	if (!sysfs)
>   		return -1;
> @@ -1898,8 +1902,10 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
>   	snprintf(caps_path, PATH_MAX,
>   		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
>   
> -	if (stat(caps_path, &st) < 0)
> +	if (stat(caps_path, &st) < 0) {
> +		pmu->caps_initialized = true;
>   		return 0;	/* no error if caps does not exist */
> +	}
>   
>   	caps_dir = opendir(caps_path);
>   	if (!caps_dir)
> @@ -1926,13 +1932,14 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
>   			continue;
>   		}
>   
> -		nr_caps++;
> +		pmu->nr_caps++;
>   		fclose(file);
>   	}
>   
>   	closedir(caps_dir);
>   
> -	return nr_caps;
> +	pmu->caps_initialized = true;
> +	return pmu->nr_caps;
>   }
>   
>   void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config,
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 541889fa9f9c..4b45fd8da5a3 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -46,6 +46,8 @@ struct perf_pmu {
>   	struct perf_cpu_map *cpus;
>   	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
>   	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
> +	bool caps_initialized;
> +	u32 nr_caps;

If they are just used for caching purposes, I don't think we need to
add the variables to struct perf_pmu.

A static variable should be good enough. See sysctl__nmi_watchdog_enabled().
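
The pattern there is roughly this (sketch from memory, not the exact
code):

  bool sysctl__nmi_watchdog_enabled(void)
  {
          static bool initialized;
          static bool nmi_watchdog;
          int value;

          /* Parse the sysctl only on the first call */
          if (initialized)
                  return nmi_watchdog;

          if (sysctl__read_int("kernel/nmi_watchdog", &value) < 0)
                  return false;

          nmi_watchdog = value > 0;
          initialized = true;
          return nmi_watchdog;
  }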

Thanks,
Kan

>   	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
>   	struct list_head list;    /* ELEM */
>   	struct list_head hybrid_list;


* Re: [PATCH v5 3/8] perf headers: Pass "cpu" pmu name while printing caps
  2022-06-01  3:26 ` [PATCH v5 3/8] perf headers: Pass "cpu" pmu name while printing caps Ravi Bangoria
@ 2022-06-01 13:35   ` Liang, Kan
  0 siblings, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 13:35 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla



On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
> Avoid unnecessary conditional code to check if pmu name is NULL
> or not by passing "cpu" pmu name to the printing function.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan

> ---
>   tools/perf/util/header.c | 12 +++---------
>   1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 53332da100e8..ee7ccd94e272 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2058,17 +2058,11 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char *cpu_pmu_caps,
>   	char *str, buf[128];
>   
>   	if (!nr_caps) {
> -		if (!pmu_name)
> -			fprintf(fp, "# cpu pmu capabilities: not available\n");
> -		else
> -			fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name);
> +		fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name);
>   		return;
>   	}
>   
> -	if (!pmu_name)
> -		scnprintf(buf, sizeof(buf), "# cpu pmu capabilities: ");
> -	else
> -		scnprintf(buf, sizeof(buf), "# %s pmu capabilities: ", pmu_name);
> +	scnprintf(buf, sizeof(buf), "# %s pmu capabilities: ", pmu_name);
>   
>   	delimiter = buf;
>   
> @@ -2085,7 +2079,7 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char *cpu_pmu_caps,
>   static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
>   {
>   	print_per_cpu_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps,
> -			       ff->ph->env.cpu_pmu_caps, NULL);
> +			       ff->ph->env.cpu_pmu_caps, (char *)"cpu");
>   }
>   
>   static void print_hybrid_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings
  2022-06-01  3:26 ` [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings Ravi Bangoria
@ 2022-06-01 13:36   ` Liang, Kan
  2022-06-02 21:37   ` Namhyung Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 13:36 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla



On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
> Currently all capabilities are stored in a single string separated
> by a NULL character. Instead, store them in an array, which makes
> searching for a capability easier.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan
> ---
>   tools/perf/util/env.c    |  6 +++-
>   tools/perf/util/env.h    |  4 +--
>   tools/perf/util/header.c | 70 +++++++++++++++++++++++-----------------
>   3 files changed, 48 insertions(+), 32 deletions(-)
> 
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 579e44c59914..7d3aeb2e4622 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -179,7 +179,7 @@ static void perf_env__purge_bpf(struct perf_env *env __maybe_unused)
>   
>   void perf_env__exit(struct perf_env *env)
>   {
> -	int i;
> +	int i, j;
>   
>   	perf_env__purge_bpf(env);
>   	perf_env__purge_cgroups(env);
> @@ -196,6 +196,8 @@ void perf_env__exit(struct perf_env *env)
>   	zfree(&env->sibling_threads);
>   	zfree(&env->pmu_mappings);
>   	zfree(&env->cpu);
> +	for (i = 0; i < env->nr_cpu_pmu_caps; i++)
> +		zfree(&env->cpu_pmu_caps[i]);
>   	zfree(&env->cpu_pmu_caps);
>   	zfree(&env->numa_map);
>   
> @@ -218,6 +220,8 @@ void perf_env__exit(struct perf_env *env)
>   	zfree(&env->hybrid_nodes);
>   
>   	for (i = 0; i < env->nr_hybrid_cpc_nodes; i++) {
> +		for (j = 0; j < env->hybrid_cpc_nodes[i].nr_cpu_pmu_caps; j++)
> +			zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps[j]);
>   		zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps);
>   		zfree(&env->hybrid_cpc_nodes[i].pmu_name);
>   	}
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index a3541f98e1fc..43aab59f7322 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -46,7 +46,7 @@ struct hybrid_node {
>   struct hybrid_cpc_node {
>   	int		nr_cpu_pmu_caps;
>   	unsigned int    max_branches;
> -	char            *cpu_pmu_caps;
> +	char            **cpu_pmu_caps;
>   	char            *pmu_name;
>   };
>   
> @@ -81,7 +81,7 @@ struct perf_env {
>   	char			*sibling_dies;
>   	char			*sibling_threads;
>   	char			*pmu_mappings;
> -	char			*cpu_pmu_caps;
> +	char			**cpu_pmu_caps;
>   	struct cpu_topology_map	*cpu;
>   	struct cpu_cache_level	*caches;
>   	int			 caches_cnt;
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index ee7ccd94e272..a1e4ec53333d 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2051,26 +2051,21 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
>   		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
>   }
>   
> -static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char *cpu_pmu_caps,
> +static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
>   				   char *pmu_name)
>   {
> -	const char *delimiter;
> -	char *str, buf[128];
> +	const char *delimiter = "";
> +	int i;
>   
>   	if (!nr_caps) {
>   		fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name);
>   		return;
>   	}
>   
> -	scnprintf(buf, sizeof(buf), "# %s pmu capabilities: ", pmu_name);
> -
> -	delimiter = buf;
> -
> -	str = cpu_pmu_caps;
> -	while (nr_caps--) {
> -		fprintf(fp, "%s%s", delimiter, str);
> +	fprintf(fp, "# %s pmu capabilities: ", pmu_name);
> +	for (i = 0; i < nr_caps; i++) {
> +		fprintf(fp, "%s%s", delimiter, cpu_pmu_caps[i]);
>   		delimiter = ", ";
> -		str += strlen(str) + 1;
>   	}
>   
>   	fprintf(fp, "\n");
> @@ -3202,27 +3197,27 @@ static int process_compressed(struct feat_fd *ff,
>   }
>   
>   static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
> -				    char **cpu_pmu_caps,
> +				    char ***cpu_pmu_caps,
>   				    unsigned int *max_branches)
>   {
> -	char *name, *value;
> -	struct strbuf sb;
> -	u32 nr_caps;
> +	int name_size, value_size;
> +	char *name, *value, *ptr;
> +	u32 nr_caps, i;
> +
> +	*nr_cpu_pmu_caps = 0;
> +	*cpu_pmu_caps = NULL;
>   
>   	if (do_read_u32(ff, &nr_caps))
>   		return -1;
>   
> -	if (!nr_caps) {
> -		pr_debug("cpu pmu capabilities not available\n");
> +	if (!nr_caps)
>   		return 0;
> -	}
> -
> -	*nr_cpu_pmu_caps = nr_caps;
>   
> -	if (strbuf_init(&sb, 128) < 0)
> +	*cpu_pmu_caps = zalloc(sizeof(char *) * nr_caps);
> +	if (!*cpu_pmu_caps)
>   		return -1;
>   
> -	while (nr_caps--) {
> +	for (i = 0; i < nr_caps; i++) {
>   		name = do_read_string(ff);
>   		if (!name)
>   			goto error;
> @@ -3231,12 +3226,16 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>   		if (!value)
>   			goto free_name;
>   
> -		if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
> +		name_size = strlen(name);
> +		value_size = strlen(value);
> +		ptr = zalloc(sizeof(char) * (name_size + value_size + 2));
> +		if (!ptr)
>   			goto free_value;
>   
> -		/* include a NULL character at the end */
> -		if (strbuf_add(&sb, "", 1) < 0)
> -			goto free_value;
> +		memcpy(ptr, name, name_size);
> +		ptr[name_size] = '=';
> +		memcpy(ptr + name_size + 1, value, value_size);
> +		(*cpu_pmu_caps)[i] = ptr;
>   
>   		if (!strcmp(name, "branches"))
>   			*max_branches = atoi(value);
> @@ -3244,7 +3243,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>   		free(value);
>   		free(name);
>   	}
> -	*cpu_pmu_caps = strbuf_detach(&sb, NULL);
> +	*nr_cpu_pmu_caps = nr_caps;
>   	return 0;
>   
>   free_value:
> @@ -3252,16 +3251,24 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>   free_name:
>   	free(name);
>   error:
> -	strbuf_release(&sb);
> +	for (; i > 0; i--)
> +		free((*cpu_pmu_caps)[i - 1]);
> +	free(*cpu_pmu_caps);
> +	*cpu_pmu_caps = NULL;
> +	*nr_cpu_pmu_caps = 0;
>   	return -1;
>   }
>   
>   static int process_cpu_pmu_caps(struct feat_fd *ff,
>   				void *data __maybe_unused)
>   {
> -	return process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
> +	int ret = process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
>   					&ff->ph->env.cpu_pmu_caps,
>   					&ff->ph->env.max_branches);
> +
> +	if (!ret && !ff->ph->env.cpu_pmu_caps)
> +		pr_debug("cpu pmu capabilities not available\n");
> +	return ret;
>   }
>   
>   static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
> @@ -3270,6 +3277,7 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>   	struct hybrid_cpc_node *nodes;
>   	u32 nr_pmu, i;
>   	int ret;
> +	int j;
>   
>   	if (do_read_u32(ff, &nr_pmu))
>   		return -1;
> @@ -3297,6 +3305,8 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>   			ret = -1;
>   			goto err;
>   		}
> +		if (!n->nr_cpu_pmu_caps)
> +			pr_debug("%s pmu capabilities not available\n", n->pmu_name);
>   	}
>   
>   	ff->ph->env.nr_hybrid_cpc_nodes = nr_pmu;
> @@ -3305,6 +3315,8 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>   
>   err:
>   	for (i = 0; i < nr_pmu; i++) {
> +		for (j = 0; j < nodes[i].nr_cpu_pmu_caps; j++)
> +			free(nodes[i].cpu_pmu_caps[j]);
>   		free(nodes[i].cpu_pmu_caps);
>   		free(nodes[i].pmu_name);
>   	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 5/8] perf headers: Record non-cpu pmu capabilities
  2022-06-01  3:26 ` [PATCH v5 5/8] perf headers: Record non-cpu pmu capabilities Ravi Bangoria
@ 2022-06-01 13:37   ` Liang, Kan
  2022-06-01 13:58     ` Liang, Kan
  0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 13:37 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla



On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
> Pmus advertise their capabilities via sysfs attribute files but
> the perf tool currently parses only core (cpu) or hybrid core pmu
> capabilities. Add support for recording non-core pmu capabilities
> in the perf.data header.
> 
> Note that the newly proposed HEADER_PMU_CAPS replaces the existing
> HEADER_HYBRID_CPU_PMU_CAPS. Special care is taken for hybrid core
> pmus by writing their capabilities first in the perf.data header to
> make sure a new perf.data file being read by an old perf tool does
> not break.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>   .../Documentation/perf.data-file-format.txt   |  10 +-
>   tools/perf/builtin-inject.c                   |   2 +-
>   tools/perf/util/env.c                         |  60 ++++++-
>   tools/perf/util/env.h                         |  12 +-
>   tools/perf/util/header.c                      | 160 ++++++++++--------
>   tools/perf/util/header.h                      |   2 +-
>   6 files changed, 158 insertions(+), 88 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
> index f56d0e0fbff6..2233534e508a 100644
> --- a/tools/perf/Documentation/perf.data-file-format.txt
> +++ b/tools/perf/Documentation/perf.data-file-format.txt
> @@ -419,18 +419,20 @@ Example:
>     cpu_core cpu list : 0-15
>     cpu_atom cpu list : 16-23
>   
> -	HEADER_HYBRID_CPU_PMU_CAPS = 31,
> +	HEADER_PMU_CAPS = 31,
>   
> -	A list of hybrid CPU PMU capabilities.
> +	List of pmu capabilities (except cpu pmu which is already
> +	covered by HEADER_CPU_PMU_CAPS). Note that hybrid cpu pmu
> +	capabilities are also stored here.
>   
>   struct {
>   	u32 nr_pmu;
>   	struct {
> -		u32 nr_cpu_pmu_caps;
> +		u32 nr_caps;
>   		{
>   			char	name[];
>   			char	value[];
> -		} [nr_cpu_pmu_caps];
> +		} [nr_caps];
>   		char pmu_name[];
>   	} [nr_pmu];
>   };
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index a75bf11585b5..05bc0cfccf83 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -809,7 +809,7 @@ static bool keep_feat(int feat)
>   	case HEADER_CPU_PMU_CAPS:
>   	case HEADER_CLOCK_DATA:
>   	case HEADER_HYBRID_TOPOLOGY:
> -	case HEADER_HYBRID_CPU_PMU_CAPS:
> +	case HEADER_PMU_CAPS:
>   		return true;
>   	/* Information that can be updated */
>   	case HEADER_BUILD_ID:
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 7d3aeb2e4622..5b8cf6a421a4 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -219,13 +219,13 @@ void perf_env__exit(struct perf_env *env)
>   	}
>   	zfree(&env->hybrid_nodes);
>   
> -	for (i = 0; i < env->nr_hybrid_cpc_nodes; i++) {
> -		for (j = 0; j < env->hybrid_cpc_nodes[i].nr_cpu_pmu_caps; j++)
> -			zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps[j]);
> -		zfree(&env->hybrid_cpc_nodes[i].cpu_pmu_caps);
> -		zfree(&env->hybrid_cpc_nodes[i].pmu_name);
> +	for (i = 0; i < env->nr_pmus_with_caps; i++) {
> +		for (j = 0; j < env->pmu_caps[i].nr_caps; j++)
> +			zfree(&env->pmu_caps[i].caps[j]);
> +		zfree(&env->pmu_caps[i].caps);
> +		zfree(&env->pmu_caps[i].pmu_name);
>   	}
> -	zfree(&env->hybrid_cpc_nodes);
> +	zfree(&env->pmu_caps);
>   }
>   
>   void perf_env__init(struct perf_env *env)
> @@ -531,3 +531,51 @@ int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu)
>   
>   	return cpu.cpu >= 0 && cpu.cpu < env->nr_numa_map ? env->numa_map[cpu.cpu] : -1;
>   }
> +
> +char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
> +			     const char *cap)
> +{
> +	char *cap_eq;
> +	int cap_size;
> +	char **ptr;
> +	int i, j;
> +
> +	if (!pmu_name || !cap)
> +		return NULL;
> +
> +	cap_size = strlen(cap);
> +	cap_eq = zalloc(cap_size + 2);
> +	if (!cap_eq)
> +		return NULL;
> +
> +	memcpy(cap_eq, cap, cap_size);
> +	cap_eq[cap_size] = '=';
> +
> +	if (!strcmp(pmu_name, "cpu")) {
> +		for (i = 0; i < env->nr_cpu_pmu_caps; i++) {
> +			if (!strncmp(env->cpu_pmu_caps[i], cap_eq, cap_size + 1)) {
> +				free(cap_eq);
> +				return &env->cpu_pmu_caps[i][cap_size + 1];
> +			}
> +		}
> +		goto out;
> +	}
> +
> +	for (i = 0; i < env->nr_pmus_with_caps; i++) {
> +		if (strcmp(env->pmu_caps[i].pmu_name, pmu_name))
> +			continue;
> +
> +		ptr = env->pmu_caps[i].caps;
> +
> +		for (j = 0; j < env->pmu_caps[i].nr_caps; j++) {
> +			if (!strncmp(ptr[j], cap_eq, cap_size + 1)) {
> +				free(cap_eq);
> +				return &ptr[j][cap_size + 1];
> +			}
> +		}
> +	}
> +
> +out:
> +	free(cap_eq);
> +	return NULL;
> +}
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 43aab59f7322..4566c51f2fd9 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -43,10 +43,10 @@ struct hybrid_node {
>   	char	*cpus;
>   };
>   
> -struct hybrid_cpc_node {
> -	int		nr_cpu_pmu_caps;
> +struct pmu_caps {
> +	int		nr_caps;
>   	unsigned int    max_branches;
> -	char            **cpu_pmu_caps;
> +	char            **caps;
>   	char            *pmu_name;
>   };
>   
> @@ -74,7 +74,7 @@ struct perf_env {
>   	int			nr_groups;
>   	int			nr_cpu_pmu_caps;
>   	int			nr_hybrid_nodes;
> -	int			nr_hybrid_cpc_nodes;
> +	int			nr_pmus_with_caps;
>   	char			*cmdline;
>   	const char		**cmdline_argv;
>   	char			*sibling_cores;
> @@ -94,7 +94,7 @@ struct perf_env {
>   	struct memory_node	*memory_nodes;
>   	unsigned long long	 memory_bsize;
>   	struct hybrid_node	*hybrid_nodes;
> -	struct hybrid_cpc_node	*hybrid_cpc_nodes;
> +	struct pmu_caps		*pmu_caps;
>   #ifdef HAVE_LIBBPF_SUPPORT
>   	/*
>   	 * bpf_info_lock protects bpf rbtrees. This is needed because the
> @@ -172,4 +172,6 @@ bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node);
>   struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id);
>   
>   int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu);
> +char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
> +			     const char *cap);
>   #endif /* __PERF_ENV_H */
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index a1e4ec53333d..67bf897e8f89 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -1512,18 +1512,13 @@ static int write_compressed(struct feat_fd *ff __maybe_unused,
>   	return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len));
>   }
>   
> -static int write_per_cpu_pmu_caps(struct feat_fd *ff, struct perf_pmu *pmu,
> -				  bool write_pmu)
> +static int __write_pmu_caps(struct feat_fd *ff, struct perf_pmu *pmu,
> +			    bool write_pmu)
>   {
>   	struct perf_pmu_caps *caps = NULL;
> -	int nr_caps;
>   	int ret;
>   
> -	nr_caps = perf_pmu__caps_parse(pmu);
> -	if (nr_caps < 0)
> -		return nr_caps;
> -
> -	ret = do_write(ff, &nr_caps, sizeof(nr_caps));
> +	ret = do_write(ff, &pmu->nr_caps, sizeof(pmu->nr_caps));

The nr_caps can be retrieved from the now-cached perf_pmu__caps_parse().
It seems we only need to pass nr_caps as a parameter of
__write_pmu_caps().
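
I.e., roughly (untested; assuming __write_pmu_caps() grows an nr_caps
parameter):

	static int write_cpu_pmu_caps(struct feat_fd *ff,
				      struct evlist *evlist __maybe_unused)
	{
		struct perf_pmu *cpu_pmu = perf_pmu__find("cpu");
		int nr_caps;

		if (!cpu_pmu)
			return -ENOENT;

		/* Cached after the first call, so this is cheap. */
		nr_caps = perf_pmu__caps_parse(cpu_pmu);
		if (nr_caps < 0)
			return nr_caps;

		return __write_pmu_caps(ff, cpu_pmu, nr_caps, false);
	}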

>   	if (ret < 0)
>   		return ret;
>   
> @@ -1550,33 +1545,60 @@ static int write_cpu_pmu_caps(struct feat_fd *ff,
>   			      struct evlist *evlist __maybe_unused)
>   {
>   	struct perf_pmu *cpu_pmu = perf_pmu__find("cpu");
> +	int ret;
>   
>   	if (!cpu_pmu)
>   		return -ENOENT;
>   
> -	return write_per_cpu_pmu_caps(ff, cpu_pmu, false);
> +	ret = perf_pmu__caps_parse(cpu_pmu);
> +	if (ret < 0)
> +		return ret;
> +
> +	return __write_pmu_caps(ff, cpu_pmu, false);
>   }
>   
> -static int write_hybrid_cpu_pmu_caps(struct feat_fd *ff,
> -				     struct evlist *evlist __maybe_unused)
> +static int write_pmu_caps(struct feat_fd *ff,
> +			  struct evlist *evlist __maybe_unused)
>   {
> -	struct perf_pmu *pmu;
> -	u32 nr_pmu = perf_pmu__hybrid_pmu_num();
> +	struct perf_pmu *pmu = NULL;
> +	int nr_pmu = 0;
>   	int ret;
>   
> -	if (nr_pmu == 0)
> -		return -ENOENT;
> +	while ((pmu = perf_pmu__scan(pmu))) {
> +		if (!pmu->name || !strcmp(pmu->name, "cpu") ||
> +		    perf_pmu__caps_parse(pmu) <= 0)
> +			continue;
> +		nr_pmu++;
> +	}
>   
>   	ret = do_write(ff, &nr_pmu, sizeof(nr_pmu));
>   	if (ret < 0)
>   		return ret;
>   
> +	if (!nr_pmu)
> +		return 0;
> +
> +	/*
> +	 * Write hybrid pmu caps first to maintain compatibility with
> +	 * older perf tool.
> +	 */
> +	pmu = NULL;
>   	perf_pmu__for_each_hybrid_pmu(pmu) {
> -		ret = write_per_cpu_pmu_caps(ff, pmu, true);
> +		ret = __write_pmu_caps(ff, pmu, true);
>   		if (ret < 0)
>   			return ret;
>   	}
>   
> +	pmu = NULL;
> +	while ((pmu = perf_pmu__scan(pmu))) {
> +		if (!pmu->name || !strcmp(pmu->name, "cpu") ||
> +		    !pmu->nr_caps || perf_pmu__is_hybrid(pmu->name))

The perf_pmu__caps_parse() return value can be used to replace the
pmu->nr_caps.

It seems we can factor out a macro to replace the repeated check 
!pmu->name || !strcmp(pmu->name, "cpu") || perf_pmu__caps_parse(pmu) <= 0.
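
Perhaps something like this (untested; the name is only a placeholder):

	#define perf_pmu__skip_caps(pmu)				\
		(!(pmu)->name || !strcmp((pmu)->name, "cpu") ||		\
		 perf_pmu__caps_parse(pmu) <= 0)

Both loops in write_pmu_caps() could then just do
"if (perf_pmu__skip_caps(pmu)) continue;".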

Thanks,
Kan

> +			continue;
> +
> +		ret = __write_pmu_caps(ff, pmu, true);
> +		if (ret < 0)
> +			return ret;
> +	}
>   	return 0;
>   }
>   
> @@ -2051,8 +2073,7 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
>   		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
>   }
>   
> -static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
> -				   char *pmu_name)
> +static void __print_pmu_caps(FILE *fp, int nr_caps, char **caps, char *pmu_name)
>   {
>   	const char *delimiter = "";
>   	int i;
> @@ -2064,7 +2085,7 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
>   
>   	fprintf(fp, "# %s pmu capabilities: ", pmu_name);
>   	for (i = 0; i < nr_caps; i++) {
> -		fprintf(fp, "%s%s", delimiter, cpu_pmu_caps[i]);
> +		fprintf(fp, "%s%s", delimiter, caps[i]);
>   		delimiter = ", ";
>   	}
>   
> @@ -2073,19 +2094,18 @@ static void print_per_cpu_pmu_caps(FILE *fp, int nr_caps, char **cpu_pmu_caps,
>   
>   static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
>   {
> -	print_per_cpu_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps,
> -			       ff->ph->env.cpu_pmu_caps, (char *)"cpu");
> +	__print_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps,
> +			 ff->ph->env.cpu_pmu_caps, (char *)"cpu");
>   }
>   
> -static void print_hybrid_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
> +static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
>   {
> -	struct hybrid_cpc_node *n;
> +	struct pmu_caps *pmu_caps;
>   
> -	for (int i = 0; i < ff->ph->env.nr_hybrid_cpc_nodes; i++) {
> -		n = &ff->ph->env.hybrid_cpc_nodes[i];
> -		print_per_cpu_pmu_caps(fp, n->nr_cpu_pmu_caps,
> -				       n->cpu_pmu_caps,
> -				       n->pmu_name);
> +	for (int i = 0; i < ff->ph->env.nr_pmus_with_caps; i++) {
> +		pmu_caps = &ff->ph->env.pmu_caps[i];
> +		__print_pmu_caps(fp, pmu_caps->nr_caps, pmu_caps->caps,
> +				 pmu_caps->pmu_name);
>   	}
>   }
>   
> @@ -3196,28 +3216,27 @@ static int process_compressed(struct feat_fd *ff,
>   	return 0;
>   }
>   
> -static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
> -				    char ***cpu_pmu_caps,
> -				    unsigned int *max_branches)
> +static int __process_pmu_caps(struct feat_fd *ff, int *nr_caps,
> +			      char ***caps, unsigned int *max_branches)
>   {
>   	int name_size, value_size;
>   	char *name, *value, *ptr;
> -	u32 nr_caps, i;
> +	u32 nr_pmu_caps, i;
>   
> -	*nr_cpu_pmu_caps = 0;
> -	*cpu_pmu_caps = NULL;
> +	*nr_caps = 0;
> +	*caps = NULL;
>   
> -	if (do_read_u32(ff, &nr_caps))
> +	if (do_read_u32(ff, &nr_pmu_caps))
>   		return -1;
>   
> -	if (!nr_caps)
> +	if (!nr_pmu_caps)
>   		return 0;
>   
> -	*cpu_pmu_caps = zalloc(sizeof(char *) * nr_caps);
> -	if (!*cpu_pmu_caps)
> +	*caps = zalloc(sizeof(char *) * nr_pmu_caps);
> +	if (!*caps)
>   		return -1;
>   
> -	for (i = 0; i < nr_caps; i++) {
> +	for (i = 0; i < nr_pmu_caps; i++) {
>   		name = do_read_string(ff);
>   		if (!name)
>   			goto error;
> @@ -3235,7 +3254,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>   		memcpy(ptr, name, name_size);
>   		ptr[name_size] = '=';
>   		memcpy(ptr + name_size + 1, value, value_size);
> -		(*cpu_pmu_caps)[i] = ptr;
> +		(*caps)[i] = ptr;
>   
>   		if (!strcmp(name, "branches"))
>   			*max_branches = atoi(value);
> @@ -3243,7 +3262,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>   		free(value);
>   		free(name);
>   	}
> -	*nr_cpu_pmu_caps = nr_caps;
> +	*nr_caps = nr_pmu_caps;
>   	return 0;
>   
>   free_value:
> @@ -3252,29 +3271,28 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>   	free(name);
>   error:
>   	for (; i > 0; i--)
> -		free((*cpu_pmu_caps)[i - 1]);
> -	free(*cpu_pmu_caps);
> -	*cpu_pmu_caps = NULL;
> -	*nr_cpu_pmu_caps = 0;
> +		free((*caps)[i - 1]);
> +	free(*caps);
> +	*caps = NULL;
> +	*nr_caps = 0;
>   	return -1;
>   }
>   
>   static int process_cpu_pmu_caps(struct feat_fd *ff,
>   				void *data __maybe_unused)
>   {
> -	int ret = process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
> -					&ff->ph->env.cpu_pmu_caps,
> -					&ff->ph->env.max_branches);
> +	int ret = __process_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
> +				     &ff->ph->env.cpu_pmu_caps,
> +				     &ff->ph->env.max_branches);
>   
>   	if (!ret && !ff->ph->env.cpu_pmu_caps)
>   		pr_debug("cpu pmu capabilities not available\n");
>   	return ret;
>   }
>   
> -static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
> -				       void *data __maybe_unused)
> +static int process_pmu_caps(struct feat_fd *ff, void *data __maybe_unused)
>   {
> -	struct hybrid_cpc_node *nodes;
> +	struct pmu_caps *pmu_caps;
>   	u32 nr_pmu, i;
>   	int ret;
>   	int j;
> @@ -3283,45 +3301,45 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>   		return -1;
>   
>   	if (!nr_pmu) {
> -		pr_debug("hybrid cpu pmu capabilities not available\n");
> +		pr_debug("pmu capabilities not available\n");
>   		return 0;
>   	}
>   
> -	nodes = zalloc(sizeof(*nodes) * nr_pmu);
> -	if (!nodes)
> +	pmu_caps = zalloc(sizeof(*pmu_caps) * nr_pmu);
> +	if (!pmu_caps)
>   		return -ENOMEM;
>   
>   	for (i = 0; i < nr_pmu; i++) {
> -		struct hybrid_cpc_node *n = &nodes[i];
> -
> -		ret = process_per_cpu_pmu_caps(ff, &n->nr_cpu_pmu_caps,
> -					       &n->cpu_pmu_caps,
> -					       &n->max_branches);
> +		ret = __process_pmu_caps(ff, &pmu_caps[i].nr_caps,
> +					 &pmu_caps[i].caps,
> +					 &pmu_caps[i].max_branches);
>   		if (ret)
>   			goto err;
>   
> -		n->pmu_name = do_read_string(ff);
> -		if (!n->pmu_name) {
> +		pmu_caps[i].pmu_name = do_read_string(ff);
> +		if (!pmu_caps[i].pmu_name) {
>   			ret = -1;
>   			goto err;
>   		}
> -		if (!n->nr_cpu_pmu_caps)
> -			pr_debug("%s pmu capabilities not available\n", n->pmu_name);
> +		if (!pmu_caps[i].nr_caps) {
> +			pr_debug("%s pmu capabilities not available\n",
> +				 pmu_caps[i].pmu_name);
> +		}
>   	}
>   
> -	ff->ph->env.nr_hybrid_cpc_nodes = nr_pmu;
> -	ff->ph->env.hybrid_cpc_nodes = nodes;
> +	ff->ph->env.nr_pmus_with_caps = nr_pmu;
> +	ff->ph->env.pmu_caps = pmu_caps;
>   	return 0;
>   
>   err:
>   	for (i = 0; i < nr_pmu; i++) {
> -		for (j = 0; j < nodes[i].nr_cpu_pmu_caps; j++)
> -			free(nodes[i].cpu_pmu_caps[j]);
> -		free(nodes[i].cpu_pmu_caps);
> -		free(nodes[i].pmu_name);
> +		for (j = 0; j < pmu_caps[i].nr_caps; j++)
> +			free(pmu_caps[i].caps[j]);
> +		free(pmu_caps[i].caps);
> +		free(pmu_caps[i].pmu_name);
>   	}
>   
> -	free(nodes);
> +	free(pmu_caps);
>   	return ret;
>   }
>   
> @@ -3387,7 +3405,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
>   	FEAT_OPR(CPU_PMU_CAPS,	cpu_pmu_caps,	false),
>   	FEAT_OPR(CLOCK_DATA,	clock_data,	false),
>   	FEAT_OPN(HYBRID_TOPOLOGY,	hybrid_topology,	true),
> -	FEAT_OPR(HYBRID_CPU_PMU_CAPS,	hybrid_cpu_pmu_caps,	false),
> +	FEAT_OPR(PMU_CAPS,	pmu_caps,	false),
>   };
>   
>   struct header_print_data {
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index 08563c1f1bff..5790bf556b90 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -46,7 +46,7 @@ enum {
>   	HEADER_CPU_PMU_CAPS,
>   	HEADER_CLOCK_DATA,
>   	HEADER_HYBRID_TOPOLOGY,
> -	HEADER_HYBRID_CPU_PMU_CAPS,
> +	HEADER_PMU_CAPS,
>   	HEADER_LAST_FEATURE,
>   	HEADER_FEAT_BITS	= 256,
>   };

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once
  2022-06-01 13:35   ` Liang, Kan
@ 2022-06-01 13:51     ` Ravi Bangoria
  2022-06-01 13:55       ` Liang, Kan
  0 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01 13:51 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, jolsa, irogers, peterz, rrichter, mingo, mark.rutland,
	namhyung, tglx, bp, james.clark, leo.yan, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria

Hi Kan,

[...]

>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index 541889fa9f9c..4b45fd8da5a3 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -46,6 +46,8 @@ struct perf_pmu {
>>       struct perf_cpu_map *cpus;
>>       struct list_head format;  /* HEAD struct perf_pmu_format -> list */
>>       struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
>> +    bool caps_initialized;
>> +    u32 nr_caps;
> 
> If they are just used for caching purposes, I don't think we need to add these variables to struct perf_pmu.
> 
> A static variable should be good enough. See sysctl__nmi_watchdog_enabled().

These fields are per pmu. A static variable won't help :) And they are
used in subsequent patches as well.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/8] perf tool: Parse pmu caps sysfs only once
  2022-06-01 13:51     ` Ravi Bangoria
@ 2022-06-01 13:55       ` Liang, Kan
  0 siblings, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 13:55 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: acme, jolsa, irogers, peterz, rrichter, mingo, mark.rutland,
	namhyung, tglx, bp, james.clark, leo.yan, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla



On 6/1/2022 9:51 AM, Ravi Bangoria wrote:
> Hi Kan,
> 
> [...]
> 
>>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>>> index 541889fa9f9c..4b45fd8da5a3 100644
>>> --- a/tools/perf/util/pmu.h
>>> +++ b/tools/perf/util/pmu.h
>>> @@ -46,6 +46,8 @@ struct perf_pmu {
>>>        struct perf_cpu_map *cpus;
>>>        struct list_head format;  /* HEAD struct perf_pmu_format -> list */
>>>        struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
>>> +    bool caps_initialized;
>>> +    u32 nr_caps;
>>
>> If they are just used for caching purposes, I don't think we need to add these variables to struct perf_pmu.
>>
>> A static variable should be good enough. See sysctl__nmi_watchdog_enabled().
> 
> These fields are per pmu. A static variable won't help :) And they are
> used in subsequent patches as well.
> 

Ah, I see. Then the patch looks good to me.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 5/8] perf headers: Record non-cpu pmu capabilities
  2022-06-01 13:37   ` Liang, Kan
@ 2022-06-01 13:58     ` Liang, Kan
  0 siblings, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 13:58 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla



On 6/1/2022 9:37 AM, Liang, Kan wrote:
> 
> 
> On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
>> Pmus advertise their capabilities via sysfs attribute files but
>> the perf tool currently parses only core (cpu) or hybrid core pmu
>> capabilities. Add support for recording non-core pmu capabilities
>> in the perf.data header.
>>
>> Note that the newly proposed HEADER_PMU_CAPS replaces the existing
>> HEADER_HYBRID_CPU_PMU_CAPS. Special care is taken for hybrid core
>> pmus by writing their capabilities first in the perf.data header to
>> make sure a new perf.data file being read by an old perf tool does
>> not break.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>

Since the perf_pmu__caps_parse() caching issue has been addressed, the 
patch looks good to me.

	Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)
  2022-06-01  3:26 [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Ravi Bangoria
                   ` (7 preceding siblings ...)
  2022-06-01  3:26 ` [PATCH v5 8/8] perf script ibs: Support new IBS bits in raw trace dump Ravi Bangoria
@ 2022-06-01 14:04 ` Liang, Kan
  2022-06-01 14:10   ` Ravi Bangoria
  8 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2022-06-01 14:04 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla



On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
> The kernel side changes have already been applied to linus/master
> (except the amd-ibs.h header). This series contains the perf tool changes.
> 
> Kan, I don't have any machine with heterogeneous cpus. It would be
> helpful if you could check HEADER_PMU_CAPS on an Intel ADL machine.
>

I tried patches 2-5 on a hybrid machine. I didn't see any regression 
with the perf report --header-only option.

Without the patch 2-5,
# perf report --header-only | grep capabilities
# cpu_core pmu capabilities: branches=32, max_precise=3, 
pmu_name=alderlake_hybrid
# cpu_atom pmu capabilities: branches=32, max_precise=3, 
pmu_name=alderlake_hybrid

With the patch 2-5,
# ./perf report --header-only | grep capabilities
# cpu_core pmu capabilities: branches=32, max_precise=3, 
pmu_name=alderlake_hybrid
# cpu_atom pmu capabilities: branches=32, max_precise=3, 
pmu_name=alderlake_hybrid


Thanks,
Kan


[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)
  2022-06-01 14:04 ` [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes) Liang, Kan
@ 2022-06-01 14:10   ` Ravi Bangoria
  0 siblings, 0 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-01 14:10 UTC (permalink / raw)
  To: Liang, Kan, acme
  Cc: jolsa, irogers, peterz, rrichter, mingo, mark.rutland, namhyung,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla, Ravi Bangoria


On 01-Jun-22 7:34 PM, Liang, Kan wrote:
> 
> 
> On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
>> The kernel side changes have already been applied to linus/master
>> (except the amd-ibs.h header). This series contains the perf tool changes.
>>
>> Kan, I don't have any machine with heterogeneous cpus. It would be
>> helpful if you could check HEADER_PMU_CAPS on an Intel ADL machine.
>>
> 
> I tried patches 2-5 on a hybrid machine. I didn't see any regression with the perf report --header-only option.
> 
> Without the patch 2-5,
> # perf report --header-only | grep capabilities
> # cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> # cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> 
> With the patch 2-5,
> # ./perf report --header-only | grep capabilities
> # cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> # cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> 

Perfect! Thanks for testing, Kan.

Arnaldo, since the kernel patches are already applied to Linus' tree for
-rc1, would you be able to include this series in your -rc1 PR?

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/8] perf record ibs: Warn about sampling period skew
  2022-06-01  3:26 ` [PATCH v5 1/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
@ 2022-06-02 20:30   ` Namhyung Kim
  2022-06-03  5:12     ` Ravi Bangoria
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2022-06-02 20:30 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Ian Rogers,
	Peter Zijlstra, rrichter, Ingo Molnar, Mark Rutland,
	Thomas Gleixner, Borislav Petkov, James Clark, Leo Yan,
	Andi Kleen, Stephane Eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla

Hi Ravi,

On Tue, May 31, 2022 at 8:27 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Samples without an L3 miss are discarded and the counter is reset with
> a random value (between 1-15 for the fetch pmu and 1-127 for the op pmu)
> when IBS L3 miss filtering is enabled. This causes a sampling period skew
> but there is no way to reconstruct the aggregated sampling period. So
> print a warning at perf record if the user sets l3missonly=1.
>
> Ex:
>   # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
>   WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
>   and tagged operation does not cause L3 Miss. This causes sampling period skew.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> Acked-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/arch/x86/util/evsel.c | 49 ++++++++++++++++++++++++++++++++
>  tools/perf/util/evsel.c          |  7 +++++
>  tools/perf/util/evsel.h          |  1 +
>  3 files changed, 57 insertions(+)
>
> diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
> index 88306183d629..fceb904902ab 100644
> --- a/tools/perf/arch/x86/util/evsel.c
> +++ b/tools/perf/arch/x86/util/evsel.c
> @@ -5,6 +5,7 @@
>  #include "util/env.h"
>  #include "util/pmu.h"
>  #include "linux/string.h"
> +#include "util/debug.h"
>
>  void arch_evsel__set_sample_weight(struct evsel *evsel)
>  {
> @@ -60,3 +61,51 @@ bool arch_evsel__must_be_in_group(const struct evsel *evsel)
>                 (!strcasecmp(evsel->name, "slots") ||
>                  strcasestr(evsel->name, "topdown"));
>  }
> +
> +static void ibs_l3miss_warn(void)
> +{
> +       pr_warning(
> +"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n"
> +"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n");
> +}
> +
> +void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
> +{
> +       struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
> +       static int warned_once;
> +       /* 0: Uninitialized, 1: Yes, -1: No */
> +       static int is_amd;
> +
> +       if (warned_once || is_amd == -1)
> +               return;
> +
> +       if (!is_amd) {
> +               struct perf_env *env = evsel__env(evsel);
> +
> +               if (!perf_env__cpuid(env) || !env->cpuid ||
> +                   !strstarts(env->cpuid, "AuthenticAMD")) {
> +                       is_amd = -1;
> +                       return;
> +               }
> +               is_amd = 1;
> +       }
> +
> +       evsel_pmu = evsel__find_pmu(evsel);
> +       if (!evsel_pmu)
> +               return;
> +
> +       ibs_fetch_pmu = perf_pmu__find("ibs_fetch");
> +       ibs_op_pmu = perf_pmu__find("ibs_op");
> +
> +       if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
> +               if (attr->config & (1ULL << 59)) {

It'd be nice if we used a macro or something instead of the
magic number.

> +                       ibs_l3miss_warn();
> +                       warned_once = 1;
> +               }
> +       } else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
> +               if (attr->config & (1ULL << 16)) {

Ditto.

Thanks,
Namhyung


> +                       ibs_l3miss_warn();
> +                       warned_once = 1;
> +               }
> +       }
> +}
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index ce499c5da8d7..8fea51a9cd90 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1091,6 +1091,11 @@ void __weak arch_evsel__fixup_new_cycles(struct perf_event_attr *attr __maybe_un
>  {
>  }
>
> +void __weak arch__post_evsel_config(struct evsel *evsel __maybe_unused,
> +                                   struct perf_event_attr *attr __maybe_unused)
> +{
> +}
> +
>  static void evsel__set_default_freq_period(struct record_opts *opts,
>                                            struct perf_event_attr *attr)
>  {
> @@ -1366,6 +1371,8 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
>          */
>         if (evsel__is_dummy_event(evsel))
>                 evsel__reset_sample_bit(evsel, BRANCH_STACK);
> +
> +       arch__post_evsel_config(evsel, attr);
>  }
>
>  int evsel__set_filter(struct evsel *evsel, const char *filter)
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 73ea48e94079..92bed8e2f7d8 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -297,6 +297,7 @@ void evsel__set_sample_id(struct evsel *evsel, bool use_sample_identifier);
>
>  void arch_evsel__set_sample_weight(struct evsel *evsel);
>  void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr);
> +void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr);
>
>  int evsel__set_filter(struct evsel *evsel, const char *filter);
>  int evsel__append_tp_filter(struct evsel *evsel, const char *filter);
> --
> 2.31.1
>

* Re: [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings
  2022-06-01  3:26 ` [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings Ravi Bangoria
  2022-06-01 13:36   ` Liang, Kan
@ 2022-06-02 21:37   ` Namhyung Kim
  2022-06-03  5:20     ` Ravi Bangoria
  1 sibling, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2022-06-02 21:37 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Ian Rogers,
	Peter Zijlstra, rrichter, Ingo Molnar, Mark Rutland,
	Thomas Gleixner, Borislav Petkov, James Clark, Leo Yan,
	Andi Kleen, Stephane Eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla

On Tue, May 31, 2022 at 8:28 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
>> Currently, all capabilities are stored in a single string, separated
>> by NULL characters. Instead, store them in an array of strings, which
>> makes searching for a capability easier.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
[SNIP]
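
For reference, a minimal hypothetical sketch of how the new
array-of-strings layout can be searched once it is populated by the
quoted changes below. The perf_env field names are taken from the
patch; the helper itself is illustrative and not part of the series:

	/*
	 * Sketch only: look up a "name=value" capability in the array
	 * layout and return a pointer to its value, or NULL if absent.
	 */
	static const char *find_pmu_cap(struct perf_env *env, const char *name)
	{
		size_t len = strlen(name);
		int i;

		for (i = 0; i < env->nr_cpu_pmu_caps; i++) {
			const char *cap = env->cpu_pmu_caps[i];

			if (!strncmp(cap, name, len) && cap[len] == '=')
				return cap + len + 1;
		}
		return NULL;
	}
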
> @@ -3231,12 +3226,16 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>                 if (!value)
>                         goto free_name;
>
> -               if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
> +               name_size = strlen(name);
> +               value_size = strlen(value);
> +               ptr = zalloc(sizeof(char) * (name_size + value_size + 2));
> +               if (!ptr)
>                         goto free_value;
>
> -               /* include a NULL character at the end */
> -               if (strbuf_add(&sb, "", 1) < 0)
> -                       goto free_value;
> +               memcpy(ptr, name, name_size);
> +               ptr[name_size] = '=';
> +               memcpy(ptr + name_size + 1, value, value_size);

What about using asprintf() instead?

Thanks,
Namhyung
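
For illustration, a rough sketch of what the asprintf() variant could
look like, reusing the error path from the quoted code (asprintf() is
a GNU extension; the pointer is reset on failure since its contents
are then undefined):

	/* Sketch only: asprintf() allocates and formats in one step. */
	if (asprintf(&ptr, "%s=%s", name, value) < 0) {
		ptr = NULL;
		goto free_value;
	}
	(*cpu_pmu_caps)[i] = ptr;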


> +               (*cpu_pmu_caps)[i] = ptr;
>
>                 if (!strcmp(name, "branches"))
>                         *max_branches = atoi(value);
> @@ -3244,7 +3243,7 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>                 free(value);
>                 free(name);
>         }
> -       *cpu_pmu_caps = strbuf_detach(&sb, NULL);
> +       *nr_cpu_pmu_caps = nr_caps;
>         return 0;
>
>  free_value:
> @@ -3252,16 +3251,24 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>  free_name:
>         free(name);
>  error:
> -       strbuf_release(&sb);
> +       for (; i > 0; i--)
> +               free((*cpu_pmu_caps)[i - 1]);
> +       free(*cpu_pmu_caps);
> +       *cpu_pmu_caps = NULL;
> +       *nr_cpu_pmu_caps = 0;
>         return -1;
>  }
>
>  static int process_cpu_pmu_caps(struct feat_fd *ff,
>                                 void *data __maybe_unused)
>  {
> -       return process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
> +       int ret = process_per_cpu_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps,
>                                         &ff->ph->env.cpu_pmu_caps,
>                                         &ff->ph->env.max_branches);
> +
> +       if (!ret && !ff->ph->env.cpu_pmu_caps)
> +               pr_debug("cpu pmu capabilities not available\n");
> +       return ret;
>  }
>
>  static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
> @@ -3270,6 +3277,7 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>         struct hybrid_cpc_node *nodes;
>         u32 nr_pmu, i;
>         int ret;
> +       int j;
>
>         if (do_read_u32(ff, &nr_pmu))
>                 return -1;
> @@ -3297,6 +3305,8 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>                         ret = -1;
>                         goto err;
>                 }
> +               if (!n->nr_cpu_pmu_caps)
> +                       pr_debug("%s pmu capabilities not available\n", n->pmu_name);
>         }
>
>         ff->ph->env.nr_hybrid_cpc_nodes = nr_pmu;
> @@ -3305,6 +3315,8 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>
>  err:
>         for (i = 0; i < nr_pmu; i++) {
> +               for (j = 0; j < nodes[i].nr_cpu_pmu_caps; j++)
> +                       free(nodes[i].cpu_pmu_caps[j]);
>                 free(nodes[i].cpu_pmu_caps);
>                 free(nodes[i].pmu_name);
>         }
> --
> 2.31.1
>

* Re: [PATCH v5 6/8] perf/x86/ibs: Add new IBS register bits into header
  2022-06-01  3:26 ` [PATCH v5 6/8] perf/x86/ibs: Add new IBS register bits into header Ravi Bangoria
@ 2022-06-02 21:48   ` Namhyung Kim
  2022-06-03  5:24     ` Ravi Bangoria
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2022-06-02 21:48 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Ian Rogers,
	Peter Zijlstra, rrichter, Ingo Molnar, Mark Rutland,
	Thomas Gleixner, Borislav Petkov, James Clark, Leo Yan,
	Andi Kleen, Stephane Eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla

On Tue, May 31, 2022 at 8:30 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
>> IBS support has been enhanced with two new features in the upcoming
>> uarch: 1. DataSrc extension and 2. L3 miss filtering. An additional
>> set of bits has been introduced in the IBS registers to exploit these
>> features. Define these new bits in the arch/x86/ header.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> Acked-by: Ian Rogers <irogers@google.com>

Isn't this a part of the kernel changes?

Thanks,
Namhyung


> ---
>  arch/x86/include/asm/amd-ibs.h | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
> index aabdbb5ab920..f3eb098d63d4 100644
> --- a/arch/x86/include/asm/amd-ibs.h
> +++ b/arch/x86/include/asm/amd-ibs.h
> @@ -29,7 +29,10 @@ union ibs_fetch_ctl {
>                         rand_en:1,      /* 57: random tagging enable */
>                         fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
>                                          *      (needs IbsFetchComp) */
> -                       reserved:5;     /* 59-63: reserved */
> +                       l3_miss_only:1, /* 59: Collect L3 miss samples only */
> +                       fetch_oc_miss:1,/* 60: Op cache miss for the sampled fetch */
> +                       fetch_l3_miss:1,/* 61: L3 cache miss for the sampled fetch */
> +                       reserved:2;     /* 62-63: reserved */
>         };
>  };
>
> @@ -38,14 +41,14 @@ union ibs_op_ctl {
>         __u64 val;
>         struct {
>                 __u64   opmaxcnt:16,    /* 0-15: periodic op max. count */
> -                       reserved0:1,    /* 16: reserved */
> +                       l3_miss_only:1, /* 16: Collect L3 miss samples only */
>                         op_en:1,        /* 17: op sampling enable */
>                         op_val:1,       /* 18: op sample valid */
>                         cnt_ctl:1,      /* 19: periodic op counter control */
>                         opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
> -                       reserved1:5,    /* 27-31: reserved */
> +                       reserved0:5,    /* 27-31: reserved */
>                         opcurcnt:27,    /* 32-58: periodic op counter current count */
> -                       reserved2:5;    /* 59-63: reserved */
> +                       reserved1:5;    /* 59-63: reserved */
>         };
>  };
>
> @@ -71,11 +74,12 @@ union ibs_op_data {
>  union ibs_op_data2 {
>         __u64 val;
>         struct {
> -               __u64   data_src:3,     /* 0-2: data source */
> +               __u64   data_src_lo:3,  /* 0-2: data source low */
>                         reserved0:1,    /* 3: reserved */
>                         rmt_node:1,     /* 4: destination node */
>                         cache_hit_st:1, /* 5: cache hit state */
> -                       reserved1:57;   /* 5-63: reserved */
> +                       data_src_hi:2,  /* 6-7: data source high */
> +                       reserved1:56;   /* 8-63: reserved */
>         };
>  };
>
> --
> 2.31.1
>
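
As an aside: given the split layout above (data_src_lo in bits 0-2,
data_src_hi in bits 6-7), a consumer would recombine the two fields
into a single 5-bit DataSrc value. A hypothetical helper, not part of
the patch:

	/* Illustrative only: recombine the split DataSrc field. */
	static inline __u8 ibs_op_data2_data_src(const union ibs_op_data2 *d2)
	{
		return (d2->data_src_hi << 3) | d2->data_src_lo;
	}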

* Re: [PATCH v5 1/8] perf record ibs: Warn about sampling period skew
  2022-06-02 20:30   ` Namhyung Kim
@ 2022-06-03  5:12     ` Ravi Bangoria
  2022-06-03  5:28       ` Ravi Bangoria
  0 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-03  5:12 UTC (permalink / raw)
  To: acme, namhyung
  Cc: ravi.bangoria, kan.liang, jolsa, irogers, peterz, rrichter,
	mingo, mark.rutland, tglx, bp, james.clark, leo.yan, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

>> +       if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
>> +               if (attr->config & (1ULL << 59)) {
> 
> It'd be nice if we used a macro or something instead of the
> magic number.
> 
>> +                       ibs_l3miss_warn();
>> +                       warned_once = 1;
>> +               }
>> +       } else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
>> +               if (attr->config & (1ULL << 16)) {
> 
> Ditto.

Thanks for the review, Namhyung.

Arnaldo, would you be able to squash the below trivial patch into the
original patch? Please let me know if you want me to respin the series
instead.

---><---

From 352228ee010b0782410e233233377d9484ea51bd Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bangoria@amd.com>
Date: Fri, 3 Jun 2022 10:19:01 +0530
Subject: [PATCH] perf record ibs: Define macros for IBS L3MissOnly bits

Instead of magic numbers, use macros for the IBS L3MissOnly bits.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/arch/x86/util/evsel.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index fceb904902ab..53763e583804 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -7,6 +7,9 @@
 #include "linux/string.h"
 #include "util/debug.h"
 
+#define IBS_FETCH_L3MISSONLY	(1ULL << 59)
+#define IBS_OP_L3MISSONLY	(1ULL << 16)
+
 void arch_evsel__set_sample_weight(struct evsel *evsel)
 {
 	evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
@@ -98,12 +101,12 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
 	ibs_op_pmu = perf_pmu__find("ibs_op");
 
 	if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
-		if (attr->config & (1ULL << 59)) {
+		if (attr->config & IBS_FETCH_L3MISSONLY) {
 			ibs_l3miss_warn();
 			warned_once = 1;
 		}
 	} else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
-		if (attr->config & (1ULL << 16)) {
+		if (attr->config & IBS_OP_L3MISSONLY) {
 			ibs_l3miss_warn();
 			warned_once = 1;
 		}
-- 
2.31.1


* Re: [PATCH v5 4/8] perf headers: Store pmu caps in an array of strings
  2022-06-02 21:37   ` Namhyung Kim
@ 2022-06-03  5:20     ` Ravi Bangoria
  0 siblings, 0 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-03  5:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Ian Rogers,
	Peter Zijlstra, rrichter, Ingo Molnar, Mark Rutland,
	Thomas Gleixner, Borislav Petkov, James Clark, Leo Yan,
	Andi Kleen, Stephane Eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla, Ravi Bangoria

Hi Namhyung,

On 03-Jun-22 3:07 AM, Namhyung Kim wrote:
> On Tue, May 31, 2022 at 8:28 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Currently, all capabilities are stored in a single string, separated
>> by NULL characters. Instead, store them in an array of strings, which
>> makes searching for a capability easier.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
> [SNIP]
>> @@ -3231,12 +3226,16 @@ static int process_per_cpu_pmu_caps(struct feat_fd *ff, int *nr_cpu_pmu_caps,
>>                 if (!value)
>>                         goto free_name;
>>
>> -               if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
>> +               name_size = strlen(name);
>> +               value_size = strlen(value);
>> +               ptr = zalloc(sizeof(char) * (name_size + value_size + 2));
>> +               if (!ptr)
>>                         goto free_value;
>>
>> -               /* include a NULL character at the end */
>> -               if (strbuf_add(&sb, "", 1) < 0)
>> -                       goto free_value;
>> +               memcpy(ptr, name, name_size);
>> +               ptr[name_size] = '=';
>> +               memcpy(ptr + name_size + 1, value, value_size);
> 
> What about using asprintf() instead?

Yeah, asprintf() would make that code a bit simpler. I think I'll have
to respin the series :)

Thanks,
Ravi

* Re: [PATCH v5 6/8] perf/x86/ibs: Add new IBS register bits into header
  2022-06-02 21:48   ` Namhyung Kim
@ 2022-06-03  5:24     ` Ravi Bangoria
  0 siblings, 0 replies; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-03  5:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Ian Rogers,
	Peter Zijlstra, rrichter, Ingo Molnar, Mark Rutland,
	Thomas Gleixner, Borislav Petkov, James Clark, Leo Yan,
	Andi Kleen, Stephane Eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla, Ravi Bangoria


On 03-Jun-22 3:18 AM, Namhyung Kim wrote:
> On Tue, May 31, 2022 at 8:30 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> IBS support has been enhanced with two new features in the upcoming
>> uarch: 1. DataSrc extension and 2. L3 miss filtering. An additional
>> set of bits has been introduced in the IBS registers to exploit these
>> features. Define these new bits in the arch/x86/ header.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> Acked-by: Ian Rogers <irogers@google.com>
> 
> Isn't this a part of the kernel changes?

Yes, that was left out because in the initial versions I did not have
separate patches for the kernel and the tools, so Peter might not have
considered it. In any case, the changes to this header file are not used
by the kernel atm. So I'm fine whichever way it goes in.

Thanks,
Ravi

* Re: [PATCH v5 1/8] perf record ibs: Warn about sampling period skew
  2022-06-03  5:12     ` Ravi Bangoria
@ 2022-06-03  5:28       ` Ravi Bangoria
  2022-06-03 19:25         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 26+ messages in thread
From: Ravi Bangoria @ 2022-06-03  5:28 UTC (permalink / raw)
  To: acme, namhyung
  Cc: kan.liang, jolsa, irogers, peterz, rrichter, mingo, mark.rutland,
	tglx, bp, james.clark, leo.yan, ak, eranian, like.xu.linux, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla, Ravi Bangoria

On 03-Jun-22 10:42 AM, Ravi Bangoria wrote:
>>> +       if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
>>> +               if (attr->config & (1ULL << 59)) {
>>
>> It'd be nice if we used a macro or something instead of the
>> magic number.
>>
>>> +                       ibs_l3miss_warn();
>>> +                       warned_once = 1;
>>> +               }
>>> +       } else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
>>> +               if (attr->config & (1ULL << 16)) {
>>
>> Ditto.
> 
> Thanks for the review, Namhyung.
> 
> Arnaldo, would you be able to squash the below trivial patch into the
> original patch? Please let me know if you want me to respin the series
> instead.

I'm planning to respin with the asprintf() change. Sorry for the noise.

Thanks,
Ravi

* Re: [PATCH v5 1/8] perf record ibs: Warn about sampling period skew
  2022-06-03  5:28       ` Ravi Bangoria
@ 2022-06-03 19:25         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-03 19:25 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: namhyung, kan.liang, jolsa, irogers, peterz, rrichter, mingo,
	mark.rutland, tglx, bp, james.clark, leo.yan, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

On Fri, Jun 03, 2022 at 10:58:13AM +0530, Ravi Bangoria wrote:
> On 03-Jun-22 10:42 AM, Ravi Bangoria wrote:
> >>> +       if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
> >>> +               if (attr->config & (1ULL << 59)) {
> >>
> >> It'd be nice if we used a macro or something instead of the
> >> magic number.
> >>
> >>> +                       ibs_l3miss_warn();
> >>> +                       warned_once = 1;
> >>> +               }
> >>> +       } else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
> >>> +               if (attr->config & (1ULL << 16)) {
> >>
> >> Ditto.
> > 
> > Thanks for the review, Namhyung.
> > 
> > Arnaldo, would you be able to squash the below trivial patch into the
> > original patch? Please let me know if you want me to respin the series
> > instead.
> 
> I'm planning to respin with the asprintf() change. Sorry for the noise.

Ok, will wait for the respin then.

- Arnaldo
