linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support
@ 2022-05-09  4:49 Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 1/8] perf/amd/ibs: Cascade pmu init functions' return value Ravi Bangoria
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

IBS support gains two new features in the upcoming uarch: 1. DataSrc
extension and 2. L3 Miss Filtering capability. Both are indicated by
CPUID_Fn8000001B_EAX bit 11.
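
As a rough illustration of the capability check, the sketch below tests
bit 11 of an EAX value that the caller has already read from CPUID leaf
0x8000001B. The helper name and the macro are hypothetical and are not
part of this series; reading the CPUID leaf itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position taken from the text above: CPUID_Fn8000001B_EAX[11]. */
#define DEMO_IBS_ZEN4_EXT_BIT	11u

/*
 * Hypothetical helper: returns 1 when the given EAX value (as read from
 * CPUID leaf 0x8000001B by the caller) has bit 11 set, 0 otherwise.
 */
int demo_ibs_has_zen4_extensions(uint32_t eax)
{
	return (int)((eax >> DEMO_IBS_ZEN4_EXT_BIT) & 1u);
}
```

A caller would pass the raw EAX value; for example 0x800 reports the
extensions as present and 0x7ff does not.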

The DataSrc extension provides additional data source details for
tagged load/store operations. Add support for these new bits in the
perf report/script raw dump.

IBS L3 miss filtering works by tagging an instruction on IBS counter
overflow and generating an NMI if the tagged instruction causes an L3
miss. Samples without an L3 miss are discarded and the counter is
reset with a random value (between 1 and 15 for the fetch pmu and
between 1 and 127 for the op pmu). This reduces sampling overhead when
the user is interested only in samples that miss in L3. One use case
for such filtered samples is feeding data to a page-migration daemon
in tiered memory systems.

Add support for L3 miss filtering in the IBS driver via a new pmu
attribute "l3missonly". Example usage:

  # perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5
  # perf report -D

Some important points to keep in mind while using L3 miss filtering:
1. The hardware internally resets the sampling period when the tagged
   instruction does not cause an L3 miss, and there is no way to
   reconstruct the aggregated sampling period when this happens.
2. An L3 miss is not the actual event being counted. Rather, IBS counts
   fetches, cycles or uOps depending on the configuration. Thus the
   sampling period has no direct connection to L3 misses.

The first point causes sampling period skew, so perf record now prints
a warning:

  # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
  WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
  and tagged operation does not cause L3 Miss. This causes sampling period skew.

Users can configure a smaller sampling period to get more samples
while using l3missonly.

v1: https://lore.kernel.org/r/20220425044323.2830-1-ravi.bangoria@amd.com
v1->v2:
 - Patches 1 and 2 are new. Patch 1 propagates the return values of
   the pmu init functions. Patch 2 refactors the pmu attribute code to
   use the ->is_visible() callback.
 - Patches 3 and 4 now also use the ->is_visible() callback for the pmu
   format and capability attributes respectively.
 - Other minor improvements suggested by Robert

Ravi Bangoria (8):
  perf/amd/ibs: Cascade pmu init functions' return value
  perf/amd/ibs: Use ->is_visible callback for dynamic attributes
  perf/amd/ibs: Add support for L3 miss filtering
  perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability
    attribute
  perf record ibs: Warn about sampling period skew
  perf header: Parse non-cpu pmu capabilities
  perf script ibs: Support new IBS bits in raw trace dump
  perf ibs: Fix comment

 arch/x86/events/amd/ibs.c                     | 191 +++++++++++++---
 arch/x86/include/asm/amd-ibs.h                |  18 +-
 arch/x86/include/asm/perf_event.h             |   3 +
 tools/arch/x86/include/asm/amd-ibs.h          |  18 +-
 .../Documentation/perf.data-file-format.txt   |  18 ++
 tools/perf/arch/x86/util/evsel.c              |  34 +++
 tools/perf/util/amd-sample-raw.c              |  68 +++++-
 tools/perf/util/env.c                         |  48 +++-
 tools/perf/util/env.h                         |  11 +
 tools/perf/util/evsel.c                       |   7 +
 tools/perf/util/evsel.h                       |   1 +
 tools/perf/util/header.c                      | 211 ++++++++++++++++++
 tools/perf/util/header.h                      |   1 +
 tools/perf/util/pmu.c                         |  15 +-
 tools/perf/util/pmu.h                         |   2 +
 15 files changed, 586 insertions(+), 60 deletions(-)

-- 
2.27.0



* [PATCH v2 1/8] perf/amd/ibs: Cascade pmu init functions' return value
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-11 19:47   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 2/8] perf/amd/ibs: Use ->is_visible callback for dynamic attributes Ravi Bangoria
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

The IBS pmu initialization code ignores the return values of the
functions it calls. Propagate them and unwind on failure.
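
The error handling added here follows the usual goto-based unwind
pattern. The userspace sketch below models it under stated assumptions:
the three init steps are replaced by preset return codes, and the
struct and function names are hypothetical, chosen only for
illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins: the return code each init step will produce. */
struct demo_init_plan {
	int fetch_ret;	/* models registering the ibs_fetch pmu */
	int op_ret;	/* models registering the ibs_op pmu */
	int nmi_ret;	/* models registering the NMI handler */
};

/* Tracks which steps are currently registered. */
struct demo_init_state {
	int fetch_registered;
	int op_registered;
	int nmi_registered;
};

/*
 * Runs the three steps in order. On failure, undoes the steps that
 * already succeeded (in reverse order) and returns the error code.
 */
int demo_ibs_init(const struct demo_init_plan *plan,
		  struct demo_init_state *st)
{
	int ret;

	assert(plan && st);

	ret = plan->fetch_ret;
	if (ret)
		return ret;		/* nothing to undo yet */
	st->fetch_registered = 1;

	ret = plan->op_ret;
	if (ret)
		goto err_op;
	st->op_registered = 1;

	ret = plan->nmi_ret;
	if (ret)
		goto err_nmi;
	st->nmi_registered = 1;

	return 0;

err_nmi:
	st->op_registered = 0;		/* undo the op step */
err_op:
	st->fetch_registered = 0;	/* undo the fetch step */
	return ret;
}
```

The labels fall through, so a failure in a later step undoes every
earlier step without duplicating cleanup code.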

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019d4b67..367ca899e6e8 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -759,9 +759,10 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 	return ret;
 }
 
-static __init void perf_event_ibs_init(void)
+static __init int perf_event_ibs_init(void)
 {
 	struct attribute **attr = ibs_op_format_attrs;
+	int ret;
 
 	/*
 	 * Some chips fail to reset the fetch count when it is written; instead
@@ -773,7 +774,9 @@ static __init void perf_event_ibs_init(void)
 	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
 		perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
 
-	perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+	ret = perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+	if (ret)
+		return ret;
 
 	if (ibs_caps & IBS_CAPS_OPCNT) {
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
@@ -786,15 +789,35 @@ static __init void perf_event_ibs_init(void)
 		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
 	}
 
-	perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+	ret = perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+	if (ret)
+		goto err_op;
+
+	ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
+	if (ret)
+		goto err_nmi;
 
-	register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
 	pr_info("perf: AMD IBS detected (0x%08x)\n", ibs_caps);
+	return 0;
+
+err_nmi:
+	perf_pmu_unregister(&perf_ibs_op.pmu);
+	free_percpu(perf_ibs_op.pcpu);
+	perf_ibs_op.pcpu = NULL;
+err_op:
+	perf_pmu_unregister(&perf_ibs_fetch.pmu);
+	free_percpu(perf_ibs_fetch.pcpu);
+	perf_ibs_fetch.pcpu = NULL;
+
+	return ret;
 }
 
 #else /* defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD) */
 
-static __init void perf_event_ibs_init(void) { }
+static __init int perf_event_ibs_init(void)
+{
+	return 0;
+}
 
 #endif
 
@@ -1064,9 +1087,7 @@ static __init int amd_ibs_init(void)
 			  x86_pmu_amd_ibs_starting_cpu,
 			  x86_pmu_amd_ibs_dying_cpu);
 
-	perf_event_ibs_init();
-
-	return 0;
+	return perf_event_ibs_init();
 }
 
 /* Since we need the pci subsystem to init ibs we can't do this earlier: */
-- 
2.27.0



* [PATCH v2 2/8] perf/amd/ibs: Use ->is_visible callback for dynamic attributes
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 1/8] perf/amd/ibs: Cascade pmu init functions' return value Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-11 19:47   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering Ravi Bangoria
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

Currently, some attributes are added at build time whereas others are
added at boot time depending on IBS pmu capabilities. Instead, add all
attribute groups at build time and hide individual groups at boot time
using the more appropriate ->is_visible() callback.

Also, struct perf_ibs has a bunch of fields for pmu attributes that
only pass a pointer along and do nothing else. Remove them.
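
To make the idea concrete, here is a small userspace model of a
statically defined attribute group whose entries are filtered by a
visibility callback. It is only a sketch: the struct layout, the
callback signature (which takes the capability mask as a parameter
instead of reading a global) and the DEMO_CAPS_OPCNT value are
assumptions and do not match the kernel's sysfs types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical counterparts of the sysfs types. */
struct demo_attr {
	const char *name;
	unsigned int mode;	/* file mode, e.g. 0444 */
};

struct demo_group {
	struct demo_attr **attrs;	/* NULL-terminated array */
	/* Returns the mode to use, or 0 to hide the attribute. */
	unsigned int (*is_visible)(const struct demo_attr *attr,
				   unsigned int caps);
};

#define DEMO_CAPS_OPCNT	(1u << 0)	/* illustrative bit, not the real one */

/* Show the attribute only when the capability bit is set. */
unsigned int demo_cnt_ctl_is_visible(const struct demo_attr *attr,
				     unsigned int caps)
{
	return (caps & DEMO_CAPS_OPCNT) ? attr->mode : 0;
}

/* Counts how many attributes of the group would be exposed. */
int demo_count_visible(const struct demo_group *grp, unsigned int caps)
{
	int n = 0;

	assert(grp && grp->attrs);
	for (size_t i = 0; grp->attrs[i]; i++) {
		unsigned int mode = grp->is_visible ?
			grp->is_visible(grp->attrs[i], caps) :
			grp->attrs[i]->mode;

		if (mode)
			n++;
	}
	return n;
}
```

The group is fully described at build time; only the callback result
depends on what the hardware reports at boot.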

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 78 +++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 24 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 367ca899e6e8..785212b5dfd6 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -94,10 +94,6 @@ struct perf_ibs {
 	unsigned int			fetch_ignore_if_zero_rip : 1;
 	struct cpu_perf_ibs __percpu	*pcpu;
 
-	struct attribute		**format_attrs;
-	struct attribute_group		format_group;
-	const struct attribute_group	*attr_groups[2];
-
 	u64				(*get_count)(u64 config);
 };
 
@@ -518,16 +514,61 @@ static void perf_ibs_del(struct perf_event *event, int flags)
 
 static void perf_ibs_read(struct perf_event *event) { }
 
+/*
+ * We need to initialize with empty group if all attributes in the
+ * group are dynamic.
+ */
+static struct attribute *attrs_empty[] = {
+	NULL,
+};
+
+static struct attribute_group empty_format_group = {
+	.name = "format",
+	.attrs = attrs_empty,
+};
+
+static const struct attribute_group *empty_attr_groups[] = {
+	&empty_format_group,
+	NULL,
+};
+
 PMU_FORMAT_ATTR(rand_en,	"config:57");
 PMU_FORMAT_ATTR(cnt_ctl,	"config:19");
 
-static struct attribute *ibs_fetch_format_attrs[] = {
+static struct attribute *rand_en_attrs[] = {
 	&format_attr_rand_en.attr,
 	NULL,
 };
 
-static struct attribute *ibs_op_format_attrs[] = {
-	NULL,	/* &format_attr_cnt_ctl.attr if IBS_CAPS_OPCNT */
+static struct attribute_group group_rand_en = {
+	.name = "format",
+	.attrs = rand_en_attrs,
+};
+
+static const struct attribute_group *fetch_attr_groups[] = {
+	&group_rand_en,
+	NULL,
+};
+
+static umode_t
+cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_OPCNT ? attr->mode : 0;
+}
+
+static struct attribute *cnt_ctl_attrs[] = {
+	&format_attr_cnt_ctl.attr,
+	NULL,
+};
+
+static struct attribute_group group_cnt_ctl = {
+	.name = "format",
+	.attrs = cnt_ctl_attrs,
+	.is_visible = cnt_ctl_is_visible,
+};
+
+static const struct attribute_group *op_attr_update[] = {
+	&group_cnt_ctl,
 	NULL,
 };
 
@@ -551,7 +592,6 @@ static struct perf_ibs perf_ibs_fetch = {
 	.max_period		= IBS_FETCH_MAX_CNT << 4,
 	.offset_mask		= { MSR_AMD64_IBSFETCH_REG_MASK },
 	.offset_max		= MSR_AMD64_IBSFETCH_REG_COUNT,
-	.format_attrs		= ibs_fetch_format_attrs,
 
 	.get_count		= get_ibs_fetch_count,
 };
@@ -577,7 +617,6 @@ static struct perf_ibs perf_ibs_op = {
 	.max_period		= IBS_OP_MAX_CNT << 4,
 	.offset_mask		= { MSR_AMD64_IBSOP_REG_MASK },
 	.offset_max		= MSR_AMD64_IBSOP_REG_COUNT,
-	.format_attrs		= ibs_op_format_attrs,
 
 	.get_count		= get_ibs_op_count,
 };
@@ -739,17 +778,6 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 
 	perf_ibs->pcpu = pcpu;
 
-	/* register attributes */
-	if (perf_ibs->format_attrs[0]) {
-		memset(&perf_ibs->format_group, 0, sizeof(perf_ibs->format_group));
-		perf_ibs->format_group.name	= "format";
-		perf_ibs->format_group.attrs	= perf_ibs->format_attrs;
-
-		memset(&perf_ibs->attr_groups, 0, sizeof(perf_ibs->attr_groups));
-		perf_ibs->attr_groups[0]	= &perf_ibs->format_group;
-		perf_ibs->pmu.attr_groups	= perf_ibs->attr_groups;
-	}
-
 	ret = perf_pmu_register(&perf_ibs->pmu, name, -1);
 	if (ret) {
 		perf_ibs->pcpu = NULL;
@@ -761,7 +789,6 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 
 static __init int perf_event_ibs_init(void)
 {
-	struct attribute **attr = ibs_op_format_attrs;
 	int ret;
 
 	/*
@@ -774,14 +801,14 @@ static __init int perf_event_ibs_init(void)
 	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
 		perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
 
+	perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups;
+
 	ret = perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
 	if (ret)
 		return ret;
 
-	if (ibs_caps & IBS_CAPS_OPCNT) {
+	if (ibs_caps & IBS_CAPS_OPCNT)
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
-		*attr++ = &format_attr_cnt_ctl.attr;
-	}
 
 	if (ibs_caps & IBS_CAPS_OPCNTEXT) {
 		perf_ibs_op.max_period  |= IBS_OP_MAX_CNT_EXT_MASK;
@@ -789,6 +816,9 @@ static __init int perf_event_ibs_init(void)
 		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
 	}
 
+	perf_ibs_op.pmu.attr_groups = empty_attr_groups;
+	perf_ibs_op.pmu.attr_update = op_attr_update;
+
 	ret = perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
 	if (ret)
 		goto err_op;
-- 
2.27.0



* [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 1/8] perf/amd/ibs: Cascade pmu init functions' return value Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 2/8] perf/amd/ibs: Use ->is_visible callback for dynamic attributes Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-09 12:05   ` Peter Zijlstra
  2022-05-11 19:46   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 4/8] perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute Ravi Bangoria
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

IBS L3 miss filtering works by tagging an instruction on IBS counter
overflow and generating an NMI if the tagged instruction causes an L3
miss. Samples without an L3 miss are discarded and the counter is
reset with a random value (between 1 and 15 for the fetch pmu and
between 1 and 127 for the op pmu). This reduces sampling overhead when
the user is interested only in samples that miss in L3. One use case
for such filtered samples is feeding data to a page-migration daemon
in tiered memory systems.

Add support for L3 miss filtering in the IBS driver via a new pmu
attribute "l3missonly". Example usage:

  # perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5
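
For reference, l3missonly=1 maps to a single config bit: bit 59 for
ibs_fetch and bit 16 for ibs_op, as defined in the diff below. The
sketch that follows shows that mapping in isolation. The enum and the
helper are hypothetical; only the two bit definitions come from this
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit definitions as added to asm/perf_event.h by this patch. */
#define IBS_FETCH_L3MISSONLY	(1ULL << 59)
#define IBS_OP_L3MISSONLY	(1ULL << 16)

/* Hypothetical selector for the two IBS pmus. */
enum demo_ibs_pmu {
	DEMO_IBS_FETCH,
	DEMO_IBS_OP,
};

/*
 * Hypothetical helper: returns 'config' with the l3missonly bit of the
 * chosen pmu set (enable != 0) or cleared (enable == 0).
 */
uint64_t demo_set_l3missonly(enum demo_ibs_pmu pmu, uint64_t config,
			     int enable)
{
	uint64_t bit;

	assert(pmu == DEMO_IBS_FETCH || pmu == DEMO_IBS_OP);
	bit = (pmu == DEMO_IBS_FETCH) ? IBS_FETCH_L3MISSONLY
				      : IBS_OP_L3MISSONLY;

	return enable ? (config | bit) : (config & ~bit);
}
```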

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c         | 67 +++++++++++++++++++++++++++----
 arch/x86/include/asm/perf_event.h |  3 ++
 2 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 785212b5dfd6..52d2eb9ff19a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -534,22 +534,46 @@ static const struct attribute_group *empty_attr_groups[] = {
 
 PMU_FORMAT_ATTR(rand_en,	"config:57");
 PMU_FORMAT_ATTR(cnt_ctl,	"config:19");
+PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
+PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
+
+static umode_t
+zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_ZEN4IBSEXTENSIONS ? attr->mode : 0;
+}
 
 static struct attribute *rand_en_attrs[] = {
 	&format_attr_rand_en.attr,
 	NULL,
 };
 
+static struct attribute *fetch_l3missonly_attrs[] = {
+	&fetch_l3missonly.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_rand_en = {
 	.name = "format",
 	.attrs = rand_en_attrs,
 };
 
+static struct attribute_group group_fetch_l3missonly = {
+	.name = "format",
+	.attrs = fetch_l3missonly_attrs,
+	.is_visible = zen4_ibs_extensions_is_visible,
+};
+
 static const struct attribute_group *fetch_attr_groups[] = {
 	&group_rand_en,
 	NULL,
 };
 
+static const struct attribute_group *fetch_attr_update[] = {
+	&group_fetch_l3missonly,
+	NULL,
+};
+
 static umode_t
 cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
@@ -561,14 +585,26 @@ static struct attribute *cnt_ctl_attrs[] = {
 	NULL,
 };
 
+static struct attribute *op_l3missonly_attrs[] = {
+	&op_l3missonly.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_cnt_ctl = {
 	.name = "format",
 	.attrs = cnt_ctl_attrs,
 	.is_visible = cnt_ctl_is_visible,
 };
 
+static struct attribute_group group_op_l3missonly = {
+	.name = "format",
+	.attrs = op_l3missonly_attrs,
+	.is_visible = zen4_ibs_extensions_is_visible,
+};
+
 static const struct attribute_group *op_attr_update[] = {
 	&group_cnt_ctl,
+	&group_op_l3missonly,
 	NULL,
 };
 
@@ -787,10 +823,8 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 	return ret;
 }
 
-static __init int perf_event_ibs_init(void)
+static __init int perf_ibs_fetch_init(void)
 {
-	int ret;
-
 	/*
 	 * Some chips fail to reset the fetch count when it is written; instead
 	 * they need a 0-1 transition of IbsFetchEn.
@@ -801,12 +835,17 @@ static __init int perf_event_ibs_init(void)
 	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
 		perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
 
+	if (ibs_caps & IBS_CAPS_ZEN4IBSEXTENSIONS)
+		perf_ibs_fetch.config_mask |= IBS_FETCH_L3MISSONLY;
+
 	perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups;
+	perf_ibs_fetch.pmu.attr_update = fetch_attr_update;
 
-	ret = perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
-	if (ret)
-		return ret;
+	return perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+}
 
+static __init int perf_ibs_op_init(void)
+{
 	if (ibs_caps & IBS_CAPS_OPCNT)
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
 
@@ -816,10 +855,24 @@ static __init int perf_event_ibs_init(void)
 		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
 	}
 
+	if (ibs_caps & IBS_CAPS_ZEN4IBSEXTENSIONS)
+		perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
+
 	perf_ibs_op.pmu.attr_groups = empty_attr_groups;
 	perf_ibs_op.pmu.attr_update = op_attr_update;
 
-	ret = perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+	return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+}
+
+static __init int perf_event_ibs_init(void)
+{
+	int ret;
+
+	ret = perf_ibs_fetch_init();
+	if (ret)
+		return ret;
+
+	ret = perf_ibs_op_init();
 	if (ret)
 		goto err_op;
 
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index b06e4c573add..a24b637a6e1d 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -391,6 +391,7 @@ struct pebs_xmm {
 #define IBS_CAPS_OPBRNFUSE		(1U<<8)
 #define IBS_CAPS_FETCHCTLEXTD		(1U<<9)
 #define IBS_CAPS_OPDATA4		(1U<<10)
+#define IBS_CAPS_ZEN4IBSEXTENSIONS	(1U<<11)
 
 #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
 					 | IBS_CAPS_FETCHSAM	\
@@ -404,6 +405,7 @@ struct pebs_xmm {
 #define IBSCTL_LVT_OFFSET_MASK		0x0F
 
 /* IBS fetch bits/masks */
+#define IBS_FETCH_L3MISSONLY	(1ULL<<59)
 #define IBS_FETCH_RAND_EN	(1ULL<<57)
 #define IBS_FETCH_VAL		(1ULL<<49)
 #define IBS_FETCH_ENABLE	(1ULL<<48)
@@ -420,6 +422,7 @@ struct pebs_xmm {
 #define IBS_OP_CNT_CTL		(1ULL<<19)
 #define IBS_OP_VAL		(1ULL<<18)
 #define IBS_OP_ENABLE		(1ULL<<17)
+#define IBS_OP_L3MISSONLY	(1ULL<<16)
 #define IBS_OP_MAX_CNT		0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
 #define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
-- 
2.27.0



* [PATCH v2 4/8] perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
                   ` (2 preceding siblings ...)
  2022-05-09  4:49 ` [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-11 19:46   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 5/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

A PMU driver can advertise certain features via capability attributes
(the 'caps' sysfs directory), which userspace tools like perf can
consume. Add a zen4_ibs_extensions capability attribute for the IBS
pmus. The attribute is exposed when CPUID_Fn8000001B_EAX[11] is set.

With this patch on Zen4:

  $ ls /sys/bus/event_source/devices/ibs_op/caps
  zen4_ibs_extensions
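
A userspace consumer could check the capability along the following
lines. This is a minimal sketch, assuming the attribute file contains
the string "1" followed by a newline when present; the helper name is
hypothetical and perf itself uses its own pmu caps parsing code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the file at 'path' exists and its
 * first line is exactly "1", and 0 otherwise (including when the file
 * is missing, which is the case on hardware without the extensions).
 */
int demo_pmu_cap_enabled(const char *path)
{
	char buf[16] = "";
	FILE *fp;

	assert(path);
	fp = fopen(path, "r");
	if (!fp)
		return 0;

	if (!fgets(buf, sizeof(buf), fp))
		buf[0] = '\0';
	fclose(fp);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return strcmp(buf, "1") == 0;
}
```

On a system with this patch applied, the path to pass would be
/sys/bus/event_source/devices/ibs_op/caps/zen4_ibs_extensions.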

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 52d2eb9ff19a..12b0fd4a0328 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -527,8 +527,14 @@ static struct attribute_group empty_format_group = {
 	.attrs = attrs_empty,
 };
 
+static struct attribute_group empty_caps_group = {
+	.name = "caps",
+	.attrs = attrs_empty,
+};
+
 static const struct attribute_group *empty_attr_groups[] = {
 	&empty_format_group,
+	&empty_caps_group,
 	NULL,
 };
 
@@ -536,6 +542,7 @@ PMU_FORMAT_ATTR(rand_en,	"config:57");
 PMU_FORMAT_ATTR(cnt_ctl,	"config:19");
 PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
 PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
+PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
 
 static umode_t
 zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
@@ -553,6 +560,11 @@ static struct attribute *fetch_l3missonly_attrs[] = {
 	NULL,
 };
 
+static struct attribute *zen4_ibs_extensions_attrs[] = {
+	&zen4_ibs_extensions.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_rand_en = {
 	.name = "format",
 	.attrs = rand_en_attrs,
@@ -564,13 +576,21 @@ static struct attribute_group group_fetch_l3missonly = {
 	.is_visible = zen4_ibs_extensions_is_visible,
 };
 
+static struct attribute_group group_zen4_ibs_extensions = {
+	.name = "caps",
+	.attrs = zen4_ibs_extensions_attrs,
+	.is_visible = zen4_ibs_extensions_is_visible,
+};
+
 static const struct attribute_group *fetch_attr_groups[] = {
 	&group_rand_en,
+	&empty_caps_group,
 	NULL,
 };
 
 static const struct attribute_group *fetch_attr_update[] = {
 	&group_fetch_l3missonly,
+	&group_zen4_ibs_extensions,
 	NULL,
 };
 
@@ -605,6 +625,7 @@ static struct attribute_group group_op_l3missonly = {
 static const struct attribute_group *op_attr_update[] = {
 	&group_cnt_ctl,
 	&group_op_l3missonly,
+	&group_zen4_ibs_extensions,
 	NULL,
 };
 
-- 
2.27.0



* [PATCH v2 5/8] perf record ibs: Warn about sampling period skew
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
                   ` (3 preceding siblings ...)
  2022-05-09  4:49 ` [PATCH v2 4/8] perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-16 13:22   ` Arnaldo Carvalho de Melo
  2022-05-09  4:49 ` [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities Ravi Bangoria
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

When IBS L3 miss filtering is enabled, samples without an L3 miss are
discarded and the counter is reset with a random value (between 1 and
15 for the fetch pmu and between 1 and 127 for the op pmu). This causes
sampling period skew, and there is no way to reconstruct the aggregated
sampling period. So print a warning at perf record if the user sets
l3missonly=1.

Ex:
  # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
  WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
  and tagged operation does not cause L3 Miss. This causes sampling period skew.
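
The check reduces to "warn at most once if the l3missonly bit is set in
the event config". The sketch below models only that logic in plain
userspace C; the function name and its is_fetch_pmu parameter are
hypothetical, and the vendor and pmu lookups done by the real code are
left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model of the warn-once check. 'is_fetch_pmu' selects
 * which l3missonly bit applies (bit 59 for ibs_fetch, bit 16 for
 * ibs_op). Returns 1 if the warning was printed by this call, else 0.
 */
int demo_warn_l3miss_skew(int is_fetch_pmu, uint64_t config)
{
	static int warned_once;	/* persists across calls */
	uint64_t bit = is_fetch_pmu ? (1ULL << 59) : (1ULL << 16);

	if (warned_once || !(config & bit))
		return 0;

	fprintf(stderr,
"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n"
"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n");
	warned_once = 1;
	return 1;
}
```

Because the flag is static, a session with several l3missonly events
still prints the message only once.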

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/arch/x86/util/evsel.c | 34 ++++++++++++++++++++++++++++++++
 tools/perf/util/evsel.c          |  7 +++++++
 tools/perf/util/evsel.h          |  1 +
 3 files changed, 42 insertions(+)

diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index ac2899a25b7a..6399faa70a88 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -4,6 +4,8 @@
 #include "util/evsel.h"
 #include "util/env.h"
 #include "linux/string.h"
+#include "util/pmu.h"
+#include "util/debug.h"
 
 void arch_evsel__set_sample_weight(struct evsel *evsel)
 {
@@ -29,3 +31,35 @@ void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr)
 
 	free(env.cpuid);
 }
+
+static void ibs_l3miss_warn(void)
+{
+	pr_warning(
+"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n"
+"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n");
+}
+
+void arch_evsel__warn_ambiguity(struct evsel *evsel, struct perf_event_attr *attr)
+{
+	struct perf_env *env = evsel__env(evsel);
+	struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
+	struct perf_pmu *ibs_fetch_pmu = perf_pmu__find("ibs_fetch");
+	struct perf_pmu *ibs_op_pmu = perf_pmu__find("ibs_op");
+	static int warned_once;
+
+	if (warned_once || !perf_env__cpuid(env) || !env->cpuid ||
+	    !strstarts(env->cpuid, "AuthenticAMD") || !evsel_pmu)
+		return;
+
+	if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
+		if (attr->config & (1ULL << 59)) {
+			ibs_l3miss_warn();
+			warned_once = 1;
+		}
+	} else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
+		if (attr->config & (1ULL << 16)) {
+			ibs_l3miss_warn();
+			warned_once = 1;
+		}
+	}
+}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2a1729e7aee4..4f8b72d4a521 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1064,6 +1064,11 @@ void __weak arch_evsel__fixup_new_cycles(struct perf_event_attr *attr __maybe_un
 {
 }
 
+void __weak arch_evsel__warn_ambiguity(struct evsel *evsel __maybe_unused,
+				       struct perf_event_attr *attr __maybe_unused)
+{
+}
+
 static void evsel__set_default_freq_period(struct record_opts *opts,
 					   struct perf_event_attr *attr)
 {
@@ -1339,6 +1344,8 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	 */
 	if (evsel__is_dummy_event(evsel))
 		evsel__reset_sample_bit(evsel, BRANCH_STACK);
+
+	arch_evsel__warn_ambiguity(evsel, attr);
 }
 
 int evsel__set_filter(struct evsel *evsel, const char *filter)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 041b42d33bf5..195ae30ec45b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -281,6 +281,7 @@ void evsel__set_sample_id(struct evsel *evsel, bool use_sample_identifier);
 
 void arch_evsel__set_sample_weight(struct evsel *evsel);
 void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr);
+void arch_evsel__warn_ambiguity(struct evsel *evsel, struct perf_event_attr *attr);
 
 int evsel__set_filter(struct evsel *evsel, const char *filter);
 int evsel__append_tp_filter(struct evsel *evsel, const char *filter);
-- 
2.27.0



* [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
                   ` (4 preceding siblings ...)
  2022-05-09  4:49 ` [PATCH v2 5/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-16  4:15   ` Ravi Bangoria
  2022-05-16 13:28   ` Arnaldo Carvalho de Melo
  2022-05-09  4:49 ` [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump Ravi Bangoria
  2022-05-09  4:49 ` [PATCH v2 8/8] perf ibs: Fix comment Ravi Bangoria
  7 siblings, 2 replies; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

Pmus advertise their capabilities via sysfs attribute files, but the
perf tool currently parses only core (cpu) pmu capabilities. Add
support for parsing non-cpu pmu capabilities.
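
Capabilities are stored as "name=value" strings, so a lookup amounts to
a prefix match followed by a check for '='. The sketch below shows that
lookup on a plain array of strings. It is not the implementation in
this patch: the function name and signature are hypothetical, and it
compares in place instead of allocating a temporary "name=" buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup: searches 'caps' (an array of 'nr_caps' strings
 * of the form "name=value") for an entry whose name is exactly 'name'.
 * Returns a pointer to the value part, or NULL if there is no match.
 */
const char *demo_find_cap(const char *const caps[], size_t nr_caps,
			  const char *name)
{
	size_t len;

	assert(caps && name);
	len = strlen(name);

	for (size_t i = 0; i < nr_caps; i++) {
		/* Require "name" followed immediately by '='. */
		if (!strncmp(caps[i], name, len) && caps[i][len] == '=')
			return caps[i] + len + 1;
	}
	return NULL;
}
```

Checking for '=' right after the name keeps a query for "zen4" from
matching an entry such as "zen4_ibs_extensions=1".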

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 .../Documentation/perf.data-file-format.txt   |  18 ++
 tools/perf/util/env.c                         |  48 +++-
 tools/perf/util/env.h                         |  11 +
 tools/perf/util/header.c                      | 211 ++++++++++++++++++
 tools/perf/util/header.h                      |   1 +
 tools/perf/util/pmu.c                         |  15 +-
 tools/perf/util/pmu.h                         |   2 +
 7 files changed, 301 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index f56d0e0fbff6..dea3acb36558 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -435,6 +435,24 @@ struct {
 	} [nr_pmu];
 };
 
+	HEADER_PMU_CAPS = 32,
+
+	List of pmu capabilities (except cpu pmu which is already
+	covered by HEADER_CPU_PMU_CAPS)
+
+struct {
+	u32 nr_pmus;
+	struct {
+		u8 core_type;	/* For hybrid topology */
+		char pmu_name[];
+		u16 nr_caps;
+		struct {
+			char name[];
+			char value[];
+		} [nr_caps];
+	} [nr_pmus];
+};
+
 	other bits are reserved and should ignored for now
 	HEADER_FEAT_BITS	= 256,
 
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 579e44c59914..928633f07086 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -179,7 +179,7 @@ static void perf_env__purge_bpf(struct perf_env *env __maybe_unused)
 
 void perf_env__exit(struct perf_env *env)
 {
-	int i;
+	int i, j;
 
 	perf_env__purge_bpf(env);
 	perf_env__purge_cgroups(env);
@@ -222,6 +222,14 @@ void perf_env__exit(struct perf_env *env)
 		zfree(&env->hybrid_cpc_nodes[i].pmu_name);
 	}
 	zfree(&env->hybrid_cpc_nodes);
+
+	for (i = 0; i < env->nr_pmus_with_caps; i++) {
+		zfree(&env->env_pmu_caps[i].pmu_name);
+		for (j = 0; j < env->env_pmu_caps[i].nr_caps; j++)
+			zfree(&env->env_pmu_caps[i].pmu_caps[j]);
+		zfree(&env->env_pmu_caps[i].pmu_caps);
+	}
+	zfree(&env->env_pmu_caps);
 }
 
 void perf_env__init(struct perf_env *env)
@@ -527,3 +535,41 @@ int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu)
 
 	return cpu.cpu >= 0 && cpu.cpu < env->nr_numa_map ? env->numa_map[cpu.cpu] : -1;
 }
+
+char *perf_env__find_pmu_cap(struct perf_env *env, u8 core_type,
+			     const char *pmu_name, const char *cap)
+{
+	struct env_pmu_caps *env_pmu_caps = env->env_pmu_caps;
+	char *cap_eq;
+	int cap_size;
+	char **ptr;
+	int i, j;
+
+	if (!pmu_name || !cap)
+		return NULL;
+
+	cap_size = strlen(cap);
+	cap_eq = zalloc(cap_size + 2);
+	if (!cap_eq)
+		return NULL;
+
+	memcpy(cap_eq, cap, cap_size);
+	cap_eq[cap_size] = '=';
+
+	for (i = 0; i < env->nr_pmus_with_caps; i++) {
+		if (env_pmu_caps[i].core_type != core_type ||
+		    strcmp(env_pmu_caps[i].pmu_name, pmu_name))
+			continue;
+
+		ptr = env_pmu_caps[i].pmu_caps;
+
+		for (j = 0; j < env_pmu_caps[i].nr_caps; j++) {
+			if (!strncmp(ptr[j], cap_eq, cap_size + 1)) {
+				free(cap_eq);
+				return &ptr[j][cap_size + 1];
+			}
+		}
+	}
+	free(cap_eq);
+	return NULL;
+}
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index a3541f98e1fc..2b767f4ae6e0 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -50,6 +50,13 @@ struct hybrid_cpc_node {
 	char            *pmu_name;
 };
 
+struct env_pmu_caps {
+	u8	core_type;
+	char	*pmu_name;
+	u16	nr_caps;
+	char	**pmu_caps;
+};
+
 struct perf_env {
 	char			*hostname;
 	char			*os_release;
@@ -75,6 +82,7 @@ struct perf_env {
 	int			nr_cpu_pmu_caps;
 	int			nr_hybrid_nodes;
 	int			nr_hybrid_cpc_nodes;
+	int			nr_pmus_with_caps;
 	char			*cmdline;
 	const char		**cmdline_argv;
 	char			*sibling_cores;
@@ -95,6 +103,7 @@ struct perf_env {
 	unsigned long long	 memory_bsize;
 	struct hybrid_node	*hybrid_nodes;
 	struct hybrid_cpc_node	*hybrid_cpc_nodes;
+	struct env_pmu_caps	*env_pmu_caps;
 #ifdef HAVE_LIBBPF_SUPPORT
 	/*
 	 * bpf_info_lock protects bpf rbtrees. This is needed because the
@@ -172,4 +181,6 @@ bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node);
 struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id);
 
 int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu);
+char *perf_env__find_pmu_cap(struct perf_env *env, u8 core_type,
+			     const char *pmu_name, const char *cap);
 #endif /* __PERF_ENV_H */
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index a27132e5a5ef..23d89dbfcd96 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -217,6 +217,19 @@ static int __do_read(struct feat_fd *ff, void *addr, ssize_t size)
 	return __do_read_buf(ff, addr, size);
 }
 
+static int do_read_u16(struct feat_fd *ff, u16 *addr)
+{
+	int ret;
+
+	ret = __do_read(ff, addr, sizeof(*addr));
+	if (ret)
+		return ret;
+
+	if (ff->ph->needs_swap)
+		*addr = bswap_16(*addr);
+	return 0;
+}
+
 static int do_read_u32(struct feat_fd *ff, u32 *addr)
 {
 	int ret;
@@ -1580,6 +1593,77 @@ static int write_hybrid_cpu_pmu_caps(struct feat_fd *ff,
 	return 0;
 }
 
+/*
+ * File format:
+ *
+ * struct {
+ *	u32 nr_pmus;
+ *	struct {
+ *		u8 core_type;
+ *		char pmu_name[];
+ *		u16 nr_caps;
+ *		struct {
+ *			char name[];
+ *			char value[];
+ *		} [nr_caps];
+ *	} [nr_pmus];
+ * };
+ */
+static int write_pmu_caps(struct feat_fd *ff, struct evlist *evlist __maybe_unused)
+{
+	struct perf_pmu_caps *caps = NULL;
+	struct perf_pmu *pmu = NULL;
+	u8 core_type = 0;
+	u32 nr_pmus = 0;
+	int ret;
+
+	while ((pmu = perf_pmu__scan(pmu))) {
+		if (!pmu->name || !strncmp(pmu->name, "cpu", 3) ||
+		    perf_pmu__caps_parse(pmu) <= 0)
+			continue;
+		nr_pmus++;
+	}
+
+	ret = do_write(ff, &nr_pmus, sizeof(nr_pmus));
+	if (ret < 0)
+		return ret;
+
+	if (!nr_pmus)
+		return 0;
+
+	while ((pmu = perf_pmu__scan(pmu))) {
+		if (!pmu->name || !strncmp(pmu->name, "cpu", 3) || !pmu->nr_caps)
+			continue;
+
+		/*
+		 * Currently core_type is always set to 0. But it can be
+		 * used in future for hybrid topology pmus.
+		 */
+		ret = do_write(ff, &core_type, sizeof(core_type));
+		if (ret < 0)
+			return ret;
+
+		ret = do_write_string(ff, pmu->name);
+		if (ret < 0)
+			return ret;
+
+		ret = do_write(ff, &pmu->nr_caps, sizeof(pmu->nr_caps));
+		if (ret < 0)
+			return ret;
+
+		list_for_each_entry(caps, &pmu->caps, list) {
+			ret = do_write_string(ff, caps->name);
+			if (ret < 0)
+				return ret;
+
+			ret = do_write_string(ff, caps->value);
+			if (ret < 0)
+				return ret;
+		}
+	}
+	return 0;
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -2209,6 +2293,31 @@ static void print_mem_topology(struct feat_fd *ff, FILE *fp)
 	}
 }
 
+static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
+{
+	struct env_pmu_caps *env_pmu_caps = ff->ph->env.env_pmu_caps;
+	int nr_pmus_with_caps = ff->ph->env.nr_pmus_with_caps;
+	const char *delimiter = "";
+	char **ptr;
+	int i, j;
+
+	if (!nr_pmus_with_caps)
+		return;
+
+	for (i = 0; i < nr_pmus_with_caps; i++) {
+		fprintf(fp, "# %s pmu capabilities: ", env_pmu_caps[i].pmu_name);
+
+		ptr = env_pmu_caps[i].pmu_caps;
+
+		delimiter = "";
+		for (j = 0; j < env_pmu_caps[i].nr_caps; j++) {
+			fprintf(fp, "%s%s", delimiter, ptr[j]);
+			delimiter = ", ";
+		}
+		fprintf(fp, "\n");
+	}
+}
+
 static int __event_process_build_id(struct perf_record_header_build_id *bev,
 				    char *filename,
 				    struct perf_session *session)
@@ -3319,6 +3428,107 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
 	return ret;
 }
 
+static int __process_pmu_caps(struct feat_fd *ff, struct env_pmu_caps *env_pmu_caps)
+{
+	u16 nr_caps = env_pmu_caps->nr_caps;
+	int name_size, value_size;
+	char *name, *value, *ptr;
+	u16 i;
+
+	env_pmu_caps->pmu_caps = zalloc(sizeof(char *) * nr_caps);
+	if (!env_pmu_caps->pmu_caps)
+		return -1;
+
+	for (i = 0; i < nr_caps; i++) {
+		name = do_read_string(ff);
+		if (!name)
+			goto error;
+
+		value = do_read_string(ff);
+		if (!value)
+			goto free_name;
+
+		name_size = strlen(name);
+		value_size = strlen(value);
+		ptr = zalloc(sizeof(char) * (name_size + value_size + 2));
+		if (!ptr)
+			goto free_value;
+
+		memcpy(ptr, name, name_size);
+		ptr[name_size] = '=';
+		memcpy(ptr + name_size + 1, value, value_size);
+		env_pmu_caps->pmu_caps[i] = ptr;
+
+		free(value);
+		free(name);
+	}
+	return 0;
+
+free_value:
+	free(value);
+free_name:
+	free(name);
+error:
+	for (; i > 0; i--)
+		free(env_pmu_caps->pmu_caps[i - 1]);
+	free(env_pmu_caps->pmu_caps);
+	return -1;
+}
+
+static int process_pmu_caps(struct feat_fd *ff, void *data __maybe_unused)
+{
+	struct env_pmu_caps *env_pmu_caps;
+	u32 nr_pmus;
+	u32 i;
+	u16 j;
+
+	ff->ph->env.nr_pmus_with_caps = 0;
+	ff->ph->env.env_pmu_caps = NULL;
+
+	if (do_read_u32(ff, &nr_pmus))
+		return -1;
+
+	if (!nr_pmus)
+		return 0;
+
+	env_pmu_caps = zalloc(sizeof(struct env_pmu_caps) * nr_pmus);
+	if (!env_pmu_caps)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_pmus; i++) {
+		if (__do_read(ff, &env_pmu_caps[i].core_type, sizeof(env_pmu_caps[i].core_type)))
+			goto error;
+
+		env_pmu_caps[i].pmu_name = do_read_string(ff);
+		if (!env_pmu_caps[i].pmu_name)
+			goto error;
+
+		if (do_read_u16(ff, &env_pmu_caps[i].nr_caps))
+			goto free_pmu_name;
+
+		if (!__process_pmu_caps(ff, &env_pmu_caps[i]))
+			continue;
+
+free_pmu_name:
+		free(env_pmu_caps[i].pmu_name);
+		goto error;
+	}
+
+	ff->ph->env.nr_pmus_with_caps = nr_pmus;
+	ff->ph->env.env_pmu_caps = env_pmu_caps;
+	return 0;
+
+error:
+	for (; i > 0; i--) {
+		free(env_pmu_caps[i - 1].pmu_name);
+		for (j = 0; j < env_pmu_caps[i - 1].nr_caps; j++)
+			free(env_pmu_caps[i - 1].pmu_caps[j]);
+		free(env_pmu_caps[i - 1].pmu_caps);
+	}
+	free(env_pmu_caps);
+	return -1;
+}
+
 #define FEAT_OPR(n, func, __full_only) \
 	[HEADER_##n] = {					\
 		.name	    = __stringify(n),			\
@@ -3382,6 +3592,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPR(CLOCK_DATA,	clock_data,	false),
 	FEAT_OPN(HYBRID_TOPOLOGY,	hybrid_topology,	true),
 	FEAT_OPR(HYBRID_CPU_PMU_CAPS,	hybrid_cpu_pmu_caps,	false),
+	FEAT_OPR(PMU_CAPS,	pmu_caps,	false),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 0eb4bc29a5a4..e9a067bb8b9e 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -47,6 +47,7 @@ enum {
 	HEADER_CLOCK_DATA,
 	HEADER_HYBRID_TOPOLOGY,
 	HEADER_HYBRID_CPU_PMU_CAPS,
+	HEADER_PMU_CAPS,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 9a1c7e63e663..8d599acb7569 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1890,16 +1890,22 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 	const char *sysfs = sysfs__mountpoint();
 	DIR *caps_dir;
 	struct dirent *evt_ent;
-	int nr_caps = 0;
+
+	if (pmu->caps_initialized)
+		return pmu->nr_caps;
 
 	if (!sysfs)
 		return -1;
 
+	pmu->nr_caps = 0;
+
 	snprintf(caps_path, PATH_MAX,
 		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
 
-	if (stat(caps_path, &st) < 0)
+	if (stat(caps_path, &st) < 0) {
+		pmu->caps_initialized = true;
 		return 0;	/* no error if caps does not exist */
+	}
 
 	caps_dir = opendir(caps_path);
 	if (!caps_dir)
@@ -1926,13 +1932,14 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 			continue;
 		}
 
-		nr_caps++;
+		pmu->nr_caps++;
 		fclose(file);
 	}
 
 	closedir(caps_dir);
 
-	return nr_caps;
+	pmu->caps_initialized = true;
+	return pmu->nr_caps;
 }
 
 void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config,
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 541889fa9f9c..593005e68bea 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -46,6 +46,8 @@ struct perf_pmu {
 	struct perf_cpu_map *cpus;
 	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
 	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+	bool caps_initialized;
+	u16 nr_caps;
 	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
 	struct list_head list;    /* ELEM */
 	struct list_head hybrid_list;
-- 
2.27.0
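
Note on the lookup added in env.c above: capabilities are kept as
"name=value" strings, and perf_env__find_pmu_cap() returns a pointer to
the value part. The following is only a minimal sketch of that idea and
not the code from the patch. The helper name find_cap_value() is
hypothetical, and unlike the patch it compares in place instead of
allocating a temporary "cap=" string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): search an array of
 * "name=value" strings for "cap" and return a pointer to the value
 * part, or NULL when the capability is not present.
 */
static const char *find_cap_value(const char *const *caps, size_t nr_caps,
				  const char *cap)
{
	size_t cap_len, i;

	if (!caps || !cap)
		return NULL;

	cap_len = strlen(cap);
	for (i = 0; i < nr_caps; i++) {
		/* The whole name must match and be followed by '='. */
		if (!strncmp(caps[i], cap, cap_len) && caps[i][cap_len] == '=')
			return caps[i] + cap_len + 1;
	}
	return NULL;
}
```

Requiring the '=' right after the name keeps a query for "zen4" from
matching "zen4_ibs_extensions=1".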



* [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
                   ` (5 preceding siblings ...)
  2022-05-09  4:49 ` [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-16 13:29   ` Arnaldo Carvalho de Melo
  2022-05-09  4:49 ` [PATCH v2 8/8] perf ibs: Fix comment Ravi Bangoria
  7 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

IBS support has been enhanced with two new features in the upcoming
uarch: 1. DataSrc extension and 2. L3 miss filtering. An additional set
of bits has been introduced in the IBS registers to expose these
features. Interpret those bits in the perf report/script raw dump.

IBS op pmu ex:
  $ sudo ./perf record -c 130 -a -e ibs_op/l3missonly=1/ --raw-samples
  $ sudo ./perf report -D
  ...
  ibs_op_ctl:     0000004500070008 MaxCnt       128 L3MissOnly 1 En 1
        Val 1 CntCtl 0=cycles CurCnt        69
  ibs_op_data:    0000000000710002 CompToRetCtr     2 TagToRetCtr   113
        BrnRet 0  RipInvalid 0 BrnFuse 0 Microcode 0
  ibs_op_data2:   0000000000000002 CacheHitSt 0=M-state RmtNode 0
        DataSrc 2=A peer cache in a near CCX
  ibs_op_data3:   000000681d1700a1 LdOp 1 StOp 0 DcL1TlbMiss 0
        DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 1 DcL2TlbHit2M 0
        DcMiss 1 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0
        DcMissNoMabAlloc 1 DcLinAddrValid 1 DcPhyAddrValid 1
        DcL2TlbHit1G 0 L2Miss 1 SwPf 0 OpMemWidth  8 bytes
        OpDcMissOpenMemReqs  7 DcMissLat   104 TlbRefillLat     0

IBS Fetch pmu ex:
  $ sudo ./perf record -c 130 -a -e ibs_fetch/l3missonly=1/ --raw-samples
  $ sudo ./perf report -D
  ...
  ibs_fetch_ctl:  3c1f00c700080008 MaxCnt     128 Cnt     128 Lat   199
        En 1 Val 1 Comp 1 IcMiss 1 PhyAddrValid        1 L1TlbPgSz 4KB
        L1TlbMiss 0 L2TlbMiss 0 RandEn 0 L2Miss 1 L3MissOnly 1
        FetchOcMiss 1 FetchL3Miss 1
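
The raw values in the two examples above can be cross-checked by hand.
The following is a rough sketch using plain shifts and masks rather than
the bitfield unions from amd-ibs.h. All function names are hypothetical.
The positions of bits 16 and 58-61 come from this patch; the positions
of the count and latency fields are assumed from the existing header.

```c
#include <stdint.h>

/* Return 'width' bits of 'reg' starting at bit 'lo' (width < 64). */
static uint64_t ibs_bits(uint64_t reg, unsigned int lo, unsigned int width)
{
	return (reg >> lo) & ((1ULL << width) - 1);
}

/* IbsOpCtl: MaxCnt is kept in units of 16, split over bits 0-15 and 20-26. */
static uint64_t op_max_cnt(uint64_t reg)
{
	return ((ibs_bits(reg, 20, 7) << 16) | ibs_bits(reg, 0, 16)) << 4;
}

/* IbsOpCtl: bit 16 is L3MissOnly, bits 32-58 hold the current count. */
static uint64_t op_l3_miss_only(uint64_t reg) { return ibs_bits(reg, 16, 1); }
static uint64_t op_cur_cnt(uint64_t reg)      { return ibs_bits(reg, 32, 27); }

/* IbsFetchCtl: MaxCnt and Cnt are in units of 16, Lat is a plain count. */
static uint64_t fetch_max_cnt(uint64_t reg)   { return ibs_bits(reg, 0, 16) << 4; }
static uint64_t fetch_cnt(uint64_t reg)       { return ibs_bits(reg, 16, 16) << 4; }
static uint64_t fetch_lat(uint64_t reg)       { return ibs_bits(reg, 32, 16); }

/* IbsFetchCtl: bits 58-61 as laid out in union ibs_fetch_ctl. */
static uint64_t fetch_l2_miss(uint64_t reg)      { return ibs_bits(reg, 58, 1); }
static uint64_t fetch_l3_miss_only(uint64_t reg) { return ibs_bits(reg, 59, 1); }
static uint64_t fetch_oc_miss(uint64_t reg)      { return ibs_bits(reg, 60, 1); }
static uint64_t fetch_l3_miss(uint64_t reg)      { return ibs_bits(reg, 61, 1); }
```

For ibs_op_ctl 0x0000004500070008 this gives MaxCnt 128, L3MissOnly 1
and CurCnt 69; for ibs_fetch_ctl 0x3c1f00c700080008 it gives MaxCnt 128,
Cnt 128, Lat 199 and all four of bits 58-61 set, in line with the dumps.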

With the DataSrc extensions, the source of data can be decoded among:
 - Local L3 or other L1/L2 in CCX.
 - A peer cache in a near CCX.
 - Data returned from DRAM.
 - A peer cache in a far CCX.
 - DRAM address map with "long latency" bit set.
 - Data returned from MMIO/Config/PCI/APIC.
 - Extension Memory (S-Link, GenZ, etc - identified by the CS target
    and/or address map at DF's choice).
 - Peer Agent Memory.
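
The DataSrc value is split across two fields of IbsOpData2 (bits 0-2
and bits 6-7). A minimal sketch of how the two halves combine and map to
the list above, assuming the encodings used by the table in this patch;
the function names are hypothetical and the register is read with
shifts instead of the union:

```c
#include <stddef.h>
#include <stdint.h>

/* Combine DataSrc low (bits 0-2) and high (bits 6-7) into one value. */
static unsigned int ibs_data_src(uint64_t op_data2)
{
	unsigned int lo = op_data2 & 0x7;
	unsigned int hi = (op_data2 >> 6) & 0x3;

	return (hi << 3) | lo;
}

/* Return a description for a DataSrc value, or NULL if nothing applies. */
static const char *ibs_data_src_name(unsigned int data_src)
{
	static const char * const names[] = {
		[1]  = "Local L3 or other L1/L2 in CCX",
		[2]  = "A peer cache in a near CCX",
		[3]  = "Data returned from DRAM",
		[5]  = "A peer cache in a far CCX",
		[6]  = "DRAM address map with \"long latency\" bit set",
		[7]  = "Data returned from MMIO/Config/PCI/APIC",
		[8]  = "Extension Memory (S-Link, GenZ, etc)",
		[12] = "Peer Agent Memory",
	};

	/* Unlisted slots stay NULL: 0, the reserved values, and 13-31. */
	if (data_src >= sizeof(names) / sizeof(names[0]))
		return NULL;
	return names[data_src];
}
```

With ibs_op_data2 0x2 from the example, ibs_data_src() yields 2, i.e. a
peer cache in a near CCX. A raw value of 0x44 (low 4, high 1) would
yield 12, Peer Agent Memory.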

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/include/asm/amd-ibs.h       | 16 ++++---
 tools/arch/x86/include/asm/amd-ibs.h | 16 ++++---
 tools/perf/util/amd-sample-raw.c     | 68 ++++++++++++++++++++++++----
 3 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index 46e1df45efc0..b40b2d4ea2ac 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -29,7 +29,10 @@ union ibs_fetch_ctl {
 			rand_en:1,	/* 57: random tagging enable */
 			fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
 					 *      (needs IbsFetchComp) */
-			reserved:5;	/* 59-63: reserved */
+			l3_miss_only:1,	/* 59: Collect L3 miss samples only */
+			fetch_oc_miss:1,/* 60: Op cache miss for the sampled fetch */
+			fetch_l3_miss:1,/* 61: L3 cache miss for the sampled fetch */
+			reserved:2;	/* 62-63: reserved */
 	};
 };
 
@@ -38,14 +41,14 @@ union ibs_op_ctl {
 	__u64 val;
 	struct {
 		__u64	opmaxcnt:16,	/* 0-15: periodic op max. count */
-			reserved0:1,	/* 16: reserved */
+			l3_miss_only:1,	/* 16: Collect L3 miss samples only */
 			op_en:1,	/* 17: op sampling enable */
 			op_val:1,	/* 18: op sample valid */
 			cnt_ctl:1,	/* 19: periodic op counter control */
 			opmaxcnt_ext:7,	/* 20-26: upper 7 bits of periodic op maximum count */
-			reserved1:5,	/* 27-31: reserved */
+			reserved0:5,	/* 27-31: reserved */
 			opcurcnt:27,	/* 32-58: periodic op counter current count */
-			reserved2:5;	/* 59-63: reserved */
+			reserved1:5;	/* 59-63: reserved */
 	};
 };
 
@@ -71,11 +74,12 @@ union ibs_op_data {
 union ibs_op_data2 {
 	__u64 val;
 	struct {
-		__u64	data_src:3,	/* 0-2: data source */
+		__u64	data_src_lo:3,	/* 0-2: data source low */
 			reserved0:1,	/* 3: reserved */
 			rmt_node:1,	/* 4: destination node */
 			cache_hit_st:1,	/* 5: cache hit state */
-			reserved1:57;	/* 5-63: reserved */
+			data_src_hi:2,	/* 6-7: data source high */
+			reserved1:56;	/* 8-63: reserved */
 	};
 };
 
diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 174e7d83fcbd..21e01cf6162e 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -29,7 +29,10 @@ union ibs_fetch_ctl {
 			rand_en:1,	/* 57: random tagging enable */
 			fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
 					 *      (needs IbsFetchComp) */
-			reserved:5;	/* 59-63: reserved */
+			l3_miss_only:1,	/* 59: Collect L3 miss samples only */
+			fetch_oc_miss:1,/* 60: Op cache miss for the sampled fetch */
+			fetch_l3_miss:1,/* 61: L3 cache miss for the sampled fetch */
+			reserved:2;	/* 62-63: reserved */
 	};
 };
 
@@ -38,14 +41,14 @@ union ibs_op_ctl {
 	__u64 val;
 	struct {
 		__u64	opmaxcnt:16,	/* 0-15: periodic op max. count */
-			reserved0:1,	/* 16: reserved */
+			l3_miss_only:1,	/* 16: Collect L3 miss samples only */
 			op_en:1,	/* 17: op sampling enable */
 			op_val:1,	/* 18: op sample valid */
 			cnt_ctl:1,	/* 19: periodic op counter control */
 			opmaxcnt_ext:7,	/* 20-26: upper 7 bits of periodic op maximum count */
-			reserved1:5,	/* 27-31: reserved */
+			reserved0:5,	/* 27-31: reserved */
 			opcurcnt:27,	/* 32-58: periodic op counter current count */
-			reserved2:5;	/* 59-63: reserved */
+			reserved1:5;	/* 59-63: reserved */
 	};
 };
 
@@ -71,11 +74,12 @@ union ibs_op_data {
 union ibs_op_data2 {
 	__u64 val;
 	struct {
-		__u64	data_src:3,	/* 0-2: data source */
+		__u64	data_src_lo:3,	/* 0-2: data source low */
 			reserved0:1,	/* 3: reserved */
 			rmt_node:1,	/* 4: destination node */
 			cache_hit_st:1,	/* 5: cache hit state */
-			reserved1:57;	/* 5-63: reserved */
+			data_src_hi:2,	/* 6-7: data source high */
+			reserved1:56;	/* 8-63: reserved */
 	};
 };
 
diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index d19d765195c5..63303f583bc0 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -18,6 +18,7 @@
 #include "pmu-events/pmu-events.h"
 
 static u32 cpu_family, cpu_model, ibs_fetch_type, ibs_op_type;
+static bool zen4_ibs_extensions;
 
 static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
 {
@@ -39,6 +40,7 @@ static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
 	};
 	const char *ic_miss_str = NULL;
 	const char *l1tlb_pgsz_str = NULL;
+	char l3_miss_str[sizeof(" L3MissOnly _ FetchOcMiss _ FetchL3Miss _")] = "";
 
 	if (cpu_family == 0x19 && cpu_model < 0x10) {
 		/*
@@ -53,12 +55,19 @@ static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
 		ic_miss_str = ic_miss_strs[reg.ic_miss];
 	}
 
+	if (zen4_ibs_extensions) {
+		snprintf(l3_miss_str, sizeof(l3_miss_str),
+			 " L3MissOnly %d FetchOcMiss %d FetchL3Miss %d",
+			 reg.l3_miss_only, reg.fetch_oc_miss, reg.fetch_l3_miss);
+	}
+
 	printf("ibs_fetch_ctl:\t%016llx MaxCnt %7d Cnt %7d Lat %5d En %d Val %d Comp %d%s "
-	       "PhyAddrValid %d%s L1TlbMiss %d L2TlbMiss %d RandEn %d%s\n",
+		"PhyAddrValid %d%s L1TlbMiss %d L2TlbMiss %d RandEn %d%s%s\n",
 		reg.val, reg.fetch_maxcnt << 4, reg.fetch_cnt << 4, reg.fetch_lat,
 		reg.fetch_en, reg.fetch_val, reg.fetch_comp, ic_miss_str ? : "",
 		reg.phy_addr_valid, l1tlb_pgsz_str ? : "", reg.l1tlb_miss, reg.l2tlb_miss,
-		reg.rand_en, reg.fetch_comp ? (reg.fetch_l2_miss ? " L2Miss 1" : " L2Miss 0") : "");
+		reg.rand_en, reg.fetch_comp ? (reg.fetch_l2_miss ? " L2Miss 1" : " L2Miss 0") : "",
+		l3_miss_str);
 }
 
 static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
@@ -68,9 +77,15 @@ static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
 
 static void pr_ibs_op_ctl(union ibs_op_ctl reg)
 {
-	printf("ibs_op_ctl:\t%016llx MaxCnt %9d En %d Val %d CntCtl %d=%s CurCnt %9d\n",
-	       reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, reg.op_en, reg.op_val,
-	       reg.cnt_ctl, reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
+	char l3_miss_only[sizeof(" L3MissOnly _")] = "";
+
+	if (zen4_ibs_extensions)
+		snprintf(l3_miss_only, sizeof(l3_miss_only), " L3MissOnly %d", reg.l3_miss_only);
+
+	printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d\n",
+		reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, l3_miss_only,
+		reg.op_en, reg.op_val, reg.cnt_ctl,
+		reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
 }
 
 static void pr_ibs_op_data(union ibs_op_data reg)
@@ -84,7 +99,34 @@ static void pr_ibs_op_data(union ibs_op_data reg)
 		reg.op_brn_ret, reg.op_rip_invalid, reg.op_brn_fuse, reg.op_microcode);
 }
 
-static void pr_ibs_op_data2(union ibs_op_data2 reg)
+static void pr_ibs_op_data2_extended(union ibs_op_data2 reg)
+{
+	static const char * const data_src_str[] = {
+		"",
+		" DataSrc 1=Local L3 or other L1/L2 in CCX",
+		" DataSrc 2=A peer cache in a near CCX",
+		" DataSrc 3=Data returned from DRAM",
+		" DataSrc 4=(reserved)",
+		" DataSrc 5=A peer cache in a far CCX",
+		" DataSrc 6=DRAM address map with \"long latency\" bit set",
+		" DataSrc 7=Data returned from MMIO/Config/PCI/APIC",
+		" DataSrc 8=Extension Memory (S-Link, GenZ, etc)",
+		" DataSrc 9=(reserved)",
+		" DataSrc 10=(reserved)",
+		" DataSrc 11=(reserved)",
+		" DataSrc 12=Peer Agent Memory",
+		/* 13 to 31 are reserved. Avoid printing them. */
+	};
+	int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;
+
+	printf("ibs_op_data2:\t%016llx %sRmtNode %d%s\n", reg.val,
+		(data_src == 1 || data_src == 2 || data_src == 5) ?
+			(reg.cache_hit_st ? "CacheHitSt 1=O-State " : "CacheHitSt 0=M-state ") : "",
+		reg.rmt_node,
+		data_src < (int)ARRAY_SIZE(data_src_str) ? data_src_str[data_src] : "");
+}
+
+static void pr_ibs_op_data2_default(union ibs_op_data2 reg)
 {
 	static const char * const data_src_str[] = {
 		"",
@@ -98,9 +140,16 @@ static void pr_ibs_op_data2(union ibs_op_data2 reg)
 	};
 
 	printf("ibs_op_data2:\t%016llx %sRmtNode %d%s\n", reg.val,
-	       reg.data_src == 2 ? (reg.cache_hit_st ? "CacheHitSt 1=O-State "
+	       reg.data_src_lo == 2 ? (reg.cache_hit_st ? "CacheHitSt 1=O-State "
 						     : "CacheHitSt 0=M-state ") : "",
-	       reg.rmt_node, data_src_str[reg.data_src]);
+	       reg.rmt_node, data_src_str[reg.data_src_lo]);
+}
+
+static void pr_ibs_op_data2(union ibs_op_data2 reg)
+{
+	if (zen4_ibs_extensions)
+		return pr_ibs_op_data2_extended(reg);
+	pr_ibs_op_data2_default(reg);
 }
 
 static void pr_ibs_op_data3(union ibs_op_data3 reg)
@@ -279,6 +328,9 @@ bool evlist__has_amd_ibs(struct evlist *evlist)
 		pmu_mapping += strlen(pmu_mapping) + 1 /* '\0' */;
 	}
 
+	if (perf_env__find_pmu_cap(env, 0, "ibs_op", "zen4_ibs_extensions"))
+		zen4_ibs_extensions = 1;
+
 	if (ibs_fetch_type || ibs_op_type) {
 		if (!cpu_family)
 			parse_cpuid(env);
-- 
2.27.0



* [PATCH v2 8/8] perf ibs: Fix comment
  2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
                   ` (6 preceding siblings ...)
  2022-05-09  4:49 ` [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump Ravi Bangoria
@ 2022-05-09  4:49 ` Ravi Bangoria
  2022-05-11 19:46   ` [tip: perf/core] perf/ibs: " tip-bot2 for Ravi Bangoria
  7 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09  4:49 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, rrichter, mingo, mark.rutland, jolsa, namhyung,
	tglx, bp, irogers, yao.jin, james.clark, leo.yan, kan.liang, ak,
	eranian, like.xu.linux, x86, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, kim.phillips, santosh.shukla

s/IBS Op Data 2/IBS Op Data 1/ for MSR 0xc0011035.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/include/asm/amd-ibs.h       | 2 +-
 tools/arch/x86/include/asm/amd-ibs.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index b40b2d4ea2ac..f3eb098d63d4 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -52,7 +52,7 @@ union ibs_op_ctl {
 	};
 };
 
-/* MSR 0xc0011035: IBS Op Data 2 */
+/* MSR 0xc0011035: IBS Op Data 1 */
 union ibs_op_data {
 	__u64 val;
 	struct {
diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 21e01cf6162e..9a3312e12e2e 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -52,7 +52,7 @@ union ibs_op_ctl {
 	};
 };
 
-/* MSR 0xc0011035: IBS Op Data 2 */
+/* MSR 0xc0011035: IBS Op Data 1 */
 union ibs_op_data {
 	__u64 val;
 	struct {
-- 
2.27.0



* Re: [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering
  2022-05-09  4:49 ` [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering Ravi Bangoria
@ 2022-05-09 12:05   ` Peter Zijlstra
  2022-05-09 12:35     ` Ravi Bangoria
  2022-05-11 19:46   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2022-05-09 12:05 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: acme, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

On Mon, May 09, 2022 at 10:19:09AM +0530, Ravi Bangoria wrote:
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index b06e4c573add..a24b637a6e1d 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -391,6 +391,7 @@ struct pebs_xmm {
>  #define IBS_CAPS_OPBRNFUSE		(1U<<8)
>  #define IBS_CAPS_FETCHCTLEXTD		(1U<<9)
>  #define IBS_CAPS_OPDATA4		(1U<<10)
> +#define IBS_CAPS_ZEN4IBSEXTENSIONS	(1U<<11)
>  
>  #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
>  					 | IBS_CAPS_FETCHSAM	\

Would you mind terribly if I do:

  's/IBS_CAPS_ZEN4IBSEXTENSIONS/IBS_CAPS_ZEN4/'

on it? Per the IBS_ prefix, we're already talking about IBS, per the
CAPS thing we're talking about capabilities and I'm thinking that makes
EXTENSIONS somewhat redundant, which then leaves:

  IBS_CAPS_ZEN4




* Re: [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering
  2022-05-09 12:05   ` Peter Zijlstra
@ 2022-05-09 12:35     ` Ravi Bangoria
  2022-05-09 13:07       ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-09 12:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria


On 09-May-22 5:35 PM, Peter Zijlstra wrote:
> On Mon, May 09, 2022 at 10:19:09AM +0530, Ravi Bangoria wrote:
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index b06e4c573add..a24b637a6e1d 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -391,6 +391,7 @@ struct pebs_xmm {
>>  #define IBS_CAPS_OPBRNFUSE		(1U<<8)
>>  #define IBS_CAPS_FETCHCTLEXTD		(1U<<9)
>>  #define IBS_CAPS_OPDATA4		(1U<<10)
>> +#define IBS_CAPS_ZEN4IBSEXTENSIONS	(1U<<11)
>>  
>>  #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
>>  					 | IBS_CAPS_FETCHSAM	\
> 
> Would you mind terribly if I do:
> 
>   's/IBS_CAPS_ZEN4IBSEXTENSIONS/IBS_CAPS_ZEN4/'
> 
> on it? Per the IBS_ prefix, we're already talking about IBS, per the
> CAPS thing we're talking about capabilities and I'm thinking that makes
> EXTENSIONS somewhat redundant, which then leaves:
> 
>   IBS_CAPS_ZEN4

Yeah, IBS_CAPS_ZEN4 is better. Let me know if you want me to respin.

Thanks,
Ravi


* Re: [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering
  2022-05-09 12:35     ` Ravi Bangoria
@ 2022-05-09 13:07       ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2022-05-09 13:07 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: acme, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

On Mon, May 09, 2022 at 06:05:53PM +0530, Ravi Bangoria wrote:
> 
> On 09-May-22 5:35 PM, Peter Zijlstra wrote:
> > On Mon, May 09, 2022 at 10:19:09AM +0530, Ravi Bangoria wrote:
> >> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> >> index b06e4c573add..a24b637a6e1d 100644
> >> --- a/arch/x86/include/asm/perf_event.h
> >> +++ b/arch/x86/include/asm/perf_event.h
> >> @@ -391,6 +391,7 @@ struct pebs_xmm {
> >>  #define IBS_CAPS_OPBRNFUSE		(1U<<8)
> >>  #define IBS_CAPS_FETCHCTLEXTD		(1U<<9)
> >>  #define IBS_CAPS_OPDATA4		(1U<<10)
> >> +#define IBS_CAPS_ZEN4IBSEXTENSIONS	(1U<<11)
> >>  
> >>  #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
> >>  					 | IBS_CAPS_FETCHSAM	\
> > 
> > Would you mind terribly if I do:
> > 
> >   's/IBS_CAPS_ZEN4IBSEXTENSIONS/IBS_CAPS_ZEN4/'
> > 
> > on it? Per the IBS_ prefix, we're already talking about IBS, per the
> > CAPS thing we're talking about capabilities and I'm thinking that makes
> > EXTENSIONS somewhat redundant, which then leaves:
> > 
> >   IBS_CAPS_ZEN4
> 
> Yeah, IBS_CAPS_ZEN4 is better. Let me know if you want me to respin.

Nah, I just edited the patch, all good.


* [tip: perf/core] perf/ibs: Fix comment
  2022-05-09  4:49 ` [PATCH v2 8/8] perf ibs: Fix comment Ravi Bangoria
@ 2022-05-11 19:46   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-11 19:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9cb23f598c641c1dcbe18defd219cdc439bc94a8
Gitweb:        https://git.kernel.org/tip/9cb23f598c641c1dcbe18defd219cdc439bc94a8
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Mon, 09 May 2022 10:19:14 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 11 May 2022 16:27:10 +02:00

perf/ibs: Fix comment

s/IBS Op Data 2/IBS Op Data 1/ for MSR 0xc0011035.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-9-ravi.bangoria@amd.com
---
 arch/x86/include/asm/amd-ibs.h       | 2 +-
 tools/arch/x86/include/asm/amd-ibs.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index 46e1df4..aabdbb5 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -49,7 +49,7 @@ union ibs_op_ctl {
 	};
 };
 
-/* MSR 0xc0011035: IBS Op Data 2 */
+/* MSR 0xc0011035: IBS Op Data 1 */
 union ibs_op_data {
 	__u64 val;
 	struct {
diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 174e7d8..765e9e7 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -49,7 +49,7 @@ union ibs_op_ctl {
 	};
 };
 
-/* MSR 0xc0011035: IBS Op Data 2 */
+/* MSR 0xc0011035: IBS Op Data 1 */
 union ibs_op_data {
 	__u64 val;
 	struct {


* [tip: perf/core] perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
  2022-05-09  4:49 ` [PATCH v2 4/8] perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute Ravi Bangoria
@ 2022-05-11 19:46   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-11 19:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     838de1d843fc9b6161e0e1c6308a8c027d08606d
Gitweb:        https://git.kernel.org/tip/838de1d843fc9b6161e0e1c6308a8c027d08606d
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Mon, 09 May 2022 10:19:10 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 11 May 2022 16:27:10 +02:00

perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute

A PMU driver can advertise certain features via capability attributes
(the 'caps' sysfs directory), which userspace tools like perf can
consume. Add a zen4_ibs_extensions capability attribute for the IBS
pmus. This attribute is visible only when CPUID_Fn8000001B_EAX[11] is
set.

With this patch on Zen4:

  $ ls /sys/bus/event_source/devices/ibs_op/caps
  zen4_ibs_extensions
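
A userspace consumer can pick this up by walking the caps directory and
reading each file as a name/value pair. The following is a rough sketch
only, not the perf tool's implementation; the helper name
dump_pmu_caps() and the buffer sizes are assumptions made for
illustration.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print every capability file under caps_dir as
 * "name=value" and return how many were read. A missing directory is
 * treated as "no capabilities" and yields 0.
 */
static int dump_pmu_caps(const char *caps_dir, FILE *out)
{
	char path[4096], value[128];
	struct dirent *ent;
	int nr_caps = 0;
	DIR *dir;

	dir = opendir(caps_dir);
	if (!dir)
		return 0;

	while ((ent = readdir(dir)) != NULL) {
		FILE *fp;

		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;

		snprintf(path, sizeof(path), "%s/%s", caps_dir, ent->d_name);
		fp = fopen(path, "r");
		if (!fp)
			continue;

		/* Each capability file holds a single line with its value. */
		if (fgets(value, sizeof(value), fp)) {
			value[strcspn(value, "\n")] = '\0';
			fprintf(out, "%s=%s\n", ent->d_name, value);
			nr_caps++;
		}
		fclose(fp);
	}
	closedir(dir);
	return nr_caps;
}
```

Called as dump_pmu_caps("/sys/bus/event_source/devices/ibs_op/caps",
stdout) on a system with this patch and the CPUID bit set, it would be
expected to print zen4_ibs_extensions=1, since the attribute string
registered below is "1".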

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-5-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 2dc8b7e..c251bc4 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -537,8 +537,14 @@ static struct attribute_group empty_format_group = {
 	.attrs = attrs_empty,
 };
 
+static struct attribute_group empty_caps_group = {
+	.name = "caps",
+	.attrs = attrs_empty,
+};
+
 static const struct attribute_group *empty_attr_groups[] = {
 	&empty_format_group,
+	&empty_caps_group,
 	NULL,
 };
 
@@ -546,6 +552,7 @@ PMU_FORMAT_ATTR(rand_en,	"config:57");
 PMU_FORMAT_ATTR(cnt_ctl,	"config:19");
 PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
 PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
+PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
 
 static umode_t
 zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
@@ -563,6 +570,11 @@ static struct attribute *fetch_l3missonly_attrs[] = {
 	NULL,
 };
 
+static struct attribute *zen4_ibs_extensions_attrs[] = {
+	&zen4_ibs_extensions.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_rand_en = {
 	.name = "format",
 	.attrs = rand_en_attrs,
@@ -574,13 +586,21 @@ static struct attribute_group group_fetch_l3missonly = {
 	.is_visible = zen4_ibs_extensions_is_visible,
 };
 
+static struct attribute_group group_zen4_ibs_extensions = {
+	.name = "caps",
+	.attrs = zen4_ibs_extensions_attrs,
+	.is_visible = zen4_ibs_extensions_is_visible,
+};
+
 static const struct attribute_group *fetch_attr_groups[] = {
 	&group_rand_en,
+	&empty_caps_group,
 	NULL,
 };
 
 static const struct attribute_group *fetch_attr_update[] = {
 	&group_fetch_l3missonly,
+	&group_zen4_ibs_extensions,
 	NULL,
 };
 
@@ -615,6 +635,7 @@ static struct attribute_group group_op_l3missonly = {
 static const struct attribute_group *op_attr_update[] = {
 	&group_cnt_ctl,
 	&group_op_l3missonly,
+	&group_zen4_ibs_extensions,
 	NULL,
 };
 

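A userspace tool could test for the capability advertised by the patch above by checking whether the sysfs file exists. The following is a minimal sketch under that assumption. pmu_has_cap() is a hypothetical helper, not part of perf or the kernel, and the sysfs root is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if <root>/<pmu>/caps/<cap> exists, 0 if it does not, and -1 if
 * the path does not fit in the local buffer.
 * pmu_has_cap() is a hypothetical helper, not a perf or kernel API.
 */
static int pmu_has_cap(const char *root, const char *pmu, const char *cap)
{
	char path[4096];
	int n;

	assert(root && pmu && cap);

	n = snprintf(path, sizeof(path), "%s/%s/caps/%s", root, pmu, cap);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* F_OK only tests for existence; the file content is not read. */
	return access(path, F_OK) == 0 ? 1 : 0;
}
```

On a system where the `ls` example from the commit message lists the file, pmu_has_cap("/sys/bus/event_source/devices", "ibs_op", "zen4_ibs_extensions") would be expected to return 1.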

* [tip: perf/core] perf/amd/ibs: Add support for L3 miss filtering
  2022-05-09  4:49 ` [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering Ravi Bangoria
  2022-05-09 12:05   ` Peter Zijlstra
@ 2022-05-11 19:46   ` tip-bot2 for Ravi Bangoria
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-11 19:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     ba5d35b442c65f32d38ef61f732218274c6dcf4c
Gitweb:        https://git.kernel.org/tip/ba5d35b442c65f32d38ef61f732218274c6dcf4c
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Mon, 09 May 2022 10:19:09 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 11 May 2022 16:27:10 +02:00

perf/amd/ibs: Add support for L3 miss filtering

IBS L3 miss filtering works by tagging an instruction on IBS counter
overflow and generating an NMI if the tagged instruction causes an L3
miss. Samples without an L3 miss are discarded and the counter is reset
with a random value (between 1-15 for the fetch pmu and 1-127 for the op
pmu). This reduces sampling overhead when the user is interested only
in such samples. One use case for such filtered samples is feeding
data to a page-migration daemon in tiered memory systems.

Add support for L3 miss filtering in the IBS driver via a new pmu
attribute, "l3missonly". Example usage:

  # perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-4-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c         | 67 ++++++++++++++++++++++++++----
 arch/x86/include/asm/perf_event.h |  3 +-
 2 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index ece4f6a..2dc8b7e 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -544,22 +544,46 @@ static const struct attribute_group *empty_attr_groups[] = {
 
 PMU_FORMAT_ATTR(rand_en,	"config:57");
 PMU_FORMAT_ATTR(cnt_ctl,	"config:19");
+PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
+PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
+
+static umode_t
+zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_ZEN4 ? attr->mode : 0;
+}
 
 static struct attribute *rand_en_attrs[] = {
 	&format_attr_rand_en.attr,
 	NULL,
 };
 
+static struct attribute *fetch_l3missonly_attrs[] = {
+	&fetch_l3missonly.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_rand_en = {
 	.name = "format",
 	.attrs = rand_en_attrs,
 };
 
+static struct attribute_group group_fetch_l3missonly = {
+	.name = "format",
+	.attrs = fetch_l3missonly_attrs,
+	.is_visible = zen4_ibs_extensions_is_visible,
+};
+
 static const struct attribute_group *fetch_attr_groups[] = {
 	&group_rand_en,
 	NULL,
 };
 
+static const struct attribute_group *fetch_attr_update[] = {
+	&group_fetch_l3missonly,
+	NULL,
+};
+
 static umode_t
 cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
@@ -571,14 +595,26 @@ static struct attribute *cnt_ctl_attrs[] = {
 	NULL,
 };
 
+static struct attribute *op_l3missonly_attrs[] = {
+	&op_l3missonly.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_cnt_ctl = {
 	.name = "format",
 	.attrs = cnt_ctl_attrs,
 	.is_visible = cnt_ctl_is_visible,
 };
 
+static struct attribute_group group_op_l3missonly = {
+	.name = "format",
+	.attrs = op_l3missonly_attrs,
+	.is_visible = zen4_ibs_extensions_is_visible,
+};
+
 static const struct attribute_group *op_attr_update[] = {
 	&group_cnt_ctl,
+	&group_op_l3missonly,
 	NULL,
 };
 
@@ -805,10 +841,8 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 	return ret;
 }
 
-static __init int perf_event_ibs_init(void)
+static __init int perf_ibs_fetch_init(void)
 {
-	int ret;
-
 	/*
 	 * Some chips fail to reset the fetch count when it is written; instead
 	 * they need a 0-1 transition of IbsFetchEn.
@@ -819,12 +853,17 @@ static __init int perf_event_ibs_init(void)
 	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
 		perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
 
+	if (ibs_caps & IBS_CAPS_ZEN4)
+		perf_ibs_fetch.config_mask |= IBS_FETCH_L3MISSONLY;
+
 	perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups;
+	perf_ibs_fetch.pmu.attr_update = fetch_attr_update;
 
-	ret = perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
-	if (ret)
-		return ret;
+	return perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+}
 
+static __init int perf_ibs_op_init(void)
+{
 	if (ibs_caps & IBS_CAPS_OPCNT)
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
 
@@ -834,10 +873,24 @@ static __init int perf_event_ibs_init(void)
 		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
 	}
 
+	if (ibs_caps & IBS_CAPS_ZEN4)
+		perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
+
 	perf_ibs_op.pmu.attr_groups = empty_attr_groups;
 	perf_ibs_op.pmu.attr_update = op_attr_update;
 
-	ret = perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+	return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+}
+
+static __init int perf_event_ibs_init(void)
+{
+	int ret;
+
+	ret = perf_ibs_fetch_init();
+	if (ret)
+		return ret;
+
+	ret = perf_ibs_op_init();
 	if (ret)
 		goto err_op;
 
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 7aa1d42..409725e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -410,6 +410,7 @@ struct pebs_xmm {
 #define IBS_CAPS_OPBRNFUSE		(1U<<8)
 #define IBS_CAPS_FETCHCTLEXTD		(1U<<9)
 #define IBS_CAPS_OPDATA4		(1U<<10)
+#define IBS_CAPS_ZEN4			(1U<<11)
 
 #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
 					 | IBS_CAPS_FETCHSAM	\
@@ -423,6 +424,7 @@ struct pebs_xmm {
 #define IBSCTL_LVT_OFFSET_MASK		0x0F
 
 /* IBS fetch bits/masks */
+#define IBS_FETCH_L3MISSONLY	(1ULL<<59)
 #define IBS_FETCH_RAND_EN	(1ULL<<57)
 #define IBS_FETCH_VAL		(1ULL<<49)
 #define IBS_FETCH_ENABLE	(1ULL<<48)
@@ -439,6 +441,7 @@ struct pebs_xmm {
 #define IBS_OP_CNT_CTL		(1ULL<<19)
 #define IBS_OP_VAL		(1ULL<<18)
 #define IBS_OP_ENABLE		(1ULL<<17)
+#define IBS_OP_L3MISSONLY	(1ULL<<16)
 #define IBS_OP_MAX_CNT		0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
 #define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */


* [tip: perf/core] perf/amd/ibs: Use ->is_visible callback for dynamic attributes
  2022-05-09  4:49 ` [PATCH v2 2/8] perf/amd/ibs: Use ->is_visible callback for dynamic attributes Ravi Bangoria
@ 2022-05-11 19:47   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-11 19:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     2a7a7e658682bfd7501dc6b4c9d365aa6c79788a
Gitweb:        https://git.kernel.org/tip/2a7a7e658682bfd7501dc6b4c9d365aa6c79788a
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Mon, 09 May 2022 10:19:08 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 11 May 2022 16:27:09 +02:00

perf/amd/ibs: Use ->is_visible callback for dynamic attributes

Currently, some attributes are added at build time whereas others are
added at boot time, depending on IBS pmu capabilities. Instead, we can
add all attribute groups at build time and hide individual groups at
boot time using the more appropriate ->is_visible() callback.

Also, struct perf_ibs has a bunch of fields for pmu attributes that only
pass a pointer along and do nothing else. Remove them.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-3-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 78 ++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 24 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 2704ec1..ece4f6a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -94,10 +94,6 @@ struct perf_ibs {
 	unsigned int			fetch_ignore_if_zero_rip : 1;
 	struct cpu_perf_ibs __percpu	*pcpu;
 
-	struct attribute		**format_attrs;
-	struct attribute_group		format_group;
-	const struct attribute_group	*attr_groups[2];
-
 	u64				(*get_count)(u64 config);
 };
 
@@ -528,16 +524,61 @@ static void perf_ibs_del(struct perf_event *event, int flags)
 
 static void perf_ibs_read(struct perf_event *event) { }
 
+/*
+ * We need to initialize with empty group if all attributes in the
+ * group are dynamic.
+ */
+static struct attribute *attrs_empty[] = {
+	NULL,
+};
+
+static struct attribute_group empty_format_group = {
+	.name = "format",
+	.attrs = attrs_empty,
+};
+
+static const struct attribute_group *empty_attr_groups[] = {
+	&empty_format_group,
+	NULL,
+};
+
 PMU_FORMAT_ATTR(rand_en,	"config:57");
 PMU_FORMAT_ATTR(cnt_ctl,	"config:19");
 
-static struct attribute *ibs_fetch_format_attrs[] = {
+static struct attribute *rand_en_attrs[] = {
 	&format_attr_rand_en.attr,
 	NULL,
 };
 
-static struct attribute *ibs_op_format_attrs[] = {
-	NULL,	/* &format_attr_cnt_ctl.attr if IBS_CAPS_OPCNT */
+static struct attribute_group group_rand_en = {
+	.name = "format",
+	.attrs = rand_en_attrs,
+};
+
+static const struct attribute_group *fetch_attr_groups[] = {
+	&group_rand_en,
+	NULL,
+};
+
+static umode_t
+cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_OPCNT ? attr->mode : 0;
+}
+
+static struct attribute *cnt_ctl_attrs[] = {
+	&format_attr_cnt_ctl.attr,
+	NULL,
+};
+
+static struct attribute_group group_cnt_ctl = {
+	.name = "format",
+	.attrs = cnt_ctl_attrs,
+	.is_visible = cnt_ctl_is_visible,
+};
+
+static const struct attribute_group *op_attr_update[] = {
+	&group_cnt_ctl,
 	NULL,
 };
 
@@ -561,7 +602,6 @@ static struct perf_ibs perf_ibs_fetch = {
 	.max_period		= IBS_FETCH_MAX_CNT << 4,
 	.offset_mask		= { MSR_AMD64_IBSFETCH_REG_MASK },
 	.offset_max		= MSR_AMD64_IBSFETCH_REG_COUNT,
-	.format_attrs		= ibs_fetch_format_attrs,
 
 	.get_count		= get_ibs_fetch_count,
 };
@@ -587,7 +627,6 @@ static struct perf_ibs perf_ibs_op = {
 	.max_period		= IBS_OP_MAX_CNT << 4,
 	.offset_mask		= { MSR_AMD64_IBSOP_REG_MASK },
 	.offset_max		= MSR_AMD64_IBSOP_REG_COUNT,
-	.format_attrs		= ibs_op_format_attrs,
 
 	.get_count		= get_ibs_op_count,
 };
@@ -757,17 +796,6 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 
 	perf_ibs->pcpu = pcpu;
 
-	/* register attributes */
-	if (perf_ibs->format_attrs[0]) {
-		memset(&perf_ibs->format_group, 0, sizeof(perf_ibs->format_group));
-		perf_ibs->format_group.name	= "format";
-		perf_ibs->format_group.attrs	= perf_ibs->format_attrs;
-
-		memset(&perf_ibs->attr_groups, 0, sizeof(perf_ibs->attr_groups));
-		perf_ibs->attr_groups[0]	= &perf_ibs->format_group;
-		perf_ibs->pmu.attr_groups	= perf_ibs->attr_groups;
-	}
-
 	ret = perf_pmu_register(&perf_ibs->pmu, name, -1);
 	if (ret) {
 		perf_ibs->pcpu = NULL;
@@ -779,7 +807,6 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 
 static __init int perf_event_ibs_init(void)
 {
-	struct attribute **attr = ibs_op_format_attrs;
 	int ret;
 
 	/*
@@ -792,14 +819,14 @@ static __init int perf_event_ibs_init(void)
 	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
 		perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
 
+	perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups;
+
 	ret = perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
 	if (ret)
 		return ret;
 
-	if (ibs_caps & IBS_CAPS_OPCNT) {
+	if (ibs_caps & IBS_CAPS_OPCNT)
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
-		*attr++ = &format_attr_cnt_ctl.attr;
-	}
 
 	if (ibs_caps & IBS_CAPS_OPCNTEXT) {
 		perf_ibs_op.max_period  |= IBS_OP_MAX_CNT_EXT_MASK;
@@ -807,6 +834,9 @@ static __init int perf_event_ibs_init(void)
 		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
 	}
 
+	perf_ibs_op.pmu.attr_groups = empty_attr_groups;
+	perf_ibs_op.pmu.attr_update = op_attr_update;
+
 	ret = perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
 	if (ret)
 		goto err_op;

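The ->is_visible() approach from the patch above can be modelled outside the kernel as follows. This is only an illustrative sketch: the *_model types are hypothetical, the callback takes the capability word as a parameter instead of reading the global ibs_caps, and the value of IBS_CAPS_OPCNT is an assumption because its definition does not appear in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define IBS_CAPS_OPCNT	(1U << 4)	/* assumed value, not shown in this thread */

typedef unsigned short mode_model_t;	/* stands in for umode_t */

struct attr_model {
	const char *name;
	mode_model_t mode;
};

struct group_model {
	const char *name;
	struct attr_model **attrs;	/* NULL-terminated array */
	mode_model_t (*is_visible)(const struct attr_model *attr, unsigned int caps);
};

/* Same decision as cnt_ctl_is_visible(): keep the mode or hide with 0. */
static mode_model_t cnt_ctl_is_visible(const struct attr_model *attr, unsigned int caps)
{
	return (caps & IBS_CAPS_OPCNT) ? attr->mode : 0;
}

/* Count the attributes that would be created; a mode of 0 hides the file. */
static size_t group_visible_count(const struct group_model *grp, unsigned int caps)
{
	size_t n = 0;

	assert(grp != NULL && grp->attrs != NULL);

	for (struct attr_model **a = grp->attrs; *a; a++) {
		mode_model_t mode = grp->is_visible ?
				    grp->is_visible(*a, caps) : (*a)->mode;
		if (mode)
			n++;
	}
	return n;
}

static struct attr_model cnt_ctl_attr = { "cnt_ctl", 0444 };
static struct attr_model *cnt_ctl_attrs[] = { &cnt_ctl_attr, NULL };
static const struct group_model group_cnt_ctl = {
	"format", cnt_ctl_attrs, cnt_ctl_is_visible,
};
```

The attribute array is fixed at build time in both cases; only the callback result changes with the capability word, which is the point of the conversion.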

* [tip: perf/core] perf/amd/ibs: Cascade pmu init functions' return value
  2022-05-09  4:49 ` [PATCH v2 1/8] perf/amd/ibs: Cascade pmu init functions' return value Ravi Bangoria
@ 2022-05-11 19:47   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-05-11 19:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     39b2ca75eec8a33e2ffdb8aa0c4840ec3e3b472c
Gitweb:        https://git.kernel.org/tip/39b2ca75eec8a33e2ffdb8aa0c4840ec3e3b472c
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Mon, 09 May 2022 10:19:07 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 11 May 2022 16:27:09 +02:00

perf/amd/ibs: Cascade pmu init functions' return value

The IBS pmu initialization code ignores the return values of the
functions it calls. Propagate them to the caller and unwind the
earlier registrations on failure.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-2-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 11e8b49..2704ec1 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -777,9 +777,10 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name)
 	return ret;
 }
 
-static __init void perf_event_ibs_init(void)
+static __init int perf_event_ibs_init(void)
 {
 	struct attribute **attr = ibs_op_format_attrs;
+	int ret;
 
 	/*
 	 * Some chips fail to reset the fetch count when it is written; instead
@@ -791,7 +792,9 @@ static __init void perf_event_ibs_init(void)
 	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
 		perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
 
-	perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+	ret = perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+	if (ret)
+		return ret;
 
 	if (ibs_caps & IBS_CAPS_OPCNT) {
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
@@ -804,15 +807,35 @@ static __init void perf_event_ibs_init(void)
 		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
 	}
 
-	perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+	ret = perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
+	if (ret)
+		goto err_op;
+
+	ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
+	if (ret)
+		goto err_nmi;
 
-	register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
 	pr_info("perf: AMD IBS detected (0x%08x)\n", ibs_caps);
+	return 0;
+
+err_nmi:
+	perf_pmu_unregister(&perf_ibs_op.pmu);
+	free_percpu(perf_ibs_op.pcpu);
+	perf_ibs_op.pcpu = NULL;
+err_op:
+	perf_pmu_unregister(&perf_ibs_fetch.pmu);
+	free_percpu(perf_ibs_fetch.pcpu);
+	perf_ibs_fetch.pcpu = NULL;
+
+	return ret;
 }
 
 #else /* defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD) */
 
-static __init void perf_event_ibs_init(void) { }
+static __init int perf_event_ibs_init(void)
+{
+	return 0;
+}
 
 #endif
 
@@ -1082,9 +1105,7 @@ static __init int amd_ibs_init(void)
 			  x86_pmu_amd_ibs_starting_cpu,
 			  x86_pmu_amd_ibs_dying_cpu);
 
-	perf_event_ibs_init();
-
-	return 0;
+	return perf_event_ibs_init();
 }
 
 /* Since we need the pci subsystem to init ibs we can't do this earlier: */


* Re: [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities
  2022-05-09  4:49 ` [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities Ravi Bangoria
@ 2022-05-16  4:15   ` Ravi Bangoria
  2022-05-16 12:53     ` Arnaldo Carvalho de Melo
  2022-05-16 13:28   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-16  4:15 UTC (permalink / raw)
  To: acme, jolsa
  Cc: peterz, rrichter, mingo, mark.rutland, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria


On 09-May-22 10:19 AM, Ravi Bangoria wrote:
> Pmus advertise their capabilities via sysfs attribute files but
> perf tool currently parses only core(cpu) pmu capabilities. Add
> support for parsing non-cpu pmu capabilities.

Arnaldo, Jiri,

Do the tool-side patches look good to you? Please consider pulling them.

Thanks,
Ravi


* Re: [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities
  2022-05-16  4:15   ` Ravi Bangoria
@ 2022-05-16 12:53     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-16 12:53 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: jolsa, peterz, rrichter, mingo, mark.rutland, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

Em Mon, May 16, 2022 at 09:45:44AM +0530, Ravi Bangoria escreveu:
> 
> On 09-May-22 10:19 AM, Ravi Bangoria wrote:
> > Pmus advertise their capabilities via sysfs attribute files but
> > perf tool currently parses only core(cpu) pmu capabilities. Add
> > support for parsing non-cpu pmu capabilities.
> 
> Arnaldo, Jiri,
> 
> Do the tool-side patches look good to you? Please consider pulling them.

So the kernel part was merged, ok, will look into it.

- Arnaldo


* Re: [PATCH v2 5/8] perf record ibs: Warn about sampling period skew
  2022-05-09  4:49 ` [PATCH v2 5/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
@ 2022-05-16 13:22   ` Arnaldo Carvalho de Melo
  2022-05-16 13:27     ` Ravi Bangoria
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-16 13:22 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

Em Mon, May 09, 2022 at 10:19:11AM +0530, Ravi Bangoria escreveu:
> Samples without an L3 miss are discarded and counter is reset with
> random value (between 1-15 for fetch pmu and 1-127 for op pmu) when
> IBS L3 miss filtering is enabled. This causes a sampling period skew
> but there is no way to reconstruct aggregated sampling period. So
> print a warning at perf record if user sets l3missonly=1.
> 
> Ex:
>   # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
>   WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
>   and tagged operation does not cause L3 Miss. This causes sampling period skew.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  tools/perf/arch/x86/util/evsel.c | 34 ++++++++++++++++++++++++++++++++
>  tools/perf/util/evsel.c          |  7 +++++++
>  tools/perf/util/evsel.h          |  1 +
>  3 files changed, 42 insertions(+)
> 
> diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
> index ac2899a25b7a..6399faa70a88 100644
> --- a/tools/perf/arch/x86/util/evsel.c
> +++ b/tools/perf/arch/x86/util/evsel.c
> @@ -4,6 +4,8 @@
>  #include "util/evsel.h"
>  #include "util/env.h"
>  #include "linux/string.h"
> +#include "util/pmu.h"
> +#include "util/debug.h"
>  
>  void arch_evsel__set_sample_weight(struct evsel *evsel)
>  {
> @@ -29,3 +31,35 @@ void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr)
>  
>  	free(env.cpuid);
>  }
> +
> +static void ibs_l3miss_warn(void)
> +{
> +	pr_warning(
> +"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n"
> +"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n");
> +}
> +
> +void arch_evsel__warn_ambiguity(struct evsel *evsel, struct perf_event_attr *attr)
> +{
> +	struct perf_env *env = evsel__env(evsel);
> +	struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
> +	struct perf_pmu *ibs_fetch_pmu = perf_pmu__find("ibs_fetch");
> +	struct perf_pmu *ibs_op_pmu = perf_pmu__find("ibs_op");
> +	static int warned_once;

Please check first if the warning was emitted (warned_once is true)
before calling all the find routines above.

> +	if (warned_once || !perf_env__cpuid(env) || !env->cpuid ||
> +	    !strstarts(env->cpuid, "AuthenticAMD") || !evsel_pmu)
> +		return;
> +
> +	if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
> +		if (attr->config & (1ULL << 59)) {
> +			ibs_l3miss_warn();
> +			warned_once = 1;
> +		}
> +	} else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
> +		if (attr->config & (1ULL << 16)) {
> +			ibs_l3miss_warn();
> +			warned_once = 1;
> +		}
> +	}
> +}
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 2a1729e7aee4..4f8b72d4a521 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1064,6 +1064,11 @@ void __weak arch_evsel__fixup_new_cycles(struct perf_event_attr *attr __maybe_un
>  {
>  }
>  
> +void __weak arch_evsel__warn_ambiguity(struct evsel *evsel __maybe_unused,
> +				       struct perf_event_attr *attr __maybe_unused)
> +{
> +}
> +
>  static void evsel__set_default_freq_period(struct record_opts *opts,
>  					   struct perf_event_attr *attr)
>  {
> @@ -1339,6 +1344,8 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
>  	 */
>  	if (evsel__is_dummy_event(evsel))
>  		evsel__reset_sample_bit(evsel, BRANCH_STACK);
> +
> +	arch_evsel__warn_ambiguity(evsel, attr);

Wouldn't this be better as a single arch__post_evsel_config() function that
could do arch specific fixups or emit such warnings _after_ (thus the
"post") the common code evsel__config() does its thing?

>  }
>  
>  int evsel__set_filter(struct evsel *evsel, const char *filter)
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 041b42d33bf5..195ae30ec45b 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -281,6 +281,7 @@ void evsel__set_sample_id(struct evsel *evsel, bool use_sample_identifier);
>  
>  void arch_evsel__set_sample_weight(struct evsel *evsel);
>  void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr);
> +void arch_evsel__warn_ambiguity(struct evsel *evsel, struct perf_event_attr *attr);
>  
>  int evsel__set_filter(struct evsel *evsel, const char *filter);
>  int evsel__append_tp_filter(struct evsel *evsel, const char *filter);
> -- 
> 2.27.0

-- 

- Arnaldo


* Re: [PATCH v2 5/8] perf record ibs: Warn about sampling period skew
  2022-05-16 13:22   ` Arnaldo Carvalho de Melo
@ 2022-05-16 13:27     ` Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-16 13:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria

On 16-May-22 6:52 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 09, 2022 at 10:19:11AM +0530, Ravi Bangoria escreveu:
>> Samples without an L3 miss are discarded and counter is reset with
>> random value (between 1-15 for fetch pmu and 1-127 for op pmu) when
>> IBS L3 miss filtering is enabled. This causes a sampling period skew
>> but there is no way to reconstruct aggregated sampling period. So
>> print a warning at perf record if user sets l3missonly=1.
>>
>> Ex:
>>   # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
>>   WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
>>   and tagged operation does not cause L3 Miss. This causes sampling period skew.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  tools/perf/arch/x86/util/evsel.c | 34 ++++++++++++++++++++++++++++++++
>>  tools/perf/util/evsel.c          |  7 +++++++
>>  tools/perf/util/evsel.h          |  1 +
>>  3 files changed, 42 insertions(+)
>>
>> diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
>> index ac2899a25b7a..6399faa70a88 100644
>> --- a/tools/perf/arch/x86/util/evsel.c
>> +++ b/tools/perf/arch/x86/util/evsel.c
>> @@ -4,6 +4,8 @@
>>  #include "util/evsel.h"
>>  #include "util/env.h"
>>  #include "linux/string.h"
>> +#include "util/pmu.h"
>> +#include "util/debug.h"
>>  
>>  void arch_evsel__set_sample_weight(struct evsel *evsel)
>>  {
>> @@ -29,3 +31,35 @@ void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr)
>>  
>>  	free(env.cpuid);
>>  }
>> +
>> +static void ibs_l3miss_warn(void)
>> +{
>> +	pr_warning(
>> +"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n"
>> +"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n");
>> +}
>> +
>> +void arch_evsel__warn_ambiguity(struct evsel *evsel, struct perf_event_attr *attr)
>> +{
>> +	struct perf_env *env = evsel__env(evsel);
>> +	struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
>> +	struct perf_pmu *ibs_fetch_pmu = perf_pmu__find("ibs_fetch");
>> +	struct perf_pmu *ibs_op_pmu = perf_pmu__find("ibs_op");
>> +	static int warned_once;
> 
> Please check first if the warning was emitted (warned_once is true)
> before calling all the find routines above.

Sure.

> 
>> +	if (warned_once || !perf_env__cpuid(env) || !env->cpuid ||
>> +	    !strstarts(env->cpuid, "AuthenticAMD") || !evsel_pmu)
>> +		return;
>> +
>> +	if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) {
>> +		if (attr->config & (1ULL << 59)) {
>> +			ibs_l3miss_warn();
>> +			warned_once = 1;
>> +		}
>> +	} else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) {
>> +		if (attr->config & (1ULL << 16)) {
>> +			ibs_l3miss_warn();
>> +			warned_once = 1;
>> +		}
>> +	}
>> +}
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index 2a1729e7aee4..4f8b72d4a521 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -1064,6 +1064,11 @@ void __weak arch_evsel__fixup_new_cycles(struct perf_event_attr *attr __maybe_un
>>  {
>>  }
>>  
>> +void __weak arch_evsel__warn_ambiguity(struct evsel *evsel __maybe_unused,
>> +				       struct perf_event_attr *attr __maybe_unused)
>> +{
>> +}
>> +
>>  static void evsel__set_default_freq_period(struct record_opts *opts,
>>  					   struct perf_event_attr *attr)
>>  {
>> @@ -1339,6 +1344,8 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
>>  	 */
>>  	if (evsel__is_dummy_event(evsel))
>>  		evsel__reset_sample_bit(evsel, BRANCH_STACK);
>> +
>> +	arch_evsel__warn_ambiguity(evsel, attr);
> 
> Wouldn't this be better as a single arch__post_evsel_config() function that
> could do arch specific fixups or emit such warnings _after_ (thus the
> "post") the common code evsel__config() does its thing?

Will rename accordingly. 

Thanks,
Ravi


* Re: [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities
  2022-05-09  4:49 ` [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities Ravi Bangoria
  2022-05-16  4:15   ` Ravi Bangoria
@ 2022-05-16 13:28   ` Arnaldo Carvalho de Melo
  2022-05-16 13:46     ` Ravi Bangoria
  1 sibling, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-16 13:28 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

Em Mon, May 09, 2022 at 10:19:12AM +0530, Ravi Bangoria escreveu:
> Pmus advertise their capabilities via sysfs attribute files but
> perf tool currently parses only core(cpu) pmu capabilities. Add
> support for parsing non-cpu pmu capabilities.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  .../Documentation/perf.data-file-format.txt   |  18 ++
>  tools/perf/util/env.c                         |  48 +++-
>  tools/perf/util/env.h                         |  11 +
>  tools/perf/util/header.c                      | 211 ++++++++++++++++++
>  tools/perf/util/header.h                      |   1 +
>  tools/perf/util/pmu.c                         |  15 +-
>  tools/perf/util/pmu.h                         |   2 +
>  7 files changed, 301 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
> index f56d0e0fbff6..dea3acb36558 100644
> --- a/tools/perf/Documentation/perf.data-file-format.txt
> +++ b/tools/perf/Documentation/perf.data-file-format.txt
> @@ -435,6 +435,24 @@ struct {
>  	} [nr_pmu];
>  };
>  
> +	HEADER_PMU_CAPS = 32,
> +
> +	List of pmu capabilities (except cpu pmu which is already
> +	covered by HEADER_CPU_PMU_CAPS)
> +
> +struct {
> +	u32 nr_pmus;
> +	struct {
> +		u8 core_type;	/* For hybrid topology */

Humm, I'd say use u32 here and..

> +		char pmu_name[];
> +		u16 nr_caps;

Here as well; there is no need to save space, I guess.

> +		struct {
> +			char name[];
> +			char value[];
> +		} [nr_caps];
> +	} [nr_pmus];
> +};
> +
>  	other bits are reserved and should ignored for now
>  	HEADER_FEAT_BITS	= 256,
>  
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 579e44c59914..928633f07086 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -179,7 +179,7 @@ static void perf_env__purge_bpf(struct perf_env *env __maybe_unused)
>  
>  void perf_env__exit(struct perf_env *env)
>  {
> -	int i;
> +	int i, j;
>  
>  	perf_env__purge_bpf(env);
>  	perf_env__purge_cgroups(env);
> @@ -222,6 +222,14 @@ void perf_env__exit(struct perf_env *env)
>  		zfree(&env->hybrid_cpc_nodes[i].pmu_name);
>  	}
>  	zfree(&env->hybrid_cpc_nodes);
> +
> +	for (i = 0; i < env->nr_pmus_with_caps; i++) {
> +		zfree(&env->env_pmu_caps[i].pmu_name);
> +		for (j = 0; j < env->env_pmu_caps[i].nr_caps; j++)
> +			zfree(&env->env_pmu_caps[i].pmu_caps[j]);
> +		zfree(&env->env_pmu_caps[i].pmu_caps);
> +	}
> +	zfree(&env->env_pmu_caps);
>  }
>  
>  void perf_env__init(struct perf_env *env)
> @@ -527,3 +535,41 @@ int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu)
>  
>  	return cpu.cpu >= 0 && cpu.cpu < env->nr_numa_map ? env->numa_map[cpu.cpu] : -1;
>  }
> +
> +char *perf_env__find_pmu_cap(struct perf_env *env, u8 core_type,
> +			     const char *pmu_name, const char *cap)
> +{
> +	struct env_pmu_caps *env_pmu_caps = env->env_pmu_caps;
> +	char *cap_eq;
> +	int cap_size;
> +	char **ptr;
> +	int i, j;
> +
> +	if (!pmu_name || !cap)
> +		return NULL;
> +
> +	cap_size = strlen(cap);
> +	cap_eq = zalloc(cap_size + 2);
> +	if (!cap_eq)
> +		return NULL;
> +
> +	memcpy(cap_eq, cap, cap_size);
> +	cap_eq[cap_size] = '=';
> +
> +	for (i = 0; i < env->nr_pmus_with_caps; i++) {
> +		if (env_pmu_caps[i].core_type != core_type ||
> +		    strcmp(env_pmu_caps[i].pmu_name, pmu_name))
> +			continue;
> +
> +		ptr = env_pmu_caps[i].pmu_caps;
> +
> +		for (j = 0; j < env_pmu_caps[i].nr_caps; j++) {
> +			if (!strncmp(ptr[j], cap_eq, cap_size + 1)) {
> +				free(cap_eq);
> +				return &ptr[j][cap_size + 1];
> +			}
> +		}
> +	}
> +	free(cap_eq);
> +	return NULL;
> +}
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index a3541f98e1fc..2b767f4ae6e0 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -50,6 +50,13 @@ struct hybrid_cpc_node {
>  	char            *pmu_name;
>  };
>  
> +struct env_pmu_caps {
> +	u8	core_type;
> +	char	*pmu_name;
> +	u16	nr_caps;
> +	char	**pmu_caps;
> +};
> +
>  struct perf_env {
>  	char			*hostname;
>  	char			*os_release;
> @@ -75,6 +82,7 @@ struct perf_env {
>  	int			nr_cpu_pmu_caps;
>  	int			nr_hybrid_nodes;
>  	int			nr_hybrid_cpc_nodes;
> +	int			nr_pmus_with_caps;
>  	char			*cmdline;
>  	const char		**cmdline_argv;
>  	char			*sibling_cores;
> @@ -95,6 +103,7 @@ struct perf_env {
>  	unsigned long long	 memory_bsize;
>  	struct hybrid_node	*hybrid_nodes;
>  	struct hybrid_cpc_node	*hybrid_cpc_nodes;
> +	struct env_pmu_caps	*env_pmu_caps;
>  #ifdef HAVE_LIBBPF_SUPPORT
>  	/*
>  	 * bpf_info_lock protects bpf rbtrees. This is needed because the
> @@ -172,4 +181,6 @@ bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node);
>  struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id);
>  
>  int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu);
> +char *perf_env__find_pmu_cap(struct perf_env *env, u8 core_type,
> +			     const char *pmu_name, const char *cap);
>  #endif /* __PERF_ENV_H */
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index a27132e5a5ef..23d89dbfcd96 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -217,6 +217,19 @@ static int __do_read(struct feat_fd *ff, void *addr, ssize_t size)
>  	return __do_read_buf(ff, addr, size);
>  }
>  
> +static int do_read_u16(struct feat_fd *ff, u16 *addr)
> +{
> +	int ret;
> +
> +	ret = __do_read(ff, addr, sizeof(*addr));
> +	if (ret)
> +		return ret;
> +
> +	if (ff->ph->needs_swap)
> +		*addr = bswap_16(*addr);
> +	return 0;
> +}
> +

You will not need the do_read_u16, use do_read_u32.

Then change the other parts to the u32 type.

- Arnaldo
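
For illustration only, a minimal standalone sketch of the read pattern being
suggested: pull a fixed-width u32 field out of the feature data and byte-swap
it when the file came from an opposite-endian host. The names demo_reader,
demo_bswap_32 and demo_read_u32 are hypothetical stand-ins, not perf's
struct feat_fd or do_read_u32(); only the overall shape (read, then swap if
needs_swap) follows the helper quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for perf's struct feat_fd: a flat buffer and a cursor. */
struct demo_reader {
	const unsigned char *buf;
	size_t size;
	size_t offset;		/* invariant: offset <= size */
	bool needs_swap;	/* data was written on an opposite-endian host */
};

/* Portable 32-bit byte swap, so the sketch does not depend on <byteswap.h>. */
static uint32_t demo_bswap_32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Read one u32 field. Returns 0 on success, -1 if too few bytes remain. */
static int demo_read_u32(struct demo_reader *r, uint32_t *out)
{
	assert(r && out);

	if (r->size - r->offset < sizeof(*out))
		return -1;

	memcpy(out, r->buf + r->offset, sizeof(*out));
	r->offset += sizeof(*out);

	if (r->needs_swap)
		*out = demo_bswap_32(*out);
	return 0;
}
```

With every fixed-width field in the record being a u32, one helper of this
shape covers all of them and no 16-bit variant is required.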

>  static int do_read_u32(struct feat_fd *ff, u32 *addr)
>  {
>  	int ret;
> @@ -1580,6 +1593,77 @@ static int write_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>  	return 0;
>  }
>  
> +/*
> + * File format:
> + *
> + * struct {
> + *	u32 nr_pmus;
> + *	struct {
> + *		u8 core_type;
> + *		char pmu_name[];
> + *		u16 nr_caps;

Update here.

> + *		struct {
> + *			char name[];
> + *			char value[];
> + *		} [nr_caps];
> + *	} [nr_pmus];
> + * };
> + */
> +static int write_pmu_caps(struct feat_fd *ff, struct evlist *evlist __maybe_unused)
> +{
> +	struct perf_pmu_caps *caps = NULL;
> +	struct perf_pmu *pmu = NULL;
> +	u8 core_type = 0;
> +	u32 nr_pmus = 0;
> +	int ret;
> +
> +	while ((pmu = perf_pmu__scan(pmu))) {
> +		if (!pmu->name || !strncmp(pmu->name, "cpu", 3) ||
> +		    perf_pmu__caps_parse(pmu) <= 0)
> +			continue;
> +		nr_pmus++;
> +	}
> +
> +	ret = do_write(ff, &nr_pmus, sizeof(nr_pmus));
> +	if (ret < 0)
> +		return ret;
> +
> +	if (!nr_pmus)
> +		return 0;
> +
> +	while ((pmu = perf_pmu__scan(pmu))) {
> +		if (!pmu->name || !strncmp(pmu->name, "cpu", 3) || !pmu->nr_caps)
> +			continue;
> +
> +		/*
> +		 * Currently core_type is always set to 0. But it can be
> +		 * used in future for hybrid topology pmus.
> +		 */
> +		ret = do_write(ff, &core_type, sizeof(core_type));
> +		if (ret < 0)
> +			return ret;
> +
> +		ret = do_write_string(ff, pmu->name);
> +		if (ret < 0)
> +			return ret;
> +
> +		ret = do_write(ff, &pmu->nr_caps, sizeof(pmu->nr_caps));
> +		if (ret < 0)
> +			return ret;
> +
> +		list_for_each_entry(caps, &pmu->caps, list) {
> +			ret = do_write_string(ff, caps->name);
> +			if (ret < 0)
> +				return ret;
> +
> +			ret = do_write_string(ff, caps->value);
> +			if (ret < 0)
> +				return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
>  static void print_hostname(struct feat_fd *ff, FILE *fp)
>  {
>  	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
> @@ -2209,6 +2293,31 @@ static void print_mem_topology(struct feat_fd *ff, FILE *fp)
>  	}
>  }
>  
> +static void print_pmu_caps(struct feat_fd *ff, FILE *fp)
> +{
> +	struct env_pmu_caps *env_pmu_caps = ff->ph->env.env_pmu_caps;
> +	int nr_pmus_with_caps = ff->ph->env.nr_pmus_with_caps;
> +	const char *delimiter = "";
> +	char **ptr;
> +	int i, j;
> +
> +	if (!nr_pmus_with_caps)
> +		return;
> +
> +	for (i = 0; i < nr_pmus_with_caps; i++) {
> +		fprintf(fp, "# %s pmu capabilities: ", env_pmu_caps[i].pmu_name);
> +
> +		ptr = env_pmu_caps[i].pmu_caps;
> +
> +		delimiter = "";
> +		for (j = 0; j < env_pmu_caps[i].nr_caps; j++) {
> +			fprintf(fp, "%s%s", delimiter, ptr[j]);
> +			delimiter = ", ";
> +		}
> +		fprintf(fp, "\n");
> +	}
> +}
> +
>  static int __event_process_build_id(struct perf_record_header_build_id *bev,
>  				    char *filename,
>  				    struct perf_session *session)
> @@ -3319,6 +3428,107 @@ static int process_hybrid_cpu_pmu_caps(struct feat_fd *ff,
>  	return ret;
>  }
>  
> +static int __process_pmu_caps(struct feat_fd *ff, struct env_pmu_caps *env_pmu_caps)
> +{
> +	u16 nr_caps = env_pmu_caps->nr_caps;
> +	int name_size, value_size;
> +	char *name, *value, *ptr;
> +	u16 i;
> +
> +	env_pmu_caps->pmu_caps = zalloc(sizeof(char *) * nr_caps);
> +	if (!env_pmu_caps->pmu_caps)
> +		return -1;
> +
> +	for (i = 0; i < nr_caps; i++) {
> +		name = do_read_string(ff);
> +		if (!name)
> +			goto error;
> +
> +		value = do_read_string(ff);
> +		if (!value)
> +			goto free_name;
> +
> +		name_size = strlen(name);
> +		value_size = strlen(value);
> +		ptr = zalloc(sizeof(char) * (name_size + value_size + 2));
> +		if (!ptr)
> +			goto free_value;
> +
> +		memcpy(ptr, name, name_size);
> +		ptr[name_size] = '=';
> +		memcpy(ptr + name_size + 1, value, value_size);
> +		env_pmu_caps->pmu_caps[i] = ptr;
> +
> +		free(value);
> +		free(name);
> +	}
> +	return 0;
> +
> +free_value:
> +	free(value);
> +free_name:
> +	free(name);
> +error:
> +	for (; i > 0; i--)
> +		free(env_pmu_caps->pmu_caps[i - 1]);
> +	free(env_pmu_caps->pmu_caps);
> +	return -1;
> +}
> +
> +static int process_pmu_caps(struct feat_fd *ff, void *data __maybe_unused)
> +{
> +	struct env_pmu_caps *env_pmu_caps;
> +	u32 nr_pmus;
> +	u32 i;
> +	u16 j;
> +
> +	ff->ph->env.nr_pmus_with_caps = 0;
> +	ff->ph->env.env_pmu_caps = NULL;
> +
> +	if (do_read_u32(ff, &nr_pmus))
> +		return -1;
> +
> +	if (!nr_pmus)
> +		return 0;
> +
> +	env_pmu_caps = zalloc(sizeof(struct env_pmu_caps) * nr_pmus);
> +	if (!env_pmu_caps)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < nr_pmus; i++) {
> +		if (__do_read(ff, &env_pmu_caps[i].core_type, sizeof(env_pmu_caps[i].core_type)))
> +			goto error;
> +
> +		env_pmu_caps[i].pmu_name = do_read_string(ff);
> +		if (!env_pmu_caps[i].pmu_name)
> +			goto error;
> +
> +		if (do_read_u16(ff, &env_pmu_caps[i].nr_caps))
> +			goto free_pmu_name;
> +
> +		if (!__process_pmu_caps(ff, &env_pmu_caps[i]))
> +			continue;
> +
> +free_pmu_name:
> +		free(env_pmu_caps[i].pmu_name);
> +		goto error;
> +	}
> +
> +	ff->ph->env.nr_pmus_with_caps = nr_pmus;
> +	ff->ph->env.env_pmu_caps = env_pmu_caps;
> +	return 0;
> +
> +error:
> +	for (; i > 0; i--) {
> +		free(env_pmu_caps[i - 1].pmu_name);
> +		for (j = 0; j < env_pmu_caps[i - 1].nr_caps; j++)
> +			free(env_pmu_caps[i - 1].pmu_caps[j]);
> +		free(env_pmu_caps[i - 1].pmu_caps);
> +	}
> +	free(env_pmu_caps);
> +	return -1;
> +}
> +
>  #define FEAT_OPR(n, func, __full_only) \
>  	[HEADER_##n] = {					\
>  		.name	    = __stringify(n),			\
> @@ -3382,6 +3592,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
>  	FEAT_OPR(CLOCK_DATA,	clock_data,	false),
>  	FEAT_OPN(HYBRID_TOPOLOGY,	hybrid_topology,	true),
>  	FEAT_OPR(HYBRID_CPU_PMU_CAPS,	hybrid_cpu_pmu_caps,	false),
> +	FEAT_OPR(PMU_CAPS,	pmu_caps,	false),
>  };
>  
>  struct header_print_data {
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index 0eb4bc29a5a4..e9a067bb8b9e 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -47,6 +47,7 @@ enum {
>  	HEADER_CLOCK_DATA,
>  	HEADER_HYBRID_TOPOLOGY,
>  	HEADER_HYBRID_CPU_PMU_CAPS,
> +	HEADER_PMU_CAPS,
>  	HEADER_LAST_FEATURE,
>  	HEADER_FEAT_BITS	= 256,
>  };
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 9a1c7e63e663..8d599acb7569 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1890,16 +1890,22 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
>  	const char *sysfs = sysfs__mountpoint();
>  	DIR *caps_dir;
>  	struct dirent *evt_ent;
> -	int nr_caps = 0;
> +
> +	if (pmu->caps_initialized)
> +		return pmu->nr_caps;
>  
>  	if (!sysfs)
>  		return -1;
>  
> +	pmu->nr_caps = 0;
> +
>  	snprintf(caps_path, PATH_MAX,
>  		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
>  
> -	if (stat(caps_path, &st) < 0)
> +	if (stat(caps_path, &st) < 0) {
> +		pmu->caps_initialized = true;
>  		return 0;	/* no error if caps does not exist */
> +	}
>  
>  	caps_dir = opendir(caps_path);
>  	if (!caps_dir)
> @@ -1926,13 +1932,14 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
>  			continue;
>  		}
>  
> -		nr_caps++;
> +		pmu->nr_caps++;
>  		fclose(file);
>  	}
>  
>  	closedir(caps_dir);
>  
> -	return nr_caps;
> +	pmu->caps_initialized = true;
> +	return pmu->nr_caps;
>  }
>  
>  void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config,
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 541889fa9f9c..593005e68bea 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -46,6 +46,8 @@ struct perf_pmu {
>  	struct perf_cpu_map *cpus;
>  	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
>  	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
> +	bool caps_initialized;
> +	u16 nr_caps;
>  	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
>  	struct list_head list;    /* ELEM */
>  	struct list_head hybrid_list;
> -- 
> 2.27.0

-- 

- Arnaldo


* Re: [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump
  2022-05-09  4:49 ` [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump Ravi Bangoria
@ 2022-05-16 13:29   ` Arnaldo Carvalho de Melo
  2022-05-16 13:47     ` Ravi Bangoria
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-16 13:29 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla

On Mon, May 09, 2022 at 10:19:13AM +0530, Ravi Bangoria wrote:
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/include/asm/amd-ibs.h       | 16 ++++---
>  tools/arch/x86/include/asm/amd-ibs.h | 16 ++++---
>  tools/perf/util/amd-sample-raw.c     | 68 ++++++++++++++++++++++++----
>  3 files changed, 80 insertions(+), 20 deletions(-)

Please split the tooling part into a separate patch.

- Arnaldo


* Re: [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities
  2022-05-16 13:28   ` Arnaldo Carvalho de Melo
@ 2022-05-16 13:46     ` Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-16 13:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria


On 16-May-22 6:58 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, May 09, 2022 at 10:19:12AM +0530, Ravi Bangoria wrote:
>> Pmus advertise their capabilities via sysfs attribute files but
>> perf tool currently parses only core(cpu) pmu capabilities. Add
>> support for parsing non-cpu pmu capabilities.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  .../Documentation/perf.data-file-format.txt   |  18 ++
>>  tools/perf/util/env.c                         |  48 +++-
>>  tools/perf/util/env.h                         |  11 +
>>  tools/perf/util/header.c                      | 211 ++++++++++++++++++
>>  tools/perf/util/header.h                      |   1 +
>>  tools/perf/util/pmu.c                         |  15 +-
>>  tools/perf/util/pmu.h                         |   2 +
>>  7 files changed, 301 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
>> index f56d0e0fbff6..dea3acb36558 100644
>> --- a/tools/perf/Documentation/perf.data-file-format.txt
>> +++ b/tools/perf/Documentation/perf.data-file-format.txt
>> @@ -435,6 +435,24 @@ struct {
>>  	} [nr_pmu];
>>  };
>>  
>> +	HEADER_PMU_CAPS = 32,
>> +
>> +	List of pmu capabilities (except cpu pmu which is already
>> +	covered by HEADER_CPU_PMU_CAPS)
>> +
>> +struct {
>> +	u32 nr_pmus;
>> +	struct {
>> +		u8 core_type;	/* For hybrid topology */
> 
> Hmm, I'd say use u32 here and...
> 
>> +		char pmu_name[];
>> +		u16 nr_caps;
> 
> ...here as well; no need to save space, I guess.

Yeah, I know it's not a big deal, but FWIW I picked those sizes on purpose.
256 types should be more than enough for core_type. Similarly, no real
pmu will have more than 65,535 capabilities :)

Anyway, will convert them to u32.

Thanks,
Ravi
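
To put a number on the space trade-off discussed above, here is a small
sketch that adds up the fixed-width bytes per PMU record in both layouts.
It assumes, as in the quoted write_pmu_caps(), that each field is written
individually, so no struct padding is involved; the variable-length strings
are left out because they are identical in both layouts. The demo_* names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint32_t) == 4, "sketch assumes 8-bit bytes");

/* Layout as posted in v2: u8 core_type + u16 nr_caps. */
static size_t demo_fixed_bytes_v2(void)
{
	return sizeof(uint8_t) + sizeof(uint16_t);
}

/* Layout after the requested change: u32 core_type + u32 nr_caps. */
static size_t demo_fixed_bytes_u32(void)
{
	return sizeof(uint32_t) + sizeof(uint32_t);
}

/* Extra header bytes for a file describing nr_pmus PMUs with capabilities. */
static size_t demo_extra_bytes(size_t nr_pmus)
{
	return nr_pmus * (demo_fixed_bytes_u32() - demo_fixed_bytes_v2());
}
```

That is 3 bytes versus 8 bytes per PMU, so the u32 layout costs 5 extra
bytes for each PMU that has capabilities.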


* Re: [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump
  2022-05-16 13:29   ` Arnaldo Carvalho de Melo
@ 2022-05-16 13:47     ` Ravi Bangoria
  0 siblings, 0 replies; 25+ messages in thread
From: Ravi Bangoria @ 2022-05-16 13:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, rrichter, mingo, mark.rutland, jolsa, namhyung, tglx, bp,
	irogers, yao.jin, james.clark, leo.yan, kan.liang, ak, eranian,
	like.xu.linux, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria

On 16-May-22 6:59 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, May 09, 2022 at 10:19:13AM +0530, Ravi Bangoria wrote:
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  arch/x86/include/asm/amd-ibs.h       | 16 ++++---
>>  tools/arch/x86/include/asm/amd-ibs.h | 16 ++++---
>>  tools/perf/util/amd-sample-raw.c     | 68 ++++++++++++++++++++++++----
>>  3 files changed, 80 insertions(+), 20 deletions(-)
> 
> Please split the tooling part into a separate patch.

Sure.

Thanks,
Ravi


end of thread, other threads:[~2022-05-16 13:48 UTC | newest]

Thread overview: 25+ messages
2022-05-09  4:49 [PATCH v2 0/8] perf/amd: Zen4 IBS extensions support Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 1/8] perf/amd/ibs: Cascade pmu init functions' return value Ravi Bangoria
2022-05-11 19:47   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 2/8] perf/amd/ibs: Use ->is_visible callback for dynamic attributes Ravi Bangoria
2022-05-11 19:47   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 3/8] perf/amd/ibs: Add support for L3 miss filtering Ravi Bangoria
2022-05-09 12:05   ` Peter Zijlstra
2022-05-09 12:35     ` Ravi Bangoria
2022-05-09 13:07       ` Peter Zijlstra
2022-05-11 19:46   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 4/8] perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute Ravi Bangoria
2022-05-11 19:46   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 5/8] perf record ibs: Warn about sampling period skew Ravi Bangoria
2022-05-16 13:22   ` Arnaldo Carvalho de Melo
2022-05-16 13:27     ` Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 6/8] perf header: Parse non-cpu pmu capabilities Ravi Bangoria
2022-05-16  4:15   ` Ravi Bangoria
2022-05-16 12:53     ` Arnaldo Carvalho de Melo
2022-05-16 13:28   ` Arnaldo Carvalho de Melo
2022-05-16 13:46     ` Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 7/8] perf script ibs: Support new IBS bits in raw trace dump Ravi Bangoria
2022-05-16 13:29   ` Arnaldo Carvalho de Melo
2022-05-16 13:47     ` Ravi Bangoria
2022-05-09  4:49 ` [PATCH v2 8/8] perf ibs: Fix comment Ravi Bangoria
2022-05-11 19:46   ` [tip: perf/core] perf/ibs: " tip-bot2 for Ravi Bangoria
