linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/14] perf mem/c2c: Add support for AMD
@ 2022-06-16 11:36 Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 01/14] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

The perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store samples provide most of the
information needed by these tools. Enable support for these tools on
AMD Zen processors based on the IBS Op pmu.

There are some limitations though: Only load/store instructions provide
mem/c2c information. However, IBS does not provide a way to choose a
particular type of instruction to tag. This results in many non-LS
instructions being tagged, which appear as N/A. IBS, being an uncore pmu
from the kernel's point of view[1], does not support per-process
monitoring. Thus, perf mem/c2c on AMD is currently supported in per-cpu
mode only.

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A
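
To put the N/A limitation in numbers, here is a minimal sketch that
re-enters the Samples column from the report above and computes the
share of samples with no memory access information. The struct and
helper names are made up for illustration and are not part of perf.

```c
#include <stddef.h>

/* One row of the `perf mem report -F mem,sample,snoop` output above. */
struct mem_row {
	const char *access;
	const char *snoop;
	unsigned long samples;
};

/* Rows copied from the example report; row 0 is the "N/A" access row. */
static const struct mem_row rows[] = {
	{ "N/A",                       "N/A",  700620 },
	{ "L1 hit",                    "N/A",  126675 },
	{ "L2 hit",                    "N/A",     424 },
	{ "L3 hit",                    "HitM",    664 },
	{ "L3 hit",                    "N/A",      10 },
	{ "Local RAM hit",             "N/A",       2 },
	{ "Remote RAM (1 hop) hit",    "N/A",    8558 },
	{ "Remote Cache (1 hop) hit",  "N/A",       3 },
	{ "Remote Cache (1 hop) hit",  "HitM",      2 },
	{ "Remote Cache (2 hops) hit", "HitM",     10 },
	{ "Remote Cache (2 hops) hit", "N/A",       6 },
	{ "Uncached hit",              "N/A",       4 },
};

#define NR_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Sum of the Samples column. */
static unsigned long total_samples(void)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < NR_ROWS; i++)
		sum += rows[i].samples;
	return sum;
}

/* Share of "N/A" memory accesses, in tenths of a percent (truncated). */
static unsigned long na_share_permille(void)
{
	return rows[0].samples * 1000 / total_samples();
}
```

The rows add up to the 836978 samples reported by perf record, and
roughly 84% of them carry no load/store information.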

Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]

[1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
[2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com

v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
v1->v2:
 - Instead of defining macros to extract IBS register bits, use existing
   bitfield definitions. Zen4 has introduced an additional set of bits in
   IBS registers which this series also exploits, and thus this series
   now depends on the IBS Zen4 enhancement patchset.
 - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
   the perf tool starts with a set of attributes and goes on reverting
   some attributes in a predefined order until it succeeds or runs out
   of attempts. Here, the 1st attempt includes WEIGHT_STRUCT and
   exclude_guest, which always fails because IBS does not support guest
   filtering. The problem, however, is that perf reverts WEIGHT_STRUCT
   but keeps trying with exclude_guest. Thus, although this series
   enables WEIGHT_STRUCT support in the kernel, using it from the perf
   tool needs more changes. I'll try to address this bug later.
 - Introduce __PERF_SAMPLE_PHYS_ADDR_EARLY to hint the generic perf
   driver that the physical address is set by the arch pmu driver and
   should not be overwritten.
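
The WEIGHT_STRUCT fallback problem from the second item can be modelled
with a toy sketch. Everything here is hypothetical: the struct, the
function names and the exact fallback order are assumptions made for
illustration and do not mirror the real perf tool code. The only
behaviour taken from the description is that opening fails while
exclude_guest is set and that WEIGHT_STRUCT is reverted first.

```c
/* Hypothetical, reduced view of the attributes that matter here. */
struct ex_attr {
	int weight_struct;	/* 1: ask for PERF_SAMPLE_WEIGHT_STRUCT */
	int exclude_guest;	/* 1: ask for guest filtering */
};

/* Stand-in for opening an IBS event: guest filtering is rejected. */
static int ex_open(const struct ex_attr *a)
{
	return a->exclude_guest ? -1 : 0;
}

/*
 * Toy fallback loop: on failure, revert one attribute at a time in a
 * fixed order (WEIGHT_STRUCT first, exclude_guest second) and retry.
 */
static int ex_open_with_fallback(struct ex_attr *a)
{
	for (;;) {
		if (ex_open(a) == 0)
			return 0;
		if (a->weight_struct)
			a->weight_struct = 0;
		else if (a->exclude_guest)
			a->exclude_guest = 0;
		else
			return -1;
	}
}
```

In this model the open eventually succeeds, but only after
WEIGHT_STRUCT has been dropped, even though exclude_guest was the
actual cause of the failure.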


Ravi Bangoria (14):
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf tool: Sync include/uapi/linux/perf_event.h header
  perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem: Use more generic term for LFB
  perf script: Add missing fields in usage hint

 arch/x86/events/amd/ibs.c                | 372 ++++++++++++++++++++++-
 arch/x86/include/asm/amd-ibs.h           |  16 +
 include/uapi/linux/perf_event.h          |   5 +-
 kernel/events/core.c                     |   4 +-
 tools/arch/x86/include/asm/amd-ibs.h     |  16 +
 tools/include/uapi/linux/perf_event.h    |   5 +-
 tools/perf/Documentation/perf-c2c.txt    |  14 +-
 tools/perf/Documentation/perf-mem.txt    |   3 +-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/arch/x86/util/mem-events.c    |  31 +-
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/builtin-script.c              |   7 +-
 tools/perf/util/mem-events.c             |  17 +-
 14 files changed, 467 insertions(+), 26 deletions(-)

-- 
2.31.1



* [PATCH v2 01/14] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 02/14] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Introduce PERF_MEM_LVLNUM_EXTN_MEM, which can be used to indicate
accesses to extension memory such as CXL. PERF_MEM_LVL_IO can be used
for IO accesses, but it cannot distinguish between local and remote IO.
Introduce a new field, PERF_MEM_LVLNUM_IO, which can be combined with
PERF_MEM_REMOTE_REMOTE to indicate remote IO accesses.
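
As a rough sketch of how a consumer could tell local from remote IO
with the new value: the struct below is a simplified stand-in with
plain fields, not the real bitfield layout of union perf_mem_data_src,
and the ex_* helpers are hypothetical. The LVLNUM values are the ones
added by this patch; PERF_MEM_REMOTE_REMOTE is assumed to be 0x01 as in
the uapi header.

```c
#include <stdint.h>

#define PERF_MEM_LVLNUM_EXTN_MEM	0x09	/* Extension memory */
#define PERF_MEM_LVLNUM_IO		0x0a	/* I/O */
#define PERF_MEM_REMOTE_REMOTE		0x01	/* Remote */

/* Simplified stand-in: only the two fields this example needs. */
struct ex_data_src {
	uint8_t mem_lvl_num;
	uint8_t mem_remote;
};

/* Describe an IO access, optionally marked as remote. */
static struct ex_data_src ex_io_access(int remote)
{
	struct ex_data_src d = { .mem_lvl_num = PERF_MEM_LVLNUM_IO };

	if (remote)
		d.mem_remote = PERF_MEM_REMOTE_REMOTE;
	return d;
}

/* Remote IO: the IO level number combined with the remote flag. */
static int ex_is_remote_io(struct ex_data_src d)
{
	return d.mem_lvl_num == PERF_MEM_LVLNUM_IO &&
	       d.mem_remote == PERF_MEM_REMOTE_REMOTE;
}
```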

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 include/uapi/linux/perf_event.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..1c3157c1be9d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1292,7 +1292,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x9 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
-- 
2.31.1



* [PATCH v2 02/14] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 01/14] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 03/14] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_OP_DATA2 DataSrc provides details about the location from which
load ops fetched their data. Define macros for legacy and extended
DataSrc values.
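
For orientation, a small sketch that turns an extended DataSrc value
into a readable label. The macro values are copied from this patch; the
helper name and the label strings are informal and only meant for
illustration.

```c
/* IBS_OP_DATA2 DataSrc Extension values, as defined in this patch. */
#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
#define IBS_DATA_SRC_EXT_DRAM			 3
#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
#define IBS_DATA_SRC_EXT_PMEM			 6
#define IBS_DATA_SRC_EXT_IO			 7
#define IBS_DATA_SRC_EXT_EXT_MEM		 8
#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12

/* Hypothetical helper: informal label for an extended DataSrc value. */
static const char *ex_ibs_ext_data_src_name(unsigned int data_src)
{
	switch (data_src) {
	case IBS_DATA_SRC_EXT_LOC_CACHE:	return "local cache";
	case IBS_DATA_SRC_EXT_NEAR_CCX_CACHE:	return "near CCX cache";
	case IBS_DATA_SRC_EXT_DRAM:		return "DRAM";
	case IBS_DATA_SRC_EXT_FAR_CCX_CACHE:	return "far CCX cache";
	case IBS_DATA_SRC_EXT_PMEM:		return "PMEM";
	case IBS_DATA_SRC_EXT_IO:		return "IO";
	case IBS_DATA_SRC_EXT_EXT_MEM:		return "extension memory";
	case IBS_DATA_SRC_EXT_PEER_AGENT_MEM:	return "peer agent memory";
	default:				return "unknown";
	}
}
```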

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index f3eb098d63d4..cb2a5e113daa 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@
 
 #include <asm/msr-index.h>
 
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
 /*
  * IBS Hardware MSRs
  */
-- 
2.31.1



* [PATCH v2 03/14] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 01/14] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 02/14] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 04/14] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

union perf_mem_data_src is used to pass arch-specific memory access
details in a generic form. These details get consumed by tools like
perf mem and c2c. IBS tagged load/store samples provide most of the
information needed by these tools. Add logic to convert IBS-specific
raw data into perf_mem_data_src.
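
One step of that conversion, shown in isolation: on Zen4 the DataSrc
value is split into a low and a high part, and the patch combines them
as (hi << 3) | lo. The sketch below uses a plain struct with
hypothetical names instead of the real union ibs_op_data2 bitfields,
and takes the Zen4 capability as a parameter rather than reading
ibs_caps.

```c
#include <stdint.h>

/* Simplified stand-in for the two DataSrc fields of IBS_OP_DATA2. */
struct ex_op_data2 {
	uint8_t data_src_lo;	/* low part, shifted under by 3 bits */
	uint8_t data_src_hi;	/* high part, only meaningful on Zen4 */
};

/* Combine the DataSrc parts the same way perf_ibs_data_src() does. */
static uint8_t ex_data_src(const struct ex_op_data2 *d, int zen4)
{
	if (zen4)
		return (uint8_t)((d->data_src_hi << 3) | d->data_src_lo);

	return d->data_src_lo;
}
```

With hi = 1 and lo = 4 this yields 12, which matches
IBS_DATA_SRC_EXT_PEER_AGENT_MEM; without the Zen4 capability only the
low part is used.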

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 302 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 296 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c251bc44c088..de2632a2e44d 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -688,6 +688,294 @@ static struct perf_ibs perf_ibs_op = {
 	.get_count		= get_ibs_op_count,
 };
 
+static void perf_ibs_get_mem_op(union ibs_op_data3 *op_data3,
+				struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_op = PERF_MEM_OP_NA;
+
+	if (op_data3->ld_op)
+		data_src->mem_op = PERF_MEM_OP_LOAD;
+	else if (op_data3->st_op)
+		data_src->mem_op = PERF_MEM_OP_STORE;
+}
+
+/*
+ * Processors having CPUID_Fn8000001B_EAX[11] aka IBS_CAPS_ZEN4 has
+ * more fine granular DataSrc encodings. Others have coarse.
+ */
+static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
+{
+	if (ibs_caps & IBS_CAPS_ZEN4)
+		return (op_data2->data_src_hi << 3) | op_data2->data_src_lo;
+
+	return op_data2->data_src_lo;
+}
+
+static void perf_ibs_get_mem_lvl(struct perf_event *event,
+				 union ibs_op_data2 *op_data2,
+				 union ibs_op_data3 *op_data3,
+				 struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src = perf_ibs_data_src(op_data2);
+
+	data_src->mem_lvl = 0;
+
+	/*
+	 * DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
+	 * memory accesses. So, check DcUcMemAcc bit early.
+	 */
+	if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L1 Hit */
+	if (op_data3->dc_miss == 0) {
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L2 Hit */
+	if (op_data3->l2_miss == 0) {
+		/* Erratum #1293 */
+		if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
+		    !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
+			data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* L3 Hit */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
+					    PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* A peer cache in a near CCX. */
+	if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
+		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* A peer cache in a far CCX. */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* DRAM */
+	if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
+		if (op_data2->rmt_node == 0)
+			data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+		else
+			data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* PMEM */
+	if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* Extension Memory */
+	if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* IO */
+	if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_IO;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/*
+	 * MAB (Miss Address Buffer) Hit. MAB keeps track of outstanding
+	 * DC misses. However such data may come from any level in mem
+	 * hierarchy. IBS provides detail about both MAB as well as actual
+	 * DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
+	 * MAB only when IBS fails to provide DataSrc.
+	 */
+	if (op_data3->dc_miss_no_mab_alloc) {
+		data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	data_src->mem_lvl = PERF_MEM_LVL_NA;
+}
+
+static bool perf_ibs_cache_hit_st_valid(void)
+{
+	/* 0: Uninitialized, 1: Valid, -1: Invalid */
+	static int cache_hit_st_valid;
+
+	if (unlikely(!cache_hit_st_valid)) {
+		if (boot_cpu_data.x86 == 0x19 &&
+		    (boot_cpu_data.x86_model <= 0xF ||
+		    (boot_cpu_data.x86_model >= 0x20 &&
+		     boot_cpu_data.x86_model <= 0x5F))) {
+			cache_hit_st_valid = -1;
+		} else {
+			cache_hit_st_valid = 1;
+		}
+	}
+
+	return cache_hit_st_valid == 1;
+}
+
+static void perf_ibs_get_mem_snoop(union ibs_op_data2 *op_data2,
+				   struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src;
+
+	data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+
+	if (!perf_ibs_cache_hit_st_valid() ||
+	    data_src->mem_op != PERF_MEM_OP_LOAD ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L1 ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L2 ||
+	    op_data2->cache_hit_st)
+		return;
+
+	ibs_data_src = perf_ibs_data_src(op_data2);
+
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE ||
+		    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE ||
+		    ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE)
+			data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	} else if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+		data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	}
+}
+
+static void perf_ibs_get_tlb_lvl(union ibs_op_data3 *op_data3,
+				 struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_dtlb = PERF_MEM_TLB_NA;
+
+	if (!op_data3->dc_lin_addr_valid)
+		return;
+
+	if (!op_data3->dc_l1tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	if (!op_data3->dc_l2tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS;
+}
+
+static void perf_ibs_get_mem_lock(union ibs_op_data3 *op_data3,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_lock = PERF_MEM_LOCK_NA;
+
+	if (op_data3->dc_locked_op)
+		data_src->mem_lock = PERF_MEM_LOCK_LOCKED;
+}
+
+#define ibs_op_msr_idx(msr)	(msr - MSR_AMD64_IBSOPCTL)
+
+static void perf_ibs_get_data_src(struct perf_event *event,
+				  struct perf_ibs_data *ibs_data,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	union ibs_op_data2 op_data2;
+	union ibs_op_data3 op_data3;
+
+	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+	perf_ibs_get_mem_op(&op_data3, data);
+	if (data_src->mem_op != PERF_MEM_OP_LOAD &&
+	    data_src->mem_op != PERF_MEM_OP_STORE)
+		return;
+
+	op_data2.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];
+
+	/* Erratum #1293 */
+	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xF &&
+	    (op_data3.sw_pf || op_data3.dc_miss_no_mab_alloc)) {
+		/*
+		 * OP_DATA2 has only two fields on Zen3: DataSrc and RmtNode.
+		 * DataSrc=0 is No valid status and RmtNode is invalid when
+		 * DataSrc=0.
+		 */
+		op_data2.val = 0;
+	}
+
+	perf_ibs_get_mem_lvl(event, &op_data2, &op_data3, data);
+	perf_ibs_get_mem_snoop(&op_data2, data);
+	perf_ibs_get_tlb_lvl(&op_data3, data);
+	perf_ibs_get_mem_lock(&op_data3, data);
+}
+
+static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
+				   int check_rip)
+{
+	if (sample_type & PERF_SAMPLE_RAW ||
+	    (perf_ibs == &perf_ibs_op &&
+	     sample_type & PERF_SAMPLE_DATA_SRC))
+		return perf_ibs->offset_max;
+	else if (check_rip)
+		return 3;
+	return 1;
+}
+
 static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 {
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
@@ -735,12 +1023,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	size = 1;
 	offset = 1;
 	check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
-	if (event->attr.sample_type & PERF_SAMPLE_RAW)
-		offset_max = perf_ibs->offset_max;
-	else if (check_rip)
-		offset_max = 3;
-	else
-		offset_max = 1;
+
+	offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
+
 	do {
 		rdmsrl(msr + offset, *buf++);
 		size++;
@@ -793,6 +1078,11 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		data.raw = &raw;
 	}
 
+	if (perf_ibs == &perf_ibs_op) {
+		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC)
+			perf_ibs_get_data_src(event, &ibs_data, &data);
+	}
+
 	/*
 	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
 	 * recorded as part of interrupt regs. Thus we need to use rip from
-- 
2.31.1



* [PATCH v2 04/14] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (2 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 03/14] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 05/14] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IbsDcMissLat indicates the number of clock cycles from when a miss is
detected in the data cache to when the data was delivered to the core.
Similarly, IbsTagToRetCtr provides the number of cycles from when the op
was tagged to when the op was retired. Use these fields to populate
sample->weight. Note that sample->weight will be populated only when
PERF_SAMPLE_DATA_SRC is also set, although PERF_SAMPLE_WEIGHT_STRUCT
and PERF_SAMPLE_WEIGHT are otherwise independent of PERF_SAMPLE_DATA_SRC.
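
A minimal sketch of the weight selection described above. The union is
a local mirror of the field names used in the patch (full, var1_dw,
var2_w), and the EX_* flag bits are placeholders rather than the real
PERF_SAMPLE_* values.

```c
#include <stdint.h>

/* Placeholder flag bits, not the real PERF_SAMPLE_* encodings. */
#define EX_SAMPLE_WEIGHT		(1u << 0)
#define EX_SAMPLE_WEIGHT_STRUCT		(1u << 1)

/* Local mirror of the weight fields the patch writes to. */
union ex_weight {
	uint64_t full;
	struct {
		uint32_t var1_dw;
		uint16_t var2_w;
		uint16_t var3_w;
	};
};

/*
 * Only loads get a weight. WEIGHT_STRUCT carries both the data cache
 * miss latency and the tag-to-retire count; plain WEIGHT carries the
 * miss latency alone.
 */
static void ex_fill_weight(union ex_weight *w, unsigned int sample_type,
			   int is_load, uint32_t dc_miss_lat,
			   uint16_t tag_to_ret_ctr)
{
	if (!is_load)
		return;

	if (sample_type & EX_SAMPLE_WEIGHT_STRUCT) {
		w->var1_dw = dc_miss_lat;
		w->var2_w = tag_to_ret_ctr;
	} else if (sample_type & EX_SAMPLE_WEIGHT) {
		w->full = dc_miss_lat;
	}
}
```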

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index de2632a2e44d..830e527a29c3 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -714,6 +714,7 @@ static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
 }
 
 static void perf_ibs_get_mem_lvl(struct perf_event *event,
+				 union ibs_op_data *op_data,
 				 union ibs_op_data2 *op_data2,
 				 union ibs_op_data3 *op_data3,
 				 struct perf_sample_data *data)
@@ -738,6 +739,16 @@ static void perf_ibs_get_mem_lvl(struct perf_event *event,
 		return;
 	}
 
+	/* Load latency (Data cache miss latency) */
+	if (data_src->mem_op == PERF_MEM_OP_LOAD) {
+		if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
+			data->weight.var1_dw = op_data3->dc_miss_lat;
+			data->weight.var2_w = op_data->tag_to_ret_ctr;
+		} else if (event->attr.sample_type & PERF_SAMPLE_WEIGHT) {
+			data->weight.full = op_data3->dc_miss_lat;
+		}
+	}
+
 	/* L2 Hit */
 	if (op_data3->l2_miss == 0) {
 		/* Erratum #1293 */
@@ -935,6 +946,7 @@ static void perf_ibs_get_data_src(struct perf_event *event,
 				  struct perf_sample_data *data)
 {
 	union perf_mem_data_src *data_src = &data->data_src;
+	union ibs_op_data op_data;
 	union ibs_op_data2 op_data2;
 	union ibs_op_data3 op_data3;
 
@@ -945,6 +957,7 @@ static void perf_ibs_get_data_src(struct perf_event *event,
 	    data_src->mem_op != PERF_MEM_OP_STORE)
 		return;
 
+	op_data.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA)];
 	op_data2.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];
 
 	/* Erratum #1293 */
@@ -958,7 +971,7 @@ static void perf_ibs_get_data_src(struct perf_event *event,
 		op_data2.val = 0;
 	}
 
-	perf_ibs_get_mem_lvl(event, &op_data2, &op_data3, data);
+	perf_ibs_get_mem_lvl(event, &op_data, &op_data2, &op_data3, data);
 	perf_ibs_get_mem_snoop(&op_data2, data);
 	perf_ibs_get_tlb_lvl(&op_data3, data);
 	perf_ibs_get_mem_lock(&op_data3, data);
-- 
2.31.1



* [PATCH v2 05/14] perf/x86/amd: Support PERF_SAMPLE_ADDR
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (3 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 04/14] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 06/14] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_DC_LINADDR provides the linear data address for the tagged load/
store operation. Populate perf sample address using it.
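
The validity rule applied by this patch can be summarised in a few
lines. This is a sketch with a hypothetical helper that takes the
relevant bits as plain arguments: the address is reported only for
loads and stores whose linear address is flagged valid, and is zero
otherwise.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the sample address for a tagged op.
 * is_load_or_store and lin_addr_valid stand in for the IBS_OP_DATA3
 * bits; dc_lin_addr stands in for the IBS_DC_LINADDR register value.
 */
static uint64_t ex_sample_addr(int is_load_or_store, int lin_addr_valid,
			       uint64_t dc_lin_addr)
{
	if (!is_load_or_store || !lin_addr_valid)
		return 0;

	return dc_lin_addr;
}
```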

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 830e527a29c3..9b3e265a9fed 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -977,12 +977,35 @@ static void perf_ibs_get_data_src(struct perf_event *event,
 	perf_ibs_get_mem_lock(&op_data3, data);
 }
 
+static void perf_ibs_get_data_addr(struct perf_event *event,
+				   struct perf_ibs_data *ibs_data,
+				   struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	union ibs_op_data3 op_data3;
+
+	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
+		perf_ibs_get_mem_op(&op_data3, data);
+
+	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
+	    data_src->mem_op != PERF_MEM_OP_STORE) ||
+	    !op_data3.dc_lin_addr_valid) {
+		data->addr = 0x0;
+		return;
+	}
+
+	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
+}
+
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 				   int check_rip)
 {
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
-	     sample_type & PERF_SAMPLE_DATA_SRC))
+	    (sample_type & PERF_SAMPLE_DATA_SRC ||
+	     sample_type & PERF_SAMPLE_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
@@ -1094,6 +1117,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (perf_ibs == &perf_ibs_op) {
 		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC)
 			perf_ibs_get_data_src(event, &ibs_data, &data);
+		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+			perf_ibs_get_data_addr(event, &ibs_data, &data);
 	}
 
 	/*
-- 
2.31.1



* [PATCH v2 06/14] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (4 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 05/14] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 07/14] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_DC_PHYSADDR provides the physical data address for the tagged load/
store operation. Populate the perf sample physical address using it.
Currently, the physical address is unconditionally overwritten by the
generic perf driver. Introduce an internal-only
__PERF_SAMPLE_PHYS_ADDR_EARLY type to notify generic code that the arch
pmu has already set the physical address.
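
The effect of the _EARLY flag on the generic path can be sketched as
follows. The names are hypothetical, the PHYS_ADDR bit is an
illustrative value, and ex_virt_to_phys() is a fake translation that
only stands in for perf_virt_to_phys(); the _EARLY bit position is the
one used in this patch.

```c
#include <stdint.h>

#define EX_SAMPLE_PHYS_ADDR		(1ULL << 19)	/* illustrative bit */
#define EX_SAMPLE_PHYS_ADDR_EARLY	(1ULL << 62)	/* as in this patch */

/* Fake translation, standing in for the generic virt-to-phys lookup. */
static uint64_t ex_virt_to_phys(uint64_t addr)
{
	return addr + 0x100000000ULL;
}

/*
 * Mirror of the check added to perf_prepare_sample(): translate the
 * virtual address only when the arch pmu has not already provided a
 * physical address; otherwise keep the pmu-provided value.
 */
static uint64_t ex_prepare_phys_addr(uint64_t sample_type, uint64_t addr,
				     uint64_t phys_from_pmu)
{
	if ((sample_type & EX_SAMPLE_PHYS_ADDR) &&
	    !(sample_type & EX_SAMPLE_PHYS_ADDR_EARLY))
		return ex_virt_to_phys(addr);

	return phys_from_pmu;
}
```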

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c       | 34 ++++++++++++++++++++++++++++++++-
 include/uapi/linux/perf_event.h |  1 +
 kernel/events/core.c            |  4 +++-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9b3e265a9fed..d224abddc3af 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -310,6 +310,13 @@ static int perf_ibs_init(struct perf_event *event)
 	if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
 		event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
 
+	/*
+	 * Setting _EARLY flag makes sure generic perf driver does not
+	 * overwrite physical address set by arch specific pmu driver.
+	 */
+	if (event->attr.sample_type & PERF_SAMPLE_PHYS_ADDR)
+		event->attr.sample_type |= __PERF_SAMPLE_PHYS_ADDR_EARLY;
+
 	return 0;
 }
 
@@ -999,13 +1006,36 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
 	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
 }
 
+static void perf_ibs_get_phy_addr(struct perf_event *event,
+				  struct perf_ibs_data *ibs_data,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	union ibs_op_data3 op_data3;
+
+	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
+		perf_ibs_get_mem_op(&op_data3, data);
+
+	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
+	    data_src->mem_op != PERF_MEM_OP_STORE) ||
+	    !op_data3.dc_phy_addr_valid) {
+		data->phys_addr = 0x0;
+		return;
+	}
+
+	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
+}
+
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 				   int check_rip)
 {
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
 	    (sample_type & PERF_SAMPLE_DATA_SRC ||
-	     sample_type & PERF_SAMPLE_ADDR)))
+	     sample_type & PERF_SAMPLE_ADDR ||
+	     sample_type & PERF_SAMPLE_PHYS_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
@@ -1119,6 +1149,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 			perf_ibs_get_data_src(event, &ibs_data, &data);
 		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
 			perf_ibs_get_data_addr(event, &ibs_data, &data);
+		if (event->attr.sample_type & PERF_SAMPLE_PHYS_ADDR)
+			perf_ibs_get_phy_addr(event, &ibs_data, &data);
 	}
 
 	/*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 1c3157c1be9d..daf7c337e53e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -165,6 +165,7 @@ enum perf_event_sample_format {
 
 	PERF_SAMPLE_MAX = 1U << 25,		/* non-ABI */
 
+	__PERF_SAMPLE_PHYS_ADDR_EARLY		= 1ULL << 62, /* non-ABI; internal use */
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80782cddb1da..f1b486410d0b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7403,8 +7403,10 @@ void perf_prepare_sample(struct perf_event_header *header,
 		header->size += size;
 	}
 
-	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
+	if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
+	    !(sample_type & __PERF_SAMPLE_PHYS_ADDR_EARLY)) {
 		data->phys_addr = perf_virt_to_phys(data->addr);
+	}
 
 #ifdef CONFIG_CGROUP_PERF
 	if (sample_type & PERF_SAMPLE_CGROUP) {
-- 
2.31.1



* [PATCH v2 07/14] perf tool: Sync include/uapi/linux/perf_event.h header
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (5 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 06/14] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 08/14] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Two new values for mem_lvl_num have been introduced: PERF_MEM_LVLNUM_IO
and PERF_MEM_LVLNUM_EXTN_MEM. Also, __PERF_SAMPLE_PHYS_ADDR_EARLY is
introduced for internal use by the kernel. The kernel header already
contains these definitions. Sync them into the tools header as well.
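
For illustration only, the intent of the internal flag can be sketched as
a small predicate: the generic code translates the sampled virtual address
itself only when the arch pmu driver has not already supplied a physical
address. The names below (SKETCH_*, needs_generic_phys_addr) are
hypothetical stand-ins rather than kernel identifiers, and the bit
positions are assumed to mirror the enum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi enum values (assumed bit positions). */
#define SKETCH_SAMPLE_PHYS_ADDR        (1ULL << 19)
#define SKETCH_SAMPLE_PHYS_ADDR_EARLY  (1ULL << 62)

/*
 * Return true when the generic code should translate the sampled virtual
 * address itself, i.e. the user asked for a physical address and the arch
 * pmu driver has not flagged that it already filled one in.
 */
static bool needs_generic_phys_addr(uint64_t sample_type)
{
	return (sample_type & SKETCH_SAMPLE_PHYS_ADDR) &&
	       !(sample_type & SKETCH_SAMPLE_PHYS_ADDR_EARLY);
}
```

With only the user-visible bit set the predicate is true; once the early
bit is also set it becomes false, so a driver-provided address would be
left untouched.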

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/include/uapi/linux/perf_event.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d37629dbad72..daf7c337e53e 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -165,6 +165,7 @@ enum perf_event_sample_format {
 
 	PERF_SAMPLE_MAX = 1U << 25,		/* non-ABI */
 
+	__PERF_SAMPLE_PHYS_ADDR_EARLY		= 1ULL << 62, /* non-ABI; internal use */
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
 
@@ -1292,7 +1293,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x9 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
-- 
2.31.1



* [PATCH v2 08/14] perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (6 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 07/14] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:36 ` [PATCH v2 09/14] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Although the new definitions added to this header are currently used by
the kernel only, the tools copy needs to be in sync with the kernel file.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 9a3312e12e2e..93807b437e4d 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@
 
 #include "msr-index.h"
 
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
 /*
  * IBS Hardware MSRs
  */
-- 
2.31.1



* [PATCH v2 09/14] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (7 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 08/14] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
@ 2022-06-16 11:36 ` Ravi Bangoria
  2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:36 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Add support for printing the new PERF_MEM_LVLNUM_{EXTN_MEM|IO} levels in
perf mem report.
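
As a rough sketch of how such a label table behaves, the snippet below
uses designated initializers so that only the listed level numbers get a
string and every other slot stays NULL. The macro names and the
lvlnum_label() helper are hypothetical; the numeric values are the ones
from the synced header in this series.

```c
#include <stddef.h>

/* Level numbers as defined in the synced uapi header in this series. */
#define LVLNUM_EXTN_MEM   0x09
#define LVLNUM_IO         0x0a
#define LVLNUM_ANY_CACHE  0x0b

/* Sparse table: entries that are not listed stay NULL. */
static const char * const lvlnum_str[] = {
	[LVLNUM_EXTN_MEM]  = "Ext Mem",
	[LVLNUM_IO]        = "I/O",
	[LVLNUM_ANY_CACHE] = "Any cache",
};

/* Hypothetical helper: map a level number to its label, or NULL if unknown. */
static const char *lvlnum_label(unsigned int lvlnum)
{
	if (lvlnum >= sizeof(lvlnum_str) / sizeof(lvlnum_str[0]))
		return NULL;
	return lvlnum_str[lvlnum];
}
```

A caller would need to handle the NULL case, since level numbers without
an entry (for example 0x05) have no label.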

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index c3c21a9c350b..4a55cdd51bba 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -294,6 +294,8 @@ static const char * const mem_lvl[] = {
 };
 
 static const char * const mem_lvlnum[] = {
+	[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
+	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
 	[PERF_MEM_LVLNUM_LFB] = "LFB",
 	[PERF_MEM_LVLNUM_RAM] = "RAM",
-- 
2.31.1



* [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (8 preceding siblings ...)
  2022-06-16 11:36 ` [PATCH v2 09/14] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-06-16 11:52 ` Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 11/14] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
                     ` (3 more replies)
  2022-07-12  9:00 ` [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-07-12 11:35 ` Jiri Olsa
  11 siblings, 4 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:52 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Currently perf sets the PERF_SAMPLE_WEIGHT flag only for mem load events.
Set it for the combined load-store event as well, which enables recording
of load latency by default on archs that do not support an independent
mem load event.

Also document the missing -W option in the perf-record man page.
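
A minimal sketch of the record-path change, assuming a simplified argv
builder: when the combined load-store event is the one being recorded,
"-W" is appended so that sample weights are requested. The helper name
build_record_args() and its has_ldst_event parameter are hypothetical and
much simpler than the real code in builtin-mem.c and builtin-c2c.c.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: append the options perf mem/c2c would pass to
 * perf record. The caller provides room for at least three entries.
 * Returns the number of entries written to rec_argv.
 */
static int build_record_args(const char **rec_argv, bool has_ldst_event)
{
	int i = 0;

	rec_argv[i++] = "record";
	if (has_ldst_event) {
		/* Combined load-store event: request sample weights too. */
		rec_argv[i++] = "-W";
	}
	rec_argv[i++] = "-d";
	return i;
}
```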

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-record.txt | 1 +
 tools/perf/builtin-c2c.c                 | 1 +
 tools/perf/builtin-mem.c                 | 1 +
 3 files changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index cf8ad50f3de1..cf68eeb08316 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -397,6 +397,7 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
+-W::
 --weight::
 Enable weightened sampling. An additional weight is recorded per sample and can be
 displayed with the weight and local_weight sort keys.  This currently works for TSX
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 4898ee57d156..3bf3db6f889c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3034,6 +3034,7 @@ static int perf_c2c__record(int argc, const char **argv)
 		 */
 		if (e->tag) {
 			e->record = true;
+			rec_argv[i++] = "-W";
 		} else {
 			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
 			e->record = true;
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9e435fd23503..f7dd8216de72 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -122,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	    (mem->operation & MEM_OPERATION_LOAD) &&
 	    (mem->operation & MEM_OPERATION_STORE)) {
 		e->record = true;
+		rec_argv[i++] = "-W";
 	} else {
 		if (mem->operation & MEM_OPERATION_LOAD) {
 			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
-- 
2.31.1



* [PATCH v2 11/14] perf mem/c2c: Add load store event mappings for AMD
  2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
@ 2022-06-16 11:52   ` Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 12/14] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:52 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store samples provide most of the
information needed for these tools. Wire in the ibs_op// event as the
mem-ldst event for AMD.
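
The per-vendor selection can be sketched roughly as below: one event table
per vendor, picked by checking the cpuid string prefix. This is a
simplified illustration, not the patch itself; struct mem_event, the EV_*
indices and mem_event_for() are hypothetical names, and the cpuid string
is passed in by the caller instead of being read from the perf
environment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down event descriptor. */
struct mem_event {
	const char *tag;
	const char *name;
};

enum { EV_LOAD, EV_STORE, EV_LOAD_STORE, EV_MAX };

static const struct mem_event events_intel[EV_MAX] = {
	[EV_LOAD]       = { "ldlat-loads",  "%s/mem-loads,ldlat=%u/P" },
	[EV_STORE]      = { "ldlat-stores", "%s/mem-stores/P" },
	[EV_LOAD_STORE] = { NULL, NULL },
};

static const struct mem_event events_amd[EV_MAX] = {
	[EV_LOAD]       = { NULL, NULL },
	[EV_STORE]      = { NULL, NULL },
	[EV_LOAD_STORE] = { "mem-ldst", "ibs_op//" },
};

/* Pick the table by vendor prefix; anything that is not AMD uses the Intel table. */
static const struct mem_event *mem_event_for(const char *cpuid, int i)
{
	static const char amd[] = "AuthenticAMD";

	if (i < 0 || i >= EV_MAX)
		return NULL;
	if (cpuid && strncmp(cpuid, amd, sizeof(amd) - 1) == 0)
		return &events_amd[i];
	return &events_intel[i];
}
```

On an AMD cpuid string only the EV_LOAD_STORE slot carries a tag, which is
why the individual load and store events show up as unsupported there.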

There are some limitations though: Only load/store instructions provide
mem/c2c information. However, IBS does not provide a way to choose a
particular type of instruction to tag. This results in many non-LS
instructions being tagged, which appear as N/A. IBS, being an uncore pmu
from the kernel's point of view[1], does not support per-process
monitoring. Thus, perf mem/c2c on AMD is currently supported in per-cpu
mode only.

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

[1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-c2c.txt | 14 ++++++++----
 tools/perf/Documentation/perf-mem.txt |  3 ++-
 tools/perf/arch/x86/util/mem-events.c | 31 +++++++++++++++++++++++++--
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 6f69173731aa..32d173fb6541 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).
 
 These events provide:
   - memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS
 
 -l::
 --ldlat::
-	Configure mem-loads latency. (x86 only)
+	Configure mem-loads latency. Supported on Intel and Arm64 processors
+	only. Ignored on other archs.
 
 -k::
 --all-kernel::
@@ -133,11 +135,15 @@ Following perf record options are configured by default:
   -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:
 
   cpu/mem-loads,ldlat=30/P
   cpu/mem-stores/P
 
+following on AMD:
+
+  ibs_op//
+
 and following on PowerPC:
 
   cpu/mem-loads/
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 66177511c5c4..005c95580b1e 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -85,7 +85,8 @@ RECORD OPTIONS
 	Be more verbose (show counter open errors, etc)
 
 --ldlat <n>::
-	Specify desired latency for loads event. (x86 only)
+	Specify desired latency for loads event. Supported on Intel and Arm64
+	processors only. Ignored on other archs.
 
 In addition, for report all perf report options are valid, and for record
 all perf record options.
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 5214370ca4e4..f683ac702247 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"
 
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"%s/mem-loads,ldlat=%u/P",	"%s/events/mem-loads"),
 	E("ldlat-stores",	"%s/mem-stores/P",		"%s/events/mem-stores"),
 	E(NULL,			NULL,				NULL),
 };
 
+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL,		NULL,		NULL),
+	E(NULL,		NULL,		NULL),
+	E("mem-ldst",	"ibs_op//",	"ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;
 
-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }
 
 bool is_mem_loads_aux_event(struct evsel *leader)
-- 
2.31.1



* [PATCH v2 12/14] perf mem/c2c: Avoid printing empty lines for unsupported events
  2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 11/14] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
@ 2022-06-16 11:52   ` Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 13/14] perf mem: Use more generic term for LFB Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 14/14] perf script: Add missing fields in usage hint Ravi Bangoria
  3 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:52 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Perf mem and c2c can be used with 3 different events: load, store and
combined load-store. Some architectures might support only a partial set
of these events, in which case perf prints an empty line for each
unsupported event. Avoid that.

For example, AMD Zen cpus support only the combined load-store event and
do not support individual load and store events.

Before patch:
  $ ./perf mem record -e list
  
  
  mem-ldst     : available

After patch:
  $ ./perf mem record -e list
  mem-ldst     : available
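
The formatting idea can be sketched as follows, assuming a hypothetical
helper format_event_line() that writes into a buffer instead of stderr:
the field width is passed through '*', so an event without a tag gets a
zero width and an empty string, and nothing is emitted for it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the format string idea: a NULL tag
 * gets a zero field width and an empty string, so nothing is emitted.
 * The newline is part of the ": available" suffix.
 */
static int format_event_line(char *buf, size_t len, const char *tag, bool supported)
{
	return snprintf(buf, len, "%-*s%s",
			tag ? 13 : 0,
			tag ? tag : "",
			supported ? ": available\n" : "");
}
```

Note that in this sketch the newline is only written together with the
": available" suffix, so an event that has a tag but is not supported
would be printed without a trailing newline.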

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 4a55cdd51bba..91db7a0e2da6 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -156,11 +156,12 @@ void perf_mem_events__list(void)
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		struct perf_mem_event *e = perf_mem_events__ptr(j);
 
-		fprintf(stderr, "%-13s%-*s%s\n",
-			e->tag ?: "",
-			verbose > 0 ? 25 : 0,
-			verbose > 0 ? perf_mem_events__name(j, NULL) : "",
-			e->supported ? ": available" : "");
+		fprintf(stderr, "%-*s%-*s%s",
+			e->tag ? 13 : 0,
+			e->tag ? : "",
+			e->tag && verbose > 0 ? 25 : 0,
+			e->tag && verbose > 0 ? perf_mem_events__name(j, NULL) : "",
+			e->supported ? ": available\n" : "");
 	}
 }
 
-- 
2.31.1



* [PATCH v2 13/14] perf mem: Use more generic term for LFB
  2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 11/14] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 12/14] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
@ 2022-06-16 11:52   ` Ravi Bangoria
  2022-06-16 11:52   ` [PATCH v2 14/14] perf script: Add missing fields in usage hint Ravi Bangoria
  3 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:52 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

A hw component that tracks outstanding L1 Data Cache misses is called
LFB (Line Fill Buffer) on Intel and Arm. However, a similar component
exists on other archs under different names; for example, it's called MAB
(Miss Address Buffer) on AMD. Replace LFB with the generic name "Cache
Fill Buffer".

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 91db7a0e2da6..eaa8efcf255b 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -282,7 +282,7 @@ static const char * const mem_lvl[] = {
 	"HIT",
 	"MISS",
 	"L1",
-	"LFB",
+	"Cache Fill Buffer",
 	"L2",
 	"L3",
 	"Local RAM",
@@ -298,7 +298,7 @@ static const char * const mem_lvlnum[] = {
 	[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
 	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
-	[PERF_MEM_LVLNUM_LFB] = "LFB",
+	[PERF_MEM_LVLNUM_LFB] = "Cache Fill Buffer",
 	[PERF_MEM_LVLNUM_RAM] = "RAM",
 	[PERF_MEM_LVLNUM_PMEM] = "PMEM",
 	[PERF_MEM_LVLNUM_NA] = "N/A",
-- 
2.31.1



* [PATCH v2 14/14] perf script: Add missing fields in usage hint
  2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
                     ` (2 preceding siblings ...)
  2022-06-16 11:52   ` [PATCH v2 13/14] perf mem: Use more generic term for LFB Ravi Bangoria
@ 2022-06-16 11:52   ` Ravi Bangoria
  3 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-06-16 11:52 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

A few fields are missing from the usage message printed when a wrong
field option is passed. Add them to the list.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/builtin-script.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c689054002cc..35e10c71692f 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3824,9 +3824,10 @@ int cmd_script(int argc, const char **argv)
 		     "Valid types: hw,sw,trace,raw,synth. "
 		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
 		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
-		     "brstacksym,flags,bpf-output,brstackinsn,brstackinsnlen,brstackoff,"
-		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,tod,"
-		     "data_page_size,code_page_size,ins_lat",
+		     "brstacksym,flags,data_src,weight,bpf-output,brstackinsn,"
+		     "brstackinsnlen,brstackoff,callindent,insn,insnlen,synth,"
+		     "phys_addr,metric,misc,srccode,ipc,tod,data_page_size,"
+		     "code_page_size,ins_lat",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
-- 
2.31.1



* Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (9 preceding siblings ...)
  2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
@ 2022-07-12  9:00 ` Ravi Bangoria
  2022-07-12 11:35 ` Jiri Olsa
  11 siblings, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-07-12  9:00 UTC (permalink / raw)
  To: peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, ravi.bangoria


On 16-Jun-22 5:06 PM, Ravi Bangoria wrote:
> Perf mem and c2c tools are wrappers around perf record with mem load/
> store events. IBS tagged load/store sample provides most of the
> information needed for these tools. Enable support for these tools on
> AMD Zen processors based on IBS Op pmu.

Gentle ping!

Thanks,
Ravi


* Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
  2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (10 preceding siblings ...)
  2022-07-12  9:00 ` [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
@ 2022-07-12 11:35 ` Jiri Olsa
  2022-07-18 15:34   ` Arnaldo Carvalho de Melo
  11 siblings, 1 reply; 21+ messages in thread
From: Jiri Olsa @ 2022-07-12 11:35 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, acme, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> Perf mem and c2c tools are wrappers around perf record with mem load/
> store events. IBS tagged load/store sample provides most of the
> information needed for these tools. Enable support for these tools on
> AMD Zen processors based on IBS Op pmu.
> 
> There are some limitations though: Only load/store instructions provide
> mem/c2c information. However, IBS does not provide a way to choose a
> particular type of instruction to tag. This results in many non-LS
> instructions being tagged which appear as N/A. IBS, being an uncore pmu
> from kernel point of view[1], does not support per process monitoring.
> Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
> 
> Example:
>   $ sudo ./perf mem record -- -c 10000
>   ^C[ perf record: Woken up 227 times to write data ]
>   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
> 
>   $ sudo ./perf mem report -F mem,sample,snoop
>   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
>   Memory access                  Samples  Snoop
>   N/A                             700620  N/A
>   L1 hit                          126675  N/A
>   L2 hit                             424  N/A
>   L3 hit                             664  HitM
>   L3 hit                              10  N/A
>   Local RAM hit                        2  N/A
>   Remote RAM (1 hop) hit            8558  N/A
>   Remote Cache (1 hop) hit             3  N/A
>   Remote Cache (1 hop) hit             2  HitM
>   Remote Cache (2 hops) hit            10  HitM
>   Remote Cache (2 hops) hit             6  N/A
>   Uncached hit                         4  N/A
> 
> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
> 
> [1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
> [2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com
> 
> v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
> v1->v2:
>  - Instead of defining macros to extract IBS register bits, use existing
>    bitfield definitions. Zen4 has introduced additional set of bits in
>    IBS registers which this series also exploits and thus this series
>    now depends on IBS Zen4 enhancement patchset.
>  - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
>    perf tool starts with a set of attributes and goes on reverting some
>    attributes in a predefined order until it succeeds or runs out of
>    attempts. Here, the 1st attempt includes WEIGHT_STRUCT and exclude_guest,
>    which always fails because IBS does not support guest filtering. The
>    problem, however, is that perf reverts WEIGHT_STRUCT but keeps trying
>    with exclude_guest. Thus, although this series enables WEIGHT_STRUCT
>    support in the kernel, using it from the perf tool needs more changes.
>    I'll try to address this bug later.
>  - Introduce __PERF_SAMPLE_PHYS_ADDR_EARLY to hint to the generic perf
>    driver that the physical address is set by the arch pmu driver and
>    should not be overwritten.
> 
> 
> Ravi Bangoria (14):
>   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
>   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
>   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
>   perf/x86/amd: Support PERF_SAMPLE_ADDR
>   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
>   perf tool: Sync include/uapi/linux/perf_event.h header
>   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
>   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
>   perf mem/c2c: Add load store event mappings for AMD
>   perf mem/c2c: Avoid printing empty lines for unsupported events
>   perf mem: Use more generic term for LFB
>   perf script: Add missing fields in usage hint

tools part looks good to me

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> 
>  arch/x86/events/amd/ibs.c                | 372 ++++++++++++++++++++++-
>  arch/x86/include/asm/amd-ibs.h           |  16 +
>  include/uapi/linux/perf_event.h          |   5 +-
>  kernel/events/core.c                     |   4 +-
>  tools/arch/x86/include/asm/amd-ibs.h     |  16 +
>  tools/include/uapi/linux/perf_event.h    |   5 +-
>  tools/perf/Documentation/perf-c2c.txt    |  14 +-
>  tools/perf/Documentation/perf-mem.txt    |   3 +-
>  tools/perf/Documentation/perf-record.txt |   1 +
>  tools/perf/arch/x86/util/mem-events.c    |  31 +-
>  tools/perf/builtin-c2c.c                 |   1 +
>  tools/perf/builtin-mem.c                 |   1 +
>  tools/perf/builtin-script.c              |   7 +-
>  tools/perf/util/mem-events.c             |  17 +-
>  14 files changed, 467 insertions(+), 26 deletions(-)
> 
> -- 
> 2.31.1
> 


* Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
  2022-07-12 11:35 ` Jiri Olsa
@ 2022-07-18 15:34   ` Arnaldo Carvalho de Melo
       [not found]     ` <CA+JHD91X9_dMV-sXho_L9k326-Eneor4ZeOtw_WgWNtHbKzWxg@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-07-18 15:34 UTC (permalink / raw)
  To: Peter Zijlstra, Jiri Olsa, Ravi Bangoria
  Cc: namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> > Perf mem and c2c tools are wrappers around perf record with mem load/
> > store events. IBS tagged load/store sample provides most of the
> > information needed for these tools. Enable support for these tools on
> > AMD Zen processors based on IBS Op pmu.
> > 
> > There are some limitations though: Only load/store instructions provide
> > mem/c2c information. However, IBS does not provide a way to choose a
> > particular type of instruction to tag. This results in many non-LS
> > instructions being tagged which appear as N/A. IBS, being an uncore pmu
> > from kernel point of view[1], does not support per process monitoring.
> > Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
> > 
> > Example:
> >   $ sudo ./perf mem record -- -c 10000
> >   ^C[ perf record: Woken up 227 times to write data ]
> >   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
> > 
> >   $ sudo ./perf mem report -F mem,sample,snoop
> >   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
> >   Memory access                  Samples  Snoop
> >   N/A                             700620  N/A
> >   L1 hit                          126675  N/A
> >   L2 hit                             424  N/A
> >   L3 hit                             664  HitM
> >   L3 hit                              10  N/A
> >   Local RAM hit                        2  N/A
> >   Remote RAM (1 hop) hit            8558  N/A
> >   Remote Cache (1 hop) hit             3  N/A
> >   Remote Cache (1 hop) hit             2  HitM
> >   Remote Cache (2 hops) hit            10  HitM
> >   Remote Cache (2 hops) hit             6  N/A
> >   Uncached hit                         4  N/A
> > 
> > Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
> > 
> > [1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
> > [2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com
> > 
> > v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
> > v1->v2:
> >  - Instead of defining macros to extract IBS register bits, use existing
> >    bitfield definitions. Zen4 has introduced additional set of bits in
> >    IBS registers which this series also exploits and thus this series
> >    now depends on IBS Zen4 enhancement patchset.
> >  - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
> >    perf tool starts with a set of attributes and goes on reverting some
> >    attributes in a predefined order until it succeeds or runs out of
> >    attempts. Here, the 1st attempt includes WEIGHT_STRUCT and exclude_guest,
> >    which always fails because IBS does not support guest filtering. The
> >    problem, however, is that perf reverts WEIGHT_STRUCT but keeps trying
> >    with exclude_guest. Thus, although this series enables WEIGHT_STRUCT
> >    support in the kernel, using it from the perf tool needs more changes.
> >    I'll try to address this bug later.
> >  - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
> >    that physical address is set by arch pmu driver and should not be
> >    overwritten.
> > 
> > 
> > Ravi Bangoria (14):
> >   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
> >   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
> >   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
> >   perf/x86/amd: Support PERF_SAMPLE_ADDR
> >   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
> >   perf tool: Sync include/uapi/linux/perf_event.h header
> >   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
> >   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
> >   perf mem/c2c: Add load store event mappings for AMD
> >   perf mem/c2c: Avoid printing empty lines for unsupported events
> >   perf mem: Use more generic term for LFB
> >   perf script: Add missing fields in usage hint
> 
> tools part looks good to me
> 
> Acked-by: Jiri Olsa <jolsa@kernel.org>

What about the kernel bits? PeterZ? Is this in some tip branch?

- Arnaldo


* Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
       [not found]     ` <CA+JHD91X9_dMV-sXho_L9k326-Eneor4ZeOtw_WgWNtHbKzWxg@mail.gmail.com>
@ 2022-07-22  2:21       ` Ravi Bangoria
  2022-08-10 13:26         ` Arnaldo Carvalho de Melo
  2022-08-25 11:16         ` Ravi Bangoria
  0 siblings, 2 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-07-22  2:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, Ali Saidi,
	Andi Kleen, Kan Liang, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, X86 ML, linux-perf-users,
	Linux Kernel Mailing List, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla, ravi.bangoria

On 21-Jul-22 10:54 PM, Arnaldo Carvalho de Melo wrote:
> Ping.
> 
> On Mon, Jul 18, 2022, 12:34 PM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> 
>> Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
>>> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
>>>> Perf mem and c2c tools are wrappers around perf record with mem load/
>>>> store events. IBS tagged load/store sample provides most of the
>>>> information needed for these tools. Enable support for these tools on
>>>> AMD Zen processors based on IBS Op pmu.
>>>>
>>>> There are some limitations though: Only load/store instructions provide
>>>> mem/c2c information. However, IBS does not provide a way to choose a
>>>> particular type of instruction to tag. This results in many non-LS
>>>> instructions being tagged which appear as N/A. IBS, being an uncore pmu
>>>> from kernel point of view[1], does not support per process monitoring.
>>>> Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
>>>>
>>>> Example:
>>>>   $ sudo ./perf mem record -- -c 10000
>>>>   ^C[ perf record: Woken up 227 times to write data ]
>>>>   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
>>>>
>>>>   $ sudo ./perf mem report -F mem,sample,snoop
>>>>   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
>>>>   Memory access                  Samples  Snoop
>>>>   N/A                             700620  N/A
>>>>   L1 hit                          126675  N/A
>>>>   L2 hit                             424  N/A
>>>>   L3 hit                             664  HitM
>>>>   L3 hit                              10  N/A
>>>>   Local RAM hit                        2  N/A
>>>>   Remote RAM (1 hop) hit            8558  N/A
>>>>   Remote Cache (1 hop) hit             3  N/A
>>>>   Remote Cache (1 hop) hit             2  HitM
>>>>   Remote Cache (2 hops) hit            10  HitM
>>>>   Remote Cache (2 hops) hit             6  N/A
>>>>   Uncached hit                         4  N/A
>>>>
>>>> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
>>>>
>>>> [1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
>>>> [2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com
>>>>
>>>> v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
>>>> v1->v2:
>>>>  - Instead of defining macros to extract IBS register bits, use existing
>>>>    bitfield definitions. Zen4 has introduced additional set of bits in
>>>>    IBS registers which this series also exploits and thus this series
>>>>    now depends on IBS Zen4 enhancement patchset.
>>>>  - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
>>>>    the perf tool starts with a set of attributes and keeps reverting
>>>>    attributes in a predefined order until it succeeds or runs out of
>>>>    attempts. Here, the 1st attempt includes WEIGHT_STRUCT and exclude_guest,
>>>>    which always fails because IBS does not support guest filtering. The
>>>>    problem, however, is that perf reverts WEIGHT_STRUCT but keeps trying
>>>>    with exclude_guest. Thus, although this series enables WEIGHT_STRUCT
>>>>    support in the kernel, using it from the perf tool needs more changes.
>>>>    I'll try to address this bug later.
>>>>  - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
>>>>    that physical address is set by arch pmu driver and should not be
>>>>    overwritten.
>>>>
>>>>
>>>> Ravi Bangoria (14):
>>>>   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>>   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
>>>>   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
>>>>   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
>>>>   perf/x86/amd: Support PERF_SAMPLE_ADDR
>>>>   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
>>>>   perf tool: Sync include/uapi/linux/perf_event.h header
>>>>   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
>>>>   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>>   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
>>>>   perf mem/c2c: Add load store event mappings for AMD
>>>>   perf mem/c2c: Avoid printing empty lines for unsupported events
>>>>   perf mem: Use more generic term for LFB
>>>>   perf script: Add missing fields in usage hint
>>>
>>> tools part looks good to me
>>>
>>> Acked-by: Jiri Olsa <jolsa@kernel.org>
>>
>> What about the kernel bits? PeterZ? Is this in some tip branch?

Peter, would you be able to pick this up for the next merge window?
Please note that one dependency patch from the "IBS Zen4 enhancement"
series needs to be applied first:

[PATCH v6 6/8] perf/x86/ibs: Add new IBS register bits into header
https://lore.kernel.org/lkml/20220604044519.594-7-ravi.bangoria@amd.com

Please let me know if you face any issues.

Thanks,
Ravi


* Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
  2022-07-22  2:21       ` Ravi Bangoria
@ 2022-08-10 13:26         ` Arnaldo Carvalho de Melo
  2022-08-25 11:16         ` Ravi Bangoria
  1 sibling, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-08-10 13:26 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim, Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan,
	Ali Saidi, Andi Kleen, Kan Liang, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, X86 ML, linux-perf-users,
	Linux Kernel Mailing List, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla

Em Fri, Jul 22, 2022 at 07:51:27AM +0530, Ravi Bangoria escreveu:
> On 21-Jul-22 10:54 PM, Arnaldo Carvalho de Melo wrote:
> > On Mon, Jul 18, 2022, 12:34 PM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> >> Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
> >>> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> >>>> Perf mem and c2c tools are wrappers around perf record with mem load/
> >>>> store events. IBS tagged load/store sample provides most of the
> >>>> information needed for these tools. Enable support for these tools on
> >>>> AMD Zen processors based on IBS Op pmu.

> >>>> Ravi Bangoria (14):
> >>>>   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >>>>   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
> >>>>   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
> >>>>   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
> >>>>   perf/x86/amd: Support PERF_SAMPLE_ADDR
> >>>>   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
> >>>>   perf tool: Sync include/uapi/linux/perf_event.h header
> >>>>   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
> >>>>   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >>>>   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
> >>>>   perf mem/c2c: Add load store event mappings for AMD
> >>>>   perf mem/c2c: Avoid printing empty lines for unsupported events
> >>>>   perf mem: Use more generic term for LFB
> >>>>   perf script: Add missing fields in usage hint

> >>> tools part looks good to me

> >>> Acked-by: Jiri Olsa <jolsa@kernel.org>

> >> What about the kernel bits? PeterZ? Is this in some tip branch?

> Peter, would you be able to pick this up for the next merge window?
> Please note that one dependency patch from the "IBS Zen4 enhancement"
> series needs to be applied first:
 
> [PATCH v6 6/8] perf/x86/ibs: Add new IBS register bits into header
> https://lore.kernel.org/lkml/20220604044519.594-7-ravi.bangoria@amd.com

It is there already:

⬢[acme@toolbox perf]$ git log --oneline torvalds/master | grep -m1 "Add new IBS register bits into header"
326ecc15c61c349c perf/x86/ibs: Add new IBS register bits into header
⬢[acme@toolbox perf]$

but not the other patches in this series:

⬢[acme@toolbox perf]$ git log --oneline torvalds/master | grep -m1 "amd: Support PERF_SAMPLE_PHY_ADDR"
⬢[acme@toolbox perf]$
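
A minimal sketch of the same check applied to several subjects at once.
The function name check_landed is hypothetical. It only reads
`git log --oneline` output on stdin and does a fixed-string match per
subject, so an unrelated commit that happens to contain the same text
would also count as a match.

```shell
# check_landed SUBJECT...
# Hypothetical helper: reads `git log --oneline` output on stdin and reports,
# for each subject given as an argument, whether any commit line contains it.
check_landed() {
    log=$(cat)
    for subject in "$@"; do
        # -F: treat the subject as a fixed string, not a regex
        if printf '%s\n' "$log" | grep -qF -- "$subject"; then
            printf 'merged   %s\n' "$subject"
        else
            printf 'missing  %s\n' "$subject"
        fi
    done
}

# Example invocation (same ref as in the commands above):
#   git log --oneline torvalds/master | check_landed \
#       "Add new IBS register bits into header" \
#       "amd: Support PERF_SAMPLE_PHY_ADDR"
```

Keeping the log on stdin means the helper can be pointed at any ref or
at a saved log file without changes.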

- Arnaldo


* Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
  2022-07-22  2:21       ` Ravi Bangoria
  2022-08-10 13:26         ` Arnaldo Carvalho de Melo
@ 2022-08-25 11:16         ` Ravi Bangoria
  1 sibling, 0 replies; 21+ messages in thread
From: Ravi Bangoria @ 2022-08-25 11:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, Ali Saidi,
	Andi Kleen, Kan Liang, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, X86 ML, linux-perf-users,
	Linux Kernel Mailing List, Sandipan Das, ananth.narayan,
	Kim Phillips, santosh.shukla, ravi.bangoria

On 22-Jul-22 7:51 AM, Ravi Bangoria wrote:
> On 21-Jul-22 10:54 PM, Arnaldo Carvalho de Melo wrote:
>> Ping.
>>
>> On Mon, Jul 18, 2022, 12:34 PM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
>>
>>> Em Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa escreveu:
>>>> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
>>>>> Perf mem and c2c tools are wrappers around perf record with mem load/
>>>>> store events. IBS tagged load/store sample provides most of the
>>>>> information needed for these tools. Enable support for these tools on
>>>>> AMD Zen processors based on IBS Op pmu.
>>>>>
>>>>> There are some limitations though: Only load/store instructions provide
>>>>> mem/c2c information. However, IBS does not provide a way to choose a
>>>>> particular type of instruction to tag. This results in many non-LS
>>>>> instructions being tagged which appear as N/A. IBS, being an uncore pmu
>>>>> from kernel point of view[1], does not support per process monitoring.
>>>>> Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only.
>>>>>
>>>>> Example:
>>>>>   $ sudo ./perf mem record -- -c 10000
>>>>>   ^C[ perf record: Woken up 227 times to write data ]
>>>>>   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
>>>>>
>>>>>   $ sudo ./perf mem report -F mem,sample,snoop
>>>>>   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
>>>>>   Memory access                  Samples  Snoop
>>>>>   N/A                             700620  N/A
>>>>>   L1 hit                          126675  N/A
>>>>>   L2 hit                             424  N/A
>>>>>   L3 hit                             664  HitM
>>>>>   L3 hit                              10  N/A
>>>>>   Local RAM hit                        2  N/A
>>>>>   Remote RAM (1 hop) hit            8558  N/A
>>>>>   Remote Cache (1 hop) hit             3  N/A
>>>>>   Remote Cache (1 hop) hit             2  HitM
>>>>>   Remote Cache (2 hops) hit            10  HitM
>>>>>   Remote Cache (2 hops) hit             6  N/A
>>>>>   Uncached hit                         4  N/A
>>>>>
>>>>> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
>>>>>
>>>>> [1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
>>>>> [2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com
>>>>>
>>>>> v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
>>>>> v1->v2:
>>>>>  - Instead of defining macros to extract IBS register bits, use existing
>>>>>    bitfield definitions. Zen4 has introduced additional set of bits in
>>>>>    IBS registers which this series also exploits and thus this series
>>>>>    now depends on IBS Zen4 enhancement patchset.
>>>>>  - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
>>>>>    the perf tool starts with a set of attributes and keeps reverting
>>>>>    attributes in a predefined order until it succeeds or runs out of
>>>>>    attempts. Here, the 1st attempt includes WEIGHT_STRUCT and exclude_guest,
>>>>>    which always fails because IBS does not support guest filtering. The
>>>>>    problem, however, is that perf reverts WEIGHT_STRUCT but keeps trying
>>>>>    with exclude_guest. Thus, although this series enables WEIGHT_STRUCT
>>>>>    support in the kernel, using it from the perf tool needs more changes.
>>>>>    I'll try to address this bug later.
>>>>>  - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint generic perf driver
>>>>>    that physical address is set by arch pmu driver and should not be
>>>>>    overwritten.
>>>>>
>>>>>
>>>>> Ravi Bangoria (14):
>>>>>   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>>>   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
>>>>>   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
>>>>>   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
>>>>>   perf/x86/amd: Support PERF_SAMPLE_ADDR
>>>>>   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
>>>>>   perf tool: Sync include/uapi/linux/perf_event.h header
>>>>>   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
>>>>>   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>>>>>   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
>>>>>   perf mem/c2c: Add load store event mappings for AMD
>>>>>   perf mem/c2c: Avoid printing empty lines for unsupported events
>>>>>   perf mem: Use more generic term for LFB
>>>>>   perf script: Add missing fields in usage hint
>>>>
>>>> tools part looks good to me
>>>>
>>>> Acked-by: Jiri Olsa <jolsa@kernel.org>
>>>
>>> What about the kernel bits? PeterZ? Is this in some tip branch?
> 
> Peter, would you be able to pick this up for the next merge window?
> Please note that one dependency patch from the "IBS Zen4 enhancement"
> series needs to be applied first:
> 
> [PATCH v6 6/8] perf/x86/ibs: Add new IBS register bits into header
> https://lore.kernel.org/lkml/20220604044519.594-7-ravi.bangoria@amd.com

Peter, can you please pull this series? (The dependency patch has
already been picked up by Boris.)

Thanks,
Ravi


end of thread, other threads:[~2022-08-25 11:17 UTC | newest]

Thread overview: 21+ messages
2022-06-16 11:36 [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 01/14] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 02/14] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 03/14] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 04/14] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 05/14] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 06/14] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 07/14] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 08/14] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
2022-06-16 11:36 ` [PATCH v2 09/14] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-06-16 11:52 ` [PATCH v2 10/14] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
2022-06-16 11:52   ` [PATCH v2 11/14] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
2022-06-16 11:52   ` [PATCH v2 12/14] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
2022-06-16 11:52   ` [PATCH v2 13/14] perf mem: Use more generic term for LFB Ravi Bangoria
2022-06-16 11:52   ` [PATCH v2 14/14] perf script: Add missing fields in usage hint Ravi Bangoria
2022-07-12  9:00 ` [PATCH v2 00/14] perf mem/c2c: Add support for AMD Ravi Bangoria
2022-07-12 11:35 ` Jiri Olsa
2022-07-18 15:34   ` Arnaldo Carvalho de Melo
     [not found]     ` <CA+JHD91X9_dMV-sXho_L9k326-Eneor4ZeOtw_WgWNtHbKzWxg@mail.gmail.com>
2022-07-22  2:21       ` Ravi Bangoria
2022-08-10 13:26         ` Arnaldo Carvalho de Melo
2022-08-25 11:16         ` Ravi Bangoria
