* [PATCH v3 00/15] perf mem/c2c: Add support for AMD
@ 2022-09-28  9:57 Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
                   ` (14 more replies)
  0 siblings, 15 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

The perf mem and c2c tools are wrappers around perf record with memory
load/store events. An IBS-tagged load/store sample provides most of the
information these tools need. Enable support for these tools on AMD Zen
processors based on the IBS Op PMU.

There are some limitations, though. Only load/store micro-ops provide
mem/c2c information, but IBS has no way to choose a particular type of
micro-op to tag. As a result, many non-LS micro-ops get tagged, and they
appear as N/A in the perf report. Also, IBS, being an uncore PMU from the
kernel's point of view[1], does not support per-process monitoring. Thus,
perf mem/c2c on AMD are currently supported in per-cpu mode only.

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

Prepared on queue/perf/core (cce6a2d7e0e49).

v2: https://lore.kernel.org/all/20220616113638.900-1-ravi.bangoria@amd.com
v2->v3:
 - Use sample_flags instead of __PERF_SAMPLE_*_EARLY variants
 - Make PERF_SAMPLE_WEIGHT independent of PERF_SAMPLE_DATA_SRC
 - Add a patch to reverse sync PERF_MEM_SNOOPX_PEER from tools
   to kernel uapi header
 - Add Acked-by: Jiri Olsa for tool side unchanged patches

Also, a recent patch[2] to test perf mem fails on AMD because of the
aforementioned limitations.

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com
[2]: https://lore.kernel.org/lkml/20220924133408.1125903-1-leo.yan%40linaro.org


Ravi Bangoria (15):
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf tool: Sync include/uapi/linux/perf_event.h header
  perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem: Use more generic term for LFB
  perf script: Add missing fields in usage hint

 arch/x86/events/amd/ibs.c                | 345 ++++++++++++++++++++++-
 arch/x86/include/asm/amd-ibs.h           |  16 ++
 include/uapi/linux/perf_event.h          |   6 +-
 kernel/events/core.c                     |   3 +-
 tools/arch/x86/include/asm/amd-ibs.h     |  16 ++
 tools/include/uapi/linux/perf_event.h    |   4 +-
 tools/perf/Documentation/perf-c2c.txt    |  14 +-
 tools/perf/Documentation/perf-mem.txt    |   3 +-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/arch/x86/util/mem-events.c    |  31 +-
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/builtin-script.c              |   7 +-
 tools/perf/util/mem-events.c             |  17 +-
 14 files changed, 438 insertions(+), 27 deletions(-)

-- 
2.31.1



* [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-30 10:48   ` [PATCH v3 01/15] " kajoljain
  2022-09-28  9:57 ` [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Introduce PERF_MEM_LVLNUM_EXTN_MEM, which can be used to indicate
accesses to extension memory such as CXL. PERF_MEM_LVL_IO can be used
for IO accesses, but it cannot distinguish between local and remote IO.
Introduce a new field, PERF_MEM_LVLNUM_IO, which can be combined with
PERF_MEM_REMOTE_REMOTE to indicate remote IO accesses.
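
For illustration only (not part of this patch): a minimal user-space
sketch of how a remote IO access could be encoded with the new level
number. The two shift values and PERF_MEM_REMOTE_REMOTE are copied here
on the assumption that they match the uapi header; encode_lvlnum(),
decode_lvlnum() and is_remote() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* New values from this patch. */
#define PERF_MEM_LVLNUM_EXTN_MEM	0x09
#define PERF_MEM_LVLNUM_IO		0x0a

/* Assumed to match include/uapi/linux/perf_event.h. */
#define PERF_MEM_LVLNUM_SHIFT		33
#define PERF_MEM_REMOTE_REMOTE		0x01
#define PERF_MEM_REMOTE_SHIFT		37

/* Hypothetical helper: pack a level number and a remote flag. */
static uint64_t encode_lvlnum(unsigned int lvl_num, int remote)
{
	uint64_t val;

	assert(lvl_num <= 0xf);	/* mem_lvl_num is assumed to be 4 bits wide */

	val = (uint64_t)lvl_num << PERF_MEM_LVLNUM_SHIFT;
	if (remote)
		val |= (uint64_t)PERF_MEM_REMOTE_REMOTE << PERF_MEM_REMOTE_SHIFT;
	return val;
}

/* Hypothetical helper: extract the level number again. */
static unsigned int decode_lvlnum(uint64_t val)
{
	return (unsigned int)((val >> PERF_MEM_LVLNUM_SHIFT) & 0xf);
}

/* Hypothetical helper: was the access marked remote? */
static int is_remote(uint64_t val)
{
	return (int)((val >> PERF_MEM_REMOTE_SHIFT) & 0x1);
}
```

With these helpers, encode_lvlnum(PERF_MEM_LVLNUM_IO, 1) describes a
remote IO access, which the older PERF_MEM_LVL_IO bit alone cannot
express.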

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 include/uapi/linux/perf_event.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e639c74cf5fb..4ae3c249f675 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1336,7 +1336,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
-- 
2.31.1



* [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  4:41   ` Namhyung Kim
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 03/15] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_OP_DATA2 DataSrc provides details about the location from which
load ops fetched the data. Define macros for legacy and extended
DataSrc values.
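
For illustration only (not part of this patch): extended values above 7
do not fit in a three-bit field, and patch 03 of this series combines
data_src_hi and data_src_lo as (hi << 3) | lo. A minimal sketch of that
combination follows. The field widths (2 bits and 3 bits) are an
assumption of this sketch, and ibs_data_src() is a hypothetical
user-space helper, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Two of the extended encodings defined by this patch. */
#define IBS_DATA_SRC_EXT_EXT_MEM		 8
#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12

/*
 * Hypothetical helper: build the DataSrc value from its two parts.
 * has_ext_encoding stands in for the IBS_CAPS_ZEN4 capability check.
 */
static uint8_t ibs_data_src(uint8_t data_src_hi, uint8_t data_src_lo,
			    int has_ext_encoding)
{
	/* Assumed widths: hi is 2 bits, lo is 3 bits. */
	assert(data_src_hi <= 0x3 && data_src_lo <= 0x7);

	if (has_ext_encoding)
		return (uint8_t)((data_src_hi << 3) | data_src_lo);

	/* Legacy encoding: only the low field is meaningful. */
	return data_src_lo;
}
```

For example, hi=1 and lo=4 yield 12 (IBS_DATA_SRC_EXT_PEER_AGENT_MEM)
with the extended encoding, but 4 (IBS_DATA_SRC_REM_CACHE) with the
legacy one.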

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index f3eb098d63d4..cb2a5e113daa 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@
 
 #include <asm/msr-index.h>
 
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
 /*
  * IBS Hardware MSRs
  */
-- 
2.31.1



* [PATCH v3 03/15] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

struct perf_mem_data_src is used to pass arch-specific memory access
details in a generic form. These details get consumed by tools like
perf mem and c2c. An IBS-tagged load/store sample provides most of the
information needed for these tools. Add logic to convert IBS-specific
raw data into perf_mem_data_src.
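
For illustration only (not part of this patch): one of the conversions,
the data TLB level, reduced to a stand-alone user-space sketch. The
PERF_MEM_TLB_* values are copied on the assumption that they match the
uapi header, and ibs_to_mem_dtlb() is a hypothetical helper that takes
the three relevant IBS_OP_DATA3 bits as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/perf_event.h. */
#define PERF_MEM_TLB_NA		0x01
#define PERF_MEM_TLB_HIT	0x02
#define PERF_MEM_TLB_MISS	0x04
#define PERF_MEM_TLB_L1		0x08
#define PERF_MEM_TLB_L2		0x10

/* The result must fit in mem_dtlb, assumed to be a 7-bit field. */
static_assert((PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS) < (1 << 7),
	      "TLB encoding does not fit in mem_dtlb");

/* Hypothetical helper: map IBS TLB bits to a mem_dtlb value. */
static uint8_t ibs_to_mem_dtlb(int lin_addr_valid, int l1tlb_miss,
			       int l2tlb_miss)
{
	/* Without a valid linear address the TLB bits carry no meaning. */
	if (!lin_addr_valid)
		return PERF_MEM_TLB_NA;

	if (!l1tlb_miss)
		return PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;

	if (!l2tlb_miss)
		return PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;

	/* Missed both levels. */
	return PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS;
}
```

The order of the checks matters: the L2 TLB bit is only consulted after
an L1 TLB miss, so an L1 hit is never reported as an L2 result.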

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 318 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c29a006954c7..e20caa5cf02f 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -678,6 +678,312 @@ static struct perf_ibs perf_ibs_op = {
 	.get_count		= get_ibs_op_count,
 };
 
+static void perf_ibs_get_mem_op(union ibs_op_data3 *op_data3,
+				struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_op = PERF_MEM_OP_NA;
+
+	if (op_data3->ld_op)
+		data_src->mem_op = PERF_MEM_OP_LOAD;
+	else if (op_data3->st_op)
+		data_src->mem_op = PERF_MEM_OP_STORE;
+}
+
+/*
+ * Processors having CPUID_Fn8000001B_EAX[11] aka IBS_CAPS_ZEN4 has
+ * more fine granular DataSrc encodings. Others have coarse.
+ */
+static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
+{
+	if (ibs_caps & IBS_CAPS_ZEN4)
+		return (op_data2->data_src_hi << 3) | op_data2->data_src_lo;
+
+	return op_data2->data_src_lo;
+}
+
+static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
+				 union ibs_op_data3 *op_data3,
+				 struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src = perf_ibs_data_src(op_data2);
+
+	data_src->mem_lvl = 0;
+
+	/*
+	 * DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
+	 * memory accesses. So, check DcUcMemAcc bit early.
+	 */
+	if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L1 Hit */
+	if (op_data3->dc_miss == 0) {
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L2 Hit */
+	if (op_data3->l2_miss == 0) {
+		/* Erratum #1293 */
+		if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
+		    !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
+			data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/*
+	 * OP_DATA2 is valid only for load ops. Skip all checks which
+	 * uses OP_DATA2[DataSrc].
+	 */
+	if (data_src->mem_op != PERF_MEM_OP_LOAD)
+		goto check_mab;
+
+	/* L3 Hit */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
+					    PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* A peer cache in a near CCX */
+	if (ibs_caps & IBS_CAPS_ZEN4 &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
+		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* A peer cache in a far CCX */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* DRAM */
+	if (ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
+		if (op_data2->rmt_node == 0)
+			data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+		else
+			data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* PMEM */
+	if (ibs_caps & IBS_CAPS_ZEN4 && ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* Extension Memory */
+	if (ibs_caps & IBS_CAPS_ZEN4 &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* IO */
+	if (ibs_data_src == IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_IO;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+check_mab:
+	/*
+	 * MAB (Miss Address Buffer) Hit. MAB keeps track of outstanding
+	 * DC misses. However, such data may come from any level in mem
+	 * hierarchy. IBS provides detail about both MAB as well as actual
+	 * DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
+	 * MAB only when IBS fails to provide DataSrc.
+	 */
+	if (op_data3->dc_miss_no_mab_alloc) {
+		data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	data_src->mem_lvl = PERF_MEM_LVL_NA;
+}
+
+static bool perf_ibs_cache_hit_st_valid(void)
+{
+	/* 0: Uninitialized, 1: Valid, -1: Invalid */
+	static int cache_hit_st_valid;
+
+	if (unlikely(!cache_hit_st_valid)) {
+		if (boot_cpu_data.x86 == 0x19 &&
+		    (boot_cpu_data.x86_model <= 0xF ||
+		    (boot_cpu_data.x86_model >= 0x20 &&
+		     boot_cpu_data.x86_model <= 0x5F))) {
+			cache_hit_st_valid = -1;
+		} else {
+			cache_hit_st_valid = 1;
+		}
+	}
+
+	return cache_hit_st_valid == 1;
+}
+
+static void perf_ibs_get_mem_snoop(union ibs_op_data2 *op_data2,
+				   struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src;
+
+	data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+
+	if (!perf_ibs_cache_hit_st_valid() ||
+	    data_src->mem_op != PERF_MEM_OP_LOAD ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L1 ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L2 ||
+	    op_data2->cache_hit_st)
+		return;
+
+	ibs_data_src = perf_ibs_data_src(op_data2);
+
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE ||
+		    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE ||
+		    ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE)
+			data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	} else if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+		data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	}
+}
+
+static void perf_ibs_get_tlb_lvl(union ibs_op_data3 *op_data3,
+				 struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_dtlb = PERF_MEM_TLB_NA;
+
+	if (!op_data3->dc_lin_addr_valid)
+		return;
+
+	if (!op_data3->dc_l1tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	if (!op_data3->dc_l2tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS;
+}
+
+static void perf_ibs_get_mem_lock(union ibs_op_data3 *op_data3,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_lock = PERF_MEM_LOCK_NA;
+
+	if (op_data3->dc_locked_op)
+		data_src->mem_lock = PERF_MEM_LOCK_LOCKED;
+}
+
+#define ibs_op_msr_idx(msr)	(msr - MSR_AMD64_IBSOPCTL)
+
+static void perf_ibs_get_data_src(struct perf_ibs_data *ibs_data,
+				  struct perf_sample_data *data,
+				  union ibs_op_data2 *op_data2,
+				  union ibs_op_data3 *op_data3)
+{
+	perf_ibs_get_mem_lvl(op_data2, op_data3, data);
+	perf_ibs_get_mem_snoop(op_data2, data);
+	perf_ibs_get_tlb_lvl(op_data3, data);
+	perf_ibs_get_mem_lock(op_data3, data);
+}
+
+static __u64 perf_ibs_get_op_data2(struct perf_ibs_data *ibs_data,
+				   union ibs_op_data3 *op_data3)
+{
+	__u64 val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];
+
+	/* Erratum #1293 */
+	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xF &&
+	    (op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
+		/*
+		 * OP_DATA2 has only two fields on Zen3: DataSrc and RmtNode.
+		 * DataSrc=0 is 'No valid status' and RmtNode is invalid when
+		 * DataSrc=0.
+		 */
+		val = 0;
+	}
+	return val;
+}
+
+static void perf_ibs_parse_ld_st_data(__u64 sample_type,
+				      struct perf_ibs_data *ibs_data,
+				      struct perf_sample_data *data)
+{
+	union ibs_op_data3 op_data3;
+	union ibs_op_data2 op_data2;
+
+	data->data_src.val = PERF_MEM_NA;
+	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+	perf_ibs_get_mem_op(&op_data3, data);
+	if (data->data_src.mem_op != PERF_MEM_OP_LOAD &&
+	    data->data_src.mem_op != PERF_MEM_OP_STORE)
+		return;
+
+	op_data2.val = perf_ibs_get_op_data2(ibs_data, &op_data3);
+
+	if (sample_type & PERF_SAMPLE_DATA_SRC) {
+		perf_ibs_get_data_src(ibs_data, data, &op_data2, &op_data3);
+		data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+	}
+}
+
+static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
+				   int check_rip)
+{
+	if (sample_type & PERF_SAMPLE_RAW ||
+	    (perf_ibs == &perf_ibs_op &&
+	     sample_type & PERF_SAMPLE_DATA_SRC))
+		return perf_ibs->offset_max;
+	else if (check_rip)
+		return 3;
+	return 1;
+}
+
 static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 {
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
@@ -725,12 +1031,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	size = 1;
 	offset = 1;
 	check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
-	if (event->attr.sample_type & PERF_SAMPLE_RAW)
-		offset_max = perf_ibs->offset_max;
-	else if (check_rip)
-		offset_max = 3;
-	else
-		offset_max = 1;
+
+	offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
+
 	do {
 		rdmsrl(msr + offset, *buf++);
 		size++;
@@ -784,6 +1087,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		data.sample_flags |= PERF_SAMPLE_RAW;
 	}
 
+	if (perf_ibs == &perf_ibs_op)
+		perf_ibs_parse_ld_st_data(event->attr.sample_type, &ibs_data, &data);
+
 	/*
 	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
 	 * recorded as part of interrupt regs. Thus we need to use rip from
-- 
2.31.1



* [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (2 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 03/15] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  5:09   ` Namhyung Kim
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 05/15] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IbsDcMissLat indicates the number of clock cycles from when a miss is
detected in the data cache to when the data was delivered to the core.
Similarly, IbsTagToRetCtr provides the number of cycles from when the op
was tagged to when the op was retired. Use these fields to populate
sample->weight.
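
For illustration only (not part of this patch): a user-space sketch of
the selection rule the diff implements. A weight is reported only for
loads that missed the data cache; WEIGHT_STRUCT carries both counters,
plain WEIGHT carries only the miss latency. The union below is a
simplified little-endian stand-in for the kernel's perf_sample_weight,
the PERF_SAMPLE_* bit values are assumed to match the uapi header, and
ibs_fill_weight() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/perf_event.h. */
#define PERF_SAMPLE_WEIGHT		(1U << 14)
#define PERF_SAMPLE_WEIGHT_STRUCT	(1U << 24)
#define PERF_SAMPLE_WEIGHT_TYPE		(PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)

/* Simplified little-endian stand-in for union perf_sample_weight. */
union sample_weight {
	uint64_t full;
	struct {
		uint32_t var1_dw;
		uint16_t var2_w;
		uint16_t var3_w;
	};
};

static_assert(sizeof(union sample_weight) == sizeof(uint64_t),
	      "weight must stay 64 bits wide");

/* Hypothetical helper: returns 1 if a weight was written to *w. */
static int ibs_fill_weight(uint64_t sample_type, int is_load, int dc_miss,
			   uint16_t dc_miss_lat, uint16_t tag_to_ret_ctr,
			   union sample_weight *w)
{
	/* Only loads that missed the data cache have a valid latency. */
	if (!(sample_type & PERF_SAMPLE_WEIGHT_TYPE) || !dc_miss || !is_load)
		return 0;

	if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
		w->var1_dw = dc_miss_lat;
		w->var2_w = tag_to_ret_ctr;
	} else {
		w->full = dc_miss_lat;
	}
	return 1;
}
```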

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
Note:
While opening a new event, the perf tool starts with a set of attributes
and keeps reverting some of them in a predefined order until it succeeds
or runs out of attempts. Here, the first attempt includes both
WEIGHT_STRUCT and exclude_guest, which always fails because IBS does not
support guest filtering. The problem, however, is that perf reverts
WEIGHT_STRUCT but keeps trying with exclude_guest. Thus, although this
patch enables WEIGHT_STRUCT support in the kernel, using it from the perf
tool needs more changes (not included in this series).

 arch/x86/events/amd/ibs.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e20caa5cf02f..d883694e0fd4 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -955,6 +955,7 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 {
 	union ibs_op_data3 op_data3;
 	union ibs_op_data2 op_data2;
+	union ibs_op_data op_data;
 
 	data->data_src.val = PERF_MEM_NA;
 	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
@@ -970,6 +971,19 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 		perf_ibs_get_data_src(ibs_data, data, &op_data2, &op_data3);
 		data->sample_flags |= PERF_SAMPLE_DATA_SRC;
 	}
+
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE && op_data3.dc_miss &&
+	    data->data_src.mem_op == PERF_MEM_OP_LOAD) {
+		op_data.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA)];
+
+		if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
+			data->weight.var1_dw = op_data3.dc_miss_lat;
+			data->weight.var2_w = op_data.tag_to_ret_ctr;
+		} else if (sample_type & PERF_SAMPLE_WEIGHT) {
+			data->weight.full = op_data3.dc_miss_lat;
+		}
+		data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+	}
 }
 
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
@@ -977,7 +991,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 {
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
-	     sample_type & PERF_SAMPLE_DATA_SRC))
+	     (sample_type & PERF_SAMPLE_DATA_SRC ||
+	      sample_type & PERF_SAMPLE_WEIGHT_TYPE)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
-- 
2.31.1



* [PATCH v3 05/15] perf/x86/amd: Support PERF_SAMPLE_ADDR
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (3 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_DC_LINADDR provides the linear data address for the tagged load/
store operation. Populate perf sample address using it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d883694e0fd4..0ad49105c154 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -984,6 +984,11 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 		}
 		data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
 	}
+
+	if (sample_type & PERF_SAMPLE_ADDR && op_data3.dc_lin_addr_valid) {
+		data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
+		data->sample_flags |= PERF_SAMPLE_ADDR;
+	}
 }
 
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
@@ -992,7 +997,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
 	     (sample_type & PERF_SAMPLE_DATA_SRC ||
-	      sample_type & PERF_SAMPLE_WEIGHT_TYPE)))
+	      sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
+	      sample_type & PERF_SAMPLE_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
-- 
2.31.1



* [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (4 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 05/15] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  4:59   ` Namhyung Kim
                     ` (2 more replies)
  2022-09-28  9:57 ` [PATCH v3 07/15] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file Ravi Bangoria
                   ` (8 subsequent siblings)
  14 siblings, 3 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_DC_PHYSADDR provides the physical data address for the tagged load/
store operation. Populate perf sample physical address using it.
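
For illustration only (not part of this patch): a user-space sketch of
why the kernel/events/core.c hunk is needed. Once the PMU driver has
filled phys_addr and set the matching bit in sample_flags, the generic
code must not overwrite it with a software translation of the virtual
address. The sketch assumes that filtered_sample_type is computed as
sample_type & ~sample_flags; struct sample, fake_virt_to_phys() and
prepare_phys_addr() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/perf_event.h. */
#define PERF_SAMPLE_PHYS_ADDR	(1U << 19)

/* Hypothetical, reduced stand-in for struct perf_sample_data. */
struct sample {
	uint64_t sample_flags;	/* fields already filled by the PMU driver */
	uint64_t addr;
	uint64_t phys_addr;
};

/* Hypothetical stand-in for perf_virt_to_phys(); not a real translation. */
static uint64_t fake_virt_to_phys(uint64_t virt)
{
	return virt ^ 0xffff000000000000ULL;
}

/* Translate in software only when the driver did not provide phys_addr. */
static void prepare_phys_addr(uint64_t sample_type, struct sample *s)
{
	uint64_t filtered_sample_type;

	assert(s);

	/* Assumption: bits already present in sample_flags are filtered out. */
	filtered_sample_type = sample_type & ~s->sample_flags;

	if (filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
		s->phys_addr = fake_virt_to_phys(s->addr);
}
```

A sample whose sample_flags already contains PERF_SAMPLE_PHYS_ADDR keeps
the address reported by the hardware; any other sample falls back to the
software translation.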

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 8 +++++++-
 kernel/events/core.c      | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 0ad49105c154..3271735f0070 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -989,6 +989,11 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 		data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
 		data->sample_flags |= PERF_SAMPLE_ADDR;
 	}
+
+	if (sample_type & PERF_SAMPLE_PHYS_ADDR && op_data3.dc_phy_addr_valid) {
+		data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
+		data->sample_flags |= PERF_SAMPLE_PHYS_ADDR;
+	}
 }
 
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
@@ -998,7 +1003,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 	    (perf_ibs == &perf_ibs_op &&
 	     (sample_type & PERF_SAMPLE_DATA_SRC ||
 	      sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
-	      sample_type & PERF_SAMPLE_ADDR)))
+	      sample_type & PERF_SAMPLE_ADDR ||
+	      sample_type & PERF_SAMPLE_PHYS_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e1ffdb861b53..49bc3b5e6c8a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7435,7 +7435,8 @@ void perf_prepare_sample(struct perf_event_header *header,
 		header->size += size;
 	}
 
-	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
+	if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
+	    filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
 #ifdef CONFIG_CGROUP_PERF
-- 
2.31.1



* [PATCH v3 07/15] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (5 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 08/15] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

PERF_MEM_SNOOPX_PEER is defined only in the tools uapi header. Although
it is used only by the perf tool, not defining it in the kernel header
can create problems in the future.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 include/uapi/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4ae3c249f675..85be78e0e7f6 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1356,7 +1356,7 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_SHIFT	19
 
 #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
 #define PERF_MEM_SNOOPX_SHIFT  38
 
 /* locked instruction */
-- 
2.31.1



* [PATCH v3 08/15] perf tool: Sync include/uapi/linux/perf_event.h header
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (6 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 07/15] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-28  9:57 ` [PATCH v3 09/15] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Two new fields for mem_lvl_num have been introduced: PERF_MEM_LVLNUM_IO
and PERF_MEM_LVLNUM_EXTN_MEM, which are required to support perf mem/c2c
on AMD platforms.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/include/uapi/linux/perf_event.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 581ed4bdc062..9b65fc7d2377 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1295,7 +1295,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
-- 
2.31.1



* [PATCH v3 09/15] perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (7 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 08/15] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
@ 2022-09-28  9:57 ` Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 10/15] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:57 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Although the new details added to this header are currently used by
the kernel only, the tools copy needs to be in sync with the kernel file.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 9a3312e12e2e..93807b437e4d 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@
 
 #include "msr-index.h"
 
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
 /*
  * IBS Hardware MSRs
  */
-- 
2.31.1



* [PATCH v3 10/15] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (8 preceding siblings ...)
  2022-09-28  9:57 ` [PATCH v3 09/15] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
@ 2022-09-28  9:58 ` Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 11/15] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:58 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Add support for printing these new fields in perf mem report.
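
A minimal standalone sketch of the lookup pattern the table relies on,
assuming the numeric values from the perf_event.h sync patch earlier in
this series. The array name and lvlnum_to_str() are hypothetical and
only serve the illustration; unnamed slots of a designated-initializer
array are NULL, so the caller has to handle that case.

```c
#include <stddef.h>

/* Values copied from the tools perf_event.h hunk in this series. */
#define PERF_MEM_LVLNUM_EXTN_MEM	0x09
#define PERF_MEM_LVLNUM_IO		0x0a
#define PERF_MEM_LVLNUM_ANY_CACHE	0x0b
#define PERF_MEM_LVLNUM_LFB		0x0c
#define PERF_MEM_LVLNUM_RAM		0x0d

/* Hypothetical copy of part of the table; gaps are implicitly NULL. */
static const char * const lvlnum_name[] = {
	[PERF_MEM_LVLNUM_EXTN_MEM]	= "Ext Mem",
	[PERF_MEM_LVLNUM_IO]		= "I/O",
	[PERF_MEM_LVLNUM_ANY_CACHE]	= "Any cache",
	[PERF_MEM_LVLNUM_LFB]		= "LFB",
	[PERF_MEM_LVLNUM_RAM]		= "RAM",
};

/* Hypothetical helper: NULL for out-of-range or unnamed values. */
static const char *lvlnum_to_str(unsigned int lvlnum)
{
	if (lvlnum >= sizeof(lvlnum_name) / sizeof(lvlnum_name[0]))
		return NULL;
	return lvlnum_name[lvlnum];
}
```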

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/mem-events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 764883183519..96a15b6dbfa3 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -294,6 +294,8 @@ static const char * const mem_lvl[] = {
 };
 
 static const char * const mem_lvlnum[] = {
+	[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
+	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
 	[PERF_MEM_LVLNUM_LFB] = "LFB",
 	[PERF_MEM_LVLNUM_RAM] = "RAM",
-- 
2.31.1



* [PATCH v3 11/15] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (9 preceding siblings ...)
  2022-09-28  9:58 ` [PATCH v3 10/15] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-09-28  9:58 ` Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 12/15] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:58 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Currently perf sets the PERF_SAMPLE_WEIGHT flag only for mem load events.
Set it for the combined load-store event as well, which enables recording
of load latency by default on archs that do not support an independent
mem load event.

Also document the missing -W in the perf-record man page.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf-record.txt | 1 +
 tools/perf/builtin-c2c.c                 | 1 +
 tools/perf/builtin-mem.c                 | 1 +
 3 files changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 099817ef5150..86d6c93a9552 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -407,6 +407,7 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
+-W::
 --weight::
 Enable weightened sampling. An additional weight is recorded per sample and can be
 displayed with the weight and local_weight sort keys.  This currently works for TSX
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 653e13b5037e..a222268cca3a 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3284,6 +3284,7 @@ static int perf_c2c__record(int argc, const char **argv)
 		 */
 		if (e->tag) {
 			e->record = true;
+			rec_argv[i++] = "-W";
 		} else {
 			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
 			e->record = true;
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9e435fd23503..f7dd8216de72 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -122,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	    (mem->operation & MEM_OPERATION_LOAD) &&
 	    (mem->operation & MEM_OPERATION_STORE)) {
 		e->record = true;
+		rec_argv[i++] = "-W";
 	} else {
 		if (mem->operation & MEM_OPERATION_LOAD) {
 			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
-- 
2.31.1



* [PATCH v3 12/15] perf mem/c2c: Add load store event mappings for AMD
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (10 preceding siblings ...)
  2022-09-28  9:58 ` [PATCH v3 11/15] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
@ 2022-09-28  9:58 ` Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 13/15] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:58 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store sample provides most of the
information needed for these tools. Wire in ibs_op// event as mem-ldst
event for AMD.
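
The vendor check that selects the AMD table can be read in isolation as
the sketch below. It is a simplified stand-in, not the patch itself: the
cpuid string is passed in as a parameter instead of being read through
perf_env__cpuid(), and both function names are hypothetical. The point
is the tri-state cache, where 0 means "not probed yet", which is why a
negative answer is stored as -1.

```c
#include <string.h>

/* Classify a cpuid string: 1 for AMD, -1 otherwise (never 0). */
static int cpuid_is_amd(const char *cpuid)
{
	static const char vendor[] = "AuthenticAMD";

	/* Prefix comparison, equivalent to what strstarts() does. */
	if (cpuid && strncmp(cpuid, vendor, sizeof(vendor) - 1) == 0)
		return 1;
	return -1;
}

/* Cache the first answer; later calls do not probe again. */
static int cached_is_amd(const char *cpuid)
{
	/* 0: Uninitialized, 1: Yes, -1: No */
	static int is_amd;

	if (!is_amd)
		is_amd = cpuid_is_amd(cpuid);
	return is_amd;
}
```

Because the result is cached on first use, a second call returns the
first answer even if it is handed a different string.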

There are some limitations though: Only load/store micro-ops provide
mem/c2c information, but IBS does not have a way to choose a particular
type of micro-op to tag. This results in many non-LS micro-ops being
tagged, which appear as N/A in the perf report. IBS, being an uncore pmu
from the kernel's point of view[1], does not support per-process
monitoring. Thus, perf mem/c2c on AMD is currently supported in per-cpu
mode only.

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf-c2c.txt | 14 ++++++++----
 tools/perf/Documentation/perf-mem.txt |  3 ++-
 tools/perf/arch/x86/util/mem-events.c | 31 +++++++++++++++++++++++++--
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index f1f7ae6b08d1..5c5eb2def83e 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).
 
 These events provide:
   - memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS
 
 -l::
 --ldlat::
-	Configure mem-loads latency. (x86 only)
+	Configure mem-loads latency. Supported on Intel and Arm64 processors
+	only. Ignored on other archs.
 
 -k::
 --all-kernel::
@@ -135,11 +137,15 @@ Following perf record options are configured by default:
   -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:
 
   cpu/mem-loads,ldlat=30/P
   cpu/mem-stores/P
 
+following on AMD:
+
+  ibs_op//
+
 and following on PowerPC:
 
   cpu/mem-loads/
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 66177511c5c4..005c95580b1e 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -85,7 +85,8 @@ RECORD OPTIONS
 	Be more verbose (show counter open errors, etc)
 
 --ldlat <n>::
-	Specify desired latency for loads event. (x86 only)
+	Specify desired latency for loads event. Supported on Intel and Arm64
+	processors only. Ignored on other archs.
 
 In addition, for report all perf report options are valid, and for record
 all perf record options.
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 5214370ca4e4..f683ac702247 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"
 
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"%s/mem-loads,ldlat=%u/P",	"%s/events/mem-loads"),
 	E("ldlat-stores",	"%s/mem-stores/P",		"%s/events/mem-stores"),
 	E(NULL,			NULL,				NULL),
 };
 
+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL,		NULL,		NULL),
+	E(NULL,		NULL,		NULL),
+	E("mem-ldst",	"ibs_op//",	"ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;
 
-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }
 
 bool is_mem_loads_aux_event(struct evsel *leader)
-- 
2.31.1



* [PATCH v3 13/15] perf mem/c2c: Avoid printing empty lines for unsupported events
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (11 preceding siblings ...)
  2022-09-28  9:58 ` [PATCH v3 12/15] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
@ 2022-09-28  9:58 ` Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 14/15] perf mem: Use more generic term for LFB Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 15/15] perf script: Add missing fields in usage hint Ravi Bangoria
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:58 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Perf mem and c2c can be used with 3 different events: load, store and
combined load-store. Some architectures might support only a partial set
of events, in which case perf prints an empty line for each unsupported
event. Avoid that.

For example, AMD Zen CPUs support only the combined load-store event and
do not support individual load and store events.
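
The effect of the new format string can be tried outside perf with a
sketch like the one below. It covers only the non-verbose case and
writes into a buffer instead of stderr; format_event_line() is a
hypothetical helper, not a perf function. With a NULL tag the field
width is 0 and the newline is dropped, so nothing is emitted.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the patched format string without the
 * verbose name column. Returns the number of characters produced.
 */
static int format_event_line(char *buf, size_t len, const char *tag,
			     int supported)
{
	return snprintf(buf, len, "%-*s%s",
			tag ? 13 : 0,		/* pad only real tags */
			tag ? tag : "",
			supported ? ": available\n" : "");
}
```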

Before patch:
  $ ./perf mem record -e list


  mem-ldst     : available

After patch:
  $ ./perf mem record -e list
  mem-ldst     : available

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/mem-events.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 96a15b6dbfa3..4553b4389b17 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -156,11 +156,12 @@ void perf_mem_events__list(void)
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		struct perf_mem_event *e = perf_mem_events__ptr(j);
 
-		fprintf(stderr, "%-13s%-*s%s\n",
-			e->tag ?: "",
-			verbose > 0 ? 25 : 0,
-			verbose > 0 ? perf_mem_events__name(j, NULL) : "",
-			e->supported ? ": available" : "");
+		fprintf(stderr, "%-*s%-*s%s",
+			e->tag ? 13 : 0,
+			e->tag ? : "",
+			e->tag && verbose > 0 ? 25 : 0,
+			e->tag && verbose > 0 ? perf_mem_events__name(j, NULL) : "",
+			e->supported ? ": available\n" : "");
 	}
 }
 
-- 
2.31.1



* [PATCH v3 14/15] perf mem: Use more generic term for LFB
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (12 preceding siblings ...)
  2022-09-28  9:58 ` [PATCH v3 13/15] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
@ 2022-09-28  9:58 ` Ravi Bangoria
  2022-09-28  9:58 ` [PATCH v3 15/15] perf script: Add missing fields in usage hint Ravi Bangoria
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:58 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

A hw component to track outstanding L1 Data Cache misses is called
LFB (Line Fill Buffer) on Intel and Arm. However, a similar component
exists on other archs under different names; for example, it's called
MAB (Miss Address Buffer) on AMD. Use 'LFB/MAB' instead of just 'LFB'.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 4553b4389b17..a1838a641777 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -282,7 +282,7 @@ static const char * const mem_lvl[] = {
 	"HIT",
 	"MISS",
 	"L1",
-	"LFB",
+	"LFB/MAB",
 	"L2",
 	"L3",
 	"Local RAM",
@@ -298,7 +298,7 @@ static const char * const mem_lvlnum[] = {
 	[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
 	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
-	[PERF_MEM_LVLNUM_LFB] = "LFB",
+	[PERF_MEM_LVLNUM_LFB] = "LFB/MAB",
 	[PERF_MEM_LVLNUM_RAM] = "RAM",
 	[PERF_MEM_LVLNUM_PMEM] = "PMEM",
 	[PERF_MEM_LVLNUM_NA] = "N/A",
-- 
2.31.1



* [PATCH v3 15/15] perf script: Add missing fields in usage hint
  2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (13 preceding siblings ...)
  2022-09-28  9:58 ` [PATCH v3 14/15] perf mem: Use more generic term for LFB Ravi Bangoria
@ 2022-09-28  9:58 ` Ravi Bangoria
  14 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-28  9:58 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

A few fields are missing from the usage message printed when a wrong
field option is passed. Add them to the list.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-script.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 13580a9c50b8..b2bb3395e775 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3844,9 +3844,10 @@ int cmd_script(int argc, const char **argv)
 		     "Valid types: hw,sw,trace,raw,synth. "
 		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
 		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
-		     "brstacksym,flags,bpf-output,brstackinsn,brstackinsnlen,brstackoff,"
-		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,tod,"
-		     "data_page_size,code_page_size,ins_lat",
+		     "brstacksym,flags,data_src,weight,bpf-output,brstackinsn,"
+		     "brstackinsnlen,brstackoff,callindent,insn,insnlen,synth,"
+		     "phys_addr,metric,misc,srccode,ipc,tod,data_page_size,"
+		     "code_page_size,ins_lat",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
-- 
2.31.1



* Re: [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-09-28  9:57 ` [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
@ 2022-09-30  4:41   ` Namhyung Kim
  2022-09-30  4:48     ` Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  1 sibling, 1 reply; 41+ messages in thread
From: Namhyung Kim @ 2022-09-30  4:41 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla

Hi Ravi,

On Wed, Sep 28, 2022 at 2:59 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> IBS_OP_DATA2 DataSrc provides detail about location of the data
> being accessed from by load ops. Define macros for legacy and
> extended DataSrc values.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
> index f3eb098d63d4..cb2a5e113daa 100644
> --- a/arch/x86/include/asm/amd-ibs.h
> +++ b/arch/x86/include/asm/amd-ibs.h
> @@ -6,6 +6,22 @@
>
>  #include <asm/msr-index.h>
>
> +/* IBS_OP_DATA2 DataSrc */
> +#define IBS_DATA_SRC_LOC_CACHE                  2
> +#define IBS_DATA_SRC_DRAM                       3
> +#define IBS_DATA_SRC_REM_CACHE                  4
> +#define IBS_DATA_SRC_IO                                 7
> +
> +/* IBS_OP_DATA2 DataSrc Extension */
> +#define IBS_DATA_SRC_EXT_LOC_CACHE              1
> +#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE                 2
> +#define IBS_DATA_SRC_EXT_DRAM                   3
> +#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE          5

Is 4 undefined intentionally?

Thanks,
Namhyung


> +#define IBS_DATA_SRC_EXT_PMEM                   6
> +#define IBS_DATA_SRC_EXT_IO                     7
> +#define IBS_DATA_SRC_EXT_EXT_MEM                8
> +#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM                12
> +
>  /*
>   * IBS Hardware MSRs
>   */
> --
> 2.31.1
>


* Re: [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-09-30  4:41   ` Namhyung Kim
@ 2022-09-30  4:48     ` Ravi Bangoria
  2022-09-30  5:11       ` Namhyung Kim
  0 siblings, 1 reply; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-30  4:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla,
	Ravi Bangoria

On 30-Sep-22 10:11 AM, Namhyung Kim wrote:
> Hi Ravi,
> 
> On Wed, Sep 28, 2022 at 2:59 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> IBS_OP_DATA2 DataSrc provides detail about location of the data
>> being accessed from by load ops. Define macros for legacy and
>> extended DataSrc values.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
>> index f3eb098d63d4..cb2a5e113daa 100644
>> --- a/arch/x86/include/asm/amd-ibs.h
>> +++ b/arch/x86/include/asm/amd-ibs.h
>> @@ -6,6 +6,22 @@
>>
>>  #include <asm/msr-index.h>
>>
>> +/* IBS_OP_DATA2 DataSrc */
>> +#define IBS_DATA_SRC_LOC_CACHE                  2
>> +#define IBS_DATA_SRC_DRAM                       3
>> +#define IBS_DATA_SRC_REM_CACHE                  4
>> +#define IBS_DATA_SRC_IO                                 7
>> +
>> +/* IBS_OP_DATA2 DataSrc Extension */
>> +#define IBS_DATA_SRC_EXT_LOC_CACHE              1
>> +#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE                 2
>> +#define IBS_DATA_SRC_EXT_DRAM                   3
>> +#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE          5
> 
> Is 4 undefined intentionally?

Yes. Here is the snippet from the PPR (Processor Programming Reference) doc:

  Values | Description
  ---------------------------------------------------------------------
  0h     | No valid status.
  1h     | Local L3 or other L1/L2 in CCX.
  2h     | Another CCX cache in the same NUMA node.
  3h     | DRAM.
  4h     | Reserved.
  5h     | Another CCX cache in a different NUMA node.
  6h     | DRAM address map with "long latency" bit set.
  7h     | MMIO/Config/PCI/APIC.
  8h     | Extension Memory (S-Link, GenZ, etc - identified by the CS
         | target and/or address map at DF's choice).
  9h-Bh  | Reserved.
  Ch     | Peer Agent Memory.
  Dh-1Fh | Reserved.
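
For illustration only, the table could be turned into a decoder along
these lines. The macro values are the ones defined in the patch; the
function name is hypothetical and the strings are shortened from the
descriptions above. Value 0h and all reserved values, including 4h,
fall through to NULL.

```c
#include <stddef.h>

/* IBS_OP_DATA2 DataSrc Extension values, as defined in the patch. */
#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
#define IBS_DATA_SRC_EXT_DRAM			 3
#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
#define IBS_DATA_SRC_EXT_PMEM			 6
#define IBS_DATA_SRC_EXT_IO			 7
#define IBS_DATA_SRC_EXT_EXT_MEM		 8
#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12

/* Hypothetical helper: NULL for "no valid status" and reserved values. */
static const char *ibs_data_src_ext_str(unsigned int val)
{
	switch (val) {
	case IBS_DATA_SRC_EXT_LOC_CACHE:
		return "Local L3 or other L1/L2 in CCX";
	case IBS_DATA_SRC_EXT_NEAR_CCX_CACHE:
		return "Another CCX cache in the same NUMA node";
	case IBS_DATA_SRC_EXT_DRAM:
		return "DRAM";
	case IBS_DATA_SRC_EXT_FAR_CCX_CACHE:
		return "Another CCX cache in a different NUMA node";
	case IBS_DATA_SRC_EXT_PMEM:
		return "DRAM address map with long latency bit set";
	case IBS_DATA_SRC_EXT_IO:
		return "MMIO/Config/PCI/APIC";
	case IBS_DATA_SRC_EXT_EXT_MEM:
		return "Extension Memory";
	case IBS_DATA_SRC_EXT_PEER_AGENT_MEM:
		return "Peer Agent Memory";
	default:
		return NULL;
	}
}
```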

Thanks,
Ravi


* Re: [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
@ 2022-09-30  4:59   ` Namhyung Kim
  2022-09-30  5:05     ` Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  2022-09-30 17:02   ` [PATCH v3 06/15] " Jiri Olsa
  2 siblings, 1 reply; 41+ messages in thread
From: Namhyung Kim @ 2022-09-30  4:59 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla

On Wed, Sep 28, 2022 at 3:00 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
> store operation. Populate perf sample physical address using it.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 8 +++++++-
>  kernel/events/core.c      | 3 ++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 0ad49105c154..3271735f0070 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -989,6 +989,11 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
>                 data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
>                 data->sample_flags |= PERF_SAMPLE_ADDR;
>         }
> +
> +       if (sample_type & PERF_SAMPLE_PHYS_ADDR && op_data3.dc_phy_addr_valid) {
> +               data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
> +               data->sample_flags |= PERF_SAMPLE_PHYS_ADDR;
> +       }
>  }
>
>  static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
> @@ -998,7 +1003,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
>             (perf_ibs == &perf_ibs_op &&
>              (sample_type & PERF_SAMPLE_DATA_SRC ||
>               sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
> -             sample_type & PERF_SAMPLE_ADDR)))
> +             sample_type & PERF_SAMPLE_ADDR ||
> +             sample_type & PERF_SAMPLE_PHYS_ADDR)))
>                 return perf_ibs->offset_max;
>         else if (check_rip)
>                 return 3;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e1ffdb861b53..49bc3b5e6c8a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7435,7 +7435,8 @@ void perf_prepare_sample(struct perf_event_header *header,
>                 header->size += size;
>         }
>
> -       if (sample_type & PERF_SAMPLE_PHYS_ADDR)
> +       if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
> +           filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)

It'd be enough to check the filtered_sample_type only.
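
A small sketch of the reasoning, under the assumption that
filtered_sample_type is sample_type with the bits already present in
data->sample_flags masked out (not shown in the quoted hunk, so treat it
as an assumption). In that case the filtered value is always a subset of
sample_type and the first test is redundant. The helper names are
hypothetical and the bit value is recalled from the uapi header; the
logic does not depend on it.

```c
#include <stdint.h>

/* Bit value as recalled from the uapi header; an assumption here. */
#define PERF_SAMPLE_PHYS_ADDR	(1ULL << 19)

/* Assumed derivation: drop the bits a PMU driver already filled in. */
static uint64_t filter_sample_type(uint64_t sample_type,
				   uint64_t sample_flags)
{
	return sample_type & ~sample_flags;
}

/* Non-zero if the generic code still has to compute phys_addr. */
static int needs_generic_phys_addr(uint64_t sample_type,
				   uint64_t sample_flags)
{
	uint64_t filtered = filter_sample_type(sample_type, sample_flags);

	/* filtered is a subset of sample_type, so one check suffices. */
	return (filtered & PERF_SAMPLE_PHYS_ADDR) != 0;
}
```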

Thanks,
Namhyung


>                 data->phys_addr = perf_virt_to_phys(data->addr);
>
>  #ifdef CONFIG_CGROUP_PERF
> --
> 2.31.1
>


* Re: [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  2022-09-30  4:59   ` Namhyung Kim
@ 2022-09-30  5:05     ` Ravi Bangoria
  0 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-30  5:05 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla,
	Ravi Bangoria

>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index e1ffdb861b53..49bc3b5e6c8a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -7435,7 +7435,8 @@ void perf_prepare_sample(struct perf_event_header *header,
>>                 header->size += size;
>>         }
>>
>> -       if (sample_type & PERF_SAMPLE_PHYS_ADDR)
>> +       if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
>> +           filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
> 
> It'd be enough to check the filtered_sample_type only.

+1. Will fix it.

Thanks,
Ravi


* Re: [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  2022-09-28  9:57 ` [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
@ 2022-09-30  5:09   ` Namhyung Kim
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
  1 sibling, 0 replies; 41+ messages in thread
From: Namhyung Kim @ 2022-09-30  5:09 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla

On Wed, Sep 28, 2022 at 2:59 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> IbsDcMissLat indicates the number of clock cycles from when a miss is
> detected in the data cache to when the data was delivered to the core.
> Similarly, IbsTagToRetCtr provides number of cycles from when the op
> was tagged to when the op was retired. Consider these fields for
> sample->weight.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
> Note:
> While opening a new event, perf tool starts with a set of attributes
> and goes on reverting some attributes in a predefined order until it
> succeeds or runs out of attempts. Here, the 1st attempt includes both
> WEIGHT_STRUCT and exclude_guest, which always fails because IBS does
> not support guest filtering. The problem, however, is that perf reverts
> WEIGHT_STRUCT but keeps trying with exclude_guest. Thus, although
> this patch enables WEIGHT_STRUCT support in the kernel, using it from
> the perf tool needs more changes (not included in this series).

Yeah, it'd be nice if the kernel could expose more PMU capabilities,
like no-exclude, so that tools can skip setting those attributes.

Thanks,
Namhyung

* Re: [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-09-30  4:48     ` Ravi Bangoria
@ 2022-09-30  5:11       ` Namhyung Kim
  2022-09-30  6:16         ` Ravi Bangoria
  0 siblings, 1 reply; 41+ messages in thread
From: Namhyung Kim @ 2022-09-30  5:11 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla

On Thu, Sep 29, 2022 at 9:49 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> On 30-Sep-22 10:11 AM, Namhyung Kim wrote:
> > Hi Ravi,
> >
> > On Wed, Sep 28, 2022 at 2:59 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >> IBS_OP_DATA2 DataSrc provides details about the location of the
> >> data accessed by load ops. Define macros for legacy and
> >> extended DataSrc values.
> >>
> >> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> >> ---
> >>  arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
> >>  1 file changed, 16 insertions(+)
> >>
> >> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
> >> index f3eb098d63d4..cb2a5e113daa 100644
> >> --- a/arch/x86/include/asm/amd-ibs.h
> >> +++ b/arch/x86/include/asm/amd-ibs.h
> >> @@ -6,6 +6,22 @@
> >>
> >>  #include <asm/msr-index.h>
> >>
> >> +/* IBS_OP_DATA2 DataSrc */
> >> +#define IBS_DATA_SRC_LOC_CACHE                  2
> >> +#define IBS_DATA_SRC_DRAM                       3
> >> +#define IBS_DATA_SRC_REM_CACHE                  4
> >> +#define IBS_DATA_SRC_IO                                 7
> >> +
> >> +/* IBS_OP_DATA2 DataSrc Extension */
> >> +#define IBS_DATA_SRC_EXT_LOC_CACHE              1
> >> +#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE                 2
> >> +#define IBS_DATA_SRC_EXT_DRAM                   3
> >> +#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE          5
> >
> > Is 4 undefined intentionally?
>
> Yes. Here is the snippet from the PPR (Processor Programming Reference) doc:
>
>   Values | Description
>   ---------------------------------------------------------------------
>   0h     | No valid status.
>   1h     | Local L3 or other L1/L2 in CCX.
>   2h     | Another CCX cache in the same NUMA node.
>   3h     | DRAM.
>   4h     | Reserved.
>   5h     | Another CCX cache in a different NUMA node.
>   6h     | DRAM address map with "long latency" bit set.
>   7h     | MMIO/Config/PCI/APIC.
>   8h     | Extension Memory (S-Link, GenZ, etc - identified by the CS
>          | target and/or address map at DF's choice).
>   9h-Bh  | Reserved.
>   Ch     | Peer Agent Memory.
>   Dh-1Fh | Reserved.

Thanks for sharing it.  It's a bit confusing since it was available before.

Anyway, is the PPR for Zen4 publicly available now?

Thanks,
Namhyung

* Re: [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-09-30  5:11       ` Namhyung Kim
@ 2022-09-30  6:16         ` Ravi Bangoria
  0 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-30  6:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Stephane Eranian, Ian Rogers, Joe Mario, Leo Yan, alisaidi,
	Andi Kleen, Kan Liang, dave.hansen, H. Peter Anvin, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, x86, linux-perf-users, linux-kernel,
	Sandipan Das, ananth.narayan, Kim Phillips, santosh.shukla,
	Ravi Bangoria

>>>> +/* IBS_OP_DATA2 DataSrc */
>>>> +#define IBS_DATA_SRC_LOC_CACHE                  2
>>>> +#define IBS_DATA_SRC_DRAM                       3
>>>> +#define IBS_DATA_SRC_REM_CACHE                  4
>>>> +#define IBS_DATA_SRC_IO                                 7
>>>> +
>>>> +/* IBS_OP_DATA2 DataSrc Extension */
>>>> +#define IBS_DATA_SRC_EXT_LOC_CACHE              1
>>>> +#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE                 2
>>>> +#define IBS_DATA_SRC_EXT_DRAM                   3
>>>> +#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE          5
>>>
>>> Is 4 undefined intentionally?
>>
>> Yes. Here is the snippet from the PPR (Processor Programming Reference) doc:
>>
>>   Values | Description
>>   ---------------------------------------------------------------------
>>   0h     | No valid status.
>>   1h     | Local L3 or other L1/L2 in CCX.
>>   2h     | Another CCX cache in the same NUMA node.
>>   3h     | DRAM.
>>   4h     | Reserved.
>>   5h     | Another CCX cache in a different NUMA node.
>>   6h     | DRAM address map with "long latency" bit set.
>>   7h     | MMIO/Config/PCI/APIC.
>>   8h     | Extension Memory (S-Link, GenZ, etc - identified by the CS
>>          | target and/or address map at DF's choice).
>>   9h-Bh  | Reserved.
>>   Ch     | Peer Agent Memory.
>>   Dh-1Fh | Reserved.
> 
> Thanks for sharing it.  It's a bit confusing since it was available before.

Right, these bit definitions have changed in Zen4.

> 
> Anyway, is the PPR for Zen4 publicly available now?

Sadly, no. But it's in progress.

Thanks,
Ravi
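
To make the difference between the two encodings concrete, here is a
minimal standalone sketch of decoding DataSrc in the legacy and Zen4
layouts. The value-to-name mapping follows the macros and the PPR table
quoted in this thread. The field widths (3 bits for the low part, 2 bits
for the high part), the sk_ names and the returned strings are
assumptions made for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Combine the DataSrc fields. On Zen4-style hardware the high bits extend
 * the 3-bit legacy field to 5 bits; otherwise only the low bits are used.
 */
static unsigned int sk_data_src(unsigned int hi, unsigned int lo, int zen4)
{
	assert(hi < 4 && lo < 8);	/* assumed field widths */
	return zen4 ? (hi << 3) | lo : lo;
}

/* Map a DataSrc value to a short name for the chosen encoding. */
static const char *sk_data_src_name(unsigned int v, int zen4)
{
	if (!zen4) {
		switch (v) {
		case 2:  return "LOC_CACHE";
		case 3:  return "DRAM";
		case 4:  return "REM_CACHE";
		case 7:  return "IO";
		default: return "OTHER";	/* not covered by the legacy macros */
		}
	}

	switch (v) {
	case 0:  return "NO_VALID_STATUS";
	case 1:  return "LOC_CACHE";
	case 2:  return "NEAR_CCX_CACHE";
	case 3:  return "DRAM";
	case 5:  return "FAR_CCX_CACHE";
	case 6:  return "PMEM";
	case 7:  return "IO";
	case 8:  return "EXT_MEM";
	case 12: return "PEER_AGENT_MEM";
	default: return "RESERVED";
	}
}
```

For example, the raw value 2 names a local cache in the legacy encoding
but a near-CCX cache in the extended one, which is why the two macro
groups cannot be shared.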

* [tip: perf/core] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  2022-09-28  9:57 ` [PATCH v3 07/15] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file Ravi Bangoria
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     cfef80bad4cf79cdc964a53c98254dfa462be83f
Gitweb:        https://git.kernel.org/tip/cfef80bad4cf79cdc964a53c98254dfa462be83f
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:57 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:56 +02:00

perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file

PERF_MEM_SNOOPX_PEER is defined only in the tools uapi header. Although
it is used only by the perf tool, not defining it in the kernel header
can create problems in the future.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-8-ravi.bangoria@amd.com
---
 include/uapi/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4ae3c24..85be78e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1356,7 +1356,7 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_SHIFT	19
 
 #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
 #define PERF_MEM_SNOOPX_SHIFT  38
 
 /* locked instruction */

* [tip: perf/core] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
  2022-09-30  4:59   ` Namhyung Kim
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  2022-09-30 17:02   ` [PATCH v3 06/15] " Jiri Olsa
  2 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     5b26af6d2b7854639ddf893366bbca7e74fa7c54
Gitweb:        https://git.kernel.org/tip/5b26af6d2b7854639ddf893366bbca7e74fa7c54
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:56 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:56 +02:00

perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR

IBS_DC_PHYSADDR provides the physical data address for the tagged load/
store operation. Populate perf sample physical address using it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-7-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 8 +++++++-
 kernel/events/core.c      | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 0ad4910..3271735 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -989,6 +989,11 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 		data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
 		data->sample_flags |= PERF_SAMPLE_ADDR;
 	}
+
+	if (sample_type & PERF_SAMPLE_PHYS_ADDR && op_data3.dc_phy_addr_valid) {
+		data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
+		data->sample_flags |= PERF_SAMPLE_PHYS_ADDR;
+	}
 }
 
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
@@ -998,7 +1003,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 	    (perf_ibs == &perf_ibs_op &&
 	     (sample_type & PERF_SAMPLE_DATA_SRC ||
 	      sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
-	      sample_type & PERF_SAMPLE_ADDR)))
+	      sample_type & PERF_SAMPLE_ADDR ||
+	      sample_type & PERF_SAMPLE_PHYS_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e1ffdb8..49bc3b5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7435,7 +7435,8 @@ void perf_prepare_sample(struct perf_event_header *header,
 		header->size += size;
 	}
 
-	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
+	if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
+	    filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
 #ifdef CONFIG_CGROUP_PERF

* [tip: perf/core] perf/x86/amd: Support PERF_SAMPLE_ADDR
  2022-09-28  9:57 ` [PATCH v3 05/15] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     cb2bb85f7ed8740ab5fc06bbec386faa39ba44ef
Gitweb:        https://git.kernel.org/tip/cb2bb85f7ed8740ab5fc06bbec386faa39ba44ef
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:55 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:55 +02:00

perf/x86/amd: Support PERF_SAMPLE_ADDR

IBS_DC_LINADDR provides the linear data address for the tagged load/
store operation. Populate perf sample address using it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-6-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d883694..0ad4910 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -984,6 +984,11 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 		}
 		data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
 	}
+
+	if (sample_type & PERF_SAMPLE_ADDR && op_data3.dc_lin_addr_valid) {
+		data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
+		data->sample_flags |= PERF_SAMPLE_ADDR;
+	}
 }
 
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
@@ -992,7 +997,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
 	     (sample_type & PERF_SAMPLE_DATA_SRC ||
-	      sample_type & PERF_SAMPLE_WEIGHT_TYPE)))
+	      sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
+	      sample_type & PERF_SAMPLE_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
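
As a side note on the register indexing used above, the following
standalone sketch shows how an ibs_op_msr_idx()-style macro turns an MSR
number into an index into the captured register array. The MSR numbers
are quoted from memory of msr-index.h and should be treated as
assumptions, as should the array size SK_NREGS and all sk_/SK_ names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed IBS Op MSR numbers (consecutive, starting at IBSOPCTL). */
#define SK_MSR_IBSOPCTL     0xc0011033u
#define SK_MSR_IBSOPDATA    0xc0011035u
#define SK_MSR_IBSOPDATA3   0xc0011037u
#define SK_MSR_IBSDCLINAD   0xc0011038u
#define SK_MSR_IBSDCPHYSAD  0xc0011039u

/* Same idea as ibs_op_msr_idx() in the patch: offset from IBSOPCTL. */
#define sk_op_msr_idx(msr)  ((msr) - SK_MSR_IBSOPCTL)

/* Hypothetical capture buffer; the size is an assumption of this sketch. */
#define SK_NREGS 8
struct sk_ibs_data {
	uint64_t regs[SK_NREGS];
};

/* Fetch a captured register by MSR number, with a bounds check. */
static uint64_t sk_read_reg(const struct sk_ibs_data *d, uint32_t msr)
{
	uint32_t idx = sk_op_msr_idx(msr);

	assert(idx < SK_NREGS);
	return d->regs[idx];
}
```

With these numbers the linear address register sits at index 5, beyond
the three registers read in the check_rip case, which is presumably why
perf_ibs_get_offset_max() has to return offset_max when
PERF_SAMPLE_ADDR is requested.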

* [tip: perf/core] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  2022-09-28  9:57 ` [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
  2022-09-30  5:09   ` Namhyung Kim
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     6b2ae4952ef8ac23b467bc10776404092b581143
Gitweb:        https://git.kernel.org/tip/6b2ae4952ef8ac23b467bc10776404092b581143
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:54 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:55 +02:00

perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}

IbsDcMissLat indicates the number of clock cycles from when a miss is
detected in the data cache to when the data was delivered to the core.
Similarly, IbsTagToRetCtr provides the number of cycles from when the
op was tagged to when the op was retired. Consider these fields for
sample->weight.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-5-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e20caa5..d883694 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -955,6 +955,7 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 {
 	union ibs_op_data3 op_data3;
 	union ibs_op_data2 op_data2;
+	union ibs_op_data op_data;
 
 	data->data_src.val = PERF_MEM_NA;
 	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
@@ -970,6 +971,19 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
 		perf_ibs_get_data_src(ibs_data, data, &op_data2, &op_data3);
 		data->sample_flags |= PERF_SAMPLE_DATA_SRC;
 	}
+
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE && op_data3.dc_miss &&
+	    data->data_src.mem_op == PERF_MEM_OP_LOAD) {
+		op_data.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA)];
+
+		if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
+			data->weight.var1_dw = op_data3.dc_miss_lat;
+			data->weight.var2_w = op_data.tag_to_ret_ctr;
+		} else if (sample_type & PERF_SAMPLE_WEIGHT) {
+			data->weight.full = op_data3.dc_miss_lat;
+		}
+		data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+	}
 }
 
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
@@ -977,7 +991,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 {
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
-	     sample_type & PERF_SAMPLE_DATA_SRC))
+	     (sample_type & PERF_SAMPLE_DATA_SRC ||
+	      sample_type & PERF_SAMPLE_WEIGHT_TYPE)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;

* [tip: perf/core] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  2022-09-28  9:57 ` [PATCH v3 03/15] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     7c10dd0a88b1cc6ae4637fffb494c5e080027eb6
Gitweb:        https://git.kernel.org/tip/7c10dd0a88b1cc6ae4637fffb494c5e080027eb6
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:53 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:55 +02:00

perf/x86/amd: Support PERF_SAMPLE_DATA_SRC

union perf_mem_data_src is used to pass arch-specific memory access
details in a generic form. These details get consumed by tools like
perf mem and c2c. An IBS tagged load/store sample provides most of the
information needed by these tools. Add logic to convert IBS-specific
raw data into perf_mem_data_src.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-4-ravi.bangoria@amd.com
---
 arch/x86/events/amd/ibs.c | 318 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c29a006..e20caa5 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -678,6 +678,312 @@ static struct perf_ibs perf_ibs_op = {
 	.get_count		= get_ibs_op_count,
 };
 
+static void perf_ibs_get_mem_op(union ibs_op_data3 *op_data3,
+				struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_op = PERF_MEM_OP_NA;
+
+	if (op_data3->ld_op)
+		data_src->mem_op = PERF_MEM_OP_LOAD;
+	else if (op_data3->st_op)
+		data_src->mem_op = PERF_MEM_OP_STORE;
+}
+
+/*
+ * Processors having CPUID_Fn8000001B_EAX[11] aka IBS_CAPS_ZEN4 has
+ * more fine granular DataSrc encodings. Others have coarse.
+ */
+static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
+{
+	if (ibs_caps & IBS_CAPS_ZEN4)
+		return (op_data2->data_src_hi << 3) | op_data2->data_src_lo;
+
+	return op_data2->data_src_lo;
+}
+
+static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
+				 union ibs_op_data3 *op_data3,
+				 struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src = perf_ibs_data_src(op_data2);
+
+	data_src->mem_lvl = 0;
+
+	/*
+	 * DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
+	 * memory accesses. So, check DcUcMemAcc bit early.
+	 */
+	if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L1 Hit */
+	if (op_data3->dc_miss == 0) {
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L2 Hit */
+	if (op_data3->l2_miss == 0) {
+		/* Erratum #1293 */
+		if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
+		    !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
+			data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/*
+	 * OP_DATA2 is valid only for load ops. Skip all checks which
+	 * uses OP_DATA2[DataSrc].
+	 */
+	if (data_src->mem_op != PERF_MEM_OP_LOAD)
+		goto check_mab;
+
+	/* L3 Hit */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
+					    PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* A peer cache in a near CCX */
+	if (ibs_caps & IBS_CAPS_ZEN4 &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
+		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* A peer cache in a far CCX */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* DRAM */
+	if (ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
+		if (op_data2->rmt_node == 0)
+			data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+		else
+			data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* PMEM */
+	if (ibs_caps & IBS_CAPS_ZEN4 && ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* Extension Memory */
+	if (ibs_caps & IBS_CAPS_ZEN4 &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* IO */
+	if (ibs_data_src == IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_IO;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
+		if (op_data2->rmt_node) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+check_mab:
+	/*
+	 * MAB (Miss Address Buffer) Hit. MAB keeps track of outstanding
+	 * DC misses. However, such data may come from any level in mem
+	 * hierarchy. IBS provides detail about both MAB as well as actual
+	 * DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
+	 * MAB only when IBS fails to provide DataSrc.
+	 */
+	if (op_data3->dc_miss_no_mab_alloc) {
+		data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	data_src->mem_lvl = PERF_MEM_LVL_NA;
+}
+
+static bool perf_ibs_cache_hit_st_valid(void)
+{
+	/* 0: Uninitialized, 1: Valid, -1: Invalid */
+	static int cache_hit_st_valid;
+
+	if (unlikely(!cache_hit_st_valid)) {
+		if (boot_cpu_data.x86 == 0x19 &&
+		    (boot_cpu_data.x86_model <= 0xF ||
+		    (boot_cpu_data.x86_model >= 0x20 &&
+		     boot_cpu_data.x86_model <= 0x5F))) {
+			cache_hit_st_valid = -1;
+		} else {
+			cache_hit_st_valid = 1;
+		}
+	}
+
+	return cache_hit_st_valid == 1;
+}
+
+static void perf_ibs_get_mem_snoop(union ibs_op_data2 *op_data2,
+				   struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src;
+
+	data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+
+	if (!perf_ibs_cache_hit_st_valid() ||
+	    data_src->mem_op != PERF_MEM_OP_LOAD ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L1 ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L2 ||
+	    op_data2->cache_hit_st)
+		return;
+
+	ibs_data_src = perf_ibs_data_src(op_data2);
+
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE ||
+		    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE ||
+		    ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE)
+			data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	} else if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+		data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	}
+}
+
+static void perf_ibs_get_tlb_lvl(union ibs_op_data3 *op_data3,
+				 struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_dtlb = PERF_MEM_TLB_NA;
+
+	if (!op_data3->dc_lin_addr_valid)
+		return;
+
+	if (!op_data3->dc_l1tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	if (!op_data3->dc_l2tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS;
+}
+
+static void perf_ibs_get_mem_lock(union ibs_op_data3 *op_data3,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_lock = PERF_MEM_LOCK_NA;
+
+	if (op_data3->dc_locked_op)
+		data_src->mem_lock = PERF_MEM_LOCK_LOCKED;
+}
+
+#define ibs_op_msr_idx(msr)	(msr - MSR_AMD64_IBSOPCTL)
+
+static void perf_ibs_get_data_src(struct perf_ibs_data *ibs_data,
+				  struct perf_sample_data *data,
+				  union ibs_op_data2 *op_data2,
+				  union ibs_op_data3 *op_data3)
+{
+	perf_ibs_get_mem_lvl(op_data2, op_data3, data);
+	perf_ibs_get_mem_snoop(op_data2, data);
+	perf_ibs_get_tlb_lvl(op_data3, data);
+	perf_ibs_get_mem_lock(op_data3, data);
+}
+
+static __u64 perf_ibs_get_op_data2(struct perf_ibs_data *ibs_data,
+				   union ibs_op_data3 *op_data3)
+{
+	__u64 val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];
+
+	/* Erratum #1293 */
+	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xF &&
+	    (op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
+		/*
+		 * OP_DATA2 has only two fields on Zen3: DataSrc and RmtNode.
+		 * DataSrc=0 is 'No valid status' and RmtNode is invalid when
+		 * DataSrc=0.
+		 */
+		val = 0;
+	}
+	return val;
+}
+
+static void perf_ibs_parse_ld_st_data(__u64 sample_type,
+				      struct perf_ibs_data *ibs_data,
+				      struct perf_sample_data *data)
+{
+	union ibs_op_data3 op_data3;
+	union ibs_op_data2 op_data2;
+
+	data->data_src.val = PERF_MEM_NA;
+	op_data3.val = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+	perf_ibs_get_mem_op(&op_data3, data);
+	if (data->data_src.mem_op != PERF_MEM_OP_LOAD &&
+	    data->data_src.mem_op != PERF_MEM_OP_STORE)
+		return;
+
+	op_data2.val = perf_ibs_get_op_data2(ibs_data, &op_data3);
+
+	if (sample_type & PERF_SAMPLE_DATA_SRC) {
+		perf_ibs_get_data_src(ibs_data, data, &op_data2, &op_data3);
+		data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+	}
+}
+
+static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
+				   int check_rip)
+{
+	if (sample_type & PERF_SAMPLE_RAW ||
+	    (perf_ibs == &perf_ibs_op &&
+	     sample_type & PERF_SAMPLE_DATA_SRC))
+		return perf_ibs->offset_max;
+	else if (check_rip)
+		return 3;
+	return 1;
+}
+
 static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 {
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
@@ -725,12 +1031,9 @@ fail:
 	size = 1;
 	offset = 1;
 	check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
-	if (event->attr.sample_type & PERF_SAMPLE_RAW)
-		offset_max = perf_ibs->offset_max;
-	else if (check_rip)
-		offset_max = 3;
-	else
-		offset_max = 1;
+
+	offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
+
 	do {
 		rdmsrl(msr + offset, *buf++);
 		size++;
@@ -784,6 +1087,9 @@ fail:
 		data.sample_flags |= PERF_SAMPLE_RAW;
 	}
 
+	if (perf_ibs == &perf_ibs_op)
+		perf_ibs_parse_ld_st_data(event->attr.sample_type, &ibs_data, &data);
+
 	/*
 	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
 	 * recorded as part of interrupt regs. Thus we need to use rip from
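
One of the simpler conversions in this patch, the dTLB level decision in
perf_ibs_get_tlb_lvl(), can be reproduced as a standalone sketch. The
SK_TLB_* values are meant to mirror the PERF_MEM_TLB_* bits of the uapi
header but are written from memory, so treat them, the sk_ names and the
flattened parameters as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the PERF_MEM_TLB_* bits. */
#define SK_TLB_NA    0x01
#define SK_TLB_HIT   0x02
#define SK_TLB_MISS  0x04
#define SK_TLB_L1    0x08
#define SK_TLB_L2    0x10

/*
 * Decide the dTLB result from three single-bit IBS fields:
 * no valid linear address -> N/A, then L1 hit, then L2 hit,
 * otherwise an L2 miss.
 */
static uint8_t sk_tlb_lvl(int lin_addr_valid, int l1tlb_miss, int l2tlb_miss)
{
	/* The IBS fields are single bits, so only 0 and 1 make sense. */
	assert((lin_addr_valid | l1tlb_miss | l2tlb_miss) >> 1 == 0);

	if (!lin_addr_valid)
		return SK_TLB_NA;
	if (!l1tlb_miss)
		return SK_TLB_L1 | SK_TLB_HIT;
	if (!l2tlb_miss)
		return SK_TLB_L2 | SK_TLB_HIT;
	return SK_TLB_L2 | SK_TLB_MISS;
}
```

The early return on an invalid linear address matters because the miss
bits carry no meaning in that case.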

* [tip: perf/core] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  2022-09-28  9:57 ` [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
  2022-09-30  4:41   ` Namhyung Kim
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     610c238041fbc682936d34132362a54a802600fe
Gitweb:        https://git.kernel.org/tip/610c238041fbc682936d34132362a54a802600fe
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:52 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:54 +02:00

perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions

IBS_OP_DATA2 DataSrc provides details about the location of the data
accessed by load ops. Define macros for legacy and extended DataSrc
values.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-3-ravi.bangoria@amd.com
---
 arch/x86/include/asm/amd-ibs.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index f3eb098..cb2a5e1 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,22 @@
 
 #include <asm/msr-index.h>
 
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
 /*
  * IBS Hardware MSRs
  */

* [tip: perf/core] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-09-30  9:31   ` tip-bot2 for Ravi Bangoria
  2022-09-30 10:48   ` [PATCH v3 01/15] " kajoljain
  1 sibling, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-09-30  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     ee3e88dfec23153d0675b5d00522297b9adf657c
Gitweb:        https://git.kernel.org/tip/ee3e88dfec23153d0675b5d00522297b9adf657c
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Wed, 28 Sep 2022 15:27:51 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 29 Sep 2022 12:20:54 +02:00

perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}

PERF_MEM_LVLNUM_EXTN_MEM can be used to indicate accesses to
extension memory such as CXL. PERF_MEM_LVL_IO can be used for IO
accesses, but it cannot distinguish between local and remote IO.
Introduce a new value, PERF_MEM_LVLNUM_IO, which can be combined with
PERF_MEM_REMOTE_REMOTE to indicate remote IO accesses.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220928095805.596-2-ravi.bangoria@amd.com
---
 include/uapi/linux/perf_event.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e639c74..4ae3c24 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1336,7 +1336,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
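
As a rough illustration of how the new level number combines with the remote flag, the following user-space sketch packs and unpacks the two fields of a 64-bit data_src word. The shift values and constants are copied locally (with an EX_ prefix) so the snippet builds against older headers; the helper names mk_data_src(), get_lvl_num() and is_remote_io() are made up for this example and are not part of the kernel or perf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the uapi values, so older headers are not a problem. */
#define EX_MEM_LVLNUM_SHIFT	33	/* mem_lvl_num: 4 bits */
#define EX_MEM_REMOTE_SHIFT	37	/* mem_remote: 1 bit */
#define EX_MEM_LVLNUM_IO	0x0a	/* I/O, added by this patch */
#define EX_MEM_REMOTE_REMOTE	0x01

/* Hypothetical helper: build a data_src word from level number and remote bit. */
static inline uint64_t mk_data_src(uint64_t lvl_num, bool remote)
{
	assert(lvl_num <= 0xf);	/* must fit in the 4-bit field */
	return (lvl_num << EX_MEM_LVLNUM_SHIFT) |
	       ((uint64_t)(remote ? EX_MEM_REMOTE_REMOTE : 0) << EX_MEM_REMOTE_SHIFT);
}

/* Extract the 4-bit level number. */
static inline unsigned int get_lvl_num(uint64_t data_src)
{
	return (data_src >> EX_MEM_LVLNUM_SHIFT) & 0xf;
}

/* True only when the level number is I/O and the remote bit is set. */
static inline bool is_remote_io(uint64_t data_src)
{
	return get_lvl_num(data_src) == EX_MEM_LVLNUM_IO &&
	       ((data_src >> EX_MEM_REMOTE_SHIFT) & 0x1) == EX_MEM_REMOTE_REMOTE;
}
```

A consumer would call is_remote_io() on the data_src value of each sample; local IO is the same level number with the remote bit clear.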


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
@ 2022-09-30 10:48   ` kajoljain
  2022-09-30 12:50     ` Ravi Bangoria
  1 sibling, 1 reply; 41+ messages in thread
From: kajoljain @ 2022-09-30 10:48 UTC (permalink / raw)
  To: Ravi Bangoria, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla



On 9/28/22 15:27, Ravi Bangoria wrote:
> PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
> extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
> accesses but it can not distinguish between local and remote IO.
> Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
> PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  include/uapi/linux/perf_event.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index e639c74cf5fb..4ae3c249f675 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1336,7 +1336,9 @@ union perf_mem_data_src {
>  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
> -/* 5-0xa available */
> +/* 5-0x8 available */
> +#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */

Hi Ravi,
    Here we are adding an entry explicitly for accesses to extension memory
like CXL. In future, if we want to extend it to cache or other accesses,
we will again need to add new entries.
Can we instead add a single entry, say PERF_MEM_LVLNUM_EXTN, and then
use the reserved bits to specify memory/cache?

Thanks,
Kajol Jain

> +#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
>  #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
>  #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
>  #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-30 10:48   ` [PATCH v3 01/15] " kajoljain
@ 2022-09-30 12:50     ` Ravi Bangoria
  2022-09-30 14:17       ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Ravi Bangoria @ 2022-09-30 12:50 UTC (permalink / raw)
  To: kajoljain, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, Ravi Bangoria

On 30-Sep-22 4:18 PM, kajoljain wrote:
> 
> 
> On 9/28/22 15:27, Ravi Bangoria wrote:
>> PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
>> extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
>> accesses but it can not distinguish between local and remote IO.
>> Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
>> PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  include/uapi/linux/perf_event.h | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index e639c74cf5fb..4ae3c249f675 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -1336,7 +1336,9 @@ union perf_mem_data_src {
>>  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>> -/* 5-0xa available */
>> +/* 5-0x8 available */
>> +#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
> 
> Hi Ravi,
>     Here we are adding entry explicitly for accesses to Extension memory
> like CXL. In future if we want to extend it for cache or other accesses
> , we again need to add new entries.
> Can we rather add single entry say PERF_MEM_LVLNUM_EXTN and further can
> use reserved bits to specify memory/cache?

Is everybody okay with this:

#define PERF_MEM_LVLNUM_EXTN	0x09 /* CXL */

And a 3-bit variable to define what type of CXL that would be:

#define PERF_MEM_EXTN_CXL_ANY	0x1
#define PERF_MEM_EXTN_CXL_MEM	0x2
#define PERF_MEM_EXTN_CXL_CACHE	0x3
#define PERF_MEM_EXTN_CXL_IO	0x4
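
To make the idea concrete, here is a minimal sketch of a generic level number plus a 3-bit sub-type. Everything in it is an assumption for illustration: the EX_ names, the choice of bit 46 for the sub-type, and the numbering of the types (each type gets a distinct value). None of this was merged.

```c
#include <stdint.h>

#define EX_LVLNUM_SHIFT		33
#define EX_LVLNUM_EXTN		0x09	/* proposed in this mail, not merged */

/* Hypothetical: 3-bit sub-type stored in otherwise reserved bits. */
#define EX_EXTN_SHIFT		46
#define EX_EXTN_MASK		0x7ULL

enum ex_extn_type {			/* illustrative numbering only */
	EX_EXTN_NONE = 0,
	EX_EXTN_CXL_ANY,
	EX_EXTN_CXL_MEM,
	EX_EXTN_CXL_CACHE,
	EX_EXTN_CXL_IO,
};

/* Mark data_src as an extension access of the given sub-type. */
static inline uint64_t ex_set_extn(uint64_t data_src, enum ex_extn_type t)
{
	data_src &= ~(0xfULL << EX_LVLNUM_SHIFT);
	data_src |= (uint64_t)EX_LVLNUM_EXTN << EX_LVLNUM_SHIFT;
	data_src &= ~(EX_EXTN_MASK << EX_EXTN_SHIFT);
	data_src |= ((uint64_t)t & EX_EXTN_MASK) << EX_EXTN_SHIFT;
	return data_src;
}

/* Return the sub-type, or EX_EXTN_NONE if the level is not EXTN. */
static inline enum ex_extn_type ex_get_extn(uint64_t data_src)
{
	if (((data_src >> EX_LVLNUM_SHIFT) & 0xf) != EX_LVLNUM_EXTN)
		return EX_EXTN_NONE;
	return (enum ex_extn_type)((data_src >> EX_EXTN_SHIFT) & EX_EXTN_MASK);
}
```

The point of the split is that a new CXL flavour only needs a new sub-type value; the 4-bit level-number space stays untouched.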

Peter, shall I send this as an add-on patch series, or are you okay with
reverting the current patches?

Thanks,
Ravi


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-30 12:50     ` Ravi Bangoria
@ 2022-09-30 14:17       ` Liang, Kan
  2022-10-01  6:37         ` Ravi Bangoria
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2022-09-30 14:17 UTC (permalink / raw)
  To: Ravi Bangoria, kajoljain, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	dave.hansen, hpa, mingo, mark.rutland, alexander.shishkin, tglx,
	bp, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla



On 2022-09-30 8:50 a.m., Ravi Bangoria wrote:
> On 30-Sep-22 4:18 PM, kajoljain wrote:
>>
>>
>> On 9/28/22 15:27, Ravi Bangoria wrote:
>>> PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
>>> extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
>>> accesses but it can not distinguish between local and remote IO.
>>> Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
>>> PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
>>>
>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>> ---
>>>  include/uapi/linux/perf_event.h | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>>> index e639c74cf5fb..4ae3c249f675 100644
>>> --- a/include/uapi/linux/perf_event.h
>>> +++ b/include/uapi/linux/perf_event.h
>>> @@ -1336,7 +1336,9 @@ union perf_mem_data_src {
>>>  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>>>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>>>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>>> -/* 5-0xa available */
>>> +/* 5-0x8 available */
>>> +#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
>>
>> Hi Ravi,
>>     Here we are adding entry explicitly for accesses to Extension memory
>> like CXL. In future if we want to extend it for cache or other accesses
>> , we again need to add new entries.
>> Can we rather add single entry say PERF_MEM_LVLNUM_EXTN and further can
>> use reserved bits to specify memory/cache?
> 
> Is everybody okay with this:
> 
> #define PERF_MEM_LVLNUM_EXTN	0x09 /* CXL */

I think a generic name, PERF_MEM_LVLNUM_EXTN, only makes sense when it
is meant to include all types of extension memory, e.g., CXL, PMEM,
HBM, etc. Then we can set this bit and the corresponding CXL bits to
understand the real source. Is that the case here?

But if it's only for CXL, I think it's better to use a dedicated
name, PERF_MEM_LVLNUM_CXL (as we did for PMEM with PERF_MEM_LVLNUM_PMEM).
If so, I don't think we need PERF_MEM_EXTN_CXL_ANY.

Thanks,
Kan

> 
> And a 3 bit variable to define what type of cxl that would be:
> 
> #define PERF_MEM_EXTN_CXL_ANY	0x1
> #define PERF_MEM_EXTN_CXL_MEM	0x2
> #define PERF_MEM_EXTN_CXL_CACHE	0x3
> #define PERF_MEM_EXTN_CXL_IO	0x4
> 
> Peter, Shall I send this as addon patch series or are you okay reverting
> current patches?
> 
> Thanks,
> Ravi


* Re: [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
  2022-09-30  4:59   ` Namhyung Kim
  2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
@ 2022-09-30 17:02   ` Jiri Olsa
  2 siblings, 0 replies; 41+ messages in thread
From: Jiri Olsa @ 2022-09-30 17:02 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, acme, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

hi,
there's a typo in the subject: it should be 'PHYS' ;-)

jirka

On Wed, Sep 28, 2022 at 03:27:56PM +0530, Ravi Bangoria wrote:
> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
> store operation. Populate perf sample physical address using it.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 8 +++++++-
>  kernel/events/core.c      | 3 ++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 0ad49105c154..3271735f0070 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -989,6 +989,11 @@ static void perf_ibs_parse_ld_st_data(__u64 sample_type,
>  		data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
>  		data->sample_flags |= PERF_SAMPLE_ADDR;
>  	}
> +
> +	if (sample_type & PERF_SAMPLE_PHYS_ADDR && op_data3.dc_phy_addr_valid) {
> +		data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
> +		data->sample_flags |= PERF_SAMPLE_PHYS_ADDR;
> +	}
>  }
>  
>  static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
> @@ -998,7 +1003,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
>  	    (perf_ibs == &perf_ibs_op &&
>  	     (sample_type & PERF_SAMPLE_DATA_SRC ||
>  	      sample_type & PERF_SAMPLE_WEIGHT_TYPE ||
> -	      sample_type & PERF_SAMPLE_ADDR)))
> +	      sample_type & PERF_SAMPLE_ADDR ||
> +	      sample_type & PERF_SAMPLE_PHYS_ADDR)))
>  		return perf_ibs->offset_max;
>  	else if (check_rip)
>  		return 3;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e1ffdb861b53..49bc3b5e6c8a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7435,7 +7435,8 @@ void perf_prepare_sample(struct perf_event_header *header,
>  		header->size += size;
>  	}
>  
> -	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
> +	if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
> +	    filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
>  		data->phys_addr = perf_virt_to_phys(data->addr);
>  
>  #ifdef CONFIG_CGROUP_PERF
> -- 
> 2.31.1
> 
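
The interplay between the driver and the core in the quoted patch can be modelled in user space roughly as follows. This is a sketch under assumptions: filtered_sample_type is taken to be sample_type with the bits already set in data->sample_flags masked out, ex_virt_to_phys() is a dummy stand-in for perf_virt_to_phys(), and all EX_/ex_ names are invented for this example.

```c
#include <stdint.h>

#define EX_SAMPLE_ADDR		(1U << 3)	/* mirrors PERF_SAMPLE_ADDR */
#define EX_SAMPLE_PHYS_ADDR	(1U << 19)	/* mirrors PERF_SAMPLE_PHYS_ADDR */

struct ex_sample_data {
	uint64_t addr;
	uint64_t phys_addr;
	uint64_t sample_flags;	/* fields already filled in by the PMU driver */
};

/* Dummy stand-in for perf_virt_to_phys(); the real one walks page tables. */
static inline uint64_t ex_virt_to_phys(uint64_t virt)
{
	return virt ^ 0xffff000000000000ULL;
}

/* Driver side: use the hardware-provided address when it is valid. */
static inline void ex_driver_fill(struct ex_sample_data *data,
				  uint64_t sample_type,
				  int hw_phys_valid, uint64_t hw_phys)
{
	if ((sample_type & EX_SAMPLE_PHYS_ADDR) && hw_phys_valid) {
		data->phys_addr = hw_phys;
		data->sample_flags |= EX_SAMPLE_PHYS_ADDR;
	}
}

/* Core side: only compute what the driver has not already provided. */
static inline void ex_core_prepare(struct ex_sample_data *data,
				   uint64_t sample_type)
{
	uint64_t filtered_sample_type = sample_type & ~data->sample_flags;

	if (filtered_sample_type & EX_SAMPLE_PHYS_ADDR)
		data->phys_addr = ex_virt_to_phys(data->addr);
}
```

When the hardware address is valid, the core leaves phys_addr alone; otherwise it falls back to translating the virtual address.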


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-09-30 14:17       ` Liang, Kan
@ 2022-10-01  6:37         ` Ravi Bangoria
  2022-10-03 13:15           ` Liang, Kan
                             ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-10-01  6:37 UTC (permalink / raw)
  To: Liang, Kan, kajoljain, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	dave.hansen, hpa, mingo, mark.rutland, alexander.shishkin, tglx,
	bp, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria

On 30-Sep-22 7:47 PM, Liang, Kan wrote:
> 
> 
> On 2022-09-30 8:50 a.m., Ravi Bangoria wrote:
>> On 30-Sep-22 4:18 PM, kajoljain wrote:
>>>
>>>
>>> On 9/28/22 15:27, Ravi Bangoria wrote:
>>>> PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
>>>> extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
>>>> accesses but it can not distinguish between local and remote IO.
>>>> Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
>>>> PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
>>>>
>>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>>> ---
>>>>  include/uapi/linux/perf_event.h | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>>>> index e639c74cf5fb..4ae3c249f675 100644
>>>> --- a/include/uapi/linux/perf_event.h
>>>> +++ b/include/uapi/linux/perf_event.h
>>>> @@ -1336,7 +1336,9 @@ union perf_mem_data_src {
>>>>  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>>>>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>>>>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>>>> -/* 5-0xa available */
>>>> +/* 5-0x8 available */
>>>> +#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
>>>
>>> Hi Ravi,
>>>     Here we are adding entry explicitly for accesses to Extension memory
>>> like CXL. In future if we want to extend it for cache or other accesses
>>> , we again need to add new entries.
>>> Can we rather add single entry say PERF_MEM_LVLNUM_EXTN and further can
>>> use reserved bits to specify memory/cache?
>>
>> Is everybody okay with this:
>>
>> #define PERF_MEM_LVLNUM_EXTN	0x09 /* CXL */
> 
> I think a generic name, PERF_MEM_LVLNUM_EXTN, only make sense, when it
> wants to include all the types of the Extension memory, e.g., CXL, PMEM,
> HBM, etc. Then we can set this bit and the corresponding CXL bits to
> understand the real source. Is it the case here?
> 
> But if it's only for the CXL, I think it's better to use a dedicated
> name, PERF_MEM_LVLNUM_CXL. (as we did for PMEM, PERF_MEM_LVLNUM_PMEM).
> If so, I don't think we need the PERF_MEM_EXTN_CXL_ANY.

Ok. For now, I think the patch below is good enough? Later we can introduce
a new variable to provide the type of CXL device.


From 5deb2055e2b5b0a61403f2d5f4e5a784b14a65e3 Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bangoria@amd.com>
Date: Sat, 1 Oct 2022 11:37:05 +0530
Subject: [PATCH] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to
 PERF_MEM_LVLNUM_CXL

PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices, but its
name is a bit ambiguous and also not generic enough to cover cxl.cache and
cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c       | 2 +-
 include/uapi/linux/perf_event.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 3271735f0070..4cb710efbdd9 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -801,7 +801,7 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
 	/* Extension Memory */
 	if (ibs_caps & IBS_CAPS_ZEN4 &&
 	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
-		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_CXL;
 		if (op_data2->rmt_node) {
 			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
 			/* IBS doesn't provide Remote socket detail */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 85be78e0e7f6..eb1090604d53 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1337,7 +1337,7 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
 /* 5-0x8 available */
-#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
+#define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
 #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
-- 
2.31.1


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-10-01  6:37         ` Ravi Bangoria
@ 2022-10-03 13:15           ` Liang, Kan
  2022-10-06 11:38             ` Ravi Bangoria
  2022-10-27  8:25           ` Peter Zijlstra
  2022-10-28  6:41           ` [tip: perf/urgent] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL tip-bot2 for Ravi Bangoria
  2 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2022-10-03 13:15 UTC (permalink / raw)
  To: Ravi Bangoria, kajoljain, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	dave.hansen, hpa, mingo, mark.rutland, alexander.shishkin, tglx,
	bp, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla



On 2022-10-01 2:37 a.m., Ravi Bangoria wrote:
> On 30-Sep-22 7:47 PM, Liang, Kan wrote:
>>
>>
>> On 2022-09-30 8:50 a.m., Ravi Bangoria wrote:
>>> On 30-Sep-22 4:18 PM, kajoljain wrote:
>>>>
>>>>
>>>> On 9/28/22 15:27, Ravi Bangoria wrote:
>>>>> PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
>>>>> extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
>>>>> accesses but it can not distinguish between local and remote IO.
>>>>> Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
>>>>> PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
>>>>>
>>>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>>>> ---
>>>>>  include/uapi/linux/perf_event.h | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>>>>> index e639c74cf5fb..4ae3c249f675 100644
>>>>> --- a/include/uapi/linux/perf_event.h
>>>>> +++ b/include/uapi/linux/perf_event.h
>>>>> @@ -1336,7 +1336,9 @@ union perf_mem_data_src {
>>>>>  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>>>>>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>>>>>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>>>>> -/* 5-0xa available */
>>>>> +/* 5-0x8 available */
>>>>> +#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
>>>>
>>>> Hi Ravi,
>>>>     Here we are adding entry explicitly for accesses to Extension memory
>>>> like CXL. In future if we want to extend it for cache or other accesses
>>>> , we again need to add new entries.
>>>> Can we rather add single entry say PERF_MEM_LVLNUM_EXTN and further can
>>>> use reserved bits to specify memory/cache?
>>>
>>> Is everybody okay with this:
>>>
>>> #define PERF_MEM_LVLNUM_EXTN	0x09 /* CXL */
>>
>> I think a generic name, PERF_MEM_LVLNUM_EXTN, only make sense, when it
>> wants to include all the types of the Extension memory, e.g., CXL, PMEM,
>> HBM, etc. Then we can set this bit and the corresponding CXL bits to
>> understand the real source. Is it the case here?
>>
>> But if it's only for the CXL, I think it's better to use a dedicated
>> name, PERF_MEM_LVLNUM_CXL. (as we did for PMEM, PERF_MEM_LVLNUM_PMEM).
>> If so, I don't think we need the PERF_MEM_EXTN_CXL_ANY.
> 
> Ok. For now, I think below is good enough? Later we can introduce new
> variable to provide type of cxl device.
> 
> 
> From 5deb2055e2b5b0a61403f2d5f4e5a784b14a65e3 Mon Sep 17 00:00:00 2001
> From: Ravi Bangoria <ravi.bangoria@amd.com>
> Date: Sat, 1 Oct 2022 11:37:05 +0530
> Subject: [PATCH] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to
>  PERF_MEM_LVLNUM_CXL
> 
> PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
> bit ambiguous name and also not generic enough to cover cxl.cache and
> cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

Looks good to me.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan

> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c       | 2 +-
>  include/uapi/linux/perf_event.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 3271735f0070..4cb710efbdd9 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -801,7 +801,7 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
>  	/* Extension Memory */
>  	if (ibs_caps & IBS_CAPS_ZEN4 &&
>  	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
> -		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_CXL;
>  		if (op_data2->rmt_node) {
>  			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
>  			/* IBS doesn't provide Remote socket detail */
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 85be78e0e7f6..eb1090604d53 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1337,7 +1337,7 @@ union perf_mem_data_src {
>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>  /* 5-0x8 available */
> -#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
> +#define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
>  #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
>  #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
>  #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-10-03 13:15           ` Liang, Kan
@ 2022-10-06 11:38             ` Ravi Bangoria
  2022-10-14 13:53               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 41+ messages in thread
From: Ravi Bangoria @ 2022-10-06 11:38 UTC (permalink / raw)
  To: Liang, Kan, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	dave.hansen, hpa, mingo, mark.rutland, alexander.shishkin, tglx,
	bp, x86, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, kim.phillips, santosh.shukla, kajoljain,
	Ravi Bangoria

On 03-Oct-22 6:45 PM, Liang, Kan wrote:
> 
> 
> On 2022-10-01 2:37 a.m., Ravi Bangoria wrote:
>> On 30-Sep-22 7:47 PM, Liang, Kan wrote:
>>>
>>>
>>> On 2022-09-30 8:50 a.m., Ravi Bangoria wrote:
>>>> On 30-Sep-22 4:18 PM, kajoljain wrote:
>>>>>
>>>>>
>>>>> On 9/28/22 15:27, Ravi Bangoria wrote:
>>>>>> PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to
>>>>>> extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO
>>>>>> accesses but it can not distinguish between local and remote IO.
>>>>>> Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with
>>>>>> PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses.
>>>>>>
>>>>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>>>>> ---
>>>>>>  include/uapi/linux/perf_event.h | 4 +++-
>>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>>>>>> index e639c74cf5fb..4ae3c249f675 100644
>>>>>> --- a/include/uapi/linux/perf_event.h
>>>>>> +++ b/include/uapi/linux/perf_event.h
>>>>>> @@ -1336,7 +1336,9 @@ union perf_mem_data_src {
>>>>>>  #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
>>>>>>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>>>>>>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>>>>>> -/* 5-0xa available */
>>>>>> +/* 5-0x8 available */
>>>>>> +#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
>>>>>
>>>>> Hi Ravi,
>>>>>     Here we are adding entry explicitly for accesses to Extension memory
>>>>> like CXL. In future if we want to extend it for cache or other accesses
>>>>> , we again need to add new entries.
>>>>> Can we rather add single entry say PERF_MEM_LVLNUM_EXTN and further can
>>>>> use reserved bits to specify memory/cache?
>>>>
>>>> Is everybody okay with this:
>>>>
>>>> #define PERF_MEM_LVLNUM_EXTN	0x09 /* CXL */
>>>
>>> I think a generic name, PERF_MEM_LVLNUM_EXTN, only make sense, when it
>>> wants to include all the types of the Extension memory, e.g., CXL, PMEM,
>>> HBM, etc. Then we can set this bit and the corresponding CXL bits to
>>> understand the real source. Is it the case here?
>>>
>>> But if it's only for the CXL, I think it's better to use a dedicated
>>> name, PERF_MEM_LVLNUM_CXL. (as we did for PMEM, PERF_MEM_LVLNUM_PMEM).
>>> If so, I don't think we need the PERF_MEM_EXTN_CXL_ANY.
>>
>> Ok. For now, I think below is good enough? Later we can introduce new
>> variable to provide type of cxl device.
>>
>>
>> From 5deb2055e2b5b0a61403f2d5f4e5a784b14a65e3 Mon Sep 17 00:00:00 2001
>> From: Ravi Bangoria <ravi.bangoria@amd.com>
>> Date: Sat, 1 Oct 2022 11:37:05 +0530
>> Subject: [PATCH] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to
>>  PERF_MEM_LVLNUM_CXL
>>
>> PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
>> bit ambiguous name and also not generic enough to cover cxl.cache and
>> cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.
> 
> Looks good to me.
> 
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

Thanks Kan.

Peter, can you please include this patch along with the series?

Arnaldo, I'll respin tool side of patches with this change.

Thanks,
Ravi


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-10-06 11:38             ` Ravi Bangoria
@ 2022-10-14 13:53               ` Arnaldo Carvalho de Melo
  2022-10-14 15:04                 ` Ravi Bangoria
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-10-14 13:53 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Liang, Kan, peterz, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, kajoljain

Em Thu, Oct 06, 2022 at 05:08:55PM +0530, Ravi Bangoria escreveu:
> On 03-Oct-22 6:45 PM, Liang, Kan wrote:
> >> PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
> >> bit ambiguous name and also not generic enough to cover cxl.cache and
> >> cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

> > Looks good to me.

> > Reviewed-by: Kan Liang <kan.liang@linux.intel.com>

> Thanks Kan.

> Peter, can you please include this patch along with the series?

> Arnaldo, I'll respin tool side of patches with this change.

It's already upstream, so please go on from there, ok?

I'm now processing perf patches after getting lost in pahole land for a
bit :-)

- Arnaldo


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-10-14 13:53               ` Arnaldo Carvalho de Melo
@ 2022-10-14 15:04                 ` Ravi Bangoria
  0 siblings, 0 replies; 41+ messages in thread
From: Ravi Bangoria @ 2022-10-14 15:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Liang, Kan, peterz, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, kajoljain, Ravi Bangoria

> Its already upstream, so please go on from there, ok?

Right. Only the kernel-side PERF_MEM_LVLNUM_EXTN_MEM -> PERF_MEM_LVLNUM_CXL
patch is pending. The tool side already uses the PERF_MEM_LVLNUM_CXL macro.

Peter, let me know if you want me to resend.

Thanks,
Ravi


* Re: [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-10-01  6:37         ` Ravi Bangoria
  2022-10-03 13:15           ` Liang, Kan
@ 2022-10-27  8:25           ` Peter Zijlstra
  2022-10-28  6:41           ` [tip: perf/urgent] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL tip-bot2 for Ravi Bangoria
  2 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2022-10-27  8:25 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Liang, Kan, kajoljain, acme, jolsa, namhyung, eranian, irogers,
	jmario, leo.yan, alisaidi, ak, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

On Sat, Oct 01, 2022 at 12:07:40PM +0530, Ravi Bangoria wrote:

> From 5deb2055e2b5b0a61403f2d5f4e5a784b14a65e3 Mon Sep 17 00:00:00 2001
> From: Ravi Bangoria <ravi.bangoria@amd.com>
> Date: Sat, 1 Oct 2022 11:37:05 +0530
> Subject: [PATCH] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to
>  PERF_MEM_LVLNUM_CXL
> 
> PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's
> bit ambiguous name and also not generic enough to cover cxl.cache and
> cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thanks!

> ---
>  arch/x86/events/amd/ibs.c       | 2 +-
>  include/uapi/linux/perf_event.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 3271735f0070..4cb710efbdd9 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -801,7 +801,7 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
>  	/* Extension Memory */
>  	if (ibs_caps & IBS_CAPS_ZEN4 &&
>  	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
> -		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_CXL;
>  		if (op_data2->rmt_node) {
>  			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
>  			/* IBS doesn't provide Remote socket detail */
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 85be78e0e7f6..eb1090604d53 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1337,7 +1337,7 @@ union perf_mem_data_src {
>  #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
>  #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
>  /* 5-0x8 available */
> -#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
> +#define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
>  #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
>  #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
>  #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
> -- 
> 2.31.1


* [tip: perf/urgent] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
  2022-10-01  6:37         ` Ravi Bangoria
  2022-10-03 13:15           ` Liang, Kan
  2022-10-27  8:25           ` Peter Zijlstra
@ 2022-10-28  6:41           ` tip-bot2 for Ravi Bangoria
  2 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Ravi Bangoria @ 2022-10-28  6:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ravi Bangoria, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     cb6c18b5a41622c7a439508f7421f8766a91cb87
Gitweb:        https://git.kernel.org/tip/cb6c18b5a41622c7a439508f7421f8766a91cb87
Author:        Ravi Bangoria <ravi.bangoria@amd.com>
AuthorDate:    Sat, 01 Oct 2022 11:37:05 +05:30
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 27 Oct 2022 10:27:32 +02:00

perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL

PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices, but its
name is a bit ambiguous and also not generic enough to cover cxl.cache and
cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f6268268-b4e9-9ed6-0453-65792644d953@amd.com
---
 arch/x86/events/amd/ibs.c       | 2 +-
 include/uapi/linux/perf_event.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 3271735..4cb710e 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -801,7 +801,7 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
 	/* Extension Memory */
 	if (ibs_caps & IBS_CAPS_ZEN4 &&
 	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
-		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_CXL;
 		if (op_data2->rmt_node) {
 			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
 			/* IBS doesn't provide Remote socket detail */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 85be78e..ccb7f5d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1337,7 +1337,7 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
 /* 5-0x8 available */
-#define PERF_MEM_LVLNUM_EXTN_MEM 0x09 /* Extension memory */
+#define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
 #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
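
For readers consuming these samples from userspace, the following is a minimal sketch of how the renamed level number could be decoded from a raw PERF_SAMPLE_DATA_SRC value. It is not part of the patch. The EX_* macros and ex_* functions are hypothetical local names; the level values are copied from the hunk above, and the bit offset 33 is assumed to match PERF_MEM_LVLNUM_SHIFT in include/uapi/linux/perf_event.h. Local copies are used so the sketch also builds against headers that predate the rename.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local copies of the uapi values (EX_ prefix is ours). */
#define EX_LVLNUM_SHIFT		33	/* assumed == PERF_MEM_LVLNUM_SHIFT */
#define EX_LVLNUM_MASK		0xfULL	/* mem_lvl_num is a 4-bit field */

#define EX_LVLNUM_L3		0x03
#define EX_LVLNUM_L4		0x04
#define EX_LVLNUM_CXL		0x09	/* was PERF_MEM_LVLNUM_EXTN_MEM */
#define EX_LVLNUM_IO		0x0a
#define EX_LVLNUM_ANY_CACHE	0x0b
#define EX_LVLNUM_LFB		0x0c

/* Extract the 4-bit mem_lvl_num field from a raw data_src value. */
static unsigned int ex_mem_lvl_num(uint64_t data_src)
{
	return (unsigned int)((data_src >> EX_LVLNUM_SHIFT) & EX_LVLNUM_MASK);
}

/* Map a level number to a short label; unlisted values yield "other". */
static const char *ex_lvl_num_name(unsigned int lvl_num)
{
	assert(lvl_num <= EX_LVLNUM_MASK);

	switch (lvl_num) {
	case EX_LVLNUM_L3:		return "L3";
	case EX_LVLNUM_L4:		return "L4";
	case EX_LVLNUM_CXL:		return "CXL";
	case EX_LVLNUM_IO:		return "I/O";
	case EX_LVLNUM_ANY_CACHE:	return "Any cache";
	case EX_LVLNUM_LFB:		return "LFB";
	default:			return "other";
	}
}
```

A caller would pass the data_src word of a sample, for example ex_lvl_num_name(ex_mem_lvl_num(sample_data_src)), where sample_data_src is whatever variable holds that field. Since the rename keeps the value 0x09, already-recorded data should decode the same way before and after this commit.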


end of thread, other threads:[~2022-10-28  6:42 UTC | newest]

Thread overview: 41+ messages
2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-30 10:48   ` [PATCH v3 01/15] " kajoljain
2022-09-30 12:50     ` Ravi Bangoria
2022-09-30 14:17       ` Liang, Kan
2022-10-01  6:37         ` Ravi Bangoria
2022-10-03 13:15           ` Liang, Kan
2022-10-06 11:38             ` Ravi Bangoria
2022-10-14 13:53               ` Arnaldo Carvalho de Melo
2022-10-14 15:04                 ` Ravi Bangoria
2022-10-27  8:25           ` Peter Zijlstra
2022-10-28  6:41           ` [tip: perf/urgent] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
2022-09-30  4:41   ` Namhyung Kim
2022-09-30  4:48     ` Ravi Bangoria
2022-09-30  5:11       ` Namhyung Kim
2022-09-30  6:16         ` Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 03/15] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
2022-09-30  5:09   ` Namhyung Kim
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 05/15] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
2022-09-30  4:59   ` Namhyung Kim
2022-09-30  5:05     ` Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-30 17:02   ` [PATCH v3 06/15] " Jiri Olsa
2022-09-28  9:57 ` [PATCH v3 07/15] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 08/15] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 09/15] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 10/15] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 11/15] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 12/15] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 13/15] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 14/15] perf mem: Use more generic term for LFB Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 15/15] perf script: Add missing fields in usage hint Ravi Bangoria
