* [PATCH 00/13] perf mem/c2c: Add support for AMD
@ 2022-05-25  9:39 Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 01/13] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
                   ` (12 more replies)
  0 siblings, 13 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

The perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store samples provide most of the
information needed by these tools. Enable support for these tools on
AMD Zen processors using the IBS Op PMU.

There are some limitations though: only load/store instructions provide
mem/c2c information, but IBS does not provide a way to choose a
particular type of instruction to tag. This results in many non-load/store
instructions being tagged, which show up as N/A. Also, IBS, being an
uncore PMU from the kernel's point of view[1], does not support
per-process monitoring. Thus, perf mem/c2c on AMD is currently supported
in per-CPU mode only.
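
For reference, the sketch below (not part of this series; the sampling
period and CPU choice are arbitrary) shows roughly what the mem-ldst
mapping boils down to on AMD: read the dynamic PMU type of ibs_op// from
sysfs and open it system-wide on one CPU with the sample types handled by
this series:

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/perf_event.h>

  static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                             int cpu, int group_fd, unsigned long flags)
  {
          return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
  }

  int main(void)
  {
          struct perf_event_attr attr;
          FILE *f = fopen("/sys/bus/event_source/devices/ibs_op/type", "r");
          unsigned int type;
          int fd;

          if (!f || fscanf(f, "%u", &type) != 1)
                  return 1;
          fclose(f);

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = type;               /* dynamic PMU type of ibs_op// */
          attr.sample_period = 10000;
          attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
                             PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_WEIGHT;

          /* pid == -1, cpu == 0: per-CPU mode; per-process opens are rejected */
          fd = perf_event_open(&attr, -1, 0, -1, 0);
          if (fd < 0) {
                  perror("perf_event_open");
                  return 1;
          }
          printf("ibs_op// opened on cpu0, fd = %d\n", fd);
          close(fd);
          return 0;
  }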

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

[1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com

Prepared on tip/perf/core (bae19fdd7e9e)

Ravi Bangoria (13):
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC based on IBS_OP_DATA*
  perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS
    OP_DATA3[IbsDcMissLat]
  perf/x86/amd: Support PERF_SAMPLE_ADDR using IBS_DC_LINADDR
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  perf tool: Sync include/uapi/linux/perf_event.h header
  perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem: Use more generic term for LFB

 arch/x86/events/amd/ibs.c                | 351 ++++++++++++++++++++++-
 arch/x86/include/asm/amd-ibs.h           |  76 +++++
 include/uapi/linux/perf_event.h          |   4 +-
 tools/arch/x86/include/asm/amd-ibs.h     |  76 +++++
 tools/include/uapi/linux/perf_event.h    |   4 +-
 tools/perf/Documentation/perf-c2c.txt    |  14 +-
 tools/perf/Documentation/perf-mem.txt    |   3 +-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/arch/x86/util/mem-events.c    |  31 +-
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/util/mem-events.c             |  17 +-
 12 files changed, 557 insertions(+), 22 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 01/13] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions Ravi Bangoria
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Introduce PERF_MEM_LVLNUM_EXTN_MEM, which can be used to indicate
accesses to extension memory such as CXL. PERF_MEM_LVL_IO can be used
for IO accesses but it cannot distinguish between local and remote IO.
Introduce a new field, PERF_MEM_LVLNUM_IO, which can be combined with
PERF_MEM_REMOTE_REMOTE to indicate remote IO accesses.
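
As a quick illustration (editorial, not part of the patch) of how the new
level number is meant to pair with the existing remote bit, a driver would
describe a remote IO load roughly like this, using only the uapi
definitions above:

          union perf_mem_data_src dsrc = { .val = 0 };

          dsrc.mem_op      = PERF_MEM_OP_LOAD;
          dsrc.mem_lvl_num = PERF_MEM_LVLNUM_IO;          /* new: I/O */
          dsrc.mem_remote  = PERF_MEM_REMOTE_REMOTE;      /* remote, not local, I/O */
          dsrc.mem_hops    = PERF_MEM_HOPS_1;             /* hop count, if known */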

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 include/uapi/linux/perf_event.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..1c3157c1be9d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1292,7 +1292,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x9 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 01/13] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-26 15:08   ` Kim Phillips
  2022-05-25  9:39 ` [PATCH 03/13] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC based on IBS_OP_DATA* Ravi Bangoria
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

AMD IBS OP_DATA2 and OP_DATA3 provide details about tagged load/store
ops. Add definitions for these registers to the header file. In
addition, the IBS_OP_DATA2 DataSrc field describes where the data
accessed by load ops came from. Define macros for the legacy and
extended DataSrc values.
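
For a sense of how these definitions are meant to be consumed, here is a
small editorial sketch (the few macros it needs are duplicated from the
hunk below so the snippet stands alone):

  #include <stdbool.h>
  #include <stdint.h>

  #define IBS_LD_OP_MASK          (0x1ULL << 0)
  #define IBS_ST_OP_MASK          (0x1ULL << 1)
  #define IBS_DC_MISS_LAT_SHIFT   32
  #define IBS_DC_MISS_LAT_MASK    (0xFFFFULL << IBS_DC_MISS_LAT_SHIFT)

  /* Was the tagged op a load or a store? (IBS_OP_DATA3 bits 0 and 1) */
  static bool ibs_op_is_ld_st(uint64_t op_data3)
  {
          return op_data3 & (IBS_LD_OP_MASK | IBS_ST_OP_MASK);
  }

  /* Data cache miss latency in core cycles (IBS_OP_DATA3 bits 47:32) */
  static uint64_t ibs_dc_miss_lat(uint64_t op_data3)
  {
          return (op_data3 & IBS_DC_MISS_LAT_MASK) >> IBS_DC_MISS_LAT_SHIFT;
  }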

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/include/asm/amd-ibs.h | 76 ++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
index aabdbb5ab920..22184fe20cf0 100644
--- a/arch/x86/include/asm/amd-ibs.h
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,82 @@
 
 #include <asm/msr-index.h>
 
+/* IBS_OP_DATA2 Bits */
+#define IBS_DATA_SRC_HI_SHIFT			6
+#define IBS_DATA_SRC_HI_MASK			(0x3ULL << IBS_DATA_SRC_HI_SHIFT)
+#define IBS_CACHE_HIT_ST_SHIFT			5
+#define IBS_CACHE_HIT_ST_MASK			(0x1ULL << IBS_CACHE_HIT_ST_SHIFT)
+#define IBS_RMT_NODE_SHIFT			4
+#define IBS_RMT_NODE_MASK			(0x1ULL << IBS_RMT_NODE_SHIFT)
+#define IBS_DATA_SRC_LO_SHIFT			0
+#define IBS_DATA_SRC_LO_MASK			(0x7ULL << IBS_DATA_SRC_LO_SHIFT)
+
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 with DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
+/* IBS_OP_DATA3 Bits */
+#define IBS_TLB_REFILL_LAT_SHIFT		48
+#define IBS_TLB_REFILL_LAT_MASK			(0xFFFFULL << IBS_TLB_REFILL_LAT_SHIFT)
+#define IBS_DC_MISS_LAT_SHIFT			32
+#define IBS_DC_MISS_LAT_MASK			(0xFFFFULL << IBS_DC_MISS_LAT_SHIFT)
+#define IBS_OP_DC_MISS_OPEN_MEM_REQS_SHIFT	26
+#define IBS_OP_DC_MISS_OPEN_MEM_REQS_MASK	(0x3FULL << IBS_OP_DC_MISS_OPEN_MEM_REQS_SHIFT)
+#define IBS_OP_MEM_WIDTH_SHIFT			22
+#define IBS_OP_MEM_WIDTH_MASK			(0xFULL << IBS_OP_MEM_WIDTH_SHIFT)
+#define IBS_SW_PF_SHIFT				21
+#define IBS_SW_PF_MASK				(0x1ULL << IBS_SW_PF_SHIFT)
+#define IBS_L2_MISS_SHIFT			20
+#define IBS_L2_MISS_MASK			(0x1ULL << IBS_L2_MISS_SHIFT)
+#define IBS_DC_L2_TLB_HIT_1G_SHIFT		19
+#define IBS_DC_L2_TLB_HIT_1G_MASK		(0x1ULL << IBS_DC_L2_TLB_HIT_1G_SHIFT)
+#define IBS_DC_PHY_ADDR_VALID_SHIFT		18
+#define IBS_DC_PHY_ADDR_VALID_MASK		(0x1ULL << IBS_DC_PHY_ADDR_VALID_SHIFT)
+#define IBS_DC_LIN_ADDR_VALID_SHIFT		17
+#define IBS_DC_LIN_ADDR_VALID_MASK		(0x1ULL << IBS_DC_LIN_ADDR_VALID_SHIFT)
+#define IBS_DC_MISS_NO_MAB_ALLOC_SHIFT		16
+#define IBS_DC_MISS_NO_MAB_ALLOC_MASK		(0x1ULL << IBS_DC_MISS_NO_MAB_ALLOC_SHIFT)
+#define IBS_DC_LOCKED_OP_SHIFT			15
+#define IBS_DC_LOCKED_OP_MASK			(0x1ULL << IBS_DC_LOCKED_OP_SHIFT)
+#define IBS_DC_UC_MEM_ACC_SHIFT			14
+#define IBS_DC_UC_MEM_ACC_MASK			(0x1ULL << IBS_DC_UC_MEM_ACC_SHIFT)
+#define IBS_DC_WC_MEM_ACC_SHIFT			13
+#define IBS_DC_WC_MEM_ACC_MASK			(0x1ULL << IBS_DC_WC_MEM_ACC_SHIFT)
+#define IBS_DC_MIS_ACC_SHIFT			8
+#define IBS_DC_MIS_ACC_MASK			(0x1ULL << IBS_DC_MIS_ACC_SHIFT)
+#define IBS_DC_MISS_SHIFT			7
+#define IBS_DC_MISS_MASK			(0x1ULL << IBS_DC_MISS_SHIFT)
+#define IBS_DC_L2_TLB_HIT_2M_SHIFT		6
+#define IBS_DC_L2_TLB_HIT_2M_MASK		(0x1ULL << IBS_DC_L2_TLB_HIT_2M_SHIFT)
+/*
+ * Definition of 5-4 bits is different between Zen3 and Zen4 (Zen2 definition
+ * is same as Zen4) but the end result is same. So using Zen4 definition here.
+ */
+#define IBS_DC_L1_TLB_HIT_1G_SHIFT		5
+#define IBS_DC_L1_TLB_HIT_1G_MASK		(0x1ULL << IBS_DC_L1_TLB_HIT_1G_SHIFT)
+#define IBS_DC_L1_TLB_HIT_2M_SHIFT		4
+#define IBS_DC_L1_TLB_HIT_2M_MASK		(0x1ULL << IBS_DC_L1_TLB_HIT_2M_SHIFT)
+#define IBS_DC_L2_TLB_MISS_SHIFT		3
+#define IBS_DC_L2_TLB_MISS_MASK			(0x1ULL << IBS_DC_L2_TLB_MISS_SHIFT)
+#define IBS_DC_L1_TLB_MISS_SHIFT		2
+#define IBS_DC_L1_TLB_MISS_MASK			(0x1ULL << IBS_DC_L1_TLB_MISS_SHIFT)
+#define IBS_ST_OP_SHIFT				1
+#define IBS_ST_OP_MASK				(0x1ULL << IBS_ST_OP_SHIFT)
+#define IBS_LD_OP_SHIFT				0
+#define IBS_LD_OP_MASK				(0x1ULL << IBS_LD_OP_SHIFT)
+
 /*
  * IBS Hardware MSRs
  */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 03/13] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC based on IBS_OP_DATA*
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 01/13] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat] Ravi Bangoria
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

struct perf_mem_data_src is used to pass arch-specific memory access
details in a generic form. These details get consumed by tools like
perf mem and c2c. Each IBS tagged load/store sample provides most of
the information needed by these tools. Add logic to convert IBS-specific
raw data into perf_mem_data_src.
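
For context on the consumer side (editorial sketch, not part of the
patch): once the conversion below runs, a tool reading PERF_SAMPLE_DATA_SRC
back from the ring buffer can decode it with the same uapi union, e.g.:

  #include <stdio.h>
  #include <linux/perf_event.h>

  /* val is the u64 PERF_SAMPLE_DATA_SRC word from a sample record */
  static void print_data_src(__u64 val)
  {
          union perf_mem_data_src d = { .val = val };
          const char *op = d.mem_op == PERF_MEM_OP_LOAD  ? "load"  :
                           d.mem_op == PERF_MEM_OP_STORE ? "store" : "n/a";

          printf("%s: lvl=0x%x %s\n", op, (unsigned int)d.mem_lvl,
                 d.mem_lvl & PERF_MEM_LVL_HIT ? "hit" : "miss");
  }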

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 297 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 291 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c251bc44c088..6626caeed6a1 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -688,6 +688,289 @@ static struct perf_ibs perf_ibs_op = {
 	.get_count		= get_ibs_op_count,
 };
 
+static void perf_ibs_get_mem_op(u64 op_data3, struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_op = PERF_MEM_OP_NA;
+
+	if (op_data3 & IBS_LD_OP_MASK)
+		data_src->mem_op = PERF_MEM_OP_LOAD;
+	else if (op_data3 & IBS_ST_OP_MASK)
+		data_src->mem_op = PERF_MEM_OP_STORE;
+}
+
+/*
+ * Processors having CPUID_Fn8000001B_EAX[11] aka IBS_CAPS_ZEN4 has
+ * more fine granular DataSrc encodings. Others have coarse.
+ */
+static u8 perf_ibs_data_src(u64 op_data2)
+{
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		return ((op_data2 & IBS_DATA_SRC_HI_MASK) >> (IBS_DATA_SRC_HI_SHIFT - 3)) |
+		       ((op_data2 & IBS_DATA_SRC_LO_MASK) >> IBS_DATA_SRC_LO_SHIFT);
+	}
+
+	return (op_data2 & IBS_DATA_SRC_LO_MASK) >> IBS_DATA_SRC_LO_SHIFT;
+}
+
+static void perf_ibs_get_mem_lvl(struct perf_event *event, u64 op_data2,
+				 u64 op_data3, struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src = perf_ibs_data_src(op_data2);
+
+	data_src->mem_lvl = 0;
+
+	/*
+	 * DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
+	 * memory accesses. So, check DcUcMemAcc bit early.
+	 */
+	if (op_data3 & IBS_DC_UC_MEM_ACC_MASK &&
+	    ibs_data_src != IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L1 Hit */
+	if ((op_data3 & IBS_DC_MISS_MASK) == 0) {
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* L2 Hit */
+	if ((op_data3 & IBS_L2_MISS_MASK) == 0) {
+		/* Erratum #1293 */
+		if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
+		    !(op_data3 & IBS_SW_PF_MASK || op_data3 & IBS_DC_MISS_NO_MAB_ALLOC_MASK)) {
+			data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* L3 Hit */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
+					    PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* A peer cache in a near CCX. */
+	if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
+		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* A peer cache in a far CCX. */
+	if (ibs_caps & IBS_CAPS_ZEN4) {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	} else {
+		if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+		    ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
+			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
+			return;
+		}
+	}
+
+	/* DRAM */
+	if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
+		if ((op_data2 & IBS_RMT_NODE_MASK) == 0)
+			data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+		else
+			data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	/* PMEM */
+	if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
+		if (op_data2 & IBS_RMT_NODE_MASK) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* Extension Memory */
+	if (ibs_caps & IBS_CAPS_ZEN4 && data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_EXTN_MEM;
+		if (op_data2 & IBS_RMT_NODE_MASK) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/* IO */
+	if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    ibs_data_src == IBS_DATA_SRC_EXT_IO) {
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
+		if (op_data2 & IBS_RMT_NODE_MASK) {
+			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+			/* IBS doesn't provide Remote socket detail */
+			data_src->mem_hops = PERF_MEM_HOPS_1;
+		}
+		return;
+	}
+
+	/*
+	 * MAB (Miss Address Buffer) Hit. MAB keeps track of outstanding
+	 * DC misses. However such data may come from any level in mem
+	 * hierarchy. IBS provides detail about both MAB as well as actual
+	 * DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
+	 * MAB only when IBS fails to provide DataSrc.
+	 */
+	if (op_data3 & IBS_DC_MISS_NO_MAB_ALLOC_MASK) {
+		data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
+		return;
+	}
+
+	data_src->mem_lvl = PERF_MEM_LVL_NA;
+}
+
+static bool perf_ibs_cache_hit_st_valid(void)
+{
+	/* 0: Uninitialized, 1: Valid, -1: Invalid */
+	static int cache_hist_st_valid;
+
+	if (unlikely(!cache_hist_st_valid)) {
+		if (boot_cpu_data.x86 == 0x19 &&
+		    (boot_cpu_data.x86_model <= 0xF ||
+		    (boot_cpu_data.x86_model >= 0x20 &&
+		     boot_cpu_data.x86_model <= 0x5F))) {
+			cache_hist_st_valid = -1;
+		} else {
+			cache_hist_st_valid = 1;
+		}
+	}
+
+	return cache_hist_st_valid == 1;
+}
+
+static void perf_ibs_get_mem_snoop(u64 op_data2, struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u8 ibs_data_src;
+
+	data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+
+	if (!perf_ibs_cache_hit_st_valid() ||
+	    data_src->mem_op != PERF_MEM_OP_LOAD ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L1 ||
+	    data_src->mem_lvl & PERF_MEM_LVL_L2 ||
+	    op_data2 & IBS_CACHE_HIT_ST_MASK)
+		return;
+
+	ibs_data_src = perf_ibs_data_src(op_data2);
+
+	if ((ibs_data_src == IBS_DATA_SRC_LOC_CACHE) ||
+	    (ibs_caps & IBS_CAPS_ZEN4 && (
+	     ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE ||
+	     ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE ||
+	     ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE))) {
+		data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
+	}
+}
+
+static void perf_ibs_get_tlb_lvl(u64 op_data3, struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u64 l1_tlb_miss = op_data3 & IBS_DC_L1_TLB_MISS_MASK;
+	u64 lin_addr_valid = op_data3 & IBS_DC_LIN_ADDR_VALID_MASK;
+	u64 l2_tlb_miss = op_data3 & IBS_DC_L2_TLB_MISS_MASK;
+
+	data_src->mem_dtlb = PERF_MEM_TLB_NA;
+
+	if (!lin_addr_valid)
+		return;
+
+	if (!l1_tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	if (!l2_tlb_miss) {
+		data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;
+		return;
+	}
+
+	data_src->mem_dtlb = PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS;
+}
+
+static void perf_ibs_get_mem_lock(u64 op_data3, struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->mem_lock = PERF_MEM_LOCK_NA;
+
+	if (op_data3 & IBS_DC_LOCKED_OP_MASK)
+		data_src->mem_lock = PERF_MEM_LOCK_LOCKED;
+}
+
+#define ibs_op_msr_idx(msr)	(msr - MSR_AMD64_IBSOPCTL)
+
+static void perf_ibs_get_data_src(struct perf_event *event,
+				  struct perf_ibs_data *ibs_data,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u64 op_data2 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA2)];
+	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+
+	perf_ibs_get_mem_op(op_data3, data);
+	if (data_src->mem_op != PERF_MEM_OP_LOAD &&
+	    data_src->mem_op != PERF_MEM_OP_STORE)
+		return;
+
+	/* Erratum #1293 */
+	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xF &&
+	    (op_data3 & IBS_SW_PF_MASK ||
+	     op_data3 & IBS_DC_MISS_NO_MAB_ALLOC_MASK)) {
+		/*
+		 * OP_DATA2 has only two fields on Zen3: DataSrc and RmtNode.
+		 * DataSrc=0 is No valid status and RmtNode is invalid when
+		 * DataSrc=0.
+		 */
+		op_data2 = 0;
+	}
+
+	perf_ibs_get_mem_lvl(event, op_data2, op_data3, data);
+	perf_ibs_get_mem_snoop(op_data2, data);
+	perf_ibs_get_tlb_lvl(op_data3, data);
+	perf_ibs_get_mem_lock(op_data3, data);
+}
+
+static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
+				   int check_rip)
+{
+	if (sample_type & PERF_SAMPLE_RAW ||
+	    (perf_ibs == &perf_ibs_op &&
+	     sample_type & PERF_SAMPLE_DATA_SRC))
+		return perf_ibs->offset_max;
+	else if (check_rip)
+		return 3;
+	return 1;
+}
+
 static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 {
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
@@ -735,12 +1018,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	size = 1;
 	offset = 1;
 	check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
-	if (event->attr.sample_type & PERF_SAMPLE_RAW)
-		offset_max = perf_ibs->offset_max;
-	else if (check_rip)
-		offset_max = 3;
-	else
-		offset_max = 1;
+
+	offset_max = perf_ibs_get_offset_max(perf_ibs, event->attr.sample_type, check_rip);
+
 	do {
 		rdmsrl(msr + offset, *buf++);
 		size++;
@@ -793,6 +1073,11 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		data.raw = &raw;
 	}
 
+	if (perf_ibs == &perf_ibs_op) {
+		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC)
+			perf_ibs_get_data_src(event, &ibs_data, &data);
+	}
+
 	/*
 	 * rip recorded by IbsOpRip will not be consistent with rsp and rbp
 	 * recorded as part of interrupt regs. Thus we need to use rip from
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat]
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (2 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 03/13] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC based on IBS_OP_DATA* Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25 12:58   ` Stephane Eranian
  2022-05-25  9:39 ` [PATCH 05/13] perf/x86/amd: Support PERF_SAMPLE_ADDR using IBS_DC_LINADDR Ravi Bangoria
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS OP_DATA3 provides the data cache miss latency, which can be passed
as sample->weight along with perf_mem_data_src. Note that sample->weight
is populated only when PERF_SAMPLE_DATA_SRC is also set, even though the
two sample types are independent.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 6626caeed6a1..5a6e278713f4 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -738,6 +738,12 @@ static void perf_ibs_get_mem_lvl(struct perf_event *event, u64 op_data2,
 		return;
 	}
 
+	/* Load latency (Data cache miss latency) */
+	if (data_src->mem_op == PERF_MEM_OP_LOAD &&
+	    event->attr.sample_type & PERF_SAMPLE_WEIGHT) {
+		data->weight.full = (op_data3 & IBS_DC_MISS_LAT_MASK) >> IBS_DC_MISS_LAT_SHIFT;
+	}
+
 	/* L2 Hit */
 	if ((op_data3 & IBS_L2_MISS_MASK) == 0) {
 		/* Erratum #1293 */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 05/13] perf/x86/amd: Support PERF_SAMPLE_ADDR using IBS_DC_LINADDR
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (3 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat] Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR Ravi Bangoria
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_DC_LINADDR provides the linear data address for the tagged load/
store operation. Populate the perf sample address from it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 5a6e278713f4..b57736357e25 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -965,12 +965,34 @@ static void perf_ibs_get_data_src(struct perf_event *event,
 	perf_ibs_get_mem_lock(op_data3, data);
 }
 
+static void perf_ibs_get_data_addr(struct perf_event *event,
+				   struct perf_ibs_data *ibs_data,
+				   struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+	u64 lin_addr_valid = op_data3 & IBS_DC_LIN_ADDR_VALID_MASK;
+
+	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
+		perf_ibs_get_mem_op(op_data3, data);
+
+	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
+	    data_src->mem_op != PERF_MEM_OP_STORE) ||
+	    !lin_addr_valid) {
+		data->addr = 0x0;
+		return;
+	}
+
+	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
+}
+
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 				   int check_rip)
 {
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
-	     sample_type & PERF_SAMPLE_DATA_SRC))
+	    (sample_type & PERF_SAMPLE_DATA_SRC ||
+	     sample_type & PERF_SAMPLE_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
@@ -1082,6 +1104,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (perf_ibs == &perf_ibs_op) {
 		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC)
 			perf_ibs_get_data_src(event, &ibs_data, &data);
+		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+			perf_ibs_get_data_addr(event, &ibs_data, &data);
 	}
 
 	/*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (4 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 05/13] perf/x86/amd: Support PERF_SAMPLE_ADDR using IBS_DC_LINADDR Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25 11:21   ` Peter Zijlstra
  2022-05-25  9:39 ` [PATCH 07/13] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

IBS_DC_PHYSADDR provides the physical data address for the tagged load/
store operation. Populate the perf sample physical address from it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index b57736357e25..c719020c0e83 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -986,13 +986,35 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
 	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
 }
 
+static void perf_ibs_get_phy_addr(struct perf_event *event,
+				  struct perf_ibs_data *ibs_data,
+				  struct perf_sample_data *data)
+{
+	union perf_mem_data_src *data_src = &data->data_src;
+	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
+	u64 phy_addr_valid = op_data3 & IBS_DC_PHY_ADDR_VALID_MASK;
+
+	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
+		perf_ibs_get_mem_op(op_data3, data);
+
+	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
+	    data_src->mem_op != PERF_MEM_OP_STORE) ||
+	    !phy_addr_valid) {
+		data->phys_addr = 0x0;
+		return;
+	}
+
+	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
+}
+
 static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs, u64 sample_type,
 				   int check_rip)
 {
 	if (sample_type & PERF_SAMPLE_RAW ||
 	    (perf_ibs == &perf_ibs_op &&
 	    (sample_type & PERF_SAMPLE_DATA_SRC ||
-	     sample_type & PERF_SAMPLE_ADDR)))
+	     sample_type & PERF_SAMPLE_ADDR ||
+	     sample_type & PERF_SAMPLE_PHYS_ADDR)))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
@@ -1106,6 +1128,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 			perf_ibs_get_data_src(event, &ibs_data, &data);
 		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
 			perf_ibs_get_data_addr(event, &ibs_data, &data);
+		if (event->attr.sample_type & PERF_SAMPLE_PHYS_ADDR)
+			perf_ibs_get_phy_addr(event, &ibs_data, &data);
 	}
 
 	/*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 07/13] perf tool: Sync include/uapi/linux/perf_event.h header
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (5 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 08/13] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Two new values for mem_lvl_num have been introduced: PERF_MEM_LVLNUM_IO
and PERF_MEM_LVLNUM_EXTN_MEM. The kernel header already contains these
definitions. Sync them into the tools header as well.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/include/uapi/linux/perf_event.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d37629dbad72..1c3157c1be9d 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1292,7 +1292,9 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0xa available */
+/* 5-0x8 available */
+#define PERF_MEM_LVLNUM_EXTN_MEM 0x9 /* Extension memory */
+#define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
 #define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
 #define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 08/13] perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (6 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 07/13] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 09/13] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Although the new details added to this header are currently used only
by the kernel, the tools copy needs to stay in sync with the kernel file.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/arch/x86/include/asm/amd-ibs.h | 76 ++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h
index 765e9e752d03..c6f5f5f316ad 100644
--- a/tools/arch/x86/include/asm/amd-ibs.h
+++ b/tools/arch/x86/include/asm/amd-ibs.h
@@ -6,6 +6,82 @@
 
 #include "msr-index.h"
 
+/* IBS_OP_DATA2 Bits */
+#define IBS_DATA_SRC_HI_SHIFT			6
+#define IBS_DATA_SRC_HI_MASK			(0x3ULL << IBS_DATA_SRC_HI_SHIFT)
+#define IBS_CACHE_HIT_ST_SHIFT			5
+#define IBS_CACHE_HIT_ST_MASK			(0x1ULL << IBS_CACHE_HIT_ST_SHIFT)
+#define IBS_RMT_NODE_SHIFT			4
+#define IBS_RMT_NODE_MASK			(0x1ULL << IBS_RMT_NODE_SHIFT)
+#define IBS_DATA_SRC_LO_SHIFT			0
+#define IBS_DATA_SRC_LO_MASK			(0x7ULL << IBS_DATA_SRC_LO_SHIFT)
+
+/* IBS_OP_DATA2 DataSrc */
+#define IBS_DATA_SRC_LOC_CACHE			 2
+#define IBS_DATA_SRC_DRAM			 3
+#define IBS_DATA_SRC_REM_CACHE			 4
+#define IBS_DATA_SRC_IO				 7
+
+/* IBS_OP_DATA2 with DataSrc Extension */
+#define IBS_DATA_SRC_EXT_LOC_CACHE		 1
+#define IBS_DATA_SRC_EXT_NEAR_CCX_CACHE		 2
+#define IBS_DATA_SRC_EXT_DRAM			 3
+#define IBS_DATA_SRC_EXT_FAR_CCX_CACHE		 5
+#define IBS_DATA_SRC_EXT_PMEM			 6
+#define IBS_DATA_SRC_EXT_IO			 7
+#define IBS_DATA_SRC_EXT_EXT_MEM		 8
+#define IBS_DATA_SRC_EXT_PEER_AGENT_MEM		12
+
+/* IBS_OP_DATA3 Bits */
+#define IBS_TLB_REFILL_LAT_SHIFT		48
+#define IBS_TLB_REFILL_LAT_MASK			(0xFFFFULL << IBS_TLB_REFILL_LAT_SHIFT)
+#define IBS_DC_MISS_LAT_SHIFT			32
+#define IBS_DC_MISS_LAT_MASK			(0xFFFFULL << IBS_DC_MISS_LAT_SHIFT)
+#define IBS_OP_DC_MISS_OPEN_MEM_REQS_SHIFT	26
+#define IBS_OP_DC_MISS_OPEN_MEM_REQS_MASK	(0x3FULL << IBS_OP_DC_MISS_OPEN_MEM_REQS_SHIFT)
+#define IBS_OP_MEM_WIDTH_SHIFT			22
+#define IBS_OP_MEM_WIDTH_MASK			(0xFULL << IBS_OP_MEM_WIDTH_SHIFT)
+#define IBS_SW_PF_SHIFT				21
+#define IBS_SW_PF_MASK				(0x1ULL << IBS_SW_PF_SHIFT)
+#define IBS_L2_MISS_SHIFT			20
+#define IBS_L2_MISS_MASK			(0x1ULL << IBS_L2_MISS_SHIFT)
+#define IBS_DC_L2_TLB_HIT_1G_SHIFT		19
+#define IBS_DC_L2_TLB_HIT_1G_MASK		(0x1ULL << IBS_DC_L2_TLB_HIT_1G_SHIFT)
+#define IBS_DC_PHY_ADDR_VALID_SHIFT		18
+#define IBS_DC_PHY_ADDR_VALID_MASK		(0x1ULL << IBS_DC_PHY_ADDR_VALID_SHIFT)
+#define IBS_DC_LIN_ADDR_VALID_SHIFT		17
+#define IBS_DC_LIN_ADDR_VALID_MASK		(0x1ULL << IBS_DC_LIN_ADDR_VALID_SHIFT)
+#define IBS_DC_MISS_NO_MAB_ALLOC_SHIFT		16
+#define IBS_DC_MISS_NO_MAB_ALLOC_MASK		(0x1ULL << IBS_DC_MISS_NO_MAB_ALLOC_SHIFT)
+#define IBS_DC_LOCKED_OP_SHIFT			15
+#define IBS_DC_LOCKED_OP_MASK			(0x1ULL << IBS_DC_LOCKED_OP_SHIFT)
+#define IBS_DC_UC_MEM_ACC_SHIFT			14
+#define IBS_DC_UC_MEM_ACC_MASK			(0x1ULL << IBS_DC_UC_MEM_ACC_SHIFT)
+#define IBS_DC_WC_MEM_ACC_SHIFT			13
+#define IBS_DC_WC_MEM_ACC_MASK			(0x1ULL << IBS_DC_WC_MEM_ACC_SHIFT)
+#define IBS_DC_MIS_ACC_SHIFT			8
+#define IBS_DC_MIS_ACC_MASK			(0x1ULL << IBS_DC_MIS_ACC_SHIFT)
+#define IBS_DC_MISS_SHIFT			7
+#define IBS_DC_MISS_MASK			(0x1ULL << IBS_DC_MISS_SHIFT)
+#define IBS_DC_L2_TLB_HIT_2M_SHIFT		6
+#define IBS_DC_L2_TLB_HIT_2M_MASK		(0x1ULL << IBS_DC_L2_TLB_HIT_2M_SHIFT)
+/*
+ * Definition of 5-4 bits is different between Zen3 and Zen4 (Zen2 definition
+ * is same as Zen4) but the end result is same. So using Zen4 definition here.
+ */
+#define IBS_DC_L1_TLB_HIT_1G_SHIFT		5
+#define IBS_DC_L1_TLB_HIT_1G_MASK		(0x1ULL << IBS_DC_L1_TLB_HIT_1G_SHIFT)
+#define IBS_DC_L1_TLB_HIT_2M_SHIFT		4
+#define IBS_DC_L1_TLB_HIT_2M_MASK		(0x1ULL << IBS_DC_L1_TLB_HIT_2M_SHIFT)
+#define IBS_DC_L2_TLB_MISS_SHIFT		3
+#define IBS_DC_L2_TLB_MISS_MASK			(0x1ULL << IBS_DC_L2_TLB_MISS_SHIFT)
+#define IBS_DC_L1_TLB_MISS_SHIFT		2
+#define IBS_DC_L1_TLB_MISS_MASK			(0x1ULL << IBS_DC_L1_TLB_MISS_SHIFT)
+#define IBS_ST_OP_SHIFT				1
+#define IBS_ST_OP_MASK				(0x1ULL << IBS_ST_OP_SHIFT)
+#define IBS_LD_OP_SHIFT				0
+#define IBS_LD_OP_MASK				(0x1ULL << IBS_LD_OP_SHIFT)
+
 /*
  * IBS Hardware MSRs
  */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 09/13] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (7 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 08/13] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 10/13] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Add support for printing these new fields in perf mem report.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ed0ab838bcc5..027cd6d62f21 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -294,6 +294,8 @@ static const char * const mem_lvl[] = {
 };
 
 static const char * const mem_lvlnum[] = {
+	[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
+	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
 	[PERF_MEM_LVLNUM_LFB] = "LFB",
 	[PERF_MEM_LVLNUM_RAM] = "RAM",
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 10/13] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (8 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 09/13] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 11/13] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Currently, perf sets the PERF_SAMPLE_WEIGHT flag only for mem load
events. Set it for the combined load-store event as well, which enables
recording of load latency by default on architectures that do not
support an independent mem load event.

Also document the missing -W option in the perf-record man page.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-record.txt | 1 +
 tools/perf/builtin-c2c.c                 | 1 +
 tools/perf/builtin-mem.c                 | 1 +
 3 files changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 465be4e62a17..c85faaa1635f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -397,6 +397,7 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
+-W::
 --weight::
 Enable weightened sampling. An additional weight is recorded per sample and can be
 displayed with the weight and local_weight sort keys.  This currently works for TSX
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index fbbed434014f..d39b0c12e1f6 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2966,6 +2966,7 @@ static int perf_c2c__record(int argc, const char **argv)
 		 */
 		if (e->tag) {
 			e->record = true;
+			rec_argv[i++] = "-W";
 		} else {
 			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
 			e->record = true;
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9e435fd23503..f7dd8216de72 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -122,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	    (mem->operation & MEM_OPERATION_LOAD) &&
 	    (mem->operation & MEM_OPERATION_STORE)) {
 		e->record = true;
+		rec_argv[i++] = "-W";
 	} else {
 		if (mem->operation & MEM_OPERATION_LOAD) {
 			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 11/13] perf mem/c2c: Add load store event mappings for AMD
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (9 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 10/13] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 12/13] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 13/13] perf mem: Use more generic term for LFB Ravi Bangoria
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

The perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store samples provide most of the
information needed by these tools. Wire in the ibs_op// event as the
mem-ldst event for AMD.

There are some limitations though: IBS does not have any filtering
capability, which means many non-load/store samples that are immaterial
for these tools also get captured. Such samples are shown as N/A in the
perf mem/c2c report. Also, IBS, being an uncore PMU from the kernel's
point of view[1], does not support per-process monitoring. Thus, perf
mem/c2c on AMD is currently supported in per-CPU mode only.

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

[1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
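
For illustration (editorial, not part of the patch), inside the tool the
new table makes the load-store slot resolve to ibs_op// on AMD, roughly:

          struct perf_mem_event *e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);

          /* On an AMD system: e->tag is "mem-ldst", the name resolves to "ibs_op//" */
          if (e && e->tag)
                  printf("%s -> %s\n", e->tag,
                         perf_mem_events__name(PERF_MEM_EVENTS__LOAD_STORE, NULL));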

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-c2c.txt | 14 ++++++++----
 tools/perf/Documentation/perf-mem.txt |  3 ++-
 tools/perf/arch/x86/util/mem-events.c | 31 +++++++++++++++++++++++++--
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 3b6a2c84ea02..b07f258ec6a5 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).
 
 These events provide:
   - memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS
 
 -l::
 --ldlat::
-	Configure mem-loads latency. (x86 only)
+	Configure mem-loads latency. Supported on Intel and Arm64 processors
+	only. Ignored on other archs.
 
 -k::
 --all-kernel::
@@ -133,11 +135,15 @@ Following perf record options are configured by default:
   -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:
 
   cpu/mem-loads,ldlat=30/P
   cpu/mem-stores/P
 
+following on AMD:
+
+  ibs_op//
+
 and following on PowerPC:
 
   cpu/mem-loads/
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 66177511c5c4..005c95580b1e 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -85,7 +85,8 @@ RECORD OPTIONS
 	Be more verbose (show counter open errors, etc)
 
 --ldlat <n>::
-	Specify desired latency for loads event. (x86 only)
+	Specify desired latency for loads event. Supported on Intel and Arm64
+	processors only. Ignored on other archs.
 
 In addition, for report all perf report options are valid, and for record
 all perf record options.
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 5214370ca4e4..f683ac702247 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"
 
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"%s/mem-loads,ldlat=%u/P",	"%s/events/mem-loads"),
 	E("ldlat-stores",	"%s/mem-stores/P",		"%s/events/mem-stores"),
 	E(NULL,			NULL,				NULL),
 };
 
+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL,		NULL,		NULL),
+	E(NULL,		NULL,		NULL),
+	E("mem-ldst",	"ibs_op//",	"ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;
 
-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }
 
 bool is_mem_loads_aux_event(struct evsel *leader)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 12/13] perf mem/c2c: Avoid printing empty lines for unsupported events
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (10 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 11/13] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  2022-05-25  9:39 ` [PATCH 13/13] perf mem: Use more generic term for LFB Ravi Bangoria
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

Perf mem and c2c can be used with three different events: load, store
and combined load-store. Some architectures might support only a subset
of these events, in which case perf prints an empty line for each
unsupported event. Avoid that.

For example, AMD Zen CPUs support only the combined load-store event
and do not support the individual load and store events.

Before patch:
  $ ./perf mem record -e list
  
  
  mem-ldst     : available

After patch:
  $ ./perf mem record -e list
  mem-ldst     : available

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 027cd6d62f21..415d754fea8d 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -156,11 +156,12 @@ void perf_mem_events__list(void)
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		struct perf_mem_event *e = perf_mem_events__ptr(j);
 
-		fprintf(stderr, "%-13s%-*s%s\n",
-			e->tag ?: "",
-			verbose > 0 ? 25 : 0,
-			verbose > 0 ? perf_mem_events__name(j, NULL) : "",
-			e->supported ? ": available" : "");
+		fprintf(stderr, "%-*s%-*s%s",
+			e->tag ? 13 : 0,
+			e->tag ? : "",
+			e->tag && verbose > 0 ? 25 : 0,
+			e->tag && verbose > 0 ? perf_mem_events__name(j, NULL) : "",
+			e->supported ? ": available\n" : "");
 	}
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 13/13] perf mem: Use more generic term for LFB
  2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
                   ` (11 preceding siblings ...)
  2022-05-25  9:39 ` [PATCH 12/13] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
@ 2022-05-25  9:39 ` Ravi Bangoria
  12 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-25  9:39 UTC (permalink / raw)
  To: peterz, acme
  Cc: ravi.bangoria, jolsa, namhyung, eranian, irogers, jmario,
	leo.yan, alisaidi, ak, kan.liang, dave.hansen, hpa, mingo,
	mark.rutland, alexander.shishkin, tglx, bp, x86,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	kim.phillips, santosh.shukla

The hw component that tracks outstanding L1 data cache misses is called
the LFB (Line Fill Buffer) on Intel and Arm. Similar components exist on
other architectures under different names; for example, it is called the
MAB (Miss Address Buffer) on AMD. Replace LFB with the more generic name
"Cache Fill Buffer".

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 415d754fea8d..e3b8e174ceb4 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -282,7 +282,7 @@ static const char * const mem_lvl[] = {
 	"HIT",
 	"MISS",
 	"L1",
-	"LFB",
+	"Cache Fill Buffer",
 	"L2",
 	"L3",
 	"Local RAM",
@@ -298,7 +298,7 @@ static const char * const mem_lvlnum[] = {
 	[PERF_MEM_LVLNUM_EXTN_MEM] = "Ext Mem",
 	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
-	[PERF_MEM_LVLNUM_LFB] = "LFB",
+	[PERF_MEM_LVLNUM_LFB] = "Cache Fill Buffer",
 	[PERF_MEM_LVLNUM_RAM] = "RAM",
 	[PERF_MEM_LVLNUM_PMEM] = "PMEM",
 	[PERF_MEM_LVLNUM_NA] = "N/A",
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  2022-05-25  9:39 ` [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR Ravi Bangoria
@ 2022-05-25 11:21   ` Peter Zijlstra
  2022-05-26  8:46     ` Ravi Bangoria
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2022-05-25 11:21 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: acme, jolsa, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

On Wed, May 25, 2022 at 03:09:31PM +0530, Ravi Bangoria wrote:
> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
> store operation. Populate perf sample physical address using it.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index b57736357e25..c719020c0e83 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -986,13 +986,35 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
>  	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
>  }
>  
> +static void perf_ibs_get_phy_addr(struct perf_event *event,
> +				  struct perf_ibs_data *ibs_data,
> +				  struct perf_sample_data *data)
> +{
> +	union perf_mem_data_src *data_src = &data->data_src;
> +	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
> +	u64 phy_addr_valid = op_data3 & IBS_DC_PHY_ADDR_VALID_MASK;
> +
> +	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
> +		perf_ibs_get_mem_op(op_data3, data);
> +
> +	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
> +	    data_src->mem_op != PERF_MEM_OP_STORE) ||
> +	    !phy_addr_valid) {
> +		data->phys_addr = 0x0;
> +		return;
> +	}
> +
> +	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
> +}

perf_prepare_sample() will unconditionally overwrite data->phys_addr.
There is currently no facility to let the driver set this field.
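
(The overwrite being referred to is the PERF_SAMPLE_PHYS_ADDR handling in
perf_prepare_sample(), which at this point in the tree looks roughly like
the fragment below, so whatever the driver stored is recomputed from
data->addr:)

          if (sample_type & PERF_SAMPLE_PHYS_ADDR)
                  data->phys_addr = perf_virt_to_phys(data->addr);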

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat]
  2022-05-25  9:39 ` [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat] Ravi Bangoria
@ 2022-05-25 12:58   ` Stephane Eranian
  2022-05-26 12:14     ` Ravi Bangoria
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2022-05-25 12:58 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, acme, jolsa, namhyung, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

On Wed, May 25, 2022 at 12:42 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> IBS Op data3 provides data cache miss latency which can be passed as
> sample->weight along with perf_mem_data_src. Note that sample->weight
> will be populated only when PERF_SAMPLE_DATA_SRC is also set, although
> both sample types are independent.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 6626caeed6a1..5a6e278713f4 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -738,6 +738,12 @@ static void perf_ibs_get_mem_lvl(struct perf_event *event, u64 op_data2,
>                 return;
>         }
>
> +       /* Load latency (Data cache miss latency) */
> +       if (data_src->mem_op == PERF_MEM_OP_LOAD &&
> +           event->attr.sample_type & PERF_SAMPLE_WEIGHT) {
> +               data->weight.full = (op_data3 & IBS_DC_MISS_LAT_MASK) >> IBS_DC_MISS_LAT_SHIFT;
> +       }
> +
I think here you also need to handle the WEIGHT_STRUCT case and put
the cache miss latency in the right field. This IBS field covers the
cache line movement and not the whole instruction latency, which is the
tag-to-retire field. In the case of WEIGHT_STRUCT you need to fill out
both fields (see the sketch below, after the quoted hunk).

>         /* L2 Hit */
>         if ((op_data3 & IBS_L2_MISS_MASK) == 0) {
>                 /* Erratum #1293 */
> --
> 2.31.1
>
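
A minimal sketch of what handling both weight sample types could look like,
assuming the same field convention as other architectures that use
PERF_SAMPLE_WEIGHT_STRUCT (var1_dw for the memory/cache latency, var2_w for
the instruction latency); tag_to_ret_lat and its source are placeholders,
not definitions from this series:

	if (data_src->mem_op == PERF_MEM_OP_LOAD) {
		u64 dc_miss_lat = (op_data3 & IBS_DC_MISS_LAT_MASK) >>
				  IBS_DC_MISS_LAT_SHIFT;
		/* Placeholder: tag-to-retire count would come from IBS_OP_DATA */
		u32 tag_to_ret_lat = 0;

		if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
			/* Data cache miss latency ("weight" in perf report) */
			data->weight.var1_dw = (u32)dc_miss_lat;
			/* Whole instruction (tag-to-retire) latency */
			data->weight.var2_w = (u16)tag_to_ret_lat;
		} else if (event->attr.sample_type & PERF_SAMPLE_WEIGHT) {
			data->weight.full = dc_miss_lat;
		}
	}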

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  2022-05-25 11:21   ` Peter Zijlstra
@ 2022-05-26  8:46     ` Ravi Bangoria
  2022-05-26  9:56       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-26  8:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, jolsa, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, Ravi Bangoria

On 25-May-22 4:51 PM, Peter Zijlstra wrote:
> On Wed, May 25, 2022 at 03:09:31PM +0530, Ravi Bangoria wrote:
>> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
>> store operation. Populate perf sample physical address using it.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
>> index b57736357e25..c719020c0e83 100644
>> --- a/arch/x86/events/amd/ibs.c
>> +++ b/arch/x86/events/amd/ibs.c
>> @@ -986,13 +986,35 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
>>  	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
>>  }
>>  
>> +static void perf_ibs_get_phy_addr(struct perf_event *event,
>> +				  struct perf_ibs_data *ibs_data,
>> +				  struct perf_sample_data *data)
>> +{
>> +	union perf_mem_data_src *data_src = &data->data_src;
>> +	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
>> +	u64 phy_addr_valid = op_data3 & IBS_DC_PHY_ADDR_VALID_MASK;
>> +
>> +	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
>> +		perf_ibs_get_mem_op(op_data3, data);
>> +
>> +	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
>> +	    data_src->mem_op != PERF_MEM_OP_STORE) ||
>> +	    !phy_addr_valid) {
>> +		data->phys_addr = 0x0;
>> +		return;
>> +	}
>> +
>> +	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
>> +}
> 
> perf_prepare_sample() will unconditionally overwrite data->phys_addr.
> There is currently no facility to let the driver set this field.

Thanks for pointing it out, Peter. Would you mind if I add:

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index c719020c0e83..fbd1f4e94d47 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -986,6 +986,19 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
        data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
 }

+/* data_src->mem_op should have been set by perf_ibs_get_phy_addr() */
+bool perf_arch_phys_addr_set(struct perf_event *event,
+                            struct perf_sample_data *data)
+{
+       union perf_mem_data_src *data_src = &data->data_src;
+
+       if (event->pmu != &perf_ibs_op.pmu)
+               return false;
+
+       return (data_src->mem_op == PERF_MEM_OP_LOAD ||
+               data_src->mem_op == PERF_MEM_OP_STORE);
+}
+
 static void perf_ibs_get_phy_addr(struct perf_event *event,
                                  struct perf_ibs_data *ibs_data,
                                  struct perf_sample_data *data)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index da759560eec5..67402af3b70f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1477,6 +1477,8 @@ extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
+bool perf_arch_phys_addr_set(struct perf_event *event,
+                            struct perf_sample_data *data);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7699be46f3a1..9baeb2d21bc0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7283,6 +7283,12 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
        return callchain ?: &__empty_callchain;
 }

+bool __weak perf_arch_phys_addr_set(struct perf_event *event,
+                                   struct perf_sample_data *data)
+{
+       return false;
+}
+
 void perf_prepare_sample(struct perf_event_header *header,
                         struct perf_sample_data *data,
                         struct perf_event *event,
@@ -7404,8 +7410,10 @@ void perf_prepare_sample(struct perf_event_header *header,
                header->size += size;
        }

-       if (sample_type & PERF_SAMPLE_PHYS_ADDR)
+       if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
+           !perf_arch_phys_addr_set(event, data)) {
                data->phys_addr = perf_virt_to_phys(data->addr);
+       }

 #ifdef CONFIG_CGROUP_PERF
        if (sample_type & PERF_SAMPLE_CGROUP) {

Thanks,
Ravi

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  2022-05-26  8:46     ` Ravi Bangoria
@ 2022-05-26  9:56       ` Peter Zijlstra
  2022-05-26 10:59         ` Ravi Bangoria
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2022-05-26  9:56 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: acme, jolsa, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

On Thu, May 26, 2022 at 02:16:28PM +0530, Ravi Bangoria wrote:
> On 25-May-22 4:51 PM, Peter Zijlstra wrote:
> > On Wed, May 25, 2022 at 03:09:31PM +0530, Ravi Bangoria wrote:
> >> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
> >> store operation. Populate perf sample physical address using it.
> >>
> >> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> >> ---
> >>  arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
> >>  1 file changed, 25 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> >> index b57736357e25..c719020c0e83 100644
> >> --- a/arch/x86/events/amd/ibs.c
> >> +++ b/arch/x86/events/amd/ibs.c
> >> @@ -986,13 +986,35 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
> >>  	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
> >>  }
> >>  
> >> +static void perf_ibs_get_phy_addr(struct perf_event *event,
> >> +				  struct perf_ibs_data *ibs_data,
> >> +				  struct perf_sample_data *data)
> >> +{
> >> +	union perf_mem_data_src *data_src = &data->data_src;
> >> +	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
> >> +	u64 phy_addr_valid = op_data3 & IBS_DC_PHY_ADDR_VALID_MASK;
> >> +
> >> +	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
> >> +		perf_ibs_get_mem_op(op_data3, data);
> >> +
> >> +	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
> >> +	    data_src->mem_op != PERF_MEM_OP_STORE) ||
> >> +	    !phy_addr_valid) {
> >> +		data->phys_addr = 0x0;
> >> +		return;
> >> +	}
> >> +
> >> +	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
> >> +}
> > 
> > perf_prepare_sample() will unconditionally overwrite data->phys_addr.
> > There is currently no facility to let the driver set this field.
> 
> Thanks for pointing it out, Peter. Would you mind if I add:

I think it's best if you extend/mimic the __PERF_SAMPLE_CALLCHAIN_EARLY
hack. It's more or less the same problem and then at least the solution
is consistent.
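
For reference, the __PERF_SAMPLE_CALLCHAIN_EARLY hack works by having the
driver set an internal-only bit in event->attr.sample_type when it has
already filled in the data, so perf_prepare_sample() skips its own
computation. A rough sketch of the same pattern applied to the physical
address (the __PERF_SAMPLE_PHYS_ADDR_EARLY name and bit position are
placeholders, not something defined by this series):

	/* enum perf_event_sample_format: internal-only bit (placeholder) */
	#define __PERF_SAMPLE_PHYS_ADDR_EARLY	(1ULL << 62)

	/* arch/x86/events/amd/ibs.c: event init, when IBS can supply the address */
	event->attr.sample_type |= __PERF_SAMPLE_PHYS_ADDR_EARLY;

	/* kernel/events/core.c: perf_prepare_sample() */
	if (sample_type & PERF_SAMPLE_PHYS_ADDR) {
		if (!(sample_type & __PERF_SAMPLE_PHYS_ADDR_EARLY))
			data->phys_addr = perf_virt_to_phys(data->addr);
	}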


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  2022-05-26  9:56       ` Peter Zijlstra
@ 2022-05-26 10:59         ` Ravi Bangoria
  2022-05-26 11:09           ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-26 10:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, jolsa, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, Ravi Bangoria

On 26-May-22 3:26 PM, Peter Zijlstra wrote:
> On Thu, May 26, 2022 at 02:16:28PM +0530, Ravi Bangoria wrote:
>> On 25-May-22 4:51 PM, Peter Zijlstra wrote:
>>> On Wed, May 25, 2022 at 03:09:31PM +0530, Ravi Bangoria wrote:
>>>> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
>>>> store operation. Populate perf sample physical address using it.
>>>>
>>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>>> ---
>>>>  arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
>>>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
>>>> index b57736357e25..c719020c0e83 100644
>>>> --- a/arch/x86/events/amd/ibs.c
>>>> +++ b/arch/x86/events/amd/ibs.c
>>>> @@ -986,13 +986,35 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
>>>>  	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
>>>>  }
>>>>  
>>>> +static void perf_ibs_get_phy_addr(struct perf_event *event,
>>>> +				  struct perf_ibs_data *ibs_data,
>>>> +				  struct perf_sample_data *data)
>>>> +{
>>>> +	union perf_mem_data_src *data_src = &data->data_src;
>>>> +	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
>>>> +	u64 phy_addr_valid = op_data3 & IBS_DC_PHY_ADDR_VALID_MASK;
>>>> +
>>>> +	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
>>>> +		perf_ibs_get_mem_op(op_data3, data);
>>>> +
>>>> +	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
>>>> +	    data_src->mem_op != PERF_MEM_OP_STORE) ||
>>>> +	    !phy_addr_valid) {
>>>> +		data->phys_addr = 0x0;
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
>>>> +}
>>>
>>> perf_prepare_sample() will unconditionally overwrite data->phys_addr.
>>> There is currently no facility to let the driver set this field.
>>
>> Thanks for pointing it out, Peter. Would you mind if I add:
> 
> I think it's best if you extend/mimic the __PERF_SAMPLE_CALLCHAIN_EARLY
> hack. It's more or less the same problem and then at least the solution
> is consistent.

I have one more optimization of the same kind on my list: IBS_OP_DATA3[IbsDcPgSz]
can provide PERF_SAMPLE_DATA_PAGE_SIZE. I hope consuming two more bits
for internal purposes is okay.
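
A rough sketch of that follow-up in the IBS handler, with the IbsDcPgSz bit
extraction and the encoding-to-size mapping both treated as assumptions
rather than definitions from this series:

	if (event->attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE) {
		/* IBS_DC_PG_SIZE_* are placeholder names for the IbsDcPgSz bits */
		switch ((op_data3 & IBS_DC_PG_SIZE_MASK) >> IBS_DC_PG_SIZE_SHIFT) {
		case 0:  data->data_page_size = SZ_4K; break;	/* assumed encoding */
		case 1:  data->data_page_size = SZ_2M; break;
		case 2:  data->data_page_size = SZ_1G; break;
		default: data->data_page_size = 0;     break;
		}
	}

As with phys_addr, this only sticks if perf_prepare_sample() is taught not to
recompute data_page_size when the driver has already set it, hence the second
internal sample_type bit mentioned above.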

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR
  2022-05-26 10:59         ` Ravi Bangoria
@ 2022-05-26 11:09           ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2022-05-26 11:09 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: acme, jolsa, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla

On Thu, May 26, 2022 at 04:29:22PM +0530, Ravi Bangoria wrote:
> On 26-May-22 3:26 PM, Peter Zijlstra wrote:
> > On Thu, May 26, 2022 at 02:16:28PM +0530, Ravi Bangoria wrote:
> >> On 25-May-22 4:51 PM, Peter Zijlstra wrote:
> >>> On Wed, May 25, 2022 at 03:09:31PM +0530, Ravi Bangoria wrote:
> >>>> IBS_DC_PHYSADDR provides the physical data address for the tagged load/
> >>>> store operation. Populate perf sample physical address using it.
> >>>>
> >>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> >>>> ---
> >>>>  arch/x86/events/amd/ibs.c | 26 +++++++++++++++++++++++++-
> >>>>  1 file changed, 25 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> >>>> index b57736357e25..c719020c0e83 100644
> >>>> --- a/arch/x86/events/amd/ibs.c
> >>>> +++ b/arch/x86/events/amd/ibs.c
> >>>> @@ -986,13 +986,35 @@ static void perf_ibs_get_data_addr(struct perf_event *event,
> >>>>  	data->addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCLINAD)];
> >>>>  }
> >>>>  
> >>>> +static void perf_ibs_get_phy_addr(struct perf_event *event,
> >>>> +				  struct perf_ibs_data *ibs_data,
> >>>> +				  struct perf_sample_data *data)
> >>>> +{
> >>>> +	union perf_mem_data_src *data_src = &data->data_src;
> >>>> +	u64 op_data3 = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)];
> >>>> +	u64 phy_addr_valid = op_data3 & IBS_DC_PHY_ADDR_VALID_MASK;
> >>>> +
> >>>> +	if (!(event->attr.sample_type & PERF_SAMPLE_DATA_SRC))
> >>>> +		perf_ibs_get_mem_op(op_data3, data);
> >>>> +
> >>>> +	if ((data_src->mem_op != PERF_MEM_OP_LOAD &&
> >>>> +	    data_src->mem_op != PERF_MEM_OP_STORE) ||
> >>>> +	    !phy_addr_valid) {
> >>>> +		data->phys_addr = 0x0;
> >>>> +		return;
> >>>> +	}
> >>>> +
> >>>> +	data->phys_addr = ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)];
> >>>> +}
> >>>
> >>> perf_prepare_sample() will unconditionally overwrite data->phys_addr.
> >>> There is currently no facility to let the driver set this field.
> >>
> >> Thanks for pointing it out, Peter. Would you mind if I add:
> > 
> > I think it's best if you extend/mimic the __PERF_SAMPLE_CALLCHAIN_EARLY
> > hack. It's more or less the same problem and then at least the solution
> > is consistent.
> 
> I've one more identical optimization in my list. IBS_OP_DATA3[IbsDcPgSz]
> can provide PERF_SAMPLE_DATA_PAGE_SIZE. I hope consuming two more bits
> for internal purpose is okay.

Yeah, I suppose so... we'll need to hunt for bits once we run out, but
that's how it is...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat]
  2022-05-25 12:58   ` Stephane Eranian
@ 2022-05-26 12:14     ` Ravi Bangoria
  0 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-05-26 12:14 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: peterz, acme, jolsa, namhyung, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, kim.phillips,
	santosh.shukla, Ravi Bangoria

On 25-May-22 6:28 PM, Stephane Eranian wrote:
> On Wed, May 25, 2022 at 12:42 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> IBS Op data3 provides the data cache miss latency, which can be passed
>> as sample->weight along with perf_mem_data_src. Note that sample->weight
>> will be populated only when PERF_SAMPLE_DATA_SRC is also set, although
>> both sample types are independent.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>  arch/x86/events/amd/ibs.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
>> index 6626caeed6a1..5a6e278713f4 100644
>> --- a/arch/x86/events/amd/ibs.c
>> +++ b/arch/x86/events/amd/ibs.c
>> @@ -738,6 +738,12 @@ static void perf_ibs_get_mem_lvl(struct perf_event *event, u64 op_data2,
>>                 return;
>>         }
>>
>> +       /* Load latency (Data cache miss latency) */
>> +       if (data_src->mem_op == PERF_MEM_OP_LOAD &&
>> +           event->attr.sample_type & PERF_SAMPLE_WEIGHT) {
>> +               data->weight.full = (op_data3 & IBS_DC_MISS_LAT_MASK) >> IBS_DC_MISS_LAT_SHIFT;
>> +       }
>> +
> I think here you also need to handle the WEIGHT_STRUCT case and put
> the cache miss latency in the right
> field. This IBS field covers the cache line movement and not the whole
> instruction latency which is the tag to ret field.
> In the case of WEIGHT_STRUCT you need to fill out the two fields.

Yeah, will do. Thanks for pointing it out.

-Ravi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions
  2022-05-25  9:39 ` [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions Ravi Bangoria
@ 2022-05-26 15:08   ` Kim Phillips
  2022-06-01  4:25     ` Ravi Bangoria
  0 siblings, 1 reply; 23+ messages in thread
From: Kim Phillips @ 2022-05-26 15:08 UTC (permalink / raw)
  To: Ravi Bangoria, peterz, acme
  Cc: jolsa, namhyung, eranian, irogers, jmario, leo.yan, alisaidi, ak,
	kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, santosh.shukla

On 5/25/22 4:39 AM, Ravi Bangoria wrote:

Hi Ravi,

> AMD IBS OP_DATA2 and OP_DATA3 provide details about tagged load/store
> ops. Add definitions for these registers into the header file. In
> addition to those, IBS_OP_DATA2 DataSrc provides details about the
> location the data is being accessed from by load ops. Define macros
> for legacy and extended DataSrc values.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>   arch/x86/include/asm/amd-ibs.h | 76 ++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
> 
> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
> index aabdbb5ab920..22184fe20cf0 100644
> --- a/arch/x86/include/asm/amd-ibs.h
> +++ b/arch/x86/include/asm/amd-ibs.h
> @@ -6,6 +6,82 @@
>   
>   #include <asm/msr-index.h>
>   
> +/* IBS_OP_DATA2 Bits */
> +#define IBS_DATA_SRC_HI_SHIFT			6
> +#define IBS_DATA_SRC_HI_MASK			(0x3ULL << IBS_DATA_SRC_HI_SHIFT)

Is there a reason we're not using the existing bitfield
definitions?  E.g., data_src_hi for the case above.

Thanks,

Kim

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions
  2022-05-26 15:08   ` Kim Phillips
@ 2022-06-01  4:25     ` Ravi Bangoria
  0 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2022-06-01  4:25 UTC (permalink / raw)
  To: Kim Phillips
  Cc: peterz, acme, jolsa, namhyung, eranian, irogers, jmario, leo.yan,
	alisaidi, ak, kan.liang, dave.hansen, hpa, mingo, mark.rutland,
	alexander.shishkin, tglx, bp, x86, linux-perf-users,
	linux-kernel, sandipan.das, ananth.narayan, santosh.shukla,
	Ravi Bangoria

Hi Kim,

On 26-May-22 8:38 PM, Kim Phillips wrote:
> On 5/25/22 4:39 AM, Ravi Bangoria wrote:
> 
> Hi Ravi,
> 
>> AMD IBS OP_DATA2 and OP_DATA3 provide details about tagged load/store
>> ops. Add definitions for these registers into the header file. In
>> addition to those, IBS_OP_DATA2 DataSrc provides details about the
>> location the data is being accessed from by load ops. Define macros
>> for legacy and extended DataSrc values.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>> ---
>>   arch/x86/include/asm/amd-ibs.h | 76 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 76 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
>> index aabdbb5ab920..22184fe20cf0 100644
>> --- a/arch/x86/include/asm/amd-ibs.h
>> +++ b/arch/x86/include/asm/amd-ibs.h
>> @@ -6,6 +6,82 @@
>>     #include <asm/msr-index.h>
>>   +/* IBS_OP_DATA2 Bits */
>> +#define IBS_DATA_SRC_HI_SHIFT            6
>> +#define IBS_DATA_SRC_HI_MASK            (0x3ULL << IBS_DATA_SRC_HI_SHIFT)
> 
> Is there a reason we're not using the existing bitfield
> definitions?  E.g., data_src_hi for the case above.

Yes, we might be able to use those. Thanks for pointing that out.

- Ravi

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-06-01  4:25 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-25  9:39 [PATCH 00/13] perf mem/c2c: Add support for AMD Ravi Bangoria
2022-05-25  9:39 ` [PATCH 01/13] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-05-25  9:39 ` [PATCH 02/13] perf/x86/amd: Add IBS OP_DATA2/3 register bit definitions Ravi Bangoria
2022-05-26 15:08   ` Kim Phillips
2022-06-01  4:25     ` Ravi Bangoria
2022-05-25  9:39 ` [PATCH 03/13] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC based on IBS_OP_DATA* Ravi Bangoria
2022-05-25  9:39 ` [PATCH 04/13] perf/x86/amd: Support PERF_SAMPLE_WEIGHT using IBS OP_DATA3[IbsDcMissLat] Ravi Bangoria
2022-05-25 12:58   ` Stephane Eranian
2022-05-26 12:14     ` Ravi Bangoria
2022-05-25  9:39 ` [PATCH 05/13] perf/x86/amd: Support PERF_SAMPLE_ADDR using IBS_DC_LINADDR Ravi Bangoria
2022-05-25  9:39 ` [PATCH 06/13] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR using IBS_DC_PHYSADDR Ravi Bangoria
2022-05-25 11:21   ` Peter Zijlstra
2022-05-26  8:46     ` Ravi Bangoria
2022-05-26  9:56       ` Peter Zijlstra
2022-05-26 10:59         ` Ravi Bangoria
2022-05-26 11:09           ` Peter Zijlstra
2022-05-25  9:39 ` [PATCH 07/13] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
2022-05-25  9:39 ` [PATCH 08/13] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
2022-05-25  9:39 ` [PATCH 09/13] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-05-25  9:39 ` [PATCH 10/13] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
2022-05-25  9:39 ` [PATCH 11/13] perf mem/c2c: Add load store event mappings for AMD Ravi Bangoria
2022-05-25  9:39 ` [PATCH 12/13] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
2022-05-25  9:39 ` [PATCH 13/13] perf mem: Use more generic term for LFB Ravi Bangoria
