All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ravi Bangoria <ravi.bangoria@amd.com>
To: <peterz@infradead.org>, <acme@kernel.org>
Cc: <ravi.bangoria@amd.com>, <jolsa@kernel.org>,
	<namhyung@kernel.org>, <eranian@google.com>, <irogers@google.com>,
	<jmario@redhat.com>, <leo.yan@linaro.org>, <alisaidi@amazon.com>,
	<ak@linux.intel.com>, <kan.liang@linux.intel.com>,
	<dave.hansen@linux.intel.com>, <hpa@zytor.com>,
	<mingo@redhat.com>, <mark.rutland@arm.com>,
	<alexander.shishkin@linux.intel.com>, <tglx@linutronix.de>,
	<bp@alien8.de>, <x86@kernel.org>,
	<linux-perf-users@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <sandipan.das@amd.com>,
	<ananth.narayan@amd.com>, <kim.phillips@amd.com>,
	<santosh.shukla@amd.com>
Subject: [PATCH v3 12/15] perf mem/c2c: Add load store event mappings for AMD
Date: Wed, 28 Sep 2022 15:28:02 +0530	[thread overview]
Message-ID: <20220928095805.596-13-ravi.bangoria@amd.com> (raw)
In-Reply-To: <20220928095805.596-1-ravi.bangoria@amd.com>

Perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store sample provides most of the
information needed for these tools. Wire in ibs_op// event as mem-ldst
event for AMD.

There are some limitations though: Only load/store micro-ops provide
mem/c2c information. Whereas, IBS does not have a way to choose a
particular type of micro-op to tag. This results in many non-LS
micro-ops being tagged which appear as N/A in the perf report. IBS,
being an uncore pmu from kernel point of view[1], does not support per
process monitoring. Thus, perf mem/c2c on AMD are currently supported
in per-cpu mode only.

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf-c2c.txt | 14 ++++++++----
 tools/perf/Documentation/perf-mem.txt |  3 ++-
 tools/perf/arch/x86/util/mem-events.c | 31 +++++++++++++++++++++++++--
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index f1f7ae6b08d1..5c5eb2def83e 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -19,9 +19,10 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).
 
 These events provide:
   - memory address of the access
@@ -49,7 +50,8 @@ RECORD OPTIONS
 
 -l::
 --ldlat::
-	Configure mem-loads latency. (x86 only)
+	Configure mem-loads latency. Supported on Intel and Arm64 processors
+	only. Ignored on other archs.
 
 -k::
 --all-kernel::
@@ -135,11 +137,15 @@ Following perf record options are configured by default:
   -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:
 
   cpu/mem-loads,ldlat=30/P
   cpu/mem-stores/P
 
+following on AMD:
+
+  ibs_op//
+
 and following on PowerPC:
 
   cpu/mem-loads/
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 66177511c5c4..005c95580b1e 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -85,7 +85,8 @@ RECORD OPTIONS
 	Be more verbose (show counter open errors, etc)
 
 --ldlat <n>::
-	Specify desired latency for loads event. (x86 only)
+	Specify desired latency for loads event. Supported on Intel and Arm64
+	processors only. Ignored on other archs.
 
 In addition, for report all perf report options are valid, and for record
 all perf record options.
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 5214370ca4e4..f683ac702247 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"
 
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
@@ -12,18 +14,43 @@ static char mem_stores_name[100];
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"%s/mem-loads,ldlat=%u/P",	"%s/events/mem-loads"),
 	E("ldlat-stores",	"%s/mem-stores/P",		"%s/events/mem-stores"),
 	E(NULL,			NULL,				NULL),
 };
 
+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL,		NULL,		NULL),
+	E(NULL,		NULL,		NULL),
+	E("mem-ldst",	"ibs_op//",	"ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;
 
-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }
 
 bool is_mem_loads_aux_event(struct evsel *leader)
-- 
2.31.1


  parent reply	other threads:[~2022-09-28 10:02 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-28  9:57 [PATCH v3 00/15] perf mem/c2c: Add support for AMD Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 01/15] perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-30 10:48   ` [PATCH v3 01/15] " kajoljain
2022-09-30 12:50     ` Ravi Bangoria
2022-09-30 14:17       ` Liang, Kan
2022-10-01  6:37         ` Ravi Bangoria
2022-10-03 13:15           ` Liang, Kan
2022-10-06 11:38             ` Ravi Bangoria
2022-10-14 13:53               ` Arnaldo Carvalho de Melo
2022-10-14 15:04                 ` Ravi Bangoria
2022-10-27  8:25           ` Peter Zijlstra
2022-10-28  6:41           ` [tip: perf/urgent] perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 02/15] perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions Ravi Bangoria
2022-09-30  4:41   ` Namhyung Kim
2022-09-30  4:48     ` Ravi Bangoria
2022-09-30  5:11       ` Namhyung Kim
2022-09-30  6:16         ` Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 03/15] perf/x86/amd: Support PERF_SAMPLE_DATA_SRC Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 04/15] perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} Ravi Bangoria
2022-09-30  5:09   ` Namhyung Kim
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 05/15] perf/x86/amd: Support PERF_SAMPLE_ADDR Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 06/15] perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR Ravi Bangoria
2022-09-30  4:59   ` Namhyung Kim
2022-09-30  5:05     ` Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-30 17:02   ` [PATCH v3 06/15] " Jiri Olsa
2022-09-28  9:57 ` [PATCH v3 07/15] perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file Ravi Bangoria
2022-09-30  9:31   ` [tip: perf/core] " tip-bot2 for Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 08/15] perf tool: Sync include/uapi/linux/perf_event.h header Ravi Bangoria
2022-09-28  9:57 ` [PATCH v3 09/15] perf tool: Sync arch/x86/include/asm/amd-ibs.h header Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 10/15] perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO} Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 11/15] perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events Ravi Bangoria
2022-09-28  9:58 ` Ravi Bangoria [this message]
2022-09-28  9:58 ` [PATCH v3 13/15] perf mem/c2c: Avoid printing empty lines for unsupported events Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 14/15] perf mem: Use more generic term for LFB Ravi Bangoria
2022-09-28  9:58 ` [PATCH v3 15/15] perf script: Add missing fields in usage hint Ravi Bangoria

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220928095805.596-13-ravi.bangoria@amd.com \
    --to=ravi.bangoria@amd.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alisaidi@amazon.com \
    --cc=ananth.narayan@amd.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=eranian@google.com \
    --cc=hpa@zytor.com \
    --cc=irogers@google.com \
    --cc=jmario@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=kim.phillips@amd.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sandipan.das@amd.com \
    --cc=santosh.shukla@amd.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.