linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-04-26 13:59 Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 1/5] perf: Add SNOOP_PEER flag to perf mem data struct Ali Saidi
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-26 13:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situtions like cache line
contention and transfers on Arm platforms. 

This changes enables future changes to c2c on a system with SPE where lines that
are shared among multiple cores show up in perf c2c output. 

Changes in v7:
 * Minor change requested by Leo Yan

Changes in v6:
  * Drop changes to c2c command which will come from Leo Yan

Changes in v5:
  * Add a new snooping type to disambiguate cache-to-cache transfers where
    we don't know if the data is clean or dirty.
  * Set snoop flags on all the data-source cases
  * Special case stores as we have no information on them

Changes in v4:
  * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v7 1/5] perf: Add SNOOP_PEER flag to perf mem data struct
  2022-04-26 13:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
@ 2022-04-26 13:59 ` Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER Ali Saidi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-26 13:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level.  The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not if the line is dirty or clean, so instead of overloading HITM
define a new flag that indicates this type of transfer.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
---
 include/uapi/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_SHIFT	19
 
 #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
 #define PERF_MEM_SNOOPX_SHIFT  38
 
 /* locked instruction */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v7 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-04-26 13:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 1/5] perf: Add SNOOP_PEER flag to perf mem data struct Ali Saidi
@ 2022-04-26 13:59 ` Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 3/5] perf mem: Print snoop peer flag Ali Saidi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-26 13:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level.  The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not if the line is dirty or clean, so instead of overloading HITM
define a new flag that indicates this type of transfer.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
---
 tools/include/uapi/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_SHIFT	19
 
 #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
 #define PERF_MEM_SNOOPX_SHIFT  38
 
 /* locked instruction */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v7 3/5] perf mem: Print snoop peer flag
  2022-04-26 13:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 1/5] perf: Add SNOOP_PEER flag to perf mem data struct Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER Ali Saidi
@ 2022-04-26 13:59 ` Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation Ali Saidi
  2022-04-26 13:59 ` [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
  4 siblings, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-26 13:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

From: Leo Yan <leo.yan@linaro.org>

Since PERF_MEM_SNOOPX_PEER flag is a new snoop type, print this flag if
it is set.

Before:
       memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)

After:

       memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/util/mem-events.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index efaf263464b9..db5225caaabe 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -410,6 +410,11 @@ static const char * const snoop_access[] = {
 	"HitM",
 };
 
+static const char * const snoopx_access[] = {
+	"Fwd",
+	"Peer",
+};
+
 int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
 	size_t i, l = 0;
@@ -430,13 +435,20 @@ int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 		}
 		l += scnprintf(out + l, sz - l, snoop_access[i]);
 	}
-	if (mem_info &&
-	     (mem_info->data_src.mem_snoopx & PERF_MEM_SNOOPX_FWD)) {
+
+	m = 0;
+	if (mem_info)
+		m = mem_info->data_src.mem_snoopx;
+
+	for (i = 0; m && i < ARRAY_SIZE(snoopx_access); i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+
 		if (l) {
 			strcat(out, " or ");
 			l += 4;
 		}
-		l += scnprintf(out + l, sz - l, "Fwd");
+		l += scnprintf(out + l, sz - l, snoopx_access[i]);
 	}
 
 	if (*out == '\0')
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation
  2022-04-26 13:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
                   ` (2 preceding siblings ...)
  2022-04-26 13:59 ` [PATCH v7 3/5] perf mem: Print snoop peer flag Ali Saidi
@ 2022-04-26 13:59 ` Ali Saidi
  2022-05-03 10:20   ` German Gomez
  2022-04-26 13:59 ` [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
  4 siblings, 1 reply; 14+ messages in thread
From: Ali Saidi @ 2022-04-26 13:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

From: Leo Yan <leo.yan@linaro.org>

Except memory load and store operations, Arm SPE records also can
support other operation types, bug when set the data source field the
current code assumes a record is a either load operation or store
operation, this leads to wrongly synthesize memory samples.

This patch strictly checks the record operation type, it only sets data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST,
otherwise, returns zero for data source.  Therefore, we can synthesize
memory samples only when data source is a non-zero value, the function
arm_spe__is_memory_event() is useless and removed.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/util/arm-spe.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d2b64e3f588b..e032efc03274 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -387,26 +387,16 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
-#define SPE_MEM_TYPE	(ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS | \
-			 ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS | \
-			 ARM_SPE_REMOTE_ACCESS)
-
-static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
-{
-	if (type & SPE_MEM_TYPE)
-		return true;
-
-	return false;
-}
-
 static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
 {
 	union perf_mem_data_src	data_src = { 0 };
 
 	if (record->op == ARM_SPE_LD)
 		data_src.mem_op = PERF_MEM_OP_LOAD;
-	else
+	else if (record->op == ARM_SPE_ST)
 		data_src.mem_op = PERF_MEM_OP_STORE;
+	else
+		return 0;
 
 	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
 		data_src.mem_lvl = PERF_MEM_LVL_L3;
@@ -510,7 +500,11 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 			return err;
 	}
 
-	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
+	/*
+	 * When data_src is zero it means the record is not a memory operation,
+	 * skip to synthesize memory sample for this case.
+	 */
+	if (spe->sample_memory && data_src) {
 		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
 		if (err)
 			return err;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-04-26 13:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
                   ` (3 preceding siblings ...)
  2022-04-26 13:59 ` [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation Ali Saidi
@ 2022-04-26 13:59 ` Ali Saidi
  2022-04-27 15:59   ` Leo Yan
  2022-05-03  9:58   ` German Gomez
  4 siblings, 2 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-26 13:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
the same encoding. I can't find encoding information for any other SPE
implementations to unify their choices with Arm's thus that is left for
future work.

This change populates the mem_lvl_num for Neoverse cores as well as the
deprecated mem_lvl namespace.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
---
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
 tools/perf/util/arm-spe.c                     | 130 +++++++++++++++---
 3 files changed, 127 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 5e390a1a79ab..091987dd3966 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -220,6 +220,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 
 			break;
 		case ARM_SPE_DATA_SOURCE:
+			decoder->record.source = payload;
 			break;
 		case ARM_SPE_BAD:
 			break;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 69b31084d6be..46a61df1145b 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -29,6 +29,17 @@ enum arm_spe_op_type {
 	ARM_SPE_ST		= 1 << 1,
 };
 
+enum arm_spe_neoverse_data_source {
+	ARM_SPE_NV_L1D		 = 0x0,
+	ARM_SPE_NV_L2		 = 0x8,
+	ARM_SPE_NV_PEER_CORE	 = 0x9,
+	ARM_SPE_NV_LOCAL_CLUSTER = 0xa,
+	ARM_SPE_NV_SYS_CACHE	 = 0xb,
+	ARM_SPE_NV_PEER_CLUSTER	 = 0xc,
+	ARM_SPE_NV_REMOTE	 = 0xd,
+	ARM_SPE_NV_DRAM		 = 0xe,
+};
+
 struct arm_spe_record {
 	enum arm_spe_sample_type type;
 	int err;
@@ -40,6 +51,7 @@ struct arm_spe_record {
 	u64 virt_addr;
 	u64 phys_addr;
 	u64 context_id;
+	u16 source;
 };
 
 struct arm_spe_insn;
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index e032efc03274..4065fa180140 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -34,6 +34,7 @@
 #include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#include "../../arch/arm64/include/asm/cputype.h"
 #define MAX_TIMESTAMP (~0ULL)
 
 struct arm_spe {
@@ -45,6 +46,7 @@ struct arm_spe {
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+	u64				midr;
 
 	struct perf_tsc_conversion	tc;
 
@@ -387,35 +389,128 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
-static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
+static const struct midr_range neoverse_spe[] = {
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
+	{},
+};
+
+static void arm_spe__synth_data_source_neoverse(const struct arm_spe_record *record,
+						union perf_mem_data_src *data_src)
 {
-	union perf_mem_data_src	data_src = { 0 };
+	/*
+	 * Even though four levels of cache hierarchy are possible, no known
+	 * production Neoverse systems currently include more than three levels
+	 * so for the time being we assume three exist. If a production system
+	 * is built with four the this function would have to be changed to
+	 * detect the number of levels for reporting.
+	 */
 
-	if (record->op == ARM_SPE_LD)
-		data_src.mem_op = PERF_MEM_OP_LOAD;
-	else if (record->op == ARM_SPE_ST)
-		data_src.mem_op = PERF_MEM_OP_STORE;
-	else
-		return 0;
+	/*
+	 * We have no data on the hit level or data source for stores in the
+	 * Neoverse SPE records.
+	 */
+	if (record->op & ARM_SPE_ST) {
+		data_src->mem_lvl = PERF_MEM_LVL_NA;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+		return;
+	}
+
+	switch (record->source) {
+	case ARM_SPE_NV_L1D:
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		break;
+	case ARM_SPE_NV_L2:
+		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		break;
+	case ARM_SPE_NV_PEER_CORE:
+		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
+		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
+		break;
+	/*
+	 * We don't know if this is L1, L2 but we do know it was a cache-2-cache
+	 * transfer, so set SNOOPX_PEER
+	 */
+	case ARM_SPE_NV_LOCAL_CLUSTER:
+	case ARM_SPE_NV_PEER_CLUSTER:
+		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
+		break;
+	/*
+	 * System cache is assumed to be L3
+	 */
+	case ARM_SPE_NV_SYS_CACHE:
+		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+		break;
+	/*
+	 * We don't know what level it hit in, except it came from the other
+	 * socket
+	 */
+	case ARM_SPE_NV_REMOTE:
+		data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
+		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+		break;
+	case ARM_SPE_NV_DRAM:
+		data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		break;
+	default:
+		break;
+	}
+}
 
+static void arm_spe__synth_data_source_generic(const struct arm_spe_record *record,
+					       union perf_mem_data_src *data_src)
+{
 	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
-		data_src.mem_lvl = PERF_MEM_LVL_L3;
+		data_src->mem_lvl = PERF_MEM_LVL_L3;
 
 		if (record->type & ARM_SPE_LLC_MISS)
-			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
+			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
 		else
-			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
+			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
 	} else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
-		data_src.mem_lvl = PERF_MEM_LVL_L1;
+		data_src->mem_lvl = PERF_MEM_LVL_L1;
 
 		if (record->type & ARM_SPE_L1D_MISS)
-			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
+			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
 		else
-			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
+			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
 	}
 
 	if (record->type & ARM_SPE_REMOTE_ACCESS)
-		data_src.mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+		data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+}
+
+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 midr)
+{
+	union perf_mem_data_src	data_src = { 0 };
+	bool is_neoverse = is_midr_in_range(midr, neoverse_spe);
+
+	if (record->op == ARM_SPE_LD)
+		data_src.mem_op = PERF_MEM_OP_LOAD;
+	else if (record->op == ARM_SPE_ST)
+		data_src.mem_op = PERF_MEM_OP_STORE;
+	else
+		return 0;
+
+	if (is_neoverse)
+		arm_spe__synth_data_source_neoverse(record, &data_src);
+	else
+		arm_spe__synth_data_source_generic(record, &data_src);
 
 	if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
 		data_src.mem_dtlb = PERF_MEM_TLB_WK;
@@ -436,7 +531,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 	u64 data_src;
 	int err;
 
-	data_src = arm_spe__synth_data_source(record);
+	data_src = arm_spe__synth_data_source(record, spe->midr);
 
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
@@ -1177,6 +1272,8 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
 	size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
 	struct perf_record_time_conv *tc = &session->time_conv;
+	const char *cpuid = perf_env__cpuid(session->evlist->env);
+	u64 midr = strtol(cpuid, NULL, 16);
 	struct arm_spe *spe;
 	int err;
 
@@ -1196,6 +1293,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->machine = &session->machines.host; /* No kvm support */
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
+	spe->midr = midr;
 
 	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-04-26 13:59 ` [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
@ 2022-04-27 15:59   ` Leo Yan
  2022-05-03  9:58   ` German Gomez
  1 sibling, 0 replies; 14+ messages in thread
From: Leo Yan @ 2022-04-27 15:59 UTC (permalink / raw)
  To: Ali Saidi
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	acme, benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

On Tue, Apr 26, 2022 at 01:59:37PM +0000, Ali Saidi wrote:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
> the same encoding. I can't find encoding information for any other SPE
> implementations to unify their choices with Arm's thus that is left for
> future work.
> 
> This change populates the mem_lvl_num for Neoverse cores as well as the
> deprecated mem_lvl namespace.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>

I tested this patch and also did the integration testing with perf ć2c
tool, this patch looks good to me:

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>

James, German, it's better if this patch set can get the green light
from you.  So please take a look, thanks!

Leo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-04-26 13:59 ` [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
  2022-04-27 15:59   ` Leo Yan
@ 2022-05-03  9:58   ` German Gomez
  2022-05-04 16:22     ` Ali Saidi
  2022-05-05 15:12     ` Leo Yan
  1 sibling, 2 replies; 14+ messages in thread
From: German Gomez @ 2022-05-03  9:58 UTC (permalink / raw)
  To: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	leo.yan, acme
  Cc: benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

Hi Ali, Leo

Some minor comments below.

On 26/04/2022 14:59, Ali Saidi wrote:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
> the same encoding. I can't find encoding information for any other SPE
> implementations to unify their choices with Arm's thus that is left for
> future work.
>
> This change populates the mem_lvl_num for Neoverse cores as well as the
> deprecated mem_lvl namespace.
>
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> ---
>  .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
>  tools/perf/util/arm-spe.c                     | 130 +++++++++++++++---
>  3 files changed, 127 insertions(+), 16 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> index 5e390a1a79ab..091987dd3966 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> @@ -220,6 +220,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
>  
>  			break;
>  		case ARM_SPE_DATA_SOURCE:
> +			decoder->record.source = payload;
>  			break;
>  		case ARM_SPE_BAD:
>  			break;
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> index 69b31084d6be..46a61df1145b 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> @@ -29,6 +29,17 @@ enum arm_spe_op_type {
>  	ARM_SPE_ST		= 1 << 1,
>  };
>  
> +enum arm_spe_neoverse_data_source {
> +	ARM_SPE_NV_L1D		 = 0x0,
> +	ARM_SPE_NV_L2		 = 0x8,
> +	ARM_SPE_NV_PEER_CORE	 = 0x9,
> +	ARM_SPE_NV_LOCAL_CLUSTER = 0xa,
> +	ARM_SPE_NV_SYS_CACHE	 = 0xb,
> +	ARM_SPE_NV_PEER_CLUSTER	 = 0xc,
> +	ARM_SPE_NV_REMOTE	 = 0xd,
> +	ARM_SPE_NV_DRAM		 = 0xe,
> +};
> +
>  struct arm_spe_record {
>  	enum arm_spe_sample_type type;
>  	int err;
> @@ -40,6 +51,7 @@ struct arm_spe_record {
>  	u64 virt_addr;
>  	u64 phys_addr;
>  	u64 context_id;
> +	u16 source;
>  };
>  
>  struct arm_spe_insn;
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index e032efc03274..4065fa180140 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -34,6 +34,7 @@
>  #include "arm-spe-decoder/arm-spe-decoder.h"
>  #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
>  
> +#include "../../arch/arm64/include/asm/cputype.h"
>  #define MAX_TIMESTAMP (~0ULL)
>  
>  struct arm_spe {
> @@ -45,6 +46,7 @@ struct arm_spe {
>  	struct perf_session		*session;
>  	struct machine			*machine;
>  	u32				pmu_type;
> +	u64				midr;
>  
>  	struct perf_tsc_conversion	tc;
>  
> @@ -387,35 +389,128 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
>  	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
>  }
>  
> -static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
> +static const struct midr_range neoverse_spe[] = {
> +	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
> +	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> +	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
> +	{},
> +};
> +
> +static void arm_spe__synth_data_source_neoverse(const struct arm_spe_record *record,
> +						union perf_mem_data_src *data_src)
>  {
> -	union perf_mem_data_src	data_src = { 0 };
> +	/*
> +	 * Even though four levels of cache hierarchy are possible, no known
> +	 * production Neoverse systems currently include more than three levels
> +	 * so for the time being we assume three exist. If a production system
> +	 * is built with four the this function would have to be changed to
> +	 * detect the number of levels for reporting.
> +	 */
>  
> -	if (record->op == ARM_SPE_LD)
> -		data_src.mem_op = PERF_MEM_OP_LOAD;
> -	else if (record->op == ARM_SPE_ST)
> -		data_src.mem_op = PERF_MEM_OP_STORE;
> -	else
> -		return 0;
> +	/*
> +	 * We have no data on the hit level or data source for stores in the
> +	 * Neoverse SPE records.
> +	 */
> +	if (record->op & ARM_SPE_ST) {
> +		data_src->mem_lvl = PERF_MEM_LVL_NA;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;

Is it correct to use ANY_CACHE as the NA value? The LVLNUM_* enum begins at 1, so it looks like this should be set to 0 instead (like in HOPS_*).

> +		data_src->mem_snoop = PERF_MEM_SNOOP_NA;
> +		return;
> +	}
> +
> +	switch (record->source) {
> +	case ARM_SPE_NV_L1D:
> +		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> +		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
> +		break;
> +	case ARM_SPE_NV_L2:
> +		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> +		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
> +		break;
> +	case ARM_SPE_NV_PEER_CORE:
> +		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> +		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
> +		break;
> +	/*
> +	 * We don't know if this is L1, L2 but we do know it was a cache-2-cache
> +	 * transfer, so set SNOOPX_PEER
> +	 */
> +	case ARM_SPE_NV_LOCAL_CLUSTER:
> +	case ARM_SPE_NV_PEER_CLUSTER:
> +		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> +		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
> +		break;
> +	/*
> +	 * System cache is assumed to be L3
> +	 */
> +	case ARM_SPE_NV_SYS_CACHE:
> +		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> +		data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> +		break;
> +	/*
> +	 * We don't know what level it hit in, except it came from the other
> +	 * socket
> +	 */
> +	case ARM_SPE_NV_REMOTE:
> +		data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
> +		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> +		data_src->mem_snoop = PERF_MEM_SNOOP_NA;
> +		break;
> +	case ARM_SPE_NV_DRAM:
> +		data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
> +		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
> +		break;
> +	default:
> +		break;
> +	}
> +}
>  
> +static void arm_spe__synth_data_source_generic(const struct arm_spe_record *record,
> +					       union perf_mem_data_src *data_src)
> +{
>  	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
> -		data_src.mem_lvl = PERF_MEM_LVL_L3;
> +		data_src->mem_lvl = PERF_MEM_LVL_L3;

Thanks for addressing my previous comment about filling both mem_lvl and mem_lvl_num.

I wonder if it's also worth updating for the non-Neoverse cores as well while we're at it. I'll let Leo decide since this patchset is only focused on Neoverse.

Aside from these comments, this change also looks good to me. Thanks for catching the things I missed in my previous review.

Reviewed-by: German Gomez <german.gomez@arm.com>

>  
>  		if (record->type & ARM_SPE_LLC_MISS)
> -			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
> +			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
>  		else
> -			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
> +			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
>  	} else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
> -		data_src.mem_lvl = PERF_MEM_LVL_L1;
> +		data_src->mem_lvl = PERF_MEM_LVL_L1;
>  
>  		if (record->type & ARM_SPE_L1D_MISS)
> -			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
> +			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
>  		else
> -			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
> +			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
>  	}
>  
>  	if (record->type & ARM_SPE_REMOTE_ACCESS)
> -		data_src.mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> +		data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> +}
> +
> +static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 midr)
> +{
> +	union perf_mem_data_src	data_src = { 0 };
> +	bool is_neoverse = is_midr_in_range(midr, neoverse_spe);
> +
> +	if (record->op == ARM_SPE_LD)
> +		data_src.mem_op = PERF_MEM_OP_LOAD;
> +	else if (record->op == ARM_SPE_ST)
> +		data_src.mem_op = PERF_MEM_OP_STORE;
> +	else
> +		return 0;
> +
> +	if (is_neoverse)
> +		arm_spe__synth_data_source_neoverse(record, &data_src);
> +	else
> +		arm_spe__synth_data_source_generic(record, &data_src);
>  
>  	if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
>  		data_src.mem_dtlb = PERF_MEM_TLB_WK;
> @@ -436,7 +531,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
>  	u64 data_src;
>  	int err;
>  
> -	data_src = arm_spe__synth_data_source(record);
> +	data_src = arm_spe__synth_data_source(record, spe->midr);
>  
>  	if (spe->sample_flc) {
>  		if (record->type & ARM_SPE_L1D_MISS) {
> @@ -1177,6 +1272,8 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  	struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
>  	size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
>  	struct perf_record_time_conv *tc = &session->time_conv;
> +	const char *cpuid = perf_env__cpuid(session->evlist->env);
> +	u64 midr = strtol(cpuid, NULL, 16);
>  	struct arm_spe *spe;
>  	int err;
>  
> @@ -1196,6 +1293,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  	spe->machine = &session->machines.host; /* No kvm support */
>  	spe->auxtrace_type = auxtrace_info->type;
>  	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
> +	spe->midr = midr;
>  
>  	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>  

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation
  2022-04-26 13:59 ` [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation Ali Saidi
@ 2022-05-03 10:20   ` German Gomez
  0 siblings, 0 replies; 14+ messages in thread
From: German Gomez @ 2022-05-03 10:20 UTC (permalink / raw)
  To: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	leo.yan, acme
  Cc: benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will


On 26/04/2022 14:59, Ali Saidi wrote:
> From: Leo Yan <leo.yan@linaro.org>
>
> Except memory load and store operations, Arm SPE records also can
> support other operation types, bug when set the data source field the
> current code assumes a record is a either load operation or store
> operation, this leads to wrongly synthesize memory samples.
>
> This patch strictly checks the record operation type, it only sets data
> source only for the operation types ARM_SPE_LD and ARM_SPE_ST,
> otherwise, returns zero for data source.  Therefore, we can synthesize
> memory samples only when data source is a non-zero value, the function
> arm_spe__is_memory_event() is useless and removed.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> Reviewed-by: Ali Saidi <alisaidi@amazon.com>
> Tested-by: Ali Saidi <alisaidi@amazon.com>

I think the Fixes tag is missing, right?

Fixes: e55ed3423c1b ("perf arm-spe: Synthesize memory event")
Reviewed-by: German Gomez <german.gomez@arm.com>

Thanks,
German

> ---
>  tools/perf/util/arm-spe.c | 22 ++++++++--------------
>  1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index d2b64e3f588b..e032efc03274 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -387,26 +387,16 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
>  	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
>  }
>  
> -#define SPE_MEM_TYPE	(ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS | \
> -			 ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS | \
> -			 ARM_SPE_REMOTE_ACCESS)
> -
> -static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
> -{
> -	if (type & SPE_MEM_TYPE)
> -		return true;
> -
> -	return false;
> -}
> -
>  static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
>  {
>  	union perf_mem_data_src	data_src = { 0 };
>  
>  	if (record->op == ARM_SPE_LD)
>  		data_src.mem_op = PERF_MEM_OP_LOAD;
> -	else
> +	else if (record->op == ARM_SPE_ST)
>  		data_src.mem_op = PERF_MEM_OP_STORE;
> +	else
> +		return 0;
>  
>  	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
>  		data_src.mem_lvl = PERF_MEM_LVL_L3;
> @@ -510,7 +500,11 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
>  			return err;
>  	}
>  
> -	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
> +	/*
> +	 * When data_src is zero it means the record is not a memory operation,
> +	 * skip to synthesize memory sample for this case.
> +	 */
> +	if (spe->sample_memory && data_src) {
>  		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
>  		if (err)
>  			return err;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-05-03  9:58   ` German Gomez
@ 2022-05-04 16:22     ` Ali Saidi
  2022-05-05 15:12     ` Leo Yan
  1 sibling, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-05-04 16:22 UTC (permalink / raw)
  To: german.gomez
  Cc: Nick.Forrington, acme, alexander.shishkin, alisaidi,
	andrew.kilroy, benh, james.clark, john.garry, jolsa, kjain,
	leo.yan, lihuafei1, linux-arm-kernel, linux-kernel,
	linux-perf-users, mark.rutland, mathieu.poirier, mingo, namhyung,
	peterz, will

Hi German,

Thanks for taking another look!

On Tue, 3 May 2022 09:58:15 +0000, German Gomez wrote:
> Hi Ali, Leo
> 
> Some minor comments below.
> 
> On 26/04/2022 14:59, Ali Saidi wrote:
> > When synthesizing data from SPE, augment the type with source information
> > for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
> > the same encoding. I can't find encoding information for any other SPE
> > implementations to unify their choices with Arm's thus that is left for
> > future work.
> >
> > This change populates the mem_lvl_num for Neoverse cores as well as the
> > deprecated mem_lvl namespace.
> >
> > Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> > ---
> >  .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
> >  .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
> >  tools/perf/util/arm-spe.c                     | 130 +++++++++++++++---
> >  3 files changed, 127 insertions(+), 16 deletions(-)
> >
> [snip]
> > +	/*
> > +	 * We have no data on the hit level or data source for stores in the
> > +	 * Neoverse SPE records.
> > +	 */
> > +	if (record->op & ARM_SPE_ST) {
> > +		data_src->mem_lvl = PERF_MEM_LVL_NA;
> > +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
> 
> Is it correct to use ANY_CACHE as the NA value? The LVLNUM_* enum begins at 1, so it looks like this should be set to 0 instead (like in HOPS_*).

I think you're making a good point here. To be consistent we should set
mem_lvl_num to PERF_MEM_LVLNUM_NA just like mem_lvl.

Thanks,
Ali




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-05-03  9:58   ` German Gomez
  2022-05-04 16:22     ` Ali Saidi
@ 2022-05-05 15:12     ` Leo Yan
  1 sibling, 0 replies; 14+ messages in thread
From: Leo Yan @ 2022-05-05 15:12 UTC (permalink / raw)
  To: German Gomez
  Cc: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	acme, benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

On Tue, May 03, 2022 at 10:58:15AM +0100, German Gomez wrote:

[...]

> > +static void arm_spe__synth_data_source_generic(const struct arm_spe_record *record,
> > +					       union perf_mem_data_src *data_src)
> > +{
> >  	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
> > -		data_src.mem_lvl = PERF_MEM_LVL_L3;
> > +		data_src->mem_lvl = PERF_MEM_LVL_L3;
> 
> Thanks for addressing my previous comment about filling both mem_lvl and mem_lvl_num.
> 
> I wonder if it's also worth updating for the non-Neoverse cores as well while we're at it. I'll let Leo decide since this patchset is only focused on Neoverse.

Thanks for pointing out this.  Yeah, Let's use this patch set for
enabling Neoverse data source.

We can use a new patch to updating cache level for non-Neoverse cores
(it's better also set store cache level as LVL_NA for non-Neoverse cores).

Thanks,
Leo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-04-26  1:59 Ali Saidi
  0 siblings, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-26  1:59 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situtions like cache line
contention and transfers on Arm platforms. 

This changes enables future changes to c2c on a system with SPE where lines that
are shared among multiple cores show up in perf c2c output. 

Changes in v6:
  * Drop changes to c2c command which will come from Leo Yan

Changes in v5:
  * Add a new snooping type to disambiguate cache-to-cache transfers where
    we don't know if the data is clean or dirty.
  * Set snoop flags on all the data-source cases
  * Special case stores as we have no information on them

Changes in v4:
  * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-04-08 19:53 Ali Saidi
  0 siblings, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-04-08 19:53 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situtions like cache line
contention and transfers on Arm platforms. 

This changes enables the expected behavior of perf c2c on a system with
SPE where lines that are shared among multiple cores show up in perf c2c
output. 

These changes switch to use mem_lvl_num to encode the level information
instead of mem_lvl which is being deprecated, but I haven't found other
users of mem_lvl_num. 

Changes in v5:
  * Add a new snooping type to disambiguate cache-to-cache transfers where
    we don't know if the data is clean or dirty.
  * Set snoop flags on all the data-source cases
  * Special case stores as we have no information on them

Changes in v4:
  * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-03-24 18:33 Ali Saidi
  0 siblings, 0 replies; 14+ messages in thread
From: Ali Saidi @ 2022-03-24 18:33 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situtions like cache line
contention and transfers on Arm platforms. 

This changes enables the expected behavior of perf c2c on a system with
SPE where lines that are shared among multiple cores show up in perf c2c
output. 

These changes switch to use mem_lvl_num to encode the level information
instead of mem_lvl which is being deprecated, but I haven't found other
users of mem_lvl_num. 

Changes in v4:
  * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-05-05 15:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-26 13:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-04-26 13:59 ` [PATCH v7 1/5] perf: Add SNOOP_PEER flag to perf mem data struct Ali Saidi
2022-04-26 13:59 ` [PATCH v7 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER Ali Saidi
2022-04-26 13:59 ` [PATCH v7 3/5] perf mem: Print snoop peer flag Ali Saidi
2022-04-26 13:59 ` [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation Ali Saidi
2022-05-03 10:20   ` German Gomez
2022-04-26 13:59 ` [PATCH v7 5/5] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
2022-04-27 15:59   ` Leo Yan
2022-05-03  9:58   ` German Gomez
2022-05-04 16:22     ` Ali Saidi
2022-05-05 15:12     ` Leo Yan
  -- strict thread matches above, loose matches on Subject: below --
2022-04-26  1:59 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-04-08 19:53 Ali Saidi
2022-03-24 18:33 Ali Saidi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).