* [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-05-04 18:48 ` Ali Saidi
  0 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-04 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situations like cache line
contention and transfers on Arm platforms.

This change enables future changes to c2c on a system with SPE, where lines
that are shared among multiple cores show up in perf c2c output.

Changes in v8:
 * Report NA for both mem_lvl and mem_lvl_num for stores where we have no
   information

Changes in v7:
 * Minor change requested by Leo Yan

Changes in v6:
  * Drop changes to c2c command which will come from Leo Yan

Changes in v5:
  * Add a new snooping type to disambiguate cache-to-cache transfers where
    we don't know if the data is clean or dirty.
  * Set snoop flags on all the data-source cases
  * Special case stores as we have no information on them

Changes in v4:
  * Bring in the kernel's arch/arm64/include/asm/cputype.h into tools/
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0


* [PATCH v8 1/5] perf: Add SNOOP_PEER flag to perf mem data struct
  2022-05-04 18:48 ` Ali Saidi
@ 2022-05-04 18:48   ` Ali Saidi
  -1 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-04 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level.  The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not if the line is dirty or clean, so instead of overloading HITM
define a new flag that indicates this type of transfer.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
---
 include/uapi/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_SHIFT	19
 
 #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
 #define PERF_MEM_SNOOPX_SHIFT  38
 
 /* locked instruction */
-- 
2.32.0
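
For readers following the thread, a minimal sketch of how the new flag
could be packed into and read back from the 64-bit data source value. It
uses local stand-in macros (the SKETCH_* names and both helper functions
are hypothetical, not part of the patch) that mirror the values shown in
the diff above: flag 0x02 at shift 38. The two-bit width of the field is
an assumption taken from the mem_snoopx bitfield in the uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring the values in the diff; not the uapi names. */
#define SKETCH_SNOOPX_FWD   0x01ULL
#define SKETCH_SNOOPX_PEER  0x02ULL
#define SKETCH_SNOOPX_SHIFT 38

/* OR the given snoopx flags into a raw data source value. */
static uint64_t sketch_set_snoopx(uint64_t data_src, uint64_t flags)
{
	/* Assumption: the snoopx field is two bits wide. */
	assert((flags & ~0x3ULL) == 0);
	return data_src | (flags << SKETCH_SNOOPX_SHIFT);
}

/* Return non-zero if the value reports a transfer from a peer cache. */
static int sketch_is_peer(uint64_t data_src)
{
	return ((data_src >> SKETCH_SNOOPX_SHIFT) & SKETCH_SNOOPX_PEER) != 0;
}
```

A consumer would test the peer bit separately from HITM, since the data
source does not say whether the transferred line was clean or dirty.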


* [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-05-04 18:48 ` Ali Saidi
@ 2022-05-04 18:48   ` Ali Saidi
  -1 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-04 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level.  The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not if the line is dirty or clean, so instead of overloading HITM
define a new flag that indicates this type of transfer.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
---
 tools/include/uapi/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_SHIFT	19
 
 #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
 #define PERF_MEM_SNOOPX_SHIFT  38
 
 /* locked instruction */
-- 
2.32.0


* [PATCH v8 3/5] perf mem: Print snoop peer flag
  2022-05-04 18:48 ` Ali Saidi
@ 2022-05-04 18:48   ` Ali Saidi
  -1 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-04 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

From: Leo Yan <leo.yan@linaro.org>

Since the PERF_MEM_SNOOPX_PEER flag is a new snoop type, print it when it
is set.

Before:
       memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)

After:

       memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
       memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/util/mem-events.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index efaf263464b9..db5225caaabe 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -410,6 +410,11 @@ static const char * const snoop_access[] = {
 	"HitM",
 };
 
+static const char * const snoopx_access[] = {
+	"Fwd",
+	"Peer",
+};
+
 int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
 	size_t i, l = 0;
@@ -430,13 +435,20 @@ int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 		}
 		l += scnprintf(out + l, sz - l, snoop_access[i]);
 	}
-	if (mem_info &&
-	     (mem_info->data_src.mem_snoopx & PERF_MEM_SNOOPX_FWD)) {
+
+	m = 0;
+	if (mem_info)
+		m = mem_info->data_src.mem_snoopx;
+
+	for (i = 0; m && i < ARRAY_SIZE(snoopx_access); i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+
 		if (l) {
 			strcat(out, " or ");
 			l += 4;
 		}
-		l += scnprintf(out + l, sz - l, "Fwd");
+		l += scnprintf(out + l, sz - l, snoopx_access[i]);
 	}
 
 	if (*out == '\0')
-- 
2.32.0
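
A standalone sketch of the bit-walking loop from the diff above, for
readers who want to try it outside the perf tree. It covers only the
snoopx part of perf_mem__snp_scnprintf() and swaps the tools-internal
scnprintf() for snprintf() with explicit clamping. The function names,
the sketch_append() helper, and the "N/A" fallback for an empty result
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit 0 is "Fwd", bit 1 is "Peer", matching the table in the patch. */
static const char *const sketch_snoopx_names[] = { "Fwd", "Peer" };

/* Append src at offset l; return the new length, clamped to sz - 1. */
static size_t sketch_append(char *out, size_t sz, size_t l, const char *src)
{
	int n;

	if (l >= sz)
		return l;
	n = snprintf(out + l, sz - l, "%s", src);
	if (n < 0)
		return l;
	if ((size_t)n >= sz - l)
		return sz - 1;	/* output was truncated */
	return l + (size_t)n;
}

/* Format every set snoopx bit, joined by " or ". */
static size_t sketch_snoopx_to_str(char *out, size_t sz, unsigned int m)
{
	size_t i, l = 0;
	const size_t n = sizeof(sketch_snoopx_names) / sizeof(sketch_snoopx_names[0]);

	assert(out != NULL && sz > 0);
	out[0] = '\0';

	for (i = 0; m && i < n; i++, m >>= 1) {
		if (!(m & 0x1))
			continue;
		if (l)
			l = sketch_append(out, sz, l, " or ");
		l = sketch_append(out, sz, l, sketch_snoopx_names[i]);
	}

	if (l == 0)
		l = sketch_append(out, sz, l, "N/A");
	return l;
}
```

Walking the mask one bit at a time means a further snoopx flag only needs
a new table entry rather than another if block.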


* [PATCH v8 4/5] perf arm-spe: Don't set data source if it's not a memory operation
  2022-05-04 18:48 ` Ali Saidi
@ 2022-05-04 18:48   ` Ali Saidi
  -1 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-04 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

From: Leo Yan <leo.yan@linaro.org>

Besides memory load and store operations, Arm SPE records can also
describe other operation types. When setting the data source field, the
current code assumes a record is either a load or a store operation,
which leads to wrongly synthesized memory samples.

This patch strictly checks the record operation type: it sets the data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST and
otherwise returns zero for the data source.  Since memory samples are now
synthesized only when the data source is non-zero, the function
arm_spe__is_memory_event() is no longer needed and is removed.

Fixes: e55ed3423c1b ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/util/arm-spe.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d2b64e3f588b..e032efc03274 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -387,26 +387,16 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
-#define SPE_MEM_TYPE	(ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS | \
-			 ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS | \
-			 ARM_SPE_REMOTE_ACCESS)
-
-static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
-{
-	if (type & SPE_MEM_TYPE)
-		return true;
-
-	return false;
-}
-
 static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
 {
 	union perf_mem_data_src	data_src = { 0 };
 
 	if (record->op == ARM_SPE_LD)
 		data_src.mem_op = PERF_MEM_OP_LOAD;
-	else
+	else if (record->op == ARM_SPE_ST)
 		data_src.mem_op = PERF_MEM_OP_STORE;
+	else
+		return 0;
 
 	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
 		data_src.mem_lvl = PERF_MEM_LVL_L3;
@@ -510,7 +500,11 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 			return err;
 	}
 
-	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
+	/*
+	 * When data_src is zero it means the record is not a memory operation,
+	 * so skip synthesizing a memory sample in this case.
+	 */
+	if (spe->sample_memory && data_src) {
 		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
 		if (err)
 			return err;
-- 
2.32.0
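
The control flow this patch introduces can be sketched in isolation as
follows. Everything prefixed sketch_ or SKETCH_ is a hypothetical
stand-in: only ARM_SPE_ST = 1 << 1 is visible in this series, while the
load value, the extra "other" operation, and the reduced record struct
are assumptions for illustration. The real function fills in a full
perf_mem_data_src union rather than just the operation bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in operation types; SKETCH_SPE_OTHER is a made-up non-memory op. */
enum sketch_spe_op {
	SKETCH_SPE_LD    = 1 << 0,
	SKETCH_SPE_ST    = 1 << 1,
	SKETCH_SPE_OTHER = 1 << 2,
};

/* Stand-ins for the load and store mem_op encodings. */
#define SKETCH_MEM_OP_LOAD  0x02ULL
#define SKETCH_MEM_OP_STORE 0x04ULL

/* Reduced record carrying only the field this sketch needs. */
struct sketch_record {
	enum sketch_spe_op op;
};

/* Return a non-zero data source only for loads and stores. */
static uint64_t sketch_synth_data_source(const struct sketch_record *rec)
{
	assert(rec != NULL);

	if (rec->op == SKETCH_SPE_LD)
		return SKETCH_MEM_OP_LOAD;
	if (rec->op == SKETCH_SPE_ST)
		return SKETCH_MEM_OP_STORE;
	return 0;	/* not a memory operation */
}

/* A memory sample is synthesized only for a non-zero data source. */
static int sketch_should_synth_mem_sample(int sample_memory, uint64_t data_src)
{
	return sample_memory && data_src != 0;
}
```

Using zero as the "not a memory operation" marker works because every
valid load or store sets at least the operation bits.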


* [PATCH v8 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-05-04 18:48 ` Ali Saidi
@ 2022-05-04 18:48   ` Ali Saidi
  -1 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-04 18:48 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores. The field is IMPLDEF, but the Neoverse cores all use
the same encoding. I can't find encoding information for any other SPE
implementations to unify their choices with Arm's, so that is left for
future work.

This change populates the mem_lvl_num for Neoverse cores as well as the
deprecated mem_lvl namespace.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
---
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
 tools/perf/util/arm-spe.c                     | 130 +++++++++++++++---
 3 files changed, 127 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 5e390a1a79ab..091987dd3966 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -220,6 +220,7 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 
 			break;
 		case ARM_SPE_DATA_SOURCE:
+			decoder->record.source = payload;
 			break;
 		case ARM_SPE_BAD:
 			break;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 69b31084d6be..46a61df1145b 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -29,6 +29,17 @@ enum arm_spe_op_type {
 	ARM_SPE_ST		= 1 << 1,
 };
 
+enum arm_spe_neoverse_data_source {
+	ARM_SPE_NV_L1D		 = 0x0,
+	ARM_SPE_NV_L2		 = 0x8,
+	ARM_SPE_NV_PEER_CORE	 = 0x9,
+	ARM_SPE_NV_LOCAL_CLUSTER = 0xa,
+	ARM_SPE_NV_SYS_CACHE	 = 0xb,
+	ARM_SPE_NV_PEER_CLUSTER	 = 0xc,
+	ARM_SPE_NV_REMOTE	 = 0xd,
+	ARM_SPE_NV_DRAM		 = 0xe,
+};
+
 struct arm_spe_record {
 	enum arm_spe_sample_type type;
 	int err;
@@ -40,6 +51,7 @@ struct arm_spe_record {
 	u64 virt_addr;
 	u64 phys_addr;
 	u64 context_id;
+	u16 source;
 };
 
 struct arm_spe_insn;
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index e032efc03274..db3bd41a257b 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -34,6 +34,7 @@
 #include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#include "../../arch/arm64/include/asm/cputype.h"
 #define MAX_TIMESTAMP (~0ULL)
 
 struct arm_spe {
@@ -45,6 +46,7 @@ struct arm_spe {
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+	u64				midr;
 
 	struct perf_tsc_conversion	tc;
 
@@ -387,35 +389,128 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
-static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
+static const struct midr_range neoverse_spe[] = {
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
+	{},
+};
+
+static void arm_spe__synth_data_source_neoverse(const struct arm_spe_record *record,
+						union perf_mem_data_src *data_src)
 {
-	union perf_mem_data_src	data_src = { 0 };
+	/*
+	 * Even though four levels of cache hierarchy are possible, no known
+	 * production Neoverse systems currently include more than three levels
+	 * so for the time being we assume three exist. If a production system
+	 * is built with four then this function would have to be changed to
+	 * detect the number of levels for reporting.
+	 */
 
-	if (record->op == ARM_SPE_LD)
-		data_src.mem_op = PERF_MEM_OP_LOAD;
-	else if (record->op == ARM_SPE_ST)
-		data_src.mem_op = PERF_MEM_OP_STORE;
-	else
-		return 0;
+	/*
+	 * We have no data on the hit level or data source for stores in the
+	 * Neoverse SPE records.
+	 */
+	if (record->op & ARM_SPE_ST) {
+		data_src->mem_lvl = PERF_MEM_LVL_NA;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_NA;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+		return;
+	}
+
+	switch (record->source) {
+	case ARM_SPE_NV_L1D:
+		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		break;
+	case ARM_SPE_NV_L2:
+		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		break;
+	case ARM_SPE_NV_PEER_CORE:
+		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
+		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
+		break;
+	/*
+	 * We don't know if this is L1 or L2, but we do know it was a
+	 * cache-to-cache transfer, so set SNOOPX_PEER
+	 */
+	case ARM_SPE_NV_LOCAL_CLUSTER:
+	case ARM_SPE_NV_PEER_CLUSTER:
+		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
+		break;
+	/*
+	 * System cache is assumed to be L3
+	 */
+	case ARM_SPE_NV_SYS_CACHE:
+		data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
+		break;
+	/*
+	 * We don't know what level it hit in, except it came from the other
+	 * socket
+	 */
+	case ARM_SPE_NV_REMOTE:
+		data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
+		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NA;
+		break;
+	case ARM_SPE_NV_DRAM:
+		data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
+		data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
+		data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
+		break;
+	default:
+		break;
+	}
+}
 
+static void arm_spe__synth_data_source_generic(const struct arm_spe_record *record,
+					       union perf_mem_data_src *data_src)
+{
 	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
-		data_src.mem_lvl = PERF_MEM_LVL_L3;
+		data_src->mem_lvl = PERF_MEM_LVL_L3;
 
 		if (record->type & ARM_SPE_LLC_MISS)
-			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
+			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
 		else
-			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
+			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
 	} else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
-		data_src.mem_lvl = PERF_MEM_LVL_L1;
+		data_src->mem_lvl = PERF_MEM_LVL_L1;
 
 		if (record->type & ARM_SPE_L1D_MISS)
-			data_src.mem_lvl |= PERF_MEM_LVL_MISS;
+			data_src->mem_lvl |= PERF_MEM_LVL_MISS;
 		else
-			data_src.mem_lvl |= PERF_MEM_LVL_HIT;
+			data_src->mem_lvl |= PERF_MEM_LVL_HIT;
 	}
 
 	if (record->type & ARM_SPE_REMOTE_ACCESS)
-		data_src.mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+		data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
+}
+
+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 midr)
+{
+	union perf_mem_data_src	data_src = { 0 };
+	bool is_neoverse = is_midr_in_range(midr, neoverse_spe);
+
+	if (record->op == ARM_SPE_LD)
+		data_src.mem_op = PERF_MEM_OP_LOAD;
+	else if (record->op == ARM_SPE_ST)
+		data_src.mem_op = PERF_MEM_OP_STORE;
+	else
+		return 0;
+
+	if (is_neoverse)
+		arm_spe__synth_data_source_neoverse(record, &data_src);
+	else
+		arm_spe__synth_data_source_generic(record, &data_src);
 
 	if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
 		data_src.mem_dtlb = PERF_MEM_TLB_WK;
@@ -436,7 +531,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 	u64 data_src;
 	int err;
 
-	data_src = arm_spe__synth_data_source(record);
+	data_src = arm_spe__synth_data_source(record, spe->midr);
 
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
@@ -1177,6 +1272,8 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
 	size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
 	struct perf_record_time_conv *tc = &session->time_conv;
+	const char *cpuid = perf_env__cpuid(session->evlist->env);
+	u64 midr = strtol(cpuid, NULL, 16);
 	struct arm_spe *spe;
 	int err;
 
@@ -1196,6 +1293,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->machine = &session->machines.host; /* No kvm support */
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
+	spe->midr = midr;
 
 	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 
-- 
2.32.0
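
As a rough illustration of the MIDR check this patch depends on, the sketch below parses a cpuid string (assumed here to have the form 0x00000000410fd0c0) and tests the implementer and part-number fields. The helper names parse_midr() and midr_is_neoverse() and the local macros are hypothetical stand-ins, not the is_midr_in_range()/MIDR_ALL_VERSIONS() machinery imported from cputype.h. The part numbers are the published values for Neoverse N1, V1 and N2. It uses strtoull() rather than strtol() so that values above LONG_MAX are not clamped on hosts where long is 32 bits, and it tolerates a missing cpuid string.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in bits [15:4]. */
#define SKETCH_MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xff)
#define SKETCH_MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)

/* Hypothetical local names; values follow the Arm implementer/part IDs. */
#define SKETCH_IMP_ARM			0x41
#define SKETCH_PART_NEOVERSE_N1		0xd0c
#define SKETCH_PART_NEOVERSE_V1		0xd40
#define SKETCH_PART_NEOVERSE_N2		0xd49

/* Parse a hexadecimal cpuid string; a NULL string yields 0 (no match). */
static uint64_t parse_midr(const char *cpuid)
{
	return cpuid ? strtoull(cpuid, NULL, 16) : 0;
}

/* True for any variant/revision of the three Neoverse cores listed above. */
static bool midr_is_neoverse(uint64_t midr)
{
	if (SKETCH_MIDR_IMPLEMENTER(midr) != SKETCH_IMP_ARM)
		return false;

	switch (SKETCH_MIDR_PARTNUM(midr)) {
	case SKETCH_PART_NEOVERSE_N1:
	case SKETCH_PART_NEOVERSE_V1:
	case SKETCH_PART_NEOVERSE_N2:
		return true;
	default:
		return false;
	}
}
```

Because only the implementer and part number are compared, the variant and revision fields are ignored, which is the same intent as MIDR_ALL_VERSIONS() in the table above.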


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 5/5] perf arm-spe: Use SPE data source for neoverse cores
  2022-05-04 18:48   ` Ali Saidi
@ 2022-05-05 15:03     ` Leo Yan
  -1 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-05 15:03 UTC (permalink / raw)
  To: Ali Saidi
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	acme, benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

On Wed, May 04, 2022 at 06:48:50PM +0000, Ali Saidi wrote:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
> the same encoding. I can't find encoding information for any other SPE
> implementations to unify their choices with Arm's thus that is left for
> future work.
> 
> This change populates the mem_lvl_num for Neoverse cores as well as the
> deprecated mem_lvl namespace.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> Reviewed-by: German Gomez <german.gomez@arm.com>
> Reviewed-by: Leo Yan <leo.yan@linaro.org>
> Tested-by: Leo Yan <leo.yan@linaro.org>

Thanks for updating, Ali.  It looks good to me.

Setting store operations to PERF_MEM_LVL_NA and PERF_MEM_LVLNUM_NA is the
right thing to do.  I will update the perf c2c patch set to account for
store operations with PERF_MEM_LVL_NA in its statistics.

Thanks,
Leo
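
A minimal sketch of the accounting described above, assuming a simplified sample record: stores whose level is reported as NA are counted on their own rather than being attributed to a cache level. The struct and function names are hypothetical, the constant values mirror PERF_MEM_OP_STORE, PERF_MEM_LVL_NA and PERF_MEM_LVLNUM_NA from the perf_event.h UAPI header, and this is not the perf c2c implementation.

```c
#include <stdint.h>

/* Values mirror the perf_event.h UAPI definitions of the same-named flags. */
#define SKETCH_MEM_OP_LOAD	0x02
#define SKETCH_MEM_OP_STORE	0x04
#define SKETCH_MEM_LVL_NA	0x01
#define SKETCH_MEM_LVLNUM_NA	0x0f

/* Hypothetical, unpacked view of the fields of one sample's data source. */
struct sketch_data_src {
	uint32_t mem_op;
	uint32_t mem_lvl;
	uint32_t mem_lvl_num;
};

/* Hypothetical per-cacheline store counters. */
struct sketch_store_stats {
	uint32_t store;		/* all stores seen */
	uint32_t store_lvl_na;	/* stores with no level information */
};

/* Count a sample if it is a store; track NA-level stores separately. */
static void sketch_account_store(struct sketch_store_stats *stats,
				 const struct sketch_data_src *src)
{
	if (!(src->mem_op & SKETCH_MEM_OP_STORE))
		return;

	stats->store++;

	if ((src->mem_lvl & SKETCH_MEM_LVL_NA) &&
	    src->mem_lvl_num == SKETCH_MEM_LVLNUM_NA)
		stats->store_lvl_na++;
}
```

Keeping a separate counter means a store from a Neoverse SPE record is still visible in the totals without being misreported as an L1 hit or miss.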

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-05-04 18:48   ` Ali Saidi
@ 2022-05-10 16:28     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-10 16:28 UTC (permalink / raw)
  To: Ali Saidi
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

On Wed, May 04, 2022 at 06:48:47PM +0000, Ali Saidi wrote:
> Add a flag to the perf mem data struct to signal that a request caused a
> cache-to-cache transfer of a line from a peer of the requestor and
> wasn't sourced from a lower cache level.  The line being moved from one
> peer cache to another has latency and performance implications. On Arm64
> Neoverse systems the data source can indicate a cache-to-cache transfer
> but not if the line is dirty or clean, so instead of overloading HITM
> define a new flag that indicates this type of transfer.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> Reviewed-by: Leo Yan <leo.yan@linaro.org>

Was this already merged on the ARM kernel tree?

- Arnaldo

> ---
>  tools/include/uapi/linux/perf_event.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index d37629dbad72..7b88bfd097dc 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -1310,7 +1310,7 @@ union perf_mem_data_src {
>  #define PERF_MEM_SNOOP_SHIFT	19
>  
>  #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
> -/* 1 free */
> +#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
>  #define PERF_MEM_SNOOPX_SHIFT  38
>  
>  /* locked instruction */
> -- 
> 2.32.0

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-05-10 16:28     ` Arnaldo Carvalho de Melo
@ 2022-05-11  2:20       ` Leo Yan
  -1 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-11  2:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

On Tue, May 10, 2022 at 01:28:38PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, May 04, 2022 at 06:48:47PM +0000, Ali Saidi wrote:
> > Add a flag to the perf mem data struct to signal that a request caused a
> > cache-to-cache transfer of a line from a peer of the requestor and
> > wasn't sourced from a lower cache level.  The line being moved from one
> > peer cache to another has latency and performance implications. On Arm64
> > Neoverse systems the data source can indicate a cache-to-cache transfer
> > but not if the line is dirty or clean, so instead of overloading HITM
> > define a new flag that indicates this type of transfer.
> > 
> > Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> > Reviewed-by: Leo Yan <leo.yan@linaro.org>
> 
> Was this already merged on the ARM kernel tree?

No, I don't think this patch has been merged into the Arm kernel tree.  I
searched the Arm and Arm64 git repos; neither has merged it.

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?qt=author&q=Ali+Saidi
http://git.armlinux.org.uk/cgit/linux-arm.git/log/?qt=author&q=Ali+Saidi

P.S. Ali forgot to include German's review tag, see:
https://lore.kernel.org/lkml/458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com/

Do you want us to resend the patch set with the tags added?

Thanks,
Leo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 1/5] perf: Add SNOOP_PEER flag to perf mem data struct
  2022-05-04 18:48   ` Ali Saidi
@ 2022-05-11  5:41     ` kajoljain
  -1 siblings, 0 replies; 44+ messages in thread
From: kajoljain @ 2022-05-11  5:41 UTC (permalink / raw)
  To: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, leo.yan, acme
  Cc: benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will



On 5/5/22 00:18, Ali Saidi wrote:
> Add a flag to the perf mem data struct to signal that a request caused a
> cache-to-cache transfer of a line from a peer of the requestor and
> wasn't sourced from a lower cache level.  The line being moved from one
> peer cache to another has latency and performance implications. On Arm64
> Neoverse systems the data source can indicate a cache-to-cache transfer
> but not if the line is dirty or clean, so instead of overloading HITM
> define a new flag that indicates this type of transfer.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> Reviewed-by: Leo Yan <leo.yan@linaro.org>
> ---
>  include/uapi/linux/perf_event.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d37629dbad72..7b88bfd097dc 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1310,7 +1310,7 @@ union perf_mem_data_src {
>  #define PERF_MEM_SNOOP_SHIFT	19
>  
>  #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
> -/* 1 free */
> +#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
>  #define PERF_MEM_SNOOPX_SHIFT  38
>  
>  /* locked instruction */

Patch looks good to me.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>

Thanks,
Kajol Jain
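
To make the quoted hunk concrete, the sketch below shows how a consumer might set and test the new flag on a raw 64-bit data-source value. The three PERF_MEM_SNOOPX_* definitions are copied from the hunk above; the helper names and the 2-bit field mask are assumptions made for illustration (the mask reflects mem_snoopx being a 2-bit field in union perf_mem_data_src), and real code would normally go through the union's bitfields instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the quoted perf_event.h hunk. */
#define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
#define PERF_MEM_SNOOPX_SHIFT	38

/* Assumption: mem_snoopx occupies two bits starting at the shift above. */
#define SKETCH_SNOOPX_MASK	UINT64_C(0x3)

/* Hypothetical helper: OR extended snoop flags into a raw data_src value. */
static uint64_t sketch_set_snoopx(uint64_t data_src, uint64_t flags)
{
	return data_src | ((flags & SKETCH_SNOOPX_MASK) << PERF_MEM_SNOOPX_SHIFT);
}

/* Hypothetical helper: true if the sample was a peer cache-to-cache transfer. */
static bool sketch_is_peer_xfer(uint64_t data_src)
{
	uint64_t snoopx = (data_src >> PERF_MEM_SNOOPX_SHIFT) & SKETCH_SNOOPX_MASK;

	return (snoopx & PERF_MEM_SNOOPX_PEER) != 0;
}
```

Since FWD and PEER are distinct bits, a sample can carry either or both, and testing for PEER says nothing about whether the line was clean or dirty, which is the reason given above for not overloading HITM.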

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-05-04 18:48   ` Ali Saidi
@ 2022-05-11  5:42     ` kajoljain
  -1 siblings, 0 replies; 44+ messages in thread
From: kajoljain @ 2022-05-11  5:42 UTC (permalink / raw)
  To: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, leo.yan, acme
  Cc: benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will



On 5/5/22 00:18, Ali Saidi wrote:
> Add a flag to the perf mem data struct to signal that a request caused a
> cache-to-cache transfer of a line from a peer of the requestor and
> wasn't sourced from a lower cache level.  The line being moved from one
> peer cache to another has latency and performance implications. On Arm64
> Neoverse systems the data source can indicate a cache-to-cache transfer
> but not if the line is dirty or clean, so instead of overloading HITM
> define a new flag that indicates this type of transfer.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> Reviewed-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/include/uapi/linux/perf_event.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index d37629dbad72..7b88bfd097dc 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -1310,7 +1310,7 @@ union perf_mem_data_src {
>  #define PERF_MEM_SNOOP_SHIFT	19
>  
>  #define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
> -/* 1 free */
> +#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
>  #define PERF_MEM_SNOOPX_SHIFT  38
>  
>  /* locked instruction */
Patch looks good to me.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>

Thanks,
Kajol Jain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 3/5] perf mem: Print snoop peer flag
  2022-05-04 18:48   ` Ali Saidi
@ 2022-05-11  5:45     ` kajoljain
  -1 siblings, 0 replies; 44+ messages in thread
From: kajoljain @ 2022-05-11  5:45 UTC (permalink / raw)
  To: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, leo.yan, acme
  Cc: benh, Nick.Forrington, alexander.shishkin, andrew.kilroy,
	james.clark, john.garry, jolsa, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will



On 5/5/22 00:18, Ali Saidi wrote:
> From: Leo Yan <leo.yan@linaro.org>
> 
> Since PERF_MEM_SNOOPX_PEER flag is a new snoop type, print this flag if
> it is set.
> 
> Before:
>        memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
> 
> After:
> 
>        memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
>        memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> Reviewed-by: Ali Saidi <alisaidi@amazon.com>
> Tested-by: Ali Saidi <alisaidi@amazon.com>

Patch looks good to me.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>

Thanks,
Kajol Jain

> ---
>  tools/perf/util/mem-events.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index efaf263464b9..db5225caaabe 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -410,6 +410,11 @@ static const char * const snoop_access[] = {
>  	"HitM",
>  };
>  
> +static const char * const snoopx_access[] = {
> +	"Fwd",
> +	"Peer",
> +};
> +
>  int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  {
>  	size_t i, l = 0;
> @@ -430,13 +435,20 @@ int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  		}
>  		l += scnprintf(out + l, sz - l, snoop_access[i]);
>  	}
> -	if (mem_info &&
> -	     (mem_info->data_src.mem_snoopx & PERF_MEM_SNOOPX_FWD)) {
> +
> +	m = 0;
> +	if (mem_info)
> +		m = mem_info->data_src.mem_snoopx;
> +
> +	for (i = 0; m && i < ARRAY_SIZE(snoopx_access); i++, m >>= 1) {
> +		if (!(m & 0x1))
> +			continue;
> +
>  		if (l) {
>  			strcat(out, " or ");
>  			l += 4;
>  		}
> -		l += scnprintf(out + l, sz - l, "Fwd");
> +		l += scnprintf(out + l, sz - l, snoopx_access[i]);
>  	}
>  
>  	if (*out == '\0')
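
As an illustration of the loop in the hunk above, here is a standalone
sketch that turns a snoopx bitmask into the string printed after "SNP".
The names snoopx_to_str and snoopx_names are hypothetical. The real
perf_mem__snp_scnprintf() also prints the mem_snoop bits first and uses
perf's scnprintf(); this sketch covers only the snoopx half and swaps in
snprintf() with an explicit truncation check. The "N/A" fallback is assumed
from the "SNP N/A" lines in the Before output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Names indexed by bit position, as in the snoopx_access[] table above. */
static const char * const snoopx_names[] = {
	"Fwd",	/* bit 0: PERF_MEM_SNOOPX_FWD */
	"Peer",	/* bit 1: PERF_MEM_SNOOPX_PEER */
};

#define N_SNOOPX (sizeof(snoopx_names) / sizeof(snoopx_names[0]))

/*
 * Hypothetical helper: write the names of all set bits in 'snoopx' to
 * 'out', joined by " or ". Returns the resulting string length.
 */
static size_t snoopx_to_str(char *out, size_t sz, unsigned int snoopx)
{
	size_t i, l = 0;
	int n;

	assert(out && sz > 0);
	out[0] = '\0';

	/* Walk the mask one bit at a time; stop early once no bits remain. */
	for (i = 0; snoopx && i < N_SNOOPX; i++, snoopx >>= 1) {
		if (!(snoopx & 0x1))
			continue;

		n = snprintf(out + l, sz - l, "%s%s",
			     l ? " or " : "", snoopx_names[i]);
		if (n < 0 || (size_t)n >= sz - l)
			return sz - 1;	/* truncated: buffer is full */
		l += (size_t)n;
	}

	/* No flag set: fall back to "N/A". */
	if (l == 0) {
		n = snprintf(out, sz, "N/A");
		l = ((size_t)n < sz) ? (size_t)n : sz - 1;
	}
	return l;
}
```

Unlike the patch, the sketch passes each name through a "%s" format rather
than using the table entry itself as the format string. That is a choice
made for the sketch, not a statement about the patch.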

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-05-11  2:20       ` Leo Yan
@ 2022-05-11 18:28         ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-11 18:28 UTC (permalink / raw)
  To: Leo Yan
  Cc: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

Em Wed, May 11, 2022 at 10:20:04AM +0800, Leo Yan escreveu:
> On Tue, May 10, 2022 at 01:28:38PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, May 04, 2022 at 06:48:47PM +0000, Ali Saidi escreveu:
> > > Add a flag to the perf mem data struct to signal that a request caused a
> > > cache-to-cache transfer of a line from a peer of the requestor and
> > > wasn't sourced from a lower cache level.  The line being moved from one
> > > peer cache to another has latency and performance implications. On Arm64
> > > Neoverse systems the data source can indicate a cache-to-cache transfer
> > > but not if the line is dirty or clean, so instead of overloading HITM
> > > define a new flag that indicates this type of transfer.
> > > 
> > > Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> > > Reviewed-by: Leo Yan <leo.yan@linaro.org>
> > 
> > Was this already merged on the ARM kernel tree?
> 
> No, I don't think this patch has been merged on Arm kernel tree.  I searched
> Arm and Arm64 git repos, none of them has merged this patch.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?qt=author&q=Ali+Saidi
> http://git.armlinux.org.uk/cgit/linux-arm.git/log/?qt=author&q=Ali+Saidi
> 
> P.s. Ali missed to include German's review tag, see:
> https://lore.kernel.org/lkml/458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com/
> 
> Do you want us to resend the patch set for adding tags?

I use b4, which should collect Reviewed-by, Acked-by, etc. tags. For
instance, if I use the message-id from your message:

⬢[acme@toolbox perf]$ b4 am -ctsl --cc-trailers 20220511022004.GA956170@leoy-ThinkPad-X240s
Looking up https://lore.kernel.org/r/20220511022004.GA956170%40leoy-ThinkPad-X240s
Grabbing thread from lore.kernel.org/all/20220511022004.GA956170%40leoy-ThinkPad-X240s/t.mbox.gz
Checking for newer revisions on https://lore.kernel.org/all/
Analyzing 12 messages in the thread
Checking attestation on all messages, may take a moment...
---
  ✓ [PATCH v8 1/5] perf: Add SNOOP_PEER flag to perf mem data struct
    + Reviewed-By: Kajol Jain<kjain@linux.ibm.com> (✓ DKIM/ibm.com)
    + Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    + Link: https://lore.kernel.org/r/20220504184850.24986-2-alisaidi@amazon.com
    + Cc: Nick.Forrington@arm.com
    + Cc: andrew.kilroy@arm.com
    + Cc: german.gomez@arm.com
    + Cc: james.clark@arm.com
    + Cc: mark.rutland@arm.com
    + Cc: john.garry@huawei.com
    + Cc: lihuafei1@huawei.com
    + Cc: peterz@infradead.org
    + Cc: benh@kernel.crashing.org
    + Cc: acme@kernel.org
    + Cc: jolsa@kernel.org
    + Cc: namhyung@kernel.org
    + Cc: will@kernel.org
    + Cc: mathieu.poirier@linaro.org
    + Cc: alexander.shishkin@linux.intel.com
    + Cc: linux-arm-kernel@lists.infradead.org
    + Cc: mingo@redhat.com
    + Cc: linux-kernel@vger.kernel.org
    + Cc: linux-perf-users@vger.kernel.org
  ✓ [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
    + Reviewed-By: Kajol Jain<kjain@linux.ibm.com> (✓ DKIM/ibm.com)
    + Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    + Link: https://lore.kernel.org/r/20220504184850.24986-3-alisaidi@amazon.com
    + Cc: Nick.Forrington@arm.com
    + Cc: andrew.kilroy@arm.com
    + Cc: german.gomez@arm.com
    + Cc: james.clark@arm.com
    + Cc: mark.rutland@arm.com
    + Cc: john.garry@huawei.com
    + Cc: lihuafei1@huawei.com
    + Cc: peterz@infradead.org
    + Cc: benh@kernel.crashing.org
    + Cc: acme@kernel.org
    + Cc: jolsa@kernel.org
    + Cc: namhyung@kernel.org
    + Cc: will@kernel.org
    + Cc: mathieu.poirier@linaro.org
    + Cc: alexander.shishkin@linux.intel.com
    + Cc: linux-arm-kernel@lists.infradead.org
    + Cc: mingo@redhat.com
    + Cc: linux-kernel@vger.kernel.org
    + Cc: linux-perf-users@vger.kernel.org
  ✓ [PATCH v8 3/5] perf mem: Print snoop peer flag
    + Reviewed-By: Kajol Jain<kjain@linux.ibm.com> (✓ DKIM/ibm.com)
    + Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    + Link: https://lore.kernel.org/r/20220504184850.24986-4-alisaidi@amazon.com
    + Cc: Nick.Forrington@arm.com
    + Cc: andrew.kilroy@arm.com
    + Cc: german.gomez@arm.com
    + Cc: james.clark@arm.com
    + Cc: mark.rutland@arm.com
    + Cc: john.garry@huawei.com
    + Cc: lihuafei1@huawei.com
    + Cc: peterz@infradead.org
    + Cc: benh@kernel.crashing.org
    + Cc: acme@kernel.org
    + Cc: jolsa@kernel.org
    + Cc: namhyung@kernel.org
    + Cc: will@kernel.org
    + Cc: mathieu.poirier@linaro.org
    + Cc: alexander.shishkin@linux.intel.com
    + Cc: linux-arm-kernel@lists.infradead.org
    + Cc: mingo@redhat.com
    + Cc: linux-kernel@vger.kernel.org
    + Cc: linux-perf-users@vger.kernel.org
  ✓ [PATCH v8 4/5] perf arm-spe: Don't set data source if it's not a memory operation
    + Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    + Link: https://lore.kernel.org/r/20220504184850.24986-5-alisaidi@amazon.com
    + Cc: Nick.Forrington@arm.com
    + Cc: andrew.kilroy@arm.com
    + Cc: james.clark@arm.com
    + Cc: mark.rutland@arm.com
    + Cc: john.garry@huawei.com
    + Cc: lihuafei1@huawei.com
    + Cc: peterz@infradead.org
    + Cc: benh@kernel.crashing.org
    + Cc: acme@kernel.org
    + Cc: jolsa@kernel.org
    + Cc: namhyung@kernel.org
    + Cc: will@kernel.org
    + Cc: mathieu.poirier@linaro.org
    + Cc: kjain@linux.ibm.com
    + Cc: alexander.shishkin@linux.intel.com
    + Cc: linux-arm-kernel@lists.infradead.org
    + Cc: mingo@redhat.com
    + Cc: linux-kernel@vger.kernel.org
    + Cc: linux-perf-users@vger.kernel.org
  ✓ [PATCH v8 5/5] perf arm-spe: Use SPE data source for neoverse cores
    + Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    + Link: https://lore.kernel.org/r/20220504184850.24986-6-alisaidi@amazon.com
    + Cc: Nick.Forrington@arm.com
    + Cc: andrew.kilroy@arm.com
    + Cc: james.clark@arm.com
    + Cc: mark.rutland@arm.com
    + Cc: john.garry@huawei.com
    + Cc: lihuafei1@huawei.com
    + Cc: peterz@infradead.org
    + Cc: benh@kernel.crashing.org
    + Cc: acme@kernel.org
    + Cc: jolsa@kernel.org
    + Cc: namhyung@kernel.org
    + Cc: will@kernel.org
    + Cc: mathieu.poirier@linaro.org
    + Cc: kjain@linux.ibm.com
    + Cc: alexander.shishkin@linux.intel.com
    + Cc: linux-arm-kernel@lists.infradead.org
    + Cc: mingo@redhat.com
    + Cc: linux-kernel@vger.kernel.org
    + Cc: linux-perf-users@vger.kernel.org
  ---
  ✓ Signed: DKIM/amazon.com
---
Total patches: 5
---
Cover: ./v8_20220504_alisaidi_perf_arm_spe_decode_spe_source_and_use_for_perf_c2c.cover
 Link: https://lore.kernel.org/r/20220504184850.24986-1-alisaidi@amazon.com
 Base: not specified
       git am ./v8_20220504_alisaidi_perf_arm_spe_decode_spe_source_and_use_for_perf_c2c.mbx
⬢[acme@toolbox perf]$

Somehow German's tag is not being collected... :-\

Not even when I use:

> P.s. Ali missed to include German's review tag, see:
> https://lore.kernel.org/lkml/458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com/


458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com

Will try updating b4...

- Arnaldo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  2022-05-11 18:28         ` Arnaldo Carvalho de Melo
@ 2022-05-11 18:29           ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-11 18:29 UTC (permalink / raw)
  To: Leo Yan
  Cc: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

Em Wed, May 11, 2022 at 03:28:00PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, May 11, 2022 at 10:20:04AM +0800, Leo Yan escreveu:
> > On Tue, May 10, 2022 at 01:28:38PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, May 04, 2022 at 06:48:47PM +0000, Ali Saidi escreveu:
> > > > Add a flag to the perf mem data struct to signal that a request caused a
> > > > cache-to-cache transfer of a line from a peer of the requestor and
> > > > wasn't sourced from a lower cache level.  The line being moved from one
> > > > peer cache to another has latency and performance implications. On Arm64
> > > > Neoverse systems the data source can indicate a cache-to-cache transfer
> > > > but not if the line is dirty or clean, so instead of overloading HITM
> > > > define a new flag that indicates this type of transfer.
> > > > 
> > > > Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> > > > Reviewed-by: Leo Yan <leo.yan@linaro.org>
> > > 
> > > Was this already merged on the ARM kernel tree?
> > 
> > No, I don't think this patch has been merged on Arm kernel tree.  I searched
> > Arm and Arm64 git repos, none of them has merged this patch.
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?qt=author&q=Ali+Saidi
> > http://git.armlinux.org.uk/cgit/linux-arm.git/log/?qt=author&q=Ali+Saidi
> > 
> > P.s. Ali missed to include German's review tag, see:
> > https://lore.kernel.org/lkml/458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com/
> > 
> > Do you want us to resend the patch set for adding tags?
> 
> I use b4, which should collect Reviewed-by, Acked-by, etc. tags. For
> instance, if I use the message-id from your message:
> 
> ⬢[acme@toolbox perf]$ b4 am -ctsl --cc-trailers 20220511022004.GA956170@leoy-ThinkPad-X240s
> Looking up https://lore.kernel.org/r/20220511022004.GA956170%40leoy-ThinkPad-X240s
> Grabbing thread from lore.kernel.org/all/20220511022004.GA956170%40leoy-ThinkPad-X240s/t.mbox.gz
> Checking for newer revisions on https://lore.kernel.org/all/
> Analyzing 12 messages in the thread
<SNIP>
>   ✓ [PATCH v8 5/5] perf arm-spe: Use SPE data source for neoverse cores
>     + Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>     + Link: https://lore.kernel.org/r/20220504184850.24986-6-alisaidi@amazon.com
>     + Cc: Nick.Forrington@arm.com
>     + Cc: andrew.kilroy@arm.com
>     + Cc: james.clark@arm.com
>     + Cc: mark.rutland@arm.com
>     + Cc: john.garry@huawei.com
>     + Cc: lihuafei1@huawei.com
>     + Cc: peterz@infradead.org
>     + Cc: benh@kernel.crashing.org
>     + Cc: acme@kernel.org
>     + Cc: jolsa@kernel.org
>     + Cc: namhyung@kernel.org
>     + Cc: will@kernel.org
>     + Cc: mathieu.poirier@linaro.org
>     + Cc: kjain@linux.ibm.com
>     + Cc: alexander.shishkin@linux.intel.com
>     + Cc: linux-arm-kernel@lists.infradead.org
>     + Cc: mingo@redhat.com
>     + Cc: linux-kernel@vger.kernel.org
>     + Cc: linux-perf-users@vger.kernel.org
>   ---
>   ✓ Signed: DKIM/amazon.com
> ---
> Total patches: 5
> ---
> Cover: ./v8_20220504_alisaidi_perf_arm_spe_decode_spe_source_and_use_for_perf_c2c.cover
>  Link: https://lore.kernel.org/r/20220504184850.24986-1-alisaidi@amazon.com
>  Base: not specified
>        git am ./v8_20220504_alisaidi_perf_arm_spe_decode_spe_source_and_use_for_perf_c2c.mbx
> ⬢[acme@toolbox perf]$
> 
> Somehow it is not being collected... :-\
> 
> Not even when I use:
> 
> > P.s. Ali missed to include German's review tag, see:
> > https://lore.kernel.org/lkml/458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com/
> 
> 
> 458a2de1-dc93-7e2d-5dc5-fbcd670572b6@arm.com
> 
> Will try updating b4...

Updating b4 didn't help, so please collect the new tags and resubmit.

- Arnaldo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-23 17:24           ` Joe Mario
@ 2022-05-26 14:44             ` Leo Yan
  -1 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-26 14:44 UTC (permalink / raw)
  To: Joe Mario
  Cc: Arnaldo Carvalho de Melo, Ali Saidi, linux-kernel,
	linux-perf-users, linux-arm-kernel, german.gomez, benh,
	Nick.Forrington, alexander.shishkin, andrew.kilroy, james.clark,
	john.garry, Jiri Olsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

Hi Joe,

On Mon, May 23, 2022 at 01:24:32PM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> I figured out why my output was different than yours.
> 
> I did not have the slang-devel rpm installed on the host system.  
> 
> In my original perf build, I missed this output in the build log:
>  > slang not found, disables TUI support. Please install slang-devel, libslang-dev or libslang2-dev
> 
> Once I installed slang-devel, rebuilt perf, and then reran my test, the pareto output looked fine.
> 
> When the TUI support is disabled, it shouldn't corrupt the resulting stdio output.  I don't believe this has anything to do with your commits.  

Thanks for taking the time to hunt down this issue.  I checked the code and
sent out a patch to fix the stdio interface when the slang library is not
installed.  Please see the patch:

https://lore.kernel.org/lkml/20220526143917.607928-1-leo.yan@linaro.org/T/#u

> Last, it looks like you should update the help text for the display flag options to reflect your new peer option.
> Currently it says:
>    -d, --display <Switch HITM output type>
>                           lcl,rmt
> 
> But since you added the "peer" display, shouldn't the output for that help text state:
>    -d, --display <Switch HITM output type>
>                           lcl,rmt,peer

Yeah, will fix.

I really appreciate your detailed testing and suggestions.

Leo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-22  6:15         ` Leo Yan
@ 2022-05-23 17:24           ` Joe Mario
  -1 siblings, 0 replies; 44+ messages in thread
From: Joe Mario @ 2022-05-23 17:24 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Ali Saidi, linux-kernel,
	linux-perf-users, linux-arm-kernel, german.gomez, benh,
	Nick.Forrington, alexander.shishkin, andrew.kilroy, james.clark,
	john.garry, Jiri Olsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will



On 5/22/22 2:15 AM, Leo Yan wrote:
> Hi Joe,
> 
> On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:
> 
> [SNIP]
> 
>> Last Comment:
>> There's a row in the Pareto table that has incorrect column alignment.
>> Look at row 80 below in the truncated snippet of output.  It has an extra field inserted in it at the beginning.
>> I also show what the corrected output should look like.
>>
>> Incorrect row 80:
>>     71	=================================================
>>     72	      Shared Cache Line Distribution Pareto      
>>     73	=================================================
>>     74	#
>>     75	# ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------                      
>>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt        Code address
>>     77	# .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
>>     78	#
>>     79	  -------------------------------------------------------------------------------
>>     80	      0        0        0     4648        0        0    11572            0x422140
>>     81	  -------------------------------------------------------------------------------
>>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
>>
>>
>> Corrected row 80:
>>     71	=================================================
>>     72	      Shared Cache Line Distribution Pareto      
>>     73	=================================================
>>     74	#
>>     75	# ----- HITM -----    Snoop  ------- Store Refs -----   ------- CL --------                       
>>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A     Off  Node  PA cnt        Code address
>>     77	# .......  .......  .......  .......  .......  ......   .....  ....  ......  ..................
>>     78	#
>>     79	  -------------------------------------------------------------------------------
>>     80	       0        0     4648        0        0    11572            0x422140
>>     81	  -------------------------------------------------------------------------------
>>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
> 
> Hmm...  On my side, I used the command below to output the pareto view, but I
> cannot see the column "CL"; the column "CL" is only shown in TUI
> mode, not in "--stdio" mode.  Could you share how to
> reproduce this issue?

Hi Leo:
I figured out why my output was different than yours.

I did not have the slang-devel rpm installed on the host system.  

In my original perf build, I missed this output in the build log:
 > slang not found, disables TUI support. Please install slang-devel, libslang-dev or libslang2-dev

Once I installed slang-devel, rebuilt perf, and then reran my test, the pareto output looked fine.

When the TUI support is disabled, it shouldn't corrupt the resulting stdio output.  I don't believe this has anything to do with your commits.  

Last, it looks like you should update the help text for the display flag options to reflect your new peer option.
Currently it says:
   -d, --display <Switch HITM output type>
                          lcl,rmt

But since you added the "peer" display, shouldn't the output for that help text state:
   -d, --display <Switch HITM output type>
                          lcl,rmt,peer
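
A minimal sketch in C of keeping the accepted values and the help string side by side, so that adding "peer" updates both.  This is not the perf source: the enum, parse_display() and display_help are hypothetical names used only for illustration.

```c
#include <string.h>

/* Hypothetical display modes; the names are illustrative, not perf's own. */
enum display_mode {
	DISPLAY_INVALID = -1,
	DISPLAY_LCL,
	DISPLAY_RMT,
	DISPLAY_PEER,
};

/* Values the help text advertises, kept next to the parser below. */
static const char display_help[] = "lcl,rmt,peer";

/* Map a --display argument to a mode; unknown strings are rejected. */
static enum display_mode parse_display(const char *str)
{
	if (!str)
		return DISPLAY_INVALID;
	if (!strcmp(str, "lcl"))
		return DISPLAY_LCL;
	if (!strcmp(str, "rmt"))
		return DISPLAY_RMT;
	if (!strcmp(str, "peer"))
		return DISPLAY_PEER;
	return DISPLAY_INVALID;
}
```

With the string and the parser in one place, a value that is parsed but missing from the help text is easier to spot in review.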

Joe


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-19 15:16       ` Joe Mario
@ 2022-05-22  6:15         ` Leo Yan
  -1 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-22  6:15 UTC (permalink / raw)
  To: Joe Mario
  Cc: Arnaldo Carvalho de Melo, Ali Saidi, linux-kernel,
	linux-perf-users, linux-arm-kernel, german.gomez, benh,
	Nick.Forrington, alexander.shishkin, andrew.kilroy, james.clark,
	john.garry, Jiri Olsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

Hi Joe,

On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> Thanks for getting this working on ARM.  I do have a few comments.
> 
> I built and ran this on a ARM Neoverse-N1 system with 2 numa nodes.  
> 
> Comment 1:
> When I run "perf c2c report", the "Node" field is marked "N/A".  It's supposed to show the numa node where the data address for the cacheline resides.  That's important both to see what node hot data resides on and if that data is getting lots of cross-numa accesses. 

Good catch.  Will fix it.

> Comment 2:
> I'm assuming you're identifying the contended cachelines using the "peer" load response, which indicates the load was resolved from a "peer" cpu's cacheline.  Please confirm.

Yeah, "peer" is ambiguous.  AFAIK, a "peer" load can come from:
- The local node, from a peer CPU's cache (in the same cluster or a peer cluster);
- A remote node, from a CPU's cache line, or even from *remote DRAM*.

> If that's true, is it possible to identify if that "peer" response was on the local or remote numa node?  

Good point.  Yes, we can do this.  So far, remote accesses are
accounted in the metric "rmt_hit", which should be the same as the
remote peer load; but we have no metric yet to account local
peer loads.  It would not be hard to add a metric "lcl_peer".
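
To make the local/remote split concrete, here is a minimal sketch in C.  It is not the perf implementation: the structs, their fields, and account_peer() are assumptions made up for illustration, and "lcl_peer" is only the metric name proposed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sample record; perf's real data-source encoding differs. */
struct sample_src {
	bool snoop_peer;	/* load was served from a peer CPU's cache */
	bool remote;		/* data came from another NUMA node */
};

/* Hypothetical counters for the two kinds of peer load. */
struct peer_stats {
	uint64_t lcl_peer;
	uint64_t rmt_peer;
};

/* Count a sample as a local or remote peer load; ignore non-peer loads. */
static void account_peer(struct peer_stats *st, const struct sample_src *s)
{
	if (!s->snoop_peer)
		return;
	if (s->remote)
		st->rmt_peer++;
	else
		st->lcl_peer++;
}
```

The only point is that the split needs one extra bit per sample (local versus remote) alongside the peer flag.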

> I ask because being able to identify both local and remote HitM's on Intel X86_64 has been quite valuable.  That's because remote HitM's are costly and because it helps the viewer see if they need to optimize their cpu affinity or what node their hot data resides on.

Thanks a lot for the info.  This means I should at least refine the shared
cache line distribution pareto for remote peer accesses; I will do some
experiments for the enhancement.

> Last Comment:
> There's a row in the Pareto table that has incorrect column alignment.
> Look at row 80 below in the truncated snippet of output.  It has an extra field inserted in it at the beginning.
> I also show what the corrected output should look like.
> 
> Incorrect row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------                      
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	      0        0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
> 
> 
> Corrected row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs -----   ------- CL --------                       
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A     Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  ......   .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	       0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038

Hmm...  On my side, I used the command below to output the pareto view, but I
cannot see the column "CL"; the column "CL" is only shown in TUI
mode, not in "--stdio" mode.  Could you share how to
reproduce this issue?

$ ./perf c2c report -i perf.data.v3 -N

=================================================
      Shared Cache Line Distribution Pareto      
=================================================
#
#        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                              
#   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object             Source:Line  Node{cpus %peers %stores}
# .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  ......................  ....
#
  -------------------------------------------------------------------------------
      0        0        0    56183        0        0    26534            0x420180
  -------------------------------------------------------------------------------
           0.00%    0.00%   99.85%    0.00%    0.00%    0.00%                 0x0   N/A       0            0x400bd0         0         0      1587      4034   188785         2  [.] 0x0000000000000bd0  false_sharing.exe  false_sharing.exe[bd0]   0{ 1  87.4%    n/a}  1{ 1  12.6%    n/a}
           0.00%    0.00%    0.00%    0.00%    0.00%   54.56%                 0x0   N/A       0            0x400bd4         0         0         0         0    14476         2  [.] 0x0000000000000bd4  false_sharing.exe  false_sharing.exe[bd4]   0{ 1    n/a   0.2%}  1{ 1    n/a  99.8%}
           0.00%    0.00%    0.00%    0.00%    0.00%   45.44%                 0x0   N/A       0            0x400bf8         0         0         0         0    12058         2  [.] 0x0000000000000bf8  false_sharing.exe  false_sharing.exe[bf8]   0{ 1    n/a  70.3%}  1{ 1    n/a  29.7%}
           0.00%    0.00%    0.15%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c64         0         0      2462      2451     4835         2  [.] 0x0000000000000c64  false_sharing.exe  false_sharing.exe[c64]   0{ 1  11.9%    n/a}  1{ 1  88.1%    n/a}

  -------------------------------------------------------------------------------
      1        0        0     2571        0        0    69861            0x420100
  -------------------------------------------------------------------------------
           0.00%    0.00%    0.00%    0.00%    0.00%  100.00%                 0x8   N/A       0            0x400c08         0         0         0         0    69861         2  [.] 0x0000000000000c08  false_sharing.exe  false_sharing.exe[c08]   0{ 1    n/a  62.1%}  1{ 1    n/a  37.9%}
           0.00%    0.00%  100.00%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c74         0         0       834       641     6576         2  [.] 0x0000000000000c74  false_sharing.exe  false_sharing.exe[c74]   0{ 1  93.2%    n/a}  1{ 1   6.8%    n/a}

I really appreciate your testing and suggestions!

Leo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-05-22  6:15         ` Leo Yan
  0 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-22  6:15 UTC (permalink / raw)
  To: Joe Mario
  Cc: Arnaldo Carvalho de Melo, Ali Saidi, linux-kernel,
	linux-perf-users, linux-arm-kernel, german.gomez, benh,
	Nick.Forrington, alexander.shishkin, andrew.kilroy, james.clark,
	john.garry, Jiri Olsa, kjain, lihuafei1, mark.rutland,
	mathieu.poirier, mingo, namhyung, peterz, will

Hi Joe,

On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> Thanks for getting this working on ARM.  I do have a few comments.
> 
> I built and ran this on a ARM Neoverse-N1 system with 2 numa nodes.  
> 
> Comment 1:
> When I run "perf c2c report", the "Node" field is marked "N/A".  It's supposed to show the numa node where the data address for the cacheline resides.  That's important both to see what node hot data resides on and if that data is getting lots of cross-numa accesses. 

Good catching.  Will fix it.

> Comment 2:
> I'm assuming you're identifying the contended cachelines using the "peer" load response, which indicates the load was resolved from a "peer" cpu's cacheline.  Please confirm.

Yeah, "peer" is ambiguous.  AFAIK, "peer" load can come from:
- Local node which in peer CPU's cache (can be same cluster or peer cluster);
- Remove ndoe which in CPU's cache line, or even from *remote DRAM*.

> If that's true, is it possible to identify if that "peer" response was on the local or remote numa node?  

Good point.  Yes, we can do this.  So far, the remote accesses are
accounted in the metric "rmt_hit", it should be same with the
remote peer load; but so far we have no a metric to account local
peer loads, it would be not hard to add metric "lcl_peer".

> I ask because being able to identify both local and remote HitM's on Intel X86_64 has been quite valuable.  That's because remote HitM's are costly and because it helps the viewer see if they need to optimize their cpu affinity or what node their hot data resides on.

Thanks a lot for the info.  This means at least I should refine the shared
cache line distribution pareto for remote peer access, will do some
experiment for the enhancement.

> Last Comment:
> There's a row in the Pareto table that has incorrect column alignment.
> Look at row 80 below in the truncated snipit of output.  It has an extra field inserted in it at the beginning.
> I also show what the corrected output should look like.
> 
> Incorrect row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------                      
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	      0        0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
> 
> 
> Corrected row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs -----   ------- CL --------                       
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A     Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  ......   .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	       0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038

Hmm‥.  At my side, I used below command to output pareto view, but I
cannot see the conlumn "CL", the conlumn "CL" is only shown for TUI
mode but not for the mode "--stdio".  Could you share the method for
how to reproduce this issue?

$ ./perf c2c report -i perf.data.v3 -N

=================================================
      Shared Cache Line Distribution Pareto      
=================================================
#
#        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                              
#   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object             Source:Line  Node{cpus %peers %stores}
# .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  ......................  ....
#
  -------------------------------------------------------------------------------
      0        0        0    56183        0        0    26534            0x420180
  -------------------------------------------------------------------------------
           0.00%    0.00%   99.85%    0.00%    0.00%    0.00%                 0x0   N/A       0            0x400bd0         0         0      1587      4034   188785         2  [.] 0x0000000000000bd0  false_sharing.exe  false_sharing.exe[bd0]   0{ 1  87.4%    n/a}  1{ 1  12.6%    n/a}
           0.00%    0.00%    0.00%    0.00%    0.00%   54.56%                 0x0   N/A       0            0x400bd4         0         0         0         0    14476         2  [.] 0x0000000000000bd4  false_sharing.exe  false_sharing.exe[bd4]   0{ 1    n/a   0.2%}  1{ 1    n/a  99.8%}
           0.00%    0.00%    0.00%    0.00%    0.00%   45.44%                 0x0   N/A       0            0x400bf8         0         0         0         0    12058         2  [.] 0x0000000000000bf8  false_sharing.exe  false_sharing.exe[bf8]   0{ 1    n/a  70.3%}  1{ 1    n/a  29.7%}
           0.00%    0.00%    0.15%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c64         0         0      2462      2451     4835         2  [.] 0x0000000000000c64  false_sharing.exe  false_sharing.exe[c64]   0{ 1  11.9%    n/a}  1{ 1  88.1%    n/a}

  -------------------------------------------------------------------------------
      1        0        0     2571        0        0    69861            0x420100
  -------------------------------------------------------------------------------
           0.00%    0.00%    0.00%    0.00%    0.00%  100.00%                 0x8   N/A       0            0x400c08         0         0         0         0    69861         2  [.] 0x0000000000000c08  false_sharing.exe  false_sharing.exe[c08]   0{ 1    n/a  62.1%}  1{ 1    n/a  37.9%}
           0.00%    0.00%  100.00%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c74         0         0       834       641     6576         2  [.] 0x0000000000000c74  false_sharing.exe  false_sharing.exe[c74]   0{ 1  93.2%    n/a}  1{ 1   6.8%    n/a}
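
As a cross-check on how to read the percentages above (a sketch, not part of perf itself): each row's value appears to be that code address's share of the cache line's column total shown in the summary row. For the store-only rows of cache line 0x420180, the "Total records" counts (14476 and 12058) add up to the 26534 stores in the N/A column, so the shares can be recomputed by hand. The helper name share_pct is made up for illustration.

```shell
#!/bin/sh
# Share of a cache line's column total attributed to one code address.
# $1 = per-address count, $2 = cache-line total (both taken from the report)
share_pct() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "n/a"; exit }
        printf "%.2f%%\n", 100 * part / total
    }'
}

# Store rows for cache line 0x420180 (26534 N/A stores in total).
share_pct 14476 26534   # code address 0x400bd4 -> 54.56%
share_pct 12058 26534   # code address 0x400bf8 -> 45.44%
```

The two results match the N/A store column of the report, which suggests the per-row percentages are relative to the cache line and not to the whole run.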

I really appreciate your testing and suggestions!

Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-18  4:16     ` Leo Yan
@ 2022-05-19 15:16       ` Joe Mario
  -1 siblings, 0 replies; 44+ messages in thread
From: Joe Mario @ 2022-05-19 15:16 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo
  Cc: Ali Saidi, linux-kernel, linux-perf-users, linux-arm-kernel,
	german.gomez, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, Jiri Olsa, kjain,
	lihuafei1, mark.rutland, mathieu.poirier, mingo, namhyung,
	peterz, will



On 5/18/22 12:16 AM, Leo Yan wrote:
> Hi Joe,
> 
> On Tue, May 17, 2022 at 06:20:03PM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Tue, May 17, 2022 at 02:03:21AM +0000, Ali Saidi escreveu:
>>> When synthesizing data from SPE, augment the type with source information
>>> for Arm Neoverse cores so we can detect situations like cache line
>>> contention and transfers on Arm platforms.
>>>
>>> This change enables future changes to c2c on a system with SPE where lines that
>>> are shared among multiple cores show up in perf c2c output.
>>>
>>> Changes in v9:
>>>  * Change reporting of remote socket data which should make Leo's upcoming
>>>    patch set for c2c make sense on multi-socket platforms
>>
>> Hey,
>>
>> 	Joe Mario, who is one of the 'perf c2c' authors, asked me about a
>> git tree he could clone for building both the kernel and
>> tools/perf/ so that he could run tests. Can you please provide that?
> 
> I have uploaded the latest patches for enabling 'perf c2c' on Arm SPE
> on the repo:
> 
> https://git.linaro.org/people/leo.yan/linux-spe.git branch: perf_c2c_arm_spe_peer_v3
> 
> Below are quick notes for building the kernel with Arm SPE enabled:
> 
>   $ git clone -b perf_c2c_arm_spe_peer_v3 https://git.linaro.org/people/leo.yan/linux-spe.git
> 
>   Or
> 
>   $ git clone -b perf_c2c_arm_spe_peer_v3 ssh://git@git.linaro.org/people/leo.yan/linux-spe.git
> 
>   $ cd linux-spe
> 
>   # Build kernel
>   $ make defconfig
>   $ ./scripts/config -e CONFIG_PID_IN_CONTEXTIDR
>   $ ./scripts/config -e CONFIG_ARM_SPE_PMU
>   $ make Image
> 
>   # Build perf
>   $ cd tools/perf
>   $ make VF=1 DEBUG=1
> 
> When booting the kernel, please add the option "kpti=off" to the kernel command
> line; you might need to update the grub menu for this.
> 
> Please feel free to let us know if anything is unclear.
> 
> Thank you,
> Leo
> 

Hi Leo:
Thanks for getting this working on ARM.  I do have a few comments.

I built and ran this on an Arm Neoverse-N1 system with 2 NUMA nodes.

Comment 1:
When I run "perf c2c report", the "Node" field is marked "N/A".  It's supposed to show the NUMA node where the data address for the cacheline resides.  That's important both to see which node hot data resides on and whether that data is getting lots of cross-NUMA accesses.

Comment 2:
I'm assuming you're identifying the contended cachelines using the "peer" load response, which indicates the load was resolved from a "peer" CPU's cacheline.  Please confirm.
If that's true, is it possible to identify whether that "peer" response came from the local or remote NUMA node?

I ask because being able to identify both local and remote HitMs on Intel x86_64 has been quite valuable.  Remote HitMs are costly, and the distinction helps the viewer see whether they need to optimize their CPU affinity or change which node their hot data resides on.

Last Comment:
There's a row in the Pareto table that has incorrect column alignment.
Look at row 80 below in the truncated snippet of output.  It has an extra field inserted at the beginning.
I also show what the corrected output should look like.

Incorrect row 80:
    71	=================================================
    72	      Shared Cache Line Distribution Pareto      
    73	=================================================
    74	#
    75	# ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------                      
    76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt        Code address
    77	# .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
    78	#
    79	  -------------------------------------------------------------------------------
    80	      0        0        0     4648        0        0    11572            0x422140
    81	  -------------------------------------------------------------------------------
    82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
    83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
    84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
    85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038


Corrected row 80:
    71	=================================================
    72	      Shared Cache Line Distribution Pareto      
    73	=================================================
    74	#
    75	# ----- HITM -----    Snoop  ------- Store Refs -----   ------- CL --------                       
    76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A     Off  Node  PA cnt        Code address
    77	# .......  .......  .......  .......  .......  ......   .....  ....  ......  ..................
    78	#
    79	  -------------------------------------------------------------------------------
    80	       0        0     4648        0        0    11572            0x422140
    81	  -------------------------------------------------------------------------------
    82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
    83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
    84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
    85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
       
Thanks again for doing this.
Joe


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-17 21:20   ` Arnaldo Carvalho de Melo
@ 2022-05-18  4:16     ` Leo Yan
  -1 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-18  4:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ali Saidi, Joe Mario, linux-kernel, linux-perf-users,
	linux-arm-kernel, german.gomez, benh, Nick.Forrington,
	alexander.shishkin, andrew.kilroy, james.clark, john.garry,
	Jiri Olsa, kjain, lihuafei1, mark.rutland, mathieu.poirier,
	mingo, namhyung, peterz, will

Hi Joe,

On Tue, May 17, 2022 at 06:20:03PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 17, 2022 at 02:03:21AM +0000, Ali Saidi escreveu:
> > When synthesizing data from SPE, augment the type with source information
> > for Arm Neoverse cores so we can detect situations like cache line
> > contention and transfers on Arm platforms.
> > 
> > This change enables future changes to c2c on a system with SPE where lines that
> > are shared among multiple cores show up in perf c2c output.
> > 
> > Changes in v9:
> >  * Change reporting of remote socket data which should make Leo's upcoming
> >    patch set for c2c make sense on multi-socket platforms
> 
> Hey,
> 
> 	Joe Mario, who is one of the 'perf c2c' authors, asked me about a
> git tree he could clone for building both the kernel and
> tools/perf/ so that he could run tests. Can you please provide that?

I have uploaded the latest patches for enabling 'perf c2c' on Arm SPE
on the repo:

https://git.linaro.org/people/leo.yan/linux-spe.git branch: perf_c2c_arm_spe_peer_v3

Below are quick notes for building the kernel with Arm SPE enabled:

  $ git clone -b perf_c2c_arm_spe_peer_v3 https://git.linaro.org/people/leo.yan/linux-spe.git

  Or

  $ git clone -b perf_c2c_arm_spe_peer_v3 ssh://git@git.linaro.org/people/leo.yan/linux-spe.git

  $ cd linux-spe

  # Build kernel
  $ make defconfig
  $ ./scripts/config -e CONFIG_PID_IN_CONTEXTIDR
  $ ./scripts/config -e CONFIG_ARM_SPE_PMU
  $ make Image

  # Build perf
  $ cd tools/perf
  $ make VF=1 DEBUG=1

When booting the kernel, please add the option "kpti=off" to the kernel command
line; you might need to update the grub menu for this.

Please feel free to let us know if anything is unclear.
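
Once the machine has rebooted into the new kernel, a quick sanity check along these lines may help confirm the setup. This is only a sketch: it assumes the SPE PMU shows up under /sys/bus/event_source/devices/ with a name starting with arm_spe_, and the function names has_kpti_off and check_spe_ready are illustrative, not part of perf or the kernel tree.

```shell
#!/bin/sh
# Return success if the given kernel command line contains kpti=off
# as a separate, whitespace-delimited option.
has_kpti_off() {
    case " $1 " in
        *" kpti=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the running system looks ready for SPE profiling.
check_spe_ready() {
    cmdline=$(cat /proc/cmdline 2>/dev/null)
    if has_kpti_off "$cmdline"; then
        echo "kpti=off: present"
    else
        echo "kpti=off: missing from kernel command line"
    fi
    # Assumption: the SPE PMU driver registers devices named arm_spe_<n>.
    if ls /sys/bus/event_source/devices/ 2>/dev/null | grep -q '^arm_spe_'; then
        echo "SPE PMU: found"
    else
        echo "SPE PMU: not found"
    fi
}

check_spe_ready
```

If the PMU is reported as not found, it is worth re-checking that CONFIG_ARM_SPE_PMU ended up enabled in the built kernel's .config.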

Thank you,
Leo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-17 21:20   ` Arnaldo Carvalho de Melo
@ 2022-05-18  1:06     ` Leo Yan
  -1 siblings, 0 replies; 44+ messages in thread
From: Leo Yan @ 2022-05-18  1:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ali Saidi, Joe Mario, linux-kernel, linux-perf-users,
	linux-arm-kernel, german.gomez, benh, Nick.Forrington,
	alexander.shishkin, andrew.kilroy, james.clark, john.garry,
	Jiri Olsa, kjain, lihuafei1, mark.rutland, mathieu.poirier,
	mingo, namhyung, peterz, will

Hi Arnaldo,

On Tue, May 17, 2022 at 06:20:03PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 17, 2022 at 02:03:21AM +0000, Ali Saidi escreveu:
> > When synthesizing data from SPE, augment the type with source information
> > for Arm Neoverse cores so we can detect situations like cache line
> > contention and transfers on Arm platforms.
> > 
> > This change enables future changes to c2c on a system with SPE where lines that
> > are shared among multiple cores show up in perf c2c output.
> > 
> > Changes in v9:
> >  * Change reporting of remote socket data which should make Leo's upcoming
> >    patch set for c2c make sense on multi-socket platforms
> 
> Hey,
> 
> 	Joe Mario, who is one of the 'perf c2c' authors, asked me about a
> git tree he could clone for building both the kernel and
> tools/perf/ so that he could run tests. Can you please provide that?

Sure, I will prepare a git tree for testing and share with Joe.

> thanks!

Thanks also for the reminder.

Leo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
  2022-05-17  2:03 ` Ali Saidi
@ 2022-05-17 21:20   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-17 21:20 UTC (permalink / raw)
  To: Ali Saidi, Joe Mario
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, Jiri Olsa, kjain,
	lihuafei1, mark.rutland, mathieu.poirier, mingo, namhyung,
	peterz, will

Em Tue, May 17, 2022 at 02:03:21AM +0000, Ali Saidi escreveu:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores so we can detect situations like cache line
> contention and transfers on Arm platforms.
> 
> This change enables future changes to c2c on a system with SPE where lines that
> are shared among multiple cores show up in perf c2c output.
> 
> Changes in v9:
>  * Change reporting of remote socket data which should make Leo's upcoming
>    patch set for c2c make sense on multi-socket platforms

Hey,

	Joe Mario, who is one of the 'perf c2c' authors, asked me about a
git tree he could clone for building both the kernel and
tools/perf/ so that he could run tests. Can you please provide that?

thanks!

- Arnaldo
 
> Changes in v8:
>  * Report NA for both mem_lvl and mem_lvl_num for stores where we have no
>    information
> 
> Changes in v7:
>  * Minor change requested by Leo Yan
> 
> Changes in v6:
>   * Drop changes to c2c command which will come from Leo Yan
> 
> Changes in v5:
>   * Add a new snooping type to disambiguate cache-to-cache transfers where
>     we don't know if the data is clean or dirty.
>   * Set snoop flags on all the data-source cases
>   * Special case stores as we have no information on them
> 
> Changes in v4:
>   * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
>   * Add neoverse-v1 to the neoverse cores list
> 
> Ali Saidi (4):
>   tools: arm64: Import cputype.h
>   perf arm-spe: Use SPE data source for neoverse cores
>   perf mem: Support mem_lvl_num in c2c command
>   perf mem: Support HITM for when mem_lvl_num is any
> 
>  tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
>  .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
>  tools/perf/util/arm-spe.c                     | 110 +++++++-
>  tools/perf/util/mem-events.c                  |  20 +-
>  5 files changed, 383 insertions(+), 18 deletions(-)
>  create mode 100644 tools/arch/arm64/include/asm/cputype.h
> 
> -- 
> 2.32.0

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-05-17  2:03 ` Ali Saidi
  0 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-17  2:03 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situations like cache line
contention and transfers on Arm platforms.

This change enables future changes to c2c on a system with SPE where lines that
are shared among multiple cores show up in perf c2c output.

Changes in v9:
 * Change reporting of remote socket data which should make Leo's upcoming
   patch set for c2c make sense on multi-socket platforms

Changes in v8:
 * Report NA for both mem_lvl and mem_lvl_num for stores where we have no
   information

Changes in v7:
 * Minor change requested by Leo Yan

Changes in v6:
  * Drop changes to c2c command which will come from Leo Yan

Changes in v5:
  * Add a new snooping type to disambiguate cache-to-cache transfers where
    we don't know if the data is clean or dirty.
  * Set snoop flags on all the data-source cases
  * Special case stores as we have no information on them

Changes in v4:
  * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
@ 2022-05-17  2:03 ` Ali Saidi
  0 siblings, 0 replies; 44+ messages in thread
From: Ali Saidi @ 2022-05-17  2:03 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, german.gomez,
	leo.yan, acme
  Cc: alisaidi, benh, Nick.Forrington, alexander.shishkin,
	andrew.kilroy, james.clark, john.garry, jolsa, kjain, lihuafei1,
	mark.rutland, mathieu.poirier, mingo, namhyung, peterz, will

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situtions like cache line
contention and transfers on Arm platforms. 

This changes enables future changes to c2c on a system with SPE where lines that
are shared among multiple cores show up in perf c2c output. 

Changes is v9:
 * Change reporting of remote socket data which should make Leo's upcomping
   patch set for c2c make sense on multi-socket platforms  

Changes in v8:
 * Report NA for both mem_lvl and mem_lvl_num for stores where we have no
   information

Changes in v7:
 * Minor change requested by Leo Yan

Changes in v6:
  * Drop changes to c2c command which will come from Leo Yan

Changes in v5:
  * Add a new snooping type to disambiguate cache-to-cache transfers where
    we don't know if the data is clean or dirty.
  * Set snoop flags on all the data-source cases
  * Special case stores as we have no information on them

Changes in v4:
  * Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/ 
  * Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
  tools: arm64: Import cputype.h
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 tools/arch/arm64/include/asm/cputype.h        | 258 ++++++++++++++++++
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 110 +++++++-
 tools/perf/util/mem-events.c                  |  20 +-
 5 files changed, 383 insertions(+), 18 deletions(-)
 create mode 100644 tools/arch/arm64/include/asm/cputype.h

-- 
2.32.0



end of thread, other threads:[~2022-05-26 14:46 UTC | newest]

Thread overview: 44+ messages
2022-05-04 18:48 [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-05-04 18:48 ` Ali Saidi
2022-05-04 18:48 ` [PATCH v8 1/5] perf: Add SNOOP_PEER flag to perf mem data struct Ali Saidi
2022-05-04 18:48   ` Ali Saidi
2022-05-11  5:41   ` kajoljain
2022-05-11  5:41     ` kajoljain
2022-05-04 18:48 ` [PATCH v8 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER Ali Saidi
2022-05-04 18:48   ` Ali Saidi
2022-05-10 16:28   ` Arnaldo Carvalho de Melo
2022-05-10 16:28     ` Arnaldo Carvalho de Melo
2022-05-11  2:20     ` Leo Yan
2022-05-11  2:20       ` Leo Yan
2022-05-11 18:28       ` Arnaldo Carvalho de Melo
2022-05-11 18:28         ` Arnaldo Carvalho de Melo
2022-05-11 18:29         ` Arnaldo Carvalho de Melo
2022-05-11 18:29           ` Arnaldo Carvalho de Melo
2022-05-11  5:42   ` kajoljain
2022-05-11  5:42     ` kajoljain
2022-05-04 18:48 ` [PATCH v8 3/5] perf mem: Print snoop peer flag Ali Saidi
2022-05-04 18:48   ` Ali Saidi
2022-05-11  5:45   ` kajoljain
2022-05-11  5:45     ` kajoljain
2022-05-04 18:48 ` [PATCH v8 4/5] perf arm-spe: Don't set data source if it's not a memory operation Ali Saidi
2022-05-04 18:48   ` Ali Saidi
2022-05-04 18:48 ` [PATCH v8 5/5] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
2022-05-04 18:48   ` Ali Saidi
2022-05-05 15:03   ` Leo Yan
2022-05-05 15:03     ` Leo Yan
2022-05-17  2:03 [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-05-17  2:03 ` Ali Saidi
2022-05-17 21:20 ` Arnaldo Carvalho de Melo
2022-05-17 21:20   ` Arnaldo Carvalho de Melo
2022-05-18  1:06   ` Leo Yan
2022-05-18  1:06     ` Leo Yan
2022-05-18  4:16   ` Leo Yan
2022-05-18  4:16     ` Leo Yan
2022-05-19 15:16     ` Joe Mario
2022-05-19 15:16       ` Joe Mario
2022-05-22  6:15       ` Leo Yan
2022-05-22  6:15         ` Leo Yan
2022-05-23 17:24         ` Joe Mario
2022-05-23 17:24           ` Joe Mario
2022-05-26 14:44           ` Leo Yan
2022-05-26 14:44             ` Leo Yan
