* [PATCH 0/4] perf: Add new macros for mem_hops field
From: Kajol Jain @ 2021-12-06  9:17 UTC
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus, kjain

This patchset adds new macros for the mem_hops field, which can be
used to represent remote-node, socket and board level details.

Currently the code only has a macro for HOPS_0, which corresponds
to data coming from another core on the same node.
Add new macros HOPS_1 to HOPS_3 to represent remote-node, socket
and board level data.

For example, the encodings for the mem_hops field with L2 cache are:

L2                      - local L2
L2 | REMOTE | HOPS_0    - remote core, same node L2
L2 | REMOTE | HOPS_1    - remote node, same socket L2
L2 | REMOTE | HOPS_2    - remote socket, same board L2
L2 | REMOTE | HOPS_3    - remote board L2
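A minimal C sketch of how these five L2 encodings could be composed. It copies the relevant constants from include/uapi/linux/perf_event.h (with the HOPS_1 to HOPS_3 values proposed in this series) so that it builds against older headers; the helper l2_data_src() is hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of uapi constants, with the values proposed in patch 1. */
#define PERF_MEM_LVLNUM_SHIFT	33
#define PERF_MEM_REMOTE_SHIFT	37
#define PERF_MEM_HOPS_SHIFT	43

#define PERF_MEM_LVLNUM_L2	0x02
#define PERF_MEM_REMOTE_REMOTE	0x01
#define PERF_MEM_HOPS_0		0x01 /* remote core, same node */
#define PERF_MEM_HOPS_1		0x02 /* remote node, same socket */
#define PERF_MEM_HOPS_2		0x03 /* remote socket, same board */
#define PERF_MEM_HOPS_3		0x04 /* remote board */

/* Same shape as the uapi PERF_MEM_S() helper. */
#define PERF_MEM_S(a, s) \
	(((uint64_t)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

/*
 * Hypothetical helper: hops == 0 gives a local L2 access, otherwise
 * hops is one of PERF_MEM_HOPS_0..PERF_MEM_HOPS_3.
 */
static uint64_t l2_data_src(uint64_t hops)
{
	uint64_t val = PERF_MEM_S(LVLNUM, L2);

	assert(hops <= PERF_MEM_HOPS_3);	/* 5-7 are still unassigned */
	if (hops == 0)
		return val;			/* local L2: no REMOTE, no HOPS */
	return val | PERF_MEM_S(REMOTE, REMOTE) | (hops << PERF_MEM_HOPS_SHIFT);
}
```

For example, l2_data_src(PERF_MEM_HOPS_1) yields 0x102400000000: LVLNUM_L2 at bit 33, REMOTE at bit 37 and the value 0x02 in the 3-bit hop field at bit 43.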

Patches 1 and 2 add the kernel-side and tool-side changes, respectively,
for the new mem_hops macros.

Patch 3 adds data source encodings for power10 and older platforms
to represent data based on the newer composite PERF_MEM_LVLNUM* fields.

Patch 4 adds data source encodings with the proper sub_index used to
represent memory/cache level data for the power10 platform.

Kajol Jain (4):
  perf: Add new macros for mem_hops field
  tools/perf: Add new macros for mem_hops field
  powerpc/perf: Add encodings to represent data based on newer composite
    PERF_MEM_LVLNUM* fields
  powerpc/perf: Add data source encodings for power10 platform

 arch/powerpc/perf/isa207-common.c     | 60 ++++++++++++++++++++-------
 include/uapi/linux/perf_event.h       |  5 ++-
 tools/include/uapi/linux/perf_event.h |  5 ++-
 tools/perf/util/mem-events.c          | 29 ++++++++-----
 4 files changed, 71 insertions(+), 28 deletions(-)

-- 
2.27.0


* [PATCH 1/4] perf: Add new macros for mem_hops field
From: Kajol Jain @ 2021-12-06  9:17 UTC
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus, kjain

Add new macros for the mem_hops field, which can be used to
represent remote-node, socket and board level details.

Currently the code only has a macro for HOPS_0, which corresponds
to data coming from another core on the same node.
Add new macros HOPS_1 to HOPS_3 to represent remote-node, socket
and board level data.

For example, the encodings for the mem_hops field with L2 cache are:

L2			- local L2
L2 | REMOTE | HOPS_0	- remote core, same node L2
L2 | REMOTE | HOPS_1	- remote node, same socket L2
L2 | REMOTE | HOPS_2	- remote socket, same board L2
L2 | REMOTE | HOPS_3	- remote board L2

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 include/uapi/linux/perf_event.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index bd8860eeb291..1b65042ab1db 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1332,7 +1332,10 @@ union perf_mem_data_src {
 
 /* hop level */
 #define PERF_MEM_HOPS_0		0x01 /* remote core, same node */
-/* 2-7 available */
+#define PERF_MEM_HOPS_1		0x02 /* remote node, same socket */
+#define PERF_MEM_HOPS_2		0x03 /* remote socket, same board */
+#define PERF_MEM_HOPS_3		0x04 /* remote board */
+/* 5-7 available */
 #define PERF_MEM_HOPS_SHIFT	43
 
 #define PERF_MEM_S(a, s) \
-- 
2.27.0
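The hop level occupies a 3-bit field at PERF_MEM_HOPS_SHIFT (43), so after this patch the values 5-7 remain unassigned. A consumer that maps the field to a string, as the tool side does in patch 2, may want to guard against those values. The C sketch below assumes index 0 means "no hop information"; hop_names, mem_hops() and hop_name() are hypothetical names, not kernel or perf APIs.

```c
#include <assert.h>
#include <stdint.h>

#define PERF_MEM_HOPS_SHIFT	43
#define PERF_MEM_HOPS_MASK	0x7ULL	/* mem_hops is 3 bits wide */

/* Indexed by the raw mem_hops value (hypothetical table). */
static const char *const hop_names[] = {
	"N/A",			/* 0: no hop information */
	"core, same node",	/* PERF_MEM_HOPS_0 (0x01) */
	"node, same socket",	/* PERF_MEM_HOPS_1 (0x02) */
	"socket, same board",	/* PERF_MEM_HOPS_2 (0x03) */
	"board",		/* PERF_MEM_HOPS_3 (0x04) */
};

/* One entry per value from 0 through PERF_MEM_HOPS_3. */
static_assert(sizeof(hop_names) / sizeof(hop_names[0]) == 5,
	      "hop_names must cover 0..PERF_MEM_HOPS_3");

/* Extract the raw hop level from a perf_mem_data_src value. */
static unsigned int mem_hops(uint64_t data_src)
{
	return (unsigned int)((data_src >> PERF_MEM_HOPS_SHIFT) & PERF_MEM_HOPS_MASK);
}

/* Values 5-7 are still unassigned, so bounds-check before indexing. */
static const char *hop_name(uint64_t data_src)
{
	unsigned int hops = mem_hops(data_src);

	if (hops >= sizeof(hop_names) / sizeof(hop_names[0]))
		return "unknown";
	return hop_names[hops];
}
```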


* [PATCH 2/4] tools/perf: Add new macros for mem_hops field
From: Kajol Jain @ 2021-12-06  9:17 UTC
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus, kjain

Add new macros for the mem_hops field, which can be used to
represent remote-node, socket and board level details.

Currently the code only has a macro for HOPS_0, which corresponds
to data coming from another core on the same node.
Add new macros HOPS_1 to HOPS_3 to represent remote-node, socket
and board level data.

Also add the corresponding strings to the mem_hops array so that
perf_mem__lvl_scnprintf() can print the mem_hops field.

When the mem_hops field is used, the PERF_MEM_LVLNUM field also needs
to be set in order to represent the data source. Hence, printing the
data source via the PERF_MEM_LVL field can be skipped in that case.

For example, the encodings for the mem_hops field with L2 cache are:

L2                      - local L2
L2 | REMOTE | HOPS_0    - remote core, same node L2
L2 | REMOTE | HOPS_1    - remote node, same socket L2
L2 | REMOTE | HOPS_2    - remote socket, same board L2
L2 | REMOTE | HOPS_3    - remote board L2

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 tools/include/uapi/linux/perf_event.h |  5 ++++-
 tools/perf/util/mem-events.c          | 29 +++++++++++++++++----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index bd8860eeb291..4cd39aaccbe7 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1332,7 +1332,10 @@ union perf_mem_data_src {
 
 /* hop level */
 #define PERF_MEM_HOPS_0		0x01 /* remote core, same node */
-/* 2-7 available */
+#define PERF_MEM_HOPS_1         0x02 /* remote node, same socket */
+#define PERF_MEM_HOPS_2         0x03 /* remote socket, same board */
+#define PERF_MEM_HOPS_3         0x04 /* remote board */
+/* 5-7 available */
 #define PERF_MEM_HOPS_SHIFT	43
 
 #define PERF_MEM_S(a, s) \
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 3167b4628b6d..ed0ab838bcc5 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -309,6 +309,9 @@ static const char * const mem_hops[] = {
 	 * to be set with mem_hops field.
 	 */
 	"core, same node",
+	"node, same socket",
+	"socket, same board",
+	"board",
 };
 
 int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
@@ -316,7 +319,7 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 	size_t i, l = 0;
 	u64 m =  PERF_MEM_LVL_NA;
 	u64 hit, miss;
-	int printed;
+	int printed = 0;
 
 	if (mem_info)
 		m  = mem_info->data_src.mem_lvl;
@@ -335,18 +338,22 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 		l += 7;
 	}
 
-	if (mem_info && mem_info->data_src.mem_hops)
+	/*
+	 * Incase mem_hops field is set, we can skip printing data source via
+	 * PERF_MEM_LVL namespace.
+	 */
+	if (mem_info && mem_info->data_src.mem_hops) {
 		l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]);
-
-	printed = 0;
-	for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
-		if (!(m & 0x1))
-			continue;
-		if (printed++) {
-			strcat(out, " or ");
-			l += 4;
+	} else {
+		for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
+			if (!(m & 0x1))
+				continue;
+			if (printed++) {
+				strcat(out, " or ");
+				l += 4;
+			}
+			l += scnprintf(out + l, sz - l, mem_lvl[i]);
 		}
-		l += scnprintf(out + l, sz - l, mem_lvl[i]);
 	}
 
 	if (mem_info && mem_info->data_src.mem_lvl_num) {
-- 
2.27.0
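To illustrate the new control flow in perf_mem__lvl_scnprintf(), here is a simplified, self-contained C model. It is a sketch, not the perf code: struct src_view, describe() and append() are hypothetical, and lvl_names is a trimmed subset of the tool's level strings. The model assumes, as the tool appears to, that the "Remote " prefix is printed first and that the LVLNUM name is appended after the legacy levels with " or ".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded view of a perf_mem_data_src value. */
struct src_view {
	unsigned int lvl_bits;	/* legacy level bits: bit i -> lvl_names[i] */
	const char *lvlnum;	/* PERF_MEM_LVLNUM_* name, or NULL if unset */
	unsigned int hops;	/* raw mem_hops value, 0 = unset */
	bool remote;		/* mem_remote set */
	bool hit;		/* PERF_MEM_LVL_HIT set */
};

/* Trimmed subset of the tool's level strings (order is illustrative). */
static const char *const lvl_names[] = { "L1", "LFB", "L2", "L3", "Local RAM" };
static const char *const hop_names[] = {
	"N/A", "core, same node", "node, same socket",
	"socket, same board", "board",
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Append s at offset l, truncating to fit (requires sz > 0 and l < sz). */
static size_t append(char *out, size_t sz, size_t l, const char *s)
{
	size_t n = strlen(s);

	if (l + n >= sz)
		n = sz - 1 - l;		/* keep room for the terminating NUL */
	memcpy(out + l, s, n);
	out[l + n] = '\0';
	return l + n;
}

static size_t describe(char *out, size_t sz, const struct src_view *v)
{
	size_t i, l = 0;
	int printed = 0;

	assert(sz > 0);
	out[0] = '\0';
	if (v->remote)
		l = append(out, sz, l, "Remote ");

	if (v->hops) {
		/* A hop level implies LVLNUM is set: skip the legacy bits. */
		l = append(out, sz, l,
			   v->hops < ARRAY_LEN(hop_names) ? hop_names[v->hops] : "unknown");
		l = append(out, sz, l, " ");
	} else {
		/* Otherwise OR together every legacy level bit that is set. */
		for (i = 0; i < ARRAY_LEN(lvl_names); i++) {
			if (!(v->lvl_bits & (1u << i)))
				continue;
			if (printed++)
				l = append(out, sz, l, " or ");
			l = append(out, sz, l, lvl_names[i]);
		}
	}

	if (v->lvlnum) {
		if (printed++)
			l = append(out, sz, l, " or ");
		l = append(out, sz, l, v->lvlnum);
	}
	if (v->hit)
		l = append(out, sz, l, " hit");
	return l;
}
```

With a hop level set, the model yields "Remote socket, same board Any cache hit", the same shape as the power10 report in patch 4. Without it, a sample carrying both the legacy L2 bit and LVLNUM_L2 prints as "L2 or L2 hit", as in the power9 report in patch 3.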


* [PATCH 3/4] powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
From: Kajol Jain @ 2021-12-06  9:17 UTC
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus, kjain

The code represents data coming from L1/L2/L3 cache hits based on the
PERF_MEM_LVL_* namespace, which is being deprecated in favour of the
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent L1/L2/L3 cache hits based on
the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms.

Result on a power9 system without the patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                             Shared Object
 # ........  ............  ........................  .................................  ................
 #
    29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
    27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
    13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
    13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
     8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
     8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]

Result on a power9 system with the patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
    25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
    13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
    12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
     6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
     5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so

Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/isa207-common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 7ea873ab2e6f..6c6bc8b7d887 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -220,13 +220,13 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
 		/* Nothing to do */
 		break;
 	case 1:
-		ret = PH(LVL, L1);
+		ret = PH(LVL, L1) | LEVEL(L1) | P(SNOOP, HIT);
 		break;
 	case 2:
-		ret = PH(LVL, L2);
+		ret = PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HIT);
 		break;
 	case 3:
-		ret = PH(LVL, L3);
+		ret = PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
 		break;
 	case 4:
 		if (sub_idx <= 1)
-- 
2.27.0
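A small C sketch of what cases 1-3 now encode. The PERF_MEM_* values are copied from include/uapi/linux/perf_event.h; the P(), PH() and LEVEL() helpers are assumed to mirror the ones used in isa207-common.c, and cache_hit_source() and field() are hypothetical. Decoding the result shows that the legacy mem_lvl bits and the new mem_lvl_num field both name the same level.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the uapi constants, copied locally. */
#define PERF_MEM_LVL_SHIFT	5
#define PERF_MEM_LVL_HIT	0x02
#define PERF_MEM_LVL_L1		0x08
#define PERF_MEM_LVL_L2		0x20
#define PERF_MEM_LVL_L3		0x40
#define PERF_MEM_SNOOP_SHIFT	19
#define PERF_MEM_SNOOP_HIT	0x04
#define PERF_MEM_LVLNUM_SHIFT	33
#define PERF_MEM_LVLNUM_L1	0x01
#define PERF_MEM_LVLNUM_L2	0x02
#define PERF_MEM_LVLNUM_L3	0x03

#define PERF_MEM_S(a, s) \
	(((uint64_t)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

/* Assumed to mirror the helpers in arch/powerpc/perf/isa207-common.c. */
#define P(a, b)		PERF_MEM_S(a, b)
#define PH(a, b)	(P(LVL, HIT) | P(a, b))
#define LEVEL(x)	P(LVLNUM, x)

/* Cases 1-3 of isa207_find_source() with patch 3 applied. */
static uint64_t cache_hit_source(unsigned int idx)
{
	switch (idx) {
	case 1:
		return PH(LVL, L1) | LEVEL(L1) | P(SNOOP, HIT);
	case 2:
		return PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HIT);
	case 3:
		return PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
	default:
		return 0;	/* other indexes are out of scope for this sketch */
	}
}

/* Extract a bit field (mem_lvl: 14 bits, mem_snoop: 5, mem_lvl_num: 4). */
static unsigned int field(uint64_t v, unsigned int shift, unsigned int width)
{
	assert(width > 0 && width < 32);
	return (unsigned int)((v >> shift) & ((1u << width) - 1));
}
```

Because both namespaces are populated, the tool prints the level once from each, which appears to be why the reports above show "L2 or L2 hit" rather than "L2 hit".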


* [PATCH 4/4] powerpc/perf: Add data source encodings for power10 platform
From: Kajol Jain @ 2021-12-06  9:17 UTC
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus, kjain

The code represents memory/cache level data based on the PERF_MEM_LVL_*
namespace, which is being deprecated in favour of the newer composite
PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local
memory, remote memory, distant memory and remote/distant cache hits.

In order to represent data coming from OpenCAPI cache/memory, use the
LVLNUM "PMEM" field, which is used to represent persistent memory accesses.

Result on a power10 system with the patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/isa207-common.c | 54 ++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 6c6bc8b7d887..4037ea652522 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -229,13 +229,28 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
 		ret = PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
 		break;
 	case 4:
-		if (sub_idx <= 1)
-			ret = PH(LVL, LOC_RAM);
-		else if (sub_idx > 1 && sub_idx <= 2)
-			ret = PH(LVL, REM_RAM1);
-		else
-			ret = PH(LVL, REM_RAM2);
-		ret |= P(SNOOP, HIT);
+		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+			ret = P(SNOOP, HIT);
+
+			if (sub_idx == 1)
+				ret |= PH(LVL, LOC_RAM) | LEVEL(RAM);
+			else if (sub_idx == 2 || sub_idx == 3)
+				ret |= P(LVL, HIT) | LEVEL(PMEM);
+			else if (sub_idx == 4)
+				ret |= PH(LVL, REM_RAM1) | REM | LEVEL(RAM) | P(HOPS, 2);
+			else if (sub_idx == 5 || sub_idx == 7)
+				ret |= P(LVL, HIT) | LEVEL(PMEM) | REM;
+			else if (sub_idx == 6)
+				ret |= PH(LVL, REM_RAM2) | REM | LEVEL(RAM) | P(HOPS, 3);
+		} else {
+			if (sub_idx <= 1)
+				ret = PH(LVL, LOC_RAM);
+			else if (sub_idx > 1 && sub_idx <= 2)
+				ret = PH(LVL, REM_RAM1);
+			else
+				ret = PH(LVL, REM_RAM2);
+			ret |= P(SNOOP, HIT);
+		}
 		break;
 	case 5:
 		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
@@ -261,11 +276,26 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
 		}
 		break;
 	case 6:
-		ret = PH(LVL, REM_CCE2);
-		if ((sub_idx == 0) || (sub_idx == 2))
-			ret |= P(SNOOP, HIT);
-		else if ((sub_idx == 1) || (sub_idx == 3))
-			ret |= P(SNOOP, HITM);
+		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+			if (sub_idx == 0)
+				ret = PH(LVL, REM_CCE1) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HIT) | P(HOPS, 2);
+			else if (sub_idx == 1)
+				ret = PH(LVL, REM_CCE1) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HITM) | P(HOPS, 2);
+			else if (sub_idx == 2)
+				ret = PH(LVL, REM_CCE2) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HIT) | P(HOPS, 3);
+			else if (sub_idx == 3)
+				ret = PH(LVL, REM_CCE2) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HITM) | P(HOPS, 3);
+		} else {
+			ret = PH(LVL, REM_CCE2);
+			if (sub_idx == 0 || sub_idx == 2)
+				ret |= P(SNOOP, HIT);
+			else if (sub_idx == 1 || sub_idx == 3)
+				ret |= P(SNOOP, HITM);
+		}
 		break;
 	case 7:
 		ret = PM(LVL, L1);
-- 
2.27.0
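One detail of case 6 worth spelling out: P(HOPS, 2) token-pastes to PERF_MEM_HOPS_2, whose value is 0x03 ("remote socket, same board"), so the argument names the macro, not the raw field value. The C sketch below restates the power10 branch of case 6 in compact form. The uapi values are copied locally, the P()/PH()/LEVEL()/REM helpers are assumed to mirror isa207-common.c, and p10_remote_cache() and hops_field() are hypothetical; unlike the kernel code, the sketch only accepts sub_idx 0-3.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the uapi constants involved. */
#define PERF_MEM_LVL_SHIFT		5
#define PERF_MEM_LVL_HIT		0x02
#define PERF_MEM_LVL_REM_CCE1		0x400
#define PERF_MEM_LVL_REM_CCE2		0x800
#define PERF_MEM_SNOOP_SHIFT		19
#define PERF_MEM_SNOOP_HIT		0x04
#define PERF_MEM_SNOOP_HITM		0x10
#define PERF_MEM_LVLNUM_SHIFT		33
#define PERF_MEM_LVLNUM_ANY_CACHE	0x0b
#define PERF_MEM_REMOTE_SHIFT		37
#define PERF_MEM_REMOTE_REMOTE		0x01
#define PERF_MEM_HOPS_SHIFT		43
#define PERF_MEM_HOPS_2			0x03 /* remote socket, same board */
#define PERF_MEM_HOPS_3			0x04 /* remote board */

#define PERF_MEM_S(a, s) \
	(((uint64_t)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

/* Assumed to mirror the helpers in arch/powerpc/perf/isa207-common.c. */
#define P(a, b)		PERF_MEM_S(a, b)
#define PH(a, b)	(P(LVL, HIT) | P(a, b))
#define LEVEL(x)	P(LVLNUM, x)
#define REM		P(REMOTE, REMOTE)

/* Power10 branch of case 6 from patch 4, restated compactly. */
static uint64_t p10_remote_cache(unsigned int sub_idx)
{
	uint64_t ret;

	assert(sub_idx <= 3);	/* the only values the patch handles */

	/* 0/1: remote cache (REM_CCE1, HOPS_2); 2/3: distant cache (REM_CCE2, HOPS_3) */
	if (sub_idx < 2)
		ret = PH(LVL, REM_CCE1) | P(HOPS, 2);
	else
		ret = PH(LVL, REM_CCE2) | P(HOPS, 3);

	ret |= LEVEL(ANY_CACHE) | REM;
	/* Odd sub_idx values are hits on a modified line (HITM). */
	ret |= (sub_idx & 1) ? P(SNOOP, HITM) : P(SNOOP, HIT);
	return ret;
}

/* Raw 3-bit mem_hops value of a data_src word. */
static unsigned int hops_field(uint64_t v)
{
	return (unsigned int)((v >> PERF_MEM_HOPS_SHIFT) & 0x7);
}
```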


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] powerpc/perf: Add data source encodings for power10 platform
@ 2021-12-06  9:17   ` Kajol Jain
  0 siblings, 0 replies; 26+ messages in thread
From: Kajol Jain @ 2021-12-06  9:17 UTC (permalink / raw)
  To: mpe, linuxppc-dev, linux-kernel, peterz, mingo, acme, jolsa,
	namhyung, ak
  Cc: mark.rutland, songliubraving, atrajeev, daniel, rnsastry,
	alexander.shishkin, kjain, ast, linux-perf-users, yao.jin, maddy,
	paulus, kan.liang

The code represent memory/cache level data based on PERF_MEM_LVL_*
namespace, which is in the process of deprication in the favour of
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local
memory/Remote memory/distant memory and remote/distant cache hits.

Inorder to represent data coming from OpenCAPI cache/memory, we use
LVLNUM "PMEM" field which is used to present persistent memory accesses.

Result in power10 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/perf/isa207-common.c | 54 ++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 6c6bc8b7d887..4037ea652522 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -229,13 +229,28 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
 		ret = PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
 		break;
 	case 4:
-		if (sub_idx <= 1)
-			ret = PH(LVL, LOC_RAM);
-		else if (sub_idx > 1 && sub_idx <= 2)
-			ret = PH(LVL, REM_RAM1);
-		else
-			ret = PH(LVL, REM_RAM2);
-		ret |= P(SNOOP, HIT);
+		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+			ret = P(SNOOP, HIT);
+
+			if (sub_idx == 1)
+				ret |= PH(LVL, LOC_RAM) | LEVEL(RAM);
+			else if (sub_idx == 2 || sub_idx == 3)
+				ret |= P(LVL, HIT) | LEVEL(PMEM);
+			else if (sub_idx == 4)
+				ret |= PH(LVL, REM_RAM1) | REM | LEVEL(RAM) | P(HOPS, 2);
+			else if (sub_idx == 5 || sub_idx == 7)
+				ret |= P(LVL, HIT) | LEVEL(PMEM) | REM;
+			else if (sub_idx == 6)
+				ret |= PH(LVL, REM_RAM2) | REM | LEVEL(RAM) | P(HOPS, 3);
+		} else {
+			if (sub_idx <= 1)
+				ret = PH(LVL, LOC_RAM);
+			else if (sub_idx > 1 && sub_idx <= 2)
+				ret = PH(LVL, REM_RAM1);
+			else
+				ret = PH(LVL, REM_RAM2);
+			ret |= P(SNOOP, HIT);
+		}
 		break;
 	case 5:
 		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
@@ -261,11 +276,26 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
 		}
 		break;
 	case 6:
-		ret = PH(LVL, REM_CCE2);
-		if ((sub_idx == 0) || (sub_idx == 2))
-			ret |= P(SNOOP, HIT);
-		else if ((sub_idx == 1) || (sub_idx == 3))
-			ret |= P(SNOOP, HITM);
+		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+			if (sub_idx == 0)
+				ret = PH(LVL, REM_CCE1) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HIT) | P(HOPS, 2);
+			else if (sub_idx == 1)
+				ret = PH(LVL, REM_CCE1) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HITM) | P(HOPS, 2);
+			else if (sub_idx == 2)
+				ret = PH(LVL, REM_CCE2) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HIT) | P(HOPS, 3);
+			else if (sub_idx == 3)
+				ret = PH(LVL, REM_CCE2) | LEVEL(ANY_CACHE) | REM |
+					P(SNOOP, HITM) | P(HOPS, 3);
+		} else {
+			ret = PH(LVL, REM_CCE2);
+			if (sub_idx == 0 || sub_idx == 2)
+				ret |= P(SNOOP, HIT);
+			else if (sub_idx == 1 || sub_idx == 3)
+				ret |= P(SNOOP, HITM);
+		}
 		break;
 	case 7:
 		ret = PM(LVL, L1);
-- 
2.27.0

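The power10 branches of case 4 and case 6 in the hunk above are small lookup tables keyed on sub_idx. The standalone sketch below writes that mapping as data instead of if/else chains so the pattern is easier to see. Every name in it (struct p10_src, p10_case4(), p10_case6(), the P10_ enums) is hypothetical, not kernel API. The hops member records n from P(HOPS, n), i.e. the PERF_MEM_HOPS_n macro, whose raw field value is n + 1. The OpenCAPI comments follow the commit message's statement that LVLNUM PMEM is used for OpenCAPI memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the PERF_MEM_* pieces the power10 hunks set. */
enum p10_lvlnum { P10_LVLNUM_NONE, P10_LVLNUM_RAM, P10_LVLNUM_PMEM, P10_LVLNUM_ANY_CACHE };
enum p10_snoop  { P10_SNOOP_HIT, P10_SNOOP_HITM };

struct p10_src {
	enum p10_lvlnum lvlnum; /* LEVEL(...) */
	bool remote;            /* REM */
	int hops;               /* n in P(HOPS, n), or -1 if no hop level is set */
	enum p10_snoop snoop;   /* P(SNOOP, ...) */
};

/* Case 4 (memory). sub_idx 0 matches no branch and keeps only P(SNOOP, HIT). */
static struct p10_src p10_case4(unsigned int sub_idx)
{
	static const struct p10_src tbl[8] = {
		[0] = { P10_LVLNUM_NONE, false, -1, P10_SNOOP_HIT },
		[1] = { P10_LVLNUM_RAM,  false, -1, P10_SNOOP_HIT }, /* local RAM */
		[2] = { P10_LVLNUM_PMEM, false, -1, P10_SNOOP_HIT }, /* local OpenCAPI */
		[3] = { P10_LVLNUM_PMEM, false, -1, P10_SNOOP_HIT },
		[4] = { P10_LVLNUM_RAM,  true,   2, P10_SNOOP_HIT }, /* REM_RAM1 */
		[5] = { P10_LVLNUM_PMEM, true,  -1, P10_SNOOP_HIT }, /* remote OpenCAPI */
		[6] = { P10_LVLNUM_RAM,  true,   3, P10_SNOOP_HIT }, /* REM_RAM2 */
		[7] = { P10_LVLNUM_PMEM, true,  -1, P10_SNOOP_HIT },
	};

	assert(sub_idx < 8); /* the sketch only models the values the patch handles */
	return tbl[sub_idx];
}

/* Case 6 (remote cache): bit 0 picks HIT/HITM, bit 1 picks the hop distance. */
static struct p10_src p10_case6(unsigned int sub_idx)
{
	struct p10_src s = { P10_LVLNUM_ANY_CACHE, true, 0, P10_SNOOP_HIT };

	assert(sub_idx < 4);
	s.snoop = (sub_idx & 1) ? P10_SNOOP_HITM : P10_SNOOP_HIT;
	s.hops = 2 + (int)(sub_idx >> 1); /* 0,1: REM_CCE1/HOPS_2; 2,3: REM_CCE2/HOPS_3 */
	return s;
}
```

In case 6, bit 0 of sub_idx selects HIT versus HITM and bit 1 selects the distance (REM_CCE1 with HOPS_2 versus REM_CCE2 with HOPS_3), so the sketch computes it arithmetically. The kernel falls through for values it does not handle; the sketch asserts that sub_idx is in range instead.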

* Re: [PATCH 0/4] perf: Add new macros for mem_hops field
  2021-12-06  9:17 ` Kajol Jain
@ 2021-12-09 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-09 19:17 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, linux-kernel, peterz, mingo, jolsa, namhyung,
	ak, linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus

On Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain wrote:
> This patchset adds new macros for the mem_hops field, which can be
> used to represent remote-node, socket and board level details.
> 
> Currently the code has a macro only for HOPS_0, which corresponds
> to data coming from another core on the same node.
> Add new macros HOPS_1 to HOPS_3 to represent
> remote-node, socket and board level data.
> 
> For example, the mem_hops encodings for the L2 cache are:

I checked and this hasn't hit mainline yet. Is it already merged in a tree
that is slated to be submitted in the next merge window? If so, please
let me know which one so that I can merge it into perf/core.

- Arnaldo
 
> L2                      - local L2
> L2 | REMOTE | HOPS_0    - remote core, same node L2
> L2 | REMOTE | HOPS_1    - remote node, same socket L2
> L2 | REMOTE | HOPS_2    - remote socket, same board L2
> L2 | REMOTE | HOPS_3    - remote board L2
> 
> Patches 1 & 2 add the kernel and tool side changes that introduce the
> new macros for the mem_hops field.
> 
> Patch 3 adds data source encodings for power10 and older platforms
> to represent data based on the newer composite PERF_MEM_LVLNUM* fields.
> 
> Patch 4 adds data source encodings, using the proper sub_index, to
> represent memory/cache level data for the power10 platform.
> 
> Kajol Jain (4):
>   perf: Add new macros for mem_hops field
>   tools/perf: Add new macros for mem_hops field
>   powerpc/perf: Add encodings to represent data based on newer composite
>     PERF_MEM_LVLNUM* fields
>   powerpc/perf: Add data source encodings for power10 platform
> 
>  arch/powerpc/perf/isa207-common.c     | 60 ++++++++++++++++++++-------
>  include/uapi/linux/perf_event.h       |  5 ++-
>  tools/include/uapi/linux/perf_event.h |  5 ++-
>  tools/perf/util/mem-events.c          | 29 ++++++++-----
>  4 files changed, 71 insertions(+), 28 deletions(-)
> 
> -- 
> 2.27.0

-- 

- Arnaldo
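The L2 | REMOTE | HOPS_n table in the cover letter maps directly onto bit fields of the 64-bit perf_mem_data_src value. A minimal C sketch of composing and decoding such a value follows. It assumes the field positions of union perf_mem_data_src in include/uapi/linux/perf_event.h (mem_lvl_num at bit 33, mem_remote at bit 37, mem_hops at bit 43, the last matching PERF_MEM_HOPS_SHIFT in the tools header hunk of patch 2) and mirrors the values under hypothetical DS_ names so it builds without a patched header.

```c
#include <assert.h>
#include <stdint.h>

/* Field shifts and values mirrored from include/uapi/linux/perf_event.h,
 * including the HOPS_1..HOPS_3 values added by this series. */
#define DS_LVLNUM_SHIFT   33
#define DS_REMOTE_SHIFT   37
#define DS_HOPS_SHIFT     43

#define DS_LVLNUM_L2      0x02ULL
#define DS_REMOTE_REMOTE  0x01ULL
#define DS_HOPS_0         0x01ULL /* remote core, same node */
#define DS_HOPS_1         0x02ULL /* remote node, same socket */
#define DS_HOPS_2         0x03ULL /* remote socket, same board */
#define DS_HOPS_3         0x04ULL /* remote board */

/* Compose a data_src value; hops == 0 means "no hop information". */
static uint64_t ds_encode(uint64_t lvlnum, int remote, uint64_t hops)
{
	assert(lvlnum <= 0xf && hops <= 0x7); /* 4-bit and 3-bit fields */
	return (lvlnum << DS_LVLNUM_SHIFT) |
	       ((remote ? DS_REMOTE_REMOTE : 0) << DS_REMOTE_SHIFT) |
	       (hops << DS_HOPS_SHIFT);
}

/* Extract the individual fields again. */
static unsigned int ds_hops(uint64_t v)   { return (unsigned int)((v >> DS_HOPS_SHIFT) & 0x7); }
static unsigned int ds_lvlnum(uint64_t v) { return (unsigned int)((v >> DS_LVLNUM_SHIFT) & 0xf); }
static unsigned int ds_remote(uint64_t v) { return (unsigned int)((v >> DS_REMOTE_SHIFT) & 0x1); }
```

For example, ds_encode(DS_LVLNUM_L2, 1, DS_HOPS_1) describes an L2 hit on a remote node in the same socket, and its mem_hops field reads back as 2: HOPS_n is encoded as n + 1, which leaves 0 for samples without hop information.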

* Re: [PATCH 0/4] perf: Add new macros for mem_hops field
  2021-12-09 19:17   ` Arnaldo Carvalho de Melo
@ 2021-12-10  6:35     ` Michael Ellerman
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2021-12-10  6:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Kajol Jain
  Cc: linuxppc-dev, linux-kernel, peterz, mingo, jolsa, namhyung, ak,
	linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus

Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> On Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain wrote:
>> This patchset adds new macros for the mem_hops field, which can be
>> used to represent remote-node, socket and board level details.
>> 
>> Currently the code has a macro only for HOPS_0, which corresponds
>> to data coming from another core on the same node.
>> Add new macros HOPS_1 to HOPS_3 to represent
>> remote-node, socket and board level data.
>> 
>> For example, the mem_hops encodings for the L2 cache are:
>
> I checked and this hasn't hit mainline yet. Is it already merged in a tree
> that is slated to be submitted in the next merge window? If so, please
> let me know which one so that I can merge it into perf/core.

I haven't picked it up. I guess the kernel changes are mainly in
powerpc, but I'd at least need an ack from e.g. Peter for the generic
perf uapi change.

Equally, the whole series could go via tip.

cheers

* Re: [PATCH 1/4] perf: Add new macros for mem_hops field
  2021-12-06  9:17   ` Kajol Jain
@ 2021-12-10  8:21     ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-12-10  8:21 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, linux-kernel, mingo, acme, jolsa, namhyung,
	ak, linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus

On Mon, Dec 06, 2021 at 02:47:46PM +0530, Kajol Jain wrote:
> Add new macros for the mem_hops field, which can be used to
> represent remote-node, socket and board level details.
> 
> Currently the code has a macro only for HOPS_0, which corresponds
> to data coming from another core on the same node.
> Add new macros HOPS_1 to HOPS_3 to represent
> remote-node, socket and board level data.
> 
> For example, the mem_hops encodings for the L2 cache are:
> 
> L2			- local L2
> L2 | REMOTE | HOPS_0	- remote core, same node L2
> L2 | REMOTE | HOPS_1	- remote node, same socket L2
> L2 | REMOTE | HOPS_2	- remote socket, same board L2
> L2 | REMOTE | HOPS_3	- remote board L2
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

* Re: [PATCH 0/4] perf: Add new macros for mem_hops field
  2021-12-10  6:35     ` Michael Ellerman
@ 2021-12-10  8:22       ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-12-10  8:22 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Arnaldo Carvalho de Melo, Kajol Jain, linuxppc-dev, linux-kernel,
	mingo, jolsa, namhyung, ak, linux-perf-users, maddy, atrajeev,
	rnsastry, yao.jin, ast, daniel, songliubraving, kan.liang,
	mark.rutland, alexander.shishkin, paulus

On Fri, Dec 10, 2021 at 05:35:41PM +1100, Michael Ellerman wrote:
> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> > On Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain wrote:
> >> This patchset adds new macros for the mem_hops field, which can be
> >> used to represent remote-node, socket and board level details.
> >> 
> >> Currently the code has a macro only for HOPS_0, which corresponds
> >> to data coming from another core on the same node.
> >> Add new macros HOPS_1 to HOPS_3 to represent
> >> remote-node, socket and board level data.
> >> 
> >> For example, the mem_hops encodings for the L2 cache are:
> >
> > I checked and this hasn't hit mainline yet. Is it already merged in a tree
> > that is slated to be submitted in the next merge window? If so, please
> > let me know which one so that I can merge it into perf/core.
> 
> I haven't picked it up. I guess the kernel changes are mainly in
> powerpc, but I'd at least need an ack from e.g. Peter for the generic
> perf uapi change.

Done :-)

* Re: [PATCH 0/4] perf: Add new macros for mem_hops field
  2021-12-06  9:17 ` Kajol Jain
@ 2021-12-21 12:14   ` Michael Ellerman
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2021-12-21 12:14 UTC (permalink / raw)
  To: ak, Kajol Jain, linuxppc-dev, acme, mpe, namhyung, linux-kernel,
	mingo, jolsa, peterz
  Cc: alexander.shishkin, kan.liang, atrajeev, linux-perf-users, maddy,
	ast, daniel, rnsastry, songliubraving, mark.rutland, yao.jin,
	paulus

On Mon, 6 Dec 2021 14:47:45 +0530, Kajol Jain wrote:
> This patchset adds new macros for the mem_hops field, which can be
> used to represent remote-node, socket and board level details.
> 
> Currently the code has a macro only for HOPS_0, which corresponds
> to data coming from another core on the same node.
> Add new macros HOPS_1 to HOPS_3 to represent
> remote-node, socket and board level data.
> 
> [...]

Patches 1, 3 and 4 applied to powerpc/next.

[1/4] perf: Add new macros for mem_hops field
      https://git.kernel.org/powerpc/c/cb1c4aba055f928ffae0c868e8dfe08eeab302e7
[3/4] powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
      https://git.kernel.org/powerpc/c/4a20ee106154ac1765dea97932faad29f0ba57fc
[4/4] powerpc/perf: Add data source encodings for power10 platform
      https://git.kernel.org/powerpc/c/6ed05a8efda56e5be11081954929421de19cce88

cheers

* Re: [PATCH 2/4] tools/perf: Add new macros for mem_hops field
  2021-12-06  9:17   ` Kajol Jain
@ 2021-12-22 12:36     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-22 12:36 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, linux-kernel, peterz, mingo, jolsa, namhyung,
	ak, linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus

On Mon, Dec 06, 2021 at 02:47:47PM +0530, Kajol Jain wrote:
> Add new macros for the mem_hops field, which can be used to
> represent remote-node, socket and board level details.
> 
> Currently the code has a macro only for HOPS_0, which corresponds
> to data coming from another core on the same node.
> Add new macros HOPS_1 to HOPS_3 to represent
> remote-node, socket and board level data.
> 
> Also add the corresponding strings to the mem_hops array so that
> perf_mem__lvl_scnprintf() can print the mem_hops field.
> 
> In case the mem_hops field is used, the PERF_MEM_LVLNUM field also
> needs to be set in order to represent the data source. Hence printing
> the data source via the PERF_MEM_LVL field can be skipped in that scenario.
> 
> For example, the mem_hops encodings for the L2 cache are:

Thanks, applied.

- Arnaldo

 
> L2                      - local L2
> L2 | REMOTE | HOPS_0    - remote core, same node L2
> L2 | REMOTE | HOPS_1    - remote node, same socket L2
> L2 | REMOTE | HOPS_2    - remote socket, same board L2
> L2 | REMOTE | HOPS_3    - remote board L2
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  tools/include/uapi/linux/perf_event.h |  5 ++++-
>  tools/perf/util/mem-events.c          | 29 +++++++++++++++++----------
>  2 files changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index bd8860eeb291..4cd39aaccbe7 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -1332,7 +1332,10 @@ union perf_mem_data_src {
>  
>  /* hop level */
>  #define PERF_MEM_HOPS_0		0x01 /* remote core, same node */
> -/* 2-7 available */
> +#define PERF_MEM_HOPS_1         0x02 /* remote node, same socket */
> +#define PERF_MEM_HOPS_2         0x03 /* remote socket, same board */
> +#define PERF_MEM_HOPS_3         0x04 /* remote board */
> +/* 5-7 available */
>  #define PERF_MEM_HOPS_SHIFT	43
>  
>  #define PERF_MEM_S(a, s) \
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index 3167b4628b6d..ed0ab838bcc5 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -309,6 +309,9 @@ static const char * const mem_hops[] = {
>  	 * to be set with mem_hops field.
>  	 */
>  	"core, same node",
> +	"node, same socket",
> +	"socket, same board",
> +	"board",
>  };
>  
>  int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
> @@ -316,7 +319,7 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  	size_t i, l = 0;
>  	u64 m =  PERF_MEM_LVL_NA;
>  	u64 hit, miss;
> -	int printed;
> +	int printed = 0;
>  
>  	if (mem_info)
>  		m  = mem_info->data_src.mem_lvl;
> @@ -335,18 +338,22 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  		l += 7;
>  	}
>  
> -	if (mem_info && mem_info->data_src.mem_hops)
> +	/*
> +	 * Incase mem_hops field is set, we can skip printing data source via
> +	 * PERF_MEM_LVL namespace.
> +	 */
> +	if (mem_info && mem_info->data_src.mem_hops) {
>  		l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]);
> -
> -	printed = 0;
> -	for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
> -		if (!(m & 0x1))
> -			continue;
> -		if (printed++) {
> -			strcat(out, " or ");
> -			l += 4;
> +	} else {
> +		for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
> +			if (!(m & 0x1))
> +				continue;
> +			if (printed++) {
> +				strcat(out, " or ");
> +				l += 4;
> +			}
> +			l += scnprintf(out + l, sz - l, mem_lvl[i]);
>  		}
> -		l += scnprintf(out + l, sz - l, mem_lvl[i]);
>  	}
>  
>  	if (mem_info && mem_info->data_src.mem_lvl_num) {
> -- 
> 2.27.0

-- 

- Arnaldo
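To see how the reworked perf_mem__lvl_scnprintf() composes its output, here is a standalone sketch. It is a simplified reimplementation for illustration, not the tools/perf code: the lvl_str[] table is truncated to the first seven PERF_MEM_LVL_* bits, the level-number name is passed in as a string instead of being looked up from mem_lvl_num, and render_lvl() and append() are hypothetical helpers. It reproduces the two behaviours the commit message describes. When mem_hops is set, the PERF_MEM_LVL names are skipped. Otherwise the legacy and composite levels are joined with " or ", which is where strings such as "L1 or L1 hit" in the power9 results quoted in this thread come from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Legacy PERF_MEM_LVL_* names indexed by bit (NA=0x01, HIT=0x02, MISS=0x04,
 * L1=0x08, LFB=0x10, L2=0x20, L3=0x40); truncated for the sketch. */
static const char * const lvl_str[] = { "N/A", "HIT", "MISS", "L1", "LFB", "L2", "L3" };
#define LVL_HIT  0x02u
#define LVL_MISS 0x04u

/* Hop strings as added to mem_hops[] by the patch, indexed by raw mem_hops. */
static const char * const hops_str[] = {
	"N/A", "core, same node", "node, same socket", "socket, same board", "board",
};

/* Append s, truncating so out stays NUL-terminated (in the spirit of scnprintf). */
static void append(char *out, size_t sz, size_t *l, const char *s)
{
	size_t room = sz - *l - 1; /* invariant: *l < sz */
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(out + *l, s, n);
	*l += n;
	out[*l] = '\0';
}

static size_t render_lvl(char *out, size_t sz, unsigned int lvl, int remote,
			 unsigned int hops, const char *lvlnum_name)
{
	unsigned int hit = lvl & LVL_HIT, miss = lvl & LVL_MISS;
	size_t l = 0, i;
	int printed = 0;

	assert(sz > 0 && hops < sizeof(hops_str) / sizeof(hops_str[0]));
	out[0] = '\0';
	lvl &= ~(LVL_HIT | LVL_MISS); /* hit/miss are printed last */

	if (remote)
		append(out, sz, &l, "Remote ");
	if (hops) {
		/* mem_hops set: skip the PERF_MEM_LVL names entirely */
		append(out, sz, &l, hops_str[hops]);
		append(out, sz, &l, " ");
	} else {
		for (i = 0; lvl && i < sizeof(lvl_str) / sizeof(lvl_str[0]); i++, lvl >>= 1) {
			if (!(lvl & 1))
				continue;
			if (printed++)
				append(out, sz, &l, " or ");
			append(out, sz, &l, lvl_str[i]);
		}
	}
	if (lvlnum_name) { /* the composite PERF_MEM_LVLNUM_* level */
		if (printed++)
			append(out, sz, &l, " or ");
		append(out, sz, &l, lvlnum_name);
	}
	if (l == 0)
		append(out, sz, &l, "N/A");
	if (hit)
		append(out, sz, &l, " hit");
	if (miss)
		append(out, sz, &l, " miss");
	return l;
}
```

Initialising printed at its declaration, as the patch does, matters because the loop that used to reset it now runs only in the else branch, while the LVLNUM part still reads it.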

* Re: [PATCH 3/4] powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
  2021-12-06  9:17   ` Kajol Jain
@ 2021-12-22 12:38     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-22 12:38 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, linux-kernel, peterz, mingo, jolsa, namhyung,
	ak, linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus

On Mon, Dec 06, 2021 at 02:47:48PM +0530, Kajol Jain wrote:
> The code represents data coming from L1/L2/L3 cache hits based on the
> PERF_MEM_LVL_* namespace, which is being deprecated in favour of the
> newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
> fields.

Thanks, applied.

- Arnaldo

 
> Add data source encodings to represent L1/L2/L3 cache hits based on the
> newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
> power10 and older platforms.
> 
> Result on a power9 system without the patch:
> 
> localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
>  # Overhead       Samples  Memory access             Symbol                             Shared Object
>  # ........  ............  ........................  .................................  ................
>  #
>     29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
>     27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
>     13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
>     13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
>      8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
>      8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]
> 
> Result on a power9 system with the patch:
> 
> localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
>  # Overhead       Samples  Memory access             Symbol                      Shared Object
>  # ........  ............  ........................  ..........................  ................
>  #
>     36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
>     25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
>     13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
>     12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
>      6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
>      5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so
> 
> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  arch/powerpc/perf/isa207-common.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> index 7ea873ab2e6f..6c6bc8b7d887 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -220,13 +220,13 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
>  		/* Nothing to do */
>  		break;
>  	case 1:
> -		ret = PH(LVL, L1);
> +		ret = PH(LVL, L1) | LEVEL(L1) | P(SNOOP, HIT);
>  		break;
>  	case 2:
> -		ret = PH(LVL, L2);
> +		ret = PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HIT);
>  		break;
>  	case 3:
> -		ret = PH(LVL, L3);
> +		ret = PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
>  		break;
>  	case 4:
>  		if (sub_idx <= 1)
> -- 
> 2.27.0

-- 

- Arnaldo
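Patch 3 keeps the legacy PERF_MEM_LVL bits and sets the composite LVLNUM and SNOOP bits alongside them, so both old and new decoders can interpret the sample. The minimal sketch below computes the value that PH(LVL, L1) | LEVEL(L1) | P(SNOOP, HIT) produces. It assumes PH(a, b) expands to P(LVL, HIT) | P(a, b) and LEVEL(x) to P(LVLNUM, x), as in isa207-common.c, and mirrors the uapi shifts and values under hypothetical EX_ names; isa207_cache_hit() is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts and values mirrored from include/uapi/linux/perf_event.h. */
#define EX_LVL_SHIFT     5
#define EX_SNOOP_SHIFT   19
#define EX_LVLNUM_SHIFT  33

#define EX_LVL_HIT   0x02ULL
#define EX_LVL_L1    0x08ULL
#define EX_LVL_L2    0x20ULL
#define EX_LVL_L3    0x40ULL
#define EX_SNOOP_HIT 0x04ULL

/* Encoding for a hit in cache level n (1..3), as in cases 1-3 of the patch:
 * legacy LVL_Ln | LVL_HIT, PERF_MEM_LVLNUM_Ln (whose value is n), SNOOP_HIT. */
static uint64_t isa207_cache_hit(unsigned int n)
{
	static const uint64_t legacy[] = { 0, EX_LVL_L1, EX_LVL_L2, EX_LVL_L3 };

	assert(n >= 1 && n <= 3);
	return ((EX_LVL_HIT | legacy[n]) << EX_LVL_SHIFT) |
	       ((uint64_t)n << EX_LVLNUM_SHIFT) |
	       (EX_SNOOP_HIT << EX_SNOOP_SHIFT);
}
```

Because mem_lvl and mem_lvl_num are decoded independently, perf prints both the legacy and the composite name for such a sample, which matches the "L1 or L1 hit" lines in the results above.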

* Re: [PATCH 3/4] powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields
@ 2021-12-22 12:38     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-22 12:38 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mark.rutland, atrajeev, ak, daniel, rnsastry, peterz,
	linux-kernel, ast, linux-perf-users, alexander.shishkin, yao.jin,
	mingo, paulus, maddy, jolsa, namhyung, songliubraving,
	linuxppc-dev, kan.liang

Em Mon, Dec 06, 2021 at 02:47:48PM +0530, Kajol Jain escreveu:
> The code represent data coming from L1/L2/L3 cache hits based on
> PERF_MEM_LVL_* namespace, which is in the process of deprecation in
> the favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
> fields.

Thanks, applied.

- Arnaldo

 
> Add data source encodings to represent L1/L2/L3 cache hits based on
> newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
> power10 and older platforms
> 
> Result in power9 system without patch changes:
> 
> localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
>  # Overhead       Samples  Memory access             Symbol                             Shared Object
>  # ........  ............  ........................  .................................  ................
>  #
>     29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
>     27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
>     13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
>     13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
>      8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
>      8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]
> 
> Result in power9 system with patch changes:
> 
> localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
>  # Overhead       Samples  Memory access             Symbol                      Shared Object
>  # ........  ............  ........................  ..........................  ................
>  #
>     36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
>     25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
>     13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
>     12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
>      6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
>      5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so
> 
> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  arch/powerpc/perf/isa207-common.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> index 7ea873ab2e6f..6c6bc8b7d887 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -220,13 +220,13 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
>  		/* Nothing to do */
>  		break;
>  	case 1:
> -		ret = PH(LVL, L1);
> +		ret = PH(LVL, L1) | LEVEL(L1) | P(SNOOP, HIT);
>  		break;
>  	case 2:
> -		ret = PH(LVL, L2);
> +		ret = PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HIT);
>  		break;
>  	case 3:
> -		ret = PH(LVL, L3);
> +		ret = PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
>  		break;
>  	case 4:
>  		if (sub_idx <= 1)
> -- 
> 2.27.0

-- 

- Arnaldo
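A minimal sketch of what the L2 change above does to the data_src word. It assumes the shift and value constants below mirror include/uapi/linux/perf_event.h (they are redefined locally so the example builds without new kernel headers), and mem_field(), l2_src_old() and l2_src_new() are hypothetical helpers. The new encoding keeps the legacy PERF_MEM_LVL_L2 bit and adds the composite LVLNUM_L2 value, which is presumably why the report now names the level twice ("L2 or L2 hit").

```c
#include <assert.h>
#include <stdint.h>

/* Shifts and values mirrored from include/uapi/linux/perf_event.h. */
#define MEM_LVL_SHIFT     5
#define MEM_SNOOP_SHIFT   19
#define MEM_LVLNUM_SHIFT  33
#define MEM_LVL_HIT       0x02ULL	/* PERF_MEM_LVL_HIT */
#define MEM_LVL_L2        0x20ULL	/* PERF_MEM_LVL_L2 */
#define MEM_SNOOP_HIT     0x04ULL	/* PERF_MEM_SNOOP_HIT */
#define MEM_LVLNUM_L2     0x02ULL	/* PERF_MEM_LVLNUM_L2 */

/* Hypothetical helper: read a 'width'-bit field starting at bit 'shift'. */
static uint64_t mem_field(uint64_t src, unsigned shift, unsigned width)
{
	assert(width < 64 && shift + width <= 64);
	return (src >> shift) & ((1ULL << width) - 1);
}

/* Before the patch, idx 2 was PH(LVL, L2): legacy namespace only. */
static uint64_t l2_src_old(void)
{
	return (MEM_LVL_HIT | MEM_LVL_L2) << MEM_LVL_SHIFT;
}

/* After the patch: PH(LVL, L2) | LEVEL(L2) | P(SNOOP, HIT). */
static uint64_t l2_src_new(void)
{
	return l2_src_old() |
	       (MEM_LVLNUM_L2 << MEM_LVLNUM_SHIFT) |
	       (MEM_SNOOP_HIT << MEM_SNOOP_SHIFT);
}
```

With these definitions the old encoding leaves the 4-bit LVLNUM field at 0, while the new one sets it to 2 (LVLNUM_L2) and leaves the legacy bits untouched.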

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] powerpc/perf: Add data source encodings for power10 platform
  2021-12-06  9:17   ` Kajol Jain
@ 2021-12-22 12:41     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-22 12:41 UTC (permalink / raw)
  To: Kajol Jain
  Cc: mpe, linuxppc-dev, linux-kernel, peterz, mingo, jolsa, namhyung,
	ak, linux-perf-users, maddy, atrajeev, rnsastry, yao.jin, ast,
	daniel, songliubraving, kan.liang, mark.rutland,
	alexander.shishkin, paulus

Em Mon, Dec 06, 2021 at 02:47:49PM +0530, Kajol Jain escreveu:
> The code represents memory/cache level data based on the PERF_MEM_LVL_*
> namespace, which is in the process of being deprecated in favour of the
> newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
> Add data source encodings to represent cache/memory data based on the
> newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Thanks, applied.

- Arnaldo

 
> Add data source encodings to represent data coming from local
> memory, remote memory and distant memory, as well as remote and
> distant cache hits.
> 
> In order to represent data coming from OpenCAPI cache/memory, we use
> the LVLNUM "PMEM" field, which is used to represent persistent memory
> accesses.
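To make the OpenCAPI mapping concrete, here is a hedged sketch (not kernel code) of the encoding the power10 branch produces for OpenCAPI accesses: only the generic P(LVL, HIT) bit is set in the legacy namespace, the level comes from LVLNUM PMEM, and the remote variants add the REMOTE bit. The constants are assumed to mirror the uapi header; put(), opencapi_src(), lvl_num() and is_remote() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shifts and values mirrored from include/uapi/linux/perf_event.h. */
#define MEM_LVL_SHIFT      5
#define MEM_SNOOP_SHIFT    19
#define MEM_LVLNUM_SHIFT   33
#define MEM_REMOTE_SHIFT   37
#define MEM_LVL_HIT        0x02ULL
#define MEM_SNOOP_HIT      0x04ULL
#define MEM_LVLNUM_PMEM    0x0eULL
#define MEM_REMOTE_REMOTE  0x01ULL

/* Hypothetical helper: OR 'val' into a 'width'-bit field at 'shift'. */
static uint64_t put(uint64_t src, unsigned shift, unsigned width, uint64_t val)
{
	assert(val < (1ULL << width));	/* value must fit its field */
	return src | (val << shift);
}

/* OpenCAPI access as encoded by the power10 branch:
 * P(SNOOP, HIT) | P(LVL, HIT) | LEVEL(PMEM), plus REM when remote. */
static uint64_t opencapi_src(bool remote)
{
	uint64_t src = 0;

	src = put(src, MEM_SNOOP_SHIFT, 5, MEM_SNOOP_HIT);
	src = put(src, MEM_LVL_SHIFT, 14, MEM_LVL_HIT);
	src = put(src, MEM_LVLNUM_SHIFT, 4, MEM_LVLNUM_PMEM);
	if (remote)
		src = put(src, MEM_REMOTE_SHIFT, 1, MEM_REMOTE_REMOTE);
	return src;
}

/* Extract the 4-bit composite level number. */
static unsigned lvl_num(uint64_t src)
{
	return (src >> MEM_LVLNUM_SHIFT) & 0xf;
}

/* Extract the 1-bit REMOTE flag. */
static bool is_remote(uint64_t src)
{
	return (src >> MEM_REMOTE_SHIFT) & 0x1;
}
```

A consumer that only understands the legacy namespace sees a bare "hit" with no specific level here; the PMEM level is visible only through the composite field.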
> 
> Result on a power10 system with the patch applied:
> 
> localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
>  # Overhead       Samples  Memory access             Symbol                      Shared Object
>  # ........  ............  ........................  ..........................  ................
>  #
>     29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
>     23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
>     18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
>     15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
>     .....
>     0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
>     0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
>     .....
> 
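The "Remote socket, same board Any cache hit" label can be read as REMOTE flag + mem_hops + LVLNUM + hit/miss. The sketch below is a hypothetical, simplified formatter that only approximates the report output above; it is not the tools/perf implementation, it ignores the legacy mem_lvl field, and its hop wording is taken from the report output and the cover letter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hop names for mem_hops values 1-4 (PERF_MEM_HOPS_0..3), worded after
 * the cover letter and the report output; 0 means "not set". */
static const char *const hop_names[] = {
	NULL,
	"core, same node",	/* HOPS_0 */
	"node, same socket",	/* HOPS_1 */
	"socket, same board",	/* HOPS_2 */
	"board",		/* HOPS_3 */
};

/* Names for the few composite LVLNUM values used in this series. */
static const char *lvlnum_name(unsigned lvl)
{
	switch (lvl) {
	case 0x01: return "L1";
	case 0x02: return "L2";
	case 0x03: return "L3";
	case 0x0b: return "Any cache";
	case 0x0d: return "RAM";
	case 0x0e: return "PMEM";
	default:   return "N/A";
	}
}

/* Append 's' to the NUL-terminated string in 'out', truncating safely. */
static void append(char *out, size_t sz, const char *s)
{
	size_t len = strlen(out);

	snprintf(out + len, sz - len, "%s", s);
}

/* Hypothetical formatter: "[Remote ][<hops> ]<level> hit|miss". */
static void format_access(char *out, size_t sz, unsigned lvl_num,
			  int remote, unsigned hops, int hit)
{
	assert(sz > 0);
	assert(hops < sizeof(hop_names) / sizeof(hop_names[0]));
	out[0] = '\0';
	if (remote)
		append(out, sz, "Remote ");
	if (hops) {
		append(out, sz, hop_names[hops]);
		append(out, sz, " ");
	}
	append(out, sz, lvlnum_name(lvl_num));
	append(out, sz, hit ? " hit" : " miss");
}
```

Calling format_access(buf, sizeof(buf), 0x0b, 1, 3, 1) yields the label in the last two rows above (0x0b is LVLNUM_ANY_CACHE, and hops value 3 is PERF_MEM_HOPS_2). Because the sketch never prints the legacy level, it yields "L2 hit" where perf shows "L2 or L2 hit".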
> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  arch/powerpc/perf/isa207-common.c | 54 ++++++++++++++++++++++++-------
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> index 6c6bc8b7d887..4037ea652522 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -229,13 +229,28 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
>  		ret = PH(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
>  		break;
>  	case 4:
> -		if (sub_idx <= 1)
> -			ret = PH(LVL, LOC_RAM);
> -		else if (sub_idx > 1 && sub_idx <= 2)
> -			ret = PH(LVL, REM_RAM1);
> -		else
> -			ret = PH(LVL, REM_RAM2);
> -		ret |= P(SNOOP, HIT);
> +		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> +			ret = P(SNOOP, HIT);
> +
> +			if (sub_idx == 1)
> +				ret |= PH(LVL, LOC_RAM) | LEVEL(RAM);
> +			else if (sub_idx == 2 || sub_idx == 3)
> +				ret |= P(LVL, HIT) | LEVEL(PMEM);
> +			else if (sub_idx == 4)
> +				ret |= PH(LVL, REM_RAM1) | REM | LEVEL(RAM) | P(HOPS, 2);
> +			else if (sub_idx == 5 || sub_idx == 7)
> +				ret |= P(LVL, HIT) | LEVEL(PMEM) | REM;
> +			else if (sub_idx == 6)
> +				ret |= PH(LVL, REM_RAM2) | REM | LEVEL(RAM) | P(HOPS, 3);
> +		} else {
> +			if (sub_idx <= 1)
> +				ret = PH(LVL, LOC_RAM);
> +			else if (sub_idx > 1 && sub_idx <= 2)
> +				ret = PH(LVL, REM_RAM1);
> +			else
> +				ret = PH(LVL, REM_RAM2);
> +			ret |= P(SNOOP, HIT);
> +		}
>  		break;
>  	case 5:
>  		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> @@ -261,11 +276,26 @@ static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
>  		}
>  		break;
>  	case 6:
> -		ret = PH(LVL, REM_CCE2);
> -		if ((sub_idx == 0) || (sub_idx == 2))
> -			ret |= P(SNOOP, HIT);
> -		else if ((sub_idx == 1) || (sub_idx == 3))
> -			ret |= P(SNOOP, HITM);
> +		if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> +			if (sub_idx == 0)
> +				ret = PH(LVL, REM_CCE1) | LEVEL(ANY_CACHE) | REM |
> +					P(SNOOP, HIT) | P(HOPS, 2);
> +			else if (sub_idx == 1)
> +				ret = PH(LVL, REM_CCE1) | LEVEL(ANY_CACHE) | REM |
> +					P(SNOOP, HITM) | P(HOPS, 2);
> +			else if (sub_idx == 2)
> +				ret = PH(LVL, REM_CCE2) | LEVEL(ANY_CACHE) | REM |
> +					P(SNOOP, HIT) | P(HOPS, 3);
> +			else if (sub_idx == 3)
> +				ret = PH(LVL, REM_CCE2) | LEVEL(ANY_CACHE) | REM |
> +					P(SNOOP, HITM) | P(HOPS, 3);
> +		} else {
> +			ret = PH(LVL, REM_CCE2);
> +			if (sub_idx == 0 || sub_idx == 2)
> +				ret |= P(SNOOP, HIT);
> +			else if (sub_idx == 1 || sub_idx == 3)
> +				ret |= P(SNOOP, HITM);
> +		}
>  		break;
>  	case 7:
>  		ret = PM(LVL, L1);
> -- 
> 2.27.0

-- 

- Arnaldo
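As a reading aid for the case 4 hunk, the power10 branch can be modelled as a small switch over sub_idx. This is a sketch under the assumption that the locally mirrored constants match the uapi header; LVL(), LEVEL(), REM and HOPS() are stand-ins for the PH()/P()/LEVEL()/REM macros used in isa207-common.c, field() is a hypothetical helper, and the per-case comments are my reading of the patch, not the author's wording.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts and values mirrored from include/uapi/linux/perf_event.h. */
#define MEM_LVL_SHIFT      5
#define MEM_SNOOP_SHIFT    19
#define MEM_LVLNUM_SHIFT   33
#define MEM_REMOTE_SHIFT   37
#define MEM_HOPS_SHIFT     43
#define MEM_LVL_HIT        0x002ULL
#define MEM_LVL_LOC_RAM    0x080ULL
#define MEM_LVL_REM_RAM1   0x100ULL
#define MEM_LVL_REM_RAM2   0x200ULL
#define MEM_SNOOP_HIT      0x04ULL
#define MEM_LVLNUM_RAM     0x0dULL
#define MEM_LVLNUM_PMEM    0x0eULL
#define MEM_HOPS_2         0x03ULL	/* remote socket, same board */
#define MEM_HOPS_3         0x04ULL	/* remote board */

/* Stand-ins for PH()/P(LVL, HIT), LEVEL(), REM and P(HOPS, n).
 * LVL(0) sets only the generic LVL_HIT bit. */
#define LVL(x)    (((x) | MEM_LVL_HIT) << MEM_LVL_SHIFT)
#define LEVEL(x)  ((x) << MEM_LVLNUM_SHIFT)
#define REM       (1ULL << MEM_REMOTE_SHIFT)
#define HOPS(x)   ((x) << MEM_HOPS_SHIFT)

/* Hypothetical helper: read a 'width'-bit field starting at bit 'shift'. */
static uint64_t field(uint64_t src, unsigned shift, unsigned width)
{
	assert(width < 64 && shift + width <= 64);
	return (src >> shift) & ((1ULL << width) - 1);
}

/* Model of the CPU_FTR_ARCH_31 (power10) branch of case 4. */
static uint64_t p10_mem_source(uint32_t sub_idx)
{
	uint64_t ret = MEM_SNOOP_HIT << MEM_SNOOP_SHIFT;

	switch (sub_idx) {
	case 1:			/* local memory */
		ret |= LVL(MEM_LVL_LOC_RAM) | LEVEL(MEM_LVLNUM_RAM);
		break;
	case 2: case 3:		/* OpenCAPI, no REMOTE bit */
		ret |= LVL(0) | LEVEL(MEM_LVLNUM_PMEM);
		break;
	case 4:			/* remote memory */
		ret |= LVL(MEM_LVL_REM_RAM1) | REM |
		       LEVEL(MEM_LVLNUM_RAM) | HOPS(MEM_HOPS_2);
		break;
	case 5: case 7:		/* OpenCAPI with REMOTE bit */
		ret |= LVL(0) | LEVEL(MEM_LVLNUM_PMEM) | REM;
		break;
	case 6:			/* distant memory */
		ret |= LVL(MEM_LVL_REM_RAM2) | REM |
		       LEVEL(MEM_LVLNUM_RAM) | HOPS(MEM_HOPS_3);
		break;
	default:		/* e.g. sub_idx 0: only SNOOP_HIT is set */
		break;
	}
	return ret;
}
```

A switch is used here only because it makes the grouped cases (2/3 and 5/7) easy to see; it should behave the same as the patch's if/else chain for every sub_idx value.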

^ permalink raw reply	[flat|nested] 26+ messages in thread
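The case 6 hunk follows a regular pattern: as I read it, bit 1 of sub_idx selects REM_CCE2/HOPS_3 over REM_CCE1/HOPS_2, and bit 0 selects HITM over HIT. The hedged sketch below restates the four power10 branches with bit tests; it is intended as an equivalent reformulation for sub_idx 0-3 only, not the patch's code, and p10_cache_source() plus the locally mirrored constants are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts and values mirrored from include/uapi/linux/perf_event.h. */
#define MEM_LVL_SHIFT         5
#define MEM_SNOOP_SHIFT       19
#define MEM_LVLNUM_SHIFT      33
#define MEM_REMOTE_SHIFT      37
#define MEM_HOPS_SHIFT        43
#define MEM_LVL_HIT           0x002ULL
#define MEM_LVL_REM_CCE1      0x400ULL
#define MEM_LVL_REM_CCE2      0x800ULL
#define MEM_SNOOP_HIT         0x04ULL
#define MEM_SNOOP_HITM        0x10ULL
#define MEM_LVLNUM_ANY_CACHE  0x0bULL
#define MEM_HOPS_2            0x03ULL
#define MEM_HOPS_3            0x04ULL

/* Hypothetical model of the power10 branch of case 6 (cache hits). */
static uint64_t p10_cache_source(uint32_t sub_idx)
{
	int distant = sub_idx & 2;	/* sub_idx 2, 3 */
	int modified = sub_idx & 1;	/* sub_idx 1, 3 */
	uint64_t lvl = distant ? MEM_LVL_REM_CCE2 : MEM_LVL_REM_CCE1;
	uint64_t snoop = modified ? MEM_SNOOP_HITM : MEM_SNOOP_HIT;
	uint64_t hops = distant ? MEM_HOPS_3 : MEM_HOPS_2;

	assert(sub_idx <= 3);	/* the patch only handles 0-3 here */
	return ((MEM_LVL_HIT | lvl) << MEM_LVL_SHIFT) |
	       (MEM_LVLNUM_ANY_CACHE << MEM_LVLNUM_SHIFT) |
	       (1ULL << MEM_REMOTE_SHIFT) |
	       (snoop << MEM_SNOOP_SHIFT) |
	       (hops << MEM_HOPS_SHIFT);
}
```

The bit-test form makes the symmetry explicit, while the explicit if/else chain in the patch is arguably easier to audit case by case against the hardware documentation.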

end of thread, other threads:[~2021-12-22 12:42 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-06  9:17 [PATCH 0/4] perf: Add new macros for mem_hops field Kajol Jain
2021-12-06  9:17 ` [PATCH 1/4] " Kajol Jain
2021-12-10  8:21   ` Peter Zijlstra
2021-12-06  9:17 ` [PATCH 2/4] tools/perf: " Kajol Jain
2021-12-22 12:36   ` Arnaldo Carvalho de Melo
2021-12-06  9:17 ` [PATCH 3/4] powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields Kajol Jain
2021-12-22 12:38   ` Arnaldo Carvalho de Melo
2021-12-06  9:17 ` [PATCH 4/4] powerpc/perf: Add data source encodings for power10 platform Kajol Jain
2021-12-22 12:41   ` Arnaldo Carvalho de Melo
2021-12-09 19:17 ` [PATCH 0/4] perf: Add new macros for mem_hops field Arnaldo Carvalho de Melo
2021-12-10  6:35   ` Michael Ellerman
2021-12-10  8:22     ` Peter Zijlstra
2021-12-21 12:14 ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.