linux-kernel.vger.kernel.org archive mirror
* [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE
@ 2020-08-06  3:07 Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 01/11] perf mem: Search event name with more flexible path Leo Yan
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

This patch set adds AUX trace support to the perf memory tool, with
Arm SPE as the first enabled hardware tracing source.

Patches 01 ~ 04 are preparation patches which mainly resolve an issue
with the memory events: the existing code hard codes the memory events
based on the x86 and PowerPC architectures.  These patches therefore
extend the code to support more flexible memory event names, and
introduce weak functions so that every architecture can define its own
memory event structure and return its event pointer and name
respectively.

Patch 05 extends the perf memory tool to support AUX trace.

Patches 06 ~ 11 support Arm SPE in the perf memory tool.  Firstly
they register SPE events as memory events, then extend the SPE packet
to pass address info and operation types, and also set the 'data_src'
field so the tool can display readable strings in the result.

This patch set has been tested on the ARMv8 Hisilicon D06 platform.
I noticed that the 'data object' cannot be displayed properly; this
looks like a separate issue which needs to be checked independently.
Below are the test results:

# Samples: 73  of event 'l1d-miss'
# Total weight : 73
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                              Shared Object      Data Symbol                                                   Data Object         Snoop         TLB access              Locked
# ........  ............  ............  ........................  ..................................  .................  ............................................................  ..................  ............  ......................  ......
#
     2.74%             2  0             L1 or L1 miss             [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027aacb08a8                                        [unknown]           N/A           N/A                     No
     2.74%             2  0             L1 or L1 miss             [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027be6488a8                                        [unknown]           N/A           N/A                     No
     2.74%             2  0             L1 or L1 miss             [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027c432f8a8                                        [unknown]           N/A           N/A                     No
     1.37%             1  0             L1 or L1 miss             [k] __arch_copy_to_user             [kernel.kallsyms]  [k] 0xffff0027a65352a0                                        [unknown]           N/A           N/A                     No
     1.37%             1  0             L1 or L1 miss             [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0xffff0027d3cbf468                                        [unknown]           N/A           N/A                     No
     1.37%             1  0             L1 or L1 miss             [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0xffff0027d8f44490                                        [unknown]           N/A           N/A                     No
     [...]


# Samples: 101  of event 'l1d-access'
# Total weight : 101
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                              Shared Object      Data Symbol                                                   Data Object             Snoop         TLB access              Locked
# ........  ............  ............  ........................  ..................................  .................  ............................................................  ......................  ............  ......................  ......
#
     2.97%             3  0             L1 or L1 hit              [k] perf_event_mmap                 [kernel.kallsyms]  [k] perf_swevent+0x5c                                         [kernel.kallsyms].data  N/A           N/A                     No
     1.98%             2  0             L1 or L1 hit              [k] kmem_cache_alloc                [kernel.kallsyms]  [k] 0xffff2027af40e3d0                                        [unknown]               N/A           N/A                     No
     1.98%             2  0             L1 or L1 hit              [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027aacb08a8                                        [unknown]               N/A           N/A                     No
     1.98%             2  0             L1 or L1 hit              [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027be6488a8                                        [unknown]               N/A           N/A                     No
     1.98%             2  0             L1 or L1 hit              [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027c432f8a8                                        [unknown]               N/A           N/A                     No
     0.99%             1  0             L1 or L1 hit              [k] __arch_copy_to_user             [kernel.kallsyms]  [k] 0xffff0027a65352a0                                        [unknown]               N/A           N/A                     No
     0.99%             1  0             L1 or L1 hit              [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0xffff0027d3cbf468                                        [unknown]               N/A           N/A                     No
     0.99%             1  0             L1 or L1 hit              [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0xffff0027d8f44490                                        [unknown]               N/A           N/A                     No
     [...]


# Samples: 46  of event 'llc-miss'
# Total weight : 46
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                              Shared Object      Data Symbol                                                   Data Object         Snoop         TLB access              Locked
# ........  ............  ............  ........................  ..................................  .................  ............................................................  ..................  ............  ......................  ......
#
     2.17%             1  0             L3 or L3 miss             [k] __arch_copy_to_user             [kernel.kallsyms]  [k] 0xffff0027a65352a0                                        [unknown]           N/A           N/A                     No
     2.17%             1  0             L3 or L3 miss             [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0xffff0027d3cbf468                                        [unknown]           N/A           N/A                     No
     2.17%             1  0             L3 or L3 miss             [k] __d_lookup_rcu                  [kernel.kallsyms]  [k] 0xffff0027d8f44490                                        [unknown]           N/A           N/A                     No
     2.17%             1  0             L3 or L3 miss             [k] __tty_buffer_request_room       [kernel.kallsyms]  [k] 0xffff2027c424ac08                                        [unknown]           N/A           N/A                     No
     2.17%             1  0             L3 or L3 miss             [.] _dl_addr                        libc-2.28.so       [.] 0x0000ffff9afc94c4                                        libc-2.28.so        N/A           N/A                     No
     2.17%             1  0             L3 or L3 miss             [.] _dl_addr                        libc-2.28.so       [.] 0x0000ffff9afc98b6                                        libc-2.28.so        N/A           N/A                     No
     2.17%             1  0             L3 or L3 miss             [.] _dl_lookup_symbol_x             ld-2.28.so         [.] 0x0000ffff9af38703                                        libdl-2.28.so       N/A           N/A                     No
     [...]


# Samples: 6  of event 'llc-access'
# Total weight : 6
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                   Shared Object      Data Symbol             Data Object       Snoop         TLB access              Locked
# ........  ............  ............  ........................  .......................  .................  ......................  ................  ............  ......................  ......
#
    16.67%             1  0             L3 or L3 hit              [.] _dl_addr             libc-2.28.so       [.] 0x0000ffff9afc98b6  libc-2.28.so      N/A           N/A                     No
    16.67%             1  0             L3 or L3 hit              [.] _dl_lookup_symbol_x  ld-2.28.so         [.] 0x0000ffff9af38703  libdl-2.28.so     N/A           N/A                     No
    16.67%             1  0             L3 or L3 hit              [.] _dl_relocate_object  ld-2.28.so         [.] 0x0000aaaadc9e4dd0  ls                N/A           N/A                     No
    16.67%             1  0             L3 or L3 hit              [.] _dl_relocate_object  ld-2.28.so         [.] 0x0000aaaadc9e5970  ls                N/A           N/A                     No
    16.67%             1  0             L3 or L3 hit              [k] copy_page            [kernel.kallsyms]  [k] 0xffff0027a8528be0  [unknown]         N/A           N/A                     No
    16.67%             1  0             L3 or L3 hit              [k] copy_page            [kernel.kallsyms]  [k] 0xffff2027c66a65e0  [unknown]         N/A           N/A                     No


# Samples: 32  of event 'tlb-miss'
# Total weight : 32
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                              Shared Object      Data Symbol             Data Object         Snoop         TLB access              Locked
# ........  ............  ............  ........................  ..................................  .................  ......................  ..................  ............  ......................  ......
#
     6.25%             2  0             N/A                       [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027be6488a8  [unknown]           N/A           Walker miss             No
     6.25%             2  0             N/A                       [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027c432f8a8  [unknown]           N/A           Walker miss             No
     3.12%             1  0             N/A                       [k] __arch_clear_user               [kernel.kallsyms]  [k] 0x0000aaaadca14658  [unknown]           N/A           Walker miss             No
     3.12%             1  0             N/A                       [.] _dl_map_object_deps             ld-2.28.so         [.] 0x0000ffffdd807c50  [stack]             N/A           Walker miss             No
     3.12%             1  0             N/A                       [.] _dl_map_object_from_fd          ld-2.28.so         [.] 0x0000ffff9af32d50  libpthread-2.28.so  N/A           Walker miss             No
     [...]


# Samples: 114  of event 'tlb-access'
# Total weight : 114
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                              Shared Object      Data Symbol                                                   Data Object             Snoop         TLB access              Locked
# ........  ............  ............  ........................  ..................................  .................  ............................................................  ......................  ............  ......................  ......
#
     2.63%             3  0             N/A                       [k] perf_event_mmap                 [kernel.kallsyms]  [k] perf_swevent+0x5c                                         [kernel.kallsyms].data  N/A           Walker hit              No
     1.75%             2  0             N/A                       [k] kmem_cache_alloc                [kernel.kallsyms]  [k] 0xffff2027af40e3d0                                        [unknown]               N/A           Walker hit              No
     1.75%             2  0             N/A                       [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027aacb08a8                                        [unknown]               N/A           Walker hit              No
     1.75%             2  0             N/A                       [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027be6488a8                                        [unknown]               N/A           Walker hit              No
     1.75%             2  0             N/A                       [k] perf_iterate_ctx.constprop.151  [kernel.kallsyms]  [k] 0xffff2027c432f8a8                                        [unknown]               N/A           Walker hit              No
     0.88%             1  0             N/A                       [k] __arch_clear_user               [kernel.kallsyms]  [k] 0x0000aaaadca14658                                        [unknown]               N/A           Walker hit              No
     0.88%             1  0             N/A                       [k] __arch_clear_user               [kernel.kallsyms]  [k] 0x0000ffff9b1963f8                                        [unknown]               N/A           Walker hit              No
     [...]


# Samples: 21  of event 'remote-access'
# Total weight : 21
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                      Shared Object      Data Symbol                                                   Data Object         Snoop         TLB access              Locked
# ........  ............  ............  ........................  ..........................  .................  ............................................................  ..................  ............  ......................  ......
#
     4.76%             1  0             Remote Cache (1 hop) or Any cache hit  [k] __arch_copy_to_user     [kernel.kallsyms]  [k] 0xffff0027a65352a0                                        [unknown]           N/A           N/A                     No
     4.76%             1  0             Remote Cache (1 hop) or Any cache hit  [k] __d_lookup_rcu          [kernel.kallsyms]  [k] 0xffff0027d3cbf468                                        [unknown]           N/A           N/A                     No
     4.76%             1  0             Remote Cache (1 hop) or Any cache hit  [k] __d_lookup_rcu          [kernel.kallsyms]  [k] 0xffff0027d8f44490                                        [unknown]           N/A           N/A                     No
     4.76%             1  0             Remote Cache (1 hop) or Any cache hit  [.] _dl_addr                libc-2.28.so       [.] 0x0000ffff9afc94c4                                        libc-2.28.so        N/A           N/A                     No
     [...]


Notes: the network was not stable on my side and git failed to send
out all patches last time, so I am resending this patch set.  I made
a small improvement to patch 01's commit log for this resend.  Sorry
for the spam and any inconvenience.


Leo Yan (11):
  perf mem: Search event name with more flexible path
  perf mem: Introduce weak function perf_mem_events__ptr()
  perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
  perf mem: Only initialize memory event for recording
  perf mem: Support AUX trace
  perf mem: Support Arm SPE events
  perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC
  perf arm-spe: Save memory addresses in packet
  perf arm-spe: Store operation types in packet
  perf arm-spe: Fill address info for memory samples
  perf arm-spe: Set sample's data source field

 tools/perf/arch/arm64/util/Build              |   2 +-
 tools/perf/arch/arm64/util/mem-events.c       |  46 +++++++
 tools/perf/builtin-c2c.c                      |  18 ++-
 tools/perf/builtin-mem.c                      |  71 ++++++++--
 .../util/arm-spe-decoder/arm-spe-decoder.c    |  15 +++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |   8 ++
 tools/perf/util/arm-spe.c                     | 125 +++++++++++++++---
 tools/perf/util/mem-events.c                  |  32 +++--
 tools/perf/util/mem-events.h                  |   3 +-
 9 files changed, 266 insertions(+), 54 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/mem-events.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 01/11] perf mem: Search event name with more flexible path
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

The perf tool searches for a memory event name under the folder
'/sys/devices/cpu/events/', which limits memory profiling to events
that live under this folder.  Thus it's impossible to use any other
event for memory profiling if it is not under this specific folder,
e.g. the Arm SPE hardware event is not located in
'/sys/devices/cpu/events/' so it cannot be enabled for memory
profiling.

This patch changes the search folder from '/sys/devices/cpu/events/'
to '/sys/devices', which gives the flexibility to find any event that
can be used for memory profiling.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/mem-events.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..35c8d175a9d2 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -18,8 +18,8 @@ unsigned int perf_mem_events__loads_ldlat = 30;
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
 struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
-	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"mem-loads"),
-	E("ldlat-stores",	"cpu/mem-stores/P",		"mem-stores"),
+	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"cpu/events/mem-loads"),
+	E("ldlat-stores",	"cpu/mem-stores/P",		"cpu/events/mem-stores"),
 };
 #undef E
 
@@ -93,7 +93,7 @@ int perf_mem_events__init(void)
 		struct perf_mem_event *e = &perf_mem_events[j];
 		struct stat st;
 
-		scnprintf(path, PATH_MAX, "%s/devices/cpu/events/%s",
+		scnprintf(path, PATH_MAX, "%s/devices/%s",
 			  mnt, e->sysfs_name);
 
 		if (!stat(path, &st))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 01/11] perf mem: Search event name with more flexible path Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-28 15:40   ` James Clark
  2020-08-06  3:07 ` [PATCH RESEND v1 03/11] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE Leo Yan
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

Different architectures might use different events or different event
parameters for memory profiling.  This patch introduces the weak
function perf_mem_events__ptr(), which allows returning an
architecture specific memory event.

After the function perf_mem_events__ptr() is introduced, the variable
'perf_mem_events' is only accessed through this new function; so mark
the variable as 'static', which allows each architecture to define
its own memory event array.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c     | 18 ++++++++++++------
 tools/perf/builtin-mem.c     | 21 ++++++++++++++-------
 tools/perf/util/mem-events.c | 26 +++++++++++++++++++-------
 tools/perf/util/mem-events.h |  2 +-
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5938b100eaf4..88e68f36aa62 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2914,6 +2914,7 @@ static int perf_c2c__record(int argc, const char **argv)
 	int ret;
 	bool all_user = false, all_kernel = false;
 	bool event_set = false;
+	struct perf_mem_event *e;
 	struct option options[] = {
 	OPT_CALLBACK('e', "event", &event_set, "event",
 		     "event selector. Use 'perf mem record -e list' to list available events",
@@ -2941,11 +2942,15 @@ static int perf_c2c__record(int argc, const char **argv)
 	rec_argv[i++] = "record";
 
 	if (!event_set) {
-		perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
-		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+		e->record = true;
+
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+		e->record = true;
 	}
 
-	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+	if (e->record)
 		rec_argv[i++] = "-W";
 
 	rec_argv[i++] = "-d";
@@ -2953,12 +2958,13 @@ static int perf_c2c__record(int argc, const char **argv)
 	rec_argv[i++] = "--sample-cpu";
 
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-		if (!perf_mem_events[j].record)
+		e = perf_mem_events__ptr(j);
+		if (!e->record)
 			continue;
 
-		if (!perf_mem_events[j].supported) {
+		if (!e->supported) {
 			pr_err("failed: event '%s' not supported\n",
-			       perf_mem_events[j].name);
+			       perf_mem_events__name(j));
 			free(rec_argv);
 			return -1;
 		}
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..9a7df8d01296 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -64,6 +64,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	const char **rec_argv;
 	int ret;
 	bool all_user = false, all_kernel = false;
+	struct perf_mem_event *e;
 	struct option options[] = {
 	OPT_CALLBACK('e', "event", &mem, "event",
 		     "event selector. use 'perf mem record -e list' to list available events",
@@ -86,13 +87,18 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 
 	rec_argv[i++] = "record";
 
-	if (mem->operation & MEM_OPERATION_LOAD)
-		perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
+	if (mem->operation & MEM_OPERATION_LOAD) {
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+		e->record = true;
+	}
 
-	if (mem->operation & MEM_OPERATION_STORE)
-		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+	if (mem->operation & MEM_OPERATION_STORE) {
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+		e->record = true;
+	}
 
-	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+	if (e->record)
 		rec_argv[i++] = "-W";
 
 	rec_argv[i++] = "-d";
@@ -101,10 +107,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 		rec_argv[i++] = "--phys-data";
 
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-		if (!perf_mem_events[j].record)
+		e = perf_mem_events__ptr(j);
+		if (!e->record)
 			continue;
 
-		if (!perf_mem_events[j].supported) {
+		if (!e->supported) {
 			pr_err("failed: event '%s' not supported\n",
 			       perf_mem_events__name(j));
 			free(rec_argv);
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 35c8d175a9d2..7a5a0d699e27 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -17,7 +17,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"cpu/events/mem-loads"),
 	E("ldlat-stores",	"cpu/mem-stores/P",		"cpu/events/mem-stores"),
 };
@@ -28,19 +28,31 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
 
+struct perf_mem_event * __weak perf_mem_events__ptr(int i)
+{
+	if (i >= PERF_MEM_EVENTS__MAX)
+		return NULL;
+
+	return &perf_mem_events[i];
+}
+
 char * __weak perf_mem_events__name(int i)
 {
+	struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+	if (!e)
+		return NULL;
+
 	if (i == PERF_MEM_EVENTS__LOAD) {
 		if (!mem_loads_name__init) {
 			mem_loads_name__init = true;
 			scnprintf(mem_loads_name, sizeof(mem_loads_name),
-				  perf_mem_events[i].name,
-				  perf_mem_events__loads_ldlat);
+				  e->name, perf_mem_events__loads_ldlat);
 		}
 		return mem_loads_name;
 	}
 
-	return (char *)perf_mem_events[i].name;
+	return (char *)e->name;
 }
 
 int perf_mem_events__parse(const char *str)
@@ -61,7 +73,7 @@ int perf_mem_events__parse(const char *str)
 
 	while (tok) {
 		for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-			struct perf_mem_event *e = &perf_mem_events[j];
+			struct perf_mem_event *e = perf_mem_events__ptr(j);
 
 			if (strstr(e->tag, tok))
 				e->record = found = true;
@@ -90,7 +102,7 @@ int perf_mem_events__init(void)
 
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		char path[PATH_MAX];
-		struct perf_mem_event *e = &perf_mem_events[j];
+		struct perf_mem_event *e = perf_mem_events__ptr(j);
 		struct stat st;
 
 		scnprintf(path, PATH_MAX, "%s/devices/%s",
@@ -108,7 +120,7 @@ void perf_mem_events__list(void)
 	int j;
 
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-		struct perf_mem_event *e = &perf_mem_events[j];
+		struct perf_mem_event *e = perf_mem_events__ptr(j);
 
 		fprintf(stderr, "%-13s%-*s%s\n",
 			e->tag,
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 904dad34f7f7..726a9c8103e4 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -31,13 +31,13 @@ enum {
 	PERF_MEM_EVENTS__MAX,
 };
 
-extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
 extern unsigned int perf_mem_events__loads_ldlat;
 
 int perf_mem_events__parse(const char *str);
 int perf_mem_events__init(void);
 
 char *perf_mem_events__name(int i);
+struct perf_mem_event *perf_mem_events__ptr(int i);
 
 void perf_mem_events__list(void);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 03/11] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 01/11] perf mem: Search event name with more flexible path Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 04/11] perf mem: Only initialize memory event for recording Leo Yan
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

The existing architectures that support perf memory profiling usually
provide two types of hardware events: load and store, so profiling
memory for both load and store operations requires using these two
events at the same time.  But this does not hold for an AUX tracing
event: the same event can be used with different configurations for
memory operation filtering, e.g. the event can be set to trace only
memory loads, only memory stores, or both memory loads and stores.

This patch introduces a new event, PERF_MEM_EVENTS__LOAD_STORE, to
support events which can record both memory load and store
operations.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-mem.c     | 11 +++++++++--
 tools/perf/util/mem-events.h |  1 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9a7df8d01296..bd4229ca3685 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -19,8 +19,9 @@
 #include "util/symbol.h"
 #include <linux/err.h>
 
-#define MEM_OPERATION_LOAD	0x1
-#define MEM_OPERATION_STORE	0x2
+#define MEM_OPERATION_LOAD		0x1
+#define MEM_OPERATION_STORE		0x2
+#define MEM_OPERATION_LOAD_STORE	0x4
 
 struct perf_mem {
 	struct perf_tool	tool;
@@ -97,6 +98,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 		e->record = true;
 	}
 
+	if (mem->operation & MEM_OPERATION_LOAD_STORE) {
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
+		e->record = true;
+	}
+
 	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
 	if (e->record)
 		rec_argv[i++] = "-W";
@@ -326,6 +332,7 @@ struct mem_mode {
 static const struct mem_mode mem_modes[]={
 	MEM_OPT("load", MEM_OPERATION_LOAD),
 	MEM_OPT("store", MEM_OPERATION_STORE),
+	MEM_OPT("ldst", MEM_OPERATION_LOAD_STORE),
 	MEM_END
 };
 
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 726a9c8103e4..5ef178278909 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -28,6 +28,7 @@ struct mem_info {
 enum {
 	PERF_MEM_EVENTS__LOAD,
 	PERF_MEM_EVENTS__STORE,
+	PERF_MEM_EVENTS__LOAD_STORE,
 	PERF_MEM_EVENTS__MAX,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 04/11] perf mem: Only initialize memory event for recording
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (2 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 03/11] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 05/11] perf mem: Support AUX trace Leo Yan
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

There is no need to initialize memory events for perf reporting, so only
initialize them for perf recording.  This change allows perf data to be
parsed across platforms, e.g. the perf tool can output reports even if
the machine doesn't support any memory events.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-mem.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index bd4229ca3685..a7204634893c 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -78,6 +78,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	OPT_END()
 	};
 
+	if (perf_mem_events__init()) {
+		pr_err("failed: memory events not supported\n");
+		return -1;
+	}
+
 	argc = parse_options(argc, argv, options, record_mem_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
@@ -436,11 +441,6 @@ int cmd_mem(int argc, const char **argv)
 		NULL
 	};
 
-	if (perf_mem_events__init()) {
-		pr_err("failed: memory events not supported\n");
-		return -1;
-	}
-
 	argc = parse_options_subcommand(argc, argv, mem_options, mem_subcommands,
 					mem_usage, PARSE_OPT_KEEP_UNKNOWN);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 05/11] perf mem: Support AUX trace
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (3 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 04/11] perf mem: Only initialize memory event for recording Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-09-01 15:52   ` James Clark
  2020-08-06  3:07 ` [PATCH RESEND v1 06/11] perf mem: Support Arm SPE events Leo Yan
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

Perf memory profiling doesn't support AUX trace data, so the tool cannot
receive samples synthesized from hardware tracing data.  Although the
Arm64 platform doesn't support PMU events for memory loads and stores,
Armv8's SPE is a good candidate for memory profiling: the hardware
tracer can record memory access operations with physical and virtual
addresses for different cache levels, and it also records statistics for
remote accesses and TLB operations.

To allow the perf memory tool to support AUX trace, this patch adds the
AUX callbacks to the session structure.  It passes the predefined synth
options (llc, flc, remote_access, tlb, etc.) to notify the tracing
decoder to generate the corresponding samples.  This patch also invokes
the standard API perf_event__process_attr() to register sample IDs into
the evlist.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-mem.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index a7204634893c..6c8b5e956a4a 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -7,6 +7,7 @@
 #include "perf.h"
 
 #include <subcmd/parse-options.h>
+#include "util/auxtrace.h"
 #include "util/trace-event.h"
 #include "util/tool.h"
 #include "util/session.h"
@@ -249,6 +250,15 @@ static int process_sample_event(struct perf_tool *tool,
 
 static int report_raw_events(struct perf_mem *mem)
 {
+	struct itrace_synth_opts itrace_synth_opts = {
+		.set = true,
+		.flc = true,		/* First level cache samples */
+		.llc = true,		/* Last level cache samples */
+		.tlb = true,		/* TLB samples */
+		.remote_access = true,	/* Remote access samples */
+		.default_no_sample = true,
+	};
+
 	struct perf_data data = {
 		.path  = input_name,
 		.mode  = PERF_DATA_MODE_READ,
@@ -261,6 +271,8 @@ static int report_raw_events(struct perf_mem *mem)
 	if (IS_ERR(session))
 		return PTR_ERR(session);
 
+	session->itrace_synth_opts = &itrace_synth_opts;
+
 	if (mem->cpu_list) {
 		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
 					       mem->cpu_bitmap);
@@ -394,6 +406,19 @@ parse_mem_ops(const struct option *opt, const char *str, int unset)
 	return ret;
 }
 
+static int process_attr(struct perf_tool *tool __maybe_unused,
+			union perf_event *event,
+			struct evlist **pevlist)
+{
+	int err;
+
+	err = perf_event__process_attr(tool, event, pevlist);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 int cmd_mem(int argc, const char **argv)
 {
 	struct stat st;
@@ -405,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
 			.comm		= perf_event__process_comm,
 			.lost		= perf_event__process_lost,
 			.fork		= perf_event__process_fork,
+			.attr		= process_attr,
 			.build_id	= perf_event__process_build_id,
 			.namespaces	= perf_event__process_namespaces,
+			.auxtrace_info  = perf_event__process_auxtrace_info,
+			.auxtrace       = perf_event__process_auxtrace,
+			.auxtrace_error = perf_event__process_auxtrace_error,
 			.ordered_events	= true,
 		},
 		.input_name		 = "perf.data",
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 06/11] perf mem: Support Arm SPE events
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (4 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 05/11] perf mem: Support AUX trace Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 07/11] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC Leo Yan
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

This patch adds Arm SPE events for perf memory profiling.  It
supports three Arm SPE events:

  - spe-load: memory event for only recording memory load ops;
  - spe-store: memory event for only recording memory store ops;
  - spe-ldst: memory event for recording memory load and store ops.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/arch/arm64/util/Build        |  2 +-
 tools/perf/arch/arm64/util/mem-events.c | 46 +++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/arm64/util/mem-events.c

diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 5c13438c7bd4..cb18442e840f 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -8,4 +8,4 @@ perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 perf-$(CONFIG_AUXTRACE) += ../../arm/util/pmu.o \
 			      ../../arm/util/auxtrace.o \
 			      ../../arm/util/cs-etm.o \
-			      arm-spe.o
+			      arm-spe.o mem-events.o
diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
new file mode 100644
index 000000000000..f23128db54fb
--- /dev/null
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "map_symbol.h"
+#include "mem-events.h"
+
+#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+	E("spe-load",	"arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=%u/",	"arm_spe_0"),
+	E("spe-store",	"arm_spe_0/ts_enable=1,load_filter=0,store_filter=1/",			"arm_spe_0"),
+	E("spe-ldst",	"arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=%u/",	"arm_spe_0"),
+};
+
+static char mem_ld_name[100];
+static char mem_st_name[100];
+static char mem_ldst_name[100];
+
+struct perf_mem_event *perf_mem_events__ptr(int i)
+{
+	if (i >= PERF_MEM_EVENTS__MAX)
+		return NULL;
+
+	return &perf_mem_events[i];
+}
+
+char *perf_mem_events__name(int i)
+{
+	struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+	if (i >= PERF_MEM_EVENTS__MAX)
+		return NULL;
+
+	if (i == PERF_MEM_EVENTS__LOAD) {
+		scnprintf(mem_ld_name, sizeof(mem_ld_name),
+			  e->name, perf_mem_events__loads_ldlat);
+		return mem_ld_name;
+	}
+
+	if (i == PERF_MEM_EVENTS__STORE) {
+		scnprintf(mem_st_name, sizeof(mem_st_name), e->name);
+		return mem_st_name;
+	}
+
+	scnprintf(mem_ldst_name, sizeof(mem_ldst_name),
+		  e->name, perf_mem_events__loads_ldlat);
+	return mem_ldst_name;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 07/11] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (5 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 06/11] perf mem: Support Arm SPE events Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 08/11] perf arm-spe: Save memory addresses in packet Leo Yan
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

This patch enables the attribute PERF_SAMPLE_DATA_SRC for the perf
data; when decoding the tracing data, it tells the tool that the data
contains memory samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 3882a5360ada..c2cf5058648f 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -803,7 +803,7 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.type = PERF_TYPE_HARDWARE;
 	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
 	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
-		PERF_SAMPLE_PERIOD;
+			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC;
 	if (spe->timeless_decoding)
 		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
 	else
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 08/11] perf arm-spe: Save memory addresses in packet
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (6 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 07/11] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 09/11] perf arm-spe: Store operation types " Leo Yan
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

This patch saves the virtual and physical memory addresses in the
packet; the address info can then be used for generating memory samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 4 ++++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 93e063f22be5..373dc2d1cf06 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -162,6 +162,10 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 				decoder->record.from_ip = ip;
 			else if (idx == SPE_ADDR_PKT_HDR_INDEX_BRANCH)
 				decoder->record.to_ip = ip;
+			else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT)
+				decoder->record.addr = ip;
+			else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS)
+				decoder->record.phys_addr = ip;
 			break;
 		case ARM_SPE_COUNTER:
 			break;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index a5111a8d4360..5acddfcffbd1 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -47,6 +47,8 @@ struct arm_spe_record {
 	u64 from_ip;
 	u64 to_ip;
 	u64 timestamp;
+	u64 addr;
+	u64 phys_addr;
 };
 
 struct arm_spe_insn;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 09/11] perf arm-spe: Store operation types in packet
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (7 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 08/11] perf arm-spe: Save memory addresses in packet Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 10/11] perf arm-spe: Fill address info for memory samples Leo Yan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

This patch stores operation types in the packet structure; they can be
used by the frontend to generate memory access info for samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 11 +++++++++++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 373dc2d1cf06..cba394784b0d 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -172,6 +172,17 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 		case ARM_SPE_CONTEXT:
 			break;
 		case ARM_SPE_OP_TYPE:
+			/*
+			 * When operation type packet header's class equals 1,
+			 * the payload's least significant bit (LSB) indicates
+			 * the operation type: load/swap or store.
+			 */
+			if (idx == 1) {
+				if (payload & 0x1)
+					decoder->record.op = ARM_SPE_ST;
+				else
+					decoder->record.op = ARM_SPE_LD;
+			}
 			break;
 		case ARM_SPE_EVENTS:
 			if (payload & BIT(EV_L1D_REFILL))
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 5acddfcffbd1..f23188282ef0 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -41,9 +41,15 @@ enum arm_spe_sample_type {
 	ARM_SPE_REMOTE_ACCESS	= 1 << 7,
 };
 
+enum arm_spe_op_type {
+	ARM_SPE_LD		= 1 << 0,
+	ARM_SPE_ST		= 1 << 1,
+};
+
 struct arm_spe_record {
 	enum arm_spe_sample_type type;
 	int err;
+	u32 op;
 	u64 from_ip;
 	u64 to_ip;
 	u64 timestamp;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 10/11] perf arm-spe: Fill address info for memory samples
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (8 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 09/11] perf arm-spe: Store operation types " Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-08-06  3:07 ` [PATCH RESEND v1 11/11] perf arm-spe: Set sample's data source field Leo Yan
  2020-09-01 16:36 ` [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE James Clark
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

Since the Arm SPE backend decoder now passes virtual and physical
address info through the packet, this info can be filled into the
synthesized samples and then used for memory profiling.

To support memory related samples, this patch splits sample generation
into two functions:
  - arm_spe__synth_mem_sample() synthesizes memory access and TLB
    related samples;
  - arm_spe__synth_branch_sample() synthesizes branch samples, which are
    mainly for branch misprediction.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 52 +++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index c2cf5058648f..74308a72b000 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -235,7 +235,6 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
 	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
 	sample->pid = speq->pid;
 	sample->tid = speq->tid;
-	sample->addr = record->to_ip;
 	sample->period = 1;
 	sample->cpu = speq->cpu;
 
@@ -259,18 +258,37 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 	return ret;
 }
 
-static int
-arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
-				u64 spe_events_id)
+static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
+				     u64 spe_events_id)
 {
 	struct arm_spe *spe = speq->spe;
+	struct arm_spe_record *record = &speq->decoder->record;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { 0 };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+	sample.addr = record->addr;
+	sample.phys_addr = record->phys_addr;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
+					u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	struct arm_spe_record *record = &speq->decoder->record;
 	union perf_event *event = speq->event_buf;
-	struct perf_sample sample = { .ip = 0, };
+	struct perf_sample sample = { 0 };
 
 	arm_spe_prep_sample(spe, speq, event, &sample);
 
 	sample.id = spe_events_id;
 	sample.stream_id = spe_events_id;
+	sample.addr = record->to_ip;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -283,15 +301,13 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->l1d_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_L1D_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->l1d_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
 			if (err)
 				return err;
 		}
@@ -299,15 +315,13 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_llc) {
 		if (record->type & ARM_SPE_LLC_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->llc_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_LLC_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->llc_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id);
 			if (err)
 				return err;
 		}
@@ -315,31 +329,27 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_tlb) {
 		if (record->type & ARM_SPE_TLB_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->tlb_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_TLB_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->tlb_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id);
 			if (err)
 				return err;
 		}
 	}
 
 	if (spe->sample_branch && (record->type & ARM_SPE_BRANCH_MISS)) {
-		err = arm_spe_synth_spe_events_sample(speq,
-						      spe->branch_miss_id);
+		err = arm_spe__synth_branch_sample(speq, spe->branch_miss_id);
 		if (err)
 			return err;
 	}
 
 	if (spe->sample_remote_access &&
 	    (record->type & ARM_SPE_REMOTE_ACCESS)) {
-		err = arm_spe_synth_spe_events_sample(speq,
-						      spe->remote_access_id);
+		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id);
 		if (err)
 			return err;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RESEND v1 11/11] perf arm-spe: Set sample's data source field
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (9 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 10/11] perf arm-spe: Fill address info for memory samples Leo Yan
@ 2020-08-06  3:07 ` Leo Yan
  2020-09-01 16:36 ` [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE James Clark
  11 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-06  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, James Clark, Wei Li,
	Adrian Hunter, Al Grant, linux-kernel, Mathieu Poirier,
	Mike Leach
  Cc: Leo Yan

The sample structure contains the field 'data_src', which carries
detailed info about data operations, e.g. whether the operation is a
load or a store, which cache level it hits, and whether it is snooping
or a remote access.  In the end, 'data_src' is parsed by the perf
memory tool to display human readable strings.

This patch fills the 'data_src' field in the synthesized samples based
on the sample type.  The supported types are: Level 1 dcache miss,
Level 1 dcache hit, last level cache miss, last level cache access,
TLB miss, TLB hit, and remote access from another socket.

Note, the current perf tool can display statistics for L1/L2/L3 caches
but it doesn't support a 'last level cache' level.  To fit the current
implementation, the 'data_src' field uses the L3 cache for the last
level cache.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 87 +++++++++++++++++++++++++++++++++++----
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 74308a72b000..3114f059fc2f 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -259,7 +259,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 }
 
 static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
-				     u64 spe_events_id)
+				     u64 spe_events_id, u64 data_src)
 {
 	struct arm_spe *spe = speq->spe;
 	struct arm_spe_record *record = &speq->decoder->record;
@@ -272,6 +272,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
 	sample.stream_id = spe_events_id;
 	sample.addr = record->addr;
 	sample.phys_addr = record->phys_addr;
+	sample.data_src = data_src;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -293,21 +294,74 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record,
+				      int type)
+{
+	union perf_mem_data_src	data_src = { 0 };
+
+	if (record->op == ARM_SPE_LD)
+		data_src.mem_op = PERF_MEM_OP_LOAD;
+	else
+		data_src.mem_op = PERF_MEM_OP_STORE;
+
+	switch (type) {
+	case ARM_SPE_L1D_MISS:
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+		data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L1;
+		break;
+	case ARM_SPE_L1D_ACCESS:
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+		data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L1;
+		break;
+	case ARM_SPE_LLC_MISS:
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L3;
+		break;
+	case ARM_SPE_LLC_ACCESS:
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;
+		break;
+	case ARM_SPE_TLB_MISS:
+		data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_MISS;
+		break;
+	case ARM_SPE_TLB_ACCESS:
+		data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_HIT;
+		break;
+	case ARM_SPE_REMOTE_ACCESS:
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+		data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_REM_CCE1;
+		break;
+	default:
+		break;
+	}
+
+	return data_src.val;
+}
+
 static int arm_spe_sample(struct arm_spe_queue *speq)
 {
 	const struct arm_spe_record *record = &speq->decoder->record;
 	struct arm_spe *spe = speq->spe;
+	u64 data_src;
 	int err;
 
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
-			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
+			data_src = arm_spe__synth_data_source(record,
+							      ARM_SPE_L1D_MISS);
+
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_L1D_ACCESS) {
-			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
+			data_src = arm_spe__synth_data_source(record,
+							      ARM_SPE_L1D_ACCESS);
+
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -315,13 +369,21 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_llc) {
 		if (record->type & ARM_SPE_LLC_MISS) {
-			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id);
+			data_src = arm_spe__synth_data_source(record,
+							      ARM_SPE_LLC_MISS);
+
+			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_LLC_ACCESS) {
-			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id);
+			data_src = arm_spe__synth_data_source(record,
+							      ARM_SPE_LLC_ACCESS);
+
+			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -329,13 +391,19 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_tlb) {
 		if (record->type & ARM_SPE_TLB_MISS) {
-			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id);
+			data_src = arm_spe__synth_data_source(record,
+							      ARM_SPE_TLB_MISS);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_TLB_ACCESS) {
-			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id);
+			data_src = arm_spe__synth_data_source(record,
+							      ARM_SPE_TLB_ACCESS);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -349,7 +417,10 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_remote_access &&
 	    (record->type & ARM_SPE_REMOTE_ACCESS)) {
-		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id);
+		data_src = arm_spe__synth_data_source(record,
+						      ARM_SPE_REMOTE_ACCESS);
+		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id,
+						data_src);
 		if (err)
 			return err;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-08-06  3:07 ` [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
@ 2020-08-28 15:40   ` James Clark
  2020-08-31  2:52     ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2020-08-28 15:40 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, Wei Li, Adrian Hunter,
	Al Grant, linux-kernel, Mathieu Poirier, Mike Leach
  Cc: nd

Hi Leo,

On 06/08/2020 04:07, Leo Yan wrote:
>  
>  	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> -		if (!perf_mem_events[j].record)
> +		e = perf_mem_events__ptr(j);
> +		if (!e->record)
>  			continue;
>  
> -		if (!perf_mem_events[j].supported) {
> +		if (!e->supported) {
>  			pr_err("failed: event '%s' not supported\n",
> -			       perf_mem_events[j].name);
> +			       perf_mem_events__name(j));
>  			free(rec_argv);
>  			return -1;

Does it make sense to do something like:

   for(j = 0; e = perf_mem_events__ptr(j); j++) {
       ...
   }

now that it's a weak function that returns NULL when the argument is out of range. That way the caller
doesn't need to know about PERF_MEM_EVENTS__MAX as well, and it could potentially be a different
value. I don't know if it would ever make sense to have a different number of events on different platforms?

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-08-28 15:40   ` James Clark
@ 2020-08-31  2:52     ` Leo Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-08-31  2:52 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, Wei Li, Adrian Hunter,
	Al Grant, linux-kernel, Mathieu Poirier, Mike Leach, nd

On Fri, Aug 28, 2020 at 04:40:29PM +0100, James Clark wrote:
> Hi Leo,
> 
> On 06/08/2020 04:07, Leo Yan wrote:
> >  
> >  	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > -		if (!perf_mem_events[j].record)
> > +		e = perf_mem_events__ptr(j);
> > +		if (!e->record)
> >  			continue;
> >  
> > -		if (!perf_mem_events[j].supported) {
> > +		if (!e->supported) {
> >  			pr_err("failed: event '%s' not supported\n",
> > -			       perf_mem_events[j].name);
> > +			       perf_mem_events__name(j));
> >  			free(rec_argv);
> >  			return -1;
> 
> Does it make sense to do something like:
> 
>    for(j = 0; e = perf_mem_events__ptr(j); j++) {
>        ...
>    }
> 
> now that it's a weak function that returns NULL when the argument out of range. That way the caller
> doesn't need to know about PERF_MEM_EVENTS__MAX as well and it could potentially be a different
> value. I don't know if it would ever make sense to have a different number of events on different platforms?

Thanks for reviewing, James.

If you look at the later patch "perf mem: Support new memory event
PERF_MEM_EVENTS__LOAD_STORE", you will find that it introduces a new
event which is only used for Arm SPE and not by other archs.

Your suggestion to encapsulate the macro PERF_MEM_EVENTS__MAX in
perf_mem_events__ptr() is good; I will try it in the next spin.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v1 05/11] perf mem: Support AUX trace
  2020-08-06  3:07 ` [PATCH RESEND v1 05/11] perf mem: Support AUX trace Leo Yan
@ 2020-09-01 15:52   ` James Clark
  2020-09-03  9:07     ` Leo Yan
  0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2020-09-01 15:52 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, Wei Li, Adrian Hunter,
	Al Grant, linux-kernel, Mathieu Poirier, Mike Leach
  Cc: nd

Hi Leo,

>  
> +static int process_attr(struct perf_tool *tool __maybe_unused,
> +			union perf_event *event,
> +			struct evlist **pevlist)
> +{
> +	int err;
> +
> +	err = perf_event__process_attr(tool, event, pevlist);
> +	if (err)
> +		return err;
> +
> +	return 0;
> +}
> +
>  int cmd_mem(int argc, const char **argv)
>  {
>  	struct stat st;
> @@ -405,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
>  			.comm		= perf_event__process_comm,
>  			.lost		= perf_event__process_lost,
>  			.fork		= perf_event__process_fork,
> +			.attr		= process_attr,
>  			.build_id	= perf_event__process_build_id,

I don't understand the __maybe_unused here, since 'tool' is used. Also,
isn't this equivalent to the following, without the new function:

  @@ -405,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
   			.comm		= perf_event__process_comm,
   			.lost		= perf_event__process_lost,
   			.fork		= perf_event__process_fork,
  +			.attr		= perf_event__process_attr,
   			.build_id	= perf_event__process_build_id,
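
The underlying point, sketched with hypothetical names: a wrapper that
only forwards its arguments and propagates the return value is
observably identical to the wrapped function, so the function pointer
can be assigned directly:

```c
/* Stand-in for the library handler (perf_event__process_attr here). */
static int process_attr_impl(int event)
{
	return event ? -1 : 0;   /* some error-code convention */
}

/* The wrapper only forwards and propagates the error code ... */
static int process_attr_wrapper(int event)
{
	int err = process_attr_impl(event);

	if (err)
		return err;
	return 0;                /* ... which is just 'return err;' */
}
```

Since "if (err) return err; return 0;" collapses to "return err;",
assigning either function to the .attr callback behaves the same, which
is why using perf_event__process_attr directly is enough.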


James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE
  2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (10 preceding siblings ...)
  2020-08-06  3:07 ` [PATCH RESEND v1 11/11] perf arm-spe: Set sample's data source field Leo Yan
@ 2020-09-01 16:36 ` James Clark
  2020-09-03  9:13   ` Leo Yan
  11 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2020-09-01 16:36 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, Wei Li, Adrian Hunter,
	Al Grant, linux-kernel, Mathieu Poirier, Mike Leach
  Cc: nd

On 06/08/2020 04:07, Leo Yan wrote:
> This patch set is to support AUX trace and Arm SPE as the first enabled
> hardware tracing for Perf memory tool.
> 

Hi Leo,

I've tested this patchset with "./perf mem record -e spe-store ./a.out" on N1 and it's working for me.
Thanks for submitting this!

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v1 05/11] perf mem: Support AUX trace
  2020-09-01 15:52   ` James Clark
@ 2020-09-03  9:07     ` Leo Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-09-03  9:07 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, Wei Li, Adrian Hunter,
	Al Grant, linux-kernel, Mathieu Poirier, Mike Leach, nd

Hi James,

On Tue, Sep 01, 2020 at 04:52:54PM +0100, James Clark wrote:
> Hi Leo,
> 
> >  
> > +static int process_attr(struct perf_tool *tool __maybe_unused,
> > +			union perf_event *event,
> > +			struct evlist **pevlist)
> > +{
> > +	int err;
> > +
> > +	err = perf_event__process_attr(tool, event, pevlist);
> > +	if (err)
> > +		return err;
> > +
> > +	return 0;
> > +}
> > +
> >  int cmd_mem(int argc, const char **argv)
> >  {
> >  	struct stat st;
> > @@ -405,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
> >  			.comm		= perf_event__process_comm,
> >  			.lost		= perf_event__process_lost,
> >  			.fork		= perf_event__process_fork,
> > +			.attr		= process_attr,
> >  			.build_id	= perf_event__process_build_id,
> 
> I don't understand the __maybe_unused here, since 'tool' is used. Also,
> isn't this equivalent to the following, without the new function:
> 
>   @@ -405,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
>    			.comm		= perf_event__process_comm,
>    			.lost		= perf_event__process_lost,
>    			.fork		= perf_event__process_fork,
>   +			.attr		= perf_event__process_attr,
>    			.build_id	= perf_event__process_build_id,

Thanks for pointing this out; I will fix it per your suggestion.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE
  2020-09-01 16:36 ` [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE James Clark
@ 2020-09-03  9:13   ` Leo Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Leo Yan @ 2020-09-03  9:13 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Namhyung Kim,
	Naveen N. Rao, Ian Rogers, Kemeng Shi, Wei Li, Adrian Hunter,
	Al Grant, linux-kernel, Mathieu Poirier, Mike Leach, nd

On Tue, Sep 01, 2020 at 05:36:50PM +0100, James Clark wrote:
> On 06/08/2020 04:07, Leo Yan wrote:
> > This patch set is to support AUX trace and Arm SPE as the first enabled
> > hardware tracing for Perf memory tool.
> > 
> 
> Hi Leo,
> 
> I've tested this patchset with "./perf mem record -e spe-store ./a.out" on N1 and it's working for me.
> Thanks for submitting this!

Thanks a lot for your testing!  I will add your Tested-by tag to the
patches which are unchanged in the later patch set.

P.S. I have sent patch set v2 [1] for review; a brief change compared
to v1 is that it introduces a 'memory' event, which allows all memory
operations to be displayed in the same view.  You are welcome to review
and give comments, thanks!

Leo

[1] https://lore.kernel.org/lkml/20200901083815.13755-1-leo.yan@linaro.org/

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-03  9:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-06  3:07 [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 01/11] perf mem: Search event name with more flexible path Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 02/11] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
2020-08-28 15:40   ` James Clark
2020-08-31  2:52     ` Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 03/11] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 04/11] perf mem: Only initialize memory event for recording Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 05/11] perf mem: Support AUX trace Leo Yan
2020-09-01 15:52   ` James Clark
2020-09-03  9:07     ` Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 06/11] perf mem: Support Arm SPE events Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 07/11] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 08/11] perf arm-spe: Save memory addresses in packet Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 09/11] perf arm-spe: Store operation types " Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 10/11] perf arm-spe: Fill address info for memory samples Leo Yan
2020-08-06  3:07 ` [PATCH RESEND v1 11/11] perf arm-spe: Set sample's data source field Leo Yan
2020-09-01 16:36 ` [PATCH RESEND v1 00/11] perf mem: Support AUX trace and Arm SPE James Clark
2020-09-03  9:13   ` Leo Yan
