linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE
@ 2020-09-01  8:38 Leo Yan
  2020-09-01  8:38 ` [PATCH v2 01/14] perf mem: Search event name with more flexible path Leo Yan
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch set adds AUX trace support to the perf memory tool, with Arm
SPE as the first hardware tracing source enabled for it.

Patches 01 ~ 04 are preparation patches which mainly rework the memory
event handling: the existing code hard codes the memory events based on
the x86 and PowerPC architectures, so patches 01 ~ 04 extend it to
support more flexible memory event names, and introduce weak functions
so that every architecture can define its own memory event structure and
return the event pointer and name respectively.

Patches 05 and 06 extend the perf memory tool to support AUX trace, and
add a new itrace option 'M' for generating memory events.

Patches 07 ~ 13 support Arm SPE with the perf memory tool.  First they
register SPE events as memory events, then they extend the SPE packet to
pass address info and operation types, and also set the 'data_src' field
so the tool can display readable strings in the result.

Patch 14 updates the documentation to reflect the changes introduced for
Arm SPE support.

This patch set has been tested on the ARMv8 Hisilicon D06 platform and
verified on x86 to avoid causing regressions.  Please note, this patch
set depends on the patch set "perf arm-spe: Refactor decoding &
dumping flow" [1].

The commands below run successfully on D06:

  $ perf mem record -t ldst -- ~/false_sharing.exe 2
  $ perf mem record -t load -- ~/false_sharing.exe 2
  $ perf mem record -t store -- ~/false_sharing.exe 2
  $ perf mem report
  $ perf mem report --itrace=M

  # Samples: 391K of event 'memory'
  # Total weight : 391193
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
  #
  # Overhead       Samples  Local Weight  Memory access             Symbol                                           Shared Object       Data Symbol                                                   Data Object        Snoop         TLB access              Locked
  # ........  ............  ............  ........................  ...............................................  ..................  ............................................................  .................  ............  ......................  ......
  #
      18.56%         72611  0             L1 or L1 miss             [.] read_write_func                              false_sharing.exe   [.] buf1+0x0                                                  false_sharing.exe  N/A           Walker hit              No
      16.16%         63207  0             L1 or L1 hit              [.] read_write_func                              false_sharing.exe   [.] __do_global_dtors_aux_fini_array_entry+0x228              false_sharing.exe  N/A           Walker hit              No
      15.91%         62239  0             L1 or L1 hit              [.] read_write_func                              false_sharing.exe   [.] __do_global_dtors_aux_fini_array_entry+0x250              false_sharing.exe  N/A           Walker hit              No
       4.67%         18280  0             N/A                       [.] read_write_func                              false_sharing.exe   [.] buf2+0x8                                                  false_sharing.exe  N/A           Walker hit              No
       3.34%         13082  0             L1 or L1 hit              [.] read_write_func                              false_sharing.exe   [.] __do_global_dtors_aux_fini_array_entry+0x230              false_sharing.exe  N/A           Walker hit              No
       2.49%          9755  0             L1 or L1 hit              [.] read_write_func                              false_sharing.exe   [.] 0x0000aaaac23a3450                                        false_sharing.exe  N/A           Walker hit              No
       2.46%          9611  0             L1 or L1 hit              [.] read_write_func                              false_sharing.exe   [.] lock_thd_name+0x0                                         false_sharing.exe  N/A           Walker hit              No
       2.26%          8856  0             L1 or L1 hit              [.] read_write_func                              false_sharing.exe   [.] 0x0000aaaac23a3458                                        false_sharing.exe  N/A           Walker hit              No
       2.19%          8549  0             L1 or L1 miss             [.] read_write_func                              false_sharing.exe   [.] buf2+0x28                                                 false_sharing.exe  N/A           Walker hit              No

Changes from v1:
* Refined patch 02 to use perf_mem_events__ptr() to return the event
  pointer and check whether the pointer is NULL, and removed the
  condition check against PERF_MEM_EVENTS__MAX; (James Clark)
* Added new itrace option 'M' for memory events;
* Added patch 14 to update documentation.

[1] https://lore.kernel.org/patchwork/cover/1288406/


Leo Yan (14):
  perf mem: Search event name with more flexible path
  perf mem: Introduce weak function perf_mem_events__ptr()
  perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
  perf mem: Only initialize memory event for recording
  perf auxtrace: Add option '-M' for memory events
  perf mem: Support AUX trace
  perf mem: Support Arm SPE events
  perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC
  perf arm-spe: Save memory addresses in packet
  perf arm-spe: Store operation types in packet
  perf arm-spe: Fill address info for samples
  perf arm-spe: Synthesize memory event
  perf arm-spe: Set sample's data source field
  perf mem: Document options introduced by Arm SPE

 tools/perf/Documentation/itrace.txt           |   1 +
 tools/perf/Documentation/perf-mem.txt         |  10 +-
 tools/perf/arch/arm64/util/Build              |   2 +-
 tools/perf/arch/arm64/util/mem-events.c       |  46 ++++++
 tools/perf/builtin-c2c.c                      |  23 ++-
 tools/perf/builtin-mem.c                      |  73 ++++++++--
 .../util/arm-spe-decoder/arm-spe-decoder.c    |  15 ++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |   8 ++
 tools/perf/util/arm-spe.c                     | 132 +++++++++++++++---
 tools/perf/util/auxtrace.c                    |   4 +
 tools/perf/util/auxtrace.h                    |   2 +
 tools/perf/util/mem-events.c                  |  41 ++++--
 tools/perf/util/mem-events.h                  |   3 +-
 13 files changed, 302 insertions(+), 58 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/mem-events.c

-- 
2.20.1



* [PATCH v2 01/14] perf mem: Search event name with more flexible path
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-03 13:50   ` Jiri Olsa
  2020-09-01  8:38 ` [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

The perf tool searches for memory event names under the folder
'/sys/devices/cpu/events/', which limits the selection of memory
profiling events to those placed under this folder.  Thus it's
impossible to use any other event for memory profiling if it is not
under this specific folder, e.g. the Arm SPE hardware event is not
located in '/sys/devices/cpu/events/' so it cannot be enabled for
memory profiling.

This patch changes the search folder from '/sys/devices/cpu/events/' to
'/sys/devices', which gives the flexibility to find events that can be
used for memory profiling.
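
For illustration, with the generalized lookup the 'sysfs_name' entries
become paths relative to '/sys/devices/', so an entry is free to name
any PMU directory.  A sketch (the arm_spe_0 entry is what a later patch
in this series adds; it is shown here only as an example):

  /* perf_mem_events__init() probes "%s/devices/%s" for each entry */
  E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
          /* -> checks /sys/devices/cpu/events/mem-loads */
  E("spe-load", "arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
          /* -> checks /sys/devices/arm_spe_0 */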

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/mem-events.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..35c8d175a9d2 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -18,8 +18,8 @@ unsigned int perf_mem_events__loads_ldlat = 30;
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
 struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
-	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"mem-loads"),
-	E("ldlat-stores",	"cpu/mem-stores/P",		"mem-stores"),
+	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"cpu/events/mem-loads"),
+	E("ldlat-stores",	"cpu/mem-stores/P",		"cpu/events/mem-stores"),
 };
 #undef E
 
@@ -93,7 +93,7 @@ int perf_mem_events__init(void)
 		struct perf_mem_event *e = &perf_mem_events[j];
 		struct stat st;
 
-		scnprintf(path, PATH_MAX, "%s/devices/cpu/events/%s",
+		scnprintf(path, PATH_MAX, "%s/devices/%s",
 			  mnt, e->sysfs_name);
 
 		if (!stat(path, &st))
-- 
2.20.1



* [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
  2020-09-01  8:38 ` [PATCH v2 01/14] perf mem: Search event name with more flexible path Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-03 13:50   ` Jiri Olsa
  2020-09-01  8:38 ` [PATCH v2 03/14] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE Leo Yan
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

Different architectures might use different events or different event
parameters for memory profiling, so this patch introduces the weak
function perf_mem_events__ptr(), which allows returning an architecture
specific memory event.

After the function perf_mem_events__ptr() is introduced, the variable
'perf_mem_events' can be accessed through this new function; so mark
the variable as 'static', which allows each architecture to define its
own memory event array.
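
A minimal sketch of the override mechanism (the weak definition is the
generic one added by this patch; the strong definition stands in for
what an architecture, e.g. arm64 later in this series, provides):

  /* util/mem-events.c: generic, weak definition */
  struct perf_mem_event * __weak perf_mem_events__ptr(int i)
  {
          if (i >= PERF_MEM_EVENTS__MAX)
                  return NULL;

          return &perf_mem_events[i];
  }

  /*
   * An architecture supplies a strong definition with the same
   * signature, backed by its own static perf_mem_events[] table; the
   * linker then picks the strong symbol over the weak one.
   */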

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c     | 23 ++++++++++++++++-------
 tools/perf/builtin-mem.c     | 26 ++++++++++++++++++--------
 tools/perf/util/mem-events.c | 26 +++++++++++++++++++-------
 tools/perf/util/mem-events.h |  2 +-
 4 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5938b100eaf4..594ec6b015b5 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2914,6 +2914,7 @@ static int perf_c2c__record(int argc, const char **argv)
 	int ret;
 	bool all_user = false, all_kernel = false;
 	bool event_set = false;
+	struct perf_mem_event *e;
 	struct option options[] = {
 	OPT_CALLBACK('e', "event", &event_set, "event",
 		     "event selector. Use 'perf mem record -e list' to list available events",
@@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
 	rec_argv[i++] = "record";
 
 	if (!event_set) {
-		perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
-		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+		e->record = true;
+
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+		e->record = true;
 	}
 
-	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+	if (e->record)
 		rec_argv[i++] = "-W";
 
 	rec_argv[i++] = "-d";
 	rec_argv[i++] = "--phys-data";
 	rec_argv[i++] = "--sample-cpu";
 
-	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-		if (!perf_mem_events[j].record)
+	j = 0;
+	while ((e = perf_mem_events__ptr(j)) != NULL) {
+		if (!e->record) {
+			j++;
 			continue;
+		}
 
-		if (!perf_mem_events[j].supported) {
+		if (!e->supported) {
 			pr_err("failed: event '%s' not supported\n",
-			       perf_mem_events[j].name);
+			       perf_mem_events__name(j));
 			free(rec_argv);
 			return -1;
 		}
 
 		rec_argv[i++] = "-e";
 		rec_argv[i++] = perf_mem_events__name(j);
+		j++;
 	}
 
 	if (all_user)
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..070e0f1d3300 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -64,6 +64,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	const char **rec_argv;
 	int ret;
 	bool all_user = false, all_kernel = false;
+	struct perf_mem_event *e;
 	struct option options[] = {
 	OPT_CALLBACK('e', "event", &mem, "event",
 		     "event selector. use 'perf mem record -e list' to list available events",
@@ -86,13 +87,18 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 
 	rec_argv[i++] = "record";
 
-	if (mem->operation & MEM_OPERATION_LOAD)
-		perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
+	if (mem->operation & MEM_OPERATION_LOAD) {
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+		e->record = true;
+	}
 
-	if (mem->operation & MEM_OPERATION_STORE)
-		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+	if (mem->operation & MEM_OPERATION_STORE) {
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+		e->record = true;
+	}
 
-	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+	if (e->record)
 		rec_argv[i++] = "-W";
 
 	rec_argv[i++] = "-d";
@@ -100,11 +106,14 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	if (mem->phys_addr)
 		rec_argv[i++] = "--phys-data";
 
-	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-		if (!perf_mem_events[j].record)
+	j = 0;
+	while ((e = perf_mem_events__ptr(j)) != NULL) {
+		if (!e->record) {
+			j++;
 			continue;
+		}
 
-		if (!perf_mem_events[j].supported) {
+		if (!e->supported) {
 			pr_err("failed: event '%s' not supported\n",
 			       perf_mem_events__name(j));
 			free(rec_argv);
@@ -113,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 
 		rec_argv[i++] = "-e";
 		rec_argv[i++] = perf_mem_events__name(j);
+		j++;
 	}
 
 	if (all_user)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 35c8d175a9d2..7a5a0d699e27 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -17,7 +17,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;
 
 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
 
-struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"cpu/events/mem-loads"),
 	E("ldlat-stores",	"cpu/mem-stores/P",		"cpu/events/mem-stores"),
 };
@@ -28,19 +28,31 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
 
+struct perf_mem_event * __weak perf_mem_events__ptr(int i)
+{
+	if (i >= PERF_MEM_EVENTS__MAX)
+		return NULL;
+
+	return &perf_mem_events[i];
+}
+
 char * __weak perf_mem_events__name(int i)
 {
+	struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+	if (!e)
+		return NULL;
+
 	if (i == PERF_MEM_EVENTS__LOAD) {
 		if (!mem_loads_name__init) {
 			mem_loads_name__init = true;
 			scnprintf(mem_loads_name, sizeof(mem_loads_name),
-				  perf_mem_events[i].name,
-				  perf_mem_events__loads_ldlat);
+				  e->name, perf_mem_events__loads_ldlat);
 		}
 		return mem_loads_name;
 	}
 
-	return (char *)perf_mem_events[i].name;
+	return (char *)e->name;
 }
 
 int perf_mem_events__parse(const char *str)
@@ -61,7 +73,7 @@ int perf_mem_events__parse(const char *str)
 
 	while (tok) {
 		for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-			struct perf_mem_event *e = &perf_mem_events[j];
+			struct perf_mem_event *e = perf_mem_events__ptr(j);
 
 			if (strstr(e->tag, tok))
 				e->record = found = true;
@@ -90,7 +102,7 @@ int perf_mem_events__init(void)
 
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		char path[PATH_MAX];
-		struct perf_mem_event *e = &perf_mem_events[j];
+		struct perf_mem_event *e = perf_mem_events__ptr(j);
 		struct stat st;
 
 		scnprintf(path, PATH_MAX, "%s/devices/%s",
@@ -108,7 +120,7 @@ void perf_mem_events__list(void)
 	int j;
 
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
-		struct perf_mem_event *e = &perf_mem_events[j];
+		struct perf_mem_event *e = perf_mem_events__ptr(j);
 
 		fprintf(stderr, "%-13s%-*s%s\n",
 			e->tag,
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 904dad34f7f7..726a9c8103e4 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -31,13 +31,13 @@ enum {
 	PERF_MEM_EVENTS__MAX,
 };
 
-extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
 extern unsigned int perf_mem_events__loads_ldlat;
 
 int perf_mem_events__parse(const char *str);
 int perf_mem_events__init(void);
 
 char *perf_mem_events__name(int i);
+struct perf_mem_event *perf_mem_events__ptr(int i);
 
 void perf_mem_events__list(void);
 
-- 
2.20.1



* [PATCH v2 03/14] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
  2020-09-01  8:38 ` [PATCH v2 01/14] perf mem: Search event name with more flexible path Leo Yan
  2020-09-01  8:38 ` [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 04/14] perf mem: Only initialize memory event for recording Leo Yan
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

The existing architectures that support perf memory profiling usually
provide two types of hardware events, load and store, so to profile
memory for both load and store operations the tool uses these two
events at the same time.  But this does not hold for an AUX tracing
event: the same event can be used with different configurations for
memory operation filtering, e.g. the event can be set up to trace only
memory loads, only memory stores, or both memory loads and stores.

This patch introduces a new event PERF_MEM_EVENTS__LOAD_STORE, which
supports an event that can record both memory load and store
operations.
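
For instance (a sketch based on the arm64 entries added later in this
series), a single SPE event with both filters enabled fills the new
slot:

  E("spe-ldst",
    "arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=%u/",
    "arm_spe_0"),	/* one event records both loads and stores */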

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-mem.c     | 11 +++++++++--
 tools/perf/util/mem-events.c |  9 ++++++++-
 tools/perf/util/mem-events.h |  1 +
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 070e0f1d3300..9fd730019e45 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -19,8 +19,9 @@
 #include "util/symbol.h"
 #include <linux/err.h>
 
-#define MEM_OPERATION_LOAD	0x1
-#define MEM_OPERATION_STORE	0x2
+#define MEM_OPERATION_LOAD		0x1
+#define MEM_OPERATION_STORE		0x2
+#define MEM_OPERATION_LOAD_STORE	0x4
 
 struct perf_mem {
 	struct perf_tool	tool;
@@ -97,6 +98,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 		e->record = true;
 	}
 
+	if (mem->operation & MEM_OPERATION_LOAD_STORE) {
+		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
+		e->record = true;
+	}
+
 	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
 	if (e->record)
 		rec_argv[i++] = "-W";
@@ -329,6 +335,7 @@ struct mem_mode {
 static const struct mem_mode mem_modes[]={
 	MEM_OPT("load", MEM_OPERATION_LOAD),
 	MEM_OPT("store", MEM_OPERATION_STORE),
+	MEM_OPT("ldst", MEM_OPERATION_LOAD_STORE),
 	MEM_END
 };
 
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 7a5a0d699e27..74449cf33a0e 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -20,6 +20,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;
 static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"cpu/events/mem-loads"),
 	E("ldlat-stores",	"cpu/mem-stores/P",		"cpu/events/mem-stores"),
+	E(NULL,			NULL,				NULL),
 };
 #undef E
 
@@ -75,6 +76,9 @@ int perf_mem_events__parse(const char *str)
 		for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 			struct perf_mem_event *e = perf_mem_events__ptr(j);
 
+			if (!e->tag)
+				continue;
+
 			if (strstr(e->tag, tok))
 				e->record = found = true;
 		}
@@ -105,6 +109,9 @@ int perf_mem_events__init(void)
 		struct perf_mem_event *e = perf_mem_events__ptr(j);
 		struct stat st;
 
+		if (!e->sysfs_name)
+			continue;
+
 		scnprintf(path, PATH_MAX, "%s/devices/%s",
 			  mnt, e->sysfs_name);
 
@@ -123,7 +130,7 @@ void perf_mem_events__list(void)
 		struct perf_mem_event *e = perf_mem_events__ptr(j);
 
 		fprintf(stderr, "%-13s%-*s%s\n",
-			e->tag,
+			e->tag ? e->tag : "",
 			verbose > 0 ? 25 : 0,
 			verbose > 0 ? perf_mem_events__name(j) : "",
 			e->supported ? ": available" : "");
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 726a9c8103e4..5ef178278909 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -28,6 +28,7 @@ struct mem_info {
 enum {
 	PERF_MEM_EVENTS__LOAD,
 	PERF_MEM_EVENTS__STORE,
+	PERF_MEM_EVENTS__LOAD_STORE,
 	PERF_MEM_EVENTS__MAX,
 };
 
-- 
2.20.1



* [PATCH v2 04/14] perf mem: Only initialize memory event for recording
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (2 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 03/14] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 05/14] perf auxtrace: Add option '-M' for memory events Leo Yan
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

It's needless to initialize memory events for perf reporting, so only
initialize memory events for perf recording.  This change allows
parsing perf data across platforms, e.g. the perf tool can output
reports even if the machine doesn't enable any memory events.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-mem.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9fd730019e45..b9432ee27754 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -78,6 +78,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	OPT_END()
 	};
 
+	if (perf_mem_events__init()) {
+		pr_err("failed: memory events not supported\n");
+		return -1;
+	}
+
 	argc = parse_options(argc, argv, options, record_mem_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
@@ -439,11 +444,6 @@ int cmd_mem(int argc, const char **argv)
 		NULL
 	};
 
-	if (perf_mem_events__init()) {
-		pr_err("failed: memory events not supported\n");
-		return -1;
-	}
-
 	argc = parse_options_subcommand(argc, argv, mem_options, mem_subcommands,
 					mem_usage, PARSE_OPT_KEEP_UNKNOWN);
 
-- 
2.20.1



* [PATCH v2 05/14] perf auxtrace: Add option '-M' for memory events
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (3 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 04/14] perf mem: Only initialize memory event for recording Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 06/14] perf mem: Support AUX trace Leo Yan
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch adds the itrace option 'M'; with it, the AUX trace data can
be used to synthesize memory events.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/Documentation/itrace.txt | 1 +
 tools/perf/util/auxtrace.c          | 4 ++++
 tools/perf/util/auxtrace.h          | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index d3740c8f399b..079cdfabb352 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -11,6 +11,7 @@
 		d	create a debug log
 		f	synthesize first level cache events
 		m	synthesize last level cache events
+		M	synthesize memory events
 		t	synthesize TLB events
 		a	synthesize remote access events
 		g	synthesize a call chain (use with i or x)
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 42a85c86421d..62e7f6c5f8b5 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1333,6 +1333,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->flc = true;
 	synth_opts->llc = true;
 	synth_opts->tlb = true;
+	synth_opts->mem = true;
 	synth_opts->remote_access = true;
 
 	if (no_sample) {
@@ -1554,6 +1555,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 		case 'a':
 			synth_opts->remote_access = true;
 			break;
+		case 'M':
+			synth_opts->mem = true;
+			break;
 		case 'q':
 			synth_opts->quick += 1;
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 951d2d14cf24..7e5c9e1552bd 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -88,6 +88,7 @@ enum itrace_period_type {
  * @llc: whether to synthesize last level cache events
  * @tlb: whether to synthesize TLB events
  * @remote_access: whether to synthesize remote access events
+ * @mem: whether to synthesize memory events
  * @callchain_sz: maximum callchain size
  * @last_branch_sz: branch context size
  * @period: 'instructions' events period
@@ -126,6 +127,7 @@ struct itrace_synth_opts {
 	bool			llc;
 	bool			tlb;
 	bool			remote_access;
+	bool			mem;
 	unsigned int		callchain_sz;
 	unsigned int		last_branch_sz;
 	unsigned long long	period;
-- 
2.20.1



* [PATCH v2 06/14] perf mem: Support AUX trace
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (4 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 05/14] perf auxtrace: Add option '-M' for memory events Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 07/14] perf mem: Support Arm SPE events Leo Yan
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

Perf memory profiling doesn't support AUX trace data, so the tool
cannot receive samples synthesized from hardware tracing data.  The
Arm64 platform doesn't support PMU events for memory load and store,
but Armv8's SPE is a good candidate for memory profiling: the hardware
tracer can record memory access operations with physical and virtual
addresses across different cache levels, and it also records memory
operations for remote access and the TLB.

To allow the perf memory tool to support AUX trace, this patch adds the
AUX callbacks for the session structure.  It passes the predefined
itrace option to ask the AUX trace decoder to generate memory samples.
This patch also invokes the standard API perf_event__process_attr() to
register sample IDs into the evlist.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-mem.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index b9432ee27754..ded416d68d88 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -7,6 +7,7 @@
 #include "perf.h"
 
 #include <subcmd/parse-options.h>
+#include "util/auxtrace.h"
 #include "util/trace-event.h"
 #include "util/tool.h"
 #include "util/session.h"
@@ -252,6 +253,12 @@ static int process_sample_event(struct perf_tool *tool,
 
 static int report_raw_events(struct perf_mem *mem)
 {
+	struct itrace_synth_opts itrace_synth_opts = {
+		.set = true,
+		.mem = true,		/* Memory samples */
+		.default_no_sample = true,
+	};
+
 	struct perf_data data = {
 		.path  = input_name,
 		.mode  = PERF_DATA_MODE_READ,
@@ -264,6 +271,8 @@ static int report_raw_events(struct perf_mem *mem)
 	if (IS_ERR(session))
 		return PTR_ERR(session);
 
+	session->itrace_synth_opts = &itrace_synth_opts;
+
 	if (mem->cpu_list) {
 		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
 					       mem->cpu_bitmap);
@@ -397,6 +406,19 @@ parse_mem_ops(const struct option *opt, const char *str, int unset)
 	return ret;
 }
 
+static int process_attr(struct perf_tool *tool __maybe_unused,
+			union perf_event *event,
+			struct evlist **pevlist)
+{
+	int err;
+
+	err = perf_event__process_attr(tool, event, pevlist);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 int cmd_mem(int argc, const char **argv)
 {
 	struct stat st;
@@ -408,8 +430,12 @@ int cmd_mem(int argc, const char **argv)
 			.comm		= perf_event__process_comm,
 			.lost		= perf_event__process_lost,
 			.fork		= perf_event__process_fork,
+			.attr		= process_attr,
 			.build_id	= perf_event__process_build_id,
 			.namespaces	= perf_event__process_namespaces,
+			.auxtrace_info  = perf_event__process_auxtrace_info,
+			.auxtrace       = perf_event__process_auxtrace,
+			.auxtrace_error = perf_event__process_auxtrace_error,
 			.ordered_events	= true,
 		},
 		.input_name		 = "perf.data",
-- 
2.20.1



* [PATCH v2 07/14] perf mem: Support Arm SPE events
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (5 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 06/14] perf mem: Support AUX trace Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 08/14] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC Leo Yan
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch adds Arm SPE events for perf memory profiling.  It supports
three Arm SPE events (a sketch of how an event string expands at record
time follows the list):

  - spe-load: memory event for recording only memory load operations;
  - spe-store: memory event for recording only memory store operations;
  - spe-ldst: memory event for recording both memory load and store
    operations.
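
As a rough illustration (assuming the default load latency of 30 from
util/mem-events.c), the spe-load template expands like this and is then
passed to the record command via '-e':

  /*
   * perf_mem_events__name(PERF_MEM_EVENTS__LOAD) fills in min_latency:
   *
   *   arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=30/
   */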

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/arch/arm64/util/Build        |  2 +-
 tools/perf/arch/arm64/util/mem-events.c | 46 +++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/arm64/util/mem-events.c

diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 77f4d7b30932..df6c3d9ebaa6 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -9,4 +9,4 @@ perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 perf-$(CONFIG_AUXTRACE) += ../../arm/util/pmu.o \
 			      ../../arm/util/auxtrace.o \
 			      ../../arm/util/cs-etm.o \
-			      arm-spe.o
+			      arm-spe.o mem-events.o
diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
new file mode 100644
index 000000000000..f23128db54fb
--- /dev/null
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "map_symbol.h"
+#include "mem-events.h"
+
+#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+	E("spe-load",	"arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=%u/",	"arm_spe_0"),
+	E("spe-store",	"arm_spe_0/ts_enable=1,load_filter=0,store_filter=1/",			"arm_spe_0"),
+	E("spe-ldst",	"arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=%u/",	"arm_spe_0"),
+};
+
+static char mem_ld_name[100];
+static char mem_st_name[100];
+static char mem_ldst_name[100];
+
+struct perf_mem_event *perf_mem_events__ptr(int i)
+{
+	if (i >= PERF_MEM_EVENTS__MAX)
+		return NULL;
+
+	return &perf_mem_events[i];
+}
+
+char *perf_mem_events__name(int i)
+{
+	struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+	if (i >= PERF_MEM_EVENTS__MAX)
+		return NULL;
+
+	if (i == PERF_MEM_EVENTS__LOAD) {
+		scnprintf(mem_ld_name, sizeof(mem_ld_name),
+			  e->name, perf_mem_events__loads_ldlat);
+		return mem_ld_name;
+	}
+
+	if (i == PERF_MEM_EVENTS__STORE) {
+		scnprintf(mem_st_name, sizeof(mem_st_name), e->name);
+		return mem_st_name;
+	}
+
+	scnprintf(mem_ldst_name, sizeof(mem_ldst_name),
+		  e->name, perf_mem_events__loads_ldlat);
+	return mem_ldst_name;
+}
-- 
2.20.1



* [PATCH v2 08/14] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (6 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 07/14] perf mem: Support Arm SPE events Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 09/14] perf arm-spe: Save memory addresses in packet Leo Yan
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch enables the sample attribute PERF_SAMPLE_DATA_SRC for the
perf data; when the tracing data is decoded, this tells the tool that
the samples contain memory data.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 07232664c927..305ab725b3ba 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -810,7 +810,7 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.type = PERF_TYPE_HARDWARE;
 	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
 	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
-		PERF_SAMPLE_PERIOD;
+			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC;
 	if (spe->timeless_decoding)
 		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
 	else
-- 
2.20.1



* [PATCH v2 09/14] perf arm-spe: Save memory addresses in packet
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (7 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 08/14] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 10/14] perf arm-spe: Store operation types " Leo Yan
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch saves the virtual and physical memory addresses in the
packet; the address info can be used for generating memory samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 4 ++++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index ae718e3419e3..1c430657939f 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -160,6 +160,10 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 				decoder->record.from_ip = ip;
 			else if (idx == SPE_ADDR_PKT_HDR_INDEX_BRANCH)
 				decoder->record.to_ip = ip;
+			else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT)
+				decoder->record.addr = ip;
+			else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS)
+				decoder->record.phys_addr = ip;
 			break;
 		case ARM_SPE_COUNTER:
 			break;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 24727b8ca7ff..31d1776785de 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -30,6 +30,8 @@ struct arm_spe_record {
 	u64 from_ip;
 	u64 to_ip;
 	u64 timestamp;
+	u64 addr;
+	u64 phys_addr;
 };
 
 struct arm_spe_insn;
-- 
2.20.1



* [PATCH v2 10/14] perf arm-spe: Store operation types in packet
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (8 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 09/14] perf arm-spe: Save memory addresses in packet Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 11/14] perf arm-spe: Fill address info for samples Leo Yan
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch stores the operation type in the packet structure; it can be
used by the frontend to generate memory access info for samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 11 +++++++++++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 1c430657939f..7bf787c47f5b 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -170,6 +170,17 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 		case ARM_SPE_CONTEXT:
 			break;
 		case ARM_SPE_OP_TYPE:
+			/*
+			 * When operation type packet header's class equals 1,
+			 * the payload's least significant bit (LSB) indicates
+			 * the operation type: load/swap or store.
+			 */
+			if (idx == 1) {
+				if (payload & 0x1)
+					decoder->record.op = ARM_SPE_ST;
+				else
+					decoder->record.op = ARM_SPE_LD;
+			}
 			break;
 		case ARM_SPE_EVENTS:
 			if (payload & SPE_EVT_PKT_L1D_REFILL)
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 31d1776785de..3273cee95ea1 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -24,9 +24,15 @@ enum arm_spe_sample_type {
 	ARM_SPE_REMOTE_ACCESS	= 1 << 7,
 };
 
+enum arm_spe_op_type {
+	ARM_SPE_LD		= 1 << 0,
+	ARM_SPE_ST		= 1 << 1,
+};
+
 struct arm_spe_record {
 	enum arm_spe_sample_type type;
 	int err;
+	u32 op;
 	u64 from_ip;
 	u64 to_ip;
 	u64 timestamp;
-- 
2.20.1



* [PATCH v2 11/14] perf arm-spe: Fill address info for samples
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (9 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 10/14] perf arm-spe: Store operation types " Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 12/14] perf arm-spe: Synthesize memory event Leo Yan
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

Since the Arm SPE backend decoder now passes the virtual and physical
address info through the packet, this address info can be filled into
the synthesized samples and finally be used for memory profiling.

This patch splits sample generation into two functions:
  - arm_spe__synth_mem_sample() synthesizes memory access and TLB
    related samples;
  - arm_spe__synth_branch_sample() synthesizes branch samples, which
    are mainly for branch miss prediction.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 52 +++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 305ab725b3ba..44e73e0ff4e7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -238,7 +238,6 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
 	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
 	sample->pid = speq->pid;
 	sample->tid = speq->tid;
-	sample->addr = record->to_ip;
 	sample->period = 1;
 	sample->cpu = speq->cpu;
 
@@ -262,18 +261,37 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 	return ret;
 }
 
-static int
-arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
-				u64 spe_events_id)
+static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
+				     u64 spe_events_id)
 {
 	struct arm_spe *spe = speq->spe;
+	struct arm_spe_record *record = &speq->decoder->record;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { 0 };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+	sample.addr = record->addr;
+	sample.phys_addr = record->phys_addr;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
+					u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	struct arm_spe_record *record = &speq->decoder->record;
 	union perf_event *event = speq->event_buf;
-	struct perf_sample sample = { .ip = 0, };
+	struct perf_sample sample = { 0 };
 
 	arm_spe_prep_sample(spe, speq, event, &sample);
 
 	sample.id = spe_events_id;
 	sample.stream_id = spe_events_id;
+	sample.addr = record->to_ip;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -286,15 +304,13 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->l1d_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_L1D_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->l1d_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
 			if (err)
 				return err;
 		}
@@ -302,15 +318,13 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_llc) {
 		if (record->type & ARM_SPE_LLC_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->llc_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_LLC_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->llc_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id);
 			if (err)
 				return err;
 		}
@@ -318,31 +332,27 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_tlb) {
 		if (record->type & ARM_SPE_TLB_MISS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->tlb_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_TLB_ACCESS) {
-			err = arm_spe_synth_spe_events_sample(
-					speq, spe->tlb_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id);
 			if (err)
 				return err;
 		}
 	}
 
 	if (spe->sample_branch && (record->type & ARM_SPE_BRANCH_MISS)) {
-		err = arm_spe_synth_spe_events_sample(speq,
-						      spe->branch_miss_id);
+		err = arm_spe__synth_branch_sample(speq, spe->branch_miss_id);
 		if (err)
 			return err;
 	}
 
 	if (spe->sample_remote_access &&
 	    (record->type & ARM_SPE_REMOTE_ACCESS)) {
-		err = arm_spe_synth_spe_events_sample(speq,
-						      spe->remote_access_id);
+		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id);
 		if (err)
 			return err;
 	}
-- 
2.20.1



* [PATCH v2 12/14] perf arm-spe: Synthesize memory event
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (10 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 11/14] perf arm-spe: Fill address info for samples Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 13/14] perf arm-spe: Set sample's data source field Leo Yan
  2020-09-01  8:38 ` [PATCH v2 14/14] perf mem: Document options introduced by Arm SPE Leo Yan
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

This patch synthesizes memory events; it generates memory events for
all memory levels.  The memory event delivers two benefits for SPE:

- The first benefit is that the memory event gives a global view of
  memory accesses, rather than organizing events in a scattered way
  (such as L1 cache, last level cache, remote access, etc.) where each
  event can only display a single memory type; memory events contain
  all memory accesses, so it's easier to review the memory behaviour
  across different memory levels;

- The second benefit is that sample generation can introduce a big
  overhead and a long wait for perf reporting; we can specify the
  itrace option '--itrace=M' to filter out other events and only output
  memory events, which can significantly reduce the overhead caused by
  generating samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 44e73e0ff4e7..7f44ef8c89f1 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -55,6 +55,7 @@ struct arm_spe {
 	u8				sample_tlb;
 	u8				sample_branch;
 	u8				sample_remote_access;
+	u8				sample_memory;
 
 	u64				l1d_miss_id;
 	u64				l1d_access_id;
@@ -64,6 +65,7 @@ struct arm_spe {
 	u64				tlb_access_id;
 	u64				branch_miss_id;
 	u64				remote_access_id;
+	u64				memory_id;
 
 	u64				kernel_start;
 
@@ -296,6 +298,19 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
 
+static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
+{
+	int mem_type = ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS |
+		       ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS |
+		       ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS |
+		       ARM_SPE_REMOTE_ACCESS;
+
+	if (type & mem_type)
+		return true;
+
+	return false;
+}
+
 static int arm_spe_sample(struct arm_spe_queue *speq)
 {
 	const struct arm_spe_record *record = &speq->decoder->record;
@@ -357,6 +372,12 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 			return err;
 	}
 
+	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
+		err = arm_spe__synth_mem_sample(speq, spe->memory_id);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
@@ -924,6 +945,18 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 		id += 1;
 	}
 
+	if (spe->synth_opts.mem) {
+		spe->sample_memory = true;
+
+		/* Remote access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->memory_id = id;
+		arm_spe_set_event_name(evlist, id, "memory");
+		id += 1;
+	}
+
 	return 0;
 }
 
-- 
2.20.1



* [PATCH v2 13/14] perf arm-spe: Set sample's data source field
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (11 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 12/14] perf arm-spe: Synthesize memory event Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  2020-09-01  8:38 ` [PATCH v2 14/14] perf mem: Document options introduced by Arm SPE Leo Yan
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

The sample structure contains the field 'data_src', which carries the
detailed info for data operations, e.g. this field indicates whether
the data operation is a load or a store, which cache level it hits,
whether it is snooping or a remote access, etc.  In the end, 'data_src'
is parsed by the perf memory tool to display human readable strings.

This patch fills the 'data_src' field in the synthesized samples based
on the record types.  It now supports the types: Level 1 dcache miss,
Level 1 dcache hit, last level cache miss, last level cache access,
TLB miss, TLB hit, and remote access to another socket.

Note, the current perf tool can display statistics for L1/L2/L3 caches
but it doesn't support a 'last level cache' level.  To fit into the
current implementation, the 'data_src' field uses the L3 cache for the
last level cache.
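
As a sketch, one record's 'data_src' could be composed as below
(assuming a load that hits in the last level cache and whose TLB walk
hits); perf report then renders these bits as the "Memory access" and
"TLB access" strings seen in the cover letter output:

  union perf_mem_data_src dsrc = { 0 };

  dsrc.mem_op      = PERF_MEM_OP_LOAD;                    /* load vs. store */
  dsrc.mem_lvl     = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;  /* LLC mapped to L3 */
  dsrc.mem_lvl_num = PERF_MEM_LVLNUM_L3;
  dsrc.mem_dtlb    = PERF_MEM_TLB_WK | PERF_MEM_TLB_HIT;  /* "Walker hit" */

  sample.data_src = dsrc.val;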

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 63 +++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 7f44ef8c89f1..142149f732b3 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -264,7 +264,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 }
 
 static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
-				     u64 spe_events_id)
+				     u64 spe_events_id, u64 data_src)
 {
 	struct arm_spe *spe = speq->spe;
 	struct arm_spe_record *record = &speq->decoder->record;
@@ -277,6 +277,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
 	sample.stream_id = spe_events_id;
 	sample.addr = record->addr;
 	sample.phys_addr = record->phys_addr;
+	sample.data_src = data_src;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -311,21 +312,60 @@ static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
 	return false;
 }
 
+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
+{
+	union perf_mem_data_src	data_src = { 0 };
+
+	if (record->op == ARM_SPE_LD)
+		data_src.mem_op = PERF_MEM_OP_LOAD;
+	else
+		data_src.mem_op = PERF_MEM_OP_STORE;
+
+	if (record->type & ARM_SPE_L1D_MISS) {
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+		data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L1;
+	} else if (record->type & ARM_SPE_L1D_ACCESS) {
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+		data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L1;
+	} else if (record->type & ARM_SPE_LLC_MISS) {
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L3;
+	} else if (record->type & ARM_SPE_LLC_ACCESS) {
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+		data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;
+	} else if (record->type & ARM_SPE_REMOTE_ACCESS) {
+		data_src.mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+		data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_REM_CCE1;
+	}
+
+	if (record->type & ARM_SPE_TLB_MISS)
+		data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_MISS;
+	else if (record->type & ARM_SPE_TLB_ACCESS)
+		data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_HIT;
+
+	return data_src.val;
+}
+
 static int arm_spe_sample(struct arm_spe_queue *speq)
 {
 	const struct arm_spe_record *record = &speq->decoder->record;
 	struct arm_spe *spe = speq->spe;
+	u64 data_src;
 	int err;
 
+	data_src = arm_spe__synth_data_source(record);
+
 	if (spe->sample_flc) {
 		if (record->type & ARM_SPE_L1D_MISS) {
-			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_L1D_ACCESS) {
-			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -333,13 +373,15 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_llc) {
 		if (record->type & ARM_SPE_LLC_MISS) {
-			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_LLC_ACCESS) {
-			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->llc_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -347,13 +389,15 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_tlb) {
 		if (record->type & ARM_SPE_TLB_MISS) {
-			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id,
+							data_src);
 			if (err)
 				return err;
 		}
 
 		if (record->type & ARM_SPE_TLB_ACCESS) {
-			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id);
+			err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id,
+							data_src);
 			if (err)
 				return err;
 		}
@@ -367,13 +411,14 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 
 	if (spe->sample_remote_access &&
 	    (record->type & ARM_SPE_REMOTE_ACCESS)) {
-		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id);
+		err = arm_spe__synth_mem_sample(speq, spe->remote_access_id,
+						data_src);
 		if (err)
 			return err;
 	}
 
 	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
-		err = arm_spe__synth_mem_sample(speq, spe->memory_id);
+		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
 		if (err)
 			return err;
 	}
-- 
2.20.1
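
The encoding above can be cross-checked with a small decode helper.
The function below is only an illustrative sketch, not part of the
patch; it assumes nothing beyond the perf_mem_data_src bit-field
layout in include/uapi/linux/perf_event.h:

#include <stdio.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: describe a data_src value produced by
 * arm_spe__synth_data_source() above.
 */
static void describe_data_src(__u64 val)
{
	union perf_mem_data_src dsrc = { .val = val };

	/* mem_op was set to either PERF_MEM_OP_LOAD or PERF_MEM_OP_STORE */
	printf("%s, ", (dsrc.mem_op & PERF_MEM_OP_LOAD) ? "load" : "store");

	/*
	 * mem_lvl carries hit/miss plus a level flag; mem_lvl_num is
	 * L1 (0x1), L3 (0x3) or ANY_CACHE (0xb) for the cases above.
	 */
	if (dsrc.mem_lvl & PERF_MEM_LVL_MISS)
		printf("lvl_num 0x%x miss, ", (unsigned int)dsrc.mem_lvl_num);
	else if (dsrc.mem_lvl & PERF_MEM_LVL_HIT)
		printf("lvl_num 0x%x hit, ", (unsigned int)dsrc.mem_lvl_num);

	/* mem_dtlb is only filled in when a TLB packet was decoded */
	if (dsrc.mem_dtlb & PERF_MEM_TLB_MISS)
		printf("TLB miss\n");
	else if (dsrc.mem_dtlb & PERF_MEM_TLB_HIT)
		printf("TLB hit\n");
	else
		printf("no TLB info\n");
}

For an L1 load miss with a TLB walk miss, for example, this would print
"load, lvl_num 0x1 miss, TLB miss".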


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 14/14] perf mem: Document options introduced by Arm SPE
  2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
                   ` (12 preceding siblings ...)
  2020-09-01  8:38 ` [PATCH v2 13/14] perf arm-spe: Set sample's data source field Leo Yan
@ 2020-09-01  8:38 ` Leo Yan
  13 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-01  8:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Andi Kleen, Ian Rogers, Nick Desaulniers,
	Naveen N. Rao, Kemeng Shi, James Clark, Wei Li, Al Grant,
	Will Deacon, Mathieu Poirier, Mike Leach, linux-kernel
  Cc: Leo Yan

Document the new options introduced by Arm SPE: the event type 'ldst'
is added for recording both load and store memory operations, and
information is added for '--itrace=M', which can be used to synthesize
memory samples from Arm SPE trace data.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/Documentation/perf-mem.txt | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 199ea0f0a6c0..2455d485044f 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -38,7 +38,12 @@ OPTIONS
 
 -t::
 --type=<type>::
-	Select the memory operation type: load or store (default: load,store)
+	Select the memory operation type: load, store, or ldst (default: load,store).
+	The type 'ldst' means a single event records both load and store
+	operations; Intel and PowerPC systems support the types 'load' and
+	'store' but not 'ldst'; on Arm64, SPE AUX trace data is used to
+	generate memory events, so one of these three types needs to be
+	specified.
 
 -D::
 --dump-raw-samples::
@@ -84,6 +89,9 @@ RECORD OPTIONS
 --ldlat <n>::
 	Specify desired latency for loads event. (x86 only)
 
+--itrace=M::
+	Synthesize memory samples from the AUX trace data. (Arm SPE only)
+
 In addition, for report all perf report options are valid, and for record
 all perf record options.
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 01/14] perf mem: Search event name with more flexible path
  2020-09-01  8:38 ` [PATCH v2 01/14] perf mem: Search event name with more flexible path Leo Yan
@ 2020-09-03 13:50   ` Jiri Olsa
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Olsa @ 2020-09-03 13:50 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Adrian Hunter,
	Andi Kleen, Ian Rogers, Nick Desaulniers, Naveen N. Rao,
	Kemeng Shi, James Clark, Wei Li, Al Grant, Will Deacon,
	Mathieu Poirier, Mike Leach, linux-kernel

On Tue, Sep 01, 2020 at 09:38:02AM +0100, Leo Yan wrote:
> The perf tool searches for memory event names under the folder
> '/sys/devices/cpu/events/', which limits the selection of memory
> profiling events to those that live under this folder.  It is thus
> impossible to use any other event for memory profiling, e.g. the Arm
> SPE hardware event is not located in '/sys/devices/cpu/events/' and
> so cannot be enabled for memory profiling.
> 
> This patch changes the search folder from '/sys/devices/cpu/events/'
> to '/sys/devices', giving the flexibility to find events which can be
> used for memory profiling.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka

> ---
>  tools/perf/util/mem-events.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index ea0af0bc4314..35c8d175a9d2 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -18,8 +18,8 @@ unsigned int perf_mem_events__loads_ldlat = 30;
>  #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
>  
>  struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
> -	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"mem-loads"),
> -	E("ldlat-stores",	"cpu/mem-stores/P",		"mem-stores"),
> +	E("ldlat-loads",	"cpu/mem-loads,ldlat=%u/P",	"cpu/events/mem-loads"),
> +	E("ldlat-stores",	"cpu/mem-stores/P",		"cpu/events/mem-stores"),
>  };
>  #undef E
>  
> @@ -93,7 +93,7 @@ int perf_mem_events__init(void)
>  		struct perf_mem_event *e = &perf_mem_events[j];
>  		struct stat st;
>  
> -		scnprintf(path, PATH_MAX, "%s/devices/cpu/events/%s",
> +		scnprintf(path, PATH_MAX, "%s/devices/%s",
>  			  mnt, e->sysfs_name);
>  
>  		if (!stat(path, &st))
> -- 
> 2.20.1
> 
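
To illustrate the flexibility this buys, an architecture-specific table
can now point its sysfs_name anywhere under /sys/devices.  The entry
below is purely hypothetical (the PMU name "arm_spe_0" and the config
terms are placeholders; the real wiring comes with the later "perf mem:
Support Arm SPE events" patch in this series), but it matches the
"%s/devices/%s" probe in the hunk above:

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

/* Hypothetical arch override; names and config terms are illustrative. */
static struct perf_mem_event perf_mem_events_example[PERF_MEM_EVENTS__MAX] = {
	E("spe-load",  "arm_spe_0/load_filter=1,store_filter=0/", "arm_spe_0"),
	E("spe-store", "arm_spe_0/load_filter=0,store_filter=1/", "arm_spe_0"),
};

#undef E

/*
 * perf_mem_events__init() then stats "/sys/devices/arm_spe_0", which
 * exists once the SPE PMU driver has registered, instead of requiring
 * an entry under "/sys/devices/cpu/events/".
 */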


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-09-01  8:38 ` [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
@ 2020-09-03 13:50   ` Jiri Olsa
  2020-09-04  0:34     ` Leo Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2020-09-03 13:50 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Adrian Hunter,
	Andi Kleen, Ian Rogers, Nick Desaulniers, Naveen N. Rao,
	Kemeng Shi, James Clark, Wei Li, Al Grant, Will Deacon,
	Mathieu Poirier, Mike Leach, linux-kernel

On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:

SNIP

> @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
>  	rec_argv[i++] = "record";
>  
>  	if (!event_set) {
> -		perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
> -		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> +		e->record = true;
> +
> +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> +		e->record = true;
>  	}
>  
> -	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> +	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> +	if (e->record)
>  		rec_argv[i++] = "-W";
>  
>  	rec_argv[i++] = "-d";
>  	rec_argv[i++] = "--phys-data";
>  	rec_argv[i++] = "--sample-cpu";
>  
> -	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> -		if (!perf_mem_events[j].record)
> +	j = 0;
> +	while ((e = perf_mem_events__ptr(j)) != NULL) {
> +		if (!e->record) {

you could keep the above 'for loop' in here, it seems better
than taking care of j++

> +			j++;
>  			continue;
> +		}
>  
> -		if (!perf_mem_events[j].supported) {
> +		if (!e->supported) {
>  			pr_err("failed: event '%s' not supported\n",
> -			       perf_mem_events[j].name);
> +			       perf_mem_events__name(j));
>  			free(rec_argv);
>  			return -1;
>  		}
>  
>  		rec_argv[i++] = "-e";
>  		rec_argv[i++] = perf_mem_events__name(j);
> +		j++;
>  	}
>  
>  	if (all_user)

SNIP

> @@ -100,11 +106,14 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
>  	if (mem->phys_addr)
>  		rec_argv[i++] = "--phys-data";
>  
> -	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> -		if (!perf_mem_events[j].record)
> +	j = 0;
> +	while ((e = perf_mem_events__ptr(j)) != NULL) {
> +		if (!e->record) {

same here

thanks,
jirka

> +			j++;
>  			continue;
> +		}
>  
> -		if (!perf_mem_events[j].supported) {
> +		if (!e->supported) {
>  			pr_err("failed: event '%s' not supported\n",
>  			       perf_mem_events__name(j));
>  			free(rec_argv);
> @@ -113,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
>  
>  		rec_argv[i++] = "-e";
>  		rec_argv[i++] = perf_mem_events__name(j);
> +		j++;

SNIP


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-09-03 13:50   ` Jiri Olsa
@ 2020-09-04  0:34     ` Leo Yan
  2020-09-04 15:52       ` Jiri Olsa
  0 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2020-09-04  0:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Adrian Hunter,
	Andi Kleen, Ian Rogers, Nick Desaulniers, Naveen N. Rao,
	Kemeng Shi, James Clark, Wei Li, Al Grant, Will Deacon,
	Mathieu Poirier, Mike Leach, linux-kernel

Hi Jiri,

On Thu, Sep 03, 2020 at 03:50:54PM +0200, Jiri Olsa wrote:
> On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:
> 
> SNIP
> 
> > @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> >  	rec_argv[i++] = "record";
> >  
> >  	if (!event_set) {
> > -		perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
> > -		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> > +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > +		e->record = true;
> > +
> > +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> > +		e->record = true;
> >  	}
> >  
> > -	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> > +	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > +	if (e->record)
> >  		rec_argv[i++] = "-W";
> >  
> >  	rec_argv[i++] = "-d";
> >  	rec_argv[i++] = "--phys-data";
> >  	rec_argv[i++] = "--sample-cpu";
> >  
> > -	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > -		if (!perf_mem_events[j].record)
> > +	j = 0;
> > +	while ((e = perf_mem_events__ptr(j)) != NULL) {
> > +		if (!e->record) {
> 
> you could keep the above 'for loop' in here, it seems better
> than taking care of j++

Actually in patch v1 I did it this way :)  I followed James' suggestion
to encapsulate PERF_MEM_EVENTS__MAX into perf_mem_events__ptr(), so that
builtin-mem.c and builtin-c2c.c do not need to use PERF_MEM_EVENTS__MAX
in the loop and only need to check whether the pointer returned from
perf_mem_events__ptr() is NULL.

How about changing it as below?

        for (j = 0; (e = perf_mem_events__ptr(j)) != NULL; j++) {
                [...]
        }

If you still think this is not good, I will change back to the old
code style in the next spin.

Thanks for reviewing!

Leo

> > +			j++;
> >  			continue;
> > +		}
> >  
> > -		if (!perf_mem_events[j].supported) {
> > +		if (!e->supported) {
> >  			pr_err("failed: event '%s' not supported\n",
> > -			       perf_mem_events[j].name);
> > +			       perf_mem_events__name(j));
> >  			free(rec_argv);
> >  			return -1;
> >  		}
> >  
> >  		rec_argv[i++] = "-e";
> >  		rec_argv[i++] = perf_mem_events__name(j);
> > +		j++;
> >  	}
> >  
> >  	if (all_user)
> 
> SNIP
> 
> > @@ -100,11 +106,14 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
> >  	if (mem->phys_addr)
> >  		rec_argv[i++] = "--phys-data";
> >  
> > -	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > -		if (!perf_mem_events[j].record)
> > +	j = 0;
> > +	while ((e = perf_mem_events__ptr(j)) != NULL) {
> > +		if (!e->record) {
> 
> same here
> 
> thanks,
> jirka
> 
> > +			j++;
> >  			continue;
> > +		}
> >  
> > -		if (!perf_mem_events[j].supported) {
> > +		if (!e->supported) {
> >  			pr_err("failed: event '%s' not supported\n",
> >  			       perf_mem_events__name(j));
> >  			free(rec_argv);
> > @@ -113,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
> >  
> >  		rec_argv[i++] = "-e";
> >  		rec_argv[i++] = perf_mem_events__name(j);
> > +		j++;
> 
> SNIP
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-09-04  0:34     ` Leo Yan
@ 2020-09-04 15:52       ` Jiri Olsa
  2020-09-07  8:17         ` Leo Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2020-09-04 15:52 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Adrian Hunter,
	Andi Kleen, Ian Rogers, Nick Desaulniers, Naveen N. Rao,
	Kemeng Shi, James Clark, Wei Li, Al Grant, Will Deacon,
	Mathieu Poirier, Mike Leach, linux-kernel

On Fri, Sep 04, 2020 at 08:34:47AM +0800, Leo Yan wrote:
> Hi Jiri,
> 
> On Thu, Sep 03, 2020 at 03:50:54PM +0200, Jiri Olsa wrote:
> > On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:
> > 
> > SNIP
> > 
> > > @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> > >  	rec_argv[i++] = "record";
> > >  
> > >  	if (!event_set) {
> > > -		perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
> > > -		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> > > +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > +		e->record = true;
> > > +
> > > +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> > > +		e->record = true;
> > >  	}
> > >  
> > > -	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> > > +	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > +	if (e->record)
> > >  		rec_argv[i++] = "-W";
> > >  
> > >  	rec_argv[i++] = "-d";
> > >  	rec_argv[i++] = "--phys-data";
> > >  	rec_argv[i++] = "--sample-cpu";
> > >  
> > > -	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > > -		if (!perf_mem_events[j].record)
> > > +	j = 0;
> > > +	while ((e = perf_mem_events__ptr(j)) != NULL) {
> > > +		if (!e->record) {
> > 
> > you could keep the above 'for loop' in here, it seems better
> > than taking care of j++
> 
> Actually in patch v1 I did this way :)  I followed James' suggestion to
> encapsulate PERF_MEM_EVENTS__MAX into perf_mem_events__ptr(), thus
> builtin-mem.c and buildin-c2c.c are not necessary to use
> PERF_MEM_EVENTS__MAX in the loop and only needs to detect if the
> pointer is NULL or not when return from perf_mem_events__ptr().

ah because u added that load_store event

> 
> How about change as below?
> 
>         for (j = 0; (e = perf_mem_events__ptr(j)) != NULL; j++) {
>                 [...]

will this work? e will be NULL for the first iteration, no?

there are still other for loops with PERF_MEM_EVENTS__MAX used
in the patch... you overload the perf_mem_events access for arm,
and add the missing load_store NULL item to the generic version, so
there are always PERF_MEM_EVENTS__MAX items in the array

can we just use the current for loop and check for e->tag != NULL
or any other field?
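
A rough sketch of that shape, keeping the bounded loop and skipping the
slots an architecture leaves empty ('tag' is one of the fields already
in struct perf_mem_event; error handling elided):

	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
		struct perf_mem_event *e = perf_mem_events__ptr(j);

		/* the generic table keeps a NULL-tagged LOAD_STORE slot */
		if (!e->tag || !e->record)
			continue;

		rec_argv[i++] = "-e";
		rec_argv[i++] = perf_mem_events__name(j);
	}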

thanks,
jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()
  2020-09-04 15:52       ` Jiri Olsa
@ 2020-09-07  8:17         ` Leo Yan
  0 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2020-09-07  8:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Adrian Hunter,
	Andi Kleen, Ian Rogers, Nick Desaulniers, Naveen N. Rao,
	Kemeng Shi, James Clark, Wei Li, Al Grant, Will Deacon,
	Mathieu Poirier, Mike Leach, linux-kernel

On Fri, Sep 04, 2020 at 05:52:51PM +0200, Jiri Olsa wrote:
> On Fri, Sep 04, 2020 at 08:34:47AM +0800, Leo Yan wrote:
> > Hi Jiri,
> > 
> > On Thu, Sep 03, 2020 at 03:50:54PM +0200, Jiri Olsa wrote:
> > > On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:
> > > 
> > > SNIP
> > > 
> > > > @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> > > >  	rec_argv[i++] = "record";
> > > >  
> > > >  	if (!event_set) {
> > > > -		perf_mem_events[PERF_MEM_EVENTS__LOAD].record  = true;
> > > > -		perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> > > > +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > > +		e->record = true;
> > > > +
> > > > +		e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> > > > +		e->record = true;
> > > >  	}
> > > >  
> > > > -	if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> > > > +	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > > +	if (e->record)
> > > >  		rec_argv[i++] = "-W";
> > > >  
> > > >  	rec_argv[i++] = "-d";
> > > >  	rec_argv[i++] = "--phys-data";
> > > >  	rec_argv[i++] = "--sample-cpu";
> > > >  
> > > > -	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > > > -		if (!perf_mem_events[j].record)
> > > > +	j = 0;
> > > > +	while ((e = perf_mem_events__ptr(j)) != NULL) {
> > > > +		if (!e->record) {
> > > 
> > > you could keep the above 'for loop' in here, it seems better
> > > than taking care of j++
> > 
> > Actually in patch v1 I did this way :)  I followed James' suggestion to
> > encapsulate PERF_MEM_EVENTS__MAX into perf_mem_events__ptr(), thus
> > builtin-mem.c and buildin-c2c.c are not necessary to use
> > PERF_MEM_EVENTS__MAX in the loop and only needs to detect if the
> > pointer is NULL or not when return from perf_mem_events__ptr().
> 
> ah because u added that load_store event

Yes.

> > 
> > How about change as below?
> > 
> >         for (j = 0; (e = perf_mem_events__ptr(j)) != NULL; j++) {
> >                 [...]
> 
> will this work? e will be NULL for first iteration no?
> 
> there are still other for loops with PERF_MEM_EVENTS__MAX used
> in the patch.. you overload the perf_mem_events access for arm,
> and add missing load_store NULL item to generic version, so there's
> always PERF_MEM_EVENTS__MAX items in the array

Yes, exactly.

> can we just use the current for loop and check for e->tag != NULL
> or any other field

Understood.  That is more direct; I will keep the current code and
check the 'e->record' field.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 20+ messages in thread

Thread overview: 20+ messages
2020-09-01  8:38 [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE Leo Yan
2020-09-01  8:38 ` [PATCH v2 01/14] perf mem: Search event name with more flexible path Leo Yan
2020-09-03 13:50   ` Jiri Olsa
2020-09-01  8:38 ` [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr() Leo Yan
2020-09-03 13:50   ` Jiri Olsa
2020-09-04  0:34     ` Leo Yan
2020-09-04 15:52       ` Jiri Olsa
2020-09-07  8:17         ` Leo Yan
2020-09-01  8:38 ` [PATCH v2 03/14] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE Leo Yan
2020-09-01  8:38 ` [PATCH v2 04/14] perf mem: Only initialize memory event for recording Leo Yan
2020-09-01  8:38 ` [PATCH v2 05/14] perf auxtrace: Add option '-M' for memory events Leo Yan
2020-09-01  8:38 ` [PATCH v2 06/14] perf mem: Support AUX trace Leo Yan
2020-09-01  8:38 ` [PATCH v2 07/14] perf mem: Support Arm SPE events Leo Yan
2020-09-01  8:38 ` [PATCH v2 08/14] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC Leo Yan
2020-09-01  8:38 ` [PATCH v2 09/14] perf arm-spe: Save memory addresses in packet Leo Yan
2020-09-01  8:38 ` [PATCH v2 10/14] perf arm-spe: Store operation types " Leo Yan
2020-09-01  8:38 ` [PATCH v2 11/14] perf arm-spe: Fill address info for samples Leo Yan
2020-09-01  8:38 ` [PATCH v2 12/14] perf arm-spe: Synthesize memory event Leo Yan
2020-09-01  8:38 ` [PATCH v2 13/14] perf arm-spe: Set sample's data source field Leo Yan
2020-09-01  8:38 ` [PATCH v2 14/14] perf mem: Document options introduced by Arm SPE Leo Yan
