* [RFC v3 0/5] perf tools: Add support for some spe events and precise ip
@ 2019-11-23 10:11 Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 1/5] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir Tan Xiaojun
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-23 10:11 UTC
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung, ak,
	adrian.hunter, yao.jin, tmricht, brueckner, songliubraving,
	gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, tanxiaojun, huawei.libin,
	linux-kernel, linux-perf-users

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported for SPE. However,
the dumped raw data cannot be used without further parsing.

This patchset improves the "perf report" support for SPE by
processing the data further. Currently, support is added for four
events: llc-miss, tlb-miss, branch-miss and remote-access.

v1->v2:
Some cleanups and bug fixes were made, and support for the precise
ip of branch-misses was added. Thanks to Jeremy and James for their
suggestions.

v2->v3:
Mainly adds four SPE precise ip events, which are visible in
"perf list". More details are in [5/5].

Tan Xiaojun (5):
  perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  perf tools: Add support for "report" for some spe events
  perf report: Add --spe options for arm-spe
  drivers: perf: add some arm spe events
  perf tools: Add support to process multi spe events

 drivers/perf/arm_spe_pmu.c                    |  44 +
 tools/perf/Documentation/perf-report.txt      |  10 +
 tools/perf/arch/arm64/util/arm-spe.c          |  47 +-
 tools/perf/builtin-report.c                   |   5 +
 tools/perf/util/Build                         |   2 +-
 tools/perf/util/arm-spe-decoder/Build         |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 +++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-pkt-decoder.c                     |   0
 .../arm-spe-pkt-decoder.h                     |   2 +
 tools/perf/util/arm-spe.c                     | 771 +++++++++++++++++-
 tools/perf/util/arm-spe.h                     |  20 +
 tools/perf/util/auxtrace.c                    |  49 ++
 tools/perf/util/auxtrace.h                    |  29 +
 tools/perf/util/session.h                     |   2 +
 15 files changed, 1231 insertions(+), 42 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)

-- 
2.17.1



* [RFC v3 1/5] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
@ 2019-11-23 10:11 ` Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 2/5] perf tools: Add support for "report" for some spe events Tan Xiaojun
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-23 10:11 UTC
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung, ak,
	adrian.hunter, yao.jin, tmricht, brueckner, songliubraving,
	gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, tanxiaojun, huawei.libin,
	linux-kernel, linux-perf-users

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c to this directory. No code changes.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
---
 tools/perf/util/Build                                       | 2 +-
 tools/perf/util/arm-spe-decoder/Build                       | 1 +
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c | 0
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h | 0
 tools/perf/util/arm-spe.c                                   | 2 +-
 5 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (100%)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 8dcfca1a882f..f2e5a217e0aa 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -100,7 +100,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
 perf-$(CONFIG_AUXTRACE) += intel-bts.o
 perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
 perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o
 
 ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 000000000000..16efbc245028
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.c
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.h
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 53be12b23ff4..f3382a38d48e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -23,7 +23,7 @@
 #include "debug.h"
 #include "auxtrace.h"
 #include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
 struct arm_spe {
 	struct auxtrace			auxtrace;
-- 
2.17.1



* [RFC v3 2/5] perf tools: Add support for "report" for some spe events
  2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 1/5] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir Tan Xiaojun
@ 2019-11-23 10:11 ` Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 3/5] perf report: Add --spe options for arm-spe Tan Xiaojun
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-23 10:11 UTC
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung, ak,
	adrian.hunter, yao.jin, tmricht, brueckner, songliubraving,
	gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, tanxiaojun, huawei.libin,
	linux-kernel, linux-perf-users

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported for SPE. However,
the dumped raw data cannot be used without further parsing.

This patch improves the "perf report" support for SPE by processing
the data further. Currently, support is added for four events:
llc-miss, tlb-miss, branch-miss, and remote-access.
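
As a rough illustration of how the four events are derived, the
sketch below mirrors the ARM_SPE_EVENTS case of arm_spe_walk_trace()
in this patch. The bit positions follow enum arm_spe_events from
arm-spe-decoder.h. The names spe_classify() and SPE_* are hypothetical
and exist only for this example. The "idx > 1" guard is copied from
the patch; it is assumed here to mean that the events payload is wide
enough to carry bits 8 and above.

```c
#include <stdint.h>

/* Bit positions in the events payload (from enum arm_spe_events). */
enum {
	EV_TLB_REFILL    = 5,
	EV_MISPRED       = 7,
	EV_LLC_REFILL    = 9,
	EV_REMOTE_ACCESS = 10,
};

/* Sample type mask, same values as enum arm_spe_sample_type. */
enum {
	SPE_LLC_MISS      = 1 << 0,
	SPE_TLB_MISS      = 1 << 1,
	SPE_BRANCH_MISS   = 1 << 2,
	SPE_REMOTE_ACCESS = 1 << 3,
};

/* Hypothetical helper: map one events packet to a sample type mask. */
static unsigned int spe_classify(uint64_t payload, int idx)
{
	unsigned int type = 0;

	if (payload & (1ULL << EV_TLB_REFILL))
		type |= SPE_TLB_MISS;
	if (payload & (1ULL << EV_MISPRED))
		type |= SPE_BRANCH_MISS;
	/* The patch only trusts the upper event bits when idx > 1. */
	if (idx > 1 && (payload & (1ULL << EV_LLC_REFILL)))
		type |= SPE_LLC_MISS;
	if (idx > 1 && (payload & (1ULL << EV_REMOTE_ACCESS)))
		type |= SPE_REMOTE_ACCESS;

	return type;
}
```

A single record can set several bits at once, which is why
arm_spe_sample() may synthesize more than one sample per record.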

Example usage:

$ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

$ ./perf report -i perf-armspe-dd.data --stdio
--------------------------------------------------------------------
...
 # Samples: 23  of event 'llc-miss'
 # Event count (approx.): 23
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
     6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
     3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
     3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
     3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
...
 # Samples: 3  of event 'tlb-miss'
 # Event count (approx.): 3
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
    33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
    33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
...
 # Samples: 20  of event 'branch-miss'
 # Event count (approx.): 20
...
    15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
     7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
     7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
     7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
     7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
...
 # Samples: 5  of event 'remote-access'
 # Event count (approx.): 5
...
    27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
     5.56%     5.56%  dd       ld-2.28.so         [.] dl_main

--------------------------------------------------------------------
Further analysis and processing of the raw SPE data will follow.
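
The sample addresses shown above are first made canonical by
arm_spe_calc_ip() in this patch. A minimal standalone sketch of that
step follows. The name spe_canonical_addr() is hypothetical, and the
sketch assumes, as the patch's comment does, that bit 55 of the
payload selects the upper (kernel) or lower (user) address range.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring arm_spe_calc_ip(): drop the top byte
 * of the address payload, then refill it with ones when bit 55 says
 * the address lies in the upper (kernel) range.
 */
static uint64_t spe_canonical_addr(uint64_t payload)
{
	uint64_t addr = payload & ~(0xffULL << 56);

	if (addr & (1ULL << 55))
		addr |= 0xffULL << 56;

	return addr;
}
```

With this, a payload of 0x00ff000008081000 becomes the kernel address
0xffff000008081000, while a payload whose bit 55 is clear only has
its top byte cleared.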

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
---
 tools/perf/builtin-report.c                   |   5 +
 tools/perf/util/arm-spe-decoder/Build         |   2 +-
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                     | 744 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  49 ++
 tools/perf/util/auxtrace.h                    |  29 +
 tools/perf/util/session.h                     |   2 +
 9 files changed, 1087 insertions(+), 37 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index aae0e57c60fb..131537778253 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1021,6 +1021,7 @@ int cmd_report(int argc, const char **argv)
 {
 	struct perf_session *session;
 	struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
+	struct arm_spe_synth_opts arm_spe_synth_opts = { .set = 0, };
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
@@ -1179,6 +1180,9 @@ int cmd_report(int argc, const char **argv)
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options\n" ITRACE_HELP,
 			    itrace_parse_synth_opts),
+	OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
+			    "ARM SPE Tracing options",
+			    arm_spe_parse_synth_opts),
 	OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
 			"Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
@@ -1285,6 +1289,7 @@ int cmd_report(int argc, const char **argv)
 	}
 
 	session->itrace_synth_opts = &itrace_synth_opts;
+	session->arm_spe_synth_opts = &arm_spe_synth_opts;
 
 	report.session = session;
 
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc245028..f8dae13fc876 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 000000000000..50e796b89a95
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../debug.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+#ifndef BIT
+#define BIT(n)		(1UL << (n))
+#endif
+
+struct arm_spe_decoder {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+	struct arm_spe_state state;
+	const unsigned char *buf;
+	size_t len;
+	uint64_t pos;
+	struct arm_spe_pkt packet;
+	int pkt_step;
+	int pkt_len;
+	int last_packet_type;
+
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t timestamp;
+	uint64_t sample_timestamp;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+	uint64_t ip = (payload & ~(0xffULL << 56));
+
+	/* fill high 8 bits for kernel virtual address */
+	/* In Armv8 Architecture Reference Manual: Xn[55] determines
+	 * whether the address lies in the upper or lower address range
+	 * for the purpose of determining whether address tagging is
+	 * used */
+	if (ip & BIT(55))
+		ip |= (uint64_t)(0xffULL << 56);
+
+	return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+	struct arm_spe_decoder *decoder;
+
+	if (!params->get_trace)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct arm_spe_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->data               = params->data;
+
+	return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+	free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	pr_debug("ERROR: Bad packet\n");
+
+	return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+	struct arm_spe_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	pr_debug("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		pr_debug("No more data\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+	return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+	int ret;
+
+	decoder->last_packet_type = decoder->packet.type;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+
+		if (!decoder->len) {
+			ret = arm_spe_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = arm_spe_get_packet(decoder->buf, decoder->len,
+				&decoder->packet);
+		if (ret <= 0)
+			return arm_spe_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+	} while (decoder->packet.type == ARM_SPE_PAD);
+
+	return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+	int err;
+	int idx;
+	uint64_t payload;
+
+	while (1) {
+		err = arm_spe_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		idx = decoder->packet.index;
+		payload = decoder->packet.payload;
+
+		switch (decoder->packet.type) {
+		case ARM_SPE_TIMESTAMP:
+			decoder->sample_timestamp = payload;
+			return 0;
+		case ARM_SPE_END:
+			decoder->sample_timestamp = 0;
+			return 0;
+		case ARM_SPE_ADDRESS:
+			decoder->ip = arm_spe_calc_ip(payload);
+			if (idx == 0)
+				decoder->state.from_ip = decoder->ip;
+			else if (idx == 1)
+				decoder->state.to_ip = decoder->ip;
+			break;
+		case ARM_SPE_COUNTER:
+			break;
+		case ARM_SPE_CONTEXT:
+			break;
+		case ARM_SPE_OP_TYPE:
+			break;
+		case ARM_SPE_EVENTS:
+			if (payload & BIT(EV_TLB_REFILL))
+				decoder->state.type |= ARM_SPE_TLB_MISS;
+			if (payload & BIT(EV_MISPRED))
+				decoder->state.type |= ARM_SPE_BRANCH_MISS;
+			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
+				decoder->state.type |= ARM_SPE_LLC_MISS;
+			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
+				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
+
+			break;
+		case ARM_SPE_DATA_SOURCE:
+			break;
+		case ARM_SPE_BAD:
+			break;
+		case ARM_SPE_PAD:
+			break;
+		default:
+			pr_err("Get Packet Error!\n");
+			return -ENOSYS;
+		}
+	}
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = 0;
+
+	err = arm_spe_walk_trace(decoder);
+	/* Record the result of this walk; 0 clears any earlier error */
+	decoder->state.err = err;
+
+	decoder->state.timestamp = decoder->sample_timestamp;
+
+	return &decoder->state;
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 000000000000..330f9e1e71ab
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.h: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_events {
+	EV_EXCEPTION_GEN,
+	EV_RETIRED,
+	EV_L1D_ACCESS,
+	EV_L1D_REFILL,
+	EV_TLB_ACCESS,
+	EV_TLB_REFILL,
+	EV_NOT_TAKEN,
+	EV_MISPRED,
+	EV_LLC_ACCESS,
+	EV_LLC_REFILL,
+	EV_REMOTE_ACCESS,
+};
+
+enum arm_spe_sample_type {
+	ARM_SPE_LLC_MISS	= 1 << 0,
+	ARM_SPE_TLB_MISS	= 1 << 1,
+	ARM_SPE_BRANCH_MISS	= 1 << 2,
+	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
+	ARM_SPE_EX_STOP		= 1 << 6,
+};
+
+struct arm_spe_state {
+	enum arm_spe_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+	const unsigned char *buf;
+	size_t len;
+	uint64_t offset;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef65113f..865d1e35b401 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
 #define ARM_SPE_NEED_MORE_BYTES		-1
 #define ARM_SPE_BAD_PACKET		-2
 
+#define ARM_SPE_PKT_MAX_SZ		16
+
 enum arm_spe_pkt_type {
 	ARM_SPE_BAD,
 	ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f3382a38d48e..e7282c2616f3 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -16,34 +16,68 @@
 #include <linux/log2.h>
 #include <linux/zalloc.h>
 
+#include "auxtrace.h"
 #include "color.h"
+#include "debug.h"
 #include "evsel.h"
+#include "evlist.h"
 #include "machine.h"
 #include "session.h"
-#include "debug.h"
-#include "auxtrace.h"
+#include "symbol.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "tool.h"
+#include "util/synthetic-events.h"
+
 #include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#define MAX_TIMESTAMP (~0ULL)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
 	struct auxtrace_heap		heap;
+	struct arm_spe_synth_opts	synth_opts;
 	u32				auxtrace_type;
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+
+	u8				timeless_decoding;
+	u8				data_queued;
+
+	u8				sample_llc_miss;
+	u8				sample_tlb_miss;
+	u8				sample_branch_miss;
+	u8				sample_remote_access;
+	u64				llc_miss_id;
+	u64				tlb_miss_id;
+	u64				branch_miss_id;
+	u64				remote_access_id;
+	u64				kernel_start;
+
+	unsigned long			num_events;
 };
 
 struct arm_spe_queue {
-	struct arm_spe		*spe;
-	unsigned int		queue_nr;
-	struct auxtrace_buffer	*buffer;
-	bool			on_heap;
-	bool			done;
-	pid_t			pid;
-	pid_t			tid;
-	int			cpu;
+	struct arm_spe			*spe;
+	unsigned int			queue_nr;
+	struct auxtrace_buffer		*buffer;
+	struct auxtrace_buffer		*old_buffer;
+	union perf_event		*event_buf;
+	bool				on_heap;
+	bool				done;
+	pid_t				pid;
+	pid_t				tid;
+	int				cpu;
+	void				*decoder;
+	const struct arm_spe_state	*state;
+	u64				time;
+	u64				timestamp;
+	struct thread			*thread;
+	bool				have_sample;
 };
 
 static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
 	arm_spe_dump(spe, buf, len);
 }
 
-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
-				 union perf_event *event __maybe_unused,
-				 struct perf_sample *sample __maybe_unused,
-				 struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
+{
+	struct arm_spe_queue *speq = data;
+	struct auxtrace_buffer *buffer = speq->buffer;
+	struct auxtrace_buffer *old_buffer = speq->old_buffer;
+	struct auxtrace_queue *queue;
+
+	queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	/* If no more data, drop the previous auxtrace_buffer and return */
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	speq->buffer = buffer;
+
+	/* If the aux_buffer doesn't have data associated, try to load it */
+	if (!buffer->data) {
+		/* get the file desc associated with the perf data file */
+		int fd = perf_data__fd(speq->spe->session->data);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+
+	b->ref_timestamp = buffer->reference;
+
+	if (b->len) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		speq->old_buffer = buffer;
+	} else {
+		auxtrace_buffer__drop_data(buffer);
+		return arm_spe_get_trace(b, data);
+	}
+
+	return 0;
+}
+
+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+		unsigned int queue_nr)
+{
+	struct arm_spe_params params = { .get_trace = 0, };
+	struct arm_spe_queue *speq;
+
+	speq = zalloc(sizeof(*speq));
+	if (!speq)
+		return NULL;
+
+	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!speq->event_buf)
+		goto out_free;
+
+	speq->spe = spe;
+	speq->queue_nr = queue_nr;
+	speq->pid = -1;
+	speq->tid = -1;
+	speq->cpu = -1;
+
+	/* params set */
+	params.get_trace = arm_spe_get_trace;
+	params.data = speq;
+
+	/* create new decoder */
+	speq->decoder = arm_spe_decoder_new(&params);
+	if (!speq->decoder)
+		goto out_free;
+
+	return speq;
+
+out_free:
+	zfree(&speq->event_buf);
+	free(speq);
+
+	return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+	return ip >= spe->kernel_start ?
+		PERF_RECORD_MISC_KERNEL :
+		PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+				struct arm_spe_queue *speq,
+				union perf_event *event,
+				struct perf_sample *sample)
+{
+	if (!spe->timeless_decoding)
+		sample->time = speq->timestamp;
+
+	sample->ip = speq->state->from_ip;
+	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+	sample->pid = speq->pid;
+	sample->tid = speq->tid;
+	sample->addr = speq->state->to_ip;
+	sample->period = 1;
+	sample->cpu = speq->cpu;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+			    struct arm_spe_queue *speq __maybe_unused,
+			    union perf_event *event,
+			    struct perf_sample *sample)
+{
+	int ret;
+
+	ret = perf_session__deliver_synth_event(spe->session, event, sample);
+	if (ret)
+		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+	return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+				u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!speq->have_sample)
+		return 0;
+
+	speq->have_sample = false;
+
+	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq,
+						      spe->branch_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!spe->kernel_start)
+		spe->kernel_start = machine__kernel_start(spe->machine);
+
+	while (1) {
+		err = arm_spe_sample(speq);
+		if (err)
+			return err;
+
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("No data or all data has been processed.\n");
+				return 1;
+			}
+			continue;
+		}
+
+		speq->state = state;
+		speq->have_sample = true;
+
+		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+			*timestamp = speq->timestamp;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct arm_spe_queue *speq = queue->priv;
+
+	if (list_empty(&queue->head) || speq)
+		return 0;
+
+	speq = arm_spe__alloc_queue(spe, queue_nr);
+
+	if (!speq)
+		return -ENOMEM;
+
+	queue->priv = speq;
+
+	if (queue->cpu != -1)
+		speq->cpu = queue->cpu;
+
+	if (!speq->on_heap) {
+		const struct arm_spe_state *state;
+		int ret;
+
+		if (spe->timeless_decoding)
+			return 0;
+
+retry:
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("queue %u has no timestamp\n",
+						queue_nr);
+				return 0;
+			}
+			goto retry;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+		if (ret)
+			return ret;
+		speq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < spe->queues.nr_queues; i++) {
+		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+	if (spe->queues.new_data) {
+		spe->queues.new_data = false;
+		return arm_spe__setup_queues(spe);
+	}
+
+	return 0;
+}
+
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = spe->session->evlist;
+	bool timeless_decoding = true;
+
+	/*
+	 * Walk the list of events; decoding is not timeless if any
+	 * of them has the time bit set.
+	 */
+	evlist__for_each_entry(evlist, evsel) {
+		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
+			timeless_decoding = false;
+	}
+
+	return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+				    struct auxtrace_queue *queue)
+{
+	struct arm_spe_queue *speq = queue->priv;
+	pid_t tid;
+
+	tid = machine__get_current_tid(spe->machine, speq->cpu);
+	if (tid != -1) {
+		speq->tid = tid;
+		thread__zput(speq->thread);
+	} else
+		speq->tid = queue->tid;
+
+	if ((!speq->thread) && (speq->tid != -1)) {
+		speq->thread = machine__find_thread(spe->machine, -1,
+						    speq->tid);
+	}
+
+	if (speq->thread) {
+		speq->pid = speq->thread->pid_;
+		if (queue->cpu == -1)
+			speq->cpu = speq->thread->cpu;
+	}
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct arm_spe_queue *speq;
+
+		if (!spe->heap.heap_cnt)
+			return 0;
+
+		if (spe->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = spe->heap.heap_array[0].queue_nr;
+		queue = &spe->queues.queue_array[queue_nr];
+		speq = queue->priv;
+
+		auxtrace_heap__pop(&spe->heap);
+
+		if (spe->heap.heap_cnt) {
+			ts = spe->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		arm_spe_set_pid_tid_cpu(spe, queue);
+
+		ret = arm_spe_run_decoder(speq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			speq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+					    u64 time_)
 {
+	struct auxtrace_queues *queues = &spe->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+		struct arm_spe_queue *speq = queue->priv;
+
+		if (speq && (tid == -1 || speq->tid == tid)) {
+			speq->time = time_;
+			arm_spe_set_pid_tid_cpu(spe, queue);
+			arm_spe_run_decoder(speq, &ts);
+		}
+	}
 	return 0;
 }
 
+static int arm_spe_process_event(struct perf_session *session,
+				 union perf_event *event,
+				 struct perf_sample *sample,
+				 struct perf_tool *tool)
+{
+	int err = 0;
+	u64 timestamp;
+	struct arm_spe *spe = container_of(session->auxtrace,
+			struct arm_spe, auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("ARM SPE trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || spe->timeless_decoding) {
+		err = arm_spe__update_queues(spe);
+		if (err)
+			return err;
+	}
+
+	if (spe->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_timeless_queues(spe,
+					event->fork.tid,
+					sample->time);
+		}
+	} else if (timestamp) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_queues(spe, timestamp);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 static int arm_spe_process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
-	struct auxtrace_buffer *buffer;
-	off_t data_offset;
-	int fd = perf_data__fd(session->data);
-	int err;
 
-	if (perf_data__is_pipe(session->data)) {
-		data_offset = 0;
-	} else {
-		data_offset = lseek(fd, 0, SEEK_CUR);
-		if (data_offset == -1)
-			return -errno;
-	}
+	if (!spe->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data__fd(session->data);
+		int err;
 
-	err = auxtrace_queues__add_event(&spe->queues, session, event,
-					 data_offset, &buffer);
-	if (err)
-		return err;
-
-	/* Dump here now we have copied a piped trace out of the pipe */
-	if (dump_trace) {
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			arm_spe_dump_event(spe, buffer->data,
-					     buffer->size);
-			auxtrace_buffer__put_data(buffer);
+		if (perf_data__is_pipe(session->data)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&spe->queues, session, event,
+				data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				arm_spe_dump_event(spe, buffer->data,
+						buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
 		}
 	}
 
@@ -139,6 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
 static int arm_spe_flush(struct perf_session *session __maybe_unused,
 			 struct perf_tool *tool __maybe_unused)
 {
+	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+			auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = arm_spe__update_queues(spe);
+	if (ret < 0)
+		return ret;
+
+	if (spe->timeless_decoding)
+		return arm_spe_process_timeless_queues(spe, -1,
+				MAX_TIMESTAMP - 1);
+
+	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
 	return 0;
 }
 
@@ -148,6 +651,9 @@ static void arm_spe_free_queue(void *priv)
 
 	if (!speq)
 		return;
+	thread__zput(speq->thread);
+	arm_spe_decoder_free(speq->decoder);
+	zfree(&speq->event_buf);
 	free(speq);
 }
 
@@ -188,6 +694,149 @@ static void arm_spe_print_info(__u64 *arr)
 	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
 }
 
+struct arm_spe_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine __maybe_unused)
+{
+	struct arm_spe_synth *arm_spe_synth =
+		      container_of(tool, struct arm_spe_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(arm_spe_synth->session,
+						 event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct arm_spe_synth arm_spe_synth;
+
+	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+	arm_spe_synth.session = session;
+
+	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+					   &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
+				    const char *name)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.id && evsel->core.id[0] == id) {
+			if (evsel->name)
+				zfree(&evsel->name);
+			evsel->name = strdup(name);
+			break;
+		}
+	}
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+	struct evlist *evlist = session->evlist;
+	struct evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.type == spe->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with ARM SPE trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+		PERF_SAMPLE_PERIOD;
+	if (spe->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->core.attr.exclude_user;
+	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
+	attr.exclude_hv = evsel->core.attr.exclude_hv;
+	attr.exclude_host = evsel->core.attr.exclude_host;
+	attr.exclude_guest = evsel->core.attr.exclude_guest;
+	attr.sample_id_all = evsel->core.attr.sample_id_all;
+	attr.read_format = evsel->core.attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->core.id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	/* spe events set */
+	if (spe->synth_opts.llc_miss) {
+		spe->sample_llc_miss = true;
+
+		/* llc-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->llc_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "llc-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.tlb_miss) {
+		spe->sample_tlb_miss = true;
+
+		/* tlb-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->tlb_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "tlb-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.branch_miss) {
+		spe->sample_branch_miss = true;
+
+		/* branch-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->branch_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "branch-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.remote_access) {
+		spe->sample_remote_access = true;
+
+		/* remote-access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->remote_access_id = id;
+		arm_spe_set_event_name(evlist, id, "remote-access");
+		id += 1;
+	}
+
+	return 0;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -213,6 +862,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
 
+	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 	spe->auxtrace.process_event = arm_spe_process_event;
 	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
 	spe->auxtrace.flush_events = arm_spe_flush;
@@ -222,8 +872,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
 	arm_spe_print_info(&auxtrace_info->priv[0]);
 
+	if (dump_trace)
+		return 0;
+
+	if (session->arm_spe_synth_opts && session->arm_spe_synth_opts->set)
+		spe->synth_opts = *session->arm_spe_synth_opts;
+	else
+		arm_spe_synth_opts__set_default(&spe->synth_opts);
+
+	err = arm_spe_synth_events(spe, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&spe->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (spe->queues.populated)
+		spe->data_queued = true;
+
 	return 0;
 
+err_free_queues:
+	auxtrace_queues__free(&spe->queues);
+	session->auxtrace = NULL;
 err_free:
 	free(spe);
 	return err;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 8470dfe9fe97..e5dd0d32fa33 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1154,6 +1154,55 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	return -EINVAL;
 }
 
+void arm_spe_synth_opts__set_default(struct arm_spe_synth_opts *synth_opts)
+{
+	synth_opts->llc_miss = true;
+	synth_opts->tlb_miss = true;
+	synth_opts->branch_miss = true;
+	synth_opts->remote_access = true;
+}
+
+int arm_spe_parse_synth_opts(const struct option *opt, const char *str,
+			    int unset __maybe_unused)
+{
+	struct arm_spe_synth_opts *synth_opts = opt->value;
+	const char *p;
+
+	synth_opts->set = true;
+
+	if (!str) {
+		arm_spe_synth_opts__set_default(synth_opts);
+		return 0;
+	}
+
+	for (p = str; *p;) {
+		switch (*p++) {
+		case 'l':
+			synth_opts->llc_miss = true;
+			break;
+		case 't':
+			synth_opts->tlb_miss = true;
+			break;
+		case 'b':
+			synth_opts->branch_miss = true;
+			break;
+		case 'r':
+			synth_opts->remote_access = true;
+			break;
+		case ' ':
+		case ',':
+			break;
+		default:
+			goto out_err;
+		}
+	}
+
+	return 0;
+
+out_err:
+	pr_err("Bad ARM SPE Tracing options '%s'\n", str);
+	return -EINVAL;
+}
 static const char * const auxtrace_error_type_name[] = {
 	[PERF_AUXTRACE_ERROR_ITRACE] = "instruction trace",
 };
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index f201f36bc35f..e3d4438ef2cb 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -111,6 +111,22 @@ struct itrace_synth_opts {
 	int			range_num;
 };
 
+/**
+ * struct arm_spe_synth_opts - ARM SPE tracing synthesis options.
+ * @set: indicates whether or not options have been set
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @branch_miss: whether to synthesize Branch miss events
+ * @remote_access: whether to synthesize Remote access events
+ */
+struct arm_spe_synth_opts {
+	bool			set;
+	bool			llc_miss;
+	bool			tlb_miss;
+	bool			branch_miss;
+	bool			remote_access;
+};
+
 /**
  * struct auxtrace_index_entry - indexes a AUX area tracing event within a
  *                               perf.data file.
@@ -536,6 +552,10 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 				    bool no_sample);
 
+int arm_spe_parse_synth_opts(const struct option *opt, const char *str,
+			    int unset);
+void arm_spe_synth_opts__set_default(struct arm_spe_synth_opts *synth_opts);
+
 size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp);
 void perf_session__auxtrace_error_inc(struct perf_session *session,
 				      union perf_event *event);
@@ -636,6 +656,15 @@ int itrace_parse_synth_opts(const struct option *opt __maybe_unused,
 	return -EINVAL;
 }
 
+static inline
+int arm_spe_parse_synth_opts(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	pr_err("ARM SPE area tracing not supported\n");
+	return -EINVAL;
+}
+
 static inline
 int auxtrace_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
 				    struct record_opts *opts __maybe_unused,
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index b4c9428c18f0..fbd9f8e3fca3 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -19,6 +19,7 @@ struct thread;
 
 struct auxtrace;
 struct itrace_synth_opts;
+struct arm_spe_synth_opts;
 
 struct perf_session {
 	struct perf_header	header;
@@ -26,6 +27,7 @@ struct perf_session {
 	struct evlist	*evlist;
 	struct auxtrace		*auxtrace;
 	struct itrace_synth_opts *itrace_synth_opts;
+	struct arm_spe_synth_opts *arm_spe_synth_opts;
 	struct list_head	auxtrace_index;
 	struct trace_event	tevent;
 	struct perf_record_time_conv	time_conv;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v3 3/5] perf report: Add --spe options for arm-spe
  2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 1/5] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 2/5] perf tools: Add support for "report" for some spe events Tan Xiaojun
@ 2019-11-23 10:11 ` Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 4/5] drivers: perf: add some arm spe events Tan Xiaojun
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-23 10:11 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung, ak,
	adrian.hunter, yao.jin, tmricht, brueckner, songliubraving,
	gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, tanxiaojun, huawei.libin,
	linux-kernel, linux-perf-users

The previous patch added support in "perf report" for some arm-spe
events (llc-miss, tlb-miss, branch-miss, remote-access). This patch
adds the help text for the --spe option that selects them.
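
For readers who want to see how an option string such as "ltbr" maps
onto the per-event flags, here is a minimal standalone sketch. It is
modelled on arm_spe_parse_synth_opts() from patch 2/5 but is not the
perf code itself: the names spe_opts_sketch, spe_opts_sketch_default(),
spe_opts_sketch_parse() and spe_opts_sketch_demo() are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for struct arm_spe_synth_opts (patch 2/5). */
struct spe_opts_sketch {
	bool set;		/* an explicit --spe option was seen */
	bool llc_miss;
	bool tlb_miss;
	bool branch_miss;
	bool remote_access;
};

/* Select every event type, i.e. the documented default --spe=ltbr. */
static void spe_opts_sketch_default(struct spe_opts_sketch *o)
{
	o->llc_miss = true;
	o->tlb_miss = true;
	o->branch_miss = true;
	o->remote_access = true;
}

/* Returns 0 on success, -1 on an unrecognised option letter. */
static int spe_opts_sketch_parse(struct spe_opts_sketch *o, const char *str)
{
	const char *p;

	o->set = true;

	/* No argument at all means "use the defaults". */
	if (!str) {
		spe_opts_sketch_default(o);
		return 0;
	}

	for (p = str; *p; p++) {
		switch (*p) {
		case 'l':
			o->llc_miss = true;
			break;
		case 't':
			o->tlb_miss = true;
			break;
		case 'b':
			o->branch_miss = true;
			break;
		case 'r':
			o->remote_access = true;
			break;
		case ' ':
		case ',':
			break;	/* separators are skipped */
		default:
			return -1;
		}
	}

	return 0;
}

/* Hypothetical usage: "l,t" selects only llc-miss and tlb-miss. */
void spe_opts_sketch_demo(void)
{
	struct spe_opts_sketch o = { 0 };

	assert(spe_opts_sketch_parse(&o, "l,t") == 0);
	assert(o.llc_miss && o.tlb_miss);
	assert(!o.branch_miss && !o.remote_access);
}
```

The letters are additive and order-independent, so "--spe=bl" and
"--spe=l,b" would select the same two event types in this sketch.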

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
---
 tools/perf/Documentation/perf-report.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 7315f155803f..188a9477558b 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -462,6 +462,16 @@ include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
+--spe::
+	Options for decoding arm-spe tracing data. The options are:
+
+		l	synthesize llc miss events
+		t	synthesize tlb miss events
+		b	synthesize branch miss events
+		r	synthesize remote access events
+
+	The default is all events i.e. the same as --spe=ltbr
+
 --full-source-path::
 	Show the full path for source files for srcline output.
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v3 4/5] drivers: perf: add some arm spe events
  2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
                   ` (2 preceding siblings ...)
  2019-11-23 10:11 ` [RFC v3 3/5] perf report: Add --spe options for arm-spe Tan Xiaojun
@ 2019-11-23 10:11 ` Tan Xiaojun
  2019-11-23 10:11 ` [RFC v3 5/5] perf tools: Add support to process multi " Tan Xiaojun
  2019-12-02  7:07 ` [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Qi Liu
  5 siblings, 0 replies; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-23 10:11 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung, ak,
	adrian.hunter, yao.jin, tmricht, brueckner, songliubraving,
	gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, tanxiaojun, huawei.libin,
	linux-kernel, linux-perf-users

Add definitions for some arm spe events; these are precise ip events.
They are displayed in perf list as follows:

---------------------------------------------------------------------
...
arm_spe_0//                                        [Kernel PMU event]
arm_spe_0/branch_miss/                             [Kernel PMU event]
arm_spe_0/llc_miss/                                [Kernel PMU event]
arm_spe_0/remote_access/                           [Kernel PMU event]
arm_spe_0/tlb_miss/                                [Kernel PMU event]
...
---------------------------------------------------------------------
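
As a rough illustration of the encoding, the sketch below shows how an
event value such as the "event=0x001" string exported for llc_miss would
land in attr.config, assuming the "event" format field occupies
config bits 3-6 as the ATTR_CFG_FLD_event_LO/HI definitions in this
patch suggest. All SKETCH_* macros and sketch_* helpers are
hypothetical names for this example; they are not kernel or perf APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the "event" field within attr.config. */
#define SKETCH_EVENT_LO			3
#define SKETCH_EVENT_HI			6

/* Event values, mirroring the ARM_SPE_EVENT_* bits in this patch. */
#define SKETCH_EVENT_LLC_MISS		(1ULL << 0)
#define SKETCH_EVENT_BRANCH_MISS	(1ULL << 1)
#define SKETCH_EVENT_TLB_MISS		(1ULL << 2)
#define SKETCH_EVENT_REMOTE_ACCESS	(1ULL << 3)

/* Mask covering bits lo..hi inclusive, in the spirit of GENMASK_ULL(). */
static uint64_t sketch_genmask(unsigned int hi, unsigned int lo)
{
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/* Place an event value into the config field. */
static uint64_t sketch_event_to_config(uint64_t event)
{
	return (event << SKETCH_EVENT_LO) &
	       sketch_genmask(SKETCH_EVENT_HI, SKETCH_EVENT_LO);
}

/* Extract the event value back out of a config word. */
static uint64_t sketch_config_to_event(uint64_t config)
{
	return (config & sketch_genmask(SKETCH_EVENT_HI, SKETCH_EVENT_LO)) >>
	       SKETCH_EVENT_LO;
}

/* Hypothetical usage: llc_miss ends up as config bit 3. */
void sketch_event_demo(void)
{
	uint64_t config = sketch_event_to_config(SKETCH_EVENT_LLC_MISS);

	assert(config == 0x08);
	assert(sketch_config_to_event(config) == SKETCH_EVENT_LLC_MISS);
}
```

Under this assumption the four events map to config bits 3, 4, 5 and 6,
so they do not collide with ts_enable, pa_enable and pct_enable in
bits 0-2.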

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
---
 drivers/perf/arm_spe_pmu.c | 44 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..4df9abdb2255 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -161,6 +161,9 @@ static struct attribute_group arm_spe_pmu_cap_group = {
 #define ATTR_CFG_FLD_pct_enable_CFG		config	/* PMSCR_EL1.PCT */
 #define ATTR_CFG_FLD_pct_enable_LO		2
 #define ATTR_CFG_FLD_pct_enable_HI		2
+#define ATTR_CFG_FLD_event_CFG			config	/* ARM SPE EVENTS */
+#define ATTR_CFG_FLD_event_LO			3
+#define ATTR_CFG_FLD_event_HI			6
 #define ATTR_CFG_FLD_jitter_CFG			config	/* PMSIRR_EL1.RND */
 #define ATTR_CFG_FLD_jitter_LO			16
 #define ATTR_CFG_FLD_jitter_HI			16
@@ -174,6 +177,11 @@ static struct attribute_group arm_spe_pmu_cap_group = {
 #define ATTR_CFG_FLD_store_filter_LO		34
 #define ATTR_CFG_FLD_store_filter_HI		34
 
+#define ARM_SPE_EVENT_LLC_MISS			BIT(0)
+#define ARM_SPE_EVENT_BRANCH_MISS		BIT(1)
+#define ARM_SPE_EVENT_TLB_MISS			BIT(2)
+#define ARM_SPE_EVENT_REMOTE_ACCESS		BIT(3)
+
 #define ATTR_CFG_FLD_event_filter_CFG		config1	/* PMSEVFR_EL1 */
 #define ATTR_CFG_FLD_event_filter_LO		0
 #define ATTR_CFG_FLD_event_filter_HI		63
@@ -213,8 +221,43 @@ GEN_PMU_FORMAT_ATTR(load_filter);
 GEN_PMU_FORMAT_ATTR(store_filter);
 GEN_PMU_FORMAT_ATTR(event_filter);
 GEN_PMU_FORMAT_ATTR(min_latency);
+GEN_PMU_FORMAT_ATTR(event);
+
+static ssize_t
+arm_spe_events_sysfs_show(struct device *dev,
+		struct device_attribute *attr, char *page)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+	return sprintf(page, "event=0x%03llx\n", pmu_attr->id);
+}
+
+#define ARM_SPE_EVENT_ATTR(name, config) \
+	PMU_EVENT_ATTR(name, arm_spe_event_attr_##name, \
+		       config, arm_spe_events_sysfs_show)
+
+ARM_SPE_EVENT_ATTR(llc_miss, ARM_SPE_EVENT_LLC_MISS);
+ARM_SPE_EVENT_ATTR(branch_miss, ARM_SPE_EVENT_BRANCH_MISS);
+ARM_SPE_EVENT_ATTR(tlb_miss, ARM_SPE_EVENT_TLB_MISS);
+ARM_SPE_EVENT_ATTR(remote_access, ARM_SPE_EVENT_REMOTE_ACCESS);
+
+static struct attribute *arm_spe_pmu_event_attrs[] = {
+	&arm_spe_event_attr_llc_miss.attr.attr,
+	&arm_spe_event_attr_branch_miss.attr.attr,
+	&arm_spe_event_attr_tlb_miss.attr.attr,
+	&arm_spe_event_attr_remote_access.attr.attr,
+	NULL,
+};
+
+static struct attribute_group arm_spe_pmu_event_group = {
+	.name = "events",
+	.attrs	= arm_spe_pmu_event_attrs,
+};
 
 static struct attribute *arm_spe_pmu_formats_attr[] = {
+	&format_attr_event.attr,
 	&format_attr_ts_enable.attr,
 	&format_attr_pa_enable.attr,
 	&format_attr_pct_enable.attr,
@@ -252,6 +295,7 @@ static struct attribute_group arm_spe_pmu_group = {
 };
 
 static const struct attribute_group *arm_spe_pmu_attr_groups[] = {
+	&arm_spe_pmu_event_group,
 	&arm_spe_pmu_group,
 	&arm_spe_pmu_cap_group,
 	&arm_spe_pmu_format_group,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
                   ` (3 preceding siblings ...)
  2019-11-23 10:11 ` [RFC v3 4/5] drivers: perf: add some arm spe events Tan Xiaojun
@ 2019-11-23 10:11 ` Tan Xiaojun
  2019-11-29 16:32   ` James Clark
  2019-12-02  7:07 ` [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Qi Liu
  5 siblings, 1 reply; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-23 10:11 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung, ak,
	adrian.hunter, yao.jin, tmricht, brueckner, songliubraving,
	gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, tanxiaojun, huawei.libin,
	linux-kernel, linux-perf-users

Under the original logic, if the user specifies multiple spe
events for perf record, perf reports an error and exits without
recording anything. This is not very friendly.

This patch changes that behaviour: a warning is reported instead,
and only the first spe event is recorded.

This patch also supports recording multiple new synthetic events,
which are merged into the first spe event. However, if the user
specifies an ordinary spe event followed by synthetic spe events,
a warning is reported and the rule above still applies: only the
first spe event is recorded.
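
The merge rule described above can be sketched in isolation as follows.
This is a simplified model of the loop added to
arm_spe_recording_options() in this patch, operating on bare config
words instead of evsels; SKETCH_EVENT_MASK, SKETCH_TS_ENABLE,
sketch_merge_configs() and sketch_merge_demo() are hypothetical names
for this example, and the bit positions are taken from the definitions
added to arm-spe.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EVENT_MASK	0x78ULL		/* config bits 3..6 */
#define SKETCH_TS_ENABLE	(1ULL << 0)	/* config bit 0 */

/*
 * Fold a list of spe event configs into the single config that would
 * be recorded. The first event always wins. A later event is merged
 * only if both it and the first event carry synthetic event bits;
 * otherwise it is dropped and counted in *ignored (the real code
 * prints a warning at that point).
 */
static uint64_t sketch_merge_configs(const uint64_t *configs, size_t n,
				     unsigned int *ignored)
{
	uint64_t first;
	size_t i;

	*ignored = 0;
	if (n == 0)
		return 0;

	first = configs[0];

	/* Synthetic events have timestamps enabled for them. */
	if (first & SKETCH_EVENT_MASK)
		first |= SKETCH_TS_ENABLE;

	for (i = 1; i < n; i++) {
		if ((configs[i] & SKETCH_EVENT_MASK) &&
		    (first & SKETCH_EVENT_MASK))
			first |= configs[i];
		else
			(*ignored)++;
	}

	return first;
}

/* Hypothetical usage: llc_miss, llc_miss again, then tlb_miss. */
void sketch_merge_demo(void)
{
	const uint64_t synth[] = { 1ULL << 3, 1ULL << 3, 4ULL << 3 };
	unsigned int ignored;
	uint64_t cfg = sketch_merge_configs(synth, 3, &ignored);

	assert(cfg == ((1ULL << 3) | (4ULL << 3) | SKETCH_TS_ENABLE));
	assert(ignored == 0);
}
```

Note that in this model an ordinary spe event listed first causes every
later event, synthetic or not, to be ignored, which matches the
behaviour described in the paragraph above.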

Example:
------------------------------------------------------------------
1) For multiple spe events
$ perf record -e arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/ -e arm_spe_0/ts_enable=0,store_filter=1,jitter=1,min_latency=0/ ls
Warning:
There may be only one arm_spe_x event. More than one spe event will be ignored, unless they are synthetic events of spe, like:
arm_spe_x/llc_miss/
arm_spe_x/branch_miss/
arm_spe_x/tlb_miss/
arm_spe_x/remote_access/
(see 'perf list')
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.078 MB perf.data ]

$ perf report --stdio
...
 # Samples: 0  of event 'arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/'
...

2) For multiple spe precise ip events (synthetic event)
$ perf record -e arm_spe_0/llc_miss/ -e arm_spe_0/llc_miss/ -e arm_spe_0/tlb_miss/ ls
Warning:
These events are precise ip events, please add :p/pp/ppp after the event.
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.343 MB perf.data ]

$ perf report --stdio
 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 0  of event 'arm_spe_0/llc_miss/, arm_spe_0/tlb_miss/'
 # Event count (approx.): 0
 #
 # Children      Self  Command  Shared Object  Symbol
 # ........  ........  .......  .............  ......
 #

 # Samples: 0  of event 'dummy:u'
 # Event count (approx.): 0
 #
 # Children      Self  Command  Shared Object  Symbol
 # ........  ........  .......  .............  ......
 #

 # Samples: 83  of event 'llc-miss'
 # Event count (approx.): 83
 #
 # Children      Self  Command  Shared Object      Symbol
 # ........  ........  .......  .................  ....................................
 #
     42.17%    42.17%  ls       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
     14.46%    14.46%  ls       [kernel.kallsyms]  [k] memchr_inv
     13.25%    13.25%  ls       [kernel.kallsyms]  [k] perf_event_mmap
      2.41%     2.41%  ls       [kernel.kallsyms]  [k] available_idle_cpu
      2.41%     2.41%  ls       [kernel.kallsyms]  [k] copy_page
      2.41%     2.41%  ls       [kernel.kallsyms]  [k] try_to_wake_up
      2.41%     2.41%  ls       [kernel.kallsyms]  [k] vma_interval_tree_insert
      2.41%     2.41%  ls       ld-2.28.so         [.] _dl_lookup_symbol_x
      2.41%     2.41%  ls       ld-2.28.so         [.] _dl_relocate_object
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] ext4_getattr
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] lock_page_memcg
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] may_open
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] radix_tree_next_chunk
      1.20%     1.20%  ls       [kernel.kallsyms]  [k] rb_prev
      1.20%     1.20%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
      1.20%     1.20%  ls       ld-2.28.so         [.] _dl_start
      1.20%     1.20%  ls       ld-2.28.so         [.] do_lookup_x
      1.20%     1.20%  ls       ld-2.28.so         [.] rtld_lock_default_lock_recursive
      1.20%     1.20%  ls       libc-2.28.so       [.] getenv
      1.20%     1.20%  ls       [unknown]          [.] 0xffff29f1190029b8

 # Samples: 13  of event 'tlb-miss'
 # Event count (approx.): 13
 #
 # Children      Self  Command  Shared Object      Symbol
 # ........  ........  .......  .................  ............................
 #
     15.38%    15.38%  ls       [kernel.kallsyms]  [k] __audit_syscall_entry
     15.38%    15.38%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
     15.38%    15.38%  ls       ld-2.28.so         [.] _dl_relocate_object
     15.38%    15.38%  ls       ld-2.28.so         [.] do_lookup_x
      7.69%     7.69%  ls       [kernel.kallsyms]  [k] memchr_inv
      7.69%     7.69%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
      7.69%     7.69%  ls       ld-2.28.so         [.] _dl_setup_hash
      7.69%     7.69%  ls       ld-2.28.so         [.] _dl_start
      7.69%     7.69%  ls       ls                 [.] 0x00000000000097a0

------------------------------------------------------------------

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 47 +++++++++++++++++++++++++---
 tools/perf/util/arm-spe.c            | 25 +++++++++++++++
 tools/perf/util/arm-spe.h            | 20 ++++++++++++
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index eba6541ec0f1..68e91f3c9614 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -67,21 +67,60 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 	struct arm_spe_recording *sper =
 			container_of(itr, struct arm_spe_recording, itr);
 	struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
-	struct evsel *evsel, *arm_spe_evsel = NULL;
+	struct evsel *evsel, *tmp, *arm_spe_evsel = NULL;
 	bool privileged = perf_event_paranoid_check(-1);
 	struct evsel *tracking_evsel;
+	char evsel_name[128];
 	int err;
 
 	sper->evlist = evlist;
 
-	evlist__for_each_entry(evlist, evsel) {
+	evlist__for_each_entry_safe(evlist, tmp, evsel) {
 		if (evsel->core.attr.type == arm_spe_pmu->type) {
 			if (arm_spe_evsel) {
-				pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
-				return -EINVAL;
+				if ((evsel->core.attr.config
+						& GENMASK_ULL(ARM_SPE_EVENT_HI,
+							ARM_SPE_EVENT_LO))
+						&& (arm_spe_evsel->core.attr.config
+						& GENMASK_ULL(ARM_SPE_EVENT_HI,
+							ARM_SPE_EVENT_LO))) {
+					arm_spe_evsel->core.attr.config |=
+								evsel->core.attr.config;
+
+					if (!strstr(arm_spe_evsel->name, evsel->name)) {
+						scnprintf(evsel_name, sizeof(evsel_name),
+								"%s, %s", arm_spe_evsel->name,
+								evsel->name);
+						arm_spe_evsel->name = strdup(evsel_name);
+					}
+				} else
+					pr_warning("Warning:\n"
+						"There may be only one "
+						ARM_SPE_PMU_NAME "x event."
+						" More than one spe event"
+						" will be ignored, unless"
+						" they are synthetic events"
+						" of spe, like:"
+						"\narm_spe_x/llc_miss/"
+						"\narm_spe_x/branch_miss/"
+						"\narm_spe_x/tlb_miss/"
+						"\narm_spe_x/remote_access/"
+						"\n(see 'perf list')\n");
+				evlist__remove(evlist, evsel);
+				evsel__delete(evsel);
+				continue;
 			}
 			evsel->core.attr.freq = 0;
 			evsel->core.attr.sample_period = 1;
+			if (evsel->core.attr.config
+					& GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO)) {
+				evsel->core.attr.config |= SPE_ATTR_TS_ENABLE;
+				if (!evsel->core.attr.precise_ip)
+					pr_warning("Warning:\n"
+						"These events are precise ip events,"
+						" please add :p/pp/ppp after the event.\n");
+			}
+
 			arm_spe_evsel = evsel;
 			opts->full_auxtrace = true;
 		}
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index e7282c2616f3..0c9d7fa518a5 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -779,6 +779,31 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.sample_id_all = evsel->core.attr.sample_id_all;
 	attr.read_format = evsel->core.attr.read_format;
 
+	if (evsel->core.attr.config
+			& GENMASK_ULL(ARM_SPE_EVENT_HI,
+				ARM_SPE_EVENT_LO)) {
+		spe->synth_opts.llc_miss = false;
+		spe->synth_opts.tlb_miss = false;
+		spe->synth_opts.branch_miss = false;
+		spe->synth_opts.remote_access = false;
+
+		if (evsel->core.attr.config
+				& (ARM_SPE_EVENT_LLC_MISS << ARM_SPE_EVENT_LO))
+			spe->synth_opts.llc_miss = true;
+
+		if (evsel->core.attr.config
+				& (ARM_SPE_EVENT_TLB_MISS << ARM_SPE_EVENT_LO))
+			spe->synth_opts.tlb_miss = true;
+
+		if (evsel->core.attr.config
+				& (ARM_SPE_EVENT_BRANCH_MISS << ARM_SPE_EVENT_LO))
+			spe->synth_opts.branch_miss = true;
+
+		if (evsel->core.attr.config
+				& (ARM_SPE_EVENT_REMOTE_ACCESS << ARM_SPE_EVENT_LO))
+			spe->synth_opts.remote_access = true;
+	}
+
 	/* create new id val to be a fixed offset from evsel id */
 	id = evsel->core.id[0] + 1000000000;
 
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 98d3235781c3..db7420121979 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -9,6 +9,26 @@
 
 #define ARM_SPE_PMU_NAME "arm_spe_"
 
+#define ARM_SPE_EVENT_LO			3
+#define ARM_SPE_EVENT_HI			6
+#define ARM_SPE_EVENT_LLC_MISS			BIT(0)
+#define ARM_SPE_EVENT_BRANCH_MISS		BIT(1)
+#define ARM_SPE_EVENT_TLB_MISS			BIT(2)
+#define ARM_SPE_EVENT_REMOTE_ACCESS		BIT(3)
+
+#define SPE_ATTR_TS_ENABLE			BIT(0)
+#define SPE_ATTR_PA_ENABLE			BIT(1)
+#define SPE_ATTR_PCT_ENABLE			BIT(2)
+#define SPE_ATTR_JITTER				BIT(16)
+#define SPE_ATTR_BRANCH_FILTER			BIT(32)
+#define SPE_ATTR_LOAD_FILTER			BIT(33)
+#define SPE_ATTR_STORE_FILTER			BIT(34)
+
+#define SPE_ATTR_EV_RETIRED			BIT(1)
+#define SPE_ATTR_EV_CACHE			BIT(3)
+#define SPE_ATTR_EV_TLB				BIT(5)
+#define SPE_ATTR_EV_BRANCH			BIT(7)
+
 enum {
 	ARM_SPE_PMU_TYPE,
 	ARM_SPE_PER_CPU_MMAPS,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-11-23 10:11 ` [RFC v3 5/5] perf tools: Add support to process multi " Tan Xiaojun
@ 2019-11-29 16:32   ` James Clark
  2019-11-30  0:42     ` Tan Xiaojun
  0 siblings, 1 reply; 13+ messages in thread
From: James Clark @ 2019-11-29 16:32 UTC (permalink / raw)
  To: Tan Xiaojun, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, Kim Phillips, Jeremy Linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users

Hi Xiaojun,

Sorry for not replying earlier, I was at a conference. Unfortunately I have temporarily lost access to SPE enabled hardware but I will test this out and get back to you as soon as possible.


Thanks
James

On 23/11/2019 10:11, Tan Xiaojun wrote:
> Under the original logic, if the user specifies multiple spe
> events during the record, perf will report an error and exit
> without actually running. This is not very friendly.
>
> This patch slightly modifies this logic, in which case a
> warning is reported and the first spe event is taken as a
> record.
>
> At the same time, this patch also supports the recording of
> multi new synthetic events. However, if the user specifies the
> spe event and then specifies the synthetic spe events, a warning
> will be reported and the above principles will still be followed,
> only the first spe event will be recorded.
>
> Example:
> ------------------------------------------------------------------
> 1) For multiple spe events
> $ perf record -e arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/ -e arm_spe_0/ts_enable=0,store_filter=1,jitter=1,min_latency=0/ ls
> Warning:
> There may be only one arm_spe_x event. More than one spe event will be ignored, unless they are synthetic events of spe, like:
> arm_spe_x/llc_miss/
> arm_spe_x/branch_miss/
> arm_spe_x/tlb_miss/
> arm_spe_x/remote_access/
> (see 'perf list')
> ...
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.078 MB perf.data ]
>
> $ perf report --stdio
> ...
>  # Samples: 0  of event 'arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/'
> ...
>
> 2) For multiple spe precise ip events (synthetic event)
> $ perf record -e arm_spe_0/llc_miss/ -e arm_spe_0/llc_miss/ -e arm_spe_0/tlb_miss/ ls
> Warning:
> These events are precise ip events, please add :p/pp/ppp after the event.
> ...
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.343 MB perf.data ]
>
> $ perf report --stdio
>  # To display the perf.data header info, please use --header/--header-only options.
>  #
>  #
>  # Total Lost Samples: 0
>  #
>  # Samples: 0  of event 'arm_spe_0/llc_miss/, arm_spe_0/tlb_miss/'
>  # Event count (approx.): 0
>  #
>  # Children      Self  Command  Shared Object  Symbol
>  # ........  ........  .......  .............  ......
>  #
>
>  # Samples: 0  of event 'dummy:u'
>  # Event count (approx.): 0
>  #
>  # Children      Self  Command  Shared Object  Symbol
>  # ........  ........  .......  .............  ......
>  #
>
>  # Samples: 83  of event 'llc-miss'
>  # Event count (approx.): 83
>  #
>  # Children      Self  Command  Shared Object      Symbol
>  # ........  ........  .......  .................  ....................................
>  #
>      42.17%    42.17%  ls       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>      14.46%    14.46%  ls       [kernel.kallsyms]  [k] memchr_inv
>      13.25%    13.25%  ls       [kernel.kallsyms]  [k] perf_event_mmap
>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] available_idle_cpu
>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] copy_page
>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] try_to_wake_up
>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] vma_interval_tree_insert
>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_lookup_symbol_x
>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_relocate_object
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] ext4_getattr
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] lock_page_memcg
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] may_open
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] radix_tree_next_chunk
>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] rb_prev
>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_start
>       1.20%     1.20%  ls       ld-2.28.so         [.] do_lookup_x
>       1.20%     1.20%  ls       ld-2.28.so         [.] rtld_lock_default_lock_recursive
>       1.20%     1.20%  ls       libc-2.28.so       [.] getenv
>       1.20%     1.20%  ls       [unknown]          [.] 0xffff29f1190029b8
>
>  # Samples: 13  of event 'tlb-miss'
>  # Event count (approx.): 13
>  #
>  # Children      Self  Command  Shared Object      Symbol
>  # ........  ........  .......  .................  ............................
>  #
>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] __audit_syscall_entry
>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>      15.38%    15.38%  ls       ld-2.28.so         [.] _dl_relocate_object
>      15.38%    15.38%  ls       ld-2.28.so         [.] do_lookup_x
>       7.69%     7.69%  ls       [kernel.kallsyms]  [k] memchr_inv
>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_setup_hash
>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_start
>       7.69%     7.69%  ls       ls                 [.] 0x00000000000097a0
>
> ------------------------------------------------------------------
>
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> ---
>  tools/perf/arch/arm64/util/arm-spe.c | 47 +++++++++++++++++++++++++---
>  tools/perf/util/arm-spe.c            | 25 +++++++++++++++
>  tools/perf/util/arm-spe.h            | 20 ++++++++++++
>  3 files changed, 88 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> index eba6541ec0f1..68e91f3c9614 100644
> --- a/tools/perf/arch/arm64/util/arm-spe.c
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -67,21 +67,60 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>       struct arm_spe_recording *sper =
>                       container_of(itr, struct arm_spe_recording, itr);
>       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
> -     struct evsel *evsel, *arm_spe_evsel = NULL;
> +     struct evsel *evsel, *tmp, *arm_spe_evsel = NULL;
>       bool privileged = perf_event_paranoid_check(-1);
>       struct evsel *tracking_evsel;
> +     char evsel_name[128];
>       int err;
>
>       sper->evlist = evlist;
>
> -     evlist__for_each_entry(evlist, evsel) {
> +     evlist__for_each_entry_safe(evlist, tmp, evsel) {
>               if (evsel->core.attr.type == arm_spe_pmu->type) {
>                       if (arm_spe_evsel) {
> -                             pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
> -                             return -EINVAL;
> +                             if ((evsel->core.attr.config
> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
> +                                                     ARM_SPE_EVENT_LO))
> +                                             && (arm_spe_evsel->core.attr.config
> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
> +                                                     ARM_SPE_EVENT_LO))) {
> +                                     arm_spe_evsel->core.attr.config |=
> +                                                             evsel->core.attr.config;
> +
> +                                     if (!strstr(arm_spe_evsel->name, evsel->name)) {
> +                                             scnprintf(evsel_name, sizeof(evsel_name),
> +                                                             "%s, %s", arm_spe_evsel->name,
> +                                                             evsel->name);
> +                                             arm_spe_evsel->name = strdup(evsel_name);
> +                                     }
> +                             } else
> +                                     pr_warning("Warning:\n"
> +                                             "There may be only one "
> +                                             ARM_SPE_PMU_NAME "x event."
> +                                             " More than one spe event"
> +                                             " will be ignored, unless"
> +                                             " they are synthetic events"
> +                                             " of spe, like:"
> +                                             "\narm_spe_x/llc_miss/"
> +                                             "\narm_spe_x/branch_miss/"
> +                                             "\narm_spe_x/tlb_miss/"
> +                                             "\narm_spe_x/remote_access/"
> +                                             "\n(see 'perf list')\n");
> +                             evlist__remove(evlist, evsel);
> +                             evsel__delete(evsel);
> +                             continue;
>                       }
>                       evsel->core.attr.freq = 0;
>                       evsel->core.attr.sample_period = 1;
> +                     if (evsel->core.attr.config
> +                                     & GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO)) {
> +                             evsel->core.attr.config |= SPE_ATTR_TS_ENABLE;
> +                             if (!evsel->core.attr.precise_ip)
> +                                     pr_warning("Warning:\n"
> +                                             "These events are precise ip events,"
> +                                             " please add :p/pp/ppp after the event.\n");
> +                     }
> +
>                       arm_spe_evsel = evsel;
>                       opts->full_auxtrace = true;
>               }
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index e7282c2616f3..0c9d7fa518a5 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -779,6 +779,31 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>       attr.sample_id_all = evsel->core.attr.sample_id_all;
>       attr.read_format = evsel->core.attr.read_format;
>
> +     if (evsel->core.attr.config
> +                     & GENMASK_ULL(ARM_SPE_EVENT_HI,
> +                             ARM_SPE_EVENT_LO)) {
> +             spe->synth_opts.llc_miss = false;
> +             spe->synth_opts.tlb_miss = false;
> +             spe->synth_opts.branch_miss = false;
> +             spe->synth_opts.remote_access = false;
> +
> +             if (evsel->core.attr.config
> +                             & (ARM_SPE_EVENT_LLC_MISS << ARM_SPE_EVENT_LO))
> +                     spe->synth_opts.llc_miss = true;
> +
> +             if (evsel->core.attr.config
> +                             & (ARM_SPE_EVENT_TLB_MISS << ARM_SPE_EVENT_LO))
> +                     spe->synth_opts.tlb_miss = true;
> +
> +             if (evsel->core.attr.config
> +                             & (ARM_SPE_EVENT_BRANCH_MISS << ARM_SPE_EVENT_LO))
> +                     spe->synth_opts.branch_miss = true;
> +
> +             if (evsel->core.attr.config
> +                             & (ARM_SPE_EVENT_REMOTE_ACCESS << ARM_SPE_EVENT_LO))
> +                     spe->synth_opts.remote_access = true;
> +     }
> +
>       /* create new id val to be a fixed offset from evsel id */
>       id = evsel->core.id[0] + 1000000000;
>
> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
> index 98d3235781c3..db7420121979 100644
> --- a/tools/perf/util/arm-spe.h
> +++ b/tools/perf/util/arm-spe.h
> @@ -9,6 +9,26 @@
>
>  #define ARM_SPE_PMU_NAME "arm_spe_"
>
> +#define ARM_SPE_EVENT_LO                     3
> +#define ARM_SPE_EVENT_HI                     6
> +#define ARM_SPE_EVENT_LLC_MISS                       BIT(0)
> +#define ARM_SPE_EVENT_BRANCH_MISS            BIT(1)
> +#define ARM_SPE_EVENT_TLB_MISS                       BIT(2)
> +#define ARM_SPE_EVENT_REMOTE_ACCESS          BIT(3)
> +
> +#define SPE_ATTR_TS_ENABLE                   BIT(0)
> +#define SPE_ATTR_PA_ENABLE                   BIT(1)
> +#define SPE_ATTR_PCT_ENABLE                  BIT(2)
> +#define SPE_ATTR_JITTER                              BIT(16)
> +#define SPE_ATTR_BRANCH_FILTER                       BIT(32)
> +#define SPE_ATTR_LOAD_FILTER                 BIT(33)
> +#define SPE_ATTR_STORE_FILTER                        BIT(34)
> +
> +#define SPE_ATTR_EV_RETIRED                  BIT(1)
> +#define SPE_ATTR_EV_CACHE                    BIT(3)
> +#define SPE_ATTR_EV_TLB                              BIT(5)
> +#define SPE_ATTR_EV_BRANCH                   BIT(7)
> +
>  enum {
>       ARM_SPE_PMU_TYPE,
>       ARM_SPE_PER_CPU_MMAPS,
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-11-29 16:32   ` James Clark
@ 2019-11-30  0:42     ` Tan Xiaojun
  2019-12-06 15:48       ` James Clark
  0 siblings, 1 reply; 13+ messages in thread
From: Tan Xiaojun @ 2019-11-30  0:42 UTC (permalink / raw)
  To: James Clark, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, Kim Phillips, Jeremy Linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users

On 2019/11/30 0:32, James Clark wrote:
> Hi Xiaojun,
> 
> Sorry for not replying earlier, I was at a conference. Unfortunately I have temporarily lost access to SPE enabled hardware but I will test this out and get back to you as soon as possible.
> 
> 
> Thanks
> James
> 

OK.

Thanks.
Xiaojun.

> On 23/11/2019 10:11, Tan Xiaojun wrote:
>> Under the original logic, if the user specifies multiple SPE
>> events for a record session, perf reports an error and exits
>> without running the workload. This is not very friendly.
>>
>> This patch relaxes that logic: a warning is reported instead,
>> and only the first SPE event is recorded.
>>
>> This patch also supports recording several of the new synthetic
>> events at once. However, if the user specifies a plain SPE event
>> followed by synthetic SPE events, a warning is reported and the
>> same rule applies: only the first SPE event is recorded.
>>
>> Example:
>> ------------------------------------------------------------------
>> 1) For multiple spe events
>> $ perf record -e arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/ -e arm_spe_0/ts_enable=0,store_filter=1,jitter=1,min_latency=0/ ls
>> Warning:
>> There may be only one arm_spe_x event. More than one spe event will be ignored, unless they are synthetic events of spe, like:
>> arm_spe_x/llc_miss/
>> arm_spe_x/branch_miss/
>> arm_spe_x/tlb_miss/
>> arm_spe_x/remote_access/
>> (see 'perf list')
>> ...
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.078 MB perf.data ]
>>
>> $ perf report --stdio
>> ...
>>  # Samples: 0  of event 'arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/'
>> ...
>>
>> 2) For multiple spe precise ip events (synthetic event)
>> $ perf record -e arm_spe_0/llc_miss/ -e arm_spe_0/llc_miss/ -e arm_spe_0/tlb_miss/ ls
>> Warning:
>> These events are precise ip events, please add :p/pp/ppp after the event.
>> ...
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.343 MB perf.data ]
>>
>> $ perf report --stdio
>>  # To display the perf.data header info, please use --header/--header-only options.
>>  #
>>  #
>>  # Total Lost Samples: 0
>>  #
>>  # Samples: 0  of event 'arm_spe_0/llc_miss/, arm_spe_0/tlb_miss/'
>>  # Event count (approx.): 0
>>  #
>>  # Children      Self  Command  Shared Object  Symbol
>>  # ........  ........  .......  .............  ......
>>  #
>>
>>  # Samples: 0  of event 'dummy:u'
>>  # Event count (approx.): 0
>>  #
>>  # Children      Self  Command  Shared Object  Symbol
>>  # ........  ........  .......  .............  ......
>>  #
>>
>>  # Samples: 83  of event 'llc-miss'
>>  # Event count (approx.): 83
>>  #
>>  # Children      Self  Command  Shared Object      Symbol
>>  # ........  ........  .......  .................  ....................................
>>  #
>>      42.17%    42.17%  ls       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>      14.46%    14.46%  ls       [kernel.kallsyms]  [k] memchr_inv
>>      13.25%    13.25%  ls       [kernel.kallsyms]  [k] perf_event_mmap
>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] available_idle_cpu
>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] copy_page
>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] try_to_wake_up
>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] vma_interval_tree_insert
>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_lookup_symbol_x
>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_relocate_object
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] ext4_getattr
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] lock_page_memcg
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] may_open
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] radix_tree_next_chunk
>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] rb_prev
>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_start
>>       1.20%     1.20%  ls       ld-2.28.so         [.] do_lookup_x
>>       1.20%     1.20%  ls       ld-2.28.so         [.] rtld_lock_default_lock_recursive
>>       1.20%     1.20%  ls       libc-2.28.so       [.] getenv
>>       1.20%     1.20%  ls       [unknown]          [.] 0xffff29f1190029b8
>>
>>  # Samples: 13  of event 'tlb-miss'
>>  # Event count (approx.): 13
>>  #
>>  # Children      Self  Command  Shared Object      Symbol
>>  # ........  ........  .......  .................  ............................
>>  #
>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] __audit_syscall_entry
>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>      15.38%    15.38%  ls       ld-2.28.so         [.] _dl_relocate_object
>>      15.38%    15.38%  ls       ld-2.28.so         [.] do_lookup_x
>>       7.69%     7.69%  ls       [kernel.kallsyms]  [k] memchr_inv
>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_setup_hash
>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_start
>>       7.69%     7.69%  ls       ls                 [.] 0x00000000000097a0
>>
>> ------------------------------------------------------------------
>>
>> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
>> ---
>>  tools/perf/arch/arm64/util/arm-spe.c | 47 +++++++++++++++++++++++++---
>>  tools/perf/util/arm-spe.c            | 25 +++++++++++++++
>>  tools/perf/util/arm-spe.h            | 20 ++++++++++++
>>  3 files changed, 88 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
>> index eba6541ec0f1..68e91f3c9614 100644
>> --- a/tools/perf/arch/arm64/util/arm-spe.c
>> +++ b/tools/perf/arch/arm64/util/arm-spe.c
>> @@ -67,21 +67,60 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>>       struct arm_spe_recording *sper =
>>                       container_of(itr, struct arm_spe_recording, itr);
>>       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
>> -     struct evsel *evsel, *arm_spe_evsel = NULL;
>> +     struct evsel *evsel, *tmp, *arm_spe_evsel = NULL;
>>       bool privileged = perf_event_paranoid_check(-1);
>>       struct evsel *tracking_evsel;
>> +     char evsel_name[128];
>>       int err;
>>
>>       sper->evlist = evlist;
>>
>> -     evlist__for_each_entry(evlist, evsel) {
>> +     evlist__for_each_entry_safe(evlist, tmp, evsel) {
>>               if (evsel->core.attr.type == arm_spe_pmu->type) {
>>                       if (arm_spe_evsel) {
>> -                             pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
>> -                             return -EINVAL;
>> +                             if ((evsel->core.attr.config
>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>> +                                                     ARM_SPE_EVENT_LO))
>> +                                             && (arm_spe_evsel->core.attr.config
>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>> +                                                     ARM_SPE_EVENT_LO))) {
>> +                                     arm_spe_evsel->core.attr.config |=
>> +                                                             evsel->core.attr.config;
>> +
>> +                                     if (!strstr(arm_spe_evsel->name, evsel->name)) {
>> +                                             scnprintf(evsel_name, sizeof(evsel_name),
>> +                                                             "%s, %s", arm_spe_evsel->name,
>> +                                                             evsel->name);
>> +                                             arm_spe_evsel->name = strdup(evsel_name);
>> +                                     }
>> +                             } else
>> +                                     pr_warning("Warning:\n"
>> +                                             "There may be only one "
>> +                                             ARM_SPE_PMU_NAME "x event."
>> +                                             " More than one spe event"
>> +                                             " will be ignored, unless"
>> +                                             " they are synthetic events"
>> +                                             " of spe, like:"
>> +                                             "\narm_spe_x/llc_miss/"
>> +                                             "\narm_spe_x/branch_miss/"
>> +                                             "\narm_spe_x/tlb_miss/"
>> +                                             "\narm_spe_x/remote_access/"
>> +                                             "\n(see 'perf list')\n");
>> +                             evlist__remove(evlist, evsel);
>> +                             evsel__delete(evsel);
>> +                             continue;
>>                       }
>>                       evsel->core.attr.freq = 0;
>>                       evsel->core.attr.sample_period = 1;
>> +                     if (evsel->core.attr.config
>> +                                     & GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO)) {
>> +                             evsel->core.attr.config |= SPE_ATTR_TS_ENABLE;
>> +                             if (!evsel->core.attr.precise_ip)
>> +                                     pr_warning("Warning:\n"
>> +                                             "These events are precise ip events,"
>> +                                             " please add :p/pp/ppp after the event.\n");
>> +                     }
>> +
>>                       arm_spe_evsel = evsel;
>>                       opts->full_auxtrace = true;
>>               }
>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>> index e7282c2616f3..0c9d7fa518a5 100644
>> --- a/tools/perf/util/arm-spe.c
>> +++ b/tools/perf/util/arm-spe.c
>> @@ -779,6 +779,31 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>>       attr.sample_id_all = evsel->core.attr.sample_id_all;
>>       attr.read_format = evsel->core.attr.read_format;
>>
>> +     if (evsel->core.attr.config
>> +                     & GENMASK_ULL(ARM_SPE_EVENT_HI,
>> +                             ARM_SPE_EVENT_LO)) {
>> +             spe->synth_opts.llc_miss = false;
>> +             spe->synth_opts.tlb_miss = false;
>> +             spe->synth_opts.branch_miss = false;
>> +             spe->synth_opts.remote_access = false;
>> +
>> +             if (evsel->core.attr.config
>> +                             & (ARM_SPE_EVENT_LLC_MISS << ARM_SPE_EVENT_LO))
>> +                     spe->synth_opts.llc_miss = true;
>> +
>> +             if (evsel->core.attr.config
>> +                             & (ARM_SPE_EVENT_TLB_MISS << ARM_SPE_EVENT_LO))
>> +                     spe->synth_opts.tlb_miss = true;
>> +
>> +             if (evsel->core.attr.config
>> +                             & (ARM_SPE_EVENT_BRANCH_MISS << ARM_SPE_EVENT_LO))
>> +                     spe->synth_opts.branch_miss = true;
>> +
>> +             if (evsel->core.attr.config
>> +                             & (ARM_SPE_EVENT_REMOTE_ACCESS << ARM_SPE_EVENT_LO))
>> +                     spe->synth_opts.remote_access = true;
>> +     }
>> +
>>       /* create new id val to be a fixed offset from evsel id */
>>       id = evsel->core.id[0] + 1000000000;
>>
>> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
>> index 98d3235781c3..db7420121979 100644
>> --- a/tools/perf/util/arm-spe.h
>> +++ b/tools/perf/util/arm-spe.h
>> @@ -9,6 +9,26 @@
>>
>>  #define ARM_SPE_PMU_NAME "arm_spe_"
>>
>> +#define ARM_SPE_EVENT_LO                     3
>> +#define ARM_SPE_EVENT_HI                     6
>> +#define ARM_SPE_EVENT_LLC_MISS                       BIT(0)
>> +#define ARM_SPE_EVENT_BRANCH_MISS            BIT(1)
>> +#define ARM_SPE_EVENT_TLB_MISS                       BIT(2)
>> +#define ARM_SPE_EVENT_REMOTE_ACCESS          BIT(3)
>> +
>> +#define SPE_ATTR_TS_ENABLE                   BIT(0)
>> +#define SPE_ATTR_PA_ENABLE                   BIT(1)
>> +#define SPE_ATTR_PCT_ENABLE                  BIT(2)
>> +#define SPE_ATTR_JITTER                              BIT(16)
>> +#define SPE_ATTR_BRANCH_FILTER                       BIT(32)
>> +#define SPE_ATTR_LOAD_FILTER                 BIT(33)
>> +#define SPE_ATTR_STORE_FILTER                        BIT(34)
>> +
>> +#define SPE_ATTR_EV_RETIRED                  BIT(1)
>> +#define SPE_ATTR_EV_CACHE                    BIT(3)
>> +#define SPE_ATTR_EV_TLB                              BIT(5)
>> +#define SPE_ATTR_EV_BRANCH                   BIT(7)
>> +
>>  enum {
>>       ARM_SPE_PMU_TYPE,
>>       ARM_SPE_PER_CPU_MMAPS,
>>
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v3 0/5] perf tools: Add support for some spe events and precise ip
  2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
                   ` (4 preceding siblings ...)
  2019-11-23 10:11 ` [RFC v3 5/5] perf tools: Add support to process multi " Tan Xiaojun
@ 2019-12-02  7:07 ` Qi Liu
  5 siblings, 0 replies; 13+ messages in thread
From: Qi Liu @ 2019-12-02  7:07 UTC (permalink / raw)
  To: Tan Xiaojun, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, kim.phillips, James.Clark, jeremy.linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users


Tested-by: Qi Liu <liuqi115@hisilicon.com>


On 2019/11/23 18:11, Tan Xiaojun wrote:
> Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
> Profiling Extensions (SPE) support") was merged, "perf record" and
> "perf report --dump-raw-trace" have been supported. However, the
> raw data that is dumped cannot be used without parsing.
> 
> This patchset improves the "perf report" support for SPE and
> further processes the data. Currently, support for the four events
> llc-miss, tlb-miss, branch-miss and remote-access is added.
> 
> v1->v2:
> Some cleanups and bug fixes were made, and support for the precise
> ip of branch-misses was added. Thanks to Jeremy and James for the
> suggestions.
> 
> v2->v3:
> Mainly adds four SPE precise ip events, which are visible in 'perf list'.
> More details in [5/5].
> 
> Tan Xiaojun (5):
>   perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
>   perf tools: Add support for "report" for some spe events
>   perf report: Add --spe options for arm-spe
>   drivers: perf: add some arm spe events
>   perf tools: Add support to process multi spe events
> 
>  drivers/perf/arm_spe_pmu.c                    |  44 +
>  tools/perf/Documentation/perf-report.txt      |  10 +
>  tools/perf/arch/arm64/util/arm-spe.c          |  47 +-
>  tools/perf/builtin-report.c                   |   5 +
>  tools/perf/util/Build                         |   2 +-
>  tools/perf/util/arm-spe-decoder/Build         |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 +++++
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
>  .../arm-spe-pkt-decoder.c                     |   0
>  .../arm-spe-pkt-decoder.h                     |   2 +
>  tools/perf/util/arm-spe.c                     | 771 +++++++++++++++++-
>  tools/perf/util/arm-spe.h                     |  20 +
>  tools/perf/util/auxtrace.c                    |  49 ++
>  tools/perf/util/auxtrace.h                    |  29 +
>  tools/perf/util/session.h                     |   2 +
>  15 files changed, 1231 insertions(+), 42 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/Build
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-11-30  0:42     ` Tan Xiaojun
@ 2019-12-06 15:48       ` James Clark
  2019-12-09  0:46         ` Tan Xiaojun
  0 siblings, 1 reply; 13+ messages in thread
From: James Clark @ 2019-12-06 15:48 UTC (permalink / raw)
  To: Tan Xiaojun, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, Kim Phillips, Jeremy Linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users, nd

Hi Xiaojun,

> 
> What do you think of the current implementation? If you prefer the previous way (like branch-misses:pp dTLB-load-misses:pp cache-misses:pp), I will modify it again.
> 

Yes I think I prefer the previous way. The reason to add support for :p on the standard event names was to make the user experience more similar to x86. What was the reason for moving to arm_spe_x/branch_miss/? If we are going to use this format then I don't see the need for requiring users to add :p to the end of arm_spe_x/branch_miss/.

I've tested these patches, but unfortunately I don't see the new events when I do perf list:
    ...
      arm_spe_0//                                        [Kernel PMU event]
      armv8_pmuv3/l3d_cache_wb/                          [Kernel PMU event]
      armv8_pmuv3/sample_collision/                      [Kernel PMU event]
      armv8_pmuv3/sample_feed/                           [Kernel PMU event]
      armv8_pmuv3/sample_filtrate/                       [Kernel PMU event]
      armv8_pmuv3/sample_pop/                            [Kernel PMU event]
    
    branch:
      br_mis_pred
    ...

Should I see events like arm_spe_0/branch_miss/ in that list?

And then if I attempt to record them I get this error:

    ./perf record -e arm_spe_0/branch_miss/ ls
    event syntax error: 'arm_spe_0/branch_miss/'
                                   \___ unknown term

But using the plain event name still works:

    ./perf record -e arm_spe/ts_enable=1/ ls
    ...
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.571 MB perf.data ]


Thanks
James

On 30/11/2019 00:42, Tan Xiaojun wrote:
> On 2019/11/30 0:32, James Clark wrote:
>> Hi Xiaojun,
>>
>> Sorry for not replying earlier, I was at a conference. Unfortunately I have temporarily lost access to SPE enabled hardware but I will test this out and get back to you as soon as possible.
>>
>>
>> Thanks
>> James
>>
> 
> OK.
> 
> Thanks.
> Xiaojun.
> 
>> On 23/11/2019 10:11, Tan Xiaojun wrote:
>>> Under the original logic, if the user specifies multiple SPE
>>> events for a record session, perf reports an error and exits
>>> without running the workload. This is not very friendly.
>>>
>>> This patch relaxes that logic: a warning is reported instead,
>>> and only the first SPE event is recorded.
>>>
>>> This patch also supports recording several of the new synthetic
>>> events at once. However, if the user specifies a plain SPE event
>>> followed by synthetic SPE events, a warning is reported and the
>>> same rule applies: only the first SPE event is recorded.
>>>
>>> Example:
>>> ------------------------------------------------------------------
>>> 1) For multiple spe events
>>> $ perf record -e arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/ -e arm_spe_0/ts_enable=0,store_filter=1,jitter=1,min_latency=0/ ls
>>> Warning:
>>> There may be only one arm_spe_x event. More than one spe event will be ignored, unless they are synthetic events of spe, like:
>>> arm_spe_x/llc_miss/
>>> arm_spe_x/branch_miss/
>>> arm_spe_x/tlb_miss/
>>> arm_spe_x/remote_access/
>>> (see 'perf list')
>>> ...
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.078 MB perf.data ]
>>>
>>> $ perf report --stdio
>>> ...
>>>  # Samples: 0  of event 'arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/'
>>> ...
>>>
>>> 2) For multiple spe precise ip events (synthetic event)
>>> $ perf record -e arm_spe_0/llc_miss/ -e arm_spe_0/llc_miss/ -e arm_spe_0/tlb_miss/ ls
>>> Warning:
>>> These events are precise ip events, please add :p/pp/ppp after the event.
>>> ...
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.343 MB perf.data ]
>>>
>>> $ perf report --stdio
>>>  # To display the perf.data header info, please use --header/--header-only options.
>>>  #
>>>  #
>>>  # Total Lost Samples: 0
>>>  #
>>>  # Samples: 0  of event 'arm_spe_0/llc_miss/, arm_spe_0/tlb_miss/'
>>>  # Event count (approx.): 0
>>>  #
>>>  # Children      Self  Command  Shared Object  Symbol
>>>  # ........  ........  .......  .............  ......
>>>  #
>>>
>>>  # Samples: 0  of event 'dummy:u'
>>>  # Event count (approx.): 0
>>>  #
>>>  # Children      Self  Command  Shared Object  Symbol
>>>  # ........  ........  .......  .............  ......
>>>  #
>>>
>>>  # Samples: 83  of event 'llc-miss'
>>>  # Event count (approx.): 83
>>>  #
>>>  # Children      Self  Command  Shared Object      Symbol
>>>  # ........  ........  .......  .................  ....................................
>>>  #
>>>      42.17%    42.17%  ls       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>      14.46%    14.46%  ls       [kernel.kallsyms]  [k] memchr_inv
>>>      13.25%    13.25%  ls       [kernel.kallsyms]  [k] perf_event_mmap
>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] available_idle_cpu
>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] copy_page
>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] try_to_wake_up
>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] vma_interval_tree_insert
>>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_lookup_symbol_x
>>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_relocate_object
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] ext4_getattr
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] lock_page_memcg
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] may_open
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] radix_tree_next_chunk
>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] rb_prev
>>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_start
>>>       1.20%     1.20%  ls       ld-2.28.so         [.] do_lookup_x
>>>       1.20%     1.20%  ls       ld-2.28.so         [.] rtld_lock_default_lock_recursive
>>>       1.20%     1.20%  ls       libc-2.28.so       [.] getenv
>>>       1.20%     1.20%  ls       [unknown]          [.] 0xffff29f1190029b8
>>>
>>>  # Samples: 13  of event 'tlb-miss'
>>>  # Event count (approx.): 13
>>>  #
>>>  # Children      Self  Command  Shared Object      Symbol
>>>  # ........  ........  .......  .................  ............................
>>>  #
>>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] __audit_syscall_entry
>>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>>      15.38%    15.38%  ls       ld-2.28.so         [.] _dl_relocate_object
>>>      15.38%    15.38%  ls       ld-2.28.so         [.] do_lookup_x
>>>       7.69%     7.69%  ls       [kernel.kallsyms]  [k] memchr_inv
>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_setup_hash
>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_start
>>>       7.69%     7.69%  ls       ls                 [.] 0x00000000000097a0
>>>
>>> ------------------------------------------------------------------
>>>
>>> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
>>> ---
>>>  tools/perf/arch/arm64/util/arm-spe.c | 47 +++++++++++++++++++++++++---
>>>  tools/perf/util/arm-spe.c            | 25 +++++++++++++++
>>>  tools/perf/util/arm-spe.h            | 20 ++++++++++++
>>>  3 files changed, 88 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
>>> index eba6541ec0f1..68e91f3c9614 100644
>>> --- a/tools/perf/arch/arm64/util/arm-spe.c
>>> +++ b/tools/perf/arch/arm64/util/arm-spe.c
>>> @@ -67,21 +67,60 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>>>       struct arm_spe_recording *sper =
>>>                       container_of(itr, struct arm_spe_recording, itr);
>>>       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
>>> -     struct evsel *evsel, *arm_spe_evsel = NULL;
>>> +     struct evsel *evsel, *tmp, *arm_spe_evsel = NULL;
>>>       bool privileged = perf_event_paranoid_check(-1);
>>>       struct evsel *tracking_evsel;
>>> +     char evsel_name[128];
>>>       int err;
>>>
>>>       sper->evlist = evlist;
>>>
>>> -     evlist__for_each_entry(evlist, evsel) {
>>> +     evlist__for_each_entry_safe(evlist, tmp, evsel) {
>>>               if (evsel->core.attr.type == arm_spe_pmu->type) {
>>>                       if (arm_spe_evsel) {
>>> -                             pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
>>> -                             return -EINVAL;
>>> +                             if ((evsel->core.attr.config
>>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>> +                                                     ARM_SPE_EVENT_LO))
>>> +                                             && (arm_spe_evsel->core.attr.config
>>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>> +                                                     ARM_SPE_EVENT_LO))) {
>>> +                                     arm_spe_evsel->core.attr.config |=
>>> +                                                             evsel->core.attr.config;
>>> +
>>> +                                     if (!strstr(arm_spe_evsel->name, evsel->name)) {
>>> +                                             scnprintf(evsel_name, sizeof(evsel_name),
>>> +                                                             "%s, %s", arm_spe_evsel->name,
>>> +                                                             evsel->name);
>>> +                                             arm_spe_evsel->name = strdup(evsel_name);
>>> +                                     }
>>> +                             } else
>>> +                                     pr_warning("Warning:\n"
>>> +                                             "There may be only one "
>>> +                                             ARM_SPE_PMU_NAME "x event."
>>> +                                             " More than one spe event"
>>> +                                             " will be ignored, unless"
>>> +                                             " they are synthetic events"
>>> +                                             " of spe, like:"
>>> +                                             "\narm_spe_x/llc_miss/"
>>> +                                             "\narm_spe_x/branch_miss/"
>>> +                                             "\narm_spe_x/tlb_miss/"
>>> +                                             "\narm_spe_x/remote_access/"
>>> +                                             "\n(see 'perf list')\n");
>>> +                             evlist__remove(evlist, evsel);
>>> +                             evsel__delete(evsel);
>>> +                             continue;
>>>                       }
>>>                       evsel->core.attr.freq = 0;
>>>                       evsel->core.attr.sample_period = 1;
>>> +                     if (evsel->core.attr.config
>>> +                                     & GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO)) {
>>> +                             evsel->core.attr.config |= SPE_ATTR_TS_ENABLE;
>>> +                             if (!evsel->core.attr.precise_ip)
>>> +                                     pr_warning("Warning:\n"
>>> +                                             "These events are precise ip events,"
>>> +                                             " please add :p/pp/ppp after the event.\n");
>>> +                     }
>>> +
>>>                       arm_spe_evsel = evsel;
>>>                       opts->full_auxtrace = true;
>>>               }
>>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>>> index e7282c2616f3..0c9d7fa518a5 100644
>>> --- a/tools/perf/util/arm-spe.c
>>> +++ b/tools/perf/util/arm-spe.c
>>> @@ -779,6 +779,31 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>>>       attr.sample_id_all = evsel->core.attr.sample_id_all;
>>>       attr.read_format = evsel->core.attr.read_format;
>>>
>>> +     if (evsel->core.attr.config
>>> +                     & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>> +                             ARM_SPE_EVENT_LO)) {
>>> +             spe->synth_opts.llc_miss = false;
>>> +             spe->synth_opts.tlb_miss = false;
>>> +             spe->synth_opts.branch_miss = false;
>>> +             spe->synth_opts.remote_access = false;
>>> +
>>> +             if (evsel->core.attr.config
>>> +                             & (ARM_SPE_EVENT_LLC_MISS << ARM_SPE_EVENT_LO))
>>> +                     spe->synth_opts.llc_miss = true;
>>> +
>>> +             if (evsel->core.attr.config
>>> +                             & (ARM_SPE_EVENT_TLB_MISS << ARM_SPE_EVENT_LO))
>>> +                     spe->synth_opts.tlb_miss = true;
>>> +
>>> +             if (evsel->core.attr.config
>>> +                             & (ARM_SPE_EVENT_BRANCH_MISS << ARM_SPE_EVENT_LO))
>>> +                     spe->synth_opts.branch_miss = true;
>>> +
>>> +             if (evsel->core.attr.config
>>> +                             & (ARM_SPE_EVENT_REMOTE_ACCESS << ARM_SPE_EVENT_LO))
>>> +                     spe->synth_opts.remote_access = true;
>>> +     }
>>> +
>>>       /* create new id val to be a fixed offset from evsel id */
>>>       id = evsel->core.id[0] + 1000000000;
>>>
>>> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
>>> index 98d3235781c3..db7420121979 100644
>>> --- a/tools/perf/util/arm-spe.h
>>> +++ b/tools/perf/util/arm-spe.h
>>> @@ -9,6 +9,26 @@
>>>
>>>  #define ARM_SPE_PMU_NAME "arm_spe_"
>>>
>>> +#define ARM_SPE_EVENT_LO                     3
>>> +#define ARM_SPE_EVENT_HI                     6
>>> +#define ARM_SPE_EVENT_LLC_MISS                       BIT(0)
>>> +#define ARM_SPE_EVENT_BRANCH_MISS            BIT(1)
>>> +#define ARM_SPE_EVENT_TLB_MISS                       BIT(2)
>>> +#define ARM_SPE_EVENT_REMOTE_ACCESS          BIT(3)
>>> +
>>> +#define SPE_ATTR_TS_ENABLE                   BIT(0)
>>> +#define SPE_ATTR_PA_ENABLE                   BIT(1)
>>> +#define SPE_ATTR_PCT_ENABLE                  BIT(2)
>>> +#define SPE_ATTR_JITTER                              BIT(16)
>>> +#define SPE_ATTR_BRANCH_FILTER                       BIT(32)
>>> +#define SPE_ATTR_LOAD_FILTER                 BIT(33)
>>> +#define SPE_ATTR_STORE_FILTER                        BIT(34)
>>> +
>>> +#define SPE_ATTR_EV_RETIRED                  BIT(1)
>>> +#define SPE_ATTR_EV_CACHE                    BIT(3)
>>> +#define SPE_ATTR_EV_TLB                              BIT(5)
>>> +#define SPE_ATTR_EV_BRANCH                   BIT(7)
>>> +
>>>  enum {
>>>       ARM_SPE_PMU_TYPE,
>>>       ARM_SPE_PER_CPU_MMAPS,
>>>
>>
> 
> 


* Re: [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-12-06 15:48       ` James Clark
@ 2019-12-09  0:46         ` Tan Xiaojun
  2019-12-12 15:50           ` James Clark
  0 siblings, 1 reply; 13+ messages in thread
From: Tan Xiaojun @ 2019-12-09  0:46 UTC (permalink / raw)
  To: James Clark, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, Kim Phillips, Jeremy Linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users, nd

On 2019/12/6 23:48, James Clark wrote:
> Hi Xiaojun,
> 
>>
>> What do you think of this current implementation? Or you prefer the previous way(like branch-misses:pp dTLB-load-misses:pp cache-misses:pp), then I will modify it again.
>>
> 
> Yes I think I prefer the previous way. The reason to add support for :p on the standard event names was to make the user experience more similar to x86. What was the reason for moving to arm_spe_x/branch_miss/? If we are going to use this format then I don't see the need for requiring users to add :p to the end of arm_spe_x/branch_miss/.
> 

Hi, James,

OK, I will reconsider how to modify it.

> I've tested these patches, but unfortunately I don't see the new events when I do perf list:
>     ...
>       arm_spe_0//                                        [Kernel PMU event]
>       armv8_pmuv3/l3d_cache_wb/                          [Kernel PMU event]
>       armv8_pmuv3/sample_collision/                      [Kernel PMU event]
>       armv8_pmuv3/sample_feed/                           [Kernel PMU event]
>       armv8_pmuv3/sample_filtrate/                       [Kernel PMU event]
>       armv8_pmuv3/sample_pop/                            [Kernel PMU event]
>     
>     branch:
>       br_mis_pred
>     ...
> 
> Should I see events like /arm_spe_0/branch_miss/ in that list?
> 
> And then if I attempt to record them I get this error:
> 
>     ./perf record -e arm_spe_0/branch_miss/ ls
>     event syntax error: 'arm_spe_0/branch_miss/'
>                                    \___ unknown term
> 
> But using the plain event name still works:
> 
>     ./perf record -e arm_spe/ts_enable=1/ ls
>     ...
>     [ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.571 MB perf.data ]
> 

It may be that you did not recompile the kernel. To support this method I modified the SPE driver (patch 4/5), so the kernel has to be rebuilt with that change before perf can see the new events.
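
One way to tell a stale kernel from a perf-side problem is to check whether the running kernel exposes the event in sysfs at all. The sketch below assumes the driver change publishes the new events under the PMU's events directory (the usual source of the "Kernel PMU event" entries in perf list); the helper name pmu_event_exists() and its parameters are made up for illustration and are not part of the patch.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of this series.
 * Returns 1 if <sysfs_root>/<pmu>/events/<event> exists, else 0.
 * On a typical system sysfs_root is "/sys/bus/event_source/devices".
 */
static int pmu_event_exists(const char *sysfs_root, const char *pmu,
			    const char *event)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/%s/events/%s",
			 sysfs_root, pmu, event);

	/* Treat an encoding error or a truncated path as "not found". */
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;

	return access(path, F_OK) == 0;
}
```

Calling pmu_event_exists("/sys/bus/event_source/devices", "arm_spe_0", "branch_miss") and getting 0 would suggest that the running kernel does not carry the driver change, which would be consistent with the "unknown term" error quoted below.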

Thanks.
Xiaojun.

> 
> Thanks
> James
> 
> On 30/11/2019 00:42, Tan Xiaojun wrote:
>> On 2019/11/30 0:32, James Clark wrote:
>>> Hi Xiaojun,
>>>
>>> Sorry for not replying earlier, I was at a conference. Unfortunately I have temporarily lost access to SPE enabled hardware but I will test this out and get back to you as soon as possible.
>>>
>>>
>>> Thanks
>>> James
>>>
>>
>> OK.
>>
>> Thanks.
>> Xiaojun.
>>
>>> On 23/11/2019 10:11, Tan Xiaojun wrote:
>>>> Under the original logic, if the user specifies multiple spe
>>>> events during the record, perf will report an error and exit
>>>> without actually running. This is not very friendly.
>>>>
>>>> This patch slightly modifies this logic, in which case a
>>>> warning is reported and the first spe event is taken as a
>>>> record.
>>>>
>>>> At the same time, this patch also supports the recording of
>>>> multi new synthetic events. However, if the user specifies the
>>>> spe event and then specifies the synthetic spe events, a warning
>>>> will be reported and the above principles will still be followed,
>>>> only the first spe event will be recorded.
>>>>
>>>> Example:
>>>> ------------------------------------------------------------------
>>>> 1) For multiple spe events
>>>> $ perf record -e arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/ -e arm_spe_0/ts_enable=0,store_filter=1,jitter=1,min_latency=0/ ls
>>>> Warning:
>>>> There may be only one arm_spe_x event. More than one spe event will be ignored, unless they are synthetic events of spe, like:
>>>> arm_spe_x/llc_miss/
>>>> arm_spe_x/branch_miss/
>>>> arm_spe_x/tlb_miss/
>>>> arm_spe_x/remote_access/
>>>> (see 'perf list')
>>>> ...
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.078 MB perf.data ]
>>>>
>>>> $ perf report --stdio
>>>> ...
>>>>  # Samples: 0  of event 'arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/'
>>>> ...
>>>>
>>>> 2) For multiple spe precise ip events (synthetic event)
>>>> $ perf record -e arm_spe_0/llc_miss/ -e arm_spe_0/llc_miss/ -e arm_spe_0/tlb_miss/ ls
>>>> Warning:
>>>> These events are precise ip events, please add :p/pp/ppp after the event.
>>>> ...
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.343 MB perf.data ]
>>>>
>>>> $ perf report --stdio
>>>>  # To display the perf.data header info, please use --header/--header-only options.
>>>>  #
>>>>  #
>>>>  # Total Lost Samples: 0
>>>>  #
>>>>  # Samples: 0  of event 'arm_spe_0/llc_miss/, arm_spe_0/tlb_miss/'
>>>>  # Event count (approx.): 0
>>>>  #
>>>>  # Children      Self  Command  Shared Object  Symbol
>>>>  # ........  ........  .......  .............  ......
>>>>  #
>>>>
>>>>  # Samples: 0  of event 'dummy:u'
>>>>  # Event count (approx.): 0
>>>>  #
>>>>  # Children      Self  Command  Shared Object  Symbol
>>>>  # ........  ........  .......  .............  ......
>>>>  #
>>>>
>>>>  # Samples: 83  of event 'llc-miss'
>>>>  # Event count (approx.): 83
>>>>  #
>>>>  # Children      Self  Command  Shared Object      Symbol
>>>>  # ........  ........  .......  .................  ....................................
>>>>  #
>>>>      42.17%    42.17%  ls       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>>      14.46%    14.46%  ls       [kernel.kallsyms]  [k] memchr_inv
>>>>      13.25%    13.25%  ls       [kernel.kallsyms]  [k] perf_event_mmap
>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] available_idle_cpu
>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] copy_page
>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] try_to_wake_up
>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] vma_interval_tree_insert
>>>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_lookup_symbol_x
>>>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_relocate_object
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] ext4_getattr
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] lock_page_memcg
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] may_open
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] radix_tree_next_chunk
>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] rb_prev
>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_start
>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] do_lookup_x
>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] rtld_lock_default_lock_recursive
>>>>       1.20%     1.20%  ls       libc-2.28.so       [.] getenv
>>>>       1.20%     1.20%  ls       [unknown]          [.] 0xffff29f1190029b8
>>>>
>>>>  # Samples: 13  of event 'tlb-miss'
>>>>  # Event count (approx.): 13
>>>>  #
>>>>  # Children      Self  Command  Shared Object      Symbol
>>>>  # ........  ........  .......  .................  ............................
>>>>  #
>>>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] __audit_syscall_entry
>>>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>>>      15.38%    15.38%  ls       ld-2.28.so         [.] _dl_relocate_object
>>>>      15.38%    15.38%  ls       ld-2.28.so         [.] do_lookup_x
>>>>       7.69%     7.69%  ls       [kernel.kallsyms]  [k] memchr_inv
>>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_setup_hash
>>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_start
>>>>       7.69%     7.69%  ls       ls                 [.] 0x00000000000097a0
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
>>>> ---
>>>>  tools/perf/arch/arm64/util/arm-spe.c | 47 +++++++++++++++++++++++++---
>>>>  tools/perf/util/arm-spe.c            | 25 +++++++++++++++
>>>>  tools/perf/util/arm-spe.h            | 20 ++++++++++++
>>>>  3 files changed, 88 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
>>>> index eba6541ec0f1..68e91f3c9614 100644
>>>> --- a/tools/perf/arch/arm64/util/arm-spe.c
>>>> +++ b/tools/perf/arch/arm64/util/arm-spe.c
>>>> @@ -67,21 +67,60 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>>>>       struct arm_spe_recording *sper =
>>>>                       container_of(itr, struct arm_spe_recording, itr);
>>>>       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
>>>> -     struct evsel *evsel, *arm_spe_evsel = NULL;
>>>> +     struct evsel *evsel, *tmp, *arm_spe_evsel = NULL;
>>>>       bool privileged = perf_event_paranoid_check(-1);
>>>>       struct evsel *tracking_evsel;
>>>> +     char evsel_name[128];
>>>>       int err;
>>>>
>>>>       sper->evlist = evlist;
>>>>
>>>> -     evlist__for_each_entry(evlist, evsel) {
>>>> +     evlist__for_each_entry_safe(evlist, tmp, evsel) {
>>>>               if (evsel->core.attr.type == arm_spe_pmu->type) {
>>>>                       if (arm_spe_evsel) {
>>>> -                             pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
>>>> -                             return -EINVAL;
>>>> +                             if ((evsel->core.attr.config
>>>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>>> +                                                     ARM_SPE_EVENT_LO))
>>>> +                                             && (arm_spe_evsel->core.attr.config
>>>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>>> +                                                     ARM_SPE_EVENT_LO))) {
>>>> +                                     arm_spe_evsel->core.attr.config |=
>>>> +                                                             evsel->core.attr.config;
>>>> +
>>>> +                                     if (!strstr(arm_spe_evsel->name, evsel->name)) {
>>>> +                                             scnprintf(evsel_name, sizeof(evsel_name),
>>>> +                                                             "%s, %s", arm_spe_evsel->name,
>>>> +                                                             evsel->name);
>>>> +                                             arm_spe_evsel->name = strdup(evsel_name);
>>>> +                                     }
>>>> +                             } else
>>>> +                                     pr_warning("Warning:\n"
>>>> +                                             "There may be only one "
>>>> +                                             ARM_SPE_PMU_NAME "x event."
>>>> +                                             " More than one spe event"
>>>> +                                             " will be ignored, unless"
>>>> +                                             " they are synthetic events"
>>>> +                                             " of spe, like:"
>>>> +                                             "\narm_spe_x/llc_miss/"
>>>> +                                             "\narm_spe_x/branch_miss/"
>>>> +                                             "\narm_spe_x/tlb_miss/"
>>>> +                                             "\narm_spe_x/remote_access/"
>>>> +                                             "\n(see 'perf list')\n");
>>>> +                             evlist__remove(evlist, evsel);
>>>> +                             evsel__delete(evsel);
>>>> +                             continue;
>>>>                       }
>>>>                       evsel->core.attr.freq = 0;
>>>>                       evsel->core.attr.sample_period = 1;
>>>> +                     if (evsel->core.attr.config
>>>> +                                     & GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO)) {
>>>> +                             evsel->core.attr.config |= SPE_ATTR_TS_ENABLE;
>>>> +                             if (!evsel->core.attr.precise_ip)
>>>> +                                     pr_warning("Warning:\n"
>>>> +                                             "These events are precise ip events,"
>>>> +                                             " please add :p/pp/ppp after the event.\n");
>>>> +                     }
>>>> +
>>>>                       arm_spe_evsel = evsel;
>>>>                       opts->full_auxtrace = true;
>>>>               }
>>>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>>>> index e7282c2616f3..0c9d7fa518a5 100644
>>>> --- a/tools/perf/util/arm-spe.c
>>>> +++ b/tools/perf/util/arm-spe.c
>>>> @@ -779,6 +779,31 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>>>>       attr.sample_id_all = evsel->core.attr.sample_id_all;
>>>>       attr.read_format = evsel->core.attr.read_format;
>>>>
>>>> +     if (evsel->core.attr.config
>>>> +                     & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>>> +                             ARM_SPE_EVENT_LO)) {
>>>> +             spe->synth_opts.llc_miss = false;
>>>> +             spe->synth_opts.tlb_miss = false;
>>>> +             spe->synth_opts.branch_miss = false;
>>>> +             spe->synth_opts.remote_access = false;
>>>> +
>>>> +             if (evsel->core.attr.config
>>>> +                             & (ARM_SPE_EVENT_LLC_MISS << ARM_SPE_EVENT_LO))
>>>> +                     spe->synth_opts.llc_miss = true;
>>>> +
>>>> +             if (evsel->core.attr.config
>>>> +                             & (ARM_SPE_EVENT_TLB_MISS << ARM_SPE_EVENT_LO))
>>>> +                     spe->synth_opts.tlb_miss = true;
>>>> +
>>>> +             if (evsel->core.attr.config
>>>> +                             & (ARM_SPE_EVENT_BRANCH_MISS << ARM_SPE_EVENT_LO))
>>>> +                     spe->synth_opts.branch_miss = true;
>>>> +
>>>> +             if (evsel->core.attr.config
>>>> +                             & (ARM_SPE_EVENT_REMOTE_ACCESS << ARM_SPE_EVENT_LO))
>>>> +                     spe->synth_opts.remote_access = true;
>>>> +     }
>>>> +
>>>>       /* create new id val to be a fixed offset from evsel id */
>>>>       id = evsel->core.id[0] + 1000000000;
>>>>
>>>> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
>>>> index 98d3235781c3..db7420121979 100644
>>>> --- a/tools/perf/util/arm-spe.h
>>>> +++ b/tools/perf/util/arm-spe.h
>>>> @@ -9,6 +9,26 @@
>>>>
>>>>  #define ARM_SPE_PMU_NAME "arm_spe_"
>>>>
>>>> +#define ARM_SPE_EVENT_LO                     3
>>>> +#define ARM_SPE_EVENT_HI                     6
>>>> +#define ARM_SPE_EVENT_LLC_MISS                       BIT(0)
>>>> +#define ARM_SPE_EVENT_BRANCH_MISS            BIT(1)
>>>> +#define ARM_SPE_EVENT_TLB_MISS                       BIT(2)
>>>> +#define ARM_SPE_EVENT_REMOTE_ACCESS          BIT(3)
>>>> +
>>>> +#define SPE_ATTR_TS_ENABLE                   BIT(0)
>>>> +#define SPE_ATTR_PA_ENABLE                   BIT(1)
>>>> +#define SPE_ATTR_PCT_ENABLE                  BIT(2)
>>>> +#define SPE_ATTR_JITTER                              BIT(16)
>>>> +#define SPE_ATTR_BRANCH_FILTER                       BIT(32)
>>>> +#define SPE_ATTR_LOAD_FILTER                 BIT(33)
>>>> +#define SPE_ATTR_STORE_FILTER                        BIT(34)
>>>> +
>>>> +#define SPE_ATTR_EV_RETIRED                  BIT(1)
>>>> +#define SPE_ATTR_EV_CACHE                    BIT(3)
>>>> +#define SPE_ATTR_EV_TLB                              BIT(5)
>>>> +#define SPE_ATTR_EV_BRANCH                   BIT(7)
>>>> +
>>>>  enum {
>>>>       ARM_SPE_PMU_TYPE,
>>>>       ARM_SPE_PER_CPU_MMAPS,
>>>>
>>>
>>
>>




* Re: [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-12-09  0:46         ` Tan Xiaojun
@ 2019-12-12 15:50           ` James Clark
  2019-12-16  3:20             ` Tan Xiaojun
  0 siblings, 1 reply; 13+ messages in thread
From: James Clark @ 2019-12-12 15:50 UTC (permalink / raw)
  To: Tan Xiaojun, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, Kim Phillips, Jeremy Linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users, nd


Hi Xiaojun,

> It may be that you did not recompile the kernel. In order to support this method, I modified the spe driver, which requires recompiling the kernel to support it.

Ah yes, sorry, I missed that there were kernel changes. I'm currently trying to understand what the additional behavior is. The way I understand it, it's already
possible to configure the filtering like this: ./perf record -e arm_spe/ts_enable=1,branch_filter=1/ ls. So the new synthetic SPE events aren't strictly necessary.
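
To make the comparison concrete, here is a rough sketch of what the two approaches put into perf_event_attr.config. The bit positions are copied from the SPE_ATTR_* and ARM_SPE_EVENT_* defines in patch 5/5; the two helper functions are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Bit positions as defined in tools/perf/util/arm-spe.h by patch 5/5. */
#define SPE_ATTR_TS_ENABLE	(1ULL << 0)
#define SPE_ATTR_BRANCH_FILTER	(1ULL << 32)
#define ARM_SPE_EVENT_LO	3
#define ARM_SPE_EVENT_HI	6

/* Equivalent of GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO): bits 3..6. */
#define ARM_SPE_EVENT_MASK \
	(((1ULL << (ARM_SPE_EVENT_HI - ARM_SPE_EVENT_LO + 1)) - 1) \
	 << ARM_SPE_EVENT_LO)

/* Hypothetical helper: config for arm_spe/ts_enable=1,branch_filter=1/. */
static uint64_t spe_branch_filter_config(void)
{
	return SPE_ATTR_TS_ENABLE | SPE_ATTR_BRANCH_FILTER;
}

/* Hypothetical helper: non-zero if any synthetic-event bit (3..6) is set. */
static int spe_has_synth_events(uint64_t config)
{
	return (config & ARM_SPE_EVENT_MASK) != 0;
}
```

The filter-only config sets bits 0 and 32 and leaves bits 3..6 clear, so it needs nothing beyond the format terms that the existing driver already exposes.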

I think it would be best to avoid kernel changes because the SPE driver has been in the kernel for quite some time now. Making perf rely on a new version of the driver
will make it more difficult for people to use this feature.

Do you think it will be possible to get all the functionality with the existing driver? I think RFC v2 was working quite well, apart from the multiple events issue.
But maybe that is not that important of a use case. And it would be better to get a basic version accepted sooner.


Regards
James

>>
>> Thanks
>> James
>>
>> On 30/11/2019 00:42, Tan Xiaojun wrote:
>>> On 2019/11/30 0:32, James Clark wrote:
>>>> Hi Xiaojun,
>>>>
>>>> Sorry for not replying earlier, I was at a conference. Unfortunately I have temporarily lost access to SPE enabled hardware but I will test this out and get back to you as soon as possible.
>>>>
>>>>
>>>> Thanks
>>>> James
>>>>
>>>
>>> OK.
>>>
>>> Thanks.
>>> Xiaojun.
>>>
>>>> On 23/11/2019 10:11, Tan Xiaojun wrote:
>>>>> Under the original logic, if the user specifies multiple spe
>>>>> events during the record, perf will report an error and exit
>>>>> without actually running. This is not very friendly.
>>>>>
>>>>> This patch slightly modifies this logic: in that case a
>>>>> warning is reported and only the first spe event is
>>>>> recorded.
>>>>>
>>>>> At the same time, this patch also supports recording
>>>>> multiple new synthetic events. However, if the user specifies a
>>>>> regular spe event and then synthetic spe events, a warning
>>>>> is reported and the above rule still applies:
>>>>> only the first spe event is recorded.
>>>>>
>>>>> Example:
>>>>> ------------------------------------------------------------------
>>>>> 1) For multiple spe events
>>>>> $ perf record -e arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/ -e arm_spe_0/ts_enable=0,store_filter=1,jitter=1,min_latency=0/ ls
>>>>> Warning:
>>>>> There may be only one arm_spe_x event. More than one spe event will be ignored, unless they are synthetic events of spe, like:
>>>>> arm_spe_x/llc_miss/
>>>>> arm_spe_x/branch_miss/
>>>>> arm_spe_x/tlb_miss/
>>>>> arm_spe_x/remote_access/
>>>>> (see 'perf list')
>>>>> ...
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 0.078 MB perf.data ]
>>>>>
>>>>> $ perf report --stdio
>>>>> ...
>>>>>  # Samples: 0  of event 'arm_spe_0/ts_enable=0,load_filter=1,jitter=1,min_latency=0/'
>>>>> ...
>>>>>
>>>>> 2) For multiple spe precise ip events (synthetic event)
>>>>> $ perf record -e arm_spe_0/llc_miss/ -e arm_spe_0/llc_miss/ -e arm_spe_0/tlb_miss/ ls
>>>>> Warning:
>>>>> These events are precise ip events, please add :p/pp/ppp after the event.
>>>>> ...
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 0.343 MB perf.data ]
>>>>>
>>>>> $ perf report --stdio
>>>>>  # To display the perf.data header info, please use --header/--header-only options.
>>>>>  #
>>>>>  #
>>>>>  # Total Lost Samples: 0
>>>>>  #
>>>>>  # Samples: 0  of event 'arm_spe_0/llc_miss/, arm_spe_0/tlb_miss/'
>>>>>  # Event count (approx.): 0
>>>>>  #
>>>>>  # Children      Self  Command  Shared Object  Symbol
>>>>>  # ........  ........  .......  .............  ......
>>>>>  #
>>>>>
>>>>>  # Samples: 0  of event 'dummy:u'
>>>>>  # Event count (approx.): 0
>>>>>  #
>>>>>  # Children      Self  Command  Shared Object  Symbol
>>>>>  # ........  ........  .......  .............  ......
>>>>>  #
>>>>>
>>>>>  # Samples: 83  of event 'llc-miss'
>>>>>  # Event count (approx.): 83
>>>>>  #
>>>>>  # Children      Self  Command  Shared Object      Symbol
>>>>>  # ........  ........  .......  .................  ....................................
>>>>>  #
>>>>>      42.17%    42.17%  ls       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>>>      14.46%    14.46%  ls       [kernel.kallsyms]  [k] memchr_inv
>>>>>      13.25%    13.25%  ls       [kernel.kallsyms]  [k] perf_event_mmap
>>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] available_idle_cpu
>>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] copy_page
>>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] try_to_wake_up
>>>>>       2.41%     2.41%  ls       [kernel.kallsyms]  [k] vma_interval_tree_insert
>>>>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_lookup_symbol_x
>>>>>       2.41%     2.41%  ls       ld-2.28.so         [.] _dl_relocate_object
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] ext4_getattr
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_page_from_freelist
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] lock_page_memcg
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] may_open
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] radix_tree_next_chunk
>>>>>       1.20%     1.20%  ls       [kernel.kallsyms]  [k] rb_prev
>>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] _dl_start
>>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] do_lookup_x
>>>>>       1.20%     1.20%  ls       ld-2.28.so         [.] rtld_lock_default_lock_recursive
>>>>>       1.20%     1.20%  ls       libc-2.28.so       [.] getenv
>>>>>       1.20%     1.20%  ls       [unknown]          [.] 0xffff29f1190029b8
>>>>>
>>>>>  # Samples: 13  of event 'tlb-miss'
>>>>>  # Event count (approx.): 13
>>>>>  #
>>>>>  # Children      Self  Command  Shared Object      Symbol
>>>>>  # ........  ........  .......  .................  ............................
>>>>>  #
>>>>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] __audit_syscall_entry
>>>>>      15.38%    15.38%  ls       [kernel.kallsyms]  [k] get_partial_node.isra.25
>>>>>      15.38%    15.38%  ls       ld-2.28.so         [.] _dl_relocate_object
>>>>>      15.38%    15.38%  ls       ld-2.28.so         [.] do_lookup_x
>>>>>       7.69%     7.69%  ls       [kernel.kallsyms]  [k] memchr_inv
>>>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_map_object_from_fd
>>>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_setup_hash
>>>>>       7.69%     7.69%  ls       ld-2.28.so         [.] _dl_start
>>>>>       7.69%     7.69%  ls       ls                 [.] 0x00000000000097a0
>>>>>
>>>>> ------------------------------------------------------------------
>>>>>
>>>>> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
>>>>> ---
>>>>>  tools/perf/arch/arm64/util/arm-spe.c | 47 +++++++++++++++++++++++++---
>>>>>  tools/perf/util/arm-spe.c            | 25 +++++++++++++++
>>>>>  tools/perf/util/arm-spe.h            | 20 ++++++++++++
>>>>>  3 files changed, 88 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
>>>>> index eba6541ec0f1..68e91f3c9614 100644
>>>>> --- a/tools/perf/arch/arm64/util/arm-spe.c
>>>>> +++ b/tools/perf/arch/arm64/util/arm-spe.c
>>>>> @@ -67,21 +67,60 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>>>>>       struct arm_spe_recording *sper =
>>>>>                       container_of(itr, struct arm_spe_recording, itr);
>>>>>       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
>>>>> -     struct evsel *evsel, *arm_spe_evsel = NULL;
>>>>> +     struct evsel *evsel, *tmp, *arm_spe_evsel = NULL;
>>>>>       bool privileged = perf_event_paranoid_check(-1);
>>>>>       struct evsel *tracking_evsel;
>>>>> +     char evsel_name[128];
>>>>>       int err;
>>>>>
>>>>>       sper->evlist = evlist;
>>>>>
>>>>> -     evlist__for_each_entry(evlist, evsel) {
>>>>> +     evlist__for_each_entry_safe(evlist, tmp, evsel) {
>>>>>               if (evsel->core.attr.type == arm_spe_pmu->type) {
>>>>>                       if (arm_spe_evsel) {
>>>>> -                             pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
>>>>> -                             return -EINVAL;
>>>>> +                             if ((evsel->core.attr.config
>>>>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>>>> +                                                     ARM_SPE_EVENT_LO))
>>>>> +                                             && (arm_spe_evsel->core.attr.config
>>>>> +                                             & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>>>> +                                                     ARM_SPE_EVENT_LO))) {
>>>>> +                                     arm_spe_evsel->core.attr.config |=
>>>>> +                                                             evsel->core.attr.config;
>>>>> +
>>>>> +                                     if (!strstr(arm_spe_evsel->name, evsel->name)) {
>>>>> +                                             scnprintf(evsel_name, sizeof(evsel_name),
>>>>> +                                                             "%s, %s", arm_spe_evsel->name,
>>>>> +                                                             evsel->name);
>>>>> +                                             arm_spe_evsel->name = strdup(evsel_name);
>>>>> +                                     }
>>>>> +                             } else
>>>>> +                                     pr_warning("Warning:\n"
>>>>> +                                             "There may be only one "
>>>>> +                                             ARM_SPE_PMU_NAME "x event."
>>>>> +                                             " More than one spe event"
>>>>> +                                             " will be ignored, unless"
>>>>> +                                             " they are synthetic events"
>>>>> +                                             " of spe, like:"
>>>>> +                                             "\narm_spe_x/llc_miss/"
>>>>> +                                             "\narm_spe_x/branch_miss/"
>>>>> +                                             "\narm_spe_x/tlb_miss/"
>>>>> +                                             "\narm_spe_x/remote_access/"
>>>>> +                                             "\n(see 'perf list')\n");
>>>>> +                             evlist__remove(evlist, evsel);
>>>>> +                             evsel__delete(evsel);
>>>>> +                             continue;
>>>>>                       }
>>>>>                       evsel->core.attr.freq = 0;
>>>>>                       evsel->core.attr.sample_period = 1;
>>>>> +                     if (evsel->core.attr.config
>>>>> +                                     & GENMASK_ULL(ARM_SPE_EVENT_HI, ARM_SPE_EVENT_LO)) {
>>>>> +                             evsel->core.attr.config |= SPE_ATTR_TS_ENABLE;
>>>>> +                             if (!evsel->core.attr.precise_ip)
>>>>> +                                     pr_warning("Warning:\n"
>>>>> +                                             "These events are precise ip events,"
>>>>> +                                             " please add :p/pp/ppp after the event.\n");
>>>>> +                     }
>>>>> +
>>>>>                       arm_spe_evsel = evsel;
>>>>>                       opts->full_auxtrace = true;
>>>>>               }
>>>>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>>>>> index e7282c2616f3..0c9d7fa518a5 100644
>>>>> --- a/tools/perf/util/arm-spe.c
>>>>> +++ b/tools/perf/util/arm-spe.c
>>>>> @@ -779,6 +779,31 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>>>>>       attr.sample_id_all = evsel->core.attr.sample_id_all;
>>>>>       attr.read_format = evsel->core.attr.read_format;
>>>>>
>>>>> +     if (evsel->core.attr.config
>>>>> +                     & GENMASK_ULL(ARM_SPE_EVENT_HI,
>>>>> +                             ARM_SPE_EVENT_LO)) {
>>>>> +             spe->synth_opts.llc_miss = false;
>>>>> +             spe->synth_opts.tlb_miss = false;
>>>>> +             spe->synth_opts.branch_miss = false;
>>>>> +             spe->synth_opts.remote_access = false;
>>>>> +
>>>>> +             if (evsel->core.attr.config
>>>>> +                             & (ARM_SPE_EVENT_LLC_MISS << ARM_SPE_EVENT_LO))
>>>>> +                     spe->synth_opts.llc_miss = true;
>>>>> +
>>>>> +             if (evsel->core.attr.config
>>>>> +                             & (ARM_SPE_EVENT_TLB_MISS << ARM_SPE_EVENT_LO))
>>>>> +                     spe->synth_opts.tlb_miss = true;
>>>>> +
>>>>> +             if (evsel->core.attr.config
>>>>> +                             & (ARM_SPE_EVENT_BRANCH_MISS << ARM_SPE_EVENT_LO))
>>>>> +                     spe->synth_opts.branch_miss = true;
>>>>> +
>>>>> +             if (evsel->core.attr.config
>>>>> +                             & (ARM_SPE_EVENT_REMOTE_ACCESS << ARM_SPE_EVENT_LO))
>>>>> +                     spe->synth_opts.remote_access = true;
>>>>> +     }
>>>>> +
>>>>>       /* create new id val to be a fixed offset from evsel id */
>>>>>       id = evsel->core.id[0] + 1000000000;
>>>>>
>>>>> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
>>>>> index 98d3235781c3..db7420121979 100644
>>>>> --- a/tools/perf/util/arm-spe.h
>>>>> +++ b/tools/perf/util/arm-spe.h
>>>>> @@ -9,6 +9,26 @@
>>>>>
>>>>>  #define ARM_SPE_PMU_NAME "arm_spe_"
>>>>>
>>>>> +#define ARM_SPE_EVENT_LO                     3
>>>>> +#define ARM_SPE_EVENT_HI                     6
>>>>> +#define ARM_SPE_EVENT_LLC_MISS                       BIT(0)
>>>>> +#define ARM_SPE_EVENT_BRANCH_MISS            BIT(1)
>>>>> +#define ARM_SPE_EVENT_TLB_MISS                       BIT(2)
>>>>> +#define ARM_SPE_EVENT_REMOTE_ACCESS          BIT(3)
>>>>> +
>>>>> +#define SPE_ATTR_TS_ENABLE                   BIT(0)
>>>>> +#define SPE_ATTR_PA_ENABLE                   BIT(1)
>>>>> +#define SPE_ATTR_PCT_ENABLE                  BIT(2)
>>>>> +#define SPE_ATTR_JITTER                              BIT(16)
>>>>> +#define SPE_ATTR_BRANCH_FILTER                       BIT(32)
>>>>> +#define SPE_ATTR_LOAD_FILTER                 BIT(33)
>>>>> +#define SPE_ATTR_STORE_FILTER                        BIT(34)
>>>>> +
>>>>> +#define SPE_ATTR_EV_RETIRED                  BIT(1)
>>>>> +#define SPE_ATTR_EV_CACHE                    BIT(3)
>>>>> +#define SPE_ATTR_EV_TLB                              BIT(5)
>>>>> +#define SPE_ATTR_EV_BRANCH                   BIT(7)
>>>>> +
>>>>>  enum {
>>>>>       ARM_SPE_PMU_TYPE,
>>>>>       ARM_SPE_PER_CPU_MMAPS,
>>>>>
>>>>
>>>
>>>
> 
> 


* Re: [RFC v3 5/5] perf tools: Add support to process multi spe events
  2019-12-12 15:50           ` James Clark
@ 2019-12-16  3:20             ` Tan Xiaojun
  0 siblings, 0 replies; 13+ messages in thread
From: Tan Xiaojun @ 2019-12-16  3:20 UTC (permalink / raw)
  To: James Clark, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, ak, adrian.hunter, yao.jin, tmricht, brueckner,
	songliubraving, gregkh, Kim Phillips, Jeremy Linton
  Cc: gengdongjiu, wxf.wang, liwei391, huawei.libin, linux-kernel,
	linux-perf-users, nd

On 2019/12/12 23:50, James Clark wrote:
> 
> Hi Xiaojun,
> 
>> It may be that you did not recompile the kernel. To support this method, I modified the spe driver, so the kernel has to be recompiled.
> 
> Ah yes, sorry, I missed that there were kernel changes. I'm currently trying to understand what the additional behavior is. As I understand it, it's already
> possible to configure the filtering like this: ./perf record -e arm_spe/ts_enable=1,branch_filter=1/ ls. So the new synthetic SPE events aren't strictly necessary.
> 
> I think it would be best to avoid kernel changes, because the SPE driver has been in the kernel for quite some time now, and having perf rely on a new version of it will
> make it more difficult for people to access this feature easily.
> 
> Do you think it will be possible to get all the functionality with the existing driver? I think RFC v2 was working quite well, apart from the multiple events issue.
> But maybe that is not such an important use case, and it would be better to get a basic version accepted sooner.
> 

OK, I will modify and send a new version as soon as possible.

Thanks.
Xiaojun.


Thread overview: 13+ messages
2019-11-23 10:11 [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Tan Xiaojun
2019-11-23 10:11 ` [RFC v3 1/5] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir Tan Xiaojun
2019-11-23 10:11 ` [RFC v3 2/5] perf tools: Add support for "report" for some spe events Tan Xiaojun
2019-11-23 10:11 ` [RFC v3 3/5] perf report: Add --spe options for arm-spe Tan Xiaojun
2019-11-23 10:11 ` [RFC v3 4/5] drivers: perf: add some arm spe events Tan Xiaojun
2019-11-23 10:11 ` [RFC v3 5/5] perf tools: Add support to process multi " Tan Xiaojun
2019-11-29 16:32   ` James Clark
2019-11-30  0:42     ` Tan Xiaojun
2019-12-06 15:48       ` James Clark
2019-12-09  0:46         ` Tan Xiaojun
2019-12-12 15:50           ` James Clark
2019-12-16  3:20             ` Tan Xiaojun
2019-12-02  7:07 ` [RFC v3 0/5] perf tools: Add support for some spe events and precise ip Qi Liu
