linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip
@ 2020-01-23 16:07 James Clark
  2020-01-23 16:07 ` [PATCH v2 1/7] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, James Clark, Will Deacon,
	Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Tan Xiaojun, Al Grant, Namhyung Kim

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without further parsing.
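As a minimal illustration of the parsing involved, the sketch below turns a
raw SPE address-packet payload into a virtual address in the same way as the
arm_spe_calc_ip() helper in patch 2: clear the top byte, then refill it with
ones when bit 55 selects the upper (kernel) range. The names
spe_payload_to_va() and SPE_TOP_BYTE_MASK are hypothetical; this is a
standalone sketch, not code from the series.

```c
#include <stdint.h>

/* Mask covering the top byte (bits 63:56) of an SPE address payload. */
#define SPE_TOP_BYTE_MASK (0xffULL << 56)

/*
 * Convert a raw address-packet payload into a virtual address.  The top
 * byte is treated as non-address bits and cleared; if bit 55 is set the
 * address lies in the upper range, so the top byte is refilled with ones.
 */
static uint64_t spe_payload_to_va(uint64_t payload)
{
	uint64_t va = payload & ~SPE_TOP_BYTE_MASK;

	if (va & (1ULL << 55))
		va |= SPE_TOP_BYTE_MASK;

	return va;
}
```

For example, a payload of 0xb0ff800010081234 yields 0xffff800010081234
(upper range), while 0x80000000004005d0 yields 0x4005d0 (lower range).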

This patchset improves "perf report" support for SPE by decoding the
raw trace and synthesizing samples from it. Currently, support for
four events is added: llc-miss, tlb-miss, branch-miss and remote-access.
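The sketch below shows how an SPE events-packet payload could be mapped onto
these four sample types, following the checks in arm_spe_walk_trace() in
patch 2. It assumes, as in the packet decoder, that the events packet's index
holds its payload length in bytes: the LLC-refill and remote-access bits sit
in the second byte, so they are only tested when that byte is present. The
enum and function names here are illustrative, not the series' own.

```c
#include <stdint.h>

/* Bit positions in the events packet, in the order of enum arm_spe_events. */
enum spe_event_bit {
	EV_EXCEPTION_GEN, EV_RETIRED, EV_L1D_ACCESS, EV_L1D_REFILL,
	EV_TLB_ACCESS, EV_TLB_REFILL, EV_NOT_TAKEN, EV_MISPRED,
	EV_LLC_ACCESS, EV_LLC_REFILL, EV_REMOTE_ACCESS,
};

/* One flag per event that "perf report" shows. */
enum spe_sample_type {
	SPE_LLC_MISS      = 1 << 0,
	SPE_TLB_MISS      = 1 << 1,
	SPE_BRANCH_MISS   = 1 << 2,
	SPE_REMOTE_ACCESS = 1 << 3,
};

/*
 * Map an events-packet payload to sample-type flags.  payload_len is the
 * payload size in bytes; bits 9 and 10 are only valid when it exceeds 1.
 */
static unsigned int spe_events_to_types(uint64_t payload, int payload_len)
{
	unsigned int types = 0;

	if (payload & (1ULL << EV_TLB_REFILL))
		types |= SPE_TLB_MISS;
	if (payload & (1ULL << EV_MISPRED))
		types |= SPE_BRANCH_MISS;
	if (payload_len > 1 && (payload & (1ULL << EV_LLC_REFILL)))
		types |= SPE_LLC_MISS;
	if (payload_len > 1 && (payload & (1ULL << EV_REMOTE_ACCESS)))
		types |= SPE_REMOTE_ACCESS;

	return types;
}
```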

Following Jeremy's suggestion, the macro definitions were modified,
and following James' suggestion, the "event:pp" approach is used to
obtain the precise IP of some events on arm64. Currently, only
branch-misses is implemented; support for other events will be added
later.
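For reference, this is how the "branch-misses:pp" event string is expressed
in a struct perf_event_attr from <linux/perf_event.h>: each 'p' modifier
raises precise_ip by one, so "pp" requests precise_ip == 2 ("requested to
have 0 skid"). This is a minimal sketch for illustration only; how the series
maps such an event onto SPE on arm64 is done in the later patches and is not
reproduced here. The function name and the sample_period and sample_type
values are assumptions.

```c
#include <string.h>
#include <linux/perf_event.h>

/*
 * Build an attribute equivalent to "branch-misses:pp": the generic
 * hardware branch-miss event with precise_ip raised to 2.
 */
static struct perf_event_attr branch_misses_pp_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_BRANCH_MISSES;
	attr.sample_period = 1024;	/* illustrative, cf. "-c 1024" */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
	attr.precise_ip = 2;		/* one per 'p' in ":pp" */
	attr.disabled = 1;

	return attr;
}
```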

In addition, we found that when recording large multi-threaded
programs, Ctrl+C could not stop the recording, so two patches were
added to fix this.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>

James Clark (1):
  perf tools: Unset precise_ip when using SPE

Tan Xiaojun (4):
  perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  perf tools: Add support for "report" for some spe events
  perf report: Add --spe options for arm-spe
  perf tools: Support "branch-misses:pp" on arm64

Wei Li (2):
  perf tools: add perf_evlist__terminate() for terminate
  perf tools: arm-spe: fix record hang after being terminated

 tools/perf/Documentation/perf-report.txt      |  10 +
 tools/perf/arch/arm64/util/arm-spe.c          |  10 +-
 tools/perf/builtin-record.c                   |   1 +
 tools/perf/builtin-report.c                   |   5 +
 tools/perf/util/Build                         |   2 +-
 tools/perf/util/arm-spe-decoder/Build         |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 +++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-pkt-decoder.c                     |   0
 .../arm-spe-pkt-decoder.h                     |   2 +
 tools/perf/util/arm-spe.c                     | 789 +++++++++++++++++-
 tools/perf/util/arm-spe.h                     |   3 +
 tools/perf/util/auxtrace.c                    |  49 ++
 tools/perf/util/auxtrace.h                    |  29 +
 tools/perf/util/evlist.c                      |  16 +
 tools/perf/util/evlist.h                      |   1 +
 tools/perf/util/evsel.h                       |   1 +
 tools/perf/util/session.h                     |   2 +
 18 files changed, 1170 insertions(+), 42 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)

-- 
2.25.0



* [PATCH v2 1/7] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
@ 2020-01-23 16:07 ` James Clark
  2020-01-23 16:07 ` [PATCH v2 2/7] perf tools: Add support for "report" for some spe events James Clark
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, Tan Xiaojun, James Clark,
	Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c to this directory. No code changes.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/Build                                       | 2 +-
 tools/perf/util/arm-spe-decoder/Build                       | 1 +
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c | 0
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h | 0
 tools/perf/util/arm-spe.c                                   | 2 +-
 5 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (100%)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 07da6c790b63..0184510083c2 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -104,7 +104,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
 perf-$(CONFIG_AUXTRACE) += intel-bts.o
 perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
 perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o
 
 ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 000000000000..16efbc245028
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.c
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.h
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 53be12b23ff4..f3382a38d48e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -23,7 +23,7 @@
 #include "debug.h"
 #include "auxtrace.h"
 #include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
 struct arm_spe {
 	struct auxtrace			auxtrace;
-- 
2.25.0



* [PATCH v2 2/7] perf tools: Add support for "report" for some spe events
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
  2020-01-23 16:07 ` [PATCH v2 1/7] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
@ 2020-01-23 16:07 ` James Clark
  2020-01-27 12:31   ` Jiri Olsa
  2020-01-23 16:07 ` [PATCH v2 3/7] perf report: Add --spe options for arm-spe James Clark
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, Tan Xiaojun, James Clark,
	Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without further parsing.

This patch improves "perf report" support for SPE by decoding the
raw trace and synthesizing samples from it. Currently, support for
four events is added: llc-miss, tlb-miss, branch-miss and remote-access.
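One decoded SPE record can carry several of these flags at once, and
arm_spe_sample() in this patch emits one synthesized sample per enabled event
whose flag is set. A minimal standalone sketch of that fan-out follows;
struct synth_event and synth_sample_ids() are hypothetical names, and the
real code delivers perf samples through perf_session__deliver_synth_event()
rather than collecting ids.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample-type flags, matching enum arm_spe_sample_type in this patch. */
enum {
	LLC_MISS      = 1 << 0,
	TLB_MISS      = 1 << 1,
	BRANCH_MISS   = 1 << 2,
	REMOTE_ACCESS = 1 << 3,
};

/* Hypothetical per-event configuration: flag, enable bit, sample id. */
struct synth_event {
	unsigned int flag;
	int enabled;
	uint64_t id;
};

/*
 * For one decoded record with type flags "types", store the id of every
 * enabled event whose flag is set into out[] and return how many were
 * stored.  A single record can thus produce several samples.
 */
static size_t synth_sample_ids(unsigned int types,
			       const struct synth_event *ev, size_t nr_ev,
			       uint64_t *out)
{
	size_t i, n = 0;

	for (i = 0; i < nr_ev; i++)
		if (ev[i].enabled && (types & ev[i].flag))
			out[n++] = ev[i].id;

	return n;
}
```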

Example usage:

$ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

$ ./perf report -i perf-armspe-dd.data --stdio
--------------------------------------------------------------------
...
 # Samples: 23  of event 'llc-miss'
 # Event count (approx.): 23
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
     6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
     3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
     3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
     3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
...
 # Samples: 3  of event 'tlb-miss'
 # Event count (approx.): 3
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
    33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
    33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
...
 # Samples: 20  of event 'branch-miss'
 # Event count (approx.): 20
...
    15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
     7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
     7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
     7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
     7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
...
 # Samples: 5  of event 'remote-access'
 # Event count (approx.): 5
...
    27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
     5.56%     5.56%  dd       ld-2.28.so         [.] dl_main

--------------------------------------------------------------------
More analysis and processing of the raw SPE data will be added
later.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c                   |   5 +
 tools/perf/util/arm-spe-decoder/Build         |   2 +-
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  49 ++
 tools/perf/util/auxtrace.h                    |  29 +
 tools/perf/util/session.h                     |   2 +
 9 files changed, 1087 insertions(+), 38 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 9483b3f0cae3..80513dd57fe3 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1073,6 +1073,7 @@ int cmd_report(int argc, const char **argv)
 {
 	struct perf_session *session;
 	struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
+	struct arm_spe_synth_opts arm_spe_synth_opts = { .set = 0, };
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
@@ -1237,6 +1238,9 @@ int cmd_report(int argc, const char **argv)
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options\n" ITRACE_HELP,
 			    itrace_parse_synth_opts),
+	OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
+			    "ARM SPE Tracing options",
+			    arm_spe_parse_synth_opts),
 	OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
 			"Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
@@ -1348,6 +1352,7 @@ int cmd_report(int argc, const char **argv)
 	}
 
 	session->itrace_synth_opts = &itrace_synth_opts;
+	session->arm_spe_synth_opts = &arm_spe_synth_opts;
 
 	report.session = session;
 
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc245028..f8dae13fc876 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 000000000000..50e796b89a95
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm-spe-decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../debug.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+#ifndef BIT
+#define BIT(n)		(1UL << (n))
+#endif
+
+struct arm_spe_decoder {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+	struct arm_spe_state state;
+	const unsigned char *buf;
+	size_t len;
+	uint64_t pos;
+	struct arm_spe_pkt packet;
+	int pkt_step;
+	int pkt_len;
+	int last_packet_type;
+
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t timestamp;
+	uint64_t sample_timestamp;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+	uint64_t ip = (payload & ~(0xffULL << 56));
+
+	/*
+	 * Fill the high 8 bits for kernel virtual addresses. Per the Armv8
+	 * ARM, Xn[55] selects the upper or lower address range for the
+	 * purpose of determining whether address tagging is used.
+	 */
+	if (ip & BIT(55))
+		ip |= (uint64_t)(0xffULL << 56);
+
+	return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+	struct arm_spe_decoder *decoder;
+
+	if (!params->get_trace)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct arm_spe_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->data               = params->data;
+
+	return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+	free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	pr_debug("ERROR: Bad packet\n");
+
+	return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+	struct arm_spe_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	pr_debug("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		pr_debug("No more data\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+	return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+	int ret;
+
+	decoder->last_packet_type = decoder->packet.type;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+
+		if (!decoder->len) {
+			ret = arm_spe_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = arm_spe_get_packet(decoder->buf, decoder->len,
+				&decoder->packet);
+		if (ret <= 0)
+			return arm_spe_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+	} while (decoder->packet.type == ARM_SPE_PAD);
+
+	return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+	int err;
+	int idx;
+	uint64_t payload;
+
+	while (1) {
+		err = arm_spe_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		idx = decoder->packet.index;
+		payload = decoder->packet.payload;
+
+		switch (decoder->packet.type) {
+		case ARM_SPE_TIMESTAMP:
+			decoder->sample_timestamp = payload;
+			return 0;
+		case ARM_SPE_END:
+			decoder->sample_timestamp = 0;
+			return 0;
+		case ARM_SPE_ADDRESS:
+			decoder->ip = arm_spe_calc_ip(payload);
+			if (idx == 0)
+				decoder->state.from_ip = decoder->ip;
+			else if (idx == 1)
+				decoder->state.to_ip = decoder->ip;
+			break;
+		case ARM_SPE_COUNTER:
+			break;
+		case ARM_SPE_CONTEXT:
+			break;
+		case ARM_SPE_OP_TYPE:
+			break;
+		case ARM_SPE_EVENTS:
+			if (payload & BIT(EV_TLB_REFILL))
+				decoder->state.type |= ARM_SPE_TLB_MISS;
+			if (payload & BIT(EV_MISPRED))
+				decoder->state.type |= ARM_SPE_BRANCH_MISS;
+			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
+				decoder->state.type |= ARM_SPE_LLC_MISS;
+			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
+				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
+
+			break;
+		case ARM_SPE_DATA_SOURCE:
+			break;
+		case ARM_SPE_BAD:
+			break;
+		case ARM_SPE_PAD:
+			break;
+		default:
+			pr_err("Get Packet Error!\n");
+			return -ENOSYS;
+		}
+	}
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = 0;
+
+	err = arm_spe_walk_trace(decoder);
+	/* Store err unconditionally so a stale error is not reported again */
+	decoder->state.err = err;
+
+	decoder->state.timestamp = decoder->sample_timestamp;
+
+	return &decoder->state;
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 000000000000..330f9e1e71ab
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * arm-spe-decoder.h: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_events {
+	EV_EXCEPTION_GEN,
+	EV_RETIRED,
+	EV_L1D_ACCESS,
+	EV_L1D_REFILL,
+	EV_TLB_ACCESS,
+	EV_TLB_REFILL,
+	EV_NOT_TAKEN,
+	EV_MISPRED,
+	EV_LLC_ACCESS,
+	EV_LLC_REFILL,
+	EV_REMOTE_ACCESS,
+};
+
+enum arm_spe_sample_type {
+	ARM_SPE_LLC_MISS	= 1 << 0,
+	ARM_SPE_TLB_MISS	= 1 << 1,
+	ARM_SPE_BRANCH_MISS	= 1 << 2,
+	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
+	ARM_SPE_EX_STOP		= 1 << 6,
+};
+
+struct arm_spe_state {
+	enum arm_spe_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+	const unsigned char *buf;
+	size_t len;
+	uint64_t offset;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef65113f..865d1e35b401 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
 #define ARM_SPE_NEED_MORE_BYTES		-1
 #define ARM_SPE_BAD_PACKET		-2
 
+#define ARM_SPE_PKT_MAX_SZ		16
+
 enum arm_spe_pkt_type {
 	ARM_SPE_BAD,
 	ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f3382a38d48e..c99814c58745 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -16,34 +16,68 @@
 #include <linux/log2.h>
 #include <linux/zalloc.h>
 
+#include "auxtrace.h"
 #include "color.h"
+#include "debug.h"
 #include "evsel.h"
+#include "evlist.h"
 #include "machine.h"
 #include "session.h"
-#include "debug.h"
-#include "auxtrace.h"
+#include "symbol.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "tool.h"
+#include "util/synthetic-events.h"
+
 #include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#define MAX_TIMESTAMP (~0ULL)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
 	struct auxtrace_heap		heap;
+	struct arm_spe_synth_opts	synth_opts;
 	u32				auxtrace_type;
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+
+	u8				timeless_decoding;
+	u8				data_queued;
+
+	u8				sample_llc_miss;
+	u8				sample_tlb_miss;
+	u8				sample_branch_miss;
+	u8				sample_remote_access;
+	u64				llc_miss_id;
+	u64				tlb_miss_id;
+	u64				branch_miss_id;
+	u64				remote_access_id;
+	u64				kernel_start;
+
+	unsigned long			num_events;
 };
 
 struct arm_spe_queue {
-	struct arm_spe		*spe;
-	unsigned int		queue_nr;
-	struct auxtrace_buffer	*buffer;
-	bool			on_heap;
-	bool			done;
-	pid_t			pid;
-	pid_t			tid;
-	int			cpu;
+	struct arm_spe			*spe;
+	unsigned int			queue_nr;
+	struct auxtrace_buffer		*buffer;
+	struct auxtrace_buffer		*old_buffer;
+	union perf_event		*event_buf;
+	bool				on_heap;
+	bool				done;
+	pid_t				pid;
+	pid_t				tid;
+	int				cpu;
+	void				*decoder;
+	const struct arm_spe_state	*state;
+	u64				time;
+	u64				timestamp;
+	struct thread			*thread;
+	bool				have_sample;
 };
 
 static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
 	arm_spe_dump(spe, buf, len);
 }
 
-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
-				 union perf_event *event __maybe_unused,
-				 struct perf_sample *sample __maybe_unused,
-				 struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
+{
+	struct arm_spe_queue *speq = data;
+	struct auxtrace_buffer *buffer = speq->buffer;
+	struct auxtrace_buffer *old_buffer = speq->old_buffer;
+	struct auxtrace_queue *queue;
+
+	queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	/* If no more data, drop the previous auxtrace_buffer and return */
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	speq->buffer = buffer;
+
+	/* If the aux_buffer doesn't have data associated, try to load it */
+	if (!buffer->data) {
+		/* get the file desc associated with the perf data file */
+		int fd = perf_data__fd(speq->spe->session->data);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+
+	b->ref_timestamp = buffer->reference;
+
+	if (b->len) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		speq->old_buffer = buffer;
+	} else {
+		auxtrace_buffer__drop_data(buffer);
+		return arm_spe_get_trace(b, data);
+	}
+
+	return 0;
+}
+
+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+		unsigned int queue_nr)
+{
+	struct arm_spe_params params = { .get_trace = 0, };
+	struct arm_spe_queue *speq;
+
+	speq = zalloc(sizeof(*speq));
+	if (!speq)
+		return NULL;
+
+	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!speq->event_buf)
+		goto out_free;
+
+	speq->spe = spe;
+	speq->queue_nr = queue_nr;
+	speq->pid = -1;
+	speq->tid = -1;
+	speq->cpu = -1;
+
+	/* params set */
+	params.get_trace = arm_spe_get_trace;
+	params.data = speq;
+
+	/* create new decoder */
+	speq->decoder = arm_spe_decoder_new(&params);
+	if (!speq->decoder)
+		goto out_free;
+
+	return speq;
+
+out_free:
+	zfree(&speq->event_buf);
+	free(speq);
+
+	return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+	return ip >= spe->kernel_start ?
+		PERF_RECORD_MISC_KERNEL :
+		PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+				struct arm_spe_queue *speq,
+				union perf_event *event,
+				struct perf_sample *sample)
+{
+	if (!spe->timeless_decoding)
+		sample->time = speq->timestamp;
+
+	sample->ip = speq->state->from_ip;
+	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+	sample->pid = speq->pid;
+	sample->tid = speq->tid;
+	sample->addr = speq->state->to_ip;
+	sample->period = 1;
+	sample->cpu = speq->cpu;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+			    struct arm_spe_queue *speq __maybe_unused,
+			    union perf_event *event,
+			    struct perf_sample *sample)
+{
+	int ret;
+
+	ret = perf_session__deliver_synth_event(spe->session, event, sample);
+	if (ret)
+		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+	return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+				u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!speq->have_sample)
+		return 0;
+
+	speq->have_sample = false;
+
+	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq,
+						      spe->branch_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!spe->kernel_start)
+		spe->kernel_start = machine__kernel_start(spe->machine);
+
+	while (1) {
+		err = arm_spe_sample(speq);
+		if (err)
+			return err;
+
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("No data or all data has been processed.\n");
+				return 1;
+			}
+			continue;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+			*timestamp = speq->timestamp;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct arm_spe_queue *speq = queue->priv;
+
+	if (list_empty(&queue->head) || speq)
+		return 0;
+
+	speq = arm_spe__alloc_queue(spe, queue_nr);
+
+	if (!speq)
+		return -ENOMEM;
+
+	queue->priv = speq;
+
+	if (queue->cpu != -1)
+		speq->cpu = queue->cpu;
+
+	if (!speq->on_heap) {
+		const struct arm_spe_state *state;
+		int ret;
+
+		if (spe->timeless_decoding)
+			return 0;
+
+retry:
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("queue %u has no timestamp\n",
+						queue_nr);
+				return 0;
+			}
+			goto retry;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+		if (ret)
+			return ret;
+		speq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
 {
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < spe->queues.nr_queues; i++) {
+		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+	if (spe->queues.new_data) {
+		spe->queues.new_data = false;
+		return arm_spe__setup_queues(spe);
+	}
+
 	return 0;
 }
 
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = spe->session->evlist;
+	bool timeless_decoding = true;
+
+	/*
+	 * Cycle through the list of events and disable timeless decoding
+	 * if we find one with the time bit set.
+	 */
+	evlist__for_each_entry(evlist, evsel) {
+		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
+			timeless_decoding = false;
+	}
+
+	return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+				    struct auxtrace_queue *queue)
+{
+	struct arm_spe_queue *speq = queue->priv;
+	pid_t tid;
+
+	tid = machine__get_current_tid(spe->machine, speq->cpu);
+	if (tid != -1) {
+		speq->tid = tid;
+		thread__zput(speq->thread);
+	} else
+		speq->tid = queue->tid;
+
+	if ((!speq->thread) && (speq->tid != -1)) {
+		speq->thread = machine__find_thread(spe->machine, -1,
+						    speq->tid);
+	}
+
+	if (speq->thread) {
+		speq->pid = speq->thread->pid_;
+		if (queue->cpu == -1)
+			speq->cpu = speq->thread->cpu;
+	}
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct arm_spe_queue *speq;
+
+		if (!spe->heap.heap_cnt)
+			return 0;
+
+		if (spe->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = spe->heap.heap_array[0].queue_nr;
+		queue = &spe->queues.queue_array[queue_nr];
+		speq = queue->priv;
+
+		auxtrace_heap__pop(&spe->heap);
+
+		if (spe->heap.heap_cnt) {
+			ts = spe->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		arm_spe_set_pid_tid_cpu(spe, queue);
+
+		ret = arm_spe_run_decoder(speq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			speq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &spe->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+		struct arm_spe_queue *speq = queue->priv;
+
+		if (speq && (tid == -1 || speq->tid == tid)) {
+			speq->time = time_;
+			arm_spe_set_pid_tid_cpu(spe, queue);
+			arm_spe_run_decoder(speq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int arm_spe_process_event(struct perf_session *session,
+				 union perf_event *event,
+				 struct perf_sample *sample,
+				 struct perf_tool *tool)
+{
+	int err = 0;
+	u64 timestamp;
+	struct arm_spe *spe = container_of(session->auxtrace,
+			struct arm_spe, auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("ARM SPE Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || spe->timeless_decoding) {
+		err = arm_spe__update_queues(spe);
+		if (err)
+			return err;
+	}
+
+	if (spe->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_timeless_queues(spe,
+					event->fork.tid,
+					sample->time);
+		}
+	} else if (timestamp) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_queues(spe, timestamp);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 static int arm_spe_process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
-	struct auxtrace_buffer *buffer;
-	off_t data_offset;
-	int fd = perf_data__fd(session->data);
-	int err;
 
-	if (perf_data__is_pipe(session->data)) {
-		data_offset = 0;
-	} else {
-		data_offset = lseek(fd, 0, SEEK_CUR);
-		if (data_offset == -1)
-			return -errno;
-	}
+	if (!spe->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data__fd(session->data);
+		int err;
 
-	err = auxtrace_queues__add_event(&spe->queues, session, event,
-					 data_offset, &buffer);
-	if (err)
-		return err;
-
-	/* Dump here now we have copied a piped trace out of the pipe */
-	if (dump_trace) {
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			arm_spe_dump_event(spe, buffer->data,
-					     buffer->size);
-			auxtrace_buffer__put_data(buffer);
+		if (perf_data__is_pipe(session->data)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&spe->queues, session, event,
+				data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				arm_spe_dump_event(spe, buffer->data,
+						buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
 		}
 	}
 
@@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
 static int arm_spe_flush(struct perf_session *session __maybe_unused,
 			 struct perf_tool *tool __maybe_unused)
 {
-	return 0;
+	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+			auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = arm_spe__update_queues(spe);
+	if (ret < 0)
+		return ret;
+
+	if (spe->timeless_decoding)
+		return arm_spe_process_timeless_queues(spe, -1,
+				MAX_TIMESTAMP - 1);
+
+	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
 }
 
 static void arm_spe_free_queue(void *priv)
@@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
 
 	if (!speq)
 		return;
+	thread__zput(speq->thread);
+	arm_spe_decoder_free(speq->decoder);
+	zfree(&speq->event_buf);
 	free(speq);
 }
 
@@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
 	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
 }
 
+struct arm_spe_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine __maybe_unused)
+{
+	struct arm_spe_synth *arm_spe_synth =
+		      container_of(tool, struct arm_spe_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(arm_spe_synth->session,
+						 event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct arm_spe_synth arm_spe_synth;
+
+	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+	arm_spe_synth.session = session;
+
+	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+					   &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
+				    const char *name)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.id && evsel->core.id[0] == id) {
+			if (evsel->name)
+				zfree(&evsel->name);
+			evsel->name = strdup(name);
+			break;
+		}
+	}
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+	struct evlist *evlist = session->evlist;
+	struct evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.type == spe->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with ARM SPE trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+		PERF_SAMPLE_PERIOD;
+	if (spe->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->core.attr.exclude_user;
+	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
+	attr.exclude_hv = evsel->core.attr.exclude_hv;
+	attr.exclude_host = evsel->core.attr.exclude_host;
+	attr.exclude_guest = evsel->core.attr.exclude_guest;
+	attr.sample_id_all = evsel->core.attr.sample_id_all;
+	attr.read_format = evsel->core.attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->core.id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	/* spe events set */
+	if (spe->synth_opts.llc_miss) {
+		spe->sample_llc_miss = true;
+
+		/* llc-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->llc_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "llc-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.tlb_miss) {
+		spe->sample_tlb_miss = true;
+
+		/* tlb-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->tlb_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "tlb-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.branch_miss) {
+		spe->sample_branch_miss = true;
+
+		/* branch-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->branch_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "branch-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.remote_access) {
+		spe->sample_remote_access = true;
+
+		/* remote-access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->remote_access_id = id;
+		arm_spe_set_event_name(evlist, id, "remote-access");
+		id += 1;
+	}
+
+	return 0;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
 
+	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 	spe->auxtrace.process_event = arm_spe_process_event;
 	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
 	spe->auxtrace.flush_events = arm_spe_flush;
@@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
 	arm_spe_print_info(&auxtrace_info->priv[0]);
 
+	if (dump_trace)
+		return 0;
+
+	if (session->arm_spe_synth_opts && session->arm_spe_synth_opts->set)
+		spe->synth_opts = *session->arm_spe_synth_opts;
+	else
+		arm_spe_synth_opts__set_default(&spe->synth_opts);
+
+	err = arm_spe_synth_events(spe, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&spe->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (spe->queues.populated)
+		spe->data_queued = true;
+
 	return 0;
 
+err_free_queues:
+	auxtrace_queues__free(&spe->queues);
+	session->auxtrace = NULL;
 err_free:
 	free(spe);
 	return err;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index eb087e7df6f4..eea038591453 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1454,6 +1454,55 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	return -EINVAL;
 }
 
+void arm_spe_synth_opts__set_default(struct arm_spe_synth_opts *synth_opts)
+{
+	synth_opts->llc_miss = true;
+	synth_opts->tlb_miss = true;
+	synth_opts->branch_miss = true;
+	synth_opts->remote_access = true;
+}
+
+int arm_spe_parse_synth_opts(const struct option *opt, const char *str,
+			    int unset __maybe_unused)
+{
+	struct arm_spe_synth_opts *synth_opts = opt->value;
+	const char *p;
+
+	synth_opts->set = true;
+
+	if (!str) {
+		arm_spe_synth_opts__set_default(synth_opts);
+		return 0;
+	}
+
+	for (p = str; *p;) {
+		switch (*p++) {
+		case 'l':
+			synth_opts->llc_miss = true;
+			break;
+		case 't':
+			synth_opts->tlb_miss = true;
+			break;
+		case 'b':
+			synth_opts->branch_miss = true;
+			break;
+		case 'r':
+			synth_opts->remote_access = true;
+			break;
+		case ' ':
+		case ',':
+			break;
+		default:
+			goto out_err;
+		}
+	}
+
+	return 0;
+
+out_err:
+	pr_err("Bad ARM SPE Tracing options '%s'\n", str);
+	return -EINVAL;
+}
 static const char * const auxtrace_error_type_name[] = {
 	[PERF_AUXTRACE_ERROR_ITRACE] = "instruction trace",
 };
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 749d72cd9c7b..b47108599280 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -111,6 +111,22 @@ struct itrace_synth_opts {
 	int			range_num;
 };
 
+/**
+ * struct arm_spe_synth_opts - ARM SPE tracing synthesis options.
+ * @set: indicates whether or not options have been set
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @branch_miss: whether to synthesize branch miss events
+ * @remote_access: whether to synthesize remote access events
+ */
+struct arm_spe_synth_opts {
+	bool			set;
+	bool			llc_miss;
+	bool			tlb_miss;
+	bool			branch_miss;
+	bool			remote_access;
+};
+
 /**
  * struct auxtrace_index_entry - indexes a AUX area tracing event within a
  *                               perf.data file.
@@ -560,6 +576,10 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 				    bool no_sample);
 
+int arm_spe_parse_synth_opts(const struct option *opt, const char *str,
+			    int unset);
+void arm_spe_synth_opts__set_default(struct arm_spe_synth_opts *synth_opts);
+
 size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp);
 void perf_session__auxtrace_error_inc(struct perf_session *session,
 				      union perf_event *event);
@@ -662,6 +682,15 @@ int itrace_parse_synth_opts(const struct option *opt __maybe_unused,
 	return -EINVAL;
 }
 
+static inline
+int arm_spe_parse_synth_opts(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	pr_err("ARM SPE area tracing not supported\n");
+	return -EINVAL;
+}
+
 static inline
 int auxtrace_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
 				    struct record_opts *opts __maybe_unused,
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index f76480166d38..cee134d7643f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -19,6 +19,7 @@ struct thread;
 
 struct auxtrace;
 struct itrace_synth_opts;
+struct arm_spe_synth_opts;
 
 struct perf_session {
 	struct perf_header	header;
@@ -26,6 +27,7 @@ struct perf_session {
 	struct evlist	*evlist;
 	struct auxtrace		*auxtrace;
 	struct itrace_synth_opts *itrace_synth_opts;
+	struct arm_spe_synth_opts *arm_spe_synth_opts;
 	struct list_head	auxtrace_index;
 	struct trace_event	tevent;
 	struct perf_record_time_conv	time_conv;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/7] perf report: Add --spe options for arm-spe
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
  2020-01-23 16:07 ` [PATCH v2 1/7] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
  2020-01-23 16:07 ` [PATCH v2 2/7] perf tools: Add support for "report" for some spe events James Clark
@ 2020-01-23 16:07 ` James Clark
  2020-01-23 16:07 ` [PATCH v2 4/7] perf tools: Support "branch-misses:pp" on arm64 James Clark
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, Tan Xiaojun, James Clark,
	Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

The previous patch added "perf report" support for some arm-spe
events (llc-miss, tlb-miss, branch-miss, remote-access). This patch
documents the corresponding --spe option.
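A minimal, standalone sketch of how an option string such as --spe=lb
maps onto the per-event flags. The names spe_opts and spe_parse_opts
are hypothetical stand-ins for struct arm_spe_synth_opts and
arm_spe_parse_synth_opts() from the previous patch; the accepted
letters and separators follow that parser, but this is an illustration
only, not the perf code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct arm_spe_synth_opts. */
struct spe_opts {
	bool set;
	bool llc_miss;
	bool tlb_miss;
	bool branch_miss;
	bool remote_access;
};

/*
 * Parse an --spe style string such as "ltbr" or "l,b".
 * A NULL string (bare --spe) selects every event, matching the
 * documented default. Returns 0 on success, -EINVAL on an unknown
 * letter.
 */
static int spe_parse_opts(struct spe_opts *o, const char *str)
{
	const char *p;

	assert(o != NULL);
	o->set = true;

	if (!str) {
		o->llc_miss = o->tlb_miss = true;
		o->branch_miss = o->remote_access = true;
		return 0;
	}

	for (p = str; *p; p++) {
		switch (*p) {
		case 'l':
			o->llc_miss = true;
			break;
		case 't':
			o->tlb_miss = true;
			break;
		case 'b':
			o->branch_miss = true;
			break;
		case 'r':
			o->remote_access = true;
			break;
		case ' ':
		case ',':
			/* separators are ignored */
			break;
		default:
			fprintf(stderr, "Bad ARM SPE option '%s'\n", str);
			return -EINVAL;
		}
	}
	return 0;
}
```

With this parsing, --spe=lb synthesizes only llc-miss and branch-miss
events, while a bare --spe enables all four.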

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index db61f16ffa56..01eae9a0ff98 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -468,6 +468,16 @@ include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
+--spe::
+	Options for decoding arm-spe tracing data. The options are:
+
+		l	synthesize llc miss events
+		t	synthesize tlb miss events
+		b	synthesize branch miss events
+		r	synthesize remote access events
+
+	The default is all events, i.e. the same as --spe=ltbr.
+
 --full-source-path::
 	Show the full path for source files for srcline output.
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/7] perf tools: Support "branch-misses:pp" on arm64
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
                   ` (2 preceding siblings ...)
  2020-01-23 16:07 ` [PATCH v2 3/7] perf report: Add --spe options for arm-spe James Clark
@ 2020-01-23 16:07 ` James Clark
  2020-01-27 12:31   ` Jiri Olsa
  2020-01-23 16:07 ` [PATCH v2 5/7] perf tools: add perf_evlist__terminate() for terminate James Clark
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, Tan Xiaojun, James Clark,
	James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

At the suggestion of James Clark, use SPE to provide the precise
ip of some events. Currently the only supported event is
branch-misses.

Example usage:

$ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
(:p, :pp and :ppp behave the same in this case.)

$ ./perf report --stdio
(--stdio is optional.)

--------------------------------------------------------------------
...
 # Samples: 14  of event 'branch-misses:pp'
 # Event count (approx.): 14
 #
 # Children      Self  Command  Shared Object      Symbol
 # ........  ........  .......  .................  ..........................
 #
    14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
    14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
     7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     7.14%     7.14%  dd       ld-2.28.so         [.] check_match
     7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
     7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
...
--------------------------------------------------------------------
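A simplified sketch of the attribute rewrite that
arm_spe_precise_ip_support() performs, useful for seeing the resulting
config values: TS_ENABLE | BRANCH_FILTER is 0x100000001 and
EV_BRANCH is 0x80. The struct spe_attr_view type and the
spe_remap_branch_misses() name are hypothetical; the SPE PMU type is
passed in as a parameter (negative meaning "not found") instead of
being looked up with perf_pmu__find("arm_spe_0"), and the value 5 for
PERF_COUNT_HW_BRANCH_MISSES is hard-coded to avoid depending on
<linux/perf_event.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 1ULL keeps bit 32 well defined even on 32-bit hosts. */
#define SPE_BIT(n)		(1ULL << (n))
#define SPE_ATTR_TS_ENABLE	SPE_BIT(0)
#define SPE_ATTR_BRANCH_FILTER	SPE_BIT(32)
#define SPE_ATTR_EV_BRANCH	SPE_BIT(7)

#define HW_TYPE_HARDWARE	0	/* PERF_TYPE_HARDWARE */
#define HW_BRANCH_MISSES	5	/* PERF_COUNT_HW_BRANCH_MISSES */

/* Hypothetical view of the perf_event_attr fields being touched. */
struct spe_attr_view {
	uint32_t type;
	uint64_t config;
	uint64_t config1;
	unsigned int precise_ip;
};

/* Returns true if the event was redirected to the SPE PMU. */
static bool spe_remap_branch_misses(struct spe_attr_view *a,
				    const char *arch, int spe_pmu_type)
{
	assert(a != NULL && arch != NULL);

	if (strcmp(arch, "arm64") != 0 ||
	    a->type != HW_TYPE_HARDWARE ||
	    a->config != HW_BRANCH_MISSES ||
	    !a->precise_ip || spe_pmu_type < 0)
		return false;

	a->type = (uint32_t)spe_pmu_type;
	/* timestamps on, sample only branch operations */
	a->config = SPE_ATTR_TS_ENABLE | SPE_ATTR_BRANCH_FILTER;
	/* keep only records that carry the branch-mispredict event */
	a->config1 = SPE_ATTR_EV_BRANCH;
	return true;
}
```

Unlike the patch, the sketch also checks that the event type is
PERF_TYPE_HARDWARE, since a config value of 5 is only
PERF_COUNT_HW_BRANCH_MISSES for hardware events.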

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Suggested-by: James Clark <James.Clark@arm.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe.c | 41 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/arm-spe.h |  3 +++
 tools/perf/util/evlist.c  |  2 ++
 3 files changed, 46 insertions(+)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index c99814c58745..0fcaefd386a6 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -35,6 +35,19 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
+#define SPE_ATTR_TS_ENABLE		BIT(0)
+#define SPE_ATTR_PA_ENABLE		BIT(1)
+#define SPE_ATTR_PCT_ENABLE		BIT(2)
+#define SPE_ATTR_JITTER			BIT(16)
+#define SPE_ATTR_BRANCH_FILTER		BIT(32)
+#define SPE_ATTR_LOAD_FILTER		BIT(33)
+#define SPE_ATTR_STORE_FILTER		BIT(34)
+
+#define SPE_ATTR_EV_RETIRED		BIT(1)
+#define SPE_ATTR_EV_CACHE		BIT(3)
+#define SPE_ATTR_EV_TLB			BIT(5)
+#define SPE_ATTR_EV_BRANCH		BIT(7)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
@@ -778,6 +791,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.sample_id_all = evsel->core.attr.sample_id_all;
 	attr.read_format = evsel->core.attr.read_format;
 
+	/* In precise ip mode, there is no need to synthesize
+	 * new events. */
+	if (!strncmp(evsel->name, "branch-misses", 13)) {
+		spe->sample_branch_miss = true;
+		spe->branch_miss_id = evsel->core.id[0];
+
+		return 0;
+	}
+
 	/* create new id val to be a fixed offset from evsel id */
 	id = evsel->core.id[0] + 1000000000;
 
@@ -899,3 +921,22 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	free(spe);
 	return err;
 }
+
+void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel)
+{
+	struct perf_pmu *pmu;
+
+	/* Currently only supports precise_ip for branch-misses on arm64 */
+	if (!strcmp(perf_env__arch(evlist->env), "arm64")
+			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
+			&& evsel->core.attr.precise_ip) {
+		pmu = perf_pmu__find("arm_spe_0");
+		if (pmu) {
+			evsel->pmu_name = pmu->name;
+			evsel->core.attr.type = pmu->type;
+			evsel->core.attr.config = SPE_ATTR_TS_ENABLE
+						| SPE_ATTR_BRANCH_FILTER;
+			evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
+		}
+	}
+}
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 98d3235781c3..8b1fb191d03a 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -20,6 +20,8 @@ enum {
 union perf_event;
 struct perf_session;
 struct perf_pmu;
+struct evlist;
+struct evsel;
 
 struct auxtrace_record *arm_spe_recording_init(int *err,
 					       struct perf_pmu *arm_spe_pmu);
@@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
 
 struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
+void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
 #endif
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1548237b6558..b9c7e5271611 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -9,6 +9,7 @@
 #include <errno.h>
 #include <inttypes.h>
 #include <poll.h>
+#include "arm-spe.h"
 #include "cpumap.h"
 #include "util/mmap.h"
 #include "thread_map.h"
@@ -179,6 +180,7 @@ void perf_evlist__splice_list_tail(struct evlist *evlist,
 	struct evsel *evsel, *temp;
 
 	__evlist__for_each_entry_safe(list, temp, evsel) {
+		arm_spe_precise_ip_support(evlist, evsel);
 		list_del_init(&evsel->core.node);
 		evlist__add(evlist, evsel);
 	}
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/7] perf tools: add perf_evlist__terminate() for terminate
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
                   ` (3 preceding siblings ...)
  2020-01-23 16:07 ` [PATCH v2 4/7] perf tools: Support "branch-misses:pp" on arm64 James Clark
@ 2020-01-23 16:07 ` James Clark
  2020-01-27 12:31   ` Jiri Olsa
  2020-01-23 16:07 ` [PATCH v2 6/7] perf tools: arm-spe: fix record hang after being terminated James Clark
  2020-01-23 16:07 ` [PATCH v2 7/7] perf tools: Unset precise_ip when using SPE James Clark
  6 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, James Clark, Will Deacon,
	Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Tan Xiaojun, Al Grant, Namhyung Kim

From: Wei Li <liwei391@huawei.com>

In __cmd_record(), when SIGINT (Ctrl+C) is received, a done flag is
set and the event list is disabled once by evlist__disable().

However, auxtrace_record.read_finish() enables the related events
again, so if they are continuous, the recording never appears to end.

Mark the evlist's state as terminated, preparing for the following fix.

Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c |  1 +
 tools/perf/util/evlist.c    | 14 ++++++++++++++
 tools/perf/util/evlist.h    |  1 +
 tools/perf/util/evsel.h     |  1 +
 4 files changed, 17 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..e7c917f9534d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1722,6 +1722,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (done && !disabled && !target__none(&opts->target)) {
 			trigger_off(&auxtrace_snapshot_trigger);
 			evlist__disable(rec->evlist);
+			evlist__terminate(rec->evlist);
 			disabled = true;
 		}
 	}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b9c7e5271611..b04794cd8586 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -377,6 +377,20 @@ bool evsel__cpu_iter_skip(struct evsel *ev, int cpu)
 	return true;
 }
 
+void evlist__terminate(struct evlist *evlist)
+{
+	struct evsel *pos;
+
+	evlist__for_each_entry(evlist, pos) {
+		if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+			continue;
+		evsel__disable(pos);
+		pos->terminated = true;
+	}
+
+	evlist->enabled = false;
+}
+
 void evlist__disable(struct evlist *evlist)
 {
 	struct evsel *pos;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f5bd5c386df1..9fbd0ce2a1c4 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -206,6 +206,7 @@ void evlist__munmap(struct evlist *evlist);
 
 size_t evlist__mmap_size(unsigned long pages);
 
+void evlist__terminate(struct evlist *evlist);
 void evlist__disable(struct evlist *evlist);
 void evlist__enable(struct evlist *evlist);
 void perf_evlist__toggle_enable(struct evlist *evlist);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index dc14f4a823cd..8e8a2cb41de8 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -104,6 +104,7 @@ struct evsel {
 		perf_evsel__sb_cb_t	*cb;
 		void			*data;
 	} side_band;
+	bool			terminated;
 };
 
 struct perf_missing_features {
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 6/7] perf tools: arm-spe: fix record hang after being terminated
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
                   ` (4 preceding siblings ...)
  2020-01-23 16:07 ` [PATCH v2 5/7] perf tools: add perf_evlist__terminate() for terminate James Clark
@ 2020-01-23 16:07 ` James Clark
  2020-01-23 16:07 ` [PATCH v2 7/7] perf tools: Unset precise_ip when using SPE James Clark
  6 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, James Clark, Will Deacon,
	Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Tan Xiaojun, Al Grant, Namhyung Kim

From: Wei Li <liwei391@huawei.com>

If the SPE event has been terminated, don't enable it again in
arm_spe_read_finish().

Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/arch/arm64/util/arm-spe.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index eba6541ec0f1..629badda724d 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -165,9 +165,13 @@ static int arm_spe_read_finish(struct auxtrace_record *itr, int idx)
 	struct evsel *evsel;
 
 	evlist__for_each_entry(sper->evlist, evsel) {
-		if (evsel->core.attr.type == sper->arm_spe_pmu->type)
-			return perf_evlist__enable_event_idx(sper->evlist,
-							     evsel, idx);
+		if (evsel->core.attr.type == sper->arm_spe_pmu->type) {
+			if (evsel->terminated)
+				return 0;
+			else
+				return perf_evlist__enable_event_idx(
+						sper->evlist, evsel, idx);
+		}
 	}
 	return -EINVAL;
 }
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 7/7] perf tools: Unset precise_ip when using SPE
  2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
                   ` (5 preceding siblings ...)
  2020-01-23 16:07 ` [PATCH v2 6/7] perf tools: arm-spe: fix record hang after being terminated James Clark
@ 2020-01-23 16:07 ` James Clark
  6 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-01-23 16:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: suzuki.poulose, gengdongjiu, wxf.wang, liwei391, liuqi115,
	huawei.libin, nd, linux-perf-users, James Clark, Will Deacon,
	Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Tan Xiaojun, Al Grant, Namhyung Kim

precise_ip is not supported on Arm, and the kernel may be
updated to reflect this. So unset it when we know SPE can be
used to get precise data instead.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 0fcaefd386a6..0ed2a68db0b3 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -937,6 +937,7 @@ void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel)
 			evsel->core.attr.config = SPE_ATTR_TS_ENABLE
 						| SPE_ATTR_BRANCH_FILTER;
 			evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
+			evsel->core.attr.precise_ip = 0;
 		}
 	}
 }
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 5/7] perf tools: add perf_evlist__terminate() for terminate
  2020-01-23 16:07 ` [PATCH v2 5/7] perf tools: add perf_evlist__terminate() for terminate James Clark
@ 2020-01-27 12:31   ` Jiri Olsa
  2020-02-07 15:21     ` [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip James Clark
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Olsa @ 2020-01-27 12:31 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, suzuki.poulose, gengdongjiu,
	wxf.wang, liwei391, liuqi115, huawei.libin, nd, linux-perf-users,
	Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Tan Xiaojun,
	Al Grant, Namhyung Kim

On Thu, Jan 23, 2020 at 04:07:32PM +0000, James Clark wrote:
> From: Wei Li <liwei391@huawei.com>
> 
> In __cmd_record(), when SIGINT (Ctrl+C) is received, a done flag is
> set and the event list is disabled once by perf_evlist__disable().
> 
> However, auxtrace_record.read_finish() then enables the related events
> again; if they are continuous, the recording appears to never end.
> 
> Mark the evlist's state as terminated, preparing for the following fix.
> 
> Signed-off-by: Wei Li <liwei391@huawei.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-record.c |  1 +
>  tools/perf/util/evlist.c    | 14 ++++++++++++++
>  tools/perf/util/evlist.h    |  1 +
>  tools/perf/util/evsel.h     |  1 +
>  4 files changed, 17 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..e7c917f9534d 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1722,6 +1722,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  		if (done && !disabled && !target__none(&opts->target)) {
>  			trigger_off(&auxtrace_snapshot_trigger);
>  			evlist__disable(rec->evlist);
> +			evlist__terminate(rec->evlist);
>  			disabled = true;
>  		}
>  	}
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index b9c7e5271611..b04794cd8586 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -377,6 +377,20 @@ bool evsel__cpu_iter_skip(struct evsel *ev, int cpu)
>  	return true;
>  }
>  
> +void evlist__terminate(struct evlist *evlist)
> +{
> +	struct evsel *pos;
> +
> +	evlist__for_each_entry(evlist, pos) {
> +		if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
> +			continue;
> +		evsel__disable(pos);
> +		pos->terminated = true;
> +	}

how is this different from evlist__disable? other than it does not
follow the cpu affinity ;-) can't you just call evlist__disable and
check later on evlist->enabled instead of evlist->terminated?
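A minimal sketch of the alternative suggested above, assuming (as the question implies) that evlist__disable() clears evlist->enabled. All types and helpers here are hypothetical, simplified stand-ins for perf's evlist/evsel and for perf_evlist__enable_event_idx(); only the control flow is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for perf's evsel/evlist. */
struct evsel_sketch {
	unsigned int pmu_type;
};

struct evlist_sketch {
	bool enabled;			/* cleared by the disable on Ctrl+C */
	struct evsel_sketch *entries;
	int nr_entries;
	int reenable_count;		/* counts calls to the re-enable stub */
};

static void evlist_disable_sketch(struct evlist_sketch *evlist)
{
	evlist->enabled = false;
}

/* Stub for perf_evlist__enable_event_idx(); just records the call. */
static int enable_event_idx_stub(struct evlist_sketch *evlist)
{
	evlist->reenable_count++;
	return 0;
}

static int read_finish_sketch(struct evlist_sketch *evlist, unsigned int spe_type)
{
	for (int i = 0; i < evlist->nr_entries; i++) {
		if (evlist->entries[i].pmu_type != spe_type)
			continue;
		/* Once the whole list has been disabled, do not turn SPE back on. */
		if (!evlist->enabled)
			return 0;
		return enable_event_idx_stub(evlist);
	}
	return -EINVAL;
}
```

With this shape no per-evsel "terminated" flag is needed: the existing list-level state already tells read_finish() that recording is being torn down.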

jirka


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/7] perf tools: Support "branch-misses:pp" on arm64
  2020-01-23 16:07 ` [PATCH v2 4/7] perf tools: Support "branch-misses:pp" on arm64 James Clark
@ 2020-01-27 12:31   ` Jiri Olsa
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Olsa @ 2020-01-27 12:31 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, suzuki.poulose, gengdongjiu,
	wxf.wang, liwei391, liuqi115, huawei.libin, nd, linux-perf-users,
	Tan Xiaojun, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Al Grant, Namhyung Kim

On Thu, Jan 23, 2020 at 04:07:31PM +0000, James Clark wrote:

SNIP

> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1548237b6558..b9c7e5271611 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -9,6 +9,7 @@
>  #include <errno.h>
>  #include <inttypes.h>
>  #include <poll.h>
> +#include "arm-spe.h"
>  #include "cpumap.h"
>  #include "util/mmap.h"
>  #include "thread_map.h"
> @@ -179,6 +180,7 @@ void perf_evlist__splice_list_tail(struct evlist *evlist,
>  	struct evsel *evsel, *temp;
>  
>  	__evlist__for_each_entry_safe(list, temp, evsel) {
> +		arm_spe_precise_ip_support(evlist, evsel);

this is a 'splice' function, you can't configure precise in here

do you need this 'config thing' to be executed on arm only?

if yes, please add something like arch_evsel__config, make it
weak for generic code and define it for arm

if no, just add the call at the end of perf_evsel__config I guess

thanks,
jirka

>  		list_del_init(&evsel->core.node);
>  		evlist__add(evlist, evsel);
>  	}
> -- 
> 2.25.0
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/7] perf tools: Add support for "report" for some spe events
  2020-01-23 16:07 ` [PATCH v2 2/7] perf tools: Add support for "report" for some spe events James Clark
@ 2020-01-27 12:31   ` Jiri Olsa
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Olsa @ 2020-01-27 12:31 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, suzuki.poulose, gengdongjiu,
	wxf.wang, liwei391, liuqi115, huawei.libin, nd, linux-perf-users,
	Tan Xiaojun, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Al Grant, Namhyung Kim

On Thu, Jan 23, 2020 at 04:07:29PM +0000, James Clark wrote:

SNIP

> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 749d72cd9c7b..b47108599280 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -111,6 +111,22 @@ struct itrace_synth_opts {
>  	int			range_num;
>  };
>  
> +/**
> + * struct arm_spe_synth_opts - ARM SPE tracing synthesis options.
> + * @set: indicates whether or not options have been set
> + * @llc_miss: whether to synthesize last level cache miss events
> + * @tlb_miss: whether to synthesize TLB miss events
> + * @branch_miss: whether to synthesize Branch miss events
> + * @remote_access: whether to synthesize Remote access events
> + */
> +struct arm_spe_synth_opts {
> +	bool			set;
> +	bool			llc_miss;
> +	bool			tlb_miss;
> +	bool			branch_miss;
> +	bool			remote_access;

hum, why don't you add that to itrace_synth_opts instead? seems generic enough

I don't follow the code much, but I assume itrace_synth_opts is a generic object,
as it's already used by some of the s390 and x86 code. also I don't like the new
pointer to arm_spe_synth_opts in perf_session

jirka

> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index f76480166d38..cee134d7643f 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -19,6 +19,7 @@ struct thread;
>  
>  struct auxtrace;
>  struct itrace_synth_opts;
> +struct arm_spe_synth_opts;
>  
>  struct perf_session {
>  	struct perf_header	header;
> @@ -26,6 +27,7 @@ struct perf_session {
>  	struct evlist	*evlist;
>  	struct auxtrace		*auxtrace;
>  	struct itrace_synth_opts *itrace_synth_opts;
> +	struct arm_spe_synth_opts *arm_spe_synth_opts;
>  	struct list_head	auxtrace_index;
>  	struct trace_event	tevent;
>  	struct perf_record_time_conv	time_conv;
> -- 
> 2.25.0
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip
  2020-01-27 12:31   ` Jiri Olsa
@ 2020-02-07 15:21     ` James Clark
  2020-02-07 15:21       ` [PATCH v3 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
                         ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: James Clark @ 2020-02-07 15:21 UTC (permalink / raw)
  To: jolsa, liwei391, linux-arm-kernel, linux-kernel
  Cc: nd, James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Tan Xiaojun, Al Grant, Namhyung Kim

Hi Jirka,

Thanks for the feedback. I've made the following changes:

Removed the arm_spe_synth_opts struct and added the new types into
itrace_synth_opts. I could reuse the existing branch option, but the
other ones are new. The --itrace argument documentation has been
updated accordingly.

I've removed the processing of the evlist from perf_evlist__splice_list_tail
and put it in a weak function, "auxtrace__preprocess_evlist", whose
Arm implementation is only built on Arm.
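The weak-default pattern can be sketched as below. The signature, including the int return, is hypothetical since the patch body is not shown in this message; perf code usually spells the attribute with the `__weak` macro from tools/include/linux/compiler.h, which expands to `__attribute__((weak))`. The strong Arm definition would presumably live under tools/perf/arch/arm/util/auxtrace.c, per the diffstat.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque here; in perf the real definition lives in util/evlist.h. */
struct evlist;

/*
 * Generic no-op default. Because the symbol is weak, a strong definition
 * with the same name (for example one compiled only into the Arm build)
 * replaces it at link time; every other architecture keeps this version.
 * Hypothetical signature: the real prototype is not shown here.
 */
__attribute__((weak)) int auxtrace__preprocess_evlist(struct evlist *evlist)
{
	(void)evlist;
	return 0;
}

/* A caller in generic code needs no #ifdef for the architecture. */
static int record__prepare_sketch(struct evlist *evlist)
{
	return auxtrace__preprocess_evlist(evlist);
}
```

The design choice is that generic code always calls the hook unconditionally; selecting the architecture-specific behaviour is left to the linker rather than to preprocessor conditionals.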

I have removed the two patches about the hang on termination because
I haven't been able to reproduce the hang, and everything is working
OK for me. @Wei Li, are you able to post the steps required to reproduce it?

Thanks
James

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>

Tan Xiaojun (4):
  perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  perf tools: Add support for "report" for some spe events
  perf report: Add SPE options to --itrace argument
  perf tools: Support "branch-misses:pp" on arm64

 tools/perf/Documentation/itrace.txt           |   5 +-
 tools/perf/arch/arm/util/auxtrace.c           |  38 +
 tools/perf/builtin-record.c                   |   5 +
 tools/perf/util/Build                         |   2 +-
 tools/perf/util/arm-spe-decoder/Build         |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-pkt-decoder.c                     |   0
 .../arm-spe-pkt-decoder.h                     |   2 +
 tools/perf/util/arm-spe.c                     | 756 +++++++++++++++++-
 tools/perf/util/arm-spe.h                     |   3 +
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |  14 +-
 tools/perf/util/evlist.c                      |   1 +
 tools/perf/util/evsel.h                       |   1 -
 15 files changed, 1090 insertions(+), 42 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  2020-02-07 15:21     ` [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip James Clark
@ 2020-02-07 15:21       ` James Clark
  2020-02-07 15:21       ` [PATCH v3 2/4] perf tools: Add support for "report" for some spe events James Clark
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-02-07 15:21 UTC (permalink / raw)
  To: jolsa, liwei391, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c to this directory. No code changes.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/Build                                       | 2 +-
 tools/perf/util/arm-spe-decoder/Build                       | 1 +
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c | 0
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h | 0
 tools/perf/util/arm-spe.c                                   | 2 +-
 5 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (100%)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 07da6c790b63..0184510083c2 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -104,7 +104,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
 perf-$(CONFIG_AUXTRACE) += intel-bts.o
 perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
 perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o
 
 ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 000000000000..16efbc245028
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.c
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.h
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 53be12b23ff4..f3382a38d48e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -23,7 +23,7 @@
 #include "debug.h"
 #include "auxtrace.h"
 #include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
 struct arm_spe {
 	struct auxtrace			auxtrace;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 2/4] perf tools: Add support for "report" for some spe events
  2020-02-07 15:21     ` [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip James Clark
  2020-02-07 15:21       ` [PATCH v3 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
@ 2020-02-07 15:21       ` James Clark
  2020-02-07 15:21       ` [PATCH v3 3/4] perf report: Add SPE options to --itrace argument James Clark
  2020-02-07 15:21       ` [PATCH v3 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-02-07 15:21 UTC (permalink / raw)
  To: jolsa, liwei391, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without parsing.

This patch improves the "perf report" support for SPE by further
processing the data. Currently, support is added for four events:
llc-miss, tlb-miss, branch-miss, and remote-access.
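The mapping from an SPE Events packet to these four sample types, as done in the ARM_SPE_EVENTS case of arm_spe_walk_trace() in this patch, can be sketched standalone. The bit positions and flag values are copied from the enums in arm-spe-decoder.h; as in the decoder, the LLC and remote-access bits are only consulted when the packet index is greater than 1, presumably because bits 9 and 10 lie beyond the first payload byte. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from enum arm_spe_events in arm-spe-decoder.h. */
enum { EV_TLB_REFILL = 5, EV_MISPRED = 7, EV_LLC_REFILL = 9, EV_REMOTE_ACCESS = 10 };

/* Flags from enum arm_spe_sample_type in arm-spe-decoder.h. */
enum {
	ARM_SPE_LLC_MISS      = 1 << 0,
	ARM_SPE_TLB_MISS      = 1 << 1,
	ARM_SPE_BRANCH_MISS   = 1 << 2,
	ARM_SPE_REMOTE_ACCESS = 1 << 3,
};

#define SPE_BIT(n) (1ULL << (n))

/* Hypothetical helper: translate one Events packet into sample-type flags. */
static unsigned int spe_events_to_types(uint64_t payload, int index)
{
	unsigned int type = 0;

	if (payload & SPE_BIT(EV_TLB_REFILL))
		type |= ARM_SPE_TLB_MISS;
	if (payload & SPE_BIT(EV_MISPRED))
		type |= ARM_SPE_BRANCH_MISS;
	/* LLC and remote-access bits are only trusted for wider packets. */
	if (index > 1 && (payload & SPE_BIT(EV_LLC_REFILL)))
		type |= ARM_SPE_LLC_MISS;
	if (index > 1 && (payload & SPE_BIT(EV_REMOTE_ACCESS)))
		type |= ARM_SPE_REMOTE_ACCESS;

	return type;
}
```

A single record can therefore produce several samples at once, e.g. a mispredicted branch that also missed the LLC yields both branch-miss and llc-miss.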

Example usage:

$ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

$ ./perf report -i perf-armspe-dd.data --stdio
--------------------------------------------------------------------
...
 # Samples: 23  of event 'llc-miss'
 # Event count (approx.): 23
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
     6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
     3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
     3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
     3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
...
 # Samples: 3  of event 'tlb-miss'
 # Event count (approx.): 3
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
    33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
    33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
...
 # Samples: 20  of event 'branch-miss'
 # Event count (approx.): 20
...
    15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
     7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
     7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
     7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
     7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
...
 # Samples: 5  of event 'remote-access'
 # Event count (approx.): 5
...
    27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
     5.56%     5.56%  dd       ld-2.28.so         [.] dl_main

--------------------------------------------------------------------
Further analysis and processing of the SPE raw data will be done
in later patches.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe-decoder/Build         |   2 +-
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |   8 +-
 7 files changed, 1022 insertions(+), 39 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc245028..f8dae13fc876 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 000000000000..50e796b89a95
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../debug.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+#ifndef BIT
+#define BIT(n)		(1UL << (n))
+#endif
+
+struct arm_spe_decoder {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+	struct arm_spe_state state;
+	const unsigned char *buf;
+	size_t len;
+	uint64_t pos;
+	struct arm_spe_pkt packet;
+	int pkt_step;
+	int pkt_len;
+	int last_packet_type;
+
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t timestamp;
+	uint64_t sample_timestamp;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+	uint64_t ip = (payload & ~(0xffULL << 56));
+
+	/* fill high 8 bits for kernel virtual address */
+	/* In Armv8 Architecture Reference Manual: Xn[55] determines
+	 * whether the address lies in the upper or lower address range
+	 * for the purpose of determining whether address tagging is
+	 * used */
+	if (ip & BIT(55))
+		ip |= (uint64_t)(0xffULL << 56);
+
+	return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+	struct arm_spe_decoder *decoder;
+
+	if (!params->get_trace)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct arm_spe_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->data               = params->data;
+
+	return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+	free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	pr_debug("ERROR: Bad packet\n");
+
+	return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+	struct arm_spe_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	pr_debug("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		pr_debug("No more data\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+	return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+	int ret;
+
+	decoder->last_packet_type = decoder->packet.type;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+
+		if (!decoder->len) {
+			ret = arm_spe_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = arm_spe_get_packet(decoder->buf, decoder->len,
+				&decoder->packet);
+		if (ret <= 0)
+			return arm_spe_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+	} while (decoder->packet.type == ARM_SPE_PAD);
+
+	return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+	int err;
+	int idx;
+	uint64_t payload;
+
+	while (1) {
+		err = arm_spe_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		idx = decoder->packet.index;
+		payload = decoder->packet.payload;
+
+		switch (decoder->packet.type) {
+		case ARM_SPE_TIMESTAMP:
+			decoder->sample_timestamp = payload;
+			return 0;
+		case ARM_SPE_END:
+			decoder->sample_timestamp = 0;
+			return 0;
+		case ARM_SPE_ADDRESS:
+			decoder->ip = arm_spe_calc_ip(payload);
+			if (idx == 0)
+				decoder->state.from_ip = decoder->ip;
+			else if (idx == 1)
+				decoder->state.to_ip = decoder->ip;
+			break;
+		case ARM_SPE_COUNTER:
+			break;
+		case ARM_SPE_CONTEXT:
+			break;
+		case ARM_SPE_OP_TYPE:
+			break;
+		case ARM_SPE_EVENTS:
+			if (payload & BIT(EV_TLB_REFILL))
+				decoder->state.type |= ARM_SPE_TLB_MISS;
+			if (payload & BIT(EV_MISPRED))
+				decoder->state.type |= ARM_SPE_BRANCH_MISS;
+			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
+				decoder->state.type |= ARM_SPE_LLC_MISS;
+			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
+				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
+
+			break;
+		case ARM_SPE_DATA_SOURCE:
+			break;
+		case ARM_SPE_BAD:
+			break;
+		case ARM_SPE_PAD:
+			break;
+		default:
+			pr_err("Get Packet Error!\n");
+			return -ENOSYS;
+		}
+	}
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = 0;
+
+	err = arm_spe_walk_trace(decoder);
+	if (err)
+		decoder->state.err = err;
+
+	decoder->state.timestamp = decoder->sample_timestamp;
+
+	return &decoder->state;
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 000000000000..330f9e1e71ab
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.c: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_events {
+	EV_EXCEPTION_GEN,
+	EV_RETIRED,
+	EV_L1D_ACCESS,
+	EV_L1D_REFILL,
+	EV_TLB_ACCESS,
+	EV_TLB_REFILL,
+	EV_NOT_TAKEN,
+	EV_MISPRED,
+	EV_LLC_ACCESS,
+	EV_LLC_REFILL,
+	EV_REMOTE_ACCESS,
+};
+
+enum arm_spe_sample_type {
+	ARM_SPE_LLC_MISS	= 1 << 0,
+	ARM_SPE_TLB_MISS	= 1 << 1,
+	ARM_SPE_BRANCH_MISS	= 1 << 2,
+	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
+	ARM_SPE_EX_STOP		= 1 << 6,
+};
+
+struct arm_spe_state {
+	enum arm_spe_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+	const unsigned char *buf;
+	size_t len;
+	u64 offset;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef65113f..865d1e35b401 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
 #define ARM_SPE_NEED_MORE_BYTES		-1
 #define ARM_SPE_BAD_PACKET		-2
 
+#define ARM_SPE_PKT_MAX_SZ		16
+
 enum arm_spe_pkt_type {
 	ARM_SPE_BAD,
 	ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f3382a38d48e..4ef22a0775a9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -16,34 +16,68 @@
 #include <linux/log2.h>
 #include <linux/zalloc.h>
 
+#include "auxtrace.h"
 #include "color.h"
+#include "debug.h"
 #include "evsel.h"
+#include "evlist.h"
 #include "machine.h"
 #include "session.h"
-#include "debug.h"
-#include "auxtrace.h"
+#include "symbol.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "tool.h"
+#include "util/synthetic-events.h"
+
 #include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#define MAX_TIMESTAMP (~0ULL)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
 	struct auxtrace_heap		heap;
+        struct itrace_synth_opts        synth_opts;
 	u32				auxtrace_type;
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+
+	u8				timeless_decoding;
+	u8				data_queued;
+
+	u8				sample_llc_miss;
+	u8				sample_tlb_miss;
+	u8				sample_branch_miss;
+	u8				sample_remote_access;
+	u64				llc_miss_id;
+	u64				tlb_miss_id;
+	u64				branch_miss_id;
+	u64				remote_access_id;
+	u64				kernel_start;
+
+	unsigned long			num_events;
 };
 
 struct arm_spe_queue {
-	struct arm_spe		*spe;
-	unsigned int		queue_nr;
-	struct auxtrace_buffer	*buffer;
-	bool			on_heap;
-	bool			done;
-	pid_t			pid;
-	pid_t			tid;
-	int			cpu;
+	struct arm_spe			*spe;
+	unsigned int			queue_nr;
+	struct auxtrace_buffer		*buffer;
+	struct auxtrace_buffer		*old_buffer;
+	union perf_event		*event_buf;
+	bool				on_heap;
+	bool				done;
+	pid_t				pid;
+	pid_t				tid;
+	int				cpu;
+	void				*decoder;
+	const struct arm_spe_state	*state;
+	u64				time;
+	u64				timestamp;
+	struct thread			*thread;
+	bool				have_sample;
 };
 
 static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
 	arm_spe_dump(spe, buf, len);
 }
 
-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
-				 union perf_event *event __maybe_unused,
-				 struct perf_sample *sample __maybe_unused,
-				 struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
+{
+	struct arm_spe_queue *speq = data;
+	struct auxtrace_buffer *buffer = speq->buffer;
+	struct auxtrace_buffer *old_buffer = speq->old_buffer;
+	struct auxtrace_queue *queue;
+
+	queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	/* If no more data, drop the previous auxtrace_buffer and return */
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	speq->buffer = buffer;
+
+	/* If the aux_buffer doesn't have data associated, try to load it */
+	if (!buffer->data) {
+		/* get the file desc associated with the perf data file */
+		int fd = perf_data__fd(speq->spe->session->data);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+
+	b->ref_timestamp = buffer->reference;
+
+	if (b->len) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		speq->old_buffer = buffer;
+	} else {
+		auxtrace_buffer__drop_data(buffer);
+		return arm_spe_get_trace(b, data);
+	}
+
+	return 0;
+}
+
+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+		unsigned int queue_nr)
+{
+	struct arm_spe_params params = { .get_trace = 0, };
+	struct arm_spe_queue *speq;
+
+	speq = zalloc(sizeof(*speq));
+	if (!speq)
+		return NULL;
+
+	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!speq->event_buf)
+		goto out_free;
+
+	speq->spe = spe;
+	speq->queue_nr = queue_nr;
+	speq->pid = -1;
+	speq->tid = -1;
+	speq->cpu = -1;
+
+	/* Set up the decoder parameters */
+	params.get_trace = arm_spe_get_trace;
+	params.data = speq;
+
+	/* create new decoder */
+	speq->decoder = arm_spe_decoder_new(&params);
+	if (!speq->decoder)
+		goto out_free;
+
+	return speq;
+
+out_free:
+	zfree(&speq->event_buf);
+	free(speq);
+
+	return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+	return ip >= spe->kernel_start ?
+		PERF_RECORD_MISC_KERNEL :
+		PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+				struct arm_spe_queue *speq,
+				union perf_event *event,
+				struct perf_sample *sample)
+{
+	if (!spe->timeless_decoding)
+		sample->time = speq->timestamp;
+
+	sample->ip = speq->state->from_ip;
+	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+	sample->pid = speq->pid;
+	sample->tid = speq->tid;
+	sample->addr = speq->state->to_ip;
+	sample->period = 1;
+	sample->cpu = speq->cpu;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+			    struct arm_spe_queue *speq __maybe_unused,
+			    union perf_event *event,
+			    struct perf_sample *sample)
+{
+	int ret;
+
+	ret = perf_session__deliver_synth_event(spe->session, event, sample);
+	if (ret)
+		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+	return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+				u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!speq->have_sample)
+		return 0;
+
+	speq->have_sample = false;
+
+	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq,
+						      spe->branch_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!spe->kernel_start)
+		spe->kernel_start = machine__kernel_start(spe->machine);
+
+	while (1) {
+		err = arm_spe_sample(speq);
+		if (err)
+			return err;
+
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("No data or all data has been processed.\n");
+				return 1;
+			}
+			continue;
+		}
+
+		speq->state = state;
+		speq->have_sample = true;
+
+		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+			*timestamp = speq->timestamp;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct arm_spe_queue *speq = queue->priv;
+
+	if (list_empty(&queue->head) || speq)
+		return 0;
+
+	speq = arm_spe__alloc_queue(spe, queue_nr);
+
+	if (!speq)
+		return -ENOMEM;
+
+	queue->priv = speq;
+
+	if (queue->cpu != -1)
+		speq->cpu = queue->cpu;
+
+	if (!speq->on_heap) {
+		const struct arm_spe_state *state;
+		int ret;
+
+		if (spe->timeless_decoding)
+			return 0;
+
+retry:
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("queue %u has no timestamp\n",
+						queue_nr);
+				return 0;
+			}
+			goto retry;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+		if (ret)
+			return ret;
+		speq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
 {
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < spe->queues.nr_queues; i++) {
+		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+	if (spe->queues.new_data) {
+		spe->queues.new_data = false;
+		return arm_spe__setup_queues(spe);
+	}
+
 	return 0;
 }
 
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = spe->session->evlist;
+	bool timeless_decoding = true;
+
+	/*
+	 * Cycle through the list of events and switch to timed decoding
+	 * if any of them has the time bit set.
+	 */
+	evlist__for_each_entry(evlist, evsel) {
+		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
+			timeless_decoding = false;
+	}
+
+	return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+				    struct auxtrace_queue *queue)
+{
+	struct arm_spe_queue *speq = queue->priv;
+	pid_t tid;
+
+	tid = machine__get_current_tid(spe->machine, speq->cpu);
+	if (tid != -1) {
+		speq->tid = tid;
+		thread__zput(speq->thread);
+	} else
+		speq->tid = queue->tid;
+
+	if ((!speq->thread) && (speq->tid != -1)) {
+		speq->thread = machine__find_thread(spe->machine, -1,
+						    speq->tid);
+	}
+
+	if (speq->thread) {
+		speq->pid = speq->thread->pid_;
+		if (queue->cpu == -1)
+			speq->cpu = speq->thread->cpu;
+	}
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct arm_spe_queue *speq;
+
+		if (!spe->heap.heap_cnt)
+			return 0;
+
+		if (spe->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = spe->heap.heap_array[0].queue_nr;
+		queue = &spe->queues.queue_array[queue_nr];
+		speq = queue->priv;
+
+		auxtrace_heap__pop(&spe->heap);
+
+		if (spe->heap.heap_cnt) {
+			ts = spe->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		arm_spe_set_pid_tid_cpu(spe, queue);
+
+		ret = arm_spe_run_decoder(speq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			speq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &spe->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+		struct arm_spe_queue *speq = queue->priv;
+
+		if (speq && (tid == -1 || speq->tid == tid)) {
+			speq->time = time_;
+			arm_spe_set_pid_tid_cpu(spe, queue);
+			arm_spe_run_decoder(speq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int arm_spe_process_event(struct perf_session *session,
+				 union perf_event *event,
+				 struct perf_sample *sample,
+				 struct perf_tool *tool)
+{
+	int err = 0;
+	u64 timestamp;
+	struct arm_spe *spe = container_of(session->auxtrace,
+			struct arm_spe, auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("ARM SPE Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || spe->timeless_decoding) {
+		err = arm_spe__update_queues(spe);
+		if (err)
+			return err;
+	}
+
+	if (spe->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_timeless_queues(spe,
+					event->fork.tid,
+					sample->time);
+		}
+	} else if (timestamp) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_queues(spe, timestamp);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 static int arm_spe_process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
-	struct auxtrace_buffer *buffer;
-	off_t data_offset;
-	int fd = perf_data__fd(session->data);
-	int err;
 
-	if (perf_data__is_pipe(session->data)) {
-		data_offset = 0;
-	} else {
-		data_offset = lseek(fd, 0, SEEK_CUR);
-		if (data_offset == -1)
-			return -errno;
-	}
+	if (!spe->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data__fd(session->data);
+		int err;
 
-	err = auxtrace_queues__add_event(&spe->queues, session, event,
-					 data_offset, &buffer);
-	if (err)
-		return err;
-
-	/* Dump here now we have copied a piped trace out of the pipe */
-	if (dump_trace) {
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			arm_spe_dump_event(spe, buffer->data,
-					     buffer->size);
-			auxtrace_buffer__put_data(buffer);
+		if (perf_data__is_pipe(session->data)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&spe->queues, session, event,
+				data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				arm_spe_dump_event(spe, buffer->data,
+						buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
 		}
 	}
 
@@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
 static int arm_spe_flush(struct perf_session *session __maybe_unused,
 			 struct perf_tool *tool __maybe_unused)
 {
-	return 0;
+	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+			auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = arm_spe__update_queues(spe);
+	if (ret < 0)
+		return ret;
+
+	if (spe->timeless_decoding)
+		return arm_spe_process_timeless_queues(spe, -1,
+				MAX_TIMESTAMP - 1);
+
+	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
 }
 
 static void arm_spe_free_queue(void *priv)
@@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
 
 	if (!speq)
 		return;
+	thread__zput(speq->thread);
+	arm_spe_decoder_free(speq->decoder);
+	zfree(&speq->event_buf);
 	free(speq);
 }
 
@@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
 	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
 }
 
+struct arm_spe_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine __maybe_unused)
+{
+	struct arm_spe_synth *arm_spe_synth =
+		      container_of(tool, struct arm_spe_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(arm_spe_synth->session,
+						 event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct arm_spe_synth arm_spe_synth;
+
+	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+	arm_spe_synth.session = session;
+
+	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+					   &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
+				    const char *name)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.id && evsel->core.id[0] == id) {
+			if (evsel->name)
+				zfree(&evsel->name);
+			evsel->name = strdup(name);
+			break;
+		}
+	}
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+	struct evlist *evlist = session->evlist;
+	struct evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.type == spe->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with ARM SPE trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+		PERF_SAMPLE_PERIOD;
+	if (spe->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->core.attr.exclude_user;
+	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
+	attr.exclude_hv = evsel->core.attr.exclude_hv;
+	attr.exclude_host = evsel->core.attr.exclude_host;
+	attr.exclude_guest = evsel->core.attr.exclude_guest;
+	attr.sample_id_all = evsel->core.attr.sample_id_all;
+	attr.read_format = evsel->core.attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->core.id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	/* Synthesize the requested SPE events */
+	if (spe->synth_opts.llc_miss) {
+		spe->sample_llc_miss = true;
+
+		/* llc-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->llc_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "llc-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.tlb_miss) {
+		spe->sample_tlb_miss = true;
+
+		/* tlb-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->tlb_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "tlb-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.branches) {
+		spe->sample_branch_miss = true;
+
+		/* branch-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->branch_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "branch-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.remote_access) {
+		spe->sample_remote_access = true;
+
+		/* remote-access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->remote_access_id = id;
+		arm_spe_set_event_name(evlist, id, "remote-access");
+		id += 1;
+	}
+
+	return 0;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
 
+	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 	spe->auxtrace.process_event = arm_spe_process_event;
 	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
 	spe->auxtrace.flush_events = arm_spe_flush;
@@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
 	arm_spe_print_info(&auxtrace_info->priv[0]);
 
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		spe->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&spe->synth_opts, false);
+
+	err = arm_spe_synth_events(spe, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&spe->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (spe->queues.populated)
+		spe->data_queued = true;
+
 	return 0;
 
+err_free_queues:
+	auxtrace_queues__free(&spe->queues);
+	session->auxtrace = NULL;
 err_free:
 	free(spe);
 	return err;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index eb087e7df6f4..994d5e3c9e4f 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1279,6 +1279,10 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->pwr_events = true;
 	synth_opts->other_events = true;
 	synth_opts->errors = true;
+	synth_opts->llc_miss = true;
+	synth_opts->tlb_miss = true;
+	synth_opts->remote_access = true;
+
 	if (no_sample) {
 		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
 		synth_opts->period = 1;
@@ -1431,6 +1435,15 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				goto out_err;
 			p = endptr;
 			break;
+		case 'm':
+			synth_opts->llc_miss = true;
+			break;
+		case 't':
+			synth_opts->tlb_miss = true;
+			break;
+		case 'a':
+			synth_opts->remote_access = true;
+			break;
 		case ' ':
 		case ',':
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 749d72cd9c7b..80617b0d044d 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -60,7 +60,7 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
- * @branches: whether to synthesize 'branches' events
+ * @branches: whether to synthesize 'branches' events (branch misses only on Arm)
  * @transactions: whether to synthesize events for transactions
  * @ptwrites: whether to synthesize events for ptwrites
  * @pwr_events: whether to synthesize power events
@@ -74,6 +74,9 @@ enum itrace_period_type {
  * @callchain: add callchain to 'instructions' events
  * @thread_stack: feed branches to the thread_stack
  * @last_branch: add branch context to 'instruction' events
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @remote_access: whether to synthesize remote access events
  * @callchain_sz: maximum callchain size
  * @last_branch_sz: branch context size
  * @period: 'instructions' events period
@@ -101,6 +104,9 @@ struct itrace_synth_opts {
 	bool			callchain;
 	bool			thread_stack;
 	bool			last_branch;
+	bool			llc_miss;
+	bool			tlb_miss;
+	bool			remote_access;
 	unsigned int		callchain_sz;
 	unsigned int		last_branch_sz;
 	unsigned long long	period;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 3/4] perf report: Add SPE options to --itrace argument
  2020-02-07 15:21     ` [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip James Clark
  2020-02-07 15:21       ` [PATCH v3 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
  2020-02-07 15:21       ` [PATCH v3 2/4] perf tools: Add support for "report" for some spe events James Clark
@ 2020-02-07 15:21       ` James Clark
  2020-02-07 15:21       ` [PATCH v3 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-02-07 15:21 UTC (permalink / raw)
  To: jolsa, liwei391, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

The previous patch added "perf report" support for some Arm SPE
events (llc-miss, tlb-miss, branch-miss and remote-access). This patch
documents the corresponding --itrace options.
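As a rough illustration of what the new letters do, the sketch below mimics
the case 'm', case 't' and case 'a' branches that this series adds to
itrace_parse_synth_opts(). It is a simplified, hypothetical stand-in:
struct spe_synth_flags and parse_spe_itrace_letters() are not perf APIs, and
unlike the real parser it only knows the three SPE letters plus the ' ' and
',' separators.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical, trimmed-down stand-in for struct itrace_synth_opts:
 * only the three SPE flags added by this series are modelled.
 */
struct spe_synth_flags {
	bool llc_miss;		/* 'm': synthesize LLC miss events */
	bool tlb_miss;		/* 't': synthesize TLB miss events */
	bool remote_access;	/* 'a': synthesize remote access events */
};

/*
 * Each recognised letter sets one flag; spaces and commas are skipped.
 * Returns 0 on success, -1 on any other character (sketch only).
 */
static int parse_spe_itrace_letters(const char *str,
				    struct spe_synth_flags *opts)
{
	assert(str != NULL && opts != NULL);

	memset(opts, 0, sizeof(*opts));
	for (const char *p = str; *p; p++) {
		switch (*p) {
		case 'm':
			opts->llc_miss = true;
			break;
		case 't':
			opts->tlb_miss = true;
			break;
		case 'a':
			opts->remote_access = true;
			break;
		case ' ':
		case ',':
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

For example, --itrace=m,t would enable only the LLC and TLB miss events,
while remote access samples stay off unless 'a' is also given.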

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/itrace.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 82ff7dad40c2..8e1488de1fb3 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -1,5 +1,5 @@
 		i	synthesize instructions events
-		b	synthesize branches events
+		b	synthesize branches events (branch misses on Arm)
 		c	synthesize branches events (calls only)
 		r	synthesize branches events (returns only)
 		x	synthesize transactions events
@@ -12,6 +12,9 @@
 		g	synthesize a call chain (use with i or x)
 		l	synthesize last branch entries (use with i or x)
 		s       skip initial number of events
+		m	synthesize LLC miss events
+		t	synthesize TLB miss events
+		a	synthesize remote access events
 
 	The default is all events i.e. the same as --itrace=ibxwpe,
 	except for perf script where it is --itrace=ce
-- 
2.17.1



* [PATCH v3 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-07 15:21     ` [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip James Clark
                         ` (2 preceding siblings ...)
  2020-02-07 15:21       ` [PATCH v3 3/4] perf report: Add SPE options to --itrace argument James Clark
@ 2020-02-07 15:21       ` James Clark
  2020-02-10 12:25         ` Jiri Olsa
  3 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-07 15:21 UTC (permalink / raw)
  To: jolsa, liwei391, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

At the suggestion of James Clark, use SPE to provide a precise IP for
some events. Currently the only supported event is branch-misses.

Example usage:

$ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
(:p, :pp and :ppp behave the same in this case.)

$ ./perf report --stdio
(--stdio is optional.)

--------------------------------------------------------------------
...
 # Samples: 14  of event 'branch-misses:pp'
 # Event count (approx.): 14
 #
 # Children      Self  Command  Shared Object      Symbol
 # ........  ........  .......  .................  ..........................
 #
    14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
    14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
     7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     7.14%     7.14%  dd       ld-2.28.so         [.] check_match
     7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
     7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
...
--------------------------------------------------------------------

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Suggested-by: James Clark <James.Clark@arm.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/arch/arm/util/auxtrace.c | 38 +++++++++++++++++++++++++++++
 tools/perf/builtin-record.c         |  5 ++++
 tools/perf/util/arm-spe.c           |  9 +++++++
 tools/perf/util/arm-spe.h           |  3 +++
 tools/perf/util/auxtrace.h          |  6 +++++
 tools/perf/util/evlist.c            |  1 +
 tools/perf/util/evsel.h             |  1 -
 7 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 0a6e75b8777a..18f0ea7556e7 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -10,11 +10,25 @@
 
 #include "../../util/auxtrace.h"
 #include "../../util/debug.h"
+#include "../../util/env.h"
 #include "../../util/evlist.h"
 #include "../../util/pmu.h"
 #include "cs-etm.h"
 #include "arm-spe.h"
 
+#define SPE_ATTR_TS_ENABLE		BIT(0)
+#define SPE_ATTR_PA_ENABLE		BIT(1)
+#define SPE_ATTR_PCT_ENABLE		BIT(2)
+#define SPE_ATTR_JITTER			BIT(16)
+#define SPE_ATTR_BRANCH_FILTER		BIT(32)
+#define SPE_ATTR_LOAD_FILTER		BIT(33)
+#define SPE_ATTR_STORE_FILTER		BIT(34)
+
+#define SPE_ATTR_EV_RETIRED		BIT(1)
+#define SPE_ATTR_EV_CACHE		BIT(3)
+#define SPE_ATTR_EV_TLB			BIT(5)
+#define SPE_ATTR_EV_BRANCH		BIT(7)
+
 static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
 {
 	struct perf_pmu **arm_spe_pmus = NULL;
@@ -108,3 +122,27 @@ struct auxtrace_record
 	*err = 0;
 	return NULL;
 }
+
+void auxtrace__preprocess_evlist(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	struct perf_pmu *pmu;
+
+	evlist__for_each_entry(evlist, evsel) {
+		/* Currently only supports precise_ip for branch-misses on arm64 */
+		if (!strcmp(perf_env__arch(evlist->env), "arm64")
+			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
+			&& evsel->core.attr.precise_ip)
+		{
+			pmu = perf_pmu__find("arm_spe_0");
+			if (pmu) {
+				evsel->pmu_name = pmu->name;
+				evsel->core.attr.type = pmu->type;
+				evsel->core.attr.config = SPE_ATTR_TS_ENABLE
+							| SPE_ATTR_BRANCH_FILTER;
+				evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
+				evsel->core.attr.precise_ip = 0;
+			}
+		}
+	}
+}
\ No newline at end of file
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..3bc61f03d572 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2451,6 +2451,11 @@ int cmd_record(int argc, const char **argv)
 
 	argc = parse_options(argc, argv, record_options, record_usage,
 			    PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (auxtrace__preprocess_evlist) {
+		auxtrace__preprocess_evlist(rec->evlist);
+	}
+
 	if (quiet)
 		perf_quiet_option();
 
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 4ef22a0775a9..b21806c97dd8 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -778,6 +778,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.sample_id_all = evsel->core.attr.sample_id_all;
 	attr.read_format = evsel->core.attr.read_format;
 
+	/* In precise IP mode the SPE event itself is "branch-misses", so
+	 * attribute samples to it instead of synthesizing a new event. */
+	if (!strncmp(evsel->name, "branch-misses", 13)) {
+		spe->sample_branch_miss = true;
+		spe->branch_miss_id = evsel->core.id[0];
+
+		return 0;
+	}
+
 	/* create new id val to be a fixed offset from evsel id */
 	id = evsel->core.id[0] + 1000000000;
 
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 98d3235781c3..8b1fb191d03a 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -20,6 +20,8 @@ enum {
 union perf_event;
 struct perf_session;
 struct perf_pmu;
+struct evlist;
+struct evsel;
 
 struct auxtrace_record *arm_spe_recording_init(int *err,
 					       struct perf_pmu *arm_spe_pmu);
@@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
 
 struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
+void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
 #endif
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 80617b0d044d..4f89a3a31ab2 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -584,6 +584,7 @@ void auxtrace__dump_auxtrace_sample(struct perf_session *session,
 int auxtrace__flush_events(struct perf_session *session, struct perf_tool *tool);
 void auxtrace__free_events(struct perf_session *session);
 void auxtrace__free(struct perf_session *session);
+void auxtrace__preprocess_evlist(struct evlist *evlist) __attribute__((weak));
 
 #define ITRACE_HELP \
 "				i:	    		synthesize instructions events\n"		\
@@ -728,6 +729,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
 {
 }
 
+static inline
+void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
+{
+}
+
 static inline
 int auxtrace_index__write(int fd __maybe_unused,
 			  struct list_head *head __maybe_unused)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1548237b6558..84136d0adb29 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -9,6 +9,7 @@
 #include <errno.h>
 #include <inttypes.h>
 #include <poll.h>
+#include "arm-spe.h"
 #include "cpumap.h"
 #include "util/mmap.h"
 #include "thread_map.h"
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index dc14f4a823cd..c212e2eeeeb2 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -174,7 +174,6 @@ void perf_evsel__exit(struct evsel *evsel);
 void evsel__delete(struct evsel *evsel);
 
 struct callchain_param;
-
 void perf_evsel__config(struct evsel *evsel,
 			struct record_opts *opts,
 			struct callchain_param *callchain);
-- 
2.17.1



* Re: [PATCH v3 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-07 15:21       ` [PATCH v3 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
@ 2020-02-10 12:25         ` Jiri Olsa
  2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Olsa @ 2020-02-10 12:25 UTC (permalink / raw)
  To: James Clark
  Cc: liwei391, linux-arm-kernel, linux-kernel, nd, Tan Xiaojun,
	Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Al Grant,
	Namhyung Kim

On Fri, Feb 07, 2020 at 03:21:42PM +0000, James Clark wrote:

SNIP

>  
>  #define ITRACE_HELP \
>  "				i:	    		synthesize instructions events\n"		\
> @@ -728,6 +729,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
>  {
>  }
>  
> +static inline
> +void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
> +{
> +}
> +
>  static inline
>  int auxtrace_index__write(int fd __maybe_unused,
>  			  struct list_head *head __maybe_unused)
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1548237b6558..84136d0adb29 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -9,6 +9,7 @@
>  #include <errno.h>
>  #include <inttypes.h>
>  #include <poll.h>
> +#include "arm-spe.h"
>  #include "cpumap.h"
>  #include "util/mmap.h"
>  #include "thread_map.h"
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index dc14f4a823cd..c212e2eeeeb2 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -174,7 +174,6 @@ void perf_evsel__exit(struct evsel *evsel);
>  void evsel__delete(struct evsel *evsel);
>  
>  struct callchain_param;
> -

hum? ;-)

jirka

>  void perf_evsel__config(struct evsel *evsel,
>  			struct record_opts *opts,
>  			struct callchain_param *callchain);
> -- 
> 2.17.1
> 



* [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip
  2020-02-10 12:25         ` Jiri Olsa
@ 2020-02-11 14:04           ` James Clark
  2020-02-11 14:04             ` [PATCH v4 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
                               ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: James Clark @ 2020-02-11 14:04 UTC (permalink / raw)
  To: jolsa, linux-arm-kernel, linux-kernel; +Cc: nd, James Clark

Hi Jirka,

Oops. I've removed all the changes to evlist.c and evsel.h.


James

Tan Xiaojun (4):
  perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  perf tools: Add support for "report" for some spe events
  perf report: Add SPE options to --itrace argument
  perf tools: Support "branch-misses:pp" on arm64

 tools/perf/Documentation/itrace.txt           |   5 +-
 tools/perf/arch/arm/util/auxtrace.c           |  38 +
 tools/perf/builtin-record.c                   |   5 +
 tools/perf/util/Build                         |   2 +-
 tools/perf/util/arm-spe-decoder/Build         |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-pkt-decoder.c                     |   0
 .../arm-spe-pkt-decoder.h                     |   2 +
 tools/perf/util/arm-spe.c                     | 756 +++++++++++++++++-
 tools/perf/util/arm-spe.h                     |   3 +
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |  14 +-
 13 files changed, 1089 insertions(+), 41 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)

-- 
2.17.1



* [PATCH v4 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
@ 2020-02-11 14:04             ` James Clark
  2020-02-11 14:04             ` [PATCH v4 2/4] perf tools: Add support for "report" for some spe events James Clark
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-02-11 14:04 UTC (permalink / raw)
  To: jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c into it. No functional changes.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/Build                                       | 2 +-
 tools/perf/util/arm-spe-decoder/Build                       | 1 +
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c | 0
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h | 0
 tools/perf/util/arm-spe.c                                   | 2 +-
 5 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (100%)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 07da6c790b63..0184510083c2 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -104,7 +104,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
 perf-$(CONFIG_AUXTRACE) += intel-bts.o
 perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
 perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o
 
 ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 000000000000..16efbc245028
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.c
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.h
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 53be12b23ff4..f3382a38d48e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -23,7 +23,7 @@
 #include "debug.h"
 #include "auxtrace.h"
 #include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
 struct arm_spe {
 	struct auxtrace			auxtrace;
-- 
2.17.1



* [PATCH v4 2/4] perf tools: Add support for "report" for some spe events
  2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
  2020-02-11 14:04             ` [PATCH v4 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
@ 2020-02-11 14:04             ` James Clark
  2020-02-17 11:39               ` Adrian Hunter
  2020-02-11 14:04             ` [PATCH v4 3/4] perf report: Add SPE options to --itrace argument James Clark
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-11 14:04 UTC (permalink / raw)
  To: jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without parsing.

This patch improves "perf report" support for SPE by further
processing the data. Support is added for four events: llc-miss,
tlb-miss, branch-miss, and remote-access.

Example usage:

$ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

$ ./perf report -i perf-armspe-dd.data --stdio
--------------------------------------------------------------------
...
 # Samples: 23  of event 'llc-miss'
 # Event count (approx.): 23
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
     6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
     3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
     3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
     3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
...
 # Samples: 3  of event 'tlb-miss'
 # Event count (approx.): 3
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
    33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
    33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
...
 # Samples: 20  of event 'branch-miss'
 # Event count (approx.): 20
...
    15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
     7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
     7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
     7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
     7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
...
 # Samples: 5  of event 'remote-access'
 # Event count (approx.): 5
...
    27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
     5.56%     5.56%  dd       ld-2.28.so         [.] dl_main

--------------------------------------------------------------------
Further analysis and processing of the raw SPE data will follow
in later patches.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe-decoder/Build         |   2 +-
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |   8 +-
 7 files changed, 1022 insertions(+), 39 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc245028..f8dae13fc876 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 000000000000..50e796b89a95
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm-spe-decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../debug.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+#ifndef BIT
+#define BIT(n)		(1UL << (n))
+#endif
+
+struct arm_spe_decoder {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+	struct arm_spe_state state;
+	const unsigned char *buf;
+	size_t len;
+	uint64_t pos;
+	struct arm_spe_pkt packet;
+	int pkt_step;
+	int pkt_len;
+	int last_packet_type;
+
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t timestamp;
+	uint64_t sample_timestamp;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+	uint64_t ip = (payload & ~(0xffULL << 56));
+
+	/*
+	 * Fill the high 8 bits for kernel virtual addresses. Per the Armv8 ARM,
+	 * Xn[55] determines whether the address lies in the upper or lower
+	 * address range for the purpose of determining whether tagging is used.
+	 */
+	if (ip & BIT(55))
+		ip |= (uint64_t)(0xffULL << 56);
+
+	return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+	struct arm_spe_decoder *decoder;
+
+	if (!params->get_trace)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct arm_spe_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->data               = params->data;
+
+	return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+	free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	pr_debug("ERROR: Bad packet\n");
+
+	return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+	struct arm_spe_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	pr_debug("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		pr_debug("No more data\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+	return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+	int ret;
+
+	decoder->last_packet_type = decoder->packet.type;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+
+		if (!decoder->len) {
+			ret = arm_spe_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = arm_spe_get_packet(decoder->buf, decoder->len,
+				&decoder->packet);
+		if (ret <= 0)
+			return arm_spe_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+	} while (decoder->packet.type == ARM_SPE_PAD);
+
+	return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+	int err;
+	int idx;
+	uint64_t payload;
+
+	while (1) {
+		err = arm_spe_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		idx = decoder->packet.index;
+		payload = decoder->packet.payload;
+
+		switch (decoder->packet.type) {
+		case ARM_SPE_TIMESTAMP:
+			decoder->sample_timestamp = payload;
+			return 0;
+		case ARM_SPE_END:
+			decoder->sample_timestamp = 0;
+			return 0;
+		case ARM_SPE_ADDRESS:
+			decoder->ip = arm_spe_calc_ip(payload);
+			if (idx == 0)
+				decoder->state.from_ip = decoder->ip;
+			else if (idx == 1)
+				decoder->state.to_ip = decoder->ip;
+			break;
+		case ARM_SPE_COUNTER:
+			break;
+		case ARM_SPE_CONTEXT:
+			break;
+		case ARM_SPE_OP_TYPE:
+			break;
+		case ARM_SPE_EVENTS:
+			if (payload & BIT(EV_TLB_REFILL))
+				decoder->state.type |= ARM_SPE_TLB_MISS;
+			if (payload & BIT(EV_MISPRED))
+				decoder->state.type |= ARM_SPE_BRANCH_MISS;
+			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
+				decoder->state.type |= ARM_SPE_LLC_MISS;
+			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
+				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
+
+			break;
+		case ARM_SPE_DATA_SOURCE:
+			break;
+		case ARM_SPE_BAD:
+			break;
+		case ARM_SPE_PAD:
+			break;
+		default:
+			pr_err("Get Packet Error!\n");
+			return -ENOSYS;
+		}
+	}
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = 0;
+
+	err = arm_spe_walk_trace(decoder);
+	if (err)
+		decoder->state.err = err;
+
+	decoder->state.timestamp = decoder->sample_timestamp;
+
+	return &decoder->state;
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 000000000000..330f9e1e71ab
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * arm-spe-decoder.h: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_events {
+	EV_EXCEPTION_GEN,
+	EV_RETIRED,
+	EV_L1D_ACCESS,
+	EV_L1D_REFILL,
+	EV_TLB_ACCESS,
+	EV_TLB_REFILL,
+	EV_NOT_TAKEN,
+	EV_MISPRED,
+	EV_LLC_ACCESS,
+	EV_LLC_REFILL,
+	EV_REMOTE_ACCESS,
+};
+
+enum arm_spe_sample_type {
+	ARM_SPE_LLC_MISS	= 1 << 0,
+	ARM_SPE_TLB_MISS	= 1 << 1,
+	ARM_SPE_BRANCH_MISS	= 1 << 2,
+	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
+	ARM_SPE_EX_STOP		= 1 << 6,
+};
+
+struct arm_spe_state {
+	enum arm_spe_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+	const unsigned char *buf;
+	size_t len;
+	uint64_t offset;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef65113f..865d1e35b401 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
 #define ARM_SPE_NEED_MORE_BYTES		-1
 #define ARM_SPE_BAD_PACKET		-2
 
+#define ARM_SPE_PKT_MAX_SZ		16
+
 enum arm_spe_pkt_type {
 	ARM_SPE_BAD,
 	ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f3382a38d48e..4ef22a0775a9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -16,34 +16,68 @@
 #include <linux/log2.h>
 #include <linux/zalloc.h>
 
+#include "auxtrace.h"
 #include "color.h"
+#include "debug.h"
 #include "evsel.h"
+#include "evlist.h"
 #include "machine.h"
 #include "session.h"
-#include "debug.h"
-#include "auxtrace.h"
+#include "symbol.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "tool.h"
+#include "util/synthetic-events.h"
+
 #include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#define MAX_TIMESTAMP (~0ULL)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
 	struct auxtrace_heap		heap;
+	struct itrace_synth_opts	synth_opts;
 	u32				auxtrace_type;
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+
+	u8				timeless_decoding;
+	u8				data_queued;
+
+	u8				sample_llc_miss;
+	u8				sample_tlb_miss;
+	u8				sample_branch_miss;
+	u8				sample_remote_access;
+	u64				llc_miss_id;
+	u64				tlb_miss_id;
+	u64				branch_miss_id;
+	u64				remote_access_id;
+	u64				kernel_start;
+
+	unsigned long			num_events;
 };
 
 struct arm_spe_queue {
-	struct arm_spe		*spe;
-	unsigned int		queue_nr;
-	struct auxtrace_buffer	*buffer;
-	bool			on_heap;
-	bool			done;
-	pid_t			pid;
-	pid_t			tid;
-	int			cpu;
+	struct arm_spe			*spe;
+	unsigned int			queue_nr;
+	struct auxtrace_buffer		*buffer;
+	struct auxtrace_buffer		*old_buffer;
+	union perf_event		*event_buf;
+	bool				on_heap;
+	bool				done;
+	pid_t				pid;
+	pid_t				tid;
+	int				cpu;
+	void				*decoder;
+	const struct arm_spe_state	*state;
+	u64				time;
+	u64				timestamp;
+	struct thread			*thread;
+	bool				have_sample;
 };
 
 static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
 	arm_spe_dump(spe, buf, len);
 }
 
-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
-				 union perf_event *event __maybe_unused,
-				 struct perf_sample *sample __maybe_unused,
-				 struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
+{
+	struct arm_spe_queue *speq = data;
+	struct auxtrace_buffer *buffer = speq->buffer;
+	struct auxtrace_buffer *old_buffer = speq->old_buffer;
+	struct auxtrace_queue *queue;
+
+	queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	/* If no more data, drop the previous auxtrace_buffer and return */
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	speq->buffer = buffer;
+
+	/* If the aux_buffer doesn't have data associated, try to load it */
+	if (!buffer->data) {
+		/* get the file desc associated with the perf data file */
+		int fd = perf_data__fd(speq->spe->session->data);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+
+	b->ref_timestamp = buffer->reference;
+
+	if (b->len) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		speq->old_buffer = buffer;
+	} else {
+		auxtrace_buffer__drop_data(buffer);
+		return arm_spe_get_trace(b, data);
+	}
+
+	return 0;
+}
+
+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+		unsigned int queue_nr)
+{
+	struct arm_spe_params params = { .get_trace = 0, };
+	struct arm_spe_queue *speq;
+
+	speq = zalloc(sizeof(*speq));
+	if (!speq)
+		return NULL;
+
+	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!speq->event_buf)
+		goto out_free;
+
+	speq->spe = spe;
+	speq->queue_nr = queue_nr;
+	speq->pid = -1;
+	speq->tid = -1;
+	speq->cpu = -1;
+
+	/* params set */
+	params.get_trace = arm_spe_get_trace;
+	params.data = speq;
+
+	/* create new decoder */
+	speq->decoder = arm_spe_decoder_new(&params);
+	if (!speq->decoder)
+		goto out_free;
+
+	return speq;
+
+out_free:
+	zfree(&speq->event_buf);
+	free(speq);
+
+	return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+	return ip >= spe->kernel_start ?
+		PERF_RECORD_MISC_KERNEL :
+		PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+				struct arm_spe_queue *speq,
+				union perf_event *event,
+				struct perf_sample *sample)
+{
+	if (!spe->timeless_decoding)
+		sample->time = speq->timestamp;
+
+	sample->ip = speq->state->from_ip;
+	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+	sample->pid = speq->pid;
+	sample->tid = speq->tid;
+	sample->addr = speq->state->to_ip;
+	sample->period = 1;
+	sample->cpu = speq->cpu;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+			    struct arm_spe_queue *speq __maybe_unused,
+			    union perf_event *event,
+			    struct perf_sample *sample)
+{
+	int ret;
+
+	ret = perf_session__deliver_synth_event(spe->session, event, sample);
+	if (ret)
+		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+	return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+				u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!speq->have_sample)
+		return 0;
+
+	speq->have_sample = false;
+
+	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq,
+						      spe->branch_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!spe->kernel_start)
+		spe->kernel_start = machine__kernel_start(spe->machine);
+
+	while (1) {
+		err = arm_spe_sample(speq);
+		if (err)
+			return err;
+
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("No data or all data has been processed.\n");
+				return 1;
+			}
+			continue;
+		}
+
+		speq->state = state;
+		speq->have_sample = true;
+
+		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+			*timestamp = speq->timestamp;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct arm_spe_queue *speq = queue->priv;
+
+	if (list_empty(&queue->head) || speq)
+		return 0;
+
+	speq = arm_spe__alloc_queue(spe, queue_nr);
+
+	if (!speq)
+		return -ENOMEM;
+
+	queue->priv = speq;
+
+	if (queue->cpu != -1)
+		speq->cpu = queue->cpu;
+
+	if (!speq->on_heap) {
+		const struct arm_spe_state *state;
+		int ret;
+
+		if (spe->timeless_decoding)
+			return 0;
+
+retry:
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("queue %u has no timestamp\n",
+						queue_nr);
+				return 0;
+			}
+			goto retry;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+		if (ret)
+			return ret;
+		speq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
 {
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < spe->queues.nr_queues; i++) {
+		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+	if (spe->queues.new_data) {
+		spe->queues.new_data = false;
+		return arm_spe__setup_queues(spe);
+	}
+
 	return 0;
 }
 
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = spe->session->evlist;
+	bool timeless_decoding = true;
+
+	/*
+	 * Cycle through the list of events; if any of them has the
+	 * PERF_SAMPLE_TIME bit set, timestamps are available.
+	 */
+	evlist__for_each_entry(evlist, evsel) {
+		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
+			timeless_decoding = false;
+	}
+
+	return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+				    struct auxtrace_queue *queue)
+{
+	struct arm_spe_queue *speq = queue->priv;
+	pid_t tid;
+
+	tid = machine__get_current_tid(spe->machine, speq->cpu);
+	if (tid != -1) {
+		speq->tid = tid;
+		thread__zput(speq->thread);
+	} else
+		speq->tid = queue->tid;
+
+	if ((!speq->thread) && (speq->tid != -1)) {
+		speq->thread = machine__find_thread(spe->machine, -1,
+						    speq->tid);
+	}
+
+	if (speq->thread) {
+		speq->pid = speq->thread->pid_;
+		if (queue->cpu == -1)
+			speq->cpu = speq->thread->cpu;
+	}
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct arm_spe_queue *speq;
+
+		if (!spe->heap.heap_cnt)
+			return 0;
+
+		if (spe->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = spe->heap.heap_array[0].queue_nr;
+		queue = &spe->queues.queue_array[queue_nr];
+		speq = queue->priv;
+
+		auxtrace_heap__pop(&spe->heap);
+
+		if (spe->heap.heap_cnt) {
+			ts = spe->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		arm_spe_set_pid_tid_cpu(spe, queue);
+
+		ret = arm_spe_run_decoder(speq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			speq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &spe->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+		struct arm_spe_queue *speq = queue->priv;
+
+		if (speq && (tid == -1 || speq->tid == tid)) {
+			speq->time = time_;
+			arm_spe_set_pid_tid_cpu(spe, queue);
+			arm_spe_run_decoder(speq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int arm_spe_process_event(struct perf_session *session,
+				 union perf_event *event,
+				 struct perf_sample *sample,
+				 struct perf_tool *tool)
+{
+	int err = 0;
+	u64 timestamp;
+	struct arm_spe *spe = container_of(session->auxtrace,
+			struct arm_spe, auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("CoreSight SPE Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || spe->timeless_decoding) {
+		err = arm_spe__update_queues(spe);
+		if (err)
+			return err;
+	}
+
+	if (spe->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_timeless_queues(spe,
+					event->fork.tid,
+					sample->time);
+		}
+	} else if (timestamp) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_queues(spe, timestamp);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 static int arm_spe_process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
-	struct auxtrace_buffer *buffer;
-	off_t data_offset;
-	int fd = perf_data__fd(session->data);
-	int err;
 
-	if (perf_data__is_pipe(session->data)) {
-		data_offset = 0;
-	} else {
-		data_offset = lseek(fd, 0, SEEK_CUR);
-		if (data_offset == -1)
-			return -errno;
-	}
+	if (!spe->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data__fd(session->data);
+		int err;
 
-	err = auxtrace_queues__add_event(&spe->queues, session, event,
-					 data_offset, &buffer);
-	if (err)
-		return err;
-
-	/* Dump here now we have copied a piped trace out of the pipe */
-	if (dump_trace) {
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			arm_spe_dump_event(spe, buffer->data,
-					     buffer->size);
-			auxtrace_buffer__put_data(buffer);
+		if (perf_data__is_pipe(session->data)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&spe->queues, session, event,
+				data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				arm_spe_dump_event(spe, buffer->data,
+						buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
 		}
 	}
 
@@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
 static int arm_spe_flush(struct perf_session *session __maybe_unused,
 			 struct perf_tool *tool __maybe_unused)
 {
-	return 0;
+	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+			auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = arm_spe__update_queues(spe);
+	if (ret < 0)
+		return ret;
+
+	if (spe->timeless_decoding)
+		return arm_spe_process_timeless_queues(spe, -1,
+				MAX_TIMESTAMP - 1);
+
+	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
 }
 
 static void arm_spe_free_queue(void *priv)
@@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
 
 	if (!speq)
 		return;
+	thread__zput(speq->thread);
+	arm_spe_decoder_free(speq->decoder);
+	zfree(&speq->event_buf);
 	free(speq);
 }
 
@@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
 	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
 }
 
+struct arm_spe_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine __maybe_unused)
+{
+	struct arm_spe_synth *arm_spe_synth =
+		      container_of(tool, struct arm_spe_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(arm_spe_synth->session,
+						 event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct arm_spe_synth arm_spe_synth;
+
+	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+	arm_spe_synth.session = session;
+
+	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+					   &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
+				    const char *name)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.id && evsel->core.id[0] == id) {
+			if (evsel->name)
+				zfree(&evsel->name);
+			evsel->name = strdup(name);
+			break;
+		}
+	}
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+	struct evlist *evlist = session->evlist;
+	struct evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.type == spe->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with CoreSight Trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+		PERF_SAMPLE_PERIOD;
+	if (spe->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->core.attr.exclude_user;
+	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
+	attr.exclude_hv = evsel->core.attr.exclude_hv;
+	attr.exclude_host = evsel->core.attr.exclude_host;
+	attr.exclude_guest = evsel->core.attr.exclude_guest;
+	attr.sample_id_all = evsel->core.attr.sample_id_all;
+	attr.read_format = evsel->core.attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->core.id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	/* spe events set */
+	if (spe->synth_opts.llc_miss) {
+		spe->sample_llc_miss = true;
+
+		/* llc-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->llc_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "llc-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.tlb_miss) {
+		spe->sample_tlb_miss = true;
+
+		/* tlb-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->tlb_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "tlb-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.branches) {
+		spe->sample_branch_miss = true;
+
+		/* branch-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->branch_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "branch-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.remote_access) {
+		spe->sample_remote_access = true;
+
+		/* remote-access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->remote_access_id = id;
+		arm_spe_set_event_name(evlist, id, "remote-access");
+		id += 1;
+	}
+
+	return 0;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
 
+	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 	spe->auxtrace.process_event = arm_spe_process_event;
 	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
 	spe->auxtrace.flush_events = arm_spe_flush;
@@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
 	arm_spe_print_info(&auxtrace_info->priv[0]);
 
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		spe->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&spe->synth_opts, false);
+
+	err = arm_spe_synth_events(spe, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&spe->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (spe->queues.populated)
+		spe->data_queued = true;
+
 	return 0;
 
+err_free_queues:
+	auxtrace_queues__free(&spe->queues);
+	session->auxtrace = NULL;
 err_free:
 	free(spe);
 	return err;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index eb087e7df6f4..2901b07a9293 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1279,6 +1279,10 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->pwr_events = true;
 	synth_opts->other_events = true;
 	synth_opts->errors = true;
+	synth_opts->llc_miss = true;
+	synth_opts->tlb_miss = true;
+	synth_opts->remote_access = true;
+
 	if (no_sample) {
 		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
 		synth_opts->period = 1;
@@ -1431,6 +1435,15 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				goto out_err;
 			p = endptr;
 			break;
+		case 'm':
+			synth_opts->llc_miss = true;
+			break;
+		case 't':
+			synth_opts->tlb_miss = true;
+			break;
+		case 'a':
+			synth_opts->remote_access = true;
+			break;
 		case ' ':
 		case ',':
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 749d72cd9c7b..80617b0d044d 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -60,7 +60,7 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
- * @branches: whether to synthesize 'branches' events
+ * @branches: whether to synthesize 'branches' events (branch misses only on Arm)
  * @transactions: whether to synthesize events for transactions
  * @ptwrites: whether to synthesize events for ptwrites
  * @pwr_events: whether to synthesize power events
@@ -74,6 +74,9 @@ enum itrace_period_type {
  * @callchain: add callchain to 'instructions' events
  * @thread_stack: feed branches to the thread_stack
  * @last_branch: add branch context to 'instruction' events
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @remote_access: whether to synthesize remote access events
  * @callchain_sz: maximum callchain size
  * @last_branch_sz: branch context size
  * @period: 'instructions' events period
@@ -101,6 +104,9 @@ struct itrace_synth_opts {
 	bool			callchain;
 	bool			thread_stack;
 	bool			last_branch;
+	bool			llc_miss;
+	bool			tlb_miss;
+	bool			remote_access;
 	unsigned int		callchain_sz;
 	unsigned int		last_branch_sz;
 	unsigned long long	period;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread
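arm_spe_process_queues() in the patch above keeps one heap entry per queue, keyed on the timestamp of that queue's next record. It always decodes the queue holding the oldest record, but only up to just past the next-oldest queue's timestamp, so samples are delivered in global time order. Below is a minimal standalone sketch of that loop. It assumes hypothetical sketch_queue/sketch_heap types that hold pre-sorted timestamp arrays in place of real SPE trace buffers, and uses a run_decoder() stand-in for arm_spe_run_decoder(). None of these names are perf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 8	/* assumption: small fixed number of queues */

/* Hypothetical queue: record timestamps, already sorted ascending. */
struct sketch_queue {
	const uint64_t *ts;
	size_t n;
	size_t pos;
};

struct heap_item {
	uint64_t ordinal;	/* timestamp of the queue's next record */
	unsigned int queue_nr;
};

struct sketch_heap {
	struct heap_item a[MAX_QUEUES];
	unsigned int cnt;
};

/* Insert and sift up so that a[0] always has the smallest ordinal. */
void heap_add(struct sketch_heap *h, unsigned int queue_nr, uint64_t ordinal)
{
	unsigned int i;

	assert(h->cnt < MAX_QUEUES);
	i = h->cnt++;
	while (i && h->a[(i - 1) / 2].ordinal > ordinal) {
		h->a[i] = h->a[(i - 1) / 2];
		i = (i - 1) / 2;
	}
	h->a[i].ordinal = ordinal;
	h->a[i].queue_nr = queue_nr;
}

/* Remove a[0] by sifting the last element down from the root. */
void heap_pop(struct sketch_heap *h)
{
	struct heap_item last = h->a[--h->cnt];
	unsigned int i = 0, c;

	while ((c = 2 * i + 1) < h->cnt) {
		if (c + 1 < h->cnt && h->a[c + 1].ordinal < h->a[c].ordinal)
			c++;
		if (h->a[c].ordinal >= last.ordinal)
			break;
		h->a[i] = h->a[c];
		i = c;
	}
	h->a[i] = last;
}

/*
 * Stand-in for arm_spe_run_decoder(): "decode" (copy to out) every record
 * older than *ts. Returns 1 once the queue is empty, otherwise 0 with *ts
 * set to the timestamp of the next pending record.
 */
int run_decoder(struct sketch_queue *q, uint64_t *ts, uint64_t *out,
		size_t *nout)
{
	while (q->pos < q->n && q->ts[q->pos] < *ts)
		out[(*nout)++] = q->ts[q->pos++];
	if (q->pos == q->n)
		return 1;
	*ts = q->ts[q->pos];
	return 0;
}

/* Emit all records with timestamp < limit, in global timestamp order. */
void process_queues(struct sketch_queue *queues, struct sketch_heap *h,
		    uint64_t limit, uint64_t *out, size_t *nout)
{
	while (h->cnt && h->a[0].ordinal < limit) {
		unsigned int queue_nr = h->a[0].queue_nr;
		uint64_t ts = limit;

		heap_pop(h);
		/* min(next-oldest + 1, limit), written to avoid overflow */
		if (h->cnt && h->a[0].ordinal < limit)
			ts = h->a[0].ordinal + 1;
		if (!run_decoder(&queues[queue_nr], &ts, out, nout))
			heap_add(h, queue_nr, ts);	/* re-queue at next record */
	}
}
```

Records whose timestamp is at or above the limit stay queued for a later call. This is why arm_spe_flush() passes MAX_TIMESTAMP, which drains everything that remains.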

* [PATCH v4 3/4] perf report: Add SPE options to --itrace argument
  2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
  2020-02-11 14:04             ` [PATCH v4 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
  2020-02-11 14:04             ` [PATCH v4 2/4] perf tools: Add support for "report" for some spe events James Clark
@ 2020-02-11 14:04             ` James Clark
  2020-02-17 11:39               ` Adrian Hunter
  2020-02-11 14:04             ` [PATCH v4 " James Clark
  2020-02-12 12:24             ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip Jiri Olsa
  4 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-11 14:04 UTC (permalink / raw)
  To: jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

The previous patch added "perf report" support for some Arm SPE
events (llc-miss, tlb-miss, branch-miss and remote-access). This patch
documents the corresponding --itrace options.
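To illustrate how the new letters are consumed, here is a cut-down, standalone version of the switch cases that the previous patch added to itrace_parse_synth_opts(). This is a minimal sketch, not the perf code. struct spe_synth_opts and parse_spe_itrace() are hypothetical names, only the b/m/t/a letters are handled, and any other letter is simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for struct itrace_synth_opts. */
struct spe_synth_opts {
	bool branches;		/* 'b': branch misses on Arm */
	bool llc_miss;		/* 'm': last level cache misses */
	bool tlb_miss;		/* 't': TLB misses */
	bool remote_access;	/* 'a': remote accesses */
};

/*
 * Set one flag per letter in str. Spaces and commas are skipped, as in
 * itrace_parse_synth_opts(); any other letter is rejected with -1.
 */
int parse_spe_itrace(const char *str, struct spe_synth_opts *opts)
{
	assert(str != NULL && opts != NULL);

	for (const char *p = str; *p; p++) {
		switch (*p) {
		case 'b':
			opts->branches = true;
			break;
		case 'm':
			opts->llc_miss = true;
			break;
		case 't':
			opts->tlb_miss = true;
			break;
		case 'a':
			opts->remote_access = true;
			break;
		case ' ':
		case ',':
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

For example, parsing "mt" sets llc_miss and tlb_miss and leaves remote_access and branches false.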

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/itrace.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 82ff7dad40c2..8e1488de1fb3 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -1,5 +1,5 @@
 		i	synthesize instructions events
-		b	synthesize branches events
+		b	synthesize branches events (branch misses on Arm)
 		c	synthesize branches events (calls only)
 		r	synthesize branches events (returns only)
 		x	synthesize transactions events
@@ -12,6 +12,9 @@
 		g	synthesize a call chain (use with i or x)
 		l	synthesize last branch entries (use with i or x)
 		s       skip initial number of events
+		m	synthesize LLC miss events
+		t	synthesize TLB miss events
+		a	synthesize remote access events
 
 	The default is all events i.e. the same as --itrace=ibxwpe,
 	except for perf script where it is --itrace=ce
-- 
2.17.1


* [PATCH v4 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
                               ` (2 preceding siblings ...)
  2020-02-11 14:04             ` [PATCH v4 3/4] perf report: Add SPE options to --itrace argument James Clark
@ 2020-02-11 14:04             ` James Clark
  2020-02-17 11:42               ` Adrian Hunter
  2020-02-12 12:24             ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip Jiri Olsa
  4 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-11 14:04 UTC (permalink / raw)
  To: jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

At the suggestion of James Clark, use SPE to provide a precise IP for
some events. Currently, the only supported event is branch-misses.
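The attribute rewrite that auxtrace__preprocess_evlist() performs in this patch can be sketched roughly as follows. This is a minimal standalone sketch with two assumptions. First, it uses a hypothetical cut-down struct sketch_attr in place of struct perf_event_attr. Second, it takes the SPE PMU type as a parameter instead of looking up "arm_spe_0". The bit values are the SPE_ATTR_* macros from the patch, and 5 is the value of PERF_COUNT_HW_BRANCH_MISSES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same bit positions as the SPE_ATTR_* macros added by this patch. */
#define SPE_ATTR_TS_ENABLE	(1ULL << 0)
#define SPE_ATTR_BRANCH_FILTER	(1ULL << 32)
#define SPE_ATTR_EV_BRANCH	(1ULL << 7)

/* PERF_COUNT_HW_BRANCH_MISSES from <linux/perf_event.h>. */
#define HW_BRANCH_MISSES	5ULL

/* Hypothetical cut-down stand-in for struct perf_event_attr. */
struct sketch_attr {
	uint32_t type;
	uint64_t config;
	uint64_t config1;
	unsigned int precise_ip;
};

/*
 * If arch is arm64 and attr asks for precise branch misses, retarget it
 * at the SPE PMU (spe_type) with timestamps and the branch filter on.
 * Returns true when attr was rewritten.
 */
bool spe_rewrite_precise(const char *arch, uint32_t spe_type,
			 struct sketch_attr *attr)
{
	assert(arch != NULL && attr != NULL);

	if (strcmp(arch, "arm64") != 0 ||
	    attr->config != HW_BRANCH_MISSES || !attr->precise_ip)
		return false;

	attr->type = spe_type;
	attr->config = SPE_ATTR_TS_ENABLE | SPE_ATTR_BRANCH_FILTER;
	attr->config1 = SPE_ATTR_EV_BRANCH;
	attr->precise_ip = 0;	/* cleared, as in the patch */
	return true;
}
```

With these values, a matching event ends up with config 0x100000001 (ts_enable plus branch_filter) and config1 0x80. Like the patch, the sketch matches on config alone and does not check that the event type is PERF_TYPE_HARDWARE.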

Example usage:

$ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
(:p, :pp and :ppp behave the same in this case.)

$ ./perf report --stdio
(--stdio is optional.)

--------------------------------------------------------------------
...
 # Samples: 14  of event 'branch-misses:pp'
 # Event count (approx.): 14
 #
 # Children      Self  Command  Shared Object      Symbol
 # ........  ........  .......  .................  ..........................
 #
    14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
    14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
     7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     7.14%     7.14%  dd       ld-2.28.so         [.] check_match
     7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
     7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
...
--------------------------------------------------------------------

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Suggested-by: James Clark <James.Clark@arm.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/arch/arm/util/auxtrace.c | 38 +++++++++++++++++++++++++++++
 tools/perf/builtin-record.c         |  5 ++++
 tools/perf/util/arm-spe.c           |  9 +++++++
 tools/perf/util/arm-spe.h           |  3 +++
 tools/perf/util/auxtrace.h          |  6 +++++
 5 files changed, 61 insertions(+)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 0a6e75b8777a..18f0ea7556e7 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -10,11 +10,25 @@
 
 #include "../../util/auxtrace.h"
 #include "../../util/debug.h"
+#include "../../util/env.h"
 #include "../../util/evlist.h"
 #include "../../util/pmu.h"
 #include "cs-etm.h"
 #include "arm-spe.h"
 
+#define SPE_ATTR_TS_ENABLE		BIT(0)
+#define SPE_ATTR_PA_ENABLE		BIT(1)
+#define SPE_ATTR_PCT_ENABLE		BIT(2)
+#define SPE_ATTR_JITTER			BIT(16)
+#define SPE_ATTR_BRANCH_FILTER		BIT(32)
+#define SPE_ATTR_LOAD_FILTER		BIT(33)
+#define SPE_ATTR_STORE_FILTER		BIT(34)
+
+#define SPE_ATTR_EV_RETIRED		BIT(1)
+#define SPE_ATTR_EV_CACHE		BIT(3)
+#define SPE_ATTR_EV_TLB			BIT(5)
+#define SPE_ATTR_EV_BRANCH		BIT(7)
+
 static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
 {
 	struct perf_pmu **arm_spe_pmus = NULL;
@@ -108,3 +122,27 @@ struct auxtrace_record
 	*err = 0;
 	return NULL;
 }
+
+void auxtrace__preprocess_evlist(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	struct perf_pmu *pmu;
+
+	evlist__for_each_entry(evlist, evsel) {
+		/* Currently only supports precise_ip for branch-misses on arm64 */
+		if (!strcmp(perf_env__arch(evlist->env), "arm64")
+			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
+			&& evsel->core.attr.precise_ip)
+		{
+			pmu = perf_pmu__find("arm_spe_0");
+			if (pmu) {
+				evsel->pmu_name = pmu->name;
+				evsel->core.attr.type = pmu->type;
+				evsel->core.attr.config = SPE_ATTR_TS_ENABLE
+							| SPE_ATTR_BRANCH_FILTER;
+				evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
+				evsel->core.attr.precise_ip = 0;
+			}
+		}
+	}
+}
\ No newline at end of file
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..3bc61f03d572 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2451,6 +2451,11 @@ int cmd_record(int argc, const char **argv)
 
 	argc = parse_options(argc, argv, record_options, record_usage,
 			    PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (auxtrace__preprocess_evlist) {
+		auxtrace__preprocess_evlist(rec->evlist);
+	}
+
 	if (quiet)
 		perf_quiet_option();
 
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 4ef22a0775a9..b21806c97dd8 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -778,6 +778,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.sample_id_all = evsel->core.attr.sample_id_all;
 	attr.read_format = evsel->core.attr.read_format;
 
+	/* If it is in the precise ip mode, there is no need to
+	 * synthesize new events. */
+	if (!strncmp(evsel->name, "branch-misses", 13)) {
+		spe->sample_branch_miss = true;
+		spe->branch_miss_id = evsel->core.id[0];
+
+		return 0;
+	}
+
 	/* create new id val to be a fixed offset from evsel id */
 	id = evsel->core.id[0] + 1000000000;
 
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 98d3235781c3..8b1fb191d03a 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -20,6 +20,8 @@ enum {
 union perf_event;
 struct perf_session;
 struct perf_pmu;
+struct evlist;
+struct evsel;
 
 struct auxtrace_record *arm_spe_recording_init(int *err,
 					       struct perf_pmu *arm_spe_pmu);
@@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
 
 struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
+void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
 #endif
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 80617b0d044d..4f89a3a31ab2 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -584,6 +584,7 @@ void auxtrace__dump_auxtrace_sample(struct perf_session *session,
 int auxtrace__flush_events(struct perf_session *session, struct perf_tool *tool);
 void auxtrace__free_events(struct perf_session *session);
 void auxtrace__free(struct perf_session *session);
+void auxtrace__preprocess_evlist(struct evlist *evlist) __attribute__((weak));
 
 #define ITRACE_HELP \
 "				i:	    		synthesize instructions events\n"		\
@@ -728,6 +729,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
 {
 }
 
+static inline
+void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
+{
+}
+
 static inline
 int auxtrace_index__write(int fd __maybe_unused,
 			  struct list_head *head __maybe_unused)
-- 
2.17.1


* Re: [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip
  2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
                               ` (3 preceding siblings ...)
  2020-02-11 14:04             ` [PATCH v4 " James Clark
@ 2020-02-12 12:24             ` Jiri Olsa
  2020-02-12 13:10               ` Adrian Hunter
  4 siblings, 1 reply; 42+ messages in thread
From: Jiri Olsa @ 2020-02-12 12:24 UTC (permalink / raw)
  To: James Clark; +Cc: linux-arm-kernel, linux-kernel, nd, Adrian Hunter

On Tue, Feb 11, 2020 at 02:04:41PM +0000, James Clark wrote:
> Hi Jirka,
> 
> Oops. I've removed all the changes to evlist.c and evsel.h

hi,
it looks ok from my POV, but I don't follow auxtrace that much

Adrian,
it's changing some generic bits of the auxtrace framework,
could you please check?

thanks,
jirka

> 
> 
> James
> 
> Tan Xiaojun (4):
>   perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
>   perf tools: Add support for "report" for some spe events
>   perf report: Add SPE options to --itrace argument
>   perf tools: Support "branch-misses:pp" on arm64
> 
>  tools/perf/Documentation/itrace.txt           |   5 +-
>  tools/perf/arch/arm/util/auxtrace.c           |  38 +
>  tools/perf/builtin-record.c                   |   5 +
>  tools/perf/util/Build                         |   2 +-
>  tools/perf/util/arm-spe-decoder/Build         |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
>  .../arm-spe-pkt-decoder.c                     |   0
>  .../arm-spe-pkt-decoder.h                     |   2 +
>  tools/perf/util/arm-spe.c                     | 756 +++++++++++++++++-
>  tools/perf/util/arm-spe.h                     |   3 +
>  tools/perf/util/auxtrace.c                    |  13 +
>  tools/perf/util/auxtrace.h                    |  14 +-
>  13 files changed, 1089 insertions(+), 41 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/Build
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)
> 
> -- 
> 2.17.1
> 


* Re: [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip
  2020-02-12 12:24             ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip Jiri Olsa
@ 2020-02-12 13:10               ` Adrian Hunter
  0 siblings, 0 replies; 42+ messages in thread
From: Adrian Hunter @ 2020-02-12 13:10 UTC (permalink / raw)
  To: Jiri Olsa, James Clark; +Cc: linux-arm-kernel, linux-kernel, nd

On 12/02/20 2:24 pm, Jiri Olsa wrote:
> On Tue, Feb 11, 2020 at 02:04:41PM +0000, James Clark wrote:
>> Hi Jirka,
>>
>> Oops. I've removed all the changes to evlist.c and evsel.h
> 
> hi,
> it looks ok from my POV, but I don't follow auxtrace that much
> 
> Adrian,
> it's changing some generic bits of the auxtrace framework,
> could you please check?

Sure, in the next few days.

> 
> thanks,
> jirka
> 
>>
>>
>> James
>>
>> Tan Xiaojun (4):
>>   perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
>>   perf tools: Add support for "report" for some spe events
>>   perf report: Add SPE options to --itrace argument
>>   perf tools: Support "branch-misses:pp" on arm64
>>
>>  tools/perf/Documentation/itrace.txt           |   5 +-
>>  tools/perf/arch/arm/util/auxtrace.c           |  38 +
>>  tools/perf/builtin-record.c                   |   5 +
>>  tools/perf/util/Build                         |   2 +-
>>  tools/perf/util/arm-spe-decoder/Build         |   1 +
>>  .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
>>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
>>  .../arm-spe-pkt-decoder.c                     |   0
>>  .../arm-spe-pkt-decoder.h                     |   2 +
>>  tools/perf/util/arm-spe.c                     | 756 +++++++++++++++++-
>>  tools/perf/util/arm-spe.h                     |   3 +
>>  tools/perf/util/auxtrace.c                    |  13 +
>>  tools/perf/util/auxtrace.h                    |  14 +-
>>  13 files changed, 1089 insertions(+), 41 deletions(-)
>>  create mode 100644 tools/perf/util/arm-spe-decoder/Build
>>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
>>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)
>>
>> -- 
>> 2.17.1
>>
> 


* Re: [PATCH v4 2/4] perf tools: Add support for "report" for some spe events
  2020-02-11 14:04             ` [PATCH v4 2/4] perf tools: Add support for "report" for some spe events James Clark
@ 2020-02-17 11:39               ` Adrian Hunter
  0 siblings, 0 replies; 42+ messages in thread
From: Adrian Hunter @ 2020-02-17 11:39 UTC (permalink / raw)
  To: James Clark, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Al Grant, Namhyung Kim

On 11/02/20 4:04 pm, James Clark wrote:
> From: Tan Xiaojun <tanxiaojun@huawei.com>
> 
> After commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
> Profiling Extensions (SPE) support") was merged, "perf record" and
> "perf report --dump-raw-trace" are supported. However, the
> raw data that is dumped cannot be used without parsing.
> 
> This patch improves "perf report" support for SPE by further
> processing the data. Currently, support is added for four events:
> llc-miss, tlb-miss, branch-miss and remote-access.
> 
> Example usage:
> 
> $ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000
> 
> $ ./perf report -i perf-armspe-dd.data --stdio
> --------------------------------------------------------------------
> ...
>  # Samples: 23  of event 'llc-miss'
>  # Event count (approx.): 23
> ...
>     33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
>      6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
>      3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
>      3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
>      3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
>      3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
> ...
>  # Samples: 3  of event 'tlb-miss'
>  # Event count (approx.): 3
> ...
>     33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>     33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
>     33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
> ...
>  # Samples: 20  of event 'branch-miss'
>  # Event count (approx.): 20
> ...
>     15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
>      7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
>      7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
>      7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
>      7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
>      7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
> ...
>  # Samples: 5  of event 'remote-access'
>  # Event count (approx.): 5
> ...
>     27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
>      5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
>      5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
>      5.56%     5.56%  dd       ld-2.28.so         [.] dl_main
> 
> --------------------------------------------------------------------
> Further analysis and processing of the raw SPE data will be added
> in follow-up work.
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>

For auxtrace:

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/perf/util/arm-spe-decoder/Build         |   2 +-
>  .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
>  .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
>  tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
>  tools/perf/util/auxtrace.c                    |  13 +
>  tools/perf/util/auxtrace.h                    |   8 +-
>  7 files changed, 1022 insertions(+), 39 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> 
> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
> index 16efbc245028..f8dae13fc876 100644
> --- a/tools/perf/util/arm-spe-decoder/Build
> +++ b/tools/perf/util/arm-spe-decoder/Build
> @@ -1 +1 @@
> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> new file mode 100644
> index 000000000000..50e796b89a95
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm_spe_decoder.c: ARM SPE support
> + */
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <linux/compiler.h>
> +#include <linux/zalloc.h>
> +
> +#include "../util.h"
> +#include "../debug.h"
> +#include "../auxtrace.h"
> +
> +#include "arm-spe-pkt-decoder.h"
> +#include "arm-spe-decoder.h"
> +
> +#ifndef BIT
> +#define BIT(n)		(1UL << (n))
> +#endif
> +
> +struct arm_spe_decoder {
> +	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
> +	void *data;
> +	struct arm_spe_state state;
> +	const unsigned char *buf;
> +	size_t len;
> +	uint64_t pos;
> +	struct arm_spe_pkt packet;
> +	int pkt_step;
> +	int pkt_len;
> +	int last_packet_type;
> +
> +	uint64_t last_ip;
> +	uint64_t ip;
> +	uint64_t timestamp;
> +	uint64_t sample_timestamp;
> +	const unsigned char *next_buf;
> +	size_t next_len;
> +	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
> +};
> +
> +static uint64_t arm_spe_calc_ip(uint64_t payload)
> +{
> +	uint64_t ip = (payload & ~(0xffULL << 56));
> +
> +	/*
> +	 * Fill the high 8 bits for kernel VAs: per the Armv8 ARM, Xn[55]
> +	 * determines whether the address lies in the upper or lower address
> +	 * range for the purpose of determining whether tagging is used.
> +	 */
> +	if (ip & BIT(55))
> +		ip |= (uint64_t)(0xffULL << 56);
> +
> +	return ip;
> +}
> +
> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
> +{
> +	struct arm_spe_decoder *decoder;
> +
> +	if (!params->get_trace)
> +		return NULL;
> +
> +	decoder = zalloc(sizeof(struct arm_spe_decoder));
> +	if (!decoder)
> +		return NULL;
> +
> +	decoder->get_trace          = params->get_trace;
> +	decoder->data               = params->data;
> +
> +	return decoder;
> +}
> +
> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
> +{
> +	free(decoder);
> +}
> +
> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
> +{
> +	decoder->pkt_len = 1;
> +	decoder->pkt_step = 1;
> +	pr_debug("ERROR: Bad packet\n");
> +
> +	return -EBADMSG;
> +}
> +
> +
> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
> +{
> +	struct arm_spe_buffer buffer = { .buf = 0, };
> +	int ret;
> +
> +	decoder->pkt_step = 0;
> +
> +	pr_debug("Getting more data\n");
> +	ret = decoder->get_trace(&buffer, decoder->data);
> +	if (ret)
> +		return ret;
> +
> +	decoder->buf = buffer.buf;
> +	decoder->len = buffer.len;
> +	if (!decoder->len) {
> +		pr_debug("No more data\n");
> +		return -ENODATA;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
> +{
> +	return arm_spe_get_data(decoder);
> +}
> +
> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
> +{
> +	int ret;
> +
> +	decoder->last_packet_type = decoder->packet.type;
> +
> +	do {
> +		decoder->pos += decoder->pkt_step;
> +		decoder->buf += decoder->pkt_step;
> +		decoder->len -= decoder->pkt_step;
> +
> +
> +		if (!decoder->len) {
> +			ret = arm_spe_get_next_data(decoder);
> +			if (ret)
> +				return ret;
> +		}
> +
> +		ret = arm_spe_get_packet(decoder->buf, decoder->len,
> +				&decoder->packet);
> +		if (ret <= 0)
> +			return arm_spe_bad_packet(decoder);
> +
> +		decoder->pkt_len = ret;
> +		decoder->pkt_step = ret;
> +	} while (decoder->packet.type == ARM_SPE_PAD);
> +
> +	return 0;
> +}
> +
> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
> +{
> +	int err;
> +	int idx;
> +	uint64_t payload;
> +
> +	while (1) {
> +		err = arm_spe_get_next_packet(decoder);
> +		if (err)
> +			return err;
> +
> +		idx = decoder->packet.index;
> +		payload = decoder->packet.payload;
> +
> +		switch (decoder->packet.type) {
> +		case ARM_SPE_TIMESTAMP:
> +			decoder->sample_timestamp = payload;
> +			return 0;
> +		case ARM_SPE_END:
> +			decoder->sample_timestamp = 0;
> +			return 0;
> +		case ARM_SPE_ADDRESS:
> +			decoder->ip = arm_spe_calc_ip(payload);
> +			if (idx == 0)
> +				decoder->state.from_ip = decoder->ip;
> +			else if (idx == 1)
> +				decoder->state.to_ip = decoder->ip;
> +			break;
> +		case ARM_SPE_COUNTER:
> +			break;
> +		case ARM_SPE_CONTEXT:
> +			break;
> +		case ARM_SPE_OP_TYPE:
> +			break;
> +		case ARM_SPE_EVENTS:
> +			if (payload & BIT(EV_TLB_REFILL))
> +				decoder->state.type |= ARM_SPE_TLB_MISS;
> +			if (payload & BIT(EV_MISPRED))
> +				decoder->state.type |= ARM_SPE_BRANCH_MISS;
> +			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
> +				decoder->state.type |= ARM_SPE_LLC_MISS;
> +			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
> +				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
> +
> +			break;
> +		case ARM_SPE_DATA_SOURCE:
> +			break;
> +		case ARM_SPE_BAD:
> +			break;
> +		case ARM_SPE_PAD:
> +			break;
> +		default:
> +			pr_err("Get Packet Error!\n");
> +			return -ENOSYS;
> +		}
> +	}
> +}
> +
> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
> +{
> +	int err;
> +
> +	decoder->state.type = 0;
> +
> +	err = arm_spe_walk_trace(decoder);
> +	if (err)
> +		decoder->state.err = err;
> +
> +	decoder->state.timestamp = decoder->sample_timestamp;
> +
> +	return &decoder->state;
> +}
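The address handling in arm_spe_calc_ip() above can be checked on its own. The following is a minimal standalone sketch that copies only the logic shown in the hunk (clear the top byte, then replicate bit 55 into bits 63:56). The function name spe_calc_ip_sketch is hypothetical and is not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-alone copy of the logic in arm_spe_calc_ip(). */
static uint64_t spe_calc_ip_sketch(uint64_t payload)
{
	/* Drop the top byte of the payload; it is not part of the address. */
	uint64_t ip = payload & ~(0xffULL << 56);

	/* Bit 55 selects the upper (kernel) or lower (user) VA range. */
	if (ip & (1ULL << 55))
		ip |= 0xffULL << 56;

	return ip;
}
```

For example, a payload of 0x0080000000000000 comes back as 0xff80000000000000, while 0x8000aaaabbbbcccc (bit 55 clear) comes back as 0x0000aaaabbbbcccc.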
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> new file mode 100644
> index 000000000000..330f9e1e71ab
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm_spe_decoder.h: ARM SPE support
> + */
> +
> +#ifndef INCLUDE__ARM_SPE_DECODER_H__
> +#define INCLUDE__ARM_SPE_DECODER_H__
> +
> +#include <stdint.h>
> +#include <stddef.h>
> +#include <stdbool.h>
> +
> +enum arm_spe_events {
> +	EV_EXCEPTION_GEN,
> +	EV_RETIRED,
> +	EV_L1D_ACCESS,
> +	EV_L1D_REFILL,
> +	EV_TLB_ACCESS,
> +	EV_TLB_REFILL,
> +	EV_NOT_TAKEN,
> +	EV_MISPRED,
> +	EV_LLC_ACCESS,
> +	EV_LLC_REFILL,
> +	EV_REMOTE_ACCESS,
> +};
> +
> +enum arm_spe_sample_type {
> +	ARM_SPE_LLC_MISS	= 1 << 0,
> +	ARM_SPE_TLB_MISS	= 1 << 1,
> +	ARM_SPE_BRANCH_MISS	= 1 << 2,
> +	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
> +	ARM_SPE_EX_STOP		= 1 << 6,
> +};
> +
> +struct arm_spe_state {
> +	enum arm_spe_sample_type type;
> +	int err;
> +	uint64_t from_ip;
> +	uint64_t to_ip;
> +	uint64_t timestamp;
> +};
> +
> +struct arm_spe_insn;
> +
> +struct arm_spe_buffer {
> +	const unsigned char *buf;
> +	size_t len;
> +	uint64_t offset;
> +	bool consecutive;
> +	uint64_t ref_timestamp;
> +	uint64_t trace_nr;
> +};
> +
> +struct arm_spe_params {
> +	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
> +	void *data;
> +};
> +
> +struct arm_spe_decoder;
> +
> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
> +
> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
> +
> +#endif
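The ARM_SPE_EVENTS case in arm_spe_walk_trace() folds event-packet bits into the arm_spe_sample_type flags declared in this header. A minimal sketch of that mapping, with the bit positions and flag values copied from the enums above; the helper name spe_events_to_type and its payload_bytes parameter are hypothetical. It assumes, as the `idx > 1` guard suggests, that the events-packet index carries the payload size in bytes, so bits 8 and above only exist in multi-byte payloads.

```c
#include <stdint.h>

/* Bit positions and flag values copied from arm-spe-decoder.h in this patch. */
enum { EV_TLB_REFILL = 5, EV_MISPRED = 7, EV_LLC_REFILL = 9, EV_REMOTE_ACCESS = 10 };
enum {
	ARM_SPE_LLC_MISS	= 1 << 0,
	ARM_SPE_TLB_MISS	= 1 << 1,
	ARM_SPE_BRANCH_MISS	= 1 << 2,
	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
};

/*
 * Hypothetical helper mirroring the ARM_SPE_EVENTS case of
 * arm_spe_walk_trace(); payload_bytes stands in for packet.index.
 */
static unsigned int spe_events_to_type(uint64_t payload, int payload_bytes)
{
	unsigned int type = 0;

	if (payload & (1ULL << EV_TLB_REFILL))
		type |= ARM_SPE_TLB_MISS;
	if (payload & (1ULL << EV_MISPRED))
		type |= ARM_SPE_BRANCH_MISS;
	/* LLC refill and remote access live above bit 7. */
	if (payload_bytes > 1 && (payload & (1ULL << EV_LLC_REFILL)))
		type |= ARM_SPE_LLC_MISS;
	if (payload_bytes > 1 && (payload & (1ULL << EV_REMOTE_ACCESS)))
		type |= ARM_SPE_REMOTE_ACCESS;

	return type;
}
```

Because the flags are OR-ed together, one SPE record can produce several synthesized samples (for example both llc-miss and remote-access), which matches how arm_spe_sample() checks each flag independently.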
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> index d786ef65113f..865d1e35b401 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> @@ -15,6 +15,8 @@
>  #define ARM_SPE_NEED_MORE_BYTES		-1
>  #define ARM_SPE_BAD_PACKET		-2
>  
> +#define ARM_SPE_PKT_MAX_SZ		16
> +
>  enum arm_spe_pkt_type {
>  	ARM_SPE_BAD,
>  	ARM_SPE_PAD,
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index f3382a38d48e..4ef22a0775a9 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -16,34 +16,68 @@
>  #include <linux/log2.h>
>  #include <linux/zalloc.h>
>  
> +#include "auxtrace.h"
>  #include "color.h"
> +#include "debug.h"
>  #include "evsel.h"
> +#include "evlist.h"
>  #include "machine.h"
>  #include "session.h"
> -#include "debug.h"
> -#include "auxtrace.h"
> +#include "symbol.h"
> +#include "thread.h"
> +#include "thread-stack.h"
> +#include "tool.h"
> +#include "util/synthetic-events.h"
> +
>  #include "arm-spe.h"
> +#include "arm-spe-decoder/arm-spe-decoder.h"
>  #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
>  
> +#define MAX_TIMESTAMP (~0ULL)
> +
>  struct arm_spe {
>  	struct auxtrace			auxtrace;
>  	struct auxtrace_queues		queues;
>  	struct auxtrace_heap		heap;
> +	struct itrace_synth_opts	synth_opts;
>  	u32				auxtrace_type;
>  	struct perf_session		*session;
>  	struct machine			*machine;
>  	u32				pmu_type;
> +
> +	u8				timeless_decoding;
> +	u8				data_queued;
> +
> +	u8				sample_llc_miss;
> +	u8				sample_tlb_miss;
> +	u8				sample_branch_miss;
> +	u8				sample_remote_access;
> +	u64				llc_miss_id;
> +	u64				tlb_miss_id;
> +	u64				branch_miss_id;
> +	u64				remote_access_id;
> +	u64				kernel_start;
> +
> +	unsigned long			num_events;
>  };
>  
>  struct arm_spe_queue {
> -	struct arm_spe		*spe;
> -	unsigned int		queue_nr;
> -	struct auxtrace_buffer	*buffer;
> -	bool			on_heap;
> -	bool			done;
> -	pid_t			pid;
> -	pid_t			tid;
> -	int			cpu;
> +	struct arm_spe			*spe;
> +	unsigned int			queue_nr;
> +	struct auxtrace_buffer		*buffer;
> +	struct auxtrace_buffer		*old_buffer;
> +	union perf_event		*event_buf;
> +	bool				on_heap;
> +	bool				done;
> +	pid_t				pid;
> +	pid_t				tid;
> +	int				cpu;
> +	void				*decoder;
> +	const struct arm_spe_state	*state;
> +	u64				time;
> +	u64				timestamp;
> +	struct thread			*thread;
> +	bool				have_sample;
>  };
>  
>  static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
> @@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
>  	arm_spe_dump(spe, buf, len);
>  }
>  
> -static int arm_spe_process_event(struct perf_session *session __maybe_unused,
> -				 union perf_event *event __maybe_unused,
> -				 struct perf_sample *sample __maybe_unused,
> -				 struct perf_tool *tool __maybe_unused)
> +static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
> +{
> +	struct arm_spe_queue *speq = data;
> +	struct auxtrace_buffer *buffer = speq->buffer;
> +	struct auxtrace_buffer *old_buffer = speq->old_buffer;
> +	struct auxtrace_queue *queue;
> +
> +	queue = &speq->spe->queues.queue_array[speq->queue_nr];
> +
> +	buffer = auxtrace_buffer__next(queue, buffer);
> +	/* If no more data, drop the previous auxtrace_buffer and return */
> +	if (!buffer) {
> +		if (old_buffer)
> +			auxtrace_buffer__drop_data(old_buffer);
> +		b->len = 0;
> +		return 0;
> +	}
> +
> +	speq->buffer = buffer;
> +
> +	/* If the aux_buffer doesn't have data associated, try to load it */
> +	if (!buffer->data) {
> +		/* get the file desc associated with the perf data file */
> +		int fd = perf_data__fd(speq->spe->session->data);
> +
> +		buffer->data = auxtrace_buffer__get_data(buffer, fd);
> +		if (!buffer->data)
> +			return -ENOMEM;
> +	}
> +
> +	if (buffer->use_data) {
> +		b->len = buffer->use_size;
> +		b->buf = buffer->use_data;
> +	} else {
> +		b->len = buffer->size;
> +		b->buf = buffer->data;
> +	}
> +
> +	b->ref_timestamp = buffer->reference;
> +
> +	if (b->len) {
> +		if (old_buffer)
> +			auxtrace_buffer__drop_data(old_buffer);
> +		speq->old_buffer = buffer;
> +	} else {
> +		auxtrace_buffer__drop_data(buffer);
> +		return arm_spe_get_trace(b, data);
> +	}
> +
> +	return 0;
> +}
> +
> +static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
> +		unsigned int queue_nr)
> +{
> +	struct arm_spe_params params = { .get_trace = 0, };
> +	struct arm_spe_queue *speq;
> +
> +	speq = zalloc(sizeof(*speq));
> +	if (!speq)
> +		return NULL;
> +
> +	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
> +	if (!speq->event_buf)
> +		goto out_free;
> +
> +	speq->spe = spe;
> +	speq->queue_nr = queue_nr;
> +	speq->pid = -1;
> +	speq->tid = -1;
> +	speq->cpu = -1;
> +
> +	/* params set */
> +	params.get_trace = arm_spe_get_trace;
> +	params.data = speq;
> +
> +	/* create new decoder */
> +	speq->decoder = arm_spe_decoder_new(&params);
> +	if (!speq->decoder)
> +		goto out_free;
> +
> +	return speq;
> +
> +out_free:
> +	zfree(&speq->event_buf);
> +	free(speq);
> +
> +	return NULL;
> +}
> +
> +static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
> +{
> +	return ip >= spe->kernel_start ?
> +		PERF_RECORD_MISC_KERNEL :
> +		PERF_RECORD_MISC_USER;
> +}
> +
> +static void arm_spe_prep_sample(struct arm_spe *spe,
> +				struct arm_spe_queue *speq,
> +				union perf_event *event,
> +				struct perf_sample *sample)
> +{
> +	if (!spe->timeless_decoding)
> +		sample->time = speq->timestamp;
> +
> +	sample->ip = speq->state->from_ip;
> +	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
> +	sample->pid = speq->pid;
> +	sample->tid = speq->tid;
> +	sample->addr = speq->state->to_ip;
> +	sample->period = 1;
> +	sample->cpu = speq->cpu;
> +
> +	event->sample.header.type = PERF_RECORD_SAMPLE;
> +	event->sample.header.misc = sample->cpumode;
> +	event->sample.header.size = sizeof(struct perf_event_header);
> +}
> +
> +static inline int
> +arm_spe_deliver_synth_event(struct arm_spe *spe,
> +			    struct arm_spe_queue *speq __maybe_unused,
> +			    union perf_event *event,
> +			    struct perf_sample *sample)
> +{
> +	int ret;
> +
> +	ret = perf_session__deliver_synth_event(spe->session, event, sample);
> +	if (ret)
> +		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
> +
> +	return ret;
> +}
> +
> +static int
> +arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
> +				u64 spe_events_id)
> +{
> +	struct arm_spe *spe = speq->spe;
> +	union perf_event *event = speq->event_buf;
> +	struct perf_sample sample = { .ip = 0, };
> +
> +	arm_spe_prep_sample(spe, speq, event, &sample);
> +
> +	sample.id = spe_events_id;
> +	sample.stream_id = spe_events_id;
> +
> +	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
> +}
> +
> +static int arm_spe_sample(struct arm_spe_queue *speq)
> +{
> +	const struct arm_spe_state *state = speq->state;
> +	struct arm_spe *spe = speq->spe;
> +	int err;
> +
> +	if (!speq->have_sample)
> +		return 0;
> +
> +	speq->have_sample = false;
> +
> +	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
> +		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
> +		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
> +		err = arm_spe_synth_spe_events_sample(speq,
> +						      spe->branch_miss_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
> +		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
> +{
> +	const struct arm_spe_state *state = speq->state;
> +	struct arm_spe *spe = speq->spe;
> +	int err;
> +
> +	if (!spe->kernel_start)
> +		spe->kernel_start = machine__kernel_start(spe->machine);
> +
> +	while (1) {
> +		err = arm_spe_sample(speq);
> +		if (err)
> +			return err;
> +
> +		state = arm_spe_decode(speq->decoder);
> +		if (state->err) {
> +			if (state->err == -ENODATA) {
> +				pr_debug("No data or all data has been processed.\n");
> +				return 1;
> +			}
> +			continue;
> +		}
> +
> +		speq->state = state;
> +		speq->have_sample = true;
> +
> +		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
> +			*timestamp = speq->timestamp;
> +			return 0;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe__setup_queue(struct arm_spe *spe,
> +			       struct auxtrace_queue *queue,
> +			       unsigned int queue_nr)
> +{
> +	struct arm_spe_queue *speq = queue->priv;
> +
> +	if (list_empty(&queue->head) || speq)
> +		return 0;
> +
> +	speq = arm_spe__alloc_queue(spe, queue_nr);
> +
> +	if (!speq)
> +		return -ENOMEM;
> +
> +	queue->priv = speq;
> +
> +	if (queue->cpu != -1)
> +		speq->cpu = queue->cpu;
> +
> +	if (!speq->on_heap) {
> +		const struct arm_spe_state *state;
> +		int ret;
> +
> +		if (spe->timeless_decoding)
> +			return 0;
> +
> +retry:
> +		state = arm_spe_decode(speq->decoder);
> +		if (state->err) {
> +			if (state->err == -ENODATA) {
> +				pr_debug("queue %u has no timestamp\n",
> +						queue_nr);
> +				return 0;
> +			}
> +			goto retry;
> +		}
> +
> +		speq->timestamp = state->timestamp;
> +		speq->state = state;
> +		speq->have_sample = true;
> +		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
> +		if (ret)
> +			return ret;
> +		speq->on_heap = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe__setup_queues(struct arm_spe *spe)
>  {
> +	unsigned int i;
> +	int ret;
> +
> +	for (i = 0; i < spe->queues.nr_queues; i++) {
> +		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe__update_queues(struct arm_spe *spe)
> +{
> +	if (spe->queues.new_data) {
> +		spe->queues.new_data = false;
> +		return arm_spe__setup_queues(spe);
> +	}
> +
>  	return 0;
>  }
>  
> +static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
> +{
> +	struct evsel *evsel;
> +	struct evlist *evlist = spe->session->evlist;
> +	bool timeless_decoding = true;
> +
> +	/*
> +	 * Cycle through the list of events; decoding is timeless only if
> +	 * none of them has PERF_SAMPLE_TIME set.
> +	 */
> +	evlist__for_each_entry(evlist, evsel) {
> +		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
> +			timeless_decoding = false;
> +	}
> +
> +	return timeless_decoding;
> +}
> +
> +static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
> +				    struct auxtrace_queue *queue)
> +{
> +	struct arm_spe_queue *speq = queue->priv;
> +	pid_t tid;
> +
> +	tid = machine__get_current_tid(spe->machine, speq->cpu);
> +	if (tid != -1) {
> +		speq->tid = tid;
> +		thread__zput(speq->thread);
> +	} else
> +		speq->tid = queue->tid;
> +
> +	if ((!speq->thread) && (speq->tid != -1)) {
> +		speq->thread = machine__find_thread(spe->machine, -1,
> +						    speq->tid);
> +	}
> +
> +	if (speq->thread) {
> +		speq->pid = speq->thread->pid_;
> +		if (queue->cpu == -1)
> +			speq->cpu = speq->thread->cpu;
> +	}
> +}
> +
> +static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
> +{
> +	unsigned int queue_nr;
> +	u64 ts;
> +	int ret;
> +
> +	while (1) {
> +		struct auxtrace_queue *queue;
> +		struct arm_spe_queue *speq;
> +
> +		if (!spe->heap.heap_cnt)
> +			return 0;
> +
> +		if (spe->heap.heap_array[0].ordinal >= timestamp)
> +			return 0;
> +
> +		queue_nr = spe->heap.heap_array[0].queue_nr;
> +		queue = &spe->queues.queue_array[queue_nr];
> +		speq = queue->priv;
> +
> +		auxtrace_heap__pop(&spe->heap);
> +
> +		if (spe->heap.heap_cnt) {
> +			ts = spe->heap.heap_array[0].ordinal + 1;
> +			if (ts > timestamp)
> +				ts = timestamp;
> +		} else {
> +			ts = timestamp;
> +		}
> +
> +		arm_spe_set_pid_tid_cpu(spe, queue);
> +
> +		ret = arm_spe_run_decoder(speq, &ts);
> +		if (ret < 0) {
> +			auxtrace_heap__add(&spe->heap, queue_nr, ts);
> +			return ret;
> +		}
> +
> +		if (!ret) {
> +			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
> +			if (ret < 0)
> +				return ret;
> +		} else {
> +			speq->on_heap = false;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
> +					    u64 time_)
> +{
> +	struct auxtrace_queues *queues = &spe->queues;
> +	unsigned int i;
> +	u64 ts = 0;
> +
> +	for (i = 0; i < queues->nr_queues; i++) {
> +		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
> +		struct arm_spe_queue *speq = queue->priv;
> +
> +		if (speq && (tid == -1 || speq->tid == tid)) {
> +			speq->time = time_;
> +			arm_spe_set_pid_tid_cpu(spe, queue);
> +			arm_spe_run_decoder(speq, &ts);
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int arm_spe_process_event(struct perf_session *session,
> +				 union perf_event *event,
> +				 struct perf_sample *sample,
> +				 struct perf_tool *tool)
> +{
> +	int err = 0;
> +	u64 timestamp;
> +	struct arm_spe *spe = container_of(session->auxtrace,
> +			struct arm_spe, auxtrace);
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events) {
> +		pr_err("ARM SPE trace requires ordered events\n");
> +		return -EINVAL;
> +	}
> +
> +	if (sample->time && (sample->time != (u64) -1))
> +		timestamp = sample->time;
> +	else
> +		timestamp = 0;
> +
> +	if (timestamp || spe->timeless_decoding) {
> +		err = arm_spe__update_queues(spe);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->timeless_decoding) {
> +		if (event->header.type == PERF_RECORD_EXIT) {
> +			err = arm_spe_process_timeless_queues(spe,
> +					event->fork.tid,
> +					sample->time);
> +		}
> +	} else if (timestamp) {
> +		if (event->header.type == PERF_RECORD_EXIT) {
> +			err = arm_spe_process_queues(spe, timestamp);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return err;
> +}
> +
>  static int arm_spe_process_auxtrace_event(struct perf_session *session,
>  					  union perf_event *event,
>  					  struct perf_tool *tool __maybe_unused)
>  {
>  	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
>  					     auxtrace);
> -	struct auxtrace_buffer *buffer;
> -	off_t data_offset;
> -	int fd = perf_data__fd(session->data);
> -	int err;
>  
> -	if (perf_data__is_pipe(session->data)) {
> -		data_offset = 0;
> -	} else {
> -		data_offset = lseek(fd, 0, SEEK_CUR);
> -		if (data_offset == -1)
> -			return -errno;
> -	}
> +	if (!spe->data_queued) {
> +		struct auxtrace_buffer *buffer;
> +		off_t data_offset;
> +		int fd = perf_data__fd(session->data);
> +		int err;
>  
> -	err = auxtrace_queues__add_event(&spe->queues, session, event,
> -					 data_offset, &buffer);
> -	if (err)
> -		return err;
> -
> -	/* Dump here now we have copied a piped trace out of the pipe */
> -	if (dump_trace) {
> -		if (auxtrace_buffer__get_data(buffer, fd)) {
> -			arm_spe_dump_event(spe, buffer->data,
> -					     buffer->size);
> -			auxtrace_buffer__put_data(buffer);
> +		if (perf_data__is_pipe(session->data)) {
> +			data_offset = 0;
> +		} else {
> +			data_offset = lseek(fd, 0, SEEK_CUR);
> +			if (data_offset == -1)
> +				return -errno;
> +		}
> +
> +		err = auxtrace_queues__add_event(&spe->queues, session, event,
> +				data_offset, &buffer);
> +		if (err)
> +			return err;
> +
> +		/* Dump here now we have copied a piped trace out of the pipe */
> +		if (dump_trace) {
> +			if (auxtrace_buffer__get_data(buffer, fd)) {
> +				arm_spe_dump_event(spe, buffer->data,
> +						buffer->size);
> +				auxtrace_buffer__put_data(buffer);
> +			}
>  		}
>  	}
>  
> @@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
>  static int arm_spe_flush(struct perf_session *session __maybe_unused,
>  			 struct perf_tool *tool __maybe_unused)
>  {
> -	return 0;
> +	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
> +			auxtrace);
> +	int ret;
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events)
> +		return -EINVAL;
> +
> +	ret = arm_spe__update_queues(spe);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (spe->timeless_decoding)
> +		return arm_spe_process_timeless_queues(spe, -1,
> +				MAX_TIMESTAMP - 1);
> +
> +	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>  }
>  
>  static void arm_spe_free_queue(void *priv)
> @@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
>  
>  	if (!speq)
>  		return;
> +	thread__zput(speq->thread);
> +	arm_spe_decoder_free(speq->decoder);
> +	zfree(&speq->event_buf);
>  	free(speq);
>  }
>  
> @@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
>  	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
>  }
>  
> +struct arm_spe_synth {
> +	struct perf_tool dummy_tool;
> +	struct perf_session *session;
> +};
> +
> +static int arm_spe_event_synth(struct perf_tool *tool,
> +			       union perf_event *event,
> +			       struct perf_sample *sample __maybe_unused,
> +			       struct machine *machine __maybe_unused)
> +{
> +	struct arm_spe_synth *arm_spe_synth =
> +		      container_of(tool, struct arm_spe_synth, dummy_tool);
> +
> +	return perf_session__deliver_synth_event(arm_spe_synth->session,
> +						 event, NULL);
> +}
> +
> +static int arm_spe_synth_event(struct perf_session *session,
> +			       struct perf_event_attr *attr, u64 id)
> +{
> +	struct arm_spe_synth arm_spe_synth;
> +
> +	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
> +	arm_spe_synth.session = session;
> +
> +	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
> +					   &id, arm_spe_event_synth);
> +}
> +
> +static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
> +				    const char *name)
> +{
> +	struct evsel *evsel;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		if (evsel->core.id && evsel->core.id[0] == id) {
> +			if (evsel->name)
> +				zfree(&evsel->name);
> +			evsel->name = strdup(name);
> +			break;
> +		}
> +	}
> +}
> +
> +static int
> +arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
> +{
> +	struct evlist *evlist = session->evlist;
> +	struct evsel *evsel;
> +	struct perf_event_attr attr;
> +	bool found = false;
> +	u64 id;
> +	int err;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		if (evsel->core.attr.type == spe->pmu_type) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		pr_debug("No selected events with ARM SPE trace data\n");
> +		return 0;
> +	}
> +
> +	memset(&attr, 0, sizeof(struct perf_event_attr));
> +	attr.size = sizeof(struct perf_event_attr);
> +	attr.type = PERF_TYPE_HARDWARE;
> +	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
> +	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
> +		PERF_SAMPLE_PERIOD;
> +	if (spe->timeless_decoding)
> +		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
> +	else
> +		attr.sample_type |= PERF_SAMPLE_TIME;
> +
> +	attr.exclude_user = evsel->core.attr.exclude_user;
> +	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
> +	attr.exclude_hv = evsel->core.attr.exclude_hv;
> +	attr.exclude_host = evsel->core.attr.exclude_host;
> +	attr.exclude_guest = evsel->core.attr.exclude_guest;
> +	attr.sample_id_all = evsel->core.attr.sample_id_all;
> +	attr.read_format = evsel->core.attr.read_format;
> +
> +	/* create new id val to be a fixed offset from evsel id */
> +	id = evsel->core.id[0] + 1000000000;
> +
> +	if (!id)
> +		id = 1;
> +
> +	/* spe events set */
> +	if (spe->synth_opts.llc_miss) {
> +		spe->sample_llc_miss = true;
> +
> +		/* llc-miss */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->llc_miss_id = id;
> +		arm_spe_set_event_name(evlist, id, "llc-miss");
> +		id += 1;
> +	}
> +
> +	if (spe->synth_opts.tlb_miss) {
> +		spe->sample_tlb_miss = true;
> +
> +		/* tlb-miss */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->tlb_miss_id = id;
> +		arm_spe_set_event_name(evlist, id, "tlb-miss");
> +		id += 1;
> +	}
> +
> +	if (spe->synth_opts.branches) {
> +		spe->sample_branch_miss = true;
> +
> +		/* branch-miss */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->branch_miss_id = id;
> +		arm_spe_set_event_name(evlist, id, "branch-miss");
> +		id += 1;
> +	}
> +
> +	if (spe->synth_opts.remote_access) {
> +		spe->sample_remote_access = true;
> +
> +		/* remote-access */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->remote_access_id = id;
> +		arm_spe_set_event_name(evlist, id, "remote-access");
> +		id += 1;
> +	}
> +
> +	return 0;
> +}
> +
>  int arm_spe_process_auxtrace_info(union perf_event *event,
>  				  struct perf_session *session)
>  {
> @@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  	spe->auxtrace_type = auxtrace_info->type;
>  	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
>  
> +	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>  	spe->auxtrace.process_event = arm_spe_process_event;
>  	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
>  	spe->auxtrace.flush_events = arm_spe_flush;
> @@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  
>  	arm_spe_print_info(&auxtrace_info->priv[0]);
>  
> +	if (dump_trace)
> +		return 0;
> +
> +	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
> +		spe->synth_opts = *session->itrace_synth_opts;
> +	else
> +		itrace_synth_opts__set_default(&spe->synth_opts, false);
> +
> +	err = arm_spe_synth_events(spe, session);
> +	if (err)
> +		goto err_free_queues;
> +
> +	err = auxtrace_queues__process_index(&spe->queues, session);
> +	if (err)
> +		goto err_free_queues;
> +
> +	if (spe->queues.populated)
> +		spe->data_queued = true;
> +
>  	return 0;
>  
> +err_free_queues:
> +	auxtrace_queues__free(&spe->queues);
> +	session->auxtrace = NULL;
>  err_free:
>  	free(spe);
>  	return err;
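arm_spe_process_queues() always services the queue whose head timestamp is smallest (the top of spe->heap), so samples from per-CPU queues are delivered in time order. A simplified sketch of that ordering, assuming each queue already holds time-sorted timestamps; it swaps perf's auxtrace_heap for a linear scan over queue heads and emits one sample at a time, whereas the patch lets a queue run ahead until it passes the next head's timestamp. All names here (struct ts_queue, merge_queues) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue cursor over time-sorted sample timestamps. */
struct ts_queue {
	const uint64_t *ts;
	size_t len;
	size_t pos;
};

/*
 * Repeatedly pick the queue whose next timestamp is smallest (the role
 * auxtrace_heap plays in arm_spe_process_queues()) and emit that sample.
 * Returns the number of timestamps written to out.
 */
static size_t merge_queues(struct ts_queue *q, size_t nr_queues,
			   uint64_t *out, size_t out_len)
{
	size_t n = 0;

	while (n < out_len) {
		size_t best = nr_queues;

		for (size_t i = 0; i < nr_queues; i++) {
			if (q[i].pos == q[i].len)
				continue;	/* drained queue: "off the heap" */
			if (best == nr_queues ||
			    q[i].ts[q[i].pos] < q[best].ts[q[best].pos])
				best = i;
		}
		if (best == nr_queues)
			break;			/* every queue is empty */

		out[n++] = q[best].ts[q[best].pos++];
	}

	return n;
}
```

With queues {1, 4, 9}, {2, 3, 10} and {5}, the output is 1, 2, 3, 4, 5, 9, 10. A real heap makes each pick O(log n) instead of O(n), which matters once there is one queue per CPU on a large machine.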
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index eb087e7df6f4..2901b07a9293 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1279,6 +1279,10 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
>  	synth_opts->pwr_events = true;
>  	synth_opts->other_events = true;
>  	synth_opts->errors = true;
> +	synth_opts->llc_miss = true;
> +	synth_opts->tlb_miss = true;
> +	synth_opts->remote_access = true;
> +
>  	if (no_sample) {
>  		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
>  		synth_opts->period = 1;
> @@ -1431,6 +1435,15 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  				goto out_err;
>  			p = endptr;
>  			break;
> +		case 'm':
> +			synth_opts->llc_miss = true;
> +			break;
> +		case 't':
> +			synth_opts->tlb_miss = true;
> +			break;
> +		case 'a':
> +			synth_opts->remote_access = true;
> +			break;
>  		case ' ':
>  		case ',':
>  			break;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 749d72cd9c7b..80617b0d044d 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -60,7 +60,7 @@ enum itrace_period_type {
>   * @inject: indicates the event (not just the sample) must be fully synthesized
>   *          because 'perf inject' will write it out
>   * @instructions: whether to synthesize 'instructions' events
> - * @branches: whether to synthesize 'branches' events
> + * @branches: whether to synthesize 'branches' events (branch misses only on Arm)
>   * @transactions: whether to synthesize events for transactions
>   * @ptwrites: whether to synthesize events for ptwrites
>   * @pwr_events: whether to synthesize power events
> @@ -74,6 +74,9 @@ enum itrace_period_type {
>   * @callchain: add callchain to 'instructions' events
>   * @thread_stack: feed branches to the thread_stack
>   * @last_branch: add branch context to 'instruction' events
> + * @llc_miss: whether to synthesize last level cache miss events
> + * @tlb_miss: whether to synthesize TLB miss events
> + * @remote_access: whether to synthesize remote access events
>   * @callchain_sz: maximum callchain size
>   * @last_branch_sz: branch context size
>   * @period: 'instructions' events period
> @@ -101,6 +104,9 @@ struct itrace_synth_opts {
>  	bool			callchain;
>  	bool			thread_stack;
>  	bool			last_branch;
> +	bool			llc_miss;
> +	bool			tlb_miss;
> +	bool			remote_access;
>  	unsigned int		callchain_sz;
>  	unsigned int		last_branch_sz;
>  	unsigned long long	period;
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/4] perf report: Add SPE options to --itrace argument
  2020-02-11 14:04             ` [PATCH v4 3/4] perf report: Add SPE options to --itrace argument James Clark
@ 2020-02-17 11:39               ` Adrian Hunter
  2020-02-25 11:57                 ` [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip James Clark
  0 siblings, 1 reply; 42+ messages in thread
From: Adrian Hunter @ 2020-02-17 11:39 UTC (permalink / raw)
  To: James Clark, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Al Grant, Namhyung Kim

On 11/02/20 4:04 pm, James Clark wrote:
> From: Tan Xiaojun <tanxiaojun@huawei.com>
> 
> The previous patch added support in "perf report" for some arm-spe
> events (llc-miss, tlb-miss, branch-miss, remote-access). This patch
> adds help text for them.
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>

There is also ITRACE_HELP in tools/perf/util/auxtrace.h, which should be updated with the new options too.

Otherwise for auxtrace:

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/perf/Documentation/itrace.txt | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
> index 82ff7dad40c2..8e1488de1fb3 100644
> --- a/tools/perf/Documentation/itrace.txt
> +++ b/tools/perf/Documentation/itrace.txt
> @@ -1,5 +1,5 @@
>  		i	synthesize instructions events
> -		b	synthesize branches events
> +		b	synthesize branches events (branch misses on Arm)
>  		c	synthesize branches events (calls only)
>  		r	synthesize branches events (returns only)
>  		x	synthesize transactions events
> @@ -12,6 +12,9 @@
>  		g	synthesize a call chain (use with i or x)
>  		l	synthesize last branch entries (use with i or x)
>  		s       skip initial number of events
> +		m	synthesize LLC miss events
> +		t	synthesize TLB miss events
> +		a	synthesize remote access events
>  
>  	The default is all events i.e. the same as --itrace=ibxwpe,
>  	except for perf script where it is --itrace=ce
> 
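The option letters documented above are plain single-character flags. A minimal C sketch of how the three new letters could be turned into booleans follows. It mirrors only the 'm', 't' and 'a' cases added to itrace_parse_synth_opts() in this series and treats spaces and commas as separators; the struct and function names are hypothetical, and the real parser in tools/perf/util/auxtrace.c handles many more letters and numeric arguments.

```c
#include <stdbool.h>

/* Hypothetical subset of struct itrace_synth_opts: only the SPE flags. */
struct spe_synth_flags {
	bool llc_miss;
	bool tlb_miss;
	bool remote_access;
};

/*
 * Parse an --itrace style string such as "mta" or "m,t".
 * Returns 0 on success, -1 on a letter this sketch does not know.
 */
static int parse_spe_itrace_letters(const char *str,
				    struct spe_synth_flags *flags)
{
	const char *p;

	for (p = str; *p; p++) {
		switch (*p) {
		case 'm':	/* last level cache miss events */
			flags->llc_miss = true;
			break;
		case 't':	/* TLB miss events */
			flags->tlb_miss = true;
			break;
		case 'a':	/* remote access events */
			flags->remote_access = true;
			break;
		case ' ':
		case ',':	/* separators are ignored */
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

With this sketch, the string "m,a" enables LLC-miss and remote-access synthesis and leaves TLB-miss synthesis off.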



* Re: [PATCH v4 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-11 14:04             ` [PATCH v4 " James Clark
@ 2020-02-17 11:42               ` Adrian Hunter
  2020-02-24 17:08                 ` James Clark
  0 siblings, 1 reply; 42+ messages in thread
From: Adrian Hunter @ 2020-02-17 11:42 UTC (permalink / raw)
  To: James Clark, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Al Grant,
	Namhyung Kim

On 11/02/20 4:04 pm, James Clark wrote:
> From: Tan Xiaojun <tanxiaojun@huawei.com>
> 
> At the suggestion of James Clark, use SPE to support the precise
> ip of some events. Currently the only supported event is
> branch-misses.
> 
> Example usage:
> 
> $ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
> (:p, :pp and :ppp behave the same in this case.)
>
> $ ./perf report --stdio
> (--stdio is not necessary.)
> 
> --------------------------------------------------------------------
> ...
>  # Samples: 14  of event 'branch-misses:pp'
>  # Event count (approx.): 14
>  #
>  # Children      Self  Command  Shared Object      Symbol
>  # ........  ........  .......  .................  ..........................
>  #
>     14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>     14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
>      7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
>      7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      7.14%     7.14%  dd       ld-2.28.so         [.] check_match
>      7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
>      7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>      7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
> ...
> --------------------------------------------------------------------
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Suggested-by: James Clark <James.Clark@arm.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/arch/arm/util/auxtrace.c | 38 +++++++++++++++++++++++++++++
>  tools/perf/builtin-record.c         |  5 ++++
>  tools/perf/util/arm-spe.c           |  9 +++++++
>  tools/perf/util/arm-spe.h           |  3 +++
>  tools/perf/util/auxtrace.h          |  6 +++++
>  5 files changed, 61 insertions(+)
> 
> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> index 0a6e75b8777a..18f0ea7556e7 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -10,11 +10,25 @@
>  
>  #include "../../util/auxtrace.h"
>  #include "../../util/debug.h"
> +#include "../../util/env.h"
>  #include "../../util/evlist.h"
>  #include "../../util/pmu.h"
>  #include "cs-etm.h"
>  #include "arm-spe.h"
>  
> +#define SPE_ATTR_TS_ENABLE		BIT(0)
> +#define SPE_ATTR_PA_ENABLE		BIT(1)
> +#define SPE_ATTR_PCT_ENABLE		BIT(2)
> +#define SPE_ATTR_JITTER			BIT(16)
> +#define SPE_ATTR_BRANCH_FILTER		BIT(32)
> +#define SPE_ATTR_LOAD_FILTER		BIT(33)
> +#define SPE_ATTR_STORE_FILTER		BIT(34)
> +
> +#define SPE_ATTR_EV_RETIRED		BIT(1)
> +#define SPE_ATTR_EV_CACHE		BIT(3)
> +#define SPE_ATTR_EV_TLB			BIT(5)
> +#define SPE_ATTR_EV_BRANCH		BIT(7)
> +
>  static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
>  {
>  	struct perf_pmu **arm_spe_pmus = NULL;
> @@ -108,3 +122,27 @@ struct auxtrace_record
>  	*err = 0;
>  	return NULL;
>  }
> +
> +void auxtrace__preprocess_evlist(struct evlist *evlist)
> +{
> +	struct evsel *evsel;
> +	struct perf_pmu *pmu;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		/* Currently only supports precise_ip for branch-misses on arm64 */
> +		if (!strcmp(perf_env__arch(evlist->env), "arm64")

Isn't config ambiguous unless you also check the type, i.e.:

			&& evsel->core.attr.type == PERF_TYPE_HARDWARE

> +			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
> +			&& evsel->core.attr.precise_ip)
> +		{
> +			pmu = perf_pmu__find("arm_spe_0");
> +			if (pmu) {

Changing the event seems a bit weird.

> +				evsel->pmu_name = pmu->name;
> +				evsel->core.attr.type = pmu->type;
> +				evsel->core.attr.config = SPE_ATTR_TS_ENABLE
> +							| SPE_ATTR_BRANCH_FILTER;
> +				evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
> +				evsel->core.attr.precise_ip = 0;
> +			}
> +		}
> +	}
> +}
> \ No newline at end of file
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..3bc61f03d572 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -2451,6 +2451,11 @@ int cmd_record(int argc, const char **argv)
>  
>  	argc = parse_options(argc, argv, record_options, record_usage,
>  			    PARSE_OPT_STOP_AT_NON_OPTION);
> +
> +	if (auxtrace__preprocess_evlist) {
> +		auxtrace__preprocess_evlist(rec->evlist);
> +	}
> +
>  	if (quiet)
>  		perf_quiet_option();
>  
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 4ef22a0775a9..b21806c97dd8 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -778,6 +778,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>  	attr.sample_id_all = evsel->core.attr.sample_id_all;
>  	attr.read_format = evsel->core.attr.read_format;
>  
> +	/* If it is in the precise ip mode, there is no need to
> +	 * synthesize new events. */
> +	if (!strncmp(evsel->name, "branch-misses", 13)) {
> +		spe->sample_branch_miss = true;
> +		spe->branch_miss_id = evsel->core.id[0];
> +
> +		return 0;
> +	}
> +
>  	/* create new id val to be a fixed offset from evsel id */
>  	id = evsel->core.id[0] + 1000000000;
>  
> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
> index 98d3235781c3..8b1fb191d03a 100644
> --- a/tools/perf/util/arm-spe.h
> +++ b/tools/perf/util/arm-spe.h
> @@ -20,6 +20,8 @@ enum {
>  union perf_event;
>  struct perf_session;
>  struct perf_pmu;
> +struct evlist;
> +struct evsel;
>  
>  struct auxtrace_record *arm_spe_recording_init(int *err,
>  					       struct perf_pmu *arm_spe_pmu);
> @@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  				  struct perf_session *session);
>  
>  struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
> +void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
>  #endif
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 80617b0d044d..4f89a3a31ab2 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -584,6 +584,7 @@ void auxtrace__dump_auxtrace_sample(struct perf_session *session,
>  int auxtrace__flush_events(struct perf_session *session, struct perf_tool *tool);
>  void auxtrace__free_events(struct perf_session *session);
>  void auxtrace__free(struct perf_session *session);
> +void auxtrace__preprocess_evlist(struct evlist *evlist) __attribute__((weak));
>  
>  #define ITRACE_HELP \
>  "				i:	    		synthesize instructions events\n"		\
> @@ -728,6 +729,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
>  {
>  }
>  
> +static inline
> +void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
> +{
> +}
> +
>  static inline
>  int auxtrace_index__write(int fd __maybe_unused,
>  			  struct list_head *head __maybe_unused)
> 



* Re: [PATCH v4 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-17 11:42               ` Adrian Hunter
@ 2020-02-24 17:08                 ` James Clark
  2020-02-28 16:01                   ` Mark Rutland
  0 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-24 17:08 UTC (permalink / raw)
  To: Adrian Hunter, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Al Grant,
	Namhyung Kim

Hi Adrian,

On 2/17/20 11:42 AM, Adrian Hunter wrote:
> On 11/02/20 4:04 pm, James Clark wrote:
>> From: Tan Xiaojun <tanxiaojun@huawei.com>
>>
>> At the suggestion of James Clark, use SPE to support the precise
>> ip of some events. Currently the only supported event is
>> branch-misses.
>>
>> Example usage:
>>
>> $ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
>> (:p, :pp and :ppp behave the same in this case.)
>>
>> $ ./perf report --stdio
>> (--stdio is not necessary.)
>>
>> --------------------------------------------------------------------
>> ...
>>  # Samples: 14  of event 'branch-misses:pp'
>>  # Event count (approx.): 14
>>  #
>>  # Children      Self  Command  Shared Object      Symbol
>>  # ........  ........  .......  .................  ..........................
>>  #
>>     14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>>     14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
>>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
>>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
>>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
>>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
>>      7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
>>      7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>>      7.14%     7.14%  dd       ld-2.28.so         [.] check_match
>>      7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
>>      7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>      7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
>> ...
>> --------------------------------------------------------------------
>>
>> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
>> Suggested-by: James Clark <James.Clark@arm.com>
>> Tested-by: Qi Liu <liuqi115@hisilicon.com>
>> Signed-off-by: James Clark <james.clark@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Cc: Jiri Olsa <jolsa@redhat.com>
>> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
>> Cc: Al Grant <al.grant@arm.com>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> ---
>>  tools/perf/arch/arm/util/auxtrace.c | 38 +++++++++++++++++++++++++++++
>>  tools/perf/builtin-record.c         |  5 ++++
>>  tools/perf/util/arm-spe.c           |  9 +++++++
>>  tools/perf/util/arm-spe.h           |  3 +++
>>  tools/perf/util/auxtrace.h          |  6 +++++
>>  5 files changed, 61 insertions(+)
>>
>> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
>> index 0a6e75b8777a..18f0ea7556e7 100644
>> --- a/tools/perf/arch/arm/util/auxtrace.c
>> +++ b/tools/perf/arch/arm/util/auxtrace.c
>> @@ -10,11 +10,25 @@
>>  
>>  #include "../../util/auxtrace.h"
>>  #include "../../util/debug.h"
>> +#include "../../util/env.h"
>>  #include "../../util/evlist.h"
>>  #include "../../util/pmu.h"
>>  #include "cs-etm.h"
>>  #include "arm-spe.h"
>>  
>> +#define SPE_ATTR_TS_ENABLE		BIT(0)
>> +#define SPE_ATTR_PA_ENABLE		BIT(1)
>> +#define SPE_ATTR_PCT_ENABLE		BIT(2)
>> +#define SPE_ATTR_JITTER			BIT(16)
>> +#define SPE_ATTR_BRANCH_FILTER		BIT(32)
>> +#define SPE_ATTR_LOAD_FILTER		BIT(33)
>> +#define SPE_ATTR_STORE_FILTER		BIT(34)
>> +
>> +#define SPE_ATTR_EV_RETIRED		BIT(1)
>> +#define SPE_ATTR_EV_CACHE		BIT(3)
>> +#define SPE_ATTR_EV_TLB			BIT(5)
>> +#define SPE_ATTR_EV_BRANCH		BIT(7)
>> +
>>  static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
>>  {
>>  	struct perf_pmu **arm_spe_pmus = NULL;
>> @@ -108,3 +122,27 @@ struct auxtrace_record
>>  	*err = 0;
>>  	return NULL;
>>  }
>> +
>> +void auxtrace__preprocess_evlist(struct evlist *evlist)
>> +{
>> +	struct evsel *evsel;
>> +	struct perf_pmu *pmu;
>> +
>> +	evlist__for_each_entry(evlist, evsel) {
>> +		/* Currently only supports precise_ip for branch-misses on arm64 */
>> +		if (!strcmp(perf_env__arch(evlist->env), "arm64")
> 
> Isn't config ambiguous unless you also check the type, i.e.:
> 
> 			&& evsel->core.attr.type == PERF_TYPE_HARDWARE
> 

Yes, you're right. I will add this.

>> +			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
>> +			&& evsel->core.attr.precise_ip)
>> +		{
>> +			pmu = perf_pmu__find("arm_spe_0");
>> +			if (pmu) {
> 
> Changing the event seems a bit weird.
> 

This is because there is no support in the kernel for the precise_ip attribute on Arm.
SPE can give you precise ip data for the same event, but changing the event is required.
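A minimal, self-contained C sketch of the check and rewrite under discussion follows, including the PERF_TYPE_HARDWARE comparison suggested above. It uses a hypothetical stand-in struct (sketch_attr) rather than perf's struct evsel, omits the arm64 environment check and the perf_pmu__find("arm_spe_0") lookup, and takes the SPE PMU type number as a parameter because that value is only known at run time. The SPE bit positions are copied from the patch; the values 0 (PERF_TYPE_HARDWARE) and 5 (PERF_COUNT_HW_BRANCH_MISSES) come from the perf_event_open ABI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the perf_event_open(2) ABI (<linux/perf_event.h>). */
#define SKETCH_PERF_TYPE_HARDWARE          0u
#define SKETCH_PERF_COUNT_HW_BRANCH_MISSES 5u

/* SPE config bits as defined in the patch. */
#define SPE_ATTR_TS_ENABLE     (1ULL << 0)
#define SPE_ATTR_BRANCH_FILTER (1ULL << 32)
#define SPE_ATTR_EV_BRANCH     (1ULL << 7)

/* Hypothetical stand-in for the relevant perf_event_attr fields. */
struct sketch_attr {
	uint32_t type;
	uint64_t config;
	uint64_t config1;
	unsigned int precise_ip;
};

/*
 * Rewrite a precise "branch-misses" hardware event into an SPE event.
 * Returns true if the attribute was rewritten.
 */
static bool sketch_rewrite_branch_misses(struct sketch_attr *attr,
					 uint32_t spe_pmu_type)
{
	/* config is only meaningful together with type. */
	if (attr->type != SKETCH_PERF_TYPE_HARDWARE ||
	    attr->config != SKETCH_PERF_COUNT_HW_BRANCH_MISSES ||
	    attr->precise_ip == 0)
		return false;

	attr->type = spe_pmu_type;
	attr->config = SPE_ATTR_TS_ENABLE | SPE_ATTR_BRANCH_FILTER;
	attr->config1 = SPE_ATTR_EV_BRANCH;	/* mispredicted branches only */
	attr->precise_ip = 0;			/* SPE supplies the precise ip */
	return true;
}
```

The type check matters because config is interpreted per type: a raw event (PERF_TYPE_RAW) whose config happens to be 5 is an unrelated event and, without the check, would be silently replaced by an SPE event.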
 
>> +				evsel->pmu_name = pmu->name;
>> +				evsel->core.attr.type = pmu->type;
>> +				evsel->core.attr.config = SPE_ATTR_TS_ENABLE
>> +							| SPE_ATTR_BRANCH_FILTER;
>> +				evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
>> +				evsel->core.attr.precise_ip = 0;
>> +			}
>> +		}
>> +	}
>> +}
>> \ No newline at end of file
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 4c301466101b..3bc61f03d572 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -2451,6 +2451,11 @@ int cmd_record(int argc, const char **argv)
>>  
>>  	argc = parse_options(argc, argv, record_options, record_usage,
>>  			    PARSE_OPT_STOP_AT_NON_OPTION);
>> +
>> +	if (auxtrace__preprocess_evlist) {
>> +		auxtrace__preprocess_evlist(rec->evlist);
>> +	}
>> +
>>  	if (quiet)
>>  		perf_quiet_option();
>>  
>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>> index 4ef22a0775a9..b21806c97dd8 100644
>> --- a/tools/perf/util/arm-spe.c
>> +++ b/tools/perf/util/arm-spe.c
>> @@ -778,6 +778,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>>  	attr.sample_id_all = evsel->core.attr.sample_id_all;
>>  	attr.read_format = evsel->core.attr.read_format;
>>  
>> +	/* If it is in the precise ip mode, there is no need to
>> +	 * synthesize new events. */
>> +	if (!strncmp(evsel->name, "branch-misses", 13)) {
>> +		spe->sample_branch_miss = true;
>> +		spe->branch_miss_id = evsel->core.id[0];
>> +
>> +		return 0;
>> +	}
>> +
>>  	/* create new id val to be a fixed offset from evsel id */
>>  	id = evsel->core.id[0] + 1000000000;
>>  
>> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
>> index 98d3235781c3..8b1fb191d03a 100644
>> --- a/tools/perf/util/arm-spe.h
>> +++ b/tools/perf/util/arm-spe.h
>> @@ -20,6 +20,8 @@ enum {
>>  union perf_event;
>>  struct perf_session;
>>  struct perf_pmu;
>> +struct evlist;
>> +struct evsel;
>>  
>>  struct auxtrace_record *arm_spe_recording_init(int *err,
>>  					       struct perf_pmu *arm_spe_pmu);
>> @@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>>  				  struct perf_session *session);
>>  
>>  struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
>> +void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
>>  #endif
>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>> index 80617b0d044d..4f89a3a31ab2 100644
>> --- a/tools/perf/util/auxtrace.h
>> +++ b/tools/perf/util/auxtrace.h
>> @@ -584,6 +584,7 @@ void auxtrace__dump_auxtrace_sample(struct perf_session *session,
>>  int auxtrace__flush_events(struct perf_session *session, struct perf_tool *tool);
>>  void auxtrace__free_events(struct perf_session *session);
>>  void auxtrace__free(struct perf_session *session);
>> +void auxtrace__preprocess_evlist(struct evlist *evlist) __attribute__((weak));
>>  
>>  #define ITRACE_HELP \
>>  "				i:	    		synthesize instructions events\n"		\
>> @@ -728,6 +729,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
>>  {
>>  }
>>  
>> +static inline
>> +void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
>> +{
>> +}
>> +
>>  static inline
>>  int auxtrace_index__write(int fd __maybe_unused,
>>  			  struct list_head *head __maybe_unused)
>>
> 


* [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip
  2020-02-17 11:39               ` Adrian Hunter
@ 2020-02-25 11:57                 ` James Clark
  2020-02-25 11:57                   ` [PATCH v5 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
                                     ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: James Clark @ 2020-02-25 11:57 UTC (permalink / raw)
  To: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel; +Cc: nd, James Clark

Hi Adrian,

I've added the itrace arguments to ITRACE_HELP and also added the evsel->core.attr.type == PERF_TYPE_HARDWARE
comparison.

Thanks
James

Tan Xiaojun (4):
  perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  perf tools: Add support for "report" for some spe events
  perf report: Add SPE options to --itrace argument
  perf tools: Support "branch-misses:pp" on arm64

 tools/perf/Documentation/itrace.txt           |   5 +-
 tools/perf/arch/arm/util/auxtrace.c           |  39 +
 tools/perf/builtin-record.c                   |   5 +
 tools/perf/util/Build                         |   2 +-
 tools/perf/util/arm-spe-decoder/Build         |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-pkt-decoder.c                     |   0
 .../arm-spe-pkt-decoder.h                     |   2 +
 tools/perf/util/arm-spe.c                     | 756 +++++++++++++++++-
 tools/perf/util/arm-spe.h                     |   3 +
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |  19 +-
 13 files changed, 1094 insertions(+), 42 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)

-- 
2.17.1



* [PATCH v5 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  2020-02-25 11:57                 ` [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip James Clark
@ 2020-02-25 11:57                   ` James Clark
  2020-02-25 11:57                   ` [PATCH v5 2/4] perf tools: Add support for "report" for some spe events James Clark
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-02-25 11:57 UTC (permalink / raw)
  To: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c to this directory. No code changes.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/Build                                       | 2 +-
 tools/perf/util/arm-spe-decoder/Build                       | 1 +
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c | 0
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h | 0
 tools/perf/util/arm-spe.c                                   | 2 +-
 5 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (100%)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 07da6c790b63..0184510083c2 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -104,7 +104,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
 perf-$(CONFIG_AUXTRACE) += intel-bts.o
 perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
 perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o
 
 ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 000000000000..16efbc245028
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.c
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.h
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 53be12b23ff4..f3382a38d48e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -23,7 +23,7 @@
 #include "debug.h"
 #include "auxtrace.h"
 #include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
 struct arm_spe {
 	struct auxtrace			auxtrace;
-- 
2.17.1



* [PATCH v5 2/4] perf tools: Add support for "report" for some spe events
  2020-02-25 11:57                 ` [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip James Clark
  2020-02-25 11:57                   ` [PATCH v5 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
@ 2020-02-25 11:57                   ` James Clark
  2020-02-29  6:51                     ` Leo Yan
  2020-02-25 11:57                   ` [PATCH v5 3/4] perf report: Add SPE options to --itrace argument James Clark
  2020-02-25 11:57                   ` [PATCH v5 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
  3 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-25 11:57 UTC (permalink / raw)
  To: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
dumped raw data cannot be used without further parsing.

This patch improves "perf report" support for SPE by further
processing the data. Support is currently added for four events:
llc-miss, tlb-miss, branch-miss and remote-access.
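As a rough illustration of how one decoded SPE record can feed these events, the sketch below tests bits of an SPE events-packet payload. The bit positions (TLB refill = 5, mispredicted = 7, LLC refill = 9, remote access = 10) are assumed from the event names printed by arm-spe-pkt-decoder.c (TLB-REFILL, MISPRED, LLC-REFILL, REMOTE-ACCESS); the enum and function names are hypothetical, and the decoder added by this patch may represent the result differently.

```c
#include <stdint.h>

/* Assumed bit positions in the SPE events packet payload. */
#define SKETCH_EV_TLB_REFILL    5
#define SKETCH_EV_MISPRED       7
#define SKETCH_EV_LLC_REFILL    9
#define SKETCH_EV_REMOTE_ACCESS 10

/* Hypothetical flags: one per synthesized perf event. */
enum sketch_spe_sample_type {
	SKETCH_LLC_MISS      = 1u << 0,
	SKETCH_TLB_MISS      = 1u << 1,
	SKETCH_BRANCH_MISS   = 1u << 2,
	SKETCH_REMOTE_ACCESS = 1u << 3,
};

/* Translate an events-packet payload into a set of sample types. */
static unsigned int sketch_spe_events_to_types(uint64_t payload)
{
	unsigned int types = 0;

	if (payload & (1ULL << SKETCH_EV_LLC_REFILL))
		types |= SKETCH_LLC_MISS;
	if (payload & (1ULL << SKETCH_EV_TLB_REFILL))
		types |= SKETCH_TLB_MISS;
	if (payload & (1ULL << SKETCH_EV_MISPRED))
		types |= SKETCH_BRANCH_MISS;
	if (payload & (1ULL << SKETCH_EV_REMOTE_ACCESS))
		types |= SKETCH_REMOTE_ACCESS;
	return types;
}
```

In this sketch a single record can map to several types at once; for example, a payload of 0x620 yields LLC-miss, TLB-miss and remote-access.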

Example usage:

$ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

$ ./perf report -i perf-armspe-dd.data --stdio
--------------------------------------------------------------------
...
 # Samples: 23  of event 'llc-miss'
 # Event count (approx.): 23
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
     6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
     3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
     3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
     3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
...
 # Samples: 3  of event 'tlb-miss'
 # Event count (approx.): 3
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
    33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
    33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
...
 # Samples: 20  of event 'branch-miss'
 # Event count (approx.): 20
...
    15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
     7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
     7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
     7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
     7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
...
 # Samples: 5  of event 'remote-access'
 # Event count (approx.): 5
...
    27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
     5.56%     5.56%  dd       ld-2.28.so         [.] dl_main

--------------------------------------------------------------------
Further analysis and processing of the raw SPE data will be added in
later patches.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe-decoder/Build         |   2 +-
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |   8 +-
 7 files changed, 1022 insertions(+), 39 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc245028..f8dae13fc876 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 000000000000..50e796b89a95
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../debug.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+#ifndef BIT
+#define BIT(n)		(1UL << (n))
+#endif
+
+struct arm_spe_decoder {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+	struct arm_spe_state state;
+	const unsigned char *buf;
+	size_t len;
+	uint64_t pos;
+	struct arm_spe_pkt packet;
+	int pkt_step;
+	int pkt_len;
+	int last_packet_type;
+
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t timestamp;
+	uint64_t sample_timestamp;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+	uint64_t ip = (payload & ~(0xffULL << 56));
+
+	/*
+	 * Fill the high 8 bits for kernel virtual addresses: in the Armv8
+	 * Architecture Reference Manual, Xn[55] determines whether the
+	 * address lies in the upper or lower range for address tagging.
+	 */
+	if (ip & BIT(55))
+		ip |= (uint64_t)(0xffULL << 56);
+
+	return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+	struct arm_spe_decoder *decoder;
+
+	if (!params->get_trace)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct arm_spe_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->data               = params->data;
+
+	return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+	free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	pr_debug("ERROR: Bad packet\n");
+
+	return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+	struct arm_spe_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	pr_debug("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		pr_debug("No more data\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+	return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+	int ret;
+
+	decoder->last_packet_type = decoder->packet.type;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+
+		if (!decoder->len) {
+			ret = arm_spe_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = arm_spe_get_packet(decoder->buf, decoder->len,
+				&decoder->packet);
+		if (ret <= 0)
+			return arm_spe_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+	} while (decoder->packet.type == ARM_SPE_PAD);
+
+	return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+	int err;
+	int idx;
+	uint64_t payload;
+
+	while (1) {
+		err = arm_spe_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		idx = decoder->packet.index;
+		payload = decoder->packet.payload;
+
+		switch (decoder->packet.type) {
+		case ARM_SPE_TIMESTAMP:
+			decoder->sample_timestamp = payload;
+			return 0;
+		case ARM_SPE_END:
+			decoder->sample_timestamp = 0;
+			return 0;
+		case ARM_SPE_ADDRESS:
+			decoder->ip = arm_spe_calc_ip(payload);
+			if (idx == 0)
+				decoder->state.from_ip = decoder->ip;
+			else if (idx == 1)
+				decoder->state.to_ip = decoder->ip;
+			break;
+		case ARM_SPE_COUNTER:
+			break;
+		case ARM_SPE_CONTEXT:
+			break;
+		case ARM_SPE_OP_TYPE:
+			break;
+		case ARM_SPE_EVENTS:
+			if (payload & BIT(EV_TLB_REFILL))
+				decoder->state.type |= ARM_SPE_TLB_MISS;
+			if (payload & BIT(EV_MISPRED))
+				decoder->state.type |= ARM_SPE_BRANCH_MISS;
+			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
+				decoder->state.type |= ARM_SPE_LLC_MISS;
+			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
+				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
+
+			break;
+		case ARM_SPE_DATA_SOURCE:
+			break;
+		case ARM_SPE_BAD:
+			break;
+		case ARM_SPE_PAD:
+			break;
+		default:
+			pr_err("ARM SPE: unknown packet type\n");
+			return -ENOSYS;
+		}
+	}
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = 0;
+
+	err = arm_spe_walk_trace(decoder);
+	/* Always update err so a stale error is not reported again */
+	decoder->state.err = err;
+
+	decoder->state.timestamp = decoder->sample_timestamp;
+
+	return &decoder->state;
+}
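arm_spe_calc_ip() above strips the top byte of an Address packet payload and, when bit 55 is set, refills it with ones. A minimal standalone sketch of that arithmetic follows; sketch_spe_calc_ip is a hypothetical name. It uses 1ULL rather than BIT(), which is an unsigned long shift and would be too narrow for bit 55 on a host where unsigned long is 32 bits.

```c
#include <stdint.h>

/*
 * Sketch of the address fix-up performed by arm_spe_calc_ip():
 * drop the top byte (it may carry a tag), then, if bit 55 is set,
 * fill bits 63..56 with ones so the result is a canonical
 * upper-range (kernel) virtual address.
 */
static uint64_t sketch_spe_calc_ip(uint64_t payload)
{
	uint64_t ip = payload & ~(0xffULL << 56);

	/* 1ULL keeps the shift 64 bits wide on every host */
	if (ip & (1ULL << 55))
		ip |= 0xffULL << 56;

	return ip;
}
```

With this sketch, a tagged user pointer such as 0x5a00ffff12345678 becomes 0xffff12345678, while 0x00ff800010081234 becomes the kernel address 0xffff800010081234.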
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 000000000000..330f9e1e71ab
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * arm_spe_decoder.h: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_events {
+	EV_EXCEPTION_GEN,
+	EV_RETIRED,
+	EV_L1D_ACCESS,
+	EV_L1D_REFILL,
+	EV_TLB_ACCESS,
+	EV_TLB_REFILL,
+	EV_NOT_TAKEN,
+	EV_MISPRED,
+	EV_LLC_ACCESS,
+	EV_LLC_REFILL,
+	EV_REMOTE_ACCESS,
+};
+
+enum arm_spe_sample_type {
+	ARM_SPE_LLC_MISS	= 1 << 0,
+	ARM_SPE_TLB_MISS	= 1 << 1,
+	ARM_SPE_BRANCH_MISS	= 1 << 2,
+	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
+	ARM_SPE_EX_STOP		= 1 << 6,
+};
+
+struct arm_spe_state {
+	enum arm_spe_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+	const unsigned char *buf;
+	size_t len;
+	uint64_t offset;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
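The enums above fix the Events packet bit positions (EV_TLB_REFILL is bit 5, EV_MISPRED bit 7, EV_LLC_REFILL bit 9, EV_REMOTE_ACCESS bit 10) and the flags that arm_spe_walk_trace() ORs into state.type. The sketch below isolates that mapping. It assumes, as in the existing packet decoder, that packet.index carries the Events payload size in bytes, which is why the idx > 1 tests guard the bits above 7; sketch_spe_events_to_type is a hypothetical name.

```c
#include <stdint.h>

/* Bit positions, copied from enum arm_spe_events in arm-spe-decoder.h */
enum arm_spe_events {
	EV_EXCEPTION_GEN,
	EV_RETIRED,
	EV_L1D_ACCESS,
	EV_L1D_REFILL,
	EV_TLB_ACCESS,
	EV_TLB_REFILL,
	EV_NOT_TAKEN,
	EV_MISPRED,
	EV_LLC_ACCESS,
	EV_LLC_REFILL,
	EV_REMOTE_ACCESS,
};

/* Sample-type flags, copied from enum arm_spe_sample_type */
enum arm_spe_sample_type {
	ARM_SPE_LLC_MISS	= 1 << 0,
	ARM_SPE_TLB_MISS	= 1 << 1,
	ARM_SPE_BRANCH_MISS	= 1 << 2,
	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
};

/*
 * Sketch of the ARM_SPE_EVENTS case in arm_spe_walk_trace().
 * payload_bytes stands in for packet.index (assumed to be the
 * payload size in bytes): bits 8 and above are only tested when
 * the payload is wider than one byte.
 */
static unsigned int sketch_spe_events_to_type(uint64_t payload,
					      int payload_bytes)
{
	unsigned int type = 0;

	if (payload & (1ULL << EV_TLB_REFILL))
		type |= ARM_SPE_TLB_MISS;
	if (payload & (1ULL << EV_MISPRED))
		type |= ARM_SPE_BRANCH_MISS;
	if (payload_bytes > 1 && (payload & (1ULL << EV_LLC_REFILL)))
		type |= ARM_SPE_LLC_MISS;
	if (payload_bytes > 1 && (payload & (1ULL << EV_REMOTE_ACCESS)))
		type |= ARM_SPE_REMOTE_ACCESS;

	return type;
}
```

For instance, a two-byte payload of 0x220 (bits 5 and 9) yields ARM_SPE_TLB_MISS | ARM_SPE_LLC_MISS, whereas the same value treated as a one-byte payload yields only ARM_SPE_TLB_MISS.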
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef65113f..865d1e35b401 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
 #define ARM_SPE_NEED_MORE_BYTES		-1
 #define ARM_SPE_BAD_PACKET		-2
 
+#define ARM_SPE_PKT_MAX_SZ		16
+
 enum arm_spe_pkt_type {
 	ARM_SPE_BAD,
 	ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f3382a38d48e..4ef22a0775a9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -16,34 +16,68 @@
 #include <linux/log2.h>
 #include <linux/zalloc.h>
 
+#include "auxtrace.h"
 #include "color.h"
+#include "debug.h"
 #include "evsel.h"
+#include "evlist.h"
 #include "machine.h"
 #include "session.h"
-#include "debug.h"
-#include "auxtrace.h"
+#include "symbol.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "tool.h"
+#include "util/synthetic-events.h"
+
 #include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#define MAX_TIMESTAMP (~0ULL)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
 	struct auxtrace_heap		heap;
+	struct itrace_synth_opts	synth_opts;
 	u32				auxtrace_type;
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+
+	u8				timeless_decoding;
+	u8				data_queued;
+
+	u8				sample_llc_miss;
+	u8				sample_tlb_miss;
+	u8				sample_branch_miss;
+	u8				sample_remote_access;
+	u64				llc_miss_id;
+	u64				tlb_miss_id;
+	u64				branch_miss_id;
+	u64				remote_access_id;
+	u64				kernel_start;
+
+	unsigned long			num_events;
 };
 
 struct arm_spe_queue {
-	struct arm_spe		*spe;
-	unsigned int		queue_nr;
-	struct auxtrace_buffer	*buffer;
-	bool			on_heap;
-	bool			done;
-	pid_t			pid;
-	pid_t			tid;
-	int			cpu;
+	struct arm_spe			*spe;
+	unsigned int			queue_nr;
+	struct auxtrace_buffer		*buffer;
+	struct auxtrace_buffer		*old_buffer;
+	union perf_event		*event_buf;
+	bool				on_heap;
+	bool				done;
+	pid_t				pid;
+	pid_t				tid;
+	int				cpu;
+	void				*decoder;
+	const struct arm_spe_state	*state;
+	u64				time;
+	u64				timestamp;
+	struct thread			*thread;
+	bool				have_sample;
 };
 
 static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
 	arm_spe_dump(spe, buf, len);
 }
 
-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
-				 union perf_event *event __maybe_unused,
-				 struct perf_sample *sample __maybe_unused,
-				 struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
+{
+	struct arm_spe_queue *speq = data;
+	struct auxtrace_buffer *buffer = speq->buffer;
+	struct auxtrace_buffer *old_buffer = speq->old_buffer;
+	struct auxtrace_queue *queue;
+
+	queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	/* If no more data, drop the previous auxtrace_buffer and return */
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	speq->buffer = buffer;
+
+	/* If the aux_buffer doesn't have data associated, try to load it */
+	if (!buffer->data) {
+		/* get the file desc associated with the perf data file */
+		int fd = perf_data__fd(speq->spe->session->data);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+
+	b->ref_timestamp = buffer->reference;
+
+	if (b->len) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		speq->old_buffer = buffer;
+	} else {
+		auxtrace_buffer__drop_data(buffer);
+		return arm_spe_get_trace(b, data);
+	}
+
+	return 0;
+}
+
+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+		unsigned int queue_nr)
+{
+	struct arm_spe_params params = { .get_trace = 0, };
+	struct arm_spe_queue *speq;
+
+	speq = zalloc(sizeof(*speq));
+	if (!speq)
+		return NULL;
+
+	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!speq->event_buf)
+		goto out_free;
+
+	speq->spe = spe;
+	speq->queue_nr = queue_nr;
+	speq->pid = -1;
+	speq->tid = -1;
+	speq->cpu = -1;
+
+	/* params set */
+	params.get_trace = arm_spe_get_trace;
+	params.data = speq;
+
+	/* create new decoder */
+	speq->decoder = arm_spe_decoder_new(&params);
+	if (!speq->decoder)
+		goto out_free;
+
+	return speq;
+
+out_free:
+	zfree(&speq->event_buf);
+	free(speq);
+
+	return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+	return ip >= spe->kernel_start ?
+		PERF_RECORD_MISC_KERNEL :
+		PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+				struct arm_spe_queue *speq,
+				union perf_event *event,
+				struct perf_sample *sample)
+{
+	if (!spe->timeless_decoding)
+		sample->time = speq->timestamp;
+
+	sample->ip = speq->state->from_ip;
+	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+	sample->pid = speq->pid;
+	sample->tid = speq->tid;
+	sample->addr = speq->state->to_ip;
+	sample->period = 1;
+	sample->cpu = speq->cpu;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+			    struct arm_spe_queue *speq __maybe_unused,
+			    union perf_event *event,
+			    struct perf_sample *sample)
+{
+	int ret;
+
+	ret = perf_session__deliver_synth_event(spe->session, event, sample);
+	if (ret)
+		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+	return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+				u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!speq->have_sample)
+		return 0;
+
+	speq->have_sample = false;
+
+	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq,
+						      spe->branch_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!spe->kernel_start)
+		spe->kernel_start = machine__kernel_start(spe->machine);
+
+	while (1) {
+		err = arm_spe_sample(speq);
+		if (err)
+			return err;
+
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("No data or all data has been processed.\n");
+				return 1;
+			}
+			continue;
+		}
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+
+		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+			*timestamp = speq->timestamp;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct arm_spe_queue *speq = queue->priv;
+
+	if (list_empty(&queue->head) || speq)
+		return 0;
+
+	speq = arm_spe__alloc_queue(spe, queue_nr);
+
+	if (!speq)
+		return -ENOMEM;
+
+	queue->priv = speq;
+
+	if (queue->cpu != -1)
+		speq->cpu = queue->cpu;
+
+	if (!speq->on_heap) {
+		const struct arm_spe_state *state;
+		int ret;
+
+		if (spe->timeless_decoding)
+			return 0;
+
+retry:
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("queue %u has no timestamp\n",
+						queue_nr);
+				return 0;
+			}
+			goto retry;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+		if (ret)
+			return ret;
+		speq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
 {
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < spe->queues.nr_queues; i++) {
+		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+	if (spe->queues.new_data) {
+		spe->queues.new_data = false;
+		return arm_spe__setup_queues(spe);
+	}
+
 	return 0;
 }
 
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = spe->session->evlist;
+	bool timeless_decoding = true;
+
+	/*
+	 * Cycle through the list of events and disable timeless decoding
+	 * if we find one with the time bit set.
+	 */
+	evlist__for_each_entry(evlist, evsel) {
+		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
+			timeless_decoding = false;
+	}
+
+	return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+				    struct auxtrace_queue *queue)
+{
+	struct arm_spe_queue *speq = queue->priv;
+	pid_t tid;
+
+	tid = machine__get_current_tid(spe->machine, speq->cpu);
+	if (tid != -1) {
+		speq->tid = tid;
+		thread__zput(speq->thread);
+	} else
+		speq->tid = queue->tid;
+
+	if ((!speq->thread) && (speq->tid != -1)) {
+		speq->thread = machine__find_thread(spe->machine, -1,
+						    speq->tid);
+	}
+
+	if (speq->thread) {
+		speq->pid = speq->thread->pid_;
+		if (queue->cpu == -1)
+			speq->cpu = speq->thread->cpu;
+	}
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct arm_spe_queue *speq;
+
+		if (!spe->heap.heap_cnt)
+			return 0;
+
+		if (spe->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = spe->heap.heap_array[0].queue_nr;
+		queue = &spe->queues.queue_array[queue_nr];
+		speq = queue->priv;
+
+		auxtrace_heap__pop(&spe->heap);
+
+		if (spe->heap.heap_cnt) {
+			ts = spe->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		arm_spe_set_pid_tid_cpu(spe, queue);
+
+		ret = arm_spe_run_decoder(speq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			speq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &spe->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+		struct arm_spe_queue *speq = queue->priv;
+
+		if (speq && (tid == -1 || speq->tid == tid)) {
+			speq->time = time_;
+			arm_spe_set_pid_tid_cpu(spe, queue);
+			arm_spe_run_decoder(speq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int arm_spe_process_event(struct perf_session *session,
+				 union perf_event *event,
+				 struct perf_sample *sample,
+				 struct perf_tool *tool)
+{
+	int err = 0;
+	u64 timestamp;
+	struct arm_spe *spe = container_of(session->auxtrace,
+			struct arm_spe, auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("ARM SPE trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || spe->timeless_decoding) {
+		err = arm_spe__update_queues(spe);
+		if (err)
+			return err;
+	}
+
+	if (spe->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_timeless_queues(spe,
+					event->fork.tid,
+					sample->time);
+		}
+	} else if (timestamp) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_queues(spe, timestamp);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 static int arm_spe_process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
-	struct auxtrace_buffer *buffer;
-	off_t data_offset;
-	int fd = perf_data__fd(session->data);
-	int err;
 
-	if (perf_data__is_pipe(session->data)) {
-		data_offset = 0;
-	} else {
-		data_offset = lseek(fd, 0, SEEK_CUR);
-		if (data_offset == -1)
-			return -errno;
-	}
+	if (!spe->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data__fd(session->data);
+		int err;
 
-	err = auxtrace_queues__add_event(&spe->queues, session, event,
-					 data_offset, &buffer);
-	if (err)
-		return err;
-
-	/* Dump here now we have copied a piped trace out of the pipe */
-	if (dump_trace) {
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			arm_spe_dump_event(spe, buffer->data,
-					     buffer->size);
-			auxtrace_buffer__put_data(buffer);
+		if (perf_data__is_pipe(session->data)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&spe->queues, session, event,
+				data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				arm_spe_dump_event(spe, buffer->data,
+						buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
 		}
 	}
 
@@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
 static int arm_spe_flush(struct perf_session *session __maybe_unused,
 			 struct perf_tool *tool __maybe_unused)
 {
-	return 0;
+	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+			auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = arm_spe__update_queues(spe);
+	if (ret < 0)
+		return ret;
+
+	if (spe->timeless_decoding)
+		return arm_spe_process_timeless_queues(spe, -1,
+				MAX_TIMESTAMP - 1);
+
+	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
 }
 
 static void arm_spe_free_queue(void *priv)
@@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
 
 	if (!speq)
 		return;
+	thread__zput(speq->thread);
+	arm_spe_decoder_free(speq->decoder);
+	zfree(&speq->event_buf);
 	free(speq);
 }
 
@@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
 	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
 }
 
+struct arm_spe_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine __maybe_unused)
+{
+	struct arm_spe_synth *arm_spe_synth =
+		      container_of(tool, struct arm_spe_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(arm_spe_synth->session,
+						 event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct arm_spe_synth arm_spe_synth;
+
+	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+	arm_spe_synth.session = session;
+
+	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+					   &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
+				    const char *name)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.id && evsel->core.id[0] == id) {
+			if (evsel->name)
+				zfree(&evsel->name);
+			evsel->name = strdup(name);
+			break;
+		}
+	}
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+	struct evlist *evlist = session->evlist;
+	struct evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.type == spe->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with ARM SPE trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+		PERF_SAMPLE_PERIOD;
+	if (spe->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->core.attr.exclude_user;
+	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
+	attr.exclude_hv = evsel->core.attr.exclude_hv;
+	attr.exclude_host = evsel->core.attr.exclude_host;
+	attr.exclude_guest = evsel->core.attr.exclude_guest;
+	attr.sample_id_all = evsel->core.attr.sample_id_all;
+	attr.read_format = evsel->core.attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->core.id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	/* spe events set */
+	if (spe->synth_opts.llc_miss) {
+		spe->sample_llc_miss = true;
+
+		/* llc-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->llc_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "llc-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.tlb_miss) {
+		spe->sample_tlb_miss = true;
+
+		/* tlb-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->tlb_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "tlb-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.branches) {
+		spe->sample_branch_miss = true;
+
+		/* branch-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->branch_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "branch-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.remote_access) {
+		spe->sample_remote_access = true;
+
+		/* remote-access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->remote_access_id = id;
+		arm_spe_set_event_name(evlist, id, "remote-access");
+		id += 1;
+	}
+
+	return 0;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
 
+	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 	spe->auxtrace.process_event = arm_spe_process_event;
 	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
 	spe->auxtrace.flush_events = arm_spe_flush;
@@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
 	arm_spe_print_info(&auxtrace_info->priv[0]);
 
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		spe->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&spe->synth_opts, false);
+
+	err = arm_spe_synth_events(spe, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&spe->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (spe->queues.populated)
+		spe->data_queued = true;
+
 	return 0;
 
+err_free_queues:
+	auxtrace_queues__free(&spe->queues);
+	session->auxtrace = NULL;
 err_free:
 	free(spe);
 	return err;
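arm_spe_process_queues() merges the per-CPU queues by timestamp: it pops the queue with the earliest timestamp from the auxtrace heap, decodes it until it passes one tick after the next-earliest queue (capped at the caller's limit), and pushes it back. The sketch below models only that ordering policy on in-memory arrays. sketch_queue, sketch_merge and sketch_decode_limit are hypothetical names, and a linear scan replaces struct auxtrace_heap purely for brevity; this is not the perf implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one arm_spe_queue's decoded records */
struct sketch_queue {
	const uint64_t *ts;	/* record timestamps, ascending */
	size_t n;		/* number of records */
	size_t pos;		/* next record to emit */
};

/* min(next_ts + 1, timestamp), as computed after auxtrace_heap__pop() */
static uint64_t sketch_decode_limit(int have_next, uint64_t next_ts,
				    uint64_t timestamp)
{
	uint64_t ts = timestamp;

	/* next_ts < timestamp also guarantees next_ts + 1 cannot wrap */
	if (have_next && next_ts < timestamp && next_ts + 1 < timestamp)
		ts = next_ts + 1;

	return ts;
}

/*
 * Emit all records with a timestamp below 'limit' in global time
 * order. Returns the number of timestamps written to out[].
 */
static size_t sketch_merge(struct sketch_queue *q, int nq, uint64_t limit,
			   uint64_t *out)
{
	size_t n = 0;

	for (;;) {
		int first = -1, second = -1;
		uint64_t ts;

		/* Find the earliest and second-earliest non-empty queues */
		for (int i = 0; i < nq; i++) {
			if (q[i].pos == q[i].n)
				continue;
			if (first < 0 ||
			    q[i].ts[q[i].pos] < q[first].ts[q[first].pos]) {
				second = first;
				first = i;
			} else if (second < 0 ||
				   q[i].ts[q[i].pos] < q[second].ts[q[second].pos]) {
				second = i;
			}
		}

		/* Like the "ordinal >= timestamp" check: stop at the limit */
		if (first < 0 || q[first].ts[q[first].pos] >= limit)
			return n;

		ts = sketch_decode_limit(second >= 0,
					 second >= 0 ? q[second].ts[q[second].pos] : 0,
					 limit);

		/* Drain the earliest queue until it reaches ts */
		while (q[first].pos < q[first].n && q[first].ts[q[first].pos] < ts)
			out[n++] = q[first].ts[q[first].pos++];
	}
}
```

With queues {1, 4, 7} and {2, 3, 9} and a limit of 8, sketch_merge() emits 1, 2, 3, 4, 7 and leaves 9 pending, much as arm_spe_process_queues() leaves a queue on the heap once it reaches the timestamp of the event being processed. A second call with UINT64_MAX as the limit, analogous to the MAX_TIMESTAMP flush, drains the remaining record.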
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index eb087e7df6f4..994d5e3c9e4f 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1279,6 +1279,10 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->pwr_events = true;
 	synth_opts->other_events = true;
 	synth_opts->errors = true;
+	synth_opts->llc_miss = true;
+	synth_opts->tlb_miss = true;
+	synth_opts->remote_access = true;
+
 	if (no_sample) {
 		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
 		synth_opts->period = 1;
@@ -1431,6 +1435,15 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				goto out_err;
 			p = endptr;
 			break;
+		case 'm':
+			synth_opts->llc_miss = true;
+			break;
+		case 't':
+			synth_opts->tlb_miss = true;
+			break;
+		case 'a':
+			synth_opts->remote_access = true;
+			break;
 		case ' ':
 		case ',':
 			break;
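The three new letters in itrace_parse_synth_opts() each switch on one itrace_synth_opts flag. The sketch below models only that part. The real parser handles many other letters and numeric arguments (such as periods), so rejecting every other letter here is a simplification; 'b' is included because arm_spe_synth_events() maps synth_opts.branches to the branch-miss event. sketch_spe_opts and sketch_parse_spe_opts are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical subset of struct itrace_synth_opts from auxtrace.h */
struct sketch_spe_opts {
	bool branches;
	bool llc_miss;
	bool tlb_miss;
	bool remote_access;
};

/*
 * Each letter sets one flag and separators are skipped. The caller
 * is expected to zero-initialise *opts. Returns 0 on success or -1
 * on a letter this sketch does not handle.
 */
static int sketch_parse_spe_opts(struct sketch_spe_opts *opts, const char *str)
{
	for (const char *p = str; *p; p++) {
		switch (*p) {
		case 'b':
			opts->branches = true;
			break;
		case 'm':
			opts->llc_miss = true;
			break;
		case 't':
			opts->tlb_miss = true;
			break;
		case 'a':
			opts->remote_access = true;
			break;
		case ' ':
		case ',':
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Under this sketch, "mta" enables the llc-miss, tlb-miss and remote-access flags, and "b,m" enables branches and llc-miss only.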
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 749d72cd9c7b..80617b0d044d 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -60,7 +60,7 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
- * @branches: whether to synthesize 'branches' events
+ * @branches: whether to synthesize 'branches' events (branch misses only on Arm)
  * @transactions: whether to synthesize events for transactions
  * @ptwrites: whether to synthesize events for ptwrites
  * @pwr_events: whether to synthesize power events
@@ -74,6 +74,9 @@ enum itrace_period_type {
  * @callchain: add callchain to 'instructions' events
  * @thread_stack: feed branches to the thread_stack
  * @last_branch: add branch context to 'instruction' events
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @remote_access: whether to synthesize remote access events
  * @callchain_sz: maximum callchain size
  * @last_branch_sz: branch context size
  * @period: 'instructions' events period
@@ -101,6 +104,9 @@ struct itrace_synth_opts {
 	bool			callchain;
 	bool			thread_stack;
 	bool			last_branch;
+	bool			llc_miss;
+	bool			tlb_miss;
+	bool			remote_access;
 	unsigned int		callchain_sz;
 	unsigned int		last_branch_sz;
 	unsigned long long	period;
-- 
2.17.1



* [PATCH v5 3/4] perf report: Add SPE options to --itrace argument
  2020-02-25 11:57                 ` [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip James Clark
  2020-02-25 11:57                   ` [PATCH v5 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
  2020-02-25 11:57                   ` [PATCH v5 2/4] perf tools: Add support for "report" for some spe events James Clark
@ 2020-02-25 11:57                   ` James Clark
  2020-02-25 11:57                   ` [PATCH v5 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-02-25 11:57 UTC (permalink / raw)
  To: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

The previous patch added support in "perf report" for some SPE
events (llc-miss, tlb-miss, branch-miss, remote-access). This patch
documents their --itrace options in itrace.txt and in the ITRACE_HELP text.
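For illustration only, the letter handling added to itrace_parse_synth_opts() can be sketched as a standalone C function. The struct spe_synth_opts and parse_spe_itrace_letters() are hypothetical stand-ins, not perf's real types; unlike perf, this sketch rejects every letter other than m, t, a, space and comma.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for perf's struct itrace_synth_opts:
 * only the three flags added by this series are modelled. */
struct spe_synth_opts {
	bool llc_miss;
	bool tlb_miss;
	bool remote_access;
};

/* Parse an --itrace style letter string such as "mt" or "m,a".
 * Returns 0 on success, -1 on an unrecognised letter. */
static int parse_spe_itrace_letters(const char *str, struct spe_synth_opts *opts)
{
	for (const char *p = str; *p; p++) {
		switch (*p) {
		case 'm':	/* last level cache misses */
			opts->llc_miss = true;
			break;
		case 't':	/* TLB misses */
			opts->tlb_miss = true;
			break;
		case 'a':	/* remote accesses */
			opts->remote_access = true;
			break;
		case ' ':
		case ',':	/* separators are skipped */
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

For example, parsing "m,a" sets llc_miss and remote_access and leaves tlb_miss false.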

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/itrace.txt | 5 ++++-
 tools/perf/util/auxtrace.h          | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 82ff7dad40c2..da3e5ccc039e 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -1,5 +1,5 @@
 		i	synthesize instructions events
-		b	synthesize branches events
+		b	synthesize branches events (branch misses on Arm)
 		c	synthesize branches events (calls only)
 		r	synthesize branches events (returns only)
 		x	synthesize transactions events
@@ -9,6 +9,9 @@
 			of aux-output (refer to perf record)
 		e	synthesize error events
 		d	create a debug log
+		m	synthesize LLC miss events
+		t	synthesize TLB miss events
+		a	synthesize remote access events
 		g	synthesize a call chain (use with i or x)
 		l	synthesize last branch entries (use with i or x)
 		s       skip initial number of events
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 80617b0d044d..52e148eea7f8 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -587,7 +587,7 @@ void auxtrace__free(struct perf_session *session);
 
 #define ITRACE_HELP \
 "				i:	    		synthesize instructions events\n"		\
-"				b:	    		synthesize branches events\n"		\
+"				b:	    		synthesize branches events (branch misses on Arm)\n" \
 "				c:	    		synthesize branches events (calls only)\n"	\
 "				r:	    		synthesize branches events (returns only)\n" \
 "				x:	    		synthesize transactions events\n"		\
@@ -595,6 +595,9 @@ void auxtrace__free(struct perf_session *session);
 "				p:	    		synthesize power events\n"			\
 "				e:	    		synthesize error events\n"			\
 "				d:	    		create a debug log\n"			\
+"				m:	    		synthesize LLC miss events\n" \
+"				t:	    		synthesize TLB miss events\n" \
+"				a:	    		synthesize remote access events\n" \
 "				g[len]:     		synthesize a call chain (use with i or x)\n" \
 "				l[len]:     		synthesize last branch entries (use with i or x)\n" \
 "				sNUMBER:    		skip initial number of events\n"		\
-- 
2.17.1



* [PATCH v5 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-25 11:57                 ` [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip James Clark
                                     ` (2 preceding siblings ...)
  2020-02-25 11:57                   ` [PATCH v5 3/4] perf report: Add SPE options to --itrace argument James Clark
@ 2020-02-25 11:57                   ` James Clark
  2020-02-28 16:03                     ` Mark Rutland
  3 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-02-25 11:57 UTC (permalink / raw)
  To: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

At the suggestion of James Clark, use SPE to provide a precise IP for
some events. Currently the only supported event is branch-misses.
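A minimal sketch of the attribute rewrite that auxtrace__preprocess_evlist() performs in this patch, reduced to a standalone C function. struct attr_sketch and rewrite_precise_branch_misses() are hypothetical; the constants mirror the perf_event_open ABI (PERF_TYPE_HARDWARE is 0, PERF_COUNT_HW_BRANCH_MISSES is 5) and the SPE_ATTR_* bit positions defined in the patch. The SPE PMU type is assigned dynamically, so it is passed in rather than hard-coded. Checking the type as well as config matters because config is interpreted per type: config 5 under PERF_TYPE_SOFTWARE is minor-faults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the SPE_ATTR_* macros in this patch. */
#define BIT_ULL(n)              (1ULL << (n))
#define SPE_ATTR_TS_ENABLE      BIT_ULL(0)
#define SPE_ATTR_BRANCH_FILTER  BIT_ULL(32)
#define SPE_ATTR_EV_BRANCH      BIT_ULL(7)

/* Generic type/config values from the perf_event_open ABI. */
#define PERF_TYPE_HARDWARE           0
#define PERF_TYPE_SOFTWARE           1
#define PERF_COUNT_HW_BRANCH_MISSES  5

/* Hypothetical cut-down perf_event_attr: only the fields touched here. */
struct attr_sketch {
	uint32_t type;
	uint64_t config;
	uint64_t config1;
	unsigned int precise_ip;
};

/* Rewrite a precise hardware branch-misses event into an SPE event.
 * spe_type is the dynamic PMU type of the SPE PMU (read from sysfs).
 * Returns true if the attr was rewritten, false if left untouched. */
static bool rewrite_precise_branch_misses(struct attr_sketch *attr,
					  uint32_t spe_type)
{
	/* config alone is ambiguous, so require the hardware type too. */
	if (attr->type != PERF_TYPE_HARDWARE ||
	    attr->config != PERF_COUNT_HW_BRANCH_MISSES ||
	    !attr->precise_ip)
		return false;

	attr->type = spe_type;
	attr->config = SPE_ATTR_TS_ENABLE | SPE_ATTR_BRANCH_FILTER;
	attr->config1 = SPE_ATTR_EV_BRANCH;
	/* Cleared as in the patch: Arm has no kernel precise_ip support. */
	attr->precise_ip = 0;
	return true;
}
```

With these bits the rewritten event ends up with config 0x100000001 and config1 0x80.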

Example usage:

$ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
(:p, :pp and :ppp behave the same in this case.)

$ ./perf report --stdio
(--stdio is not necessary.)

--------------------------------------------------------------------
...
 # Samples: 14  of event 'branch-misses:pp'
 # Event count (approx.): 14
 #
 # Children      Self  Command  Shared Object      Symbol
 # ........  ........  .......  .................  ..........................
 #
    14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
    14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
     7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
     7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     7.14%     7.14%  dd       ld-2.28.so         [.] check_match
     7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
     7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
...
--------------------------------------------------------------------

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Suggested-by: James Clark <James.Clark@arm.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/arch/arm/util/auxtrace.c | 39 +++++++++++++++++++++++++++++
 tools/perf/builtin-record.c         |  5 ++++
 tools/perf/util/arm-spe.c           |  9 +++++++
 tools/perf/util/arm-spe.h           |  3 +++
 tools/perf/util/auxtrace.h          |  6 +++++
 5 files changed, 62 insertions(+)

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 0a6e75b8777a..7f412b7894ab 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -10,11 +10,25 @@
 
 #include "../../util/auxtrace.h"
 #include "../../util/debug.h"
+#include "../../util/env.h"
 #include "../../util/evlist.h"
 #include "../../util/pmu.h"
 #include "cs-etm.h"
 #include "arm-spe.h"
 
+#define SPE_ATTR_TS_ENABLE		BIT(0)
+#define SPE_ATTR_PA_ENABLE		BIT(1)
+#define SPE_ATTR_PCT_ENABLE		BIT(2)
+#define SPE_ATTR_JITTER			BIT(16)
+#define SPE_ATTR_BRANCH_FILTER		BIT(32)
+#define SPE_ATTR_LOAD_FILTER		BIT(33)
+#define SPE_ATTR_STORE_FILTER		BIT(34)
+
+#define SPE_ATTR_EV_RETIRED		BIT(1)
+#define SPE_ATTR_EV_CACHE		BIT(3)
+#define SPE_ATTR_EV_TLB			BIT(5)
+#define SPE_ATTR_EV_BRANCH		BIT(7)
+
 static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
 {
 	struct perf_pmu **arm_spe_pmus = NULL;
@@ -108,3 +122,28 @@ struct auxtrace_record
 	*err = 0;
 	return NULL;
 }
+
+void auxtrace__preprocess_evlist(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	struct perf_pmu *pmu;
+
+	evlist__for_each_entry(evlist, evsel) {
+		/* Currently only supports precise_ip for branch-misses on arm64 */
+		if (!strcmp(perf_env__arch(evlist->env), "arm64")
+			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
+			&& evsel->core.attr.type == PERF_TYPE_HARDWARE
+			&& evsel->core.attr.precise_ip)
+		{
+			pmu = perf_pmu__find("arm_spe_0");
+			if (pmu) {
+				evsel->pmu_name = pmu->name;
+				evsel->core.attr.type = pmu->type;
+				evsel->core.attr.config = SPE_ATTR_TS_ENABLE
+							| SPE_ATTR_BRANCH_FILTER;
+				evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
+				evsel->core.attr.precise_ip = 0;
+			}
+		}
+	}
+}
\ No newline at end of file
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..3bc61f03d572 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2451,6 +2451,11 @@ int cmd_record(int argc, const char **argv)
 
 	argc = parse_options(argc, argv, record_options, record_usage,
 			    PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (auxtrace__preprocess_evlist) {
+		auxtrace__preprocess_evlist(rec->evlist);
+	}
+
 	if (quiet)
 		perf_quiet_option();
 
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 4ef22a0775a9..b21806c97dd8 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -778,6 +778,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.sample_id_all = evsel->core.attr.sample_id_all;
 	attr.read_format = evsel->core.attr.read_format;
 
+	/* If it is in the precise ip mode, there is no need to
+	 * synthesize new events. */
+	if (!strncmp(evsel->name, "branch-misses", 13)) {
+		spe->sample_branch_miss = true;
+		spe->branch_miss_id = evsel->core.id[0];
+
+		return 0;
+	}
+
 	/* create new id val to be a fixed offset from evsel id */
 	id = evsel->core.id[0] + 1000000000;
 
diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 98d3235781c3..8b1fb191d03a 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -20,6 +20,8 @@ enum {
 union perf_event;
 struct perf_session;
 struct perf_pmu;
+struct evlist;
+struct evsel;
 
 struct auxtrace_record *arm_spe_recording_init(int *err,
 					       struct perf_pmu *arm_spe_pmu);
@@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session);
 
 struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
+void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
 #endif
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 52e148eea7f8..4be56bca54dc 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -584,6 +584,7 @@ void auxtrace__dump_auxtrace_sample(struct perf_session *session,
 int auxtrace__flush_events(struct perf_session *session, struct perf_tool *tool);
 void auxtrace__free_events(struct perf_session *session);
 void auxtrace__free(struct perf_session *session);
+void auxtrace__preprocess_evlist(struct evlist *evlist) __attribute__((weak));
 
 #define ITRACE_HELP \
 "				i:	    		synthesize instructions events\n"		\
@@ -731,6 +732,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
 {
 }
 
+static inline
+void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
+{
+}
+
 static inline
 int auxtrace_index__write(int fd __maybe_unused,
 			  struct list_head *head __maybe_unused)
-- 
2.17.1



* Re: [PATCH v4 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-24 17:08                 ` James Clark
@ 2020-02-28 16:01                   ` Mark Rutland
  2020-03-06 15:25                     ` [PATCH v6 0/3] perf tools: Add support for some spe events James Clark
  0 siblings, 1 reply; 42+ messages in thread
From: Mark Rutland @ 2020-02-28 16:01 UTC (permalink / raw)
  To: James Clark
  Cc: Adrian Hunter, jolsa, linux-arm-kernel, linux-kernel, nd,
	Tan Xiaojun, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Al Grant,
	Namhyung Kim, Will Deacon

Hi James,

On Mon, Feb 24, 2020 at 05:08:26PM +0000, James Clark wrote:
> On 2/17/20 11:42 AM, Adrian Hunter wrote:
> > On 11/02/20 4:04 pm, James Clark wrote:
> >> From: Tan Xiaojun <tanxiaojun@huawei.com>
> >>
> >> At the suggestion of James Clark, use spe to support the precise
> >> ip of some events. Currently its support event is:
> >> branch-misses.
> >>
> >> Example usage:
> >>
> >> $ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
> >> (:p/pp/ppp is same for this case.)
> >>
> >> $ ./perf report --stdio
> >> ("--stdio is not necessary")
> >>
> >> --------------------------------------------------------------------
> >> ...
> >>  # Samples: 14  of event 'branch-misses:pp'
> >>  # Event count (approx.): 14
> >>  #
> >>  # Children      Self  Command  Shared Object      Symbol
> >>  # ........  ........  .......  .................  ..........................
> >>  #
> >>     14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
> >>     14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
> >>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
> >>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
> >>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
> >>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
> >>      7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
> >>      7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
> >>      7.14%     7.14%  dd       ld-2.28.so         [.] check_match
> >>      7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
> >>      7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
> >>      7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
> >> ...
> >> --------------------------------------------------------------------
> >>
> >> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> >> Suggested-by: James Clark <James.Clark@arm.com>
> >> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> >> Signed-off-by: James Clark <james.clark@arm.com>
> >> Cc: Will Deacon <will@kernel.org>
> >> Cc: Mark Rutland <mark.rutland@arm.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> >> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Cc: Jiri Olsa <jolsa@redhat.com>
> >> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> >> Cc: Al Grant <al.grant@arm.com>
> >> Cc: Namhyung Kim <namhyung@kernel.org>
> >> ---
> >>  tools/perf/arch/arm/util/auxtrace.c | 38 +++++++++++++++++++++++++++++
> >>  tools/perf/builtin-record.c         |  5 ++++
> >>  tools/perf/util/arm-spe.c           |  9 +++++++
> >>  tools/perf/util/arm-spe.h           |  3 +++
> >>  tools/perf/util/auxtrace.h          |  6 +++++
> >>  5 files changed, 61 insertions(+)
> >>
> >> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> >> index 0a6e75b8777a..18f0ea7556e7 100644
> >> --- a/tools/perf/arch/arm/util/auxtrace.c
> >> +++ b/tools/perf/arch/arm/util/auxtrace.c
> >> @@ -10,11 +10,25 @@
> >>  
> >>  #include "../../util/auxtrace.h"
> >>  #include "../../util/debug.h"
> >> +#include "../../util/env.h"
> >>  #include "../../util/evlist.h"
> >>  #include "../../util/pmu.h"
> >>  #include "cs-etm.h"
> >>  #include "arm-spe.h"
> >>  
> >> +#define SPE_ATTR_TS_ENABLE		BIT(0)
> >> +#define SPE_ATTR_PA_ENABLE		BIT(1)
> >> +#define SPE_ATTR_PCT_ENABLE		BIT(2)
> >> +#define SPE_ATTR_JITTER			BIT(16)
> >> +#define SPE_ATTR_BRANCH_FILTER		BIT(32)
> >> +#define SPE_ATTR_LOAD_FILTER		BIT(33)
> >> +#define SPE_ATTR_STORE_FILTER		BIT(34)
> >> +
> >> +#define SPE_ATTR_EV_RETIRED		BIT(1)
> >> +#define SPE_ATTR_EV_CACHE		BIT(3)
> >> +#define SPE_ATTR_EV_TLB			BIT(5)
> >> +#define SPE_ATTR_EV_BRANCH		BIT(7)
> >> +
> >>  static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
> >>  {
> >>  	struct perf_pmu **arm_spe_pmus = NULL;
> >> @@ -108,3 +122,27 @@ struct auxtrace_record
> >>  	*err = 0;
> >>  	return NULL;
> >>  }
> >> +
> >> +void auxtrace__preprocess_evlist(struct evlist *evlist)
> >> +{
> >> +	struct evsel *evsel;
> >> +	struct perf_pmu *pmu;
> >> +
> >> +	evlist__for_each_entry(evlist, evsel) {
> >> +		/* Currently only supports precise_ip for branch-misses on arm64 */
> >> +		if (!strcmp(perf_env__arch(evlist->env), "arm64")
> > 
> > Isn't config ambiguous unless you also check type i.e.
> > 
> > 			&& evsel->core.attr.type == PERF_TYPE_HARDWARE
> > 
> 
> Yes you're right I will add this.
> 
> >> +			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
> >> +			&& evsel->core.attr.precise_ip)
> >> +		{
> >> +			pmu = perf_pmu__find("arm_spe_0");
> >> +			if (pmu) {
> > 
> > Changing the event seems a bit weird.
> > 
> 
> This is because there is no support in the kernel for the precise_ip attribute on Arm.
> SPE can give you precise ip data for the same event, but changing the event is required.

I don't think this is the right level to override the event.

It's true that contemporary CPU PMUs can't generate synchronous events,
and hence the kernel doesn't have a precise IP, but that's not
necessarily going to be the case in future. I'd rather we didn't
silently override the event requested by the user, as I think that is
going to cause more problems for us in future.

When the SPE buffer overflows, events will be silently dropped, which I
don't believe is in keeping with the usual semantics of precise events.
Additionally, hard-coding the "arm_spe_0" name means that this will not
work reliably on big.LITTLE systems.

Instead, can we have the user explicitly request to use the SPE PMU in
this way? If the perf tool could be smart with the "_%d" suffix, and
collate PMUs differing only by that, the user would only need to do
something like:

  perf record -e arm_spe/branch-misses/pp

... which doesn't seem too onerous.

Is something like that possible?
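The "_%d" collation suggested above could be approached roughly as follows: treat any PMU whose name is the requested prefix followed by an underscore and a decimal index as one instance of the same logical PMU. This is a hypothetical sketch; pmu_name_matches_instance() and count_pmu_instances() do not exist in perf, and a real implementation would walk the PMUs perf discovers under /sys/bus/event_source/devices rather than a fixed array.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if name is exactly prefix + "_" + one or more decimal digits,
 * e.g. "arm_spe_0" or "arm_spe_1" for prefix "arm_spe". */
static bool pmu_name_matches_instance(const char *name, const char *prefix)
{
	size_t len = strlen(prefix);
	const char *p;

	if (strncmp(name, prefix, len) != 0 || name[len] != '_')
		return false;

	p = name + len + 1;
	if (*p == '\0')		/* "arm_spe_" has no index */
		return false;
	for (; *p; p++) {
		if (!isdigit((unsigned char)*p))
			return false;
	}
	return true;
}

/* Count how many PMU names would be collated under prefix. */
static int count_pmu_instances(const char *const *names, int n,
			       const char *prefix)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (pmu_name_matches_instance(names[i], prefix))
			count++;
	return count;
}
```

Given { "arm_spe_0", "arm_spe_1", "armv8_pmuv3_0" }, count_pmu_instances(names, 3, "arm_spe") returns 2, so both SPE instances would be picked up instead of only arm_spe_0.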

Thanks,
Mark.



* Re: [PATCH v5 4/4] perf tools: Support "branch-misses:pp" on arm64
  2020-02-25 11:57                   ` [PATCH v5 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
@ 2020-02-28 16:03                     ` Mark Rutland
  0 siblings, 0 replies; 42+ messages in thread
From: Mark Rutland @ 2020-02-28 16:03 UTC (permalink / raw)
  To: James Clark
  Cc: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel, nd,
	Tan Xiaojun, Will Deacon, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Al Grant,
	Namhyung Kim

Hi James,

Sorry, I missed this v5 when replying to v4 just now, but my comments
there equally apply here: I don't think that we should be silently
overriding the event requested by the user, and I think that we can make
that request explicit without being too painful for the user.

Thanks,
Mark.

On Tue, Feb 25, 2020 at 11:57:39AM +0000, James Clark wrote:
> From: Tan Xiaojun <tanxiaojun@huawei.com>
> 
> At the suggestion of James Clark, use spe to support the precise
> ip of some events. Currently its support event is:
> branch-misses.
> 
> Example usage:
> 
> $ ./perf record -e branch-misses:pp dd if=/dev/zero of=/dev/null count=10000
> (:p/pp/ppp is same for this case.)
> 
> $ ./perf report --stdio
> ("--stdio is not necessary")
> 
> --------------------------------------------------------------------
> ...
>  # Samples: 14  of event 'branch-misses:pp'
>  # Event count (approx.): 14
>  #
>  # Children      Self  Command  Shared Object      Symbol
>  # ........  ........  .......  .................  ..........................
>  #
>     14.29%    14.29%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>     14.29%    14.29%  dd       libc-2.28.so       [.] _dl_addr
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __free_pages
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] __pi_memcpy
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] pagecache_get_page
>      7.14%     7.14%  dd       [kernel.kallsyms]  [k] unmap_single_vma
>      7.14%     7.14%  dd       dd                 [.] 0x00000000000025ec
>      7.14%     7.14%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      7.14%     7.14%  dd       ld-2.28.so         [.] check_match
>      7.14%     7.14%  dd       libc-2.28.so       [.] __mpn_rshift
>      7.14%     7.14%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>      7.14%     7.14%  dd       libc-2.28.so       [.] read_alias_file
> ...
> --------------------------------------------------------------------
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Suggested-by: James Clark <James.Clark@arm.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/arch/arm/util/auxtrace.c | 39 +++++++++++++++++++++++++++++
>  tools/perf/builtin-record.c         |  5 ++++
>  tools/perf/util/arm-spe.c           |  9 +++++++
>  tools/perf/util/arm-spe.h           |  3 +++
>  tools/perf/util/auxtrace.h          |  6 +++++
>  5 files changed, 62 insertions(+)
> 
> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> index 0a6e75b8777a..7f412b7894ab 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -10,11 +10,25 @@
>  
>  #include "../../util/auxtrace.h"
>  #include "../../util/debug.h"
> +#include "../../util/env.h"
>  #include "../../util/evlist.h"
>  #include "../../util/pmu.h"
>  #include "cs-etm.h"
>  #include "arm-spe.h"
>  
> +#define SPE_ATTR_TS_ENABLE		BIT(0)
> +#define SPE_ATTR_PA_ENABLE		BIT(1)
> +#define SPE_ATTR_PCT_ENABLE		BIT(2)
> +#define SPE_ATTR_JITTER			BIT(16)
> +#define SPE_ATTR_BRANCH_FILTER		BIT(32)
> +#define SPE_ATTR_LOAD_FILTER		BIT(33)
> +#define SPE_ATTR_STORE_FILTER		BIT(34)
> +
> +#define SPE_ATTR_EV_RETIRED		BIT(1)
> +#define SPE_ATTR_EV_CACHE		BIT(3)
> +#define SPE_ATTR_EV_TLB			BIT(5)
> +#define SPE_ATTR_EV_BRANCH		BIT(7)
> +
>  static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
>  {
>  	struct perf_pmu **arm_spe_pmus = NULL;
> @@ -108,3 +122,28 @@ struct auxtrace_record
>  	*err = 0;
>  	return NULL;
>  }
> +
> +void auxtrace__preprocess_evlist(struct evlist *evlist)
> +{
> +	struct evsel *evsel;
> +	struct perf_pmu *pmu;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		/* Currently only supports precise_ip for branch-misses on arm64 */
> +		if (!strcmp(perf_env__arch(evlist->env), "arm64")
> +			&& evsel->core.attr.config == PERF_COUNT_HW_BRANCH_MISSES
> +			&& evsel->core.attr.type == PERF_TYPE_HARDWARE
> +			&& evsel->core.attr.precise_ip)
> +		{
> +			pmu = perf_pmu__find("arm_spe_0");
> +			if (pmu) {
> +				evsel->pmu_name = pmu->name;
> +				evsel->core.attr.type = pmu->type;
> +				evsel->core.attr.config = SPE_ATTR_TS_ENABLE
> +							| SPE_ATTR_BRANCH_FILTER;
> +				evsel->core.attr.config1 = SPE_ATTR_EV_BRANCH;
> +				evsel->core.attr.precise_ip = 0;
> +			}
> +		}
> +	}
> +}
> \ No newline at end of file
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..3bc61f03d572 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -2451,6 +2451,11 @@ int cmd_record(int argc, const char **argv)
>  
>  	argc = parse_options(argc, argv, record_options, record_usage,
>  			    PARSE_OPT_STOP_AT_NON_OPTION);
> +
> +	if (auxtrace__preprocess_evlist) {
> +		auxtrace__preprocess_evlist(rec->evlist);
> +	}
> +
>  	if (quiet)
>  		perf_quiet_option();
>  
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 4ef22a0775a9..b21806c97dd8 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -778,6 +778,15 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
>  	attr.sample_id_all = evsel->core.attr.sample_id_all;
>  	attr.read_format = evsel->core.attr.read_format;
>  
> +	/* If it is in the precise ip mode, there is no need to
> +	 * synthesize new events. */
> +	if (!strncmp(evsel->name, "branch-misses", 13)) {
> +		spe->sample_branch_miss = true;
> +		spe->branch_miss_id = evsel->core.id[0];
> +
> +		return 0;
> +	}
> +
>  	/* create new id val to be a fixed offset from evsel id */
>  	id = evsel->core.id[0] + 1000000000;
>  
> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
> index 98d3235781c3..8b1fb191d03a 100644
> --- a/tools/perf/util/arm-spe.h
> +++ b/tools/perf/util/arm-spe.h
> @@ -20,6 +20,8 @@ enum {
>  union perf_event;
>  struct perf_session;
>  struct perf_pmu;
> +struct evlist;
> +struct evsel;
>  
>  struct auxtrace_record *arm_spe_recording_init(int *err,
>  					       struct perf_pmu *arm_spe_pmu);
> @@ -28,4 +30,5 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  				  struct perf_session *session);
>  
>  struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
> +void arm_spe_precise_ip_support(struct evlist *evlist, struct evsel *evsel);
>  #endif
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 52e148eea7f8..4be56bca54dc 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -584,6 +584,7 @@ void auxtrace__dump_auxtrace_sample(struct perf_session *session,
>  int auxtrace__flush_events(struct perf_session *session, struct perf_tool *tool);
>  void auxtrace__free_events(struct perf_session *session);
>  void auxtrace__free(struct perf_session *session);
> +void auxtrace__preprocess_evlist(struct evlist *evlist) __attribute__((weak));
>  
>  #define ITRACE_HELP \
>  "				i:	    		synthesize instructions events\n"		\
> @@ -731,6 +732,11 @@ void auxtrace__free(struct perf_session *session __maybe_unused)
>  {
>  }
>  
> +static inline
> +void auxtrace__preprocess_evlist(struct evlist *evlist __maybe_unused)
> +{
> +}
> +
>  static inline
>  int auxtrace_index__write(int fd __maybe_unused,
>  			  struct list_head *head __maybe_unused)
> -- 
> 2.17.1
> 


* Re: [PATCH v5 2/4] perf tools: Add support for "report" for some spe events
  2020-02-25 11:57                   ` [PATCH v5 2/4] perf tools: Add support for "report" for some spe events James Clark
@ 2020-02-29  6:51                     ` Leo Yan
  0 siblings, 0 replies; 42+ messages in thread
From: Leo Yan @ 2020-02-29  6:51 UTC (permalink / raw)
  To: James Clark
  Cc: adrian.hunter, jolsa, linux-arm-kernel, linux-kernel, nd,
	Tan Xiaojun, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Al Grant, Namhyung Kim

Hi James, Xiaojun,

On Tue, Feb 25, 2020 at 11:57:37AM +0000, James Clark wrote:
> From: Tan Xiaojun <tanxiaojun@huawei.com>
> 
> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
> Profiling Extensions (SPE) support") is merged, "perf record" and
> "perf report --dump-raw-trace" have been supported. However, the
> raw data that is dumped cannot be used without parsing.
> 
> This patch is to improve the "perf report" support for spe, and

Abbreviations are usually written in capitals, so s/spe/SPE/.

> further process the data. Currently, support for the four events
> of llc-miss, tlb-miss, branch-miss, and remote-access is added.

checkpatch.pl reports 1 error and 10 warnings on my side; please consider
fixing them.

> Example usage:
> 
> $ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

If users need to pass this many configuration terms when using SPE, it
might not be user friendly. It would be good to use default values as
much as possible, and I'd suggest writing a document in the
Documentation/trace/ folder.
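To make this concrete: per the SPE_ATTR_* macros in patch 4/4 of this series, the seven boolean terms in the example command are all single bits of perf_event_attr::config, so together they set config to 0x700010007. A hypothetical helper mapping term names to those bits might look like this; spe_set_term() and the term table are illustrative only, and min_latency is deliberately left out because it is not one of the SPE_ATTR_* bits defined in this series.

```c
#include <stdint.h>
#include <string.h>

/* config bit positions, taken from the SPE_ATTR_* macros in patch 4/4. */
struct spe_term {
	const char *name;
	unsigned int bit;
};

static const struct spe_term spe_config_terms[] = {
	{ "ts_enable",     0 },
	{ "pa_enable",     1 },
	{ "pct_enable",    2 },
	{ "jitter",        16 },
	{ "branch_filter", 32 },
	{ "load_filter",   33 },
	{ "store_filter",  34 },
};

/* Set the config bit for one "name=1" style term.
 * Returns 0 on success, -1 if name is not in the table above. */
static int spe_set_term(uint64_t *config, const char *name)
{
	size_t n = sizeof(spe_config_terms) / sizeof(spe_config_terms[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(spe_config_terms[i].name, name) == 0) {
			*config |= 1ULL << spe_config_terms[i].bit;
			return 0;
		}
	}
	return -1;
}
```

Starting from 0 and calling spe_set_term() for branch_filter, ts_enable, pct_enable, pa_enable, load_filter, jitter and store_filter yields 0x700010007, which a default configuration could provide without the user spelling out each term.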

> $ ./perf report -i perf-armspe-dd.data --stdio
> --------------------------------------------------------------------
> ...
>  # Samples: 23  of event 'llc-miss'
>  # Event count (approx.): 23
> ...
>     33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
>      6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
>      3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
>      3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
>      3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
>      3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
>      3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
> ...
>  # Samples: 3  of event 'tlb-miss'
>  # Event count (approx.): 3
> ...
>     33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>     33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
>     33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
> ...
>  # Samples: 20  of event 'branch-miss'
>  # Event count (approx.): 20
> ...
>     15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
>      7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
>      7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
>      7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
>      7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
>      7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
>      7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
> ...
>  # Samples: 5  of event 'remote-access'
>  # Event count (approx.): 5
> ...
>     27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
>      5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
>      5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
>      5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
>      5.56%     5.56%  dd       ld-2.28.so         [.] dl_main
> 
> --------------------------------------------------------------------
> After that, more analysis and processing of the raw data of spe
> will be done.
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/arm-spe-decoder/Build         |   2 +-
>  .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
>  .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
>  tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
>  tools/perf/util/auxtrace.c                    |  13 +
>  tools/perf/util/auxtrace.h                    |   8 +-
>  7 files changed, 1022 insertions(+), 39 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> 
> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
> index 16efbc245028..f8dae13fc876 100644
> --- a/tools/perf/util/arm-spe-decoder/Build
> +++ b/tools/perf/util/arm-spe-decoder/Build
> @@ -1 +1 @@
> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> new file mode 100644
> index 000000000000..50e796b89a95
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm_spe_decoder.c: ARM SPE support
> + */
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <linux/compiler.h>
> +#include <linux/zalloc.h>

List the headers in alphabetical order.

> +
> +#include "../util.h"
> +#include "../debug.h"
> +#include "../auxtrace.h"
> +
> +#include "arm-spe-pkt-decoder.h"
> +#include "arm-spe-decoder.h"
> +
> +#ifndef BIT
> +#define BIT(n)		(1UL << (n))
> +#endif
> +
> +struct arm_spe_decoder {
> +	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
> +	void *data;
> +	struct arm_spe_state state;
> +	const unsigned char *buf;
> +	size_t len;
> +	uint64_t pos;

It's better to use u64 as the type rather than uint64_t.

> +	struct arm_spe_pkt packet;
> +	int pkt_step;
> +	int pkt_len;
> +	int last_packet_type;
> +
> +	uint64_t last_ip;
> +	uint64_t ip;
> +	uint64_t timestamp;
> +	uint64_t sample_timestamp;
> +	const unsigned char *next_buf;
> +	size_t next_len;
> +	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
> +};
> +
> +static uint64_t arm_spe_calc_ip(uint64_t payload)
> +{
> +	uint64_t ip = (payload & ~(0xffULL << 56));
> +
> +	/* fill high 8 bits for kernel virtual address */
> +	/* In Armv8 Architecture Reference Manual: Xn[55] determines

If you refer to the ARMv8 ARM, please give the exact document
number, e.g. ARM DDI 0487E.a.

> +	 * whether the address lies in the upper or lower address range
> +	 * for the purpose of determining whether address tagging is
> +	 * used */

Multi-line comments should use this style:

        /*
         * Comments ...
         *    ...  end
         */

> +	if (ip & BIT(55))
> +		ip |= (uint64_t)(0xffULL << 56);

Sorry if I missed something when I searched the spec.

Please point to the specific section that describes the packet format.
I read section D10.2.1 'Address packet' and its subsection 'Address
packet payload', but didn't see any description of bit 55.

I also don't see any handling for the following subtypes:

- Data access physical address;
- Data access virtual address;
- Instruction virtual address.
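
For reference, the canonicalisation that arm_spe_calc_ip() performs can
be isolated into a small helper. This is only a sketch mirroring the
patch's logic (spe_canonical_addr() is a hypothetical name). Whether
bit 55 is the right selector for every address subtype is exactly the
open question above, and presumably it only makes sense for virtual
addresses, not for the physical-address subtype.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of arm_spe_calc_ip() in the patch: clear the top byte (which
 * may carry a tag) and, if bit 55 is set, refill bits [63:56] with ones
 * so that an upper-range (kernel) address becomes canonical again.
 */
static uint64_t spe_canonical_addr(uint64_t payload)
{
	uint64_t addr = payload & ~(0xffULL << 56);	/* clear bits [63:56] */

	if (addr & (1ULL << 55))			/* upper address range */
		addr |= 0xffULL << 56;

	return addr;
}
```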

> +
> +	return ip;
> +}
> +
> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
> +{
> +	struct arm_spe_decoder *decoder;
> +
> +	if (!params->get_trace)
> +		return NULL;
> +
> +	decoder = zalloc(sizeof(struct arm_spe_decoder));
> +	if (!decoder)
> +		return NULL;
> +
> +	decoder->get_trace          = params->get_trace;
> +	decoder->data               = params->data;

Don't use extra spaces to align the assignments.

> +
> +	return decoder;
> +}
> +
> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
> +{
> +	free(decoder);
> +}
> +
> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
> +{
> +	decoder->pkt_len = 1;
> +	decoder->pkt_step = 1;

decoder->pkt_len doesn't seem to be used anywhere.

> +	pr_debug("ERROR: Bad packet\n");

For errors, it's better to use pr_err() rather than pr_debug().

> +
> +	return -EBADMSG;
> +}
> +
> +

Duplicate blank lines.

> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
> +{
> +	struct arm_spe_buffer buffer = { .buf = 0, };
> +	int ret;
> +
> +	decoder->pkt_step = 0;
> +
> +	pr_debug("Getting more data\n");

I'd suggest removing this debug message since it carries no concrete
information; if it is only used to follow the flow, GDB can be used
instead.

> +	ret = decoder->get_trace(&buffer, decoder->data);
> +	if (ret)
> +		return ret;
> +
> +	decoder->buf = buffer.buf;
> +	decoder->len = buffer.len;
> +	if (!decoder->len) {
> +		pr_debug("No more data\n");
> +		return -ENODATA;

This is the normal end of the trace data; I don't think we need to
return an error number in this case.

> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
> +{
> +	return arm_spe_get_data(decoder);

The two functions arm_spe_get_next_data() and arm_spe_get_data() do
exactly the same thing, so could arm_spe_get_data() be removed?

> +}
> +
> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
> +{
> +	int ret;
> +
> +	decoder->last_packet_type = decoder->packet.type;
> +
> +	do {
> +		decoder->pos += decoder->pkt_step;
> +		decoder->buf += decoder->pkt_step;
> +		decoder->len -= decoder->pkt_step;
> +
> +

Redundant blank line.

> +		if (!decoder->len) {
> +			ret = arm_spe_get_next_data(decoder);
> +			if (ret)
> +				return ret;
> +		}
> +
> +		ret = arm_spe_get_packet(decoder->buf, decoder->len,
> +				&decoder->packet);
> +		if (ret <= 0)
> +			return arm_spe_bad_packet(decoder);
> +
> +		decoder->pkt_len = ret;
> +		decoder->pkt_step = ret;
> +	} while (decoder->packet.type == ARM_SPE_PAD);
> +
> +	return 0;
> +}
> +
> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
> +{
> +	int err;
> +	int idx;
> +	uint64_t payload;
> +
> +	while (1) {

I am confused about why 'while (1)' is needed here to traverse all
packets.

Consider the logic below: arm_spe_walk_trace() uses 'while (1)' to
parse all packets and then returns to the upper layer to generate
samples. It seems to me that the more reasonable logic is to parse one
packet and return directly to the upper layer for sample synthesis.

  arm_spe_run_decoder()  {
    while (1) {
      arm_spe_sample()            => synthesize sample.
      arm_spe_decode()
        `-> arm_spe_walk_trace()  => go through all packets.
    }
  }
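
A minimal sketch of the suggested flow, in which the decoder consumes
one packet per call and tells the caller as soon as a record is
complete, so the sample can be synthesized immediately. The packet
kinds and all toy_* names are hypothetical simplifications, not the
perf API; the increment in toy_run() stands in for arm_spe_sample().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified packet kinds standing in for enum arm_spe_pkt_type. */
enum toy_pkt { TOY_PAD, TOY_ADDRESS, TOY_EVENTS, TOY_END };

/* Return values of toy_decode_one() */
enum { TOY_MORE, TOY_RECORD_DONE, TOY_EOF };

struct toy_decoder {
	const enum toy_pkt *pkts;	/* trace already split into packets */
	size_t nr;			/* number of packets */
	size_t pos;			/* next packet to consume */
	unsigned int rec_addrs;		/* address packets in current record */
};

/* Consume exactly one packet and report whether it closed a record. */
static int toy_decode_one(struct toy_decoder *d)
{
	if (d->pos >= d->nr)
		return TOY_EOF;

	switch (d->pkts[d->pos++]) {
	case TOY_ADDRESS:
		d->rec_addrs++;
		return TOY_MORE;
	case TOY_END:
		return TOY_RECORD_DONE;
	default:			/* PAD, EVENTS, ... */
		return TOY_MORE;
	}
}

/* Caller: synthesize a sample as soon as a record is complete. */
static void toy_run(struct toy_decoder *d, unsigned int *samples)
{
	int ret;

	*samples = 0;
	while ((ret = toy_decode_one(d)) != TOY_EOF) {
		if (ret == TOY_RECORD_DONE) {
			(*samples)++;		/* stand-in for arm_spe_sample() */
			d->rec_addrs = 0;	/* reset per-record state */
		}
	}
}
```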

> +		err = arm_spe_get_next_packet(decoder);
> +		if (err)
> +			return err;
> +
> +		idx = decoder->packet.index;
> +		payload = decoder->packet.payload;
> +
> +		switch (decoder->packet.type) {
> +		case ARM_SPE_TIMESTAMP:
> +			decoder->sample_timestamp = payload;
> +			return 0;
> +		case ARM_SPE_END:
> +			decoder->sample_timestamp = 0;
> +			return 0;
> +		case ARM_SPE_ADDRESS:
> +			decoder->ip = arm_spe_calc_ip(payload);
> +			if (idx == 0)

Please define macros for the idx values 0 and 1; this would be more
readable.
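
Something along these lines, as a sketch only: the macro, struct and
function names are hypothetical, and the values 2 and 3 for the
data-address subtypes listed earlier are assumptions that should be
checked against ARM DDI 0487.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; 2 and 3 are assumed values, check the spec. */
#define SPE_ADDR_IDX_INSTR	0	/* instruction virtual address */
#define SPE_ADDR_IDX_BRANCH	1	/* branch target address */
#define SPE_ADDR_IDX_DATA_VA	2	/* data access virtual address */
#define SPE_ADDR_IDX_DATA_PA	3	/* data access physical address */

struct toy_addrs {
	uint64_t from_ip, to_ip, data_va, data_pa;
};

/* Route an address packet payload by its index instead of bare 0/1. */
static void toy_store_addr(struct toy_addrs *a, int idx, uint64_t addr)
{
	switch (idx) {
	case SPE_ADDR_IDX_INSTR:
		a->from_ip = addr;
		break;
	case SPE_ADDR_IDX_BRANCH:
		a->to_ip = addr;
		break;
	case SPE_ADDR_IDX_DATA_VA:
		a->data_va = addr;
		break;
	case SPE_ADDR_IDX_DATA_PA:
		a->data_pa = addr;
		break;
	default:
		break;		/* unknown index: ignore */
	}
}
```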

> +				decoder->state.from_ip = decoder->ip;
> +			else if (idx == 1)
> +				decoder->state.to_ip = decoder->ip;
> +			break;
> +		case ARM_SPE_COUNTER:
> +			break;
> +		case ARM_SPE_CONTEXT:

I think it misses reading out the process ID.
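
A minimal sketch of what handling the CONTEXT packet could look like on
the decoder side; the struct, field and function names are
hypothetical. It assumes the kernel was built with
CONFIG_PID_IN_CONTEXTIDR, so that CONTEXTIDR (and therefore the packet
payload) holds the PID; the upper layer would then resolve pid/tid from
context_id.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the decoder state. */
struct toy_spe_state {
	uint64_t from_ip;
	int has_context;	/* set once a CONTEXT packet was seen */
	uint32_t context_id;	/* CONTEXTIDR value carried by the packet */
};

/*
 * Record the CONTEXT packet payload. Assumes CONFIG_PID_IN_CONTEXTIDR,
 * i.e. that the value is the PID of the sampled task.
 */
static void toy_handle_context(struct toy_spe_state *st, uint64_t payload)
{
	st->context_id = (uint32_t)payload;
	st->has_context = 1;
}
```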

> +			break;
> +		case ARM_SPE_OP_TYPE:
> +			break;
> +		case ARM_SPE_EVENTS:
> +			if (payload & BIT(EV_TLB_REFILL))
> +				decoder->state.type |= ARM_SPE_TLB_MISS;
> +			if (payload & BIT(EV_MISPRED))
> +				decoder->state.type |= ARM_SPE_BRANCH_MISS;
> +			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
> +				decoder->state.type |= ARM_SPE_LLC_MISS;
> +			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
> +				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
> +
> +			break;
> +		case ARM_SPE_DATA_SOURCE:
> +			break;
> +		case ARM_SPE_BAD:
> +			break;
> +		case ARM_SPE_PAD:
> +			break;
> +		default:
> +			pr_err("Get Packet Error!\n");
> +			return -ENOSYS;
> +		}
> +	}
> +}
> +
> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
> +{
> +	int err;
> +
> +	decoder->state.type = 0;
> +
> +	err = arm_spe_walk_trace(decoder);
> +	if (err)
> +		decoder->state.err = err;
> +
> +	decoder->state.timestamp = decoder->sample_timestamp;
> +
> +	return &decoder->state;

Since decoder::state can be fetched by the caller, it's pointless to
return &decoder->state.  I think it's better for the function to
return an error code rather than a structure pointer.
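
As a sketch of that interface (all toy_* names are hypothetical): the
decode call returns 0 on success, a positive value at the normal end of
the trace (in line with the earlier comment that running out of data is
not an error), and a negative errno only on a real failure, while the
caller reads the decoded state through an accessor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down versions of arm_spe_state / arm_spe_decoder. */
struct toy_state {
	uint64_t timestamp;
	int type;
};

struct toy_decoder {
	struct toy_state state;
	const uint64_t *ts;	/* timestamps of the records in the toy trace */
	int nr, pos;
};

#define TOY_DECODE_EOF	1	/* normal end of trace, not an error */

/* Decode the next record: 0 on success, TOY_DECODE_EOF at the end. */
static int toy_decode(struct toy_decoder *d)
{
	if (!d->ts)
		return -EINVAL;		/* a real error: no trace attached */
	if (d->pos >= d->nr)
		return TOY_DECODE_EOF;

	d->state.timestamp = d->ts[d->pos++];
	return 0;
}

/* The caller reads the state through an accessor, not a return value. */
static const struct toy_state *toy_decoder_state(const struct toy_decoder *d)
{
	return &d->state;
}
```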

> +}
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> new file mode 100644
> index 000000000000..330f9e1e71ab
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm_spe_decoder.c: ARM SPE support
> + */
> +
> +#ifndef INCLUDE__ARM_SPE_DECODER_H__
> +#define INCLUDE__ARM_SPE_DECODER_H__
> +
> +#include <stdint.h>
> +#include <stddef.h>
> +#include <stdbool.h>
> +
> +enum arm_spe_events {
> +	EV_EXCEPTION_GEN,
> +	EV_RETIRED,
> +	EV_L1D_ACCESS,
> +	EV_L1D_REFILL,
> +	EV_TLB_ACCESS,
> +	EV_TLB_REFILL,
> +	EV_NOT_TAKEN,
> +	EV_MISPRED,
> +	EV_LLC_ACCESS,
> +	EV_LLC_REFILL,
> +	EV_REMOTE_ACCESS,
> +};
> +
> +enum arm_spe_sample_type {
> +	ARM_SPE_LLC_MISS	= 1 << 0,
> +	ARM_SPE_TLB_MISS	= 1 << 1,
> +	ARM_SPE_BRANCH_MISS	= 1 << 2,
> +	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
> +	ARM_SPE_EX_STOP		= 1 << 6,
> +};
> +
> +struct arm_spe_state {
> +	enum arm_spe_sample_type type;
> +	int err;
> +	uint64_t from_ip;
> +	uint64_t to_ip;
> +	uint64_t timestamp;
> +};
> +
> +struct arm_spe_insn;
> +
> +struct arm_spe_buffer {
> +	const unsigned char *buf;
> +	size_t len;
> +	u64 offset;
> +	bool consecutive;
> +	uint64_t ref_timestamp;
> +	uint64_t trace_nr;
> +};
> +
> +struct arm_spe_params {
> +	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
> +	void *data;
> +};
> +
> +struct arm_spe_decoder;
> +
> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
> +
> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
> +
> +#endif
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> index d786ef65113f..865d1e35b401 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
> @@ -15,6 +15,8 @@
>  #define ARM_SPE_NEED_MORE_BYTES		-1
>  #define ARM_SPE_BAD_PACKET		-2
>  
> +#define ARM_SPE_PKT_MAX_SZ		16
> +
>  enum arm_spe_pkt_type {
>  	ARM_SPE_BAD,
>  	ARM_SPE_PAD,
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index f3382a38d48e..4ef22a0775a9 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -16,34 +16,68 @@
>  #include <linux/log2.h>
>  #include <linux/zalloc.h>
>  
> +#include "auxtrace.h"
>  #include "color.h"
> +#include "debug.h"
>  #include "evsel.h"
> +#include "evlist.h"

Alphabetical order.

>  #include "machine.h"
>  #include "session.h"
> -#include "debug.h"
> -#include "auxtrace.h"
> +#include "symbol.h"
> +#include "thread.h"
> +#include "thread-stack.h"
> +#include "tool.h"
> +#include "util/synthetic-events.h"
> +
>  #include "arm-spe.h"
> +#include "arm-spe-decoder/arm-spe-decoder.h"
>  #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
>  
> +#define MAX_TIMESTAMP (~0ULL)
> +
>  struct arm_spe {
>  	struct auxtrace			auxtrace;
>  	struct auxtrace_queues		queues;
>  	struct auxtrace_heap		heap;
> +        struct itrace_synth_opts        synth_opts;

Use tab indentation.

>  	u32				auxtrace_type;
>  	struct perf_session		*session;
>  	struct machine			*machine;
>  	u32				pmu_type;
> +
> +	u8				timeless_decoding;
> +	u8				data_queued;
> +
> +	u8				sample_llc_miss;
> +	u8				sample_tlb_miss;
> +	u8				sample_branch_miss;
> +	u8				sample_remote_access;
> +	u64				llc_miss_id;
> +	u64				tlb_miss_id;
> +	u64				branch_miss_id;
> +	u64				remote_access_id;
> +	u64				kernel_start;
> +
> +	unsigned long			num_events;
>  };
>  
>  struct arm_spe_queue {
> -	struct arm_spe		*spe;
> -	unsigned int		queue_nr;
> -	struct auxtrace_buffer	*buffer;
> -	bool			on_heap;
> -	bool			done;
> -	pid_t			pid;
> -	pid_t			tid;
> -	int			cpu;
> +	struct arm_spe			*spe;
> +	unsigned int			queue_nr;
> +	struct auxtrace_buffer		*buffer;
> +	struct auxtrace_buffer		*old_buffer;
> +	union perf_event		*event_buf;
> +	bool				on_heap;
> +	bool				done;
> +	pid_t				pid;
> +	pid_t				tid;
> +	int				cpu;
> +	void				*decoder;
> +	const struct arm_spe_state	*state;
> +	u64				time;
> +	u64				timestamp;
> +	struct thread			*thread;
> +	bool				have_sample;
>  };
>  
>  static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
> @@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
>  	arm_spe_dump(spe, buf, len);
>  }
>  
> -static int arm_spe_process_event(struct perf_session *session __maybe_unused,
> -				 union perf_event *event __maybe_unused,
> -				 struct perf_sample *sample __maybe_unused,
> -				 struct perf_tool *tool __maybe_unused)
> +static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
> +{
> +	struct arm_spe_queue *speq = data;
> +	struct auxtrace_buffer *buffer = speq->buffer;
> +	struct auxtrace_buffer *old_buffer = speq->old_buffer;
> +	struct auxtrace_queue *queue;
> +
> +	queue = &speq->spe->queues.queue_array[speq->queue_nr];
> +
> +	buffer = auxtrace_buffer__next(queue, buffer);
> +	/* If no more data, drop the previous auxtrace_buffer and return */
> +	if (!buffer) {
> +		if (old_buffer)
> +			auxtrace_buffer__drop_data(old_buffer);
> +		b->len = 0;
> +		return 0;
> +	}
> +
> +	speq->buffer = buffer;
> +
> +	/* If the aux_buffer doesn't have data associated, try to load it */
> +	if (!buffer->data) {
> +		/* get the file desc associated with the perf data file */
> +		int fd = perf_data__fd(speq->spe->session->data);
> +
> +		buffer->data = auxtrace_buffer__get_data(buffer, fd);
> +		if (!buffer->data)
> +			return -ENOMEM;
> +	}
> +
> +	if (buffer->use_data) {
> +		b->len = buffer->use_size;
> +		b->buf = buffer->use_data;
> +	} else {
> +		b->len = buffer->size;
> +		b->buf = buffer->data;
> +	}
> +
> +	b->ref_timestamp = buffer->reference;
> +
> +	if (b->len) {
> +		if (old_buffer)
> +			auxtrace_buffer__drop_data(old_buffer);
> +		speq->old_buffer = buffer;
> +	} else {
> +		auxtrace_buffer__drop_data(buffer);
> +		return arm_spe_get_trace(b, data);
> +	}
> +
> +	return 0;
> +}
> +
> +static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
> +		unsigned int queue_nr)
> +{
> +	struct arm_spe_params params = { .get_trace = 0, };
> +	struct arm_spe_queue *speq;
> +
> +	speq = zalloc(sizeof(*speq));
> +	if (!speq)
> +		return NULL;
> +
> +	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
> +	if (!speq->event_buf)
> +		goto out_free;
> +
> +	speq->spe = spe;
> +	speq->queue_nr = queue_nr;
> +	speq->pid = -1;
> +	speq->tid = -1;
> +	speq->cpu = -1;
> +
> +	/* params set */
> +	params.get_trace = arm_spe_get_trace;
> +	params.data = speq;
> +
> +	/* create new decoder */
> +	speq->decoder = arm_spe_decoder_new(&params);
> +	if (!speq->decoder)
> +		goto out_free;
> +
> +	return speq;
> +
> +out_free:
> +	zfree(&speq->event_buf);
> +	free(speq);
> +
> +	return NULL;
> +}
> +
> +static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
> +{
> +	return ip >= spe->kernel_start ?
> +		PERF_RECORD_MISC_KERNEL :
> +		PERF_RECORD_MISC_USER;
> +}
> +
> +static void arm_spe_prep_sample(struct arm_spe *spe,
> +				struct arm_spe_queue *speq,
> +				union perf_event *event,
> +				struct perf_sample *sample)
> +{
> +	if (!spe->timeless_decoding)
> +		sample->time = speq->timestamp;
> +
> +	sample->ip = speq->state->from_ip;
> +	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
> +	sample->pid = speq->pid;
> +	sample->tid = speq->tid;
> +	sample->addr = speq->state->to_ip;
> +	sample->period = 1;
> +	sample->cpu = speq->cpu;
> +
> +	event->sample.header.type = PERF_RECORD_SAMPLE;
> +	event->sample.header.misc = sample->cpumode;
> +	event->sample.header.size = sizeof(struct perf_event_header);
> +}
> +
> +static inline int
> +arm_spe_deliver_synth_event(struct arm_spe *spe,
> +			    struct arm_spe_queue *speq __maybe_unused,
> +			    union perf_event *event,
> +			    struct perf_sample *sample)
> +{
> +	int ret;
> +
> +	ret = perf_session__deliver_synth_event(spe->session, event, sample);
> +	if (ret)
> +		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
> +
> +	return ret;
> +}
> +
> +static int
> +arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
> +				u64 spe_events_id)
> +{
> +	struct arm_spe *spe = speq->spe;
> +	union perf_event *event = speq->event_buf;
> +	struct perf_sample sample = { .ip = 0, };
> +
> +	arm_spe_prep_sample(spe, speq, event, &sample);
> +
> +	sample.id = spe_events_id;
> +	sample.stream_id = spe_events_id;
> +
> +	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
> +}
> +
> +static int arm_spe_sample(struct arm_spe_queue *speq)
> +{
> +	const struct arm_spe_state *state = speq->state;
> +	struct arm_spe *spe = speq->spe;
> +	int err;
> +
> +	if (!speq->have_sample)
> +		return 0;
> +
> +	speq->have_sample = false;
> +
> +	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
> +		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
> +		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
> +		err = arm_spe_synth_spe_events_sample(speq,
> +						      spe->branch_miss_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
> +		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
> +{
> +	const struct arm_spe_state *state = speq->state;
> +	struct arm_spe *spe = speq->spe;
> +	int err;
> +
> +	if (!spe->kernel_start)
> +		spe->kernel_start = machine__kernel_start(spe->machine);
> +
> +	while (1) {
> +		err = arm_spe_sample(speq);
> +		if (err)
> +			return err;

The order of arm_spe_sample() and arm_spe_decode() should be reversed.

> +
> +		state = arm_spe_decode(speq->decoder);
> +		if (state->err) {
> +			if (state->err == -ENODATA) {
> +				pr_debug("No data or all data has been processed.\n");
> +				return 1;
> +			}
> +			continue;
> +		}
> +
> +		speq->state = state;
> +		speq->have_sample = true;
> +
> +		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
> +			*timestamp = speq->timestamp;
> +			return 0;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe__setup_queue(struct arm_spe *spe,
> +			       struct auxtrace_queue *queue,
> +			       unsigned int queue_nr)
> +{
> +	struct arm_spe_queue *speq = queue->priv;
> +
> +	if (list_empty(&queue->head) || speq)
> +		return 0;
> +
> +	speq = arm_spe__alloc_queue(spe, queue_nr);
> +
> +	if (!speq)
> +		return -ENOMEM;
> +
> +	queue->priv = speq;
> +
> +	if (queue->cpu != -1)
> +		speq->cpu = queue->cpu;
> +
> +	if (!speq->on_heap) {
> +		const struct arm_spe_state *state;
> +		int ret;
> +
> +		if (spe->timeless_decoding)
> +			return 0;
> +
> +retry:
> +		state = arm_spe_decode(speq->decoder);
> +		if (state->err) {
> +			if (state->err == -ENODATA) {
> +				pr_debug("queue %u has no timestamp\n",
> +						queue_nr);
> +				return 0;
> +			}
> +			goto retry;
> +		}
> +
> +		speq->timestamp = state->timestamp;
> +		speq->state = state;
> +		speq->have_sample = true;
> +		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
> +		if (ret)
> +			return ret;
> +		speq->on_heap = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe__setup_queues(struct arm_spe *spe)
>  {
> +	unsigned int i;
> +	int ret;
> +
> +	for (i = 0; i < spe->queues.nr_queues; i++) {
> +		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe__update_queues(struct arm_spe *spe)
> +{
> +	if (spe->queues.new_data) {
> +		spe->queues.new_data = false;
> +		return arm_spe__setup_queues(spe);
> +	}
> +
>  	return 0;
>  }
>  
> +static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
> +{
> +	struct evsel *evsel;
> +	struct evlist *evlist = spe->session->evlist;
> +	bool timeless_decoding = true;
> +
> +	/*
> +	 * Circle through the list of event and complain if we find one
> +	 * with the time bit set.
> +	 */
> +	evlist__for_each_entry(evlist, evsel) {
> +		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
> +			timeless_decoding = false;
> +	}
> +
> +	return timeless_decoding;
> +}
> +
> +static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
> +				    struct auxtrace_queue *queue)
> +{
> +	struct arm_spe_queue *speq = queue->priv;
> +	pid_t tid;
> +
> +	tid = machine__get_current_tid(spe->machine, speq->cpu);
> +	if (tid != -1) {
> +		speq->tid = tid;
> +		thread__zput(speq->thread);
> +	} else
> +		speq->tid = queue->tid;
> +
> +	if ((!speq->thread) && (speq->tid != -1)) {
> +		speq->thread = machine__find_thread(spe->machine, -1,
> +						    speq->tid);
> +	}
> +
> +	if (speq->thread) {
> +		speq->pid = speq->thread->pid_;
> +		if (queue->cpu == -1)
> +			speq->cpu = speq->thread->cpu;
> +	}
> +}
> +
> +static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
> +{
> +	unsigned int queue_nr;
> +	u64 ts;
> +	int ret;
> +
> +	while (1) {
> +		struct auxtrace_queue *queue;
> +		struct arm_spe_queue *speq;
> +
> +		if (!spe->heap.heap_cnt)
> +			return 0;
> +
> +		if (spe->heap.heap_array[0].ordinal >= timestamp)
> +			return 0;
> +
> +		queue_nr = spe->heap.heap_array[0].queue_nr;
> +		queue = &spe->queues.queue_array[queue_nr];
> +		speq = queue->priv;
> +
> +		auxtrace_heap__pop(&spe->heap);
> +
> +		if (spe->heap.heap_cnt) {
> +			ts = spe->heap.heap_array[0].ordinal + 1;
> +			if (ts > timestamp)
> +				ts = timestamp;
> +		} else {
> +			ts = timestamp;
> +		}
> +
> +		arm_spe_set_pid_tid_cpu(spe, queue);

I don't think this is right.

arm_spe_set_pid_tid_cpu() should be invoked by the SPE decoder when it
finds a CONTEXT packet.

I will look into the implementation in more detail on my side once I
can run the code on a test platform, and may give more comments after
trying it out.

Thanks,
Leo

> +
> +		ret = arm_spe_run_decoder(speq, &ts);
> +		if (ret < 0) {
> +			auxtrace_heap__add(&spe->heap, queue_nr, ts);
> +			return ret;
> +		}
> +
> +		if (!ret) {
> +			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
> +			if (ret < 0)
> +				return ret;
> +		} else {
> +			speq->on_heap = false;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
> +					    u64 time_)
> +{
> +	struct auxtrace_queues *queues = &spe->queues;
> +	unsigned int i;
> +	u64 ts = 0;
> +
> +	for (i = 0; i < queues->nr_queues; i++) {
> +		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
> +		struct arm_spe_queue *speq = queue->priv;
> +
> +		if (speq && (tid == -1 || speq->tid == tid)) {
> +			speq->time = time_;
> +			arm_spe_set_pid_tid_cpu(spe, queue);
> +			arm_spe_run_decoder(speq, &ts);
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int arm_spe_process_event(struct perf_session *session,
> +				 union perf_event *event,
> +				 struct perf_sample *sample,
> +				 struct perf_tool *tool)
> +{
> +	int err = 0;
> +	u64 timestamp;
> +	struct arm_spe *spe = container_of(session->auxtrace,
> +			struct arm_spe, auxtrace);
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events) {
> +		pr_err("CoreSight SPE Trace requires ordered events\n");
> +		return -EINVAL;
> +	}
> +
> +	if (sample->time && (sample->time != (u64) -1))
> +		timestamp = sample->time;
> +	else
> +		timestamp = 0;
> +
> +	if (timestamp || spe->timeless_decoding) {
> +		err = arm_spe__update_queues(spe);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (spe->timeless_decoding) {
> +		if (event->header.type == PERF_RECORD_EXIT) {
> +			err = arm_spe_process_timeless_queues(spe,
> +					event->fork.tid,
> +					sample->time);
> +		}
> +	} else if (timestamp) {
> +		if (event->header.type == PERF_RECORD_EXIT) {
> +			err = arm_spe_process_queues(spe, timestamp);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return err;
> +}
> +
>  static int arm_spe_process_auxtrace_event(struct perf_session *session,
>  					  union perf_event *event,
>  					  struct perf_tool *tool __maybe_unused)
>  {
>  	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
>  					     auxtrace);
> -	struct auxtrace_buffer *buffer;
> -	off_t data_offset;
> -	int fd = perf_data__fd(session->data);
> -	int err;
>  
> -	if (perf_data__is_pipe(session->data)) {
> -		data_offset = 0;
> -	} else {
> -		data_offset = lseek(fd, 0, SEEK_CUR);
> -		if (data_offset == -1)
> -			return -errno;
> -	}
> +	if (!spe->data_queued) {
> +		struct auxtrace_buffer *buffer;
> +		off_t data_offset;
> +		int fd = perf_data__fd(session->data);
> +		int err;
>  
> -	err = auxtrace_queues__add_event(&spe->queues, session, event,
> -					 data_offset, &buffer);
> -	if (err)
> -		return err;
> -
> -	/* Dump here now we have copied a piped trace out of the pipe */
> -	if (dump_trace) {
> -		if (auxtrace_buffer__get_data(buffer, fd)) {
> -			arm_spe_dump_event(spe, buffer->data,
> -					     buffer->size);
> -			auxtrace_buffer__put_data(buffer);
> +		if (perf_data__is_pipe(session->data)) {
> +			data_offset = 0;
> +		} else {
> +			data_offset = lseek(fd, 0, SEEK_CUR);
> +			if (data_offset == -1)
> +				return -errno;
> +		}
> +
> +		err = auxtrace_queues__add_event(&spe->queues, session, event,
> +				data_offset, &buffer);
> +		if (err)
> +			return err;
> +
> +		/* Dump here now we have copied a piped trace out of the pipe */
> +		if (dump_trace) {
> +			if (auxtrace_buffer__get_data(buffer, fd)) {
> +				arm_spe_dump_event(spe, buffer->data,
> +						buffer->size);
> +				auxtrace_buffer__put_data(buffer);
> +			}
>  		}
>  	}
>  
> @@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
>  static int arm_spe_flush(struct perf_session *session __maybe_unused,
>  			 struct perf_tool *tool __maybe_unused)
>  {
> -	return 0;
> +	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
> +			auxtrace);
> +	int ret;
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events)
> +		return -EINVAL;
> +
> +	ret = arm_spe__update_queues(spe);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (spe->timeless_decoding)
> +		return arm_spe_process_timeless_queues(spe, -1,
> +				MAX_TIMESTAMP - 1);
> +
> +	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>  }
>  
>  static void arm_spe_free_queue(void *priv)
> @@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
>  
>  	if (!speq)
>  		return;
> +	thread__zput(speq->thread);
> +	arm_spe_decoder_free(speq->decoder);
> +	zfree(&speq->event_buf);
>  	free(speq);
>  }
>  
> @@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
>  	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
>  }
>  
> +struct arm_spe_synth {
> +	struct perf_tool dummy_tool;
> +	struct perf_session *session;
> +};
> +
> +static int arm_spe_event_synth(struct perf_tool *tool,
> +			       union perf_event *event,
> +			       struct perf_sample *sample __maybe_unused,
> +			       struct machine *machine __maybe_unused)
> +{
> +	struct arm_spe_synth *arm_spe_synth =
> +		      container_of(tool, struct arm_spe_synth, dummy_tool);
> +
> +	return perf_session__deliver_synth_event(arm_spe_synth->session,
> +						 event, NULL);
> +}
> +
> +static int arm_spe_synth_event(struct perf_session *session,
> +			       struct perf_event_attr *attr, u64 id)
> +{
> +	struct arm_spe_synth arm_spe_synth;
> +
> +	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
> +	arm_spe_synth.session = session;
> +
> +	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
> +					   &id, arm_spe_event_synth);
> +}
> +
> +static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
> +				    const char *name)
> +{
> +	struct evsel *evsel;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		if (evsel->core.id && evsel->core.id[0] == id) {
> +			if (evsel->name)
> +				zfree(&evsel->name);
> +			evsel->name = strdup(name);
> +			break;
> +		}
> +	}
> +}
> +
> +static int
> +arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
> +{
> +	struct evlist *evlist = session->evlist;
> +	struct evsel *evsel;
> +	struct perf_event_attr attr;
> +	bool found = false;
> +	u64 id;
> +	int err;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		if (evsel->core.attr.type == spe->pmu_type) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		pr_debug("No selected events with CoreSight Trace data\n");
> +		return 0;
> +	}
> +
> +	memset(&attr, 0, sizeof(struct perf_event_attr));
> +	attr.size = sizeof(struct perf_event_attr);
> +	attr.type = PERF_TYPE_HARDWARE;
> +	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
> +	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
> +		PERF_SAMPLE_PERIOD;
> +	if (spe->timeless_decoding)
> +		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
> +	else
> +		attr.sample_type |= PERF_SAMPLE_TIME;
> +
> +	attr.exclude_user = evsel->core.attr.exclude_user;
> +	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
> +	attr.exclude_hv = evsel->core.attr.exclude_hv;
> +	attr.exclude_host = evsel->core.attr.exclude_host;
> +	attr.exclude_guest = evsel->core.attr.exclude_guest;
> +	attr.sample_id_all = evsel->core.attr.sample_id_all;
> +	attr.read_format = evsel->core.attr.read_format;
> +
> +	/* create new id val to be a fixed offset from evsel id */
> +	id = evsel->core.id[0] + 1000000000;
> +
> +	if (!id)
> +		id = 1;
> +
> +	/* spe events set */
> +	if (spe->synth_opts.llc_miss) {
> +		spe->sample_llc_miss = true;
> +
> +		/* llc-miss */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->llc_miss_id = id;
> +		arm_spe_set_event_name(evlist, id, "llc-miss");
> +		id += 1;
> +	}
> +
> +	if (spe->synth_opts.tlb_miss) {
> +		spe->sample_tlb_miss = true;
> +
> +		/* tlb-miss */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->tlb_miss_id = id;
> +		arm_spe_set_event_name(evlist, id, "tlb-miss");
> +		id += 1;
> +	}
> +
> +	if (spe->synth_opts.branches) {
> +		spe->sample_branch_miss = true;
> +
> +		/* branch-miss */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->branch_miss_id = id;
> +		arm_spe_set_event_name(evlist, id, "branch-miss");
> +		id += 1;
> +	}
> +
> +	if (spe->synth_opts.remote_access) {
> +		spe->sample_remote_access = true;
> +
> +		/* remote-access */
> +		err = arm_spe_synth_event(session, &attr, id);
> +		if (err)
> +			return err;
> +		spe->remote_access_id = id;
> +		arm_spe_set_event_name(evlist, id, "remote-access");
> +		id += 1;
> +	}
> +
> +	return 0;
> +}
> +
>  int arm_spe_process_auxtrace_info(union perf_event *event,
>  				  struct perf_session *session)
>  {
> @@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  	spe->auxtrace_type = auxtrace_info->type;
>  	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
>  
> +	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>  	spe->auxtrace.process_event = arm_spe_process_event;
>  	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
>  	spe->auxtrace.flush_events = arm_spe_flush;
> @@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>  
>  	arm_spe_print_info(&auxtrace_info->priv[0]);
>  
> +	if (dump_trace)
> +		return 0;
> +
> +	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
> +		spe->synth_opts = *session->itrace_synth_opts;
> +	else
> +		itrace_synth_opts__set_default(&spe->synth_opts, false);
> +
> +	err = arm_spe_synth_events(spe, session);
> +	if (err)
> +		goto err_free_queues;
> +
> +	err = auxtrace_queues__process_index(&spe->queues, session);
> +	if (err)
> +		goto err_free_queues;
> +
> +	if (spe->queues.populated)
> +		spe->data_queued = true;
> +
>  	return 0;
>  
> +err_free_queues:
> +	auxtrace_queues__free(&spe->queues);
> +	session->auxtrace = NULL;
>  err_free:
>  	free(spe);
>  	return err;
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index eb087e7df6f4..994d5e3c9e4f 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1279,6 +1279,10 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
>  	synth_opts->pwr_events = true;
>  	synth_opts->other_events = true;
>  	synth_opts->errors = true;
> +	synth_opts->llc_miss = true;
> +	synth_opts->tlb_miss = true;
> +	synth_opts->remote_access = true;
> +
>  	if (no_sample) {
>  		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
>  		synth_opts->period = 1;
> @@ -1431,6 +1435,15 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  				goto out_err;
>  			p = endptr;
>  			break;
> +		case 'm':
> +			synth_opts->llc_miss = true;
> +			break;
> +		case 't':
> +			synth_opts->tlb_miss = true;
> +			break;
> +		case 'a':
> +			synth_opts->remote_access = true;
> +			break;
>  		case ' ':
>  		case ',':
>  			break;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 749d72cd9c7b..80617b0d044d 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -60,7 +60,7 @@ enum itrace_period_type {
>   * @inject: indicates the event (not just the sample) must be fully synthesized
>   *          because 'perf inject' will write it out
>   * @instructions: whether to synthesize 'instructions' events
> - * @branches: whether to synthesize 'branches' events
> + * @branches: whether to synthesize 'branches' events (branch misses only on Arm)
>   * @transactions: whether to synthesize events for transactions
>   * @ptwrites: whether to synthesize events for ptwrites
>   * @pwr_events: whether to synthesize power events
> @@ -74,6 +74,9 @@ enum itrace_period_type {
>   * @callchain: add callchain to 'instructions' events
>   * @thread_stack: feed branches to the thread_stack
>   * @last_branch: add branch context to 'instruction' events
> + * @llc_miss: whether to synthesize last level cache miss events
> + * @tlb_miss: whether to synthesize TLB miss events
> + * @remote_access: whether to synthesize remote access events
>   * @callchain_sz: maximum callchain size
>   * @last_branch_sz: branch context size
>   * @period: 'instructions' events period
> @@ -101,6 +104,9 @@ struct itrace_synth_opts {
>  	bool			callchain;
>  	bool			thread_stack;
>  	bool			last_branch;
> +	bool			llc_miss;
> +	bool			tlb_miss;
> +	bool			remote_access;
>  	unsigned int		callchain_sz;
>  	unsigned int		last_branch_sz;
>  	unsigned long long	period;
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 0/3] perf tools: Add support for some spe events
  2020-02-28 16:01                   ` Mark Rutland
@ 2020-03-06 15:25                     ` James Clark
  2020-03-06 15:25                       ` [PATCH v6 1/3] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
                                         ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: James Clark @ 2020-03-06 15:25 UTC (permalink / raw)
  To: mark.rutland, linux-arm-kernel, linux-kernel; +Cc: nd, James Clark

Hi Mark,

Yes, I think this is something I can look into. For now I have removed
the last patch, because the current patch set already works in a very
similar way and allows people to use SPE in perf:

    ./perf record -e arm_spe_0/branch_filter=1/
vs
    ./perf record -e arm_spe/branch-misses/pp

Also, I don't have access to any big.LITTLE hardware with SPE, so I
wouldn't be able to test collating all the SPE PMUs.

Thanks
James

Tan Xiaojun (3):
  perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  perf tools: Add support for "report" for some spe events
  perf report: Add SPE options to --itrace argument

 tools/perf/Documentation/itrace.txt           |   5 +-
 tools/perf/util/Build                         |   2 +-
 tools/perf/util/arm-spe-decoder/Build         |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-pkt-decoder.c                     |   0
 .../arm-spe-pkt-decoder.h                     |   2 +
 tools/perf/util/arm-spe.c                     | 747 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |  13 +-
 10 files changed, 1032 insertions(+), 42 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 1/3] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
  2020-03-06 15:25                     ` [PATCH v6 0/3] perf tools: Add support for some spe events James Clark
@ 2020-03-06 15:25                       ` James Clark
  2020-03-06 15:25                       ` [PATCH v6 2/3] perf tools: Add support for "report" for some spe events James Clark
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-03-06 15:25 UTC (permalink / raw)
  To: mark.rutland, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c to this directory. No code changes.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/Build                                       | 2 +-
 tools/perf/util/arm-spe-decoder/Build                       | 1 +
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c | 0
 tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h | 0
 tools/perf/util/arm-spe.c                                   | 2 +-
 5 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/Build
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
 rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (100%)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 07da6c790b63..0184510083c2 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -104,7 +104,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 perf-$(CONFIG_AUXTRACE) += intel-pt.o
 perf-$(CONFIG_AUXTRACE) += intel-bts.o
 perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
 perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o
 
 ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 000000000000..16efbc245028
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.c
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
similarity index 100%
rename from tools/perf/util/arm-spe-pkt-decoder.h
rename to tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 53be12b23ff4..f3382a38d48e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -23,7 +23,7 @@
 #include "debug.h"
 #include "auxtrace.h"
 #include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
 struct arm_spe {
 	struct auxtrace			auxtrace;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 2/3] perf tools: Add support for "report" for some spe events
  2020-03-06 15:25                     ` [PATCH v6 0/3] perf tools: Add support for some spe events James Clark
  2020-03-06 15:25                       ` [PATCH v6 1/3] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
@ 2020-03-06 15:25                       ` James Clark
  2020-03-06 15:25                       ` [PATCH v6 3/3] perf report: Add SPE options to --itrace argument James Clark
  2020-03-13 11:53                       ` [PATCH v6 0/3] perf tools: Add support for some spe events Mark Rutland
  3 siblings, 0 replies; 42+ messages in thread
From: James Clark @ 2020-03-06 15:25 UTC (permalink / raw)
  To: mark.rutland, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
dumped raw data cannot be used without further parsing.

This patch improves "perf report" support for SPE by further
processing the data. Support is added for four events: llc-miss,
tlb-miss, branch-miss and remote-access.
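As a rough illustration of how the four events are derived, the sketch
below mirrors the ARM_SPE_EVENTS case of arm_spe_walk_trace() in this
patch: selected bits of an Events packet payload are mapped to
synthesized sample types. The bit positions and flag values are copied
from the patch's enum arm_spe_events and enum arm_spe_sample_type. The
helper name spe_events_to_types() and its payload_bytes parameter are
hypothetical; payload_bytes stands in for the packet index that the
patch checks with "idx > 1" and is assumed here to be the payload size
in bytes.

```c
#include <stdint.h>

/* Local macro; 1ULL keeps the shift 64-bit wide on 32-bit hosts too. */
#define SPE_BIT(n) (1ULL << (n))

/* Event bit positions, copied from enum arm_spe_events in this patch. */
enum {
	EV_TLB_REFILL    = 5,
	EV_MISPRED       = 7,
	EV_LLC_REFILL    = 9,
	EV_REMOTE_ACCESS = 10,
};

/* Sample type flags, copied from enum arm_spe_sample_type in this patch. */
enum {
	ARM_SPE_LLC_MISS      = 1 << 0,
	ARM_SPE_TLB_MISS      = 1 << 1,
	ARM_SPE_BRANCH_MISS   = 1 << 2,
	ARM_SPE_REMOTE_ACCESS = 1 << 3,
};

/*
 * Hypothetical helper: translate an Events packet payload into sample
 * type flags. payload_bytes is assumed to be the payload size in bytes;
 * bits 8 and above are only examined when it is greater than one.
 */
unsigned int spe_events_to_types(uint64_t payload, int payload_bytes)
{
	unsigned int type = 0;

	if (payload & SPE_BIT(EV_TLB_REFILL))
		type |= ARM_SPE_TLB_MISS;
	if (payload & SPE_BIT(EV_MISPRED))
		type |= ARM_SPE_BRANCH_MISS;
	if (payload_bytes > 1 && (payload & SPE_BIT(EV_LLC_REFILL)))
		type |= ARM_SPE_LLC_MISS;
	if (payload_bytes > 1 && (payload & SPE_BIT(EV_REMOTE_ACCESS)))
		type |= ARM_SPE_REMOTE_ACCESS;

	return type;
}
```

For example, a two-byte payload with bits 5 and 9 set yields
ARM_SPE_TLB_MISS | ARM_SPE_LLC_MISS (0x3), while the same value passed
with a one-byte payload size yields only ARM_SPE_TLB_MISS (0x2), since
bits above 7 are ignored in that case.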

Example usage:

$ ./perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ -o perf-armspe-dd.data dd if=/dev/zero of=/dev/null count=10000

$ ./perf report -i perf-armspe-dd.data --stdio
--------------------------------------------------------------------
...
 # Samples: 23  of event 'llc-miss'
 # Event count (approx.): 23
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.12%    12.12%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     6.06%     6.06%  dd       [kernel.kallsyms]  [k] copy_page
     6.06%     6.06%  dd       ld-2.28.so         [.] _dl_relocate_object
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] change_protection_range
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] generic_permission
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] kmem_cache_alloc
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] lookup_fast
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] perf_event_exec
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     3.03%     3.03%  dd       [kernel.kallsyms]  [k] ring_buffer_record_is_on
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     3.03%     3.03%  dd       ld-2.28.so         [.] _dl_start
     3.03%     3.03%  dd       ld-2.28.so         [.] dl_main
     3.03%     3.03%  dd       ld-2.28.so         [.] strcmp
     3.03%     3.03%  dd       libc-2.28.so       [.] _dl_addr
...
 # Samples: 3  of event 'tlb-miss'
 # Event count (approx.): 3
...
    33.33%    33.33%  dd       [kernel.kallsyms]  [k] filemap_map_pages
    33.33%    33.33%  dd       ld-2.28.so         [.] _dl_start
    33.33%    33.33%  dd       ld-2.28.so         [.] dl_main
...
 # Samples: 20  of event 'branch-miss'
 # Event count (approx.): 20
...
    15.38%    15.38%  dd       [kernel.kallsyms]  [k] __fput
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] do_el0_ia_bp_hardening
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] pagevec_lru_move_fn
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] perf_event_mmap_output
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] task_work_run
     7.69%     7.69%  dd       [kernel.kallsyms]  [k] unmap_single_vma
     7.69%     7.69%  dd       libc-2.28.so       [.] _IO_flush_all_lockp
     7.69%     7.69%  dd       libc-2.28.so       [.] __memcpy_generic
     7.69%     7.69%  dd       libc-2.28.so       [.] _dl_addr
     7.69%     7.69%  dd       libc-2.28.so       [.] msort_with_tmp.part.0
     7.69%     7.69%  dd       libc-2.28.so       [.] read_alias_file
...
 # Samples: 5  of event 'remote-access'
 # Event count (approx.): 5
...
    27.78%    27.78%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.67%    16.67%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] change_protection_range
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] filemap_map_pages
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] free_pages_and_swap_cache
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] generic_permission
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] lookup_fast
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] perf_event_exec
     5.56%     5.56%  dd       [kernel.kallsyms]  [k] radix_tree_next_chunk
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_relocate_object
     5.56%     5.56%  dd       ld-2.28.so         [.] _dl_start
     5.56%     5.56%  dd       ld-2.28.so         [.] dl_main

--------------------------------------------------------------------
Further analysis and processing of the raw SPE data will follow in
later patches.
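The Address packets decoded by this patch carry a virtual address in
their low 56 bits, while the top byte may hold other information (for
example an address tag). The patch's arm_spe_calc_ip() restores a
canonical 64-bit address by clearing that byte and then sign-extending
from bit 55, so upper-range (kernel) addresses get 0xff in the top byte.
A standalone sketch of the same step follows; the function name
spe_canonical_va() is hypothetical and only the masking logic is taken
from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical standalone version of arm_spe_calc_ip(): drop the top
 * byte of the packet payload, then sign-extend from bit 55 so that
 * upper-range (kernel) addresses become canonical again.
 */
uint64_t spe_canonical_va(uint64_t payload)
{
	uint64_t va = payload & ~(0xffULL << 56);

	if (va & (1ULL << 55))
		va |= 0xffULL << 56;

	return va;
}
```

For instance, a payload of 0x80ffff8010001234 becomes
0xffffff8010001234, while a tagged user address such as
0x3400aaaad0001000 becomes 0x0000aaaad0001000.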

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe-decoder/Build         |   2 +-
 .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                     | 745 +++++++++++++++++-
 tools/perf/util/auxtrace.c                    |  13 +
 tools/perf/util/auxtrace.h                    |   8 +-
 7 files changed, 1022 insertions(+), 39 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc245028..f8dae13fc876 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 000000000000..50e796b89a95
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm-spe-decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../debug.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+#ifndef BIT
+#define BIT(n)		(1UL << (n))
+#endif
+
+struct arm_spe_decoder {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+	struct arm_spe_state state;
+	const unsigned char *buf;
+	size_t len;
+	uint64_t pos;
+	struct arm_spe_pkt packet;
+	int pkt_step;
+	int pkt_len;
+	int last_packet_type;
+
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t timestamp;
+	uint64_t sample_timestamp;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+	uint64_t ip = (payload & ~(0xffULL << 56));
+
+	/*
+	 * Fill the high 8 bits for kernel virtual addresses. Per the Armv8
+	 * ARM, Xn[55] selects the upper or lower address range, which also
+	 * determines whether address tagging is used.
+	 */
+	if (ip & (1ULL << 55))
+		ip |= (uint64_t)(0xffULL << 56);
+
+	return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+	struct arm_spe_decoder *decoder;
+
+	if (!params->get_trace)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct arm_spe_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->data               = params->data;
+
+	return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+	free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	pr_debug("ERROR: Bad packet\n");
+
+	return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+	struct arm_spe_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	pr_debug("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		pr_debug("No more data\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+	return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+	int ret;
+
+	decoder->last_packet_type = decoder->packet.type;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+
+		if (!decoder->len) {
+			ret = arm_spe_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = arm_spe_get_packet(decoder->buf, decoder->len,
+				&decoder->packet);
+		if (ret <= 0)
+			return arm_spe_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+	} while (decoder->packet.type == ARM_SPE_PAD);
+
+	return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+	int err;
+	int idx;
+	uint64_t payload;
+
+	while (1) {
+		err = arm_spe_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		idx = decoder->packet.index;
+		payload = decoder->packet.payload;
+
+		switch (decoder->packet.type) {
+		case ARM_SPE_TIMESTAMP:
+			decoder->sample_timestamp = payload;
+			return 0;
+		case ARM_SPE_END:
+			decoder->sample_timestamp = 0;
+			return 0;
+		case ARM_SPE_ADDRESS:
+			decoder->ip = arm_spe_calc_ip(payload);
+			if (idx == 0)
+				decoder->state.from_ip = decoder->ip;
+			else if (idx == 1)
+				decoder->state.to_ip = decoder->ip;
+			break;
+		case ARM_SPE_COUNTER:
+			break;
+		case ARM_SPE_CONTEXT:
+			break;
+		case ARM_SPE_OP_TYPE:
+			break;
+		case ARM_SPE_EVENTS:
+			if (payload & BIT(EV_TLB_REFILL))
+				decoder->state.type |= ARM_SPE_TLB_MISS;
+			if (payload & BIT(EV_MISPRED))
+				decoder->state.type |= ARM_SPE_BRANCH_MISS;
+			if (idx > 1 && (payload & BIT(EV_LLC_REFILL)))
+				decoder->state.type |= ARM_SPE_LLC_MISS;
+			if (idx > 1 && (payload & BIT(EV_REMOTE_ACCESS)))
+				decoder->state.type |= ARM_SPE_REMOTE_ACCESS;
+
+			break;
+		case ARM_SPE_DATA_SOURCE:
+			break;
+		case ARM_SPE_BAD:
+			break;
+		case ARM_SPE_PAD:
+			break;
+		default:
+			pr_err("Get Packet Error!\n");
+			return -ENOSYS;
+		}
+	}
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = 0;
+
+	err = arm_spe_walk_trace(decoder);
+	if (err)
+		decoder->state.err = err;
+
+	decoder->state.timestamp = decoder->sample_timestamp;
+
+	return &decoder->state;
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 000000000000..330f9e1e71ab
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * arm-spe-decoder.h: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_events {
+	EV_EXCEPTION_GEN,
+	EV_RETIRED,
+	EV_L1D_ACCESS,
+	EV_L1D_REFILL,
+	EV_TLB_ACCESS,
+	EV_TLB_REFILL,
+	EV_NOT_TAKEN,
+	EV_MISPRED,
+	EV_LLC_ACCESS,
+	EV_LLC_REFILL,
+	EV_REMOTE_ACCESS,
+};
+
+enum arm_spe_sample_type {
+	ARM_SPE_LLC_MISS	= 1 << 0,
+	ARM_SPE_TLB_MISS	= 1 << 1,
+	ARM_SPE_BRANCH_MISS	= 1 << 2,
+	ARM_SPE_REMOTE_ACCESS	= 1 << 3,
+	ARM_SPE_EX_STOP		= 1 << 6,
+};
+
+struct arm_spe_state {
+	enum arm_spe_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+	const unsigned char *buf;
+	size_t len;
+	u64 offset;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+	int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+	void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef65113f..865d1e35b401 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
 #define ARM_SPE_NEED_MORE_BYTES		-1
 #define ARM_SPE_BAD_PACKET		-2
 
+#define ARM_SPE_PKT_MAX_SZ		16
+
 enum arm_spe_pkt_type {
 	ARM_SPE_BAD,
 	ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index f3382a38d48e..4ef22a0775a9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -16,34 +16,68 @@
 #include <linux/log2.h>
 #include <linux/zalloc.h>
 
+#include "auxtrace.h"
 #include "color.h"
+#include "debug.h"
 #include "evsel.h"
+#include "evlist.h"
 #include "machine.h"
 #include "session.h"
-#include "debug.h"
-#include "auxtrace.h"
+#include "symbol.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "tool.h"
+#include "util/synthetic-events.h"
+
 #include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
 #include "arm-spe-decoder/arm-spe-pkt-decoder.h"
 
+#define MAX_TIMESTAMP (~0ULL)
+
 struct arm_spe {
 	struct auxtrace			auxtrace;
 	struct auxtrace_queues		queues;
 	struct auxtrace_heap		heap;
+	struct itrace_synth_opts	synth_opts;
 	u32				auxtrace_type;
 	struct perf_session		*session;
 	struct machine			*machine;
 	u32				pmu_type;
+
+	u8				timeless_decoding;
+	u8				data_queued;
+
+	u8				sample_llc_miss;
+	u8				sample_tlb_miss;
+	u8				sample_branch_miss;
+	u8				sample_remote_access;
+	u64				llc_miss_id;
+	u64				tlb_miss_id;
+	u64				branch_miss_id;
+	u64				remote_access_id;
+	u64				kernel_start;
+
+	unsigned long			num_events;
 };
 
 struct arm_spe_queue {
-	struct arm_spe		*spe;
-	unsigned int		queue_nr;
-	struct auxtrace_buffer	*buffer;
-	bool			on_heap;
-	bool			done;
-	pid_t			pid;
-	pid_t			tid;
-	int			cpu;
+	struct arm_spe			*spe;
+	unsigned int			queue_nr;
+	struct auxtrace_buffer		*buffer;
+	struct auxtrace_buffer		*old_buffer;
+	union perf_event		*event_buf;
+	bool				on_heap;
+	bool				done;
+	pid_t				pid;
+	pid_t				tid;
+	int				cpu;
+	void				*decoder;
+	const struct arm_spe_state	*state;
+	u64				time;
+	u64				timestamp;
+	struct thread			*thread;
+	bool				have_sample;
 };
 
 static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -92,44 +126,494 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
 	arm_spe_dump(spe, buf, len);
 }
 
-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
-				 union perf_event *event __maybe_unused,
-				 struct perf_sample *sample __maybe_unused,
-				 struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
+{
+	struct arm_spe_queue *speq = data;
+	struct auxtrace_buffer *buffer = speq->buffer;
+	struct auxtrace_buffer *old_buffer = speq->old_buffer;
+	struct auxtrace_queue *queue;
+
+	queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	/* If no more data, drop the previous auxtrace_buffer and return */
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	speq->buffer = buffer;
+
+	/* If the aux_buffer doesn't have data associated, try to load it */
+	if (!buffer->data) {
+		/* get the file desc associated with the perf data file */
+		int fd = perf_data__fd(speq->spe->session->data);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+
+	b->ref_timestamp = buffer->reference;
+
+	if (b->len) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		speq->old_buffer = buffer;
+	} else {
+		auxtrace_buffer__drop_data(buffer);
+		return arm_spe_get_trace(b, data);
+	}
+
+	return 0;
+}
+
+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+		unsigned int queue_nr)
+{
+	struct arm_spe_params params = { .get_trace = 0, };
+	struct arm_spe_queue *speq;
+
+	speq = zalloc(sizeof(*speq));
+	if (!speq)
+		return NULL;
+
+	speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!speq->event_buf)
+		goto out_free;
+
+	speq->spe = spe;
+	speq->queue_nr = queue_nr;
+	speq->pid = -1;
+	speq->tid = -1;
+	speq->cpu = -1;
+
+	/* params set */
+	params.get_trace = arm_spe_get_trace;
+	params.data = speq;
+
+	/* create new decoder */
+	speq->decoder = arm_spe_decoder_new(&params);
+	if (!speq->decoder)
+		goto out_free;
+
+	return speq;
+
+out_free:
+	zfree(&speq->event_buf);
+	free(speq);
+
+	return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+	return ip >= spe->kernel_start ?
+		PERF_RECORD_MISC_KERNEL :
+		PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+				struct arm_spe_queue *speq,
+				union perf_event *event,
+				struct perf_sample *sample)
+{
+	if (!spe->timeless_decoding)
+		sample->time = speq->timestamp;
+
+	sample->ip = speq->state->from_ip;
+	sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+	sample->pid = speq->pid;
+	sample->tid = speq->tid;
+	sample->addr = speq->state->to_ip;
+	sample->period = 1;
+	sample->cpu = speq->cpu;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = sample->cpumode;
+	event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+			    struct arm_spe_queue *speq __maybe_unused,
+			    union perf_event *event,
+			    struct perf_sample *sample)
+{
+	int ret;
+
+	ret = perf_session__deliver_synth_event(spe->session, event, sample);
+	if (ret)
+		pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+	return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+				u64 spe_events_id)
+{
+	struct arm_spe *spe = speq->spe;
+	union perf_event *event = speq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	arm_spe_prep_sample(spe, speq, event, &sample);
+
+	sample.id = spe_events_id;
+	sample.stream_id = spe_events_id;
+
+	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!speq->have_sample)
+		return 0;
+
+	speq->have_sample = false;
+
+	if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+		err = arm_spe_synth_spe_events_sample(speq,
+						      spe->branch_miss_id);
+		if (err)
+			return err;
+	}
+
+	if (spe->sample_remote_access && (state->type & ARM_SPE_REMOTE_ACCESS)) {
+		err = arm_spe_synth_spe_events_sample(speq, spe->remote_access_id);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+	const struct arm_spe_state *state = speq->state;
+	struct arm_spe *spe = speq->spe;
+	int err;
+
+	if (!spe->kernel_start)
+		spe->kernel_start = machine__kernel_start(spe->machine);
+
+	while (1) {
+		err = arm_spe_sample(speq);
+		if (err)
+			return err;
+
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("No data or all data has been processed.\n");
+				return 1;
+			}
+			continue;
+		}
+
+		speq->state = state;
+		speq->have_sample = true;
+
+		if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+			*timestamp = speq->timestamp;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct arm_spe_queue *speq = queue->priv;
+
+	if (list_empty(&queue->head) || speq)
+		return 0;
+
+	speq = arm_spe__alloc_queue(spe, queue_nr);
+
+	if (!speq)
+		return -ENOMEM;
+
+	queue->priv = speq;
+
+	if (queue->cpu != -1)
+		speq->cpu = queue->cpu;
+
+	if (!speq->on_heap) {
+		const struct arm_spe_state *state;
+		int ret;
+
+		if (spe->timeless_decoding)
+			return 0;
+
+retry:
+		state = arm_spe_decode(speq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA) {
+				pr_debug("queue %u has no timestamp\n",
+						queue_nr);
+				return 0;
+			}
+			goto retry;
+		}
+
+		speq->timestamp = state->timestamp;
+		speq->state = state;
+		speq->have_sample = true;
+		ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+		if (ret)
+			return ret;
+		speq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
 {
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < spe->queues.nr_queues; i++) {
+		ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+	if (spe->queues.new_data) {
+		spe->queues.new_data = false;
+		return arm_spe__setup_queues(spe);
+	}
+
 	return 0;
 }
 
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = spe->session->evlist;
+	bool timeless_decoding = true;
+
+	/*
+	 * Cycle through the list of events and switch to timed decoding if
+	 * we find one with the time bit set.
+	 */
+	evlist__for_each_entry(evlist, evsel) {
+		if ((evsel->core.attr.sample_type & PERF_SAMPLE_TIME))
+			timeless_decoding = false;
+	}
+
+	return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+				    struct auxtrace_queue *queue)
+{
+	struct arm_spe_queue *speq = queue->priv;
+	pid_t tid;
+
+	tid = machine__get_current_tid(spe->machine, speq->cpu);
+	if (tid != -1) {
+		speq->tid = tid;
+		thread__zput(speq->thread);
+	} else
+		speq->tid = queue->tid;
+
+	if ((!speq->thread) && (speq->tid != -1)) {
+		speq->thread = machine__find_thread(spe->machine, -1,
+						    speq->tid);
+	}
+
+	if (speq->thread) {
+		speq->pid = speq->thread->pid_;
+		if (queue->cpu == -1)
+			speq->cpu = speq->thread->cpu;
+	}
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct arm_spe_queue *speq;
+
+		if (!spe->heap.heap_cnt)
+			return 0;
+
+		if (spe->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = spe->heap.heap_array[0].queue_nr;
+		queue = &spe->queues.queue_array[queue_nr];
+		speq = queue->priv;
+
+		auxtrace_heap__pop(&spe->heap);
+
+		if (spe->heap.heap_cnt) {
+			ts = spe->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		arm_spe_set_pid_tid_cpu(spe, queue);
+
+		ret = arm_spe_run_decoder(speq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			speq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &spe->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+		struct arm_spe_queue *speq = queue->priv;
+
+		if (speq && (tid == -1 || speq->tid == tid)) {
+			speq->time = time_;
+			arm_spe_set_pid_tid_cpu(spe, queue);
+			arm_spe_run_decoder(speq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int arm_spe_process_event(struct perf_session *session,
+				 union perf_event *event,
+				 struct perf_sample *sample,
+				 struct perf_tool *tool)
+{
+	int err = 0;
+	u64 timestamp;
+	struct arm_spe *spe = container_of(session->auxtrace,
+			struct arm_spe, auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("ARM SPE: trace decoding requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64) -1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || spe->timeless_decoding) {
+		err = arm_spe__update_queues(spe);
+		if (err)
+			return err;
+	}
+
+	if (spe->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_timeless_queues(spe,
+					event->fork.tid,
+					sample->time);
+		}
+	} else if (timestamp) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = arm_spe_process_queues(spe, timestamp);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 static int arm_spe_process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
 	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
 					     auxtrace);
-	struct auxtrace_buffer *buffer;
-	off_t data_offset;
-	int fd = perf_data__fd(session->data);
-	int err;
 
-	if (perf_data__is_pipe(session->data)) {
-		data_offset = 0;
-	} else {
-		data_offset = lseek(fd, 0, SEEK_CUR);
-		if (data_offset == -1)
-			return -errno;
-	}
+	if (!spe->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data__fd(session->data);
+		int err;
 
-	err = auxtrace_queues__add_event(&spe->queues, session, event,
-					 data_offset, &buffer);
-	if (err)
-		return err;
-
-	/* Dump here now we have copied a piped trace out of the pipe */
-	if (dump_trace) {
-		if (auxtrace_buffer__get_data(buffer, fd)) {
-			arm_spe_dump_event(spe, buffer->data,
-					     buffer->size);
-			auxtrace_buffer__put_data(buffer);
+		if (perf_data__is_pipe(session->data)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&spe->queues, session, event,
+				data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				arm_spe_dump_event(spe, buffer->data,
+						buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
 		}
 	}
 
@@ -139,7 +623,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
 static int arm_spe_flush(struct perf_session *session __maybe_unused,
 			 struct perf_tool *tool __maybe_unused)
 {
-	return 0;
+	struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+			auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = arm_spe__update_queues(spe);
+	if (ret < 0)
+		return ret;
+
+	if (spe->timeless_decoding)
+		return arm_spe_process_timeless_queues(spe, -1,
+				MAX_TIMESTAMP - 1);
+
+	return arm_spe_process_queues(spe, MAX_TIMESTAMP);
 }
 
 static void arm_spe_free_queue(void *priv)
@@ -148,6 +650,9 @@ static void arm_spe_free_queue(void *priv)
 
 	if (!speq)
 		return;
+	thread__zput(speq->thread);
+	arm_spe_decoder_free(speq->decoder);
+	zfree(&speq->event_buf);
 	free(speq);
 }
 
@@ -188,6 +693,149 @@ static void arm_spe_print_info(__u64 *arr)
 	fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
 }
 
+struct arm_spe_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine __maybe_unused)
+{
+	struct arm_spe_synth *arm_spe_synth =
+		      container_of(tool, struct arm_spe_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(arm_spe_synth->session,
+						 event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+			       struct perf_event_attr *attr, u64 id)
+{
+	struct arm_spe_synth arm_spe_synth;
+
+	memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+	arm_spe_synth.session = session;
+
+	return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+					   &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct evlist *evlist, u64 id,
+				    const char *name)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.id && evsel->core.id[0] == id) {
+			if (evsel->name)
+				zfree(&evsel->name);
+			evsel->name = strdup(name);
+			break;
+		}
+	}
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+	struct evlist *evlist = session->evlist;
+	struct evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.type == spe->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("No selected events with SPE trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+		PERF_SAMPLE_PERIOD;
+	if (spe->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->core.attr.exclude_user;
+	attr.exclude_kernel = evsel->core.attr.exclude_kernel;
+	attr.exclude_hv = evsel->core.attr.exclude_hv;
+	attr.exclude_host = evsel->core.attr.exclude_host;
+	attr.exclude_guest = evsel->core.attr.exclude_guest;
+	attr.sample_id_all = evsel->core.attr.sample_id_all;
+	attr.read_format = evsel->core.attr.read_format;
+
+	/* create new id val to be a fixed offset from evsel id */
+	id = evsel->core.id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	/* spe events set */
+	if (spe->synth_opts.llc_miss) {
+		spe->sample_llc_miss = true;
+
+		/* llc-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->llc_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "llc-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.tlb_miss) {
+		spe->sample_tlb_miss = true;
+
+		/* tlb-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->tlb_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "tlb-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.branches) {
+		spe->sample_branch_miss = true;
+
+		/* branch-miss */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->branch_miss_id = id;
+		arm_spe_set_event_name(evlist, id, "branch-miss");
+		id += 1;
+	}
+
+	if (spe->synth_opts.remote_access) {
+		spe->sample_remote_access = true;
+
+		/* remote-access */
+		err = arm_spe_synth_event(session, &attr, id);
+		if (err)
+			return err;
+		spe->remote_access_id = id;
+		arm_spe_set_event_name(evlist, id, "remote-access");
+		id += 1;
+	}
+
+	return 0;
+}
+
 int arm_spe_process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -213,6 +861,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 	spe->auxtrace_type = auxtrace_info->type;
 	spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
 
+	spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
 	spe->auxtrace.process_event = arm_spe_process_event;
 	spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
 	spe->auxtrace.flush_events = arm_spe_flush;
@@ -222,8 +871,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
 
 	arm_spe_print_info(&auxtrace_info->priv[0]);
 
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		spe->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&spe->synth_opts, false);
+
+	err = arm_spe_synth_events(spe, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&spe->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (spe->queues.populated)
+		spe->data_queued = true;
+
 	return 0;
 
+err_free_queues:
+	auxtrace_queues__free(&spe->queues);
+	session->auxtrace = NULL;
 err_free:
 	free(spe);
 	return err;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index eb087e7df6f4..994d5e3c9e4f 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1279,6 +1279,10 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->pwr_events = true;
 	synth_opts->other_events = true;
 	synth_opts->errors = true;
+	synth_opts->llc_miss = true;
+	synth_opts->tlb_miss = true;
+	synth_opts->remote_access = true;
+
 	if (no_sample) {
 		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
 		synth_opts->period = 1;
@@ -1431,6 +1435,15 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				goto out_err;
 			p = endptr;
 			break;
+		case 'm':
+			synth_opts->llc_miss = true;
+			break;
+		case 't':
+			synth_opts->tlb_miss = true;
+			break;
+		case 'a':
+			synth_opts->remote_access = true;
+			break;
 		case ' ':
 		case ',':
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 749d72cd9c7b..80617b0d044d 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -60,7 +60,7 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
- * @branches: whether to synthesize 'branches' events
+ * @branches: whether to synthesize 'branches' events (branch misses only on Arm)
  * @transactions: whether to synthesize events for transactions
  * @ptwrites: whether to synthesize events for ptwrites
  * @pwr_events: whether to synthesize power events
@@ -74,6 +74,9 @@ enum itrace_period_type {
  * @callchain: add callchain to 'instructions' events
  * @thread_stack: feed branches to the thread_stack
  * @last_branch: add branch context to 'instruction' events
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @remote_access: whether to synthesize Remote access events
  * @callchain_sz: maximum callchain size
  * @last_branch_sz: branch context size
  * @period: 'instructions' events period
@@ -101,6 +104,9 @@ struct itrace_synth_opts {
 	bool			callchain;
 	bool			thread_stack;
 	bool			last_branch;
+	bool			llc_miss;
+	bool			tlb_miss;
+	bool			remote_access;
 	unsigned int		callchain_sz;
 	unsigned int		last_branch_sz;
 	unsigned long long	period;
-- 
2.17.1
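For readers following arm_spe_process_queues() above: it merges the per-CPU SPE queues in timestamp order using perf's auxtrace_heap min-heap. It pops the queue with the oldest pending timestamp, decodes that queue only up to one past the next queue's timestamp (capped at the caller's limit), and then re-inserts it. The sketch below illustrates that ordering idea in isolation. It is a simplified illustration under stated assumptions, not perf code. The demo_* types and helpers are hypothetical stand-ins, not the auxtrace_heap API, and each "queue" is reduced to a sorted array of sample timestamps with a cursor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-CPU queue: sorted sample timestamps
 * plus a cursor to the next undecoded sample. */
struct demo_queue {
	const uint64_t *ts;
	size_t nr;
	size_t pos;
};

struct demo_heap_item {
	uint64_t ordinal;	/* timestamp of the queue's next sample */
	unsigned int queue_nr;
};

/* Hypothetical fixed-size binary min-heap keyed on 'ordinal'. */
struct demo_heap {
	struct demo_heap_item items[16];
	unsigned int cnt;
};

static void demo_heap_push(struct demo_heap *h, unsigned int queue_nr,
			   uint64_t ordinal)
{
	unsigned int i = h->cnt++;

	/* Sift up until the parent is not larger. */
	while (i > 0) {
		unsigned int parent = (i - 1) / 2;

		if (h->items[parent].ordinal <= ordinal)
			break;
		h->items[i] = h->items[parent];
		i = parent;
	}
	h->items[i].ordinal = ordinal;
	h->items[i].queue_nr = queue_nr;
}

static void demo_heap_pop(struct demo_heap *h)
{
	struct demo_heap_item last = h->items[--h->cnt];
	unsigned int i = 0;

	/* Sift the former last element down from the root. */
	for (;;) {
		unsigned int child = 2 * i + 1;

		if (child >= h->cnt)
			break;
		if (child + 1 < h->cnt &&
		    h->items[child + 1].ordinal < h->items[child].ordinal)
			child++;
		if (last.ordinal <= h->items[child].ordinal)
			break;
		h->items[i] = h->items[child];
		i = child;
	}
	h->items[i] = last;
}

/* Emit samples from all queues in global timestamp order, stopping before
 * 'limit'. Returns the number of timestamps written to 'out'. */
static size_t demo_process_queues(struct demo_heap *h, struct demo_queue *qs,
				  uint64_t limit, uint64_t *out, size_t out_max)
{
	size_t n = 0;

	while (h->cnt && h->items[0].ordinal < limit && n < out_max) {
		unsigned int qn = h->items[0].queue_nr;
		struct demo_queue *q = &qs[qn];
		uint64_t until = limit;

		demo_heap_pop(h);
		/* Only run ahead to just past the next-oldest queue. */
		if (h->cnt && h->items[0].ordinal + 1 < limit)
			until = h->items[0].ordinal + 1;

		while (q->pos < q->nr && q->ts[q->pos] < until && n < out_max)
			out[n++] = q->ts[q->pos++];

		/* Re-queue with the next pending timestamp, if any remain. */
		if (q->pos < q->nr)
			demo_heap_push(h, qn, q->ts[q->pos]);
	}
	return n;
}
```

With two queues holding {1, 4, 7} and {2, 3, 9} and a limit of 8, the sketch emits 1, 2, 3, 4, 7 and leaves 9 on the heap for a later call. This mirrors how arm_spe_flush() finally drains everything by passing MAX_TIMESTAMP.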



* [PATCH v6 3/3] perf report: Add SPE options to --itrace argument
  2020-03-06 15:25                     ` [PATCH v6 0/3] perf tools: Add support for some spe events James Clark
  2020-03-06 15:25                       ` [PATCH v6 1/3] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
  2020-03-06 15:25                       ` [PATCH v6 2/3] perf tools: Add support for "report" for some spe events James Clark
@ 2020-03-06 15:25                       ` James Clark
  2020-03-13 11:33                         ` Leo Yan
  2020-03-13 11:53                       ` [PATCH v6 0/3] perf tools: Add support for some spe events Mark Rutland
  3 siblings, 1 reply; 42+ messages in thread
From: James Clark @ 2020-03-06 15:25 UTC (permalink / raw)
  To: mark.rutland, linux-arm-kernel, linux-kernel
  Cc: nd, Tan Xiaojun, James Clark, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Al Grant, Namhyung Kim

From: Tan Xiaojun <tanxiaojun@huawei.com>

The previous patch added support in "perf report" for some Arm SPE
events (llc-miss, tlb-miss, branch-miss, remote-access). This patch
adds help text for them.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Qi Liu <liuqi115@hisilicon.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/itrace.txt | 5 ++++-
 tools/perf/util/auxtrace.h          | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 82ff7dad40c2..da3e5ccc039e 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -1,5 +1,5 @@
 		i	synthesize instructions events
-		b	synthesize branches events
+		b	synthesize branches events (branch misses on Arm)
 		c	synthesize branches events (calls only)
 		r	synthesize branches events (returns only)
 		x	synthesize transactions events
@@ -9,6 +9,9 @@
 			of aux-output (refer to perf record)
 		e	synthesize error events
 		d	create a debug log
+		m	synthesize LLC miss events
+		t	synthesize TLB miss events
+		a	synthesize remote access events
 		g	synthesize a call chain (use with i or x)
 		l	synthesize last branch entries (use with i or x)
 		s       skip initial number of events
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 80617b0d044d..52e148eea7f8 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -587,7 +587,7 @@ void auxtrace__free(struct perf_session *session);
 
 #define ITRACE_HELP \
 "				i:	    		synthesize instructions events\n"		\
-"				b:	    		synthesize branches events\n"		\
+"				b:	    		synthesize branches events (branch misses on Arm)\n" \
 "				c:	    		synthesize branches events (calls only)\n"	\
 "				r:	    		synthesize branches events (returns only)\n" \
 "				x:	    		synthesize transactions events\n"		\
@@ -595,6 +595,9 @@ void auxtrace__free(struct perf_session *session);
 "				p:	    		synthesize power events\n"			\
 "				e:	    		synthesize error events\n"			\
 "				d:	    		create a debug log\n"			\
+"				m:	    		synthesize LLC miss events\n" \
+"				t:	    		synthesize TLB miss events\n" \
+"				a:	    		synthesize remote access events\n" \
 "				g[len]:     		synthesize a call chain (use with i or x)\n" \
 "				l[len]:     		synthesize last branch entries (use with i or x)\n" \
 "				sNUMBER:    		skip initial number of events\n"		\
-- 
2.17.1
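The new letters documented above ('m', 't', 'a', plus the SPE meaning of 'b') map one-to-one onto boolean fields in struct itrace_synth_opts, as the itrace_parse_synth_opts() hunk in the previous patch shows. The sketch below is a minimal, hypothetical illustration of that mapping. The demo_* struct and function are not perf's. The real parser accepts many more letters and numeric arguments, and it does not reset the fields the way this sketch does.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the itrace options relevant to Arm SPE. */
struct demo_synth_opts {
	bool branches;		/* 'b': branch misses when decoding SPE */
	bool llc_miss;		/* 'm': last level cache misses */
	bool tlb_miss;		/* 't': TLB misses */
	bool remote_access;	/* 'a': remote accesses */
};

/* Returns 0 on success, -1 on an unrecognised letter. */
static int demo_parse_spe_opts(struct demo_synth_opts *opts, const char *str)
{
	memset(opts, 0, sizeof(*opts));

	for (const char *p = str; *p; p++) {
		switch (*p) {
		case 'b':
			opts->branches = true;
			break;
		case 'm':
			opts->llc_miss = true;
			break;
		case 't':
			opts->tlb_miss = true;
			break;
		case 'a':
			opts->remote_access = true;
			break;
		case ' ':
		case ',':	/* separators are ignored, as in perf */
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Under this sketch, an option string such as "mt" would request only LLC-miss and TLB-miss samples. "b,a" would request branch-miss and remote-access samples.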



* Re: [PATCH v6 3/3] perf report: Add SPE options to --itrace argument
  2020-03-06 15:25                       ` [PATCH v6 3/3] perf report: Add SPE options to --itrace argument James Clark
@ 2020-03-13 11:33                         ` Leo Yan
  0 siblings, 0 replies; 42+ messages in thread
From: Leo Yan @ 2020-03-13 11:33 UTC (permalink / raw)
  To: James Clark
  Cc: mark.rutland, linux-arm-kernel, linux-kernel, nd, Tan Xiaojun,
	Will Deacon, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Al Grant, Namhyung Kim

Hi James,

On Fri, Mar 06, 2020 at 03:25:20PM +0000, James Clark wrote:
> From: Tan Xiaojun <tanxiaojun@huawei.com>
> 
> The previous patch added support in "perf report" for some Arm SPE
> events (llc-miss, tlb-miss, branch-miss, remote-access). This patch
> adds help text for them.
> 
> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
> Tested-by: Qi Liu <liuqi115@hisilicon.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Tan Xiaojun <tanxiaojun@huawei.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/itrace.txt | 5 ++++-
>  tools/perf/util/auxtrace.h          | 5 ++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
> index 82ff7dad40c2..da3e5ccc039e 100644
> --- a/tools/perf/Documentation/itrace.txt
> +++ b/tools/perf/Documentation/itrace.txt
> @@ -1,5 +1,5 @@
>  		i	synthesize instructions events
> -		b	synthesize branches events
> +		b	synthesize branches events (branch misses on Arm)

This is not valid for Arm CoreSight, actually: CoreSight can use option
'b' to inject branch samples. For this reason, I suggest changing it to
"(branch misses for Arm SPE)".

Thanks,
Leo

>  		c	synthesize branches events (calls only)
>  		r	synthesize branches events (returns only)
>  		x	synthesize transactions events
> @@ -9,6 +9,9 @@
>  			of aux-output (refer to perf record)
>  		e	synthesize error events
>  		d	create a debug log
> +		m	synthesize LLC miss events
> +		t	synthesize TLB miss events
> +		a	synthesize remote access events
>  		g	synthesize a call chain (use with i or x)
>  		l	synthesize last branch entries (use with i or x)
>  		s       skip initial number of events
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 80617b0d044d..52e148eea7f8 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -587,7 +587,7 @@ void auxtrace__free(struct perf_session *session);
>  
>  #define ITRACE_HELP \
>  "				i:	    		synthesize instructions events\n"		\
> -"				b:	    		synthesize branches events\n"		\
> +"				b:	    		synthesize branches events (branch misses on Arm)\n" \
>  "				c:	    		synthesize branches events (calls only)\n"	\
>  "				r:	    		synthesize branches events (returns only)\n" \
>  "				x:	    		synthesize transactions events\n"		\
> @@ -595,6 +595,9 @@ void auxtrace__free(struct perf_session *session);
>  "				p:	    		synthesize power events\n"			\
>  "				e:	    		synthesize error events\n"			\
>  "				d:	    		create a debug log\n"			\
> +"				m:	    		synthesize LLC miss events\n" \
> +"				t:	    		synthesize TLB miss events\n" \
> +"				a:	    		synthesize remote access events\n" \
>  "				g[len]:     		synthesize a call chain (use with i or x)\n" \
>  "				l[len]:     		synthesize last branch entries (use with i or x)\n" \
>  "				sNUMBER:    		skip initial number of events\n"		\
> -- 
> 2.17.1
> 


* Re: [PATCH v6 0/3] perf tools: Add support for some spe events
  2020-03-06 15:25                     ` [PATCH v6 0/3] perf tools: Add support for some spe events James Clark
                                         ` (2 preceding siblings ...)
  2020-03-06 15:25                       ` [PATCH v6 3/3] perf report: Add SPE options to --itrace argument James Clark
@ 2020-03-13 11:53                       ` Mark Rutland
  3 siblings, 0 replies; 42+ messages in thread
From: Mark Rutland @ 2020-03-13 11:53 UTC (permalink / raw)
  To: James Clark; +Cc: linux-arm-kernel, linux-kernel, nd

On Fri, Mar 06, 2020 at 03:25:17PM +0000, James Clark wrote:
> Hi Mark,
> 
> Yes, I think this is something I can look into. For now I have removed
> the last patch, because the current patch set already works in a very
> similar way and allows people to use SPE in perf:
> 
>     ./perf record -e arm_spe_0/branch_filter=1/
> vs
>     ./perf record -e arm_spe/branch-misses/pp

Thanks, FWIW that looks fine to me.

> Also, I don't have access to any big.LITTLE hardware with SPE, so I
> wouldn't be able to test collating all the SPE PMUs.

Likewise, I just want to make sure we don't back ourselves into a
corner.

Otherwise, I have no comments on these patches, so feel free to take
that as:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> 
> Thanks
> James
> 
> Tan Xiaojun (3):
>   perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
>   perf tools: Add support for "report" for some spe events
>   perf report: Add SPE options to --itrace argument
> 
>  tools/perf/Documentation/itrace.txt           |   5 +-
>  tools/perf/util/Build                         |   2 +-
>  tools/perf/util/arm-spe-decoder/Build         |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.c    | 225 ++++++
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  66 ++
>  .../arm-spe-pkt-decoder.c                     |   0
>  .../arm-spe-pkt-decoder.h                     |   2 +
>  tools/perf/util/arm-spe.c                     | 747 +++++++++++++++++-
>  tools/perf/util/auxtrace.c                    |  13 +
>  tools/perf/util/auxtrace.h                    |  13 +-
>  10 files changed, 1032 insertions(+), 42 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/Build
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (96%)
> 
> -- 
> 2.17.1
> 


Thread overview: 42+ messages
2020-01-23 16:07 [PATCH v2 0/7] perf tools: Add support for some spe events and precise ip James Clark
2020-01-23 16:07 ` [PATCH v2 1/7] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
2020-01-23 16:07 ` [PATCH v2 2/7] perf tools: Add support for "report" for some spe events James Clark
2020-01-27 12:31   ` Jiri Olsa
2020-01-23 16:07 ` [PATCH v2 3/7] perf report: Add --spe options for arm-spe James Clark
2020-01-23 16:07 ` [PATCH v2 4/7] perf tools: Support "branch-misses:pp" on arm64 James Clark
2020-01-27 12:31   ` Jiri Olsa
2020-01-23 16:07 ` [PATCH v2 5/7] perf tools: add perf_evlist__terminate() for terminate James Clark
2020-01-27 12:31   ` Jiri Olsa
2020-02-07 15:21     ` [PATCH v3 0/4] perf tools: Add support for some spe events and precise ip James Clark
2020-02-07 15:21       ` [PATCH v3 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
2020-02-07 15:21       ` [PATCH v3 2/4] perf tools: Add support for "report" for some spe events James Clark
2020-02-07 15:21       ` [PATCH v3 3/4] perf report: Add SPE options to --itrace argument James Clark
2020-02-07 15:21       ` [PATCH v3 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
2020-02-10 12:25         ` Jiri Olsa
2020-02-11 14:04           ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip James Clark
2020-02-11 14:04             ` [PATCH v4 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
2020-02-11 14:04             ` [PATCH v4 2/4] perf tools: Add support for "report" for some spe events James Clark
2020-02-17 11:39               ` Adrian Hunter
2020-02-11 14:04             ` [PATCH v4 3/4] perf report: Add SPE options to --itrace argument James Clark
2020-02-17 11:39               ` Adrian Hunter
2020-02-25 11:57                 ` [PATCH v5 0/4] perf tools: Add support for some spe events and precise ip James Clark
2020-02-25 11:57                   ` [PATCH v5 1/4] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
2020-02-25 11:57                   ` [PATCH v5 2/4] perf tools: Add support for "report" for some spe events James Clark
2020-02-29  6:51                     ` Leo Yan
2020-02-25 11:57                   ` [PATCH v5 3/4] perf report: Add SPE options to --itrace argument James Clark
2020-02-25 11:57                   ` [PATCH v5 4/4] perf tools: Support "branch-misses:pp" on arm64 James Clark
2020-02-28 16:03                     ` Mark Rutland
2020-02-11 14:04             ` [PATCH v4 " James Clark
2020-02-17 11:42               ` Adrian Hunter
2020-02-24 17:08                 ` James Clark
2020-02-28 16:01                   ` Mark Rutland
2020-03-06 15:25                     ` [PATCH v6 0/3] perf tools: Add support for some spe events James Clark
2020-03-06 15:25                       ` [PATCH v6 1/3] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir James Clark
2020-03-06 15:25                       ` [PATCH v6 2/3] perf tools: Add support for "report" for some spe events James Clark
2020-03-06 15:25                       ` [PATCH v6 3/3] perf report: Add SPE options to --itrace argument James Clark
2020-03-13 11:33                         ` Leo Yan
2020-03-13 11:53                       ` [PATCH v6 0/3] perf tools: Add support for some spe events Mark Rutland
2020-02-12 12:24             ` [PATCH v4 0/4] perf tools: Add support for some spe events and precise ip Jiri Olsa
2020-02-12 13:10               ` Adrian Hunter
2020-01-23 16:07 ` [PATCH v2 6/7] perf tools: arm-spe: fix record hang after being terminated James Clark
2020-01-23 16:07 ` [PATCH v2 7/7] perf tools: Unset precise_ip when using SPE James Clark
