linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/4] perf tools: record aarch64 registers automatically
@ 2021-01-22 16:18 Alexandre Truong
  2021-01-22 16:18 ` [PATCH 2/4] perf tools: add a mechanism to inject stack frames Alexandre Truong
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alexandre Truong @ 2021-01-22 16:18 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users
  Cc: Alexandre Truong, John Garry, Will Deacon, Mathieu Poirier,
	Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang, Jin Yao,
	Adrian Hunter, Suzuki K Poulose, Al Grant, James Clark,
	Wilco Dijkstra

On arm64, automatically record all user registers when frame pointer
call-graph mode is enabled. They are used to do a DWARF unwind that
finds the caller of the leaf frame if the frame pointer was omitted.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kemeng Shi <shikemeng@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Al Grant <al.grant@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Wilco Dijkstra <wilco.dijkstra@arm.com>
---
 tools/perf/arch/arm64/util/machine.c | 5 +++++
 tools/perf/builtin-record.c          | 7 +++++++
 tools/perf/util/callchain.h          | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/tools/perf/arch/arm64/util/machine.c b/tools/perf/arch/arm64/util/machine.c
index d41b27e781d3..6ba1d356a20c 100644
--- a/tools/perf/arch/arm64/util/machine.c
+++ b/tools/perf/arch/arm64/util/machine.c
@@ -25,3 +25,8 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
 		p->end = c->start;
 	pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
 }
+
+void arch__add_leaf_frame_record_opts(struct record_opts *opts)
+{
+	opts->sample_user_regs = arch__user_reg_mask();
+}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7bb10e9863bd..a5161f54b838 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2243,6 +2243,10 @@ static int record__parse_mmap_pages(const struct option *opt,
 	return ret;
 }
 
+void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
+{
+}
+
 static int parse_control_option(const struct option *opt,
 				const char *str,
 				int unset __maybe_unused)
@@ -2810,6 +2814,9 @@ int cmd_record(int argc, const char **argv)
 	/* Enable ignoring missing threads when -u/-p option is defined. */
 	rec->opts.ignore_missing_thread = rec->opts.target.uid != UINT_MAX || rec->opts.target.pid;
 
+	if (callchain_param.enabled && callchain_param.record_mode == CALLCHAIN_FP)
+		arch__add_leaf_frame_record_opts(&rec->opts);
+
 	err = -ENOMEM;
 	if (evlist__create_maps(rec->evlist, &rec->opts.target) < 0)
 		usage_with_options(record_usage, record_options);
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 5824134f983b..77fba053c677 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -280,6 +280,8 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
 }
 #endif
 
+void arch__add_leaf_frame_record_opts(struct record_opts *opts);
+
 char *callchain_list__sym_name(struct callchain_list *cl,
 			       char *bf, size_t bfsize, bool show_dso);
 char *callchain_node__scnprintf_value(struct callchain_node *node,
-- 
2.23.0
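
The gating added to cmd_record() above can be sketched in isolation. This
is only an illustration under stated assumptions: the types, the enum and
ALL_USER_REGS_MASK below are hypothetical stand-ins, not perf's real
definitions, and the arch hook is shown as a plain function rather than a
__weak default with an arm64 override.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for perf's internal types (not the real ones). */
enum record_mode { MODE_NONE, MODE_FP, MODE_DWARF, MODE_LBR };

struct callchain_cfg {
	bool enabled;
	enum record_mode record_mode;
};

struct rec_opts {
	uint64_t sample_user_regs;
};

/* Assumed "all user registers" mask, for illustration only. */
#define ALL_USER_REGS_MASK ((1ULL << 33) - 1)

/* Stand-in for the arch hook: on arm64 it requests every user register. */
static void arch_add_leaf_frame_opts(struct rec_opts *opts)
{
	assert(opts != NULL);
	opts->sample_user_regs = ALL_USER_REGS_MASK;
}

/* Only frame pointer call graphs need the extra registers. */
static void setup_leaf_frame_regs(struct rec_opts *opts,
				  const struct callchain_cfg *cc)
{
	if (cc->enabled && cc->record_mode == MODE_FP)
		arch_add_leaf_frame_opts(opts);
}
```

With this shape, any other call-graph mode leaves the register mask
untouched, which mirrors the condition in the hunk above.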



* [PATCH 2/4] perf tools: add a mechanism to inject stack frames
  2021-01-22 16:18 [PATCH 1/4] perf tools: record aarch64 registers automatically Alexandre Truong
@ 2021-01-22 16:18 ` Alexandre Truong
  2021-01-22 16:18 ` [PATCH 3/4] perf tools: enable dwarf_callchain_users on arm64 Alexandre Truong
  2021-01-22 16:18 ` [PATCH 4/4] perf tools: determine if LR is the return address Alexandre Truong
  2 siblings, 0 replies; 8+ messages in thread
From: Alexandre Truong @ 2021-01-22 16:18 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users
  Cc: Alexandre Truong, John Garry, Will Deacon, Mathieu Poirier,
	Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang, Jin Yao,
	Adrian Hunter, Suzuki K Poulose, Al Grant, James Clark,
	Wilco Dijkstra

Add a mechanism for platforms to inject a stack frame for the caller of
the leaf frame when DWARF or other post-processing mechanisms provide
enough information to determine that a frame is missing.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kemeng Shi <shikemeng@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Al Grant <al.grant@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Wilco Dijkstra <wilco.dijkstra@arm.com>
---
 tools/perf/util/machine.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 522ea3236bcc..40082d70eec1 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2671,6 +2671,12 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 	return err;
 }
 
+static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
+		struct thread *thread __maybe_unused)
+{
+	return 0;
+}
+
 static int thread__resolve_callchain_sample(struct thread *thread,
 					    struct callchain_cursor *cursor,
 					    struct evsel *evsel,
@@ -2687,6 +2693,8 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	int i, j, err, nr_entries;
 	int skip_idx = -1;
 	int first_call = 0;
+	u64 leaf_frame_caller;
+	int pos;
 
 	if (chain)
 		chain_nr = chain->nr;
@@ -2811,6 +2819,21 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 			continue;
 		}
 
+		pos = callchain_param.order == ORDER_CALLEE ? 2 : chain_nr - 2;
+
+		if (i == pos) {
+			leaf_frame_caller = get_leaf_frame_caller(sample, thread);
+
+			if (leaf_frame_caller && leaf_frame_caller != ip) {
+
+				err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, leaf_frame_caller,
+					       false, NULL, NULL, 0);
+				if (err)
+					return (err < 0) ? err : 0;
+			}
+		}
+
 		err = add_callchain_ip(thread, cursor, parent,
 				       root_al, &cpumode, ip,
 				       false, NULL, NULL, 0);
-- 
2.23.0
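
The injection step in the hunk above can be sketched on a plain array.
This is a simplified model, assuming a callee-first chain and a
caller-supplied position; inject_leaf_caller() is a hypothetical helper,
not a perf function, and it ignores the cursor and cpumode handling that
add_callchain_ip() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `in` to `out`, inserting `caller` just before in[pos] when it is
 * non-zero and differs from in[pos]. Returns the number of entries
 * written. `out` must have room for n + 1 entries.
 */
static size_t inject_leaf_caller(const uint64_t *in, size_t n, size_t pos,
				 uint64_t caller, uint64_t *out)
{
	size_t w = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		/* Mirror the patch: skip when there is no caller or it is a duplicate. */
		if (i == pos && caller != 0 && caller != in[i])
			out[w++] = caller;
		out[w++] = in[i];
	}
	return w;
}
```

A zero return from the arch helper therefore leaves the chain unchanged,
which is what the stub get_leaf_frame_caller() in this patch produces.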



* [PATCH 3/4] perf tools: enable dwarf_callchain_users on arm64
  2021-01-22 16:18 [PATCH 1/4] perf tools: record aarch64 registers automatically Alexandre Truong
  2021-01-22 16:18 ` [PATCH 2/4] perf tools: add a mechanism to inject stack frames Alexandre Truong
@ 2021-01-22 16:18 ` Alexandre Truong
  2021-01-22 16:18 ` [PATCH 4/4] perf tools: determine if LR is the return address Alexandre Truong
  2 siblings, 0 replies; 8+ messages in thread
From: Alexandre Truong @ 2021-01-22 16:18 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users
  Cc: Alexandre Truong, John Garry, Will Deacon, Mathieu Poirier,
	Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang, Jin Yao,
	Adrian Hunter, Suzuki K Poulose, Al Grant, James Clark,
	Wilco Dijkstra

On arm64, enable dwarf_callchain_users which will be needed
to do a dwarf unwind in order to get the caller of the leaf frame.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kemeng Shi <shikemeng@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Al Grant <al.grant@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Wilco Dijkstra <wilco.dijkstra@arm.com>
---
 tools/perf/builtin-report.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2a845d6cac09..93661a3eaeb1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -405,6 +405,10 @@ static int report__setup_sample_type(struct report *rep)
 
 	callchain_param_setup(sample_type);
 
+	if (callchain_param.record_mode == CALLCHAIN_FP &&
+			strncmp(rep->session->header.env.arch, "aarch64", 7) == 0)
+		dwarf_callchain_users = true;
+
 	if (rep->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
 		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
 			    "Please apply --call-graph lbr when recording.\n");
-- 
2.23.0
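
The architecture test in the hunk above is a prefix comparison on the
recorded arch string. A minimal sketch of that check follows; the helper
names and the NULL guard are assumptions added for illustration and are
not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the arch string starts with "aarch64" (hypothetical helper). */
static bool arch_is_aarch64(const char *arch)
{
	/* Guard against a missing arch string before comparing. */
	return arch != NULL && strncmp(arch, "aarch64", 7) == 0;
}

/* DWARF unwinding is only wanted for FP call graphs recorded on arm64. */
static bool want_dwarf_for_leaf_frame(bool fp_mode, const char *arch)
{
	return fp_mode && arch_is_aarch64(arch);
}
```

Because only the first seven characters are compared, a string such as
"aarch64_be" also matches.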



* [PATCH 4/4] perf tools: determine if LR is the return address
  2021-01-22 16:18 [PATCH 1/4] perf tools: record aarch64 registers automatically Alexandre Truong
  2021-01-22 16:18 ` [PATCH 2/4] perf tools: add a mechanism to inject stack frames Alexandre Truong
  2021-01-22 16:18 ` [PATCH 3/4] perf tools: enable dwarf_callchain_users on arm64 Alexandre Truong
@ 2021-01-22 16:18 ` Alexandre Truong
  2021-01-24  0:05   ` Jiri Olsa
  2021-02-08 15:39   ` James Clark
  2 siblings, 2 replies; 8+ messages in thread
From: Alexandre Truong @ 2021-01-22 16:18 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users
  Cc: Alexandre Truong, John Garry, Will Deacon, Mathieu Poirier,
	Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang, Jin Yao,
	Adrian Hunter, Suzuki K Poulose, Al Grant, James Clark,
	Wilco Dijkstra

On arm64 in frame pointer mode (e.g. perf record --call-graph fp),
use DWARF unwind info to check whether the link register holds the
return address, so that it can be injected into the frame pointer stack.

Write the following application:

	int a = 10;

	void f2(void)
	{
		for (int i = 0; i < 1000000; i++)
			a *= a;
	}

	void f1()
	{
		f2();
	}

	int main (void)
	{
		f1();
		return 0;
	}

with the following compilation flags:
	gcc -g -fno-omit-frame-pointer -fno-inline -O1

The compiler omits the frame pointer for f2 on arm. This is a problem
with any leaf call: for example, in an application with many different
calls to malloc(), the unwind would always omit the calling frame, even
if it can be determined.

	./perf record --call-graph fp ./a.out
	./perf report

currently gives the following stack:

0xffffea52f361
_start
__libc_start_main
main
f2

After this change, perf report correctly shows f1() calling f2(),
even though f1() was missing from the frame pointer unwind:

	./perf report

0xffffea52f361
_start
__libc_start_main
main
f1
f2

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kemeng Shi <shikemeng@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Al Grant <al.grant@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Wilco Dijkstra <wilco.dijkstra@arm.com>
---
 tools/perf/util/Build                         |  1 +
 .../util/arm-frame-pointer-unwind-support.c   | 43 +++++++++++++++++++
 .../util/arm-frame-pointer-unwind-support.h   |  7 +++
 tools/perf/util/machine.c                     |  9 ++--
 4 files changed, 57 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/arm-frame-pointer-unwind-support.c
 create mode 100644 tools/perf/util/arm-frame-pointer-unwind-support.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e2563d0154eb..2009d5f02972 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -1,3 +1,4 @@
+perf-y += arm-frame-pointer-unwind-support.o
 perf-y += annotate.o
 perf-y += block-info.o
 perf-y += block-range.o
diff --git a/tools/perf/util/arm-frame-pointer-unwind-support.c b/tools/perf/util/arm-frame-pointer-unwind-support.c
new file mode 100644
index 000000000000..2901ae2917e9
--- /dev/null
+++ b/tools/perf/util/arm-frame-pointer-unwind-support.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../arch/arm64/include/uapi/asm/perf_regs.h"
+#include "arch/arm64/include/perf_regs.h"
+#include "event.h"
+#include "arm-frame-pointer-unwind-support.h"
+#include "callchain.h"
+#include "unwind.h"
+
+struct entries {
+	u64 stack[2];
+	int i;
+};
+
+static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
+{
+	return callchain_param.record_mode != CALLCHAIN_FP || !sample->user_regs.regs
+		|| sample->user_regs.mask != PERF_REGS_MASK;
+}
+
+static int add_entry(struct unwind_entry *entry, void *arg)
+{
+	struct entries *entries = arg;
+
+	entries->stack[entries->i++] = entry->ip;
+	return 0;
+}
+
+u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread)
+{
+	u64 leaf_frame;
+	struct entries entries = {{0, 0}, 0};
+
+	if (get_leaf_frame_caller_enabled(sample))
+		return 0;
+
+	unwind__get_entries(add_entry, &entries, thread, sample, 2);
+	leaf_frame = callchain_param.order == ORDER_CALLER ?
+		entries.stack[0] : entries.stack[1];
+
+	if (leaf_frame + 1 == sample->user_regs.regs[PERF_REG_ARM64_LR])
+		return sample->user_regs.regs[PERF_REG_ARM64_LR];
+	return 0;
+}
diff --git a/tools/perf/util/arm-frame-pointer-unwind-support.h b/tools/perf/util/arm-frame-pointer-unwind-support.h
new file mode 100644
index 000000000000..16dc03fa9abe
--- /dev/null
+++ b/tools/perf/util/arm-frame-pointer-unwind-support.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
+#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
+
+u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread);
+
+#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 40082d70eec1..bc6147e46c89 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -34,6 +34,7 @@
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
 #include "cgroup.h"
+#include "arm-frame-pointer-unwind-support.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -2671,10 +2672,12 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 	return err;
 }
 
-static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
-		struct thread *thread __maybe_unused)
+static u64 get_leaf_frame_caller(struct perf_sample *sample, struct thread *thread)
 {
-	return 0;
+	if (strncmp(thread->maps->machine->env->arch, "aarch64", 7) == 0)
+		return get_leaf_frame_caller_aarch64(sample, thread);
+	else
+		return 0;
 }
 
 static int thread__resolve_callchain_sample(struct thread *thread,
-- 
2.23.0



* Re: [PATCH 4/4] perf tools: determine if LR is the return address
  2021-01-22 16:18 ` [PATCH 4/4] perf tools: determine if LR is the return address Alexandre Truong
@ 2021-01-24  0:05   ` Jiri Olsa
  2021-01-25  9:39     ` James Clark
  2021-02-08 15:39   ` James Clark
  1 sibling, 1 reply; 8+ messages in thread
From: Jiri Olsa @ 2021-01-24  0:05 UTC (permalink / raw)
  To: Alexandre Truong
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Namhyung Kim, Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang,
	Jin Yao, Adrian Hunter, Suzuki K Poulose, Al Grant, James Clark,
	Wilco Dijkstra

On Fri, Jan 22, 2021 at 04:18:54PM +0000, Alexandre Truong wrote:
> On arm64 in frame pointer mode (e.g. perf record --call-graph fp),
> use DWARF unwind info to check whether the link register holds the
> return address, so that it can be injected into the frame pointer stack.
> 
> Write the following application:
> 
> 	int a = 10;
> 
> 	void f2(void)
> 	{
> 		for (int i = 0; i < 1000000; i++)
> 			a *= a;
> 	}
> 
> 	void f1()
> 	{
> 		f2();
> 	}
> 
> 	int main (void)
> 	{
> 		f1();
> 		return 0;
> 	}
> 
> with the following compilation flags:
> 	gcc -g -fno-omit-frame-pointer -fno-inline -O1
> 
> The compiler omits the frame pointer for f2 on arm. This is a problem
> with any leaf call, for example an application with many different
> calls to malloc() would always omit the calling frame, even if it
> can be determined.
> 
> 	./perf record --call-graph fp ./a.out
> 	./perf report
> 
> currently gives the following stack:
> 
> 0xffffea52f361
> _start
> __libc_start_main
> main
> f2

reproduced on x86 as well

> +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
> +{
> +	return callchain_param.record_mode != CALLCHAIN_FP || !sample->user_regs.regs
> +		|| sample->user_regs.mask != PERF_REGS_MASK;
> +}
> +
> +static int add_entry(struct unwind_entry *entry, void *arg)
> +{
> +	struct entries *entries = arg;
> +
> +	entries->stack[entries->i++] = entry->ip;
> +	return 0;
> +}
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread)
> +{
> +	u64 leaf_frame;
> +	struct entries entries = {{0, 0}, 0};
> +
> +	if (get_leaf_frame_caller_enabled(sample))

the name suggests you'd want to continue if it's true

> +		return 0;
> +
> +	unwind__get_entries(add_entry, &entries, thread, sample, 2);

I'm scratching my head how this unwinds anything, you enabled just
registers, not the stack right? so the unwind code would do just
IP -> LR + 1 shift?

thanks,
jirka

> +	leaf_frame = callchain_param.order == ORDER_CALLER ?
> +		entries.stack[0] : entries.stack[1];
> +
> +	if (leaf_frame + 1 == sample->user_regs.regs[PERF_REG_ARM64_LR])
> +		return sample->user_regs.regs[PERF_REG_ARM64_LR];
> +	return 0;
> +}

SNIP
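
The naming point above can be illustrated with a positively phrased
predicate. This is only a sketch: the types and the function name are
hypothetical, and it does not claim to be what a later revision of the
patch does. It is the logical negation of the quoted condition, so a true
result means "go ahead".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for perf's record mode and sampled registers. */
enum record_mode { MODE_NONE, MODE_FP, MODE_DWARF };

struct user_regs {
	const uint64_t *regs;	/* NULL when no registers were sampled */
	uint64_t mask;		/* which registers were sampled */
};

/* True only when the leaf frame caller lookup can actually be attempted. */
static bool leaf_frame_caller_usable(enum record_mode mode,
				     const struct user_regs *ur,
				     uint64_t full_mask)
{
	assert(ur != NULL);
	return mode == MODE_FP && ur->regs != NULL && ur->mask == full_mask;
}
```

A caller would then read naturally as "if (!usable) return 0;".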



* Re: [PATCH 4/4] perf tools: determine if LR is the return address
  2021-01-24  0:05   ` Jiri Olsa
@ 2021-01-25  9:39     ` James Clark
  0 siblings, 0 replies; 8+ messages in thread
From: James Clark @ 2021-01-25  9:39 UTC (permalink / raw)
  To: Jiri Olsa, Alexandre Truong
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Namhyung Kim, Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang,
	Jin Yao, Adrian Hunter, Suzuki K Poulose, Al Grant,
	Wilco Dijkstra



On 24/01/2021 02:05, Jiri Olsa wrote:
> On Fri, Jan 22, 2021 at 04:18:54PM +0000, Alexandre Truong wrote:
>> On arm64 in frame pointer mode (e.g. perf record --call-graph fp),
>> use DWARF unwind info to check whether the link register holds the
>> return address, so that it can be injected into the frame pointer stack.
>>
>> Write the following application:
>>
>> 	int a = 10;
>>
>> 	void f2(void)
>> 	{
>> 		for (int i = 0; i < 1000000; i++)
>> 			a *= a;
>> 	}
>>
>> 	void f1()
>> 	{
>> 		f2();
>> 	}
>>
>> 	int main (void)
>> 	{
>> 		f1();
>> 		return 0;
>> 	}
>>
>> with the following compilation flags:
>> 	gcc -g -fno-omit-frame-pointer -fno-inline -O1
>>
>> The compiler omits the frame pointer for f2 on arm. This is a problem
>> with any leaf call, for example an application with many different
>> calls to malloc() would always omit the calling frame, even if it
>> can be determined.
>>
>> 	./perf record --call-graph fp ./a.out
>> 	./perf report
>>
>> currently gives the following stack:
>>
>> 0xffffea52f361
>> _start
>> __libc_start_main
>> main
>> f2
> 
> reproduced on x86 as well
> 
>> +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
>> +{
>> +	return callchain_param.record_mode != CALLCHAIN_FP || !sample->user_regs.regs
>> +		|| sample->user_regs.mask != PERF_REGS_MASK;
>> +}
>> +
>> +static int add_entry(struct unwind_entry *entry, void *arg)
>> +{
>> +	struct entries *entries = arg;
>> +
>> +	entries->stack[entries->i++] = entry->ip;
>> +	return 0;
>> +}
>> +
>> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread)
>> +{
>> +	u64 leaf_frame;
>> +	struct entries entries = {{0, 0}, 0};
>> +
>> +	if (get_leaf_frame_caller_enabled(sample))
> 
> the name suggests you'd want to continue if it's true
> 
>> +		return 0;
>> +
>> +	unwind__get_entries(add_entry, &entries, thread, sample, 2);
> 
> I'm scratching my head how this unwinds anything, you enabled just
> registers, not the stack right? so the unwind code would do just
> IP -> LR + 1 shift?

I think the idea about using libunwind is that the LR might not
be a valid return address. It could be used as a general purpose
register, or just not used at all.

Libunwind should be able to use the dwarf present in the binary to
unwind one frame, as long as nothing stored in the stack is needed.

But now that I look at the disassembly for this example, I see that f2()
just has a single 'b' instruction, and not 'bl' so the link register
won't be set. And also 'f1' does store a few things on the stack.
Whether these are needed or not to unwind one frame I'm not sure.

It could be that libunwind is falling back to a frame pointer unwind
mode, which we don't want.

I think it needs further investigation.


James

> 
> thanks,
> jirka
> 
>> +	leaf_frame = callchain_param.order == ORDER_CALLER ?
>> +		entries.stack[0] : entries.stack[1];
>> +
>> +	if (leaf_frame + 1 == sample->user_regs.regs[PERF_REG_ARM64_LR])
>> +		return sample->user_regs.regs[PERF_REG_ARM64_LR];
>> +	return 0;
>> +}
> 
> SNIP
> 


* Re: [PATCH 4/4] perf tools: determine if LR is the return address
  2021-01-22 16:18 ` [PATCH 4/4] perf tools: determine if LR is the return address Alexandre Truong
  2021-01-24  0:05   ` Jiri Olsa
@ 2021-02-08 15:39   ` James Clark
  2021-02-10 12:05     ` Alexandre Truong
  1 sibling, 1 reply; 8+ messages in thread
From: James Clark @ 2021-02-08 15:39 UTC (permalink / raw)
  To: Alexandre Truong, linux-kernel, linux-perf-users
  Cc: John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang, Jin Yao,
	Adrian Hunter, Suzuki K Poulose, Al Grant, Wilco Dijkstra



On 22/01/2021 18:18, Alexandre Truong wrote:

> +}
> +
> +static int add_entry(struct unwind_entry *entry, void *arg)
> +{
> +	struct entries *entries = arg;
> +
> +	entries->stack[entries->i++] = entry->ip;
> +	return 0;
> +}
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread)
> +{
> +	u64 leaf_frame;
> +	struct entries entries = {{0, 0}, 0};
> +
> +	if (get_leaf_frame_caller_enabled(sample))
> +		return 0;
> +
> +	unwind__get_entries(add_entry, &entries, thread, sample, 2);
> +	leaf_frame = callchain_param.order == ORDER_CALLER ?
> +		entries.stack[0] : entries.stack[1];
> +
> +	if (leaf_frame + 1 == sample->user_regs.regs[PERF_REG_ARM64_LR])
> +		return sample->user_regs.regs[PERF_REG_ARM64_LR];

Hi Alex,

From your other reply about your investigation, it looks like the check against PERF_REG_ARM64_LR isn't
required, because libunwind won't return a value if it's not correct, whether or not it's equal to the LR.

And PERF_REG_ARM64_LR points to the instruction _after_ the call site. i.e. where to return to,
not where the call was made from. So just leaf_frame rather than leaf_frame+1 would be more accurate.

I was also looking at unwind_entry in machine.c which is similar to your add_entry function and saw that it
does some extra bits like this:

	if (symbol_conf.hide_unresolved && entry->ms.sym == NULL)
		return 0;

	if (append_inlines(cursor, &entry->ms, entry->ip) == 0)
		return 0;

	/*
	 * Convert entry->ip from a virtual address to an offset in
	 * its corresponding binary.
	 */
	if (entry->ms.map)
		addr = map__map_ip(entry->ms.map, entry->ip);

I have a feeling you will also need to do those on your values returned from libunwind to make it 100%
equivalent.

James

> +	return 0;
> +}
> diff --git a/tools/perf/util/arm-frame-pointer-unwind-support.h b/tools/perf/util/arm-frame-pointer-unwind-support.h
> new file mode 100644
> index 000000000000..16dc03fa9abe
> --- /dev/null
> +++ b/tools/perf/util/arm-frame-pointer-unwind-support.h
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread);
> +
> +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 40082d70eec1..bc6147e46c89 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -34,6 +34,7 @@
>  #include "bpf-event.h"
>  #include <internal/lib.h> // page_size
>  #include "cgroup.h"
> +#include "arm-frame-pointer-unwind-support.h"
>  
>  #include <linux/ctype.h>
>  #include <symbol/kallsyms.h>
> @@ -2671,10 +2672,12 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>  	return err;
>  }
>  
> -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> -		struct thread *thread __maybe_unused)
> +static u64 get_leaf_frame_caller(struct perf_sample *sample, struct thread *thread)
>  {
> -	return 0;
> +	if (strncmp(thread->maps->machine->env->arch, "aarch64", 7) == 0)
> +		return get_leaf_frame_caller_aarch64(sample, thread);
> +	else
> +		return 0;
>  }
>  
>  static int thread__resolve_callchain_sample(struct thread *thread,
> 
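
The return-address point above can be made concrete with a small sketch.
On AArch64 every A64 instruction is 4 bytes and BL writes the address of
the following instruction into LR, so the call site is LR - 4. The helper
names below are hypothetical and this is not perf code; it only shows why
an address just below LR still falls inside the calling instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A64_INSN_SIZE 4u	/* every A64 instruction is 4 bytes wide */

/* Address of the BL instruction that produced this link register value. */
static uint64_t a64_call_site_from_lr(uint64_t lr)
{
	assert(lr >= A64_INSN_SIZE);
	return lr - A64_INSN_SIZE;
}

/* True when addr lies inside the BL instruction that returns to lr. */
static bool addr_in_call_insn(uint64_t addr, uint64_t lr)
{
	return addr >= a64_call_site_from_lr(lr) && addr < lr;
}
```

Under this model LR itself belongs to the instruction after the call,
while LR - 1 through LR - 4 resolve to the call instruction.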


* Re: [PATCH 4/4] perf tools: determine if LR is the return address
  2021-02-08 15:39   ` James Clark
@ 2021-02-10 12:05     ` Alexandre Truong
  0 siblings, 0 replies; 8+ messages in thread
From: Alexandre Truong @ 2021-02-10 12:05 UTC (permalink / raw)
  To: James Clark, linux-kernel, linux-perf-users
  Cc: John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kemeng Shi, Ian Rogers, Andi Kleen, Kan Liang, Jin Yao,
	Adrian Hunter, Suzuki K Poulose, Al Grant, Wilco Dijkstra



On 2/8/21 3:39 PM, James Clark wrote:
>
>
> On 22/01/2021 18:18, Alexandre Truong wrote:
>
>> +}
>> +
>> +static int add_entry(struct unwind_entry *entry, void *arg)
>> +{
>> +    struct entries *entries = arg;
>> +
>> +    entries->stack[entries->i++] = entry->ip;
>> +    return 0;
>> +}
>> +
>> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread)
>> +{
>> +    u64 leaf_frame;
>> +    struct entries entries = {{0, 0}, 0};
>> +
>> +    if (get_leaf_frame_caller_enabled(sample))
>> +            return 0;
>> +
>> +    unwind__get_entries(add_entry, &entries, thread, sample, 2);
>> +    leaf_frame = callchain_param.order == ORDER_CALLER ?
>> +            entries.stack[0] : entries.stack[1];
>> +
>> +    if (leaf_frame + 1 == sample->user_regs.regs[PERF_REG_ARM64_LR])
>> +            return sample->user_regs.regs[PERF_REG_ARM64_LR];
>
> Hi Alex,
>
>  From your other reply about your investigation it looks like the check against PERF_REG_ARM64_LR isn't
> required because libunwind won't return a value if it's not correct. Whether it's equal to the LR or not.
>
> And PERF_REG_ARM64_LR points to the instruction _after_ the call site. i.e. where to return to,
> not where the call was made from. So just leaf_frame rather than leaf_frame+1 would be more accurate.
>
> I was also looking at unwind_entry in machine.c which is similar to your add_entry function and saw that it
> does some extra bits like this:
>
>       if (symbol_conf.hide_unresolved && entry->ms.sym == NULL)
>               return 0;
>
>       if (append_inlines(cursor, &entry->ms, entry->ip) == 0)
>               return 0;
>
>       /*
>        * Convert entry->ip from a virtual address to an offset in
>        * its corresponding binary.
>        */
>       if (entry->ms.map)
>               addr = map__map_ip(entry->ms.map, entry->ip);
>
> I have a feeling you will also need to do those on your values returned from libunwind to make it 100%
> equivalent.
>
> James
>

Hi James,

Thanks for your reply.

The check against PERF_REG_ARM64_LR is indeed not required; instead I can
check whether the libunwind unwind succeeds.

I am going to follow up with a v2 of the patch applying these changes.

I think the bits you mentioned don't need to be added, because this check
is already done in add_callchain_ip(), which is called afterwards in machine.c:

     if (symbol_conf.hide_unresolved && entry->ms.sym == NULL)
         return 0;

For the second one, I don't think it needs to be added either because
append_inlines() appends ip on the cursor which is also already done by
add_callchain_ip().

For the last one, the conversion from a virtual address to an offset in
the binary isn't required.


As for extending this to all platforms: it doesn't work on x86, so I'll
leave it just for arm for now.

Regards,

Alexandre

>> +    return 0;
>> +}
>> diff --git a/tools/perf/util/arm-frame-pointer-unwind-support.h b/tools/perf/util/arm-frame-pointer-unwind-support.h
>> new file mode 100644
>> index 000000000000..16dc03fa9abe
>> --- /dev/null
>> +++ b/tools/perf/util/arm-frame-pointer-unwind-support.h
>> @@ -0,0 +1,7 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
>> +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
>> +
>> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread);
>> +
>> +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> index 40082d70eec1..bc6147e46c89 100644
>> --- a/tools/perf/util/machine.c
>> +++ b/tools/perf/util/machine.c
>> @@ -34,6 +34,7 @@
>>   #include "bpf-event.h"
>>   #include <internal/lib.h> // page_size
>>   #include "cgroup.h"
>> +#include "arm-frame-pointer-unwind-support.h"
>>
>>   #include <linux/ctype.h>
>>   #include <symbol/kallsyms.h>
>> @@ -2671,10 +2672,12 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>>      return err;
>>   }
>>
>> -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
>> -            struct thread *thread __maybe_unused)
>> +static u64 get_leaf_frame_caller(struct perf_sample *sample, struct thread *thread)
>>   {
>> -    return 0;
>> +    if (strncmp(thread->maps->machine->env->arch, "aarch64", 7) == 0)
>> +            return get_leaf_frame_caller_aarch64(sample, thread);
>> +    else
>> +            return 0;
>>   }
>>
>>   static int thread__resolve_callchain_sample(struct thread *thread,
>>
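
The idea of relying on the unwind result instead of comparing against LR
can be sketched as follows. Everything here is an assumption made for
illustration: fake_unwind() is a hypothetical stand-in for the real
unwinder, the callback takes a bare address rather than a struct
unwind_entry, and the chain is assumed to be callee-first so the caller
of the leaf frame is the second entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 2

struct entries {
	uint64_t stack[MAX_ENTRIES];
	size_t n;
};

/* Collect one unwound address; a non-zero return stops the unwind. */
static int add_entry(uint64_t ip, void *arg)
{
	struct entries *e = arg;

	assert(e != NULL);
	if (e->n >= MAX_ENTRIES)
		return -1;	/* refuse to overflow the fixed buffer */
	e->stack[e->n++] = ip;
	return 0;
}

/* Hypothetical unwinder: feeds `count` addresses from `ips` to the callback. */
static int fake_unwind(int (*cb)(uint64_t, void *), void *arg,
		       const uint64_t *ips, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int ret = cb(ips[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Caller of the leaf frame, or 0 when the unwind did not produce one. */
static uint64_t leaf_caller_or_zero(const uint64_t *ips, size_t count)
{
	struct entries e = { {0, 0}, 0 };

	if (fake_unwind(add_entry, &e, ips, count) != 0 || e.n < 2)
		return 0;
	return e.stack[1];
}
```

Returning 0 on any failure keeps the convention used by
get_leaf_frame_caller(), where 0 means no frame is injected.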
