* [PATCH v5 0/6] perf tools/arm64: Fix missing leaf-function callers in ARM64 when using "--call-graph=fp"
@ 2021-12-17 15:45 ` German Gomez
  0 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: German Gomez, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel

(This cset applies on top of [1])

Call graphs on ARM64 recorded with the option "--call-graph fp" are
missing the callers of leaf functions. See [PATCH 6/6] for a
before-and-after example using this cset.

[1] https://lore.kernel.org/all/20211207180653.1147374-1-german.gomez@arm.com/
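A minimal model of the problem follows. It is a sketch only: struct
frame_record, fp_walk() and every address below are hypothetical, not perf
or kernel code. A non-leaf AArch64 function typically pushes a frame record
{x29, x30} in its prologue. A leaf function may not, so a sample taken
inside it finds FP still pointing at its caller's record, and the return
address into that caller exists only in LR. Walking the frame records from
FP, as a frame-pointer unwinder does, then skips the caller:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of an AArch64 frame record: the pair that a prologue
 * such as "stp x29, x30, [sp, #-16]!" pushes onto the stack.
 */
struct frame_record {
	const struct frame_record *prev_fp; /* saved x29: caller's FP */
	uint64_t lr;                        /* saved x30: return address */
};

/*
 * Frame-pointer unwind as modelled here: emit the sampled PC, then the
 * saved LR of every frame record reachable from FP.
 */
static size_t fp_walk(uint64_t pc, const struct frame_record *fp,
		      uint64_t *out, size_t cap)
{
	size_t n = 0;

	if (cap == 0)
		return 0;
	out[n++] = pc;
	for (; fp != NULL && n < cap; fp = fp->prev_fp)
		out[n++] = fp->lr;
	return n;
}

/*
 * Hypothetical call sequence main() -> f() -> leaf(), sampled in leaf().
 * leaf() built no frame record, so FP still points at f()'s record and
 * the only copy of the return address into f() is the live LR.
 */
static const struct frame_record main_rec = { NULL, 0x500 };    /* into _start */
static const struct frame_record f_rec = { &main_rec, 0x1010 }; /* into main */
static const uint64_t leaf_pc = 0x3004;                         /* inside leaf */
static const uint64_t leaf_lr = 0x2010;                         /* into f */
```

With these values, fp_walk(leaf_pc, &f_rec, out, 8) yields 0x3004, 0x1010
and 0x500; leaf_lr (0x2010, the return address into f()) never appears.
Recording LR, and injecting it when a DWARF-based check confirms that it
holds the return address, closes that gap.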

---
Changes since v4
  - [PATCH 4/6] Apply Mark Rutland's comments.
  - [PATCH 6/6] Rewrite commit log.

Changes since v3
  - Only record the LR register, instead of all registers, in [PATCH 1/6].
  - Introduce [PATCH 5/6] to refactor the SMPL_REG macro.
  - Fix compilation issues on different platforms.

Alexandre Truong (5):
  perf tools: record ARM64 LR register automatically
  perf tools: add a mechanism to inject stack frames
  perf tools: Refactor script__setup_sample_type()
  perf tools: enable dwarf_callchain_users on arm64
  perf tools: determine if LR is the return address

German Gomez (1):
  perf tools: Refactor SMPL_REG macro in perf_regs.h

 tools/perf/arch/arm64/util/machine.c          |  7 +++
 tools/perf/builtin-record.c                   |  8 +++
 tools/perf/builtin-report.c                   |  4 +-
 tools/perf/builtin-script.c                   | 13 +---
 tools/perf/util/Build                         |  1 +
 .../util/arm64-frame-pointer-unwind-support.c | 63 +++++++++++++++++++
 .../util/arm64-frame-pointer-unwind-support.h | 10 +++
 tools/perf/util/callchain.c                   | 14 ++++-
 tools/perf/util/callchain.h                   |  4 +-
 tools/perf/util/machine.c                     | 50 ++++++++++++++-
 tools/perf/util/machine.h                     |  1 +
 tools/perf/util/perf_regs.h                   |  7 ++-
 12 files changed, 162 insertions(+), 20 deletions(-)
 create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.c
 create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.h

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 1/6] perf tools: record ARM64 LR register automatically
  2021-12-17 15:45 ` German Gomez
@ 2021-12-17 15:45   ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel

From: Alexandre Truong <alexandre.truong@arm.com>

On ARM64, automatically record the link register (LR) when frame-pointer
call-graph mode is enabled. The recorded LR is used by a DWARF unwind to
find the caller of the leaf frame when the frame pointer was omitted.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
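The record-side change reduces to a single conditional. The sketch below
models it outside perf; the sketch_* / SKETCH_* names and
add_leaf_frame_regs() are hypothetical stand-ins, and only the register
index 30 (PERF_REG_ARM64_LR in the arm64 uapi perf_regs.h) mirrors real
code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Index of the link register (x30), i.e. PERF_REG_ARM64_LR. */
#define SKETCH_REG_ARM64_LR 30

/* Hypothetical stand-ins for perf's callchain record modes. */
enum sketch_record_mode { SKETCH_MODE_FP, SKETCH_MODE_DWARF, SKETCH_MODE_LBR };

/*
 * Mirror of the check added to cmd_record(): only when frame-pointer call
 * graphs are enabled is the LR bit OR-ed into the user-register sample
 * mask. Bits the user already requested are preserved.
 */
static uint64_t add_leaf_frame_regs(uint64_t sample_user_regs,
				    bool callchain_enabled,
				    enum sketch_record_mode mode)
{
	if (callchain_enabled && mode == SKETCH_MODE_FP)
		sample_user_regs |= 1ULL << SKETCH_REG_ARM64_LR;
	return sample_user_regs;
}
```

On other architectures the __weak, empty arch__add_leaf_frame_record_opts()
leaves the mask untouched, so only arm64 records the extra register in each
sample.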
 tools/perf/arch/arm64/util/machine.c | 7 +++++++
 tools/perf/builtin-record.c          | 8 ++++++++
 tools/perf/util/callchain.h          | 2 ++
 3 files changed, 17 insertions(+)

diff --git a/tools/perf/arch/arm64/util/machine.c b/tools/perf/arch/arm64/util/machine.c
index 7e7714290a87..d2ce31e28cd7 100644
--- a/tools/perf/arch/arm64/util/machine.c
+++ b/tools/perf/arch/arm64/util/machine.c
@@ -5,6 +5,8 @@
 #include <string.h>
 #include "debug.h"
 #include "symbol.h"
+#include "callchain.h"
+#include "record.h"
 
 /* On arm64, kernel text segment starts at high memory address,
  * for example 0xffff 0000 8xxx xxxx. Modules start at a low memory
@@ -26,3 +28,8 @@ void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
 		p->end = c->start;
 	pr_debug4("%s sym:%s end:%#" PRIx64 "\n", __func__, p->name, p->end);
 }
+
+void arch__add_leaf_frame_record_opts(struct record_opts *opts)
+{
+	opts->sample_user_regs |= sample_reg_masks[PERF_REG_ARM64_LR].mask;
+}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0338b813585a..6ac2160913ea 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2267,6 +2267,10 @@ static int record__parse_mmap_pages(const struct option *opt,
 	return ret;
 }
 
+void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
+{
+}
+
 static int parse_control_option(const struct option *opt,
 				const char *str,
 				int unset __maybe_unused)
@@ -2898,6 +2902,10 @@ int cmd_record(int argc, const char **argv)
 	}
 
 	rec->opts.target.hybrid = perf_pmu__has_hybrid();
+
+	if (callchain_param.enabled && callchain_param.record_mode == CALLCHAIN_FP)
+		arch__add_leaf_frame_record_opts(&rec->opts);
+
 	err = -ENOMEM;
 	if (evlist__create_maps(rec->evlist, &rec->opts.target) < 0)
 		usage_with_options(record_usage, record_options);
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 5824134f983b..77fba053c677 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -280,6 +280,8 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
 }
 #endif
 
+void arch__add_leaf_frame_record_opts(struct record_opts *opts);
+
 char *callchain_list__sym_name(struct callchain_list *cl,
 			       char *bf, size_t bfsize, bool show_dso);
 char *callchain_node__scnprintf_value(struct callchain_node *node,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 2/6] perf tools: add a mechanism to inject stack frames
  2021-12-17 15:45 ` German Gomez
@ 2021-12-17 15:45   ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel

From: Alexandre Truong <alexandre.truong@arm.com>

Add a mechanism for platforms to inject a stack frame for the caller of
the leaf frame when there is enough information (from DWARF or other
post-processing mechanisms) to determine that the frame is missing.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
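A simplified model of the injection, for ORDER_CALLEE only, is sketched
below. resolve_with_leaf_caller() and the SKETCH_* names are hypothetical,
although the two constants carry the values of PERF_CONTEXT_USER and
PERF_CONTEXT_MAX from <linux/perf_event.h>. In callee order the entry after
PERF_CONTEXT_USER is the user leaf, so the injected caller belongs just
before the entry two slots past the marker:

```c
#include <stddef.h>
#include <stdint.h>

/* PERF_CONTEXT_USER is (__u64)-512 in <linux/perf_event.h>. */
#define SKETCH_CONTEXT_USER ((uint64_t)-512)
/* Context markers are >= PERF_CONTEXT_MAX, which is (__u64)-4095. */
#define SKETCH_CONTEXT_MAX ((uint64_t)-4095)

/*
 * Callee-order model of the loop in thread__resolve_callchain_sample():
 * when ips[j - 2] is PERF_CONTEXT_USER, emit leaf_caller before ips[j],
 * unless it is 0 (nothing to inject) or equal to ips[j] (duplicate).
 * Context markers are consumed rather than emitted as frames.
 */
static size_t resolve_with_leaf_caller(const uint64_t *ips, size_t nr,
				       uint64_t leaf_caller,
				       uint64_t *out, size_t cap)
{
	size_t n = 0;

	for (size_t j = 0; j < nr; j++) {
		uint64_t ip = ips[j];

		if (ip >= SKETCH_CONTEXT_MAX)
			continue; /* context marker, not a frame */

		if (j >= 2 && ips[j - 2] == SKETCH_CONTEXT_USER &&
		    leaf_caller != 0 && leaf_caller != ip && n < cap)
			out[n++] = leaf_caller;

		if (n < cap)
			out[n++] = ip;
	}
	return n;
}
```

With ips = {PERF_CONTEXT_USER, 0x3004, 0x1010, 0x500} and leaf_caller =
0x2010, the output is 0x3004, 0x2010, 0x1010, 0x500. A leaf_caller of 0, or
of 0x1010 (already the next entry), leaves the three original frames
unchanged.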
 tools/perf/util/machine.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index fb8496df8432..3eddad009f78 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2710,6 +2710,12 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 	return err;
 }
 
+static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
+		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
+{
+	return 0;
+}
+
 static int thread__resolve_callchain_sample(struct thread *thread,
 					    struct callchain_cursor *cursor,
 					    struct evsel *evsel,
@@ -2723,9 +2729,10 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = 0;
 	u8 cpumode = PERF_RECORD_MISC_USER;
-	int i, j, err, nr_entries;
+	int i, j, err, nr_entries, usr_idx;
 	int skip_idx = -1;
 	int first_call = 0;
+	u64 leaf_frame_caller;
 
 	if (chain)
 		chain_nr = chain->nr;
@@ -2850,6 +2857,34 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 			continue;
 		}
 
+		/*
+		 * PERF_CONTEXT_USER allows us to locate where the user stack ends.
+		 * Depending on callchain_param.order and the position of PERF_CONTEXT_USER,
+		 * the index will be different in order to add the missing frame
+		 * at the right place.
+		 */
+
+		usr_idx = callchain_param.order == ORDER_CALLEE ? j-2 : j-1;
+
+		if (usr_idx >= 0 && chain->ips[usr_idx] == PERF_CONTEXT_USER) {
+
+			leaf_frame_caller = get_leaf_frame_caller(sample, thread, usr_idx);
+
+			/*
+			 * Check that leaf_frame_caller != ip, so that the same
+			 * address is not added twice.
+			 */
+
+			if (leaf_frame_caller && leaf_frame_caller != ip) {
+
+				err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, leaf_frame_caller,
+					       false, NULL, NULL, 0);
+				if (err)
+					return (err < 0) ? err : 0;
+			}
+		}
+
 		err = add_callchain_ip(thread, cursor, parent,
 				       root_al, &cpumode, ip,
 				       false, NULL, NULL, 0);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 3/6] perf tools: Refactor script__setup_sample_type()
  2021-12-17 15:45 ` German Gomez
@ 2021-12-17 15:45   ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel

From: Alexandre Truong <alexandre.truong@arm.com>

Refactor script__setup_sample_type() to call callchain_param_setup()
instead of duplicating the code that sets up the callchain parameters.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
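For reference, the selection that now lives only in callchain_param_setup()
can be modelled as follows. This is a sketch: pick_record_mode() and the
sketch_* / SKETCH_* names are hypothetical, while the bit positions are
those of PERF_SAMPLE_BRANCH_STACK, PERF_SAMPLE_REGS_USER and
PERF_SAMPLE_STACK_USER in <linux/perf_event.h>. It omits the
symbol_conf.use_callchain / cumulate_callchain guard and the
dwarf_callchain_users side effect of the real code:

```c
#include <stdint.h>

/* Sample-type bits, as defined in <linux/perf_event.h>. */
#define SKETCH_SAMPLE_BRANCH_STACK (1ULL << 11)
#define SKETCH_SAMPLE_REGS_USER    (1ULL << 12)
#define SKETCH_SAMPLE_STACK_USER   (1ULL << 13)

enum sketch_record_mode { SKETCH_MODE_FP, SKETCH_MODE_DWARF, SKETCH_MODE_LBR };

/*
 * Same precedence as the block removed from script__setup_sample_type():
 * user registers plus user stack => DWARF, else a branch stack => LBR,
 * else frame pointers.
 */
static enum sketch_record_mode pick_record_mode(uint64_t sample_type)
{
	const uint64_t dwarf_bits = SKETCH_SAMPLE_REGS_USER | SKETCH_SAMPLE_STACK_USER;

	if ((sample_type & dwarf_bits) == dwarf_bits)
		return SKETCH_MODE_DWARF;
	if (sample_type & SKETCH_SAMPLE_BRANCH_STACK)
		return SKETCH_MODE_LBR;
	return SKETCH_MODE_FP;
}
```

Under this precedence, a file that sampled user registers but not the user
stack, such as one where [PATCH 1/6] added only LR, still selects
frame-pointer mode.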
 tools/perf/builtin-script.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index da2175d70ac9..ab7d575f97f2 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3468,16 +3468,7 @@ static void script__setup_sample_type(struct perf_script *script)
 	struct perf_session *session = script->session;
 	u64 sample_type = evlist__combined_sample_type(session->evlist);
 
-	if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) {
-		if ((sample_type & PERF_SAMPLE_REGS_USER) &&
-		    (sample_type & PERF_SAMPLE_STACK_USER)) {
-			callchain_param.record_mode = CALLCHAIN_DWARF;
-			dwarf_callchain_users = true;
-		} else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
-			callchain_param.record_mode = CALLCHAIN_LBR;
-		else
-			callchain_param.record_mode = CALLCHAIN_FP;
-	}
+	callchain_param_setup(sample_type);
 
 	if (script->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
 		pr_warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 4/6] perf tools: enable dwarf_callchain_users on arm64
  2021-12-17 15:45 ` German Gomez
@ 2021-12-17 15:45   ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel

From: Alexandre Truong <alexandre.truong@arm.com>

On arm64, enable dwarf_callchain_users, which is needed to perform a
DWARF unwind to find the caller of the leaf frame.

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
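The step added to callchain_param_setup() is a pure function of the record
mode and the architecture string. It is sketched here with hypothetical
names (enum sketch_record_mode, leaf_caller_needs_dwarf()); the NULL check
on arch is an addition of this sketch, not part of the patch:

```c
#include <stdbool.h>
#include <string.h>

enum sketch_record_mode { SKETCH_MODE_FP, SKETCH_MODE_DWARF, SKETCH_MODE_LBR };

/*
 * Frame-pointer call graphs from an arm64 recording also need a DWARF
 * unwind, to decide whether the unwind should start from LR or from FP.
 * A NULL arch (unknown environment) is treated as "not arm64".
 */
static bool leaf_caller_needs_dwarf(enum sketch_record_mode mode,
				    const char *arch)
{
	return mode == SKETCH_MODE_FP && arch != NULL &&
	       strcmp(arch, "arm64") == 0;
}
```

The callers pass perf_env__arch() of the session or evlist environment,
which normalizes names such as "aarch64" to "arm64". The flag therefore
appears to follow the architecture recorded with the data rather than the
host running perf report or perf script.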
 tools/perf/builtin-report.c |  4 ++--
 tools/perf/builtin-script.c |  4 ++--
 tools/perf/util/callchain.c | 14 +++++++++++++-
 tools/perf/util/callchain.h |  2 +-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8167ebfe776a..a31ad60ba66e 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -410,7 +410,7 @@ static int report__setup_sample_type(struct report *rep)
 		}
 	}
 
-	callchain_param_setup(sample_type);
+	callchain_param_setup(sample_type, perf_env__arch(&rep->session->header.env));
 
 	if (rep->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
 		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
@@ -1124,7 +1124,7 @@ static int process_attr(struct perf_tool *tool __maybe_unused,
 	 * on events sample_type.
 	 */
 	sample_type = evlist__combined_sample_type(*pevlist);
-	callchain_param_setup(sample_type);
+	callchain_param_setup(sample_type, perf_env__arch((*pevlist)->env));
 	return 0;
 }
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ab7d575f97f2..d308adfd1176 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2318,7 +2318,7 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	 * on events sample_type.
 	 */
 	sample_type = evlist__combined_sample_type(evlist);
-	callchain_param_setup(sample_type);
+	callchain_param_setup(sample_type, perf_env__arch((*pevlist)->env));
 
 	/* Enable fields for callchain entries */
 	if (symbol_conf.use_callchain &&
@@ -3468,7 +3468,7 @@ static void script__setup_sample_type(struct perf_script *script)
 	struct perf_session *session = script->session;
 	u64 sample_type = evlist__combined_sample_type(session->evlist);
 
-	callchain_param_setup(sample_type);
+	callchain_param_setup(sample_type, perf_env__arch(session->machines.host.env));
 
 	if (script->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
 		pr_warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8e2777133bd9..131207b91d15 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1600,7 +1600,7 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
 		map__zput(node->ms.map);
 }
 
-void callchain_param_setup(u64 sample_type)
+void callchain_param_setup(u64 sample_type, const char *arch)
 {
 	if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) {
 		if ((sample_type & PERF_SAMPLE_REGS_USER) &&
@@ -1612,6 +1612,18 @@ void callchain_param_setup(u64 sample_type)
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
+
+	/*
+	 * It's necessary to use libunwind to reliably determine the caller of
+	 * a leaf function on aarch64, as otherwise we cannot know whether to
+	 * start from the LR or FP.
+	 *
+	 * Always starting from the LR can result in duplicate or entirely
+	 * erroneous entries. Always skipping the LR and starting from the FP
+	 * can result in missing entries.
+	 */
+	if (callchain_param.record_mode == CALLCHAIN_FP && !strcmp(arch, "arm64"))
+		dwarf_callchain_users = true;
 }
 
 static bool chain_match(struct callchain_list *base_chain,
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 77fba053c677..d95615daed73 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -300,7 +300,7 @@ int callchain_branch_counts(struct callchain_root *root,
 			    u64 *branch_count, u64 *predicted_count,
 			    u64 *abort_count, u64 *cycles_count);
 
-void callchain_param_setup(u64 sample_type);
+void callchain_param_setup(u64 sample_type, const char *arch);
 
 bool callchain_cnode_matched(struct callchain_node *base_cnode,
 			     struct callchain_node *pair_cnode);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 5/6] perf tools: Refactor SMPL_REG macro in perf_regs.h
  2021-12-17 15:45 ` German Gomez
@ 2021-12-17 15:45   ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: German Gomez, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel

Refactor the SMPL_REG macro so that a follow-up commit can use it to
obtain the masks for ARM64 registers.

Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/util/perf_regs.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index 4e6b1299c571..ce1127af05e4 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -11,8 +11,11 @@ struct sample_reg {
 	const char *name;
 	uint64_t mask;
 };
-#define SMPL_REG(n, b) { .name = #n, .mask = 1ULL << (b) }
-#define SMPL_REG2(n, b) { .name = #n, .mask = 3ULL << (b) }
+
+#define SMPL_REG_MASK(b) (1ULL << (b))
+#define SMPL_REG(n, b) { .name = #n, .mask = SMPL_REG_MASK(b) }
+#define SMPL_REG2_MASK(b) (3ULL << (b))
+#define SMPL_REG2(n, b) { .name = #n, .mask = SMPL_REG2_MASK(b) }
 #define SMPL_REG_END { .name = NULL }
 
 enum {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-17 15:45 ` German Gomez
@ 2021-12-17 15:45   ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2021-12-17 15:45 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel

From: Alexandre Truong <alexandre.truong@arm.com>

When unwinding using frame pointers on ARM64, the return address of the
current function may not yet have been pushed onto the stack at the
point where the function is interrupted, which makes perf show an
incorrect call graph to the user.

Consider the following example program:

  void leaf() {
      /* long computation */
  }

  void parent() {
      // (1)
      leaf();
      // (2)
  }

  ... could be compiled into (using gcc -fno-inline -fno-omit-frame-pointer):

  leaf:
      /* long computation */
      nop
      ret
  parent:
      // (1)
      stp     x29, x30, [sp, -16]!
      mov     x29, sp
      bl      leaf
      nop
      ldp     x29, x30, [sp], 16
      // (2)
      ret

If the program is interrupted at (1), (2), or any point in "leaf:", the
call graph will skip the caller of the current function. We can unwind
using the DWARF info to check whether the return address is the value
held in the LR register, and if so inject the missing frame into the
call graph.

Before this patch, the above example shows the following call graph
when recorded in "--call-graph fp" mode on ARM64:

  # Children      Self  Command   Shared Object     Symbol
  # ........  ........  ........  ................  ......................
  #
      99.86%    99.86%  program3  program3          [.] leaf
  	    |
  	    ---_start
  	       __libc_start_main
  	       main
  	       leaf

As can be seen, the "parent" function is missing. This is especially
problematic in "leaf" because, for leaf functions, the compiler may
omit pushing the return address onto the stack altogether. After this
patch, it shows the correct graph:

  # Children      Self  Command   Shared Object     Symbol
  # ........  ........  ........  ................  ......................
  #
      99.86%    99.86%  program3  program3          [.] leaf
  	    |
  	    ---_start
  	       __libc_start_main
  	       main
  	       parent
  	       leaf

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/util/Build                         |  1 +
 .../util/arm64-frame-pointer-unwind-support.c | 63 +++++++++++++++++++
 .../util/arm64-frame-pointer-unwind-support.h | 10 +++
 tools/perf/util/machine.c                     | 19 ++++--
 tools/perf/util/machine.h                     |  1 +
 5 files changed, 89 insertions(+), 5 deletions(-)
 create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.c
 create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 2e5bfbb69960..03d4c647bd86 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -1,3 +1,4 @@
+perf-y += arm64-frame-pointer-unwind-support.o
 perf-y += annotate.o
 perf-y += block-info.o
 perf-y += block-range.o
diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c
new file mode 100644
index 000000000000..4f5ecf51ed38
--- /dev/null
+++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "arm64-frame-pointer-unwind-support.h"
+#include "callchain.h"
+#include "event.h"
+#include "perf_regs.h" // SMPL_REG_MASK
+#include "unwind.h"
+
+#define perf_event_arm_regs perf_event_arm64_regs
+#include "../arch/arm64/include/uapi/asm/perf_regs.h"
+#undef perf_event_arm_regs
+
+struct entries {
+	u64 stack[2];
+	size_t length;
+};
+
+static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
+{
+	return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs
+		&& sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR);
+}
+
+static int add_entry(struct unwind_entry *entry, void *arg)
+{
+	struct entries *entries = arg;
+
+	entries->stack[entries->length++] = entry->ip;
+	return 0;
+}
+
+u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
+{
+	int ret;
+	struct entries entries = {};
+	struct regs_dump old_regs = sample->user_regs;
+
+	if (!get_leaf_frame_caller_enabled(sample))
+		return 0;
+
+	/*
+	 * If PC and SP are not recorded, get the value of PC from the callchain
+	 * and set its mask. SP is not used when doing the unwinding but it
+	 * still needs to be set to prevent failures.
+	 */
+
+	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
+		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
+		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
+	}
+
+	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
+		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
+		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
+	}
+
+	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
+	sample->user_regs = old_regs;
+
+	if (ret || entries.length != 2)
+		return 0;
+
+	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
+}
diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
new file mode 100644
index 000000000000..32af9ce94398
--- /dev/null
+++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
+#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
+
+#include "event.h"
+#include "thread.h"
+
+u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
+
+#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3eddad009f78..a00fd6796b35 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -34,6 +34,7 @@
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
 #include "cgroup.h"
+#include "arm64-frame-pointer-unwind-support.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
 	return err;
 }
 
-static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
-		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
+static u64 get_leaf_frame_caller(struct perf_sample *sample,
+		struct thread *thread, int usr_idx)
 {
-	return 0;
+	if (machine__normalize_is(thread->maps->machine, "arm64"))
+		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
+	else
+		return 0;
 }
 
 static int thread__resolve_callchain_sample(struct thread *thread,
@@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
 }
 
 /*
- * Compares the raw arch string. N.B. see instead perf_env__arch() if a
- * normalized arch is needed.
+ * Compares the raw arch string. N.B. see instead perf_env__arch() or
+ * machine__normalize_is() if a normalized arch is needed.
  */
 bool machine__is(struct machine *machine, const char *arch)
 {
 	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
 }
 
+bool machine__normalize_is(struct machine *machine, const char *arch)
+{
+	return machine && !strcmp(perf_env__arch(machine->env), arch);
+}
+
 int machine__nr_cpus_avail(struct machine *machine)
 {
 	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index a143087eeb47..665535153411 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
 }
 
 bool machine__is(struct machine *machine, const char *arch);
+bool machine__normalize_is(struct machine *machine, const char *arch);
 int machine__nr_cpus_avail(struct machine *machine);
 
 struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-17 15:45   ` German Gomez
@ 2021-12-17 16:01     ` James Clark
  -1 siblings, 0 replies; 26+ messages in thread
From: James Clark @ 2021-12-17 16:01 UTC (permalink / raw)
  To: German Gomez, linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, John Garry, Will Deacon, Mathieu Poirier,
	Leo Yan, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel



On 17/12/2021 15:45, German Gomez wrote:
> From: Alexandre Truong <alexandre.truong@arm.com>
> 
> When unwinding using frame pointers on ARM64, the return address of the
> current function may not have been pushed into the stack when a function
> was interrupted, which makes perf show an incorrect call graph to the
> user.
> 
> Consider the following example program:
> 
>   void leaf() {
>       /* long computation */
>   }
> 
>   void parent() {
>       // (1)
>       leaf();
>       // (2)
>   }
> 
>   ... could be compiled into (using gcc -fno-inline -fno-omit-frame-pointer):
> 
>   leaf:
>       /* long computation */
>       nop
>       ret
>   parent:
>       // (1)
>       stp     x29, x30, [sp, -16]!
>       mov     x29, sp
>       bl      leaf
>       nop
>       ldp     x29, x30, [sp], 16
>       // (2)
>       ret
> 
> If the program is interrupted at (1), (2), or any point in "leaf:", the
> call graph will skip the callers of the current function. We can unwind
> using the dwarf info and check if the return addr is the same as the LR
> register, and inject the missing frame into the call graph.
> 
> Before this patch, the above example shows the following call-graph when
> recording using "--call-graph fp" mode in ARM64:
> 
>   # Children      Self  Command   Shared Object     Symbol
>   # ........  ........  ........  ................  ......................
>   #
>       99.86%    99.86%  program3  program3          [.] leaf
>   	    |
>   	    ---_start
>   	       __libc_start_main
>   	       main
>   	       leaf
> 
> As can be seen, the "parent" function is missing. This is specially
> problematic in "leaf" because for leaf functions the compiler may always
> omit pushing the return addr into the stack. After this patch, it shows
> the correct graph:
> 
>   # Children      Self  Command   Shared Object     Symbol
>   # ........  ........  ........  ................  ......................
>   #
>       99.86%    99.86%  program3  program3          [.] leaf
>   	    |
>   	    ---_start
>   	       __libc_start_main
>   	       main
>   	       parent
>   	       leaf
> 
> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/util/Build                         |  1 +
>  .../util/arm64-frame-pointer-unwind-support.c | 63 +++++++++++++++++++
>  .../util/arm64-frame-pointer-unwind-support.h | 10 +++
>  tools/perf/util/machine.c                     | 19 ++++--
>  tools/perf/util/machine.h                     |  1 +
>  5 files changed, 89 insertions(+), 5 deletions(-)
>  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.c
>  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.h
> 
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 2e5bfbb69960..03d4c647bd86 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -1,3 +1,4 @@
> +perf-y += arm64-frame-pointer-unwind-support.o
>  perf-y += annotate.o
>  perf-y += block-info.o
>  perf-y += block-range.o
> diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c
> new file mode 100644
> index 000000000000..4f5ecf51ed38
> --- /dev/null
> +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "arm64-frame-pointer-unwind-support.h"
> +#include "callchain.h"
> +#include "event.h"
> +#include "perf_regs.h" // SMPL_REG_MASK
> +#include "unwind.h"
> +
> +#define perf_event_arm_regs perf_event_arm64_regs
> +#include "../arch/arm64/include/uapi/asm/perf_regs.h"
> +#undef perf_event_arm_regs
> +
> +struct entries {
> +	u64 stack[2];
> +	size_t length;
> +};
> +
> +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
> +{
> +	return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs
> +		&& sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR);
> +}
> +
> +static int add_entry(struct unwind_entry *entry, void *arg)
> +{
> +	struct entries *entries = arg;
> +
> +	entries->stack[entries->length++] = entry->ip;
> +	return 0;
> +}
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
> +{
> +	int ret;
> +	struct entries entries = {};
> +	struct regs_dump old_regs = sample->user_regs;
> +
> +	if (!get_leaf_frame_caller_enabled(sample))
> +		return 0;
> +
> +	/*
> +	 * If PC and SP are not recorded, get the value of PC from the stack
> +	 * and set its mask. SP is not used when doing the unwinding but it
> +	 * still needs to be set to prevent failures.
> +	 */
> +
> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
> +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
> +	}
> +
> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
> +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
> +	}
> +
> +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
> +	sample->user_regs = old_regs;
> +
> +	if (ret || entries.length != 2)
> +		return ret;
> +
> +	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
> +}
> diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> new file mode 100644
> index 000000000000..32af9ce94398
> --- /dev/null
> +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +
> +#include "event.h"
> +#include "thread.h"
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
> +
> +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 3eddad009f78..a00fd6796b35 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -34,6 +34,7 @@
>  #include "bpf-event.h"
>  #include <internal/lib.h> // page_size
>  #include "cgroup.h"
> +#include "arm64-frame-pointer-unwind-support.h"
>  
>  #include <linux/ctype.h>
>  #include <symbol/kallsyms.h>
> @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>  	return err;
>  }
>  
> -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> +		struct thread *thread, int usr_idx)
>  {
> -	return 0;
> +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> +	else
> +		return 0;
>  }
>  
>  static int thread__resolve_callchain_sample(struct thread *thread,
> @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
>  }
>  
>  /*
> - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> - * normalized arch is needed.
> + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> + * machine__normalize_is() if a normalized arch is needed.
>   */
>  bool machine__is(struct machine *machine, const char *arch)
>  {
>  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
>  }
>  
> +bool machine__normalize_is(struct machine *machine, const char *arch)
> +{
> +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> +}
> +

I think this function name would be clearer as something like "machine__normalized_is" or
"machine__normalized_arch_is". The tense is slightly off because it's a test rather than a
verb.

With that change, for the whole set:

Reviewed-by: James Clark <james.clark@arm.com>


>  int machine__nr_cpus_avail(struct machine *machine)
>  {
>  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a143087eeb47..665535153411 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
>  }
>  
>  bool machine__is(struct machine *machine, const char *arch);
> +bool machine__normalize_is(struct machine *machine, const char *arch);
>  int machine__nr_cpus_avail(struct machine *machine);
>  
>  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
@ 2021-12-17 16:01     ` James Clark
  0 siblings, 0 replies; 26+ messages in thread
From: James Clark @ 2021-12-17 16:01 UTC (permalink / raw)
  To: German Gomez, linux-kernel, linux-perf-users, acme
  Cc: Alexandre Truong, John Garry, Will Deacon, Mathieu Poirier,
	Leo Yan, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel



On 17/12/2021 15:45, German Gomez wrote:
> From: Alexandre Truong <alexandre.truong@arm.com>
> 
> When unwinding using frame pointers on ARM64, the return address of the
> current function may not have been pushed into the stack when a function
> was interrupted, which makes perf show an incorrect call graph to the
> user.
> 
> Consider the following example program:
> 
>   void leaf() {
>       /* long computation */
>   }
> 
>   void parent() {
>       // (1)
>       leaf();
>       // (2)
>   }
> 
>   ... could be compiled into (using gcc -fno-inline -fno-omit-frame-pointer):
> 
>   leaf:
>       /* long computation */
>       nop
>       ret
>   parent:
>       // (1)
>       stp     x29, x30, [sp, -16]!
>       mov     x29, sp
>       bl      leaf
>       nop
>       ldp     x29, x30, [sp], 16
>       // (2)
>       ret
> 
> If the program is interrupted at (1), (2), or any point in "leaf:", the
> call graph will skip the callers of the current function. We can unwind
> using the dwarf info and check if the return addr is the same as the LR
> register, and inject the missing frame into the call graph.
> 
> Before this patch, the above example shows the following call-graph when
> recording using "--call-graph fp" mode in ARM64:
> 
>   # Children      Self  Command   Shared Object     Symbol
>   # ........  ........  ........  ................  ......................
>   #
>       99.86%    99.86%  program3  program3          [.] leaf
>   	    |
>   	    ---_start
>   	       __libc_start_main
>   	       main
>   	       leaf
> 
> As can be seen, the "parent" function is missing. This is specially
> problematic in "leaf" because for leaf functions the compiler may always
> omit pushing the return addr into the stack. After this patch, it shows
> the correct graph:
> 
>   # Children      Self  Command   Shared Object     Symbol
>   # ........  ........  ........  ................  ......................
>   #
>       99.86%    99.86%  program3  program3          [.] leaf
>   	    |
>   	    ---_start
>   	       __libc_start_main
>   	       main
>   	       parent
>   	       leaf
> 
> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/util/Build                         |  1 +
>  .../util/arm64-frame-pointer-unwind-support.c | 63 +++++++++++++++++++
>  .../util/arm64-frame-pointer-unwind-support.h | 10 +++
>  tools/perf/util/machine.c                     | 19 ++++--
>  tools/perf/util/machine.h                     |  1 +
>  5 files changed, 89 insertions(+), 5 deletions(-)
>  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.c
>  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.h
> 
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 2e5bfbb69960..03d4c647bd86 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -1,3 +1,4 @@
> +perf-y += arm64-frame-pointer-unwind-support.o
>  perf-y += annotate.o
>  perf-y += block-info.o
>  perf-y += block-range.o
> diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c
> new file mode 100644
> index 000000000000..4f5ecf51ed38
> --- /dev/null
> +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "arm64-frame-pointer-unwind-support.h"
> +#include "callchain.h"
> +#include "event.h"
> +#include "perf_regs.h" // SMPL_REG_MASK
> +#include "unwind.h"
> +
> +#define perf_event_arm_regs perf_event_arm64_regs
> +#include "../arch/arm64/include/uapi/asm/perf_regs.h"
> +#undef perf_event_arm_regs
> +
> +struct entries {
> +	u64 stack[2];
> +	size_t length;
> +};
> +
> +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
> +{
> +	return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs
> +		&& sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR);
> +}
> +
> +static int add_entry(struct unwind_entry *entry, void *arg)
> +{
> +	struct entries *entries = arg;
> +
> +	entries->stack[entries->length++] = entry->ip;
> +	return 0;
> +}
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
> +{
> +	int ret;
> +	struct entries entries = {};
> +	struct regs_dump old_regs = sample->user_regs;
> +
> +	if (!get_leaf_frame_caller_enabled(sample))
> +		return 0;
> +
> +	/*
> +	 * If PC and SP are not recorded, get the value of PC from the stack
> +	 * and set its mask. SP is not used when doing the unwinding but it
> +	 * still needs to be set to prevent failures.
> +	 */
> +
> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
> +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
> +	}
> +
> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
> +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
> +	}
> +
> +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
> +	sample->user_regs = old_regs;
> +
> +	if (ret || entries.length != 2)
> +		return ret;
> +
> +	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
> +}
> diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> new file mode 100644
> index 000000000000..32af9ce94398
> --- /dev/null
> +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +
> +#include "event.h"
> +#include "thread.h"
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
> +
> +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 3eddad009f78..a00fd6796b35 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -34,6 +34,7 @@
>  #include "bpf-event.h"
>  #include <internal/lib.h> // page_size
>  #include "cgroup.h"
> +#include "arm64-frame-pointer-unwind-support.h"
>  
>  #include <linux/ctype.h>
>  #include <symbol/kallsyms.h>
> @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>  	return err;
>  }
>  
> -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> +		struct thread *thread, int usr_idx)
>  {
> -	return 0;
> +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> +	else
> +		return 0;
>  }
>  
>  static int thread__resolve_callchain_sample(struct thread *thread,
> @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
>  }
>  
>  /*
> - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> - * normalized arch is needed.
> + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> + * machine__normalize_is() if a normalized arch is needed.
>   */
>  bool machine__is(struct machine *machine, const char *arch)
>  {
>  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
>  }
>  
> +bool machine__normalize_is(struct machine *machine, const char *arch)
> +{
> +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> +}
> +

I think this function name would be clearer as something like "machine__normalized_is" or
"machine__normalized_arch_is". The tense is slightly off because it's a test rather than a
verb.

With that change, for the whole set:

Reviewed-by: James Clark <james.clark@arm.com>
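Some background on the naming discussion. On arm64 Linux, the raw machine string (as reported by `uname -m`) is typically "aarch64". A raw comparison against "arm64" would therefore fail, while a normalized one succeeds. The sketch below is hypothetical. `normalize_arch_sketch()`, `raw_arch_is()` and `normalized_arch_is()` are illustrative names only. perf's own normalization, behind `perf_env__arch()`, handles many more architectures.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical, heavily reduced stand-in for arch normalization: map a raw
 * uname-style machine string to a normalized arch name. Only the arm64
 * spellings are modeled; anything else is passed through unchanged.
 */
static const char *normalize_arch_sketch(const char *raw)
{
	if (!strncmp(raw, "aarch64", 7) || !strncmp(raw, "arm64", 5))
		return "arm64";
	return raw;
}

/* Raw comparison, analogous in spirit to machine__is(). */
bool raw_arch_is(const char *raw, const char *arch)
{
	return raw && !strcmp(raw, arch);
}

/* Normalized comparison: the predicate the proposed rename describes. */
bool normalized_arch_is(const char *raw, const char *arch)
{
	return raw && !strcmp(normalize_arch_sketch(raw), arch);
}
```

Reading `normalized_arch_is(raw, "arm64")` as a question ("is the normalized arch arm64?") is the point of the suggested name.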


>  int machine__nr_cpus_avail(struct machine *machine)
>  {
>  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a143087eeb47..665535153411 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
>  }
>  
>  bool machine__is(struct machine *machine, const char *arch);
> +bool machine__normalize_is(struct machine *machine, const char *arch);
>  int machine__nr_cpus_avail(struct machine *machine);
>  
>  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> 
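The tail of `get_leaf_frame_caller_aarch64()` collects two unwound entries and picks the caller according to `callchain_param.order`. Below is a hypothetical, self-contained model of that step; the types and names are assumptions. It departs from the patch in two deliberate ways:

- The callback refuses a third entry instead of relying on the unwinder's max-stack limit.
- Every failure path returns 0. This assumes, as the stub this patch replaces suggests, that 0 means "no caller to inject". The patch itself returns `ret` in that branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's callchain order setting. */
enum chain_order { CHAIN_ORDER_CALLEE, CHAIN_ORDER_CALLER };

struct two_entries {
	uint64_t stack[2];
	size_t length;
};

/* Unwind-callback shape: record at most two IPs, never write past stack[1]. */
int record_entry(struct two_entries *e, uint64_t ip)
{
	if (e->length >= 2)
		return -1;
	e->stack[e->length++] = ip;
	return 0;
}

/*
 * Pick the leaf's caller from a two-entry unwind. With callee-first order the
 * entries are {leaf PC, caller}; with caller-first order they are reversed.
 * Any failure yields 0 rather than an error code that could be mistaken for
 * an address.
 */
uint64_t pick_leaf_caller(const struct two_entries *e, int unwind_ret,
			  enum chain_order order)
{
	if (unwind_ret || e->length != 2)
		return 0;
	return order == CHAIN_ORDER_CALLER ? e->stack[0] : e->stack[1];
}
```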

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-17 16:01     ` James Clark
@ 2021-12-18 11:35       ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-18 11:35 UTC (permalink / raw)
  To: James Clark
  Cc: German Gomez, linux-kernel, linux-perf-users, Alexandre Truong,
	John Garry, Will Deacon, Mathieu Poirier, Leo Yan, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel

Em Fri, Dec 17, 2021 at 04:01:38PM +0000, James Clark escreveu:
> 
> 
> On 17/12/2021 15:45, German Gomez wrote:
> > From: Alexandre Truong <alexandre.truong@arm.com>
> > 
> > When unwinding using frame pointers on ARM64, the return address of the
> > current function may not have been pushed onto the stack when a function
> > was interrupted, which makes perf show an incorrect call graph to the
> > user.
> > 
> > Consider the following example program:
> > 
> >   void leaf() {
> >       /* long computation */
> >   }
> > 
> >   void parent() {
> >       // (1)
> >       leaf();
> >       // (2)
> >   }
> > 
> >   ... could be compiled into (using gcc -fno-inline -fno-omit-frame-pointer):
> > 
> >   leaf:
> >       /* long computation */
> >       nop
> >       ret
> >   parent:
> >       // (1)
> >       stp     x29, x30, [sp, -16]!
> >       mov     x29, sp
> >       bl      leaf
> >       nop
> >       ldp     x29, x30, [sp], 16
> >       // (2)
> >       ret
> > 
> > If the program is interrupted at (1), (2), or any point in "leaf:", the
> > call graph will skip the callers of the current function. We can unwind
> > using the DWARF info and check if the return address is the same as the LR
> > register, and inject the missing frame into the call graph.
> > 
> > Before this patch, the above example shows the following call-graph when
> > recorded in "--call-graph fp" mode on ARM64:
> > 
> >   # Children      Self  Command   Shared Object     Symbol
> >   # ........  ........  ........  ................  ......................
> >   #
> >       99.86%    99.86%  program3  program3          [.] leaf
> >   	    |
> >   	    ---_start
> >   	       __libc_start_main
> >   	       main
> >   	       leaf
> > 
> > As can be seen, the "parent" function is missing. This is especially
> > problematic in "leaf" because, for leaf functions, the compiler may omit
> > pushing the return address onto the stack entirely. After this patch, it
> > shows the correct graph:
> > 
> >   # Children      Self  Command   Shared Object     Symbol
> >   # ........  ........  ........  ................  ......................
> >   #
> >       99.86%    99.86%  program3  program3          [.] leaf
> >   	    |
> >   	    ---_start
> >   	       __libc_start_main
> >   	       main
> >   	       parent
> >   	       leaf
> > 
> > Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
> > Signed-off-by: German Gomez <german.gomez@arm.com>
> > ---
> >  tools/perf/util/Build                         |  1 +
> >  .../util/arm64-frame-pointer-unwind-support.c | 63 +++++++++++++++++++
> >  .../util/arm64-frame-pointer-unwind-support.h | 10 +++
> >  tools/perf/util/machine.c                     | 19 ++++--
> >  tools/perf/util/machine.h                     |  1 +
> >  5 files changed, 89 insertions(+), 5 deletions(-)
> >  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.c
> >  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.h
> > 
> > diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> > index 2e5bfbb69960..03d4c647bd86 100644
> > --- a/tools/perf/util/Build
> > +++ b/tools/perf/util/Build
> > @@ -1,3 +1,4 @@
> > +perf-y += arm64-frame-pointer-unwind-support.o
> >  perf-y += annotate.o
> >  perf-y += block-info.o
> >  perf-y += block-range.o
> > diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c
> > new file mode 100644
> > index 000000000000..4f5ecf51ed38
> > --- /dev/null
> > +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c
> > @@ -0,0 +1,63 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "arm64-frame-pointer-unwind-support.h"
> > +#include "callchain.h"
> > +#include "event.h"
> > +#include "perf_regs.h" // SMPL_REG_MASK
> > +#include "unwind.h"
> > +
> > +#define perf_event_arm_regs perf_event_arm64_regs
> > +#include "../arch/arm64/include/uapi/asm/perf_regs.h"
> > +#undef perf_event_arm_regs
> > +
> > +struct entries {
> > +	u64 stack[2];
> > +	size_t length;
> > +};
> > +
> > +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
> > +{
> > +	return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs
> > +		&& sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR);
> > +}
> > +
> > +static int add_entry(struct unwind_entry *entry, void *arg)
> > +{
> > +	struct entries *entries = arg;
> > +
> > +	entries->stack[entries->length++] = entry->ip;
> > +	return 0;
> > +}
> > +
> > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
> > +{
> > +	int ret;
> > +	struct entries entries = {};
> > +	struct regs_dump old_regs = sample->user_regs;
> > +
> > +	if (!get_leaf_frame_caller_enabled(sample))
> > +		return 0;
> > +
> > +	/*
> > +	 * If PC and SP are not recorded, get the value of PC from the stack
> > +	 * and set its mask. SP is not used when doing the unwinding but it
> > +	 * still needs to be set to prevent failures.
> > +	 */
> > +
> > +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
> > +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
> > +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
> > +	}
> > +
> > +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
> > +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
> > +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
> > +	}
> > +
> > +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
> > +	sample->user_regs = old_regs;
> > +
> > +	if (ret || entries.length != 2)
> > +		return ret;
> > +
> > +	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
> > +}
> > diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> > new file mode 100644
> > index 000000000000..32af9ce94398
> > --- /dev/null
> > +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> > +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> > +
> > +#include "event.h"
> > +#include "thread.h"
> > +
> > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
> > +
> > +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 3eddad009f78..a00fd6796b35 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -34,6 +34,7 @@
> >  #include "bpf-event.h"
> >  #include <internal/lib.h> // page_size
> >  #include "cgroup.h"
> > +#include "arm64-frame-pointer-unwind-support.h"
> >  
> >  #include <linux/ctype.h>
> >  #include <symbol/kallsyms.h>
> > @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
> >  	return err;
> >  }
> >  
> > -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> > -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> > +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> > +		struct thread *thread, int usr_idx)
> >  {
> > -	return 0;
> > +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> > +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> > +	else
> > +		return 0;
> >  }
> >  
> >  static int thread__resolve_callchain_sample(struct thread *thread,
> > @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
> >  }
> >  
> >  /*
> > - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> > - * normalized arch is needed.
> > + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> > + * machine__normalize_is() if a normalized arch is needed.
> >   */
> >  bool machine__is(struct machine *machine, const char *arch)
> >  {
> >  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
> >  }
> >  
> > +bool machine__normalize_is(struct machine *machine, const char *arch)
> > +{
> > +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> > +}
> > +
> 
> I think this function name would be clearer as something like "machine__normalized_is" or
> "machine__normalized_arch_is". The tense is slightly off because it's a test rather than a
> verb.

Agreed, it's a question, not a command.

- Arnaldo
 
> With that change, for the whole set:
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> 
> 
> >  int machine__nr_cpus_avail(struct machine *machine)
> >  {
> >  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> > index a143087eeb47..665535153411 100644
> > --- a/tools/perf/util/machine.h
> > +++ b/tools/perf/util/machine.h
> > @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
> >  }
> >  
> >  bool machine__is(struct machine *machine, const char *arch);
> > +bool machine__normalize_is(struct machine *machine, const char *arch);
> >  int machine__nr_cpus_avail(struct machine *machine);
> >  
> >  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> > 

-- 

- Arnaldo


* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-17 15:45   ` German Gomez
@ 2021-12-21  9:32     ` Jiri Olsa
  -1 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2021-12-21  9:32 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, acme, Alexandre Truong,
	John Garry, Will Deacon, Mathieu Poirier, Leo Yan, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, linux-arm-kernel

On Fri, Dec 17, 2021 at 03:45:20PM +0000, German Gomez wrote:

SNIP

> +}
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
> +{
> +	int ret;
> +	struct entries entries = {};
> +	struct regs_dump old_regs = sample->user_regs;
> +
> +	if (!get_leaf_frame_caller_enabled(sample))
> +		return 0;
> +
> +	/*
> +	 * If PC and SP are not recorded, get the value of PC from the stack
> +	 * and set its mask. SP is not used when doing the unwinding but it
> +	 * still needs to be set to prevent failures.
> +	 */
> +
> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
> +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
> +	}
> +
> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
> +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
> +	}
> +
> +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);

just curious, did you try this with both unwinders libunwind/libdw?

any chance you could add an arm-specific test for this?

otherwise it looks good to me

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka
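The arm-specific test asked about above could use a workload that follows the commit message's leaf()/parent() example. This is a hypothetical sketch. The file name and commands in the comment are assumptions, and it leaves out any perf test harness glue and the main() that would call parent() in a loop.

```c
/*
 * Hypothetical workload modeled on the leaf()/parent() example in the
 * commit message. Assumed build and record steps, using the flags from
 * the commit message:
 *   gcc -fno-inline -fno-omit-frame-pointer -o leaf_test leaf_test.c
 *   perf record --call-graph fp ./leaf_test
 *   perf report --stdio
 */
__attribute__((noinline)) unsigned long leaf(unsigned long n)
{
	/* volatile keeps the compiler from folding the loop into a closed form */
	volatile unsigned long acc = 0;

	for (unsigned long i = 0; i < n; i++)
		acc += i * i;
	return acc;
}

__attribute__((noinline)) unsigned long parent(unsigned long n)
{
	/* Using the result after the call keeps this from becoming a tail call. */
	return leaf(n) + 1;
}
```

The test would then check that the `perf report` callchain for `leaf` includes `parent`, as in the after-patch output in the commit message.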


> +	sample->user_regs = old_regs;
> +
> +	if (ret || entries.length != 2)
> +		return ret;
> +
> +	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
> +}
> diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> new file mode 100644
> index 000000000000..32af9ce94398
> --- /dev/null
> +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> +
> +#include "event.h"
> +#include "thread.h"
> +
> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
> +
> +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 3eddad009f78..a00fd6796b35 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -34,6 +34,7 @@
>  #include "bpf-event.h"
>  #include <internal/lib.h> // page_size
>  #include "cgroup.h"
> +#include "arm64-frame-pointer-unwind-support.h"
>  
>  #include <linux/ctype.h>
>  #include <symbol/kallsyms.h>
> @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>  	return err;
>  }
>  
> -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> +		struct thread *thread, int usr_idx)
>  {
> -	return 0;
> +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> +	else
> +		return 0;
>  }
>  
>  static int thread__resolve_callchain_sample(struct thread *thread,
> @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
>  }
>  
>  /*
> - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> - * normalized arch is needed.
> + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> + * machine__normalize_is() if a normalized arch is needed.
>   */
>  bool machine__is(struct machine *machine, const char *arch)
>  {
>  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
>  }
>  
> +bool machine__normalize_is(struct machine *machine, const char *arch)
> +{
> +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> +}
> +
>  int machine__nr_cpus_avail(struct machine *machine)
>  {
>  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a143087eeb47..665535153411 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
>  }
>  
>  bool machine__is(struct machine *machine, const char *arch);
> +bool machine__normalize_is(struct machine *machine, const char *arch);
>  int machine__nr_cpus_avail(struct machine *machine);
>  
>  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> -- 
> 2.25.1
> 
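The PC/SP fallback in `get_leaf_frame_caller_aarch64()` can be pictured with a reduced register dump. This is a hypothetical sketch: the struct layout, register indices and names are assumptions and do not reflect perf's `struct regs_dump`. As in the patch, PC is taken from `ips[usr_idx + 1]` of the frame-pointer callchain. SP is set to 0 only so that the unwinder's register lookup does not fail.

```c
#include <stdint.h>

/* Hypothetical register indices for a reduced dump (not perf's layout). */
enum { SK_REG_SP = 0, SK_REG_PC = 1, SK_REG_LR = 2, SK_NR_REGS = 3 };

#define SK_REG_BIT(r) (1ULL << (r))

struct regs_sketch {
	uint64_t mask;                    /* registers actually sampled */
	uint64_t cache_mask;              /* registers filled in afterwards */
	uint64_t cache_regs[SK_NR_REGS];
};

/*
 * Fill in PC and SP when they were not sampled: PC comes from the callchain
 * entry after usr_idx, and SP is a placeholder 0 that the unwind does not use.
 */
void fill_missing_pc_sp(struct regs_sketch *r, const uint64_t *ips, int usr_idx)
{
	if (!(r->mask & SK_REG_BIT(SK_REG_PC))) {
		r->cache_mask |= SK_REG_BIT(SK_REG_PC);
		r->cache_regs[SK_REG_PC] = ips[usr_idx + 1];
	}
	if (!(r->mask & SK_REG_BIT(SK_REG_SP))) {
		r->cache_mask |= SK_REG_BIT(SK_REG_SP);
		r->cache_regs[SK_REG_SP] = 0;
	}
}
```

As in the patch, a caller would save a copy of the register dump before filling it in and restore it after the unwind, leaving the sample unchanged.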



>  #include <symbol/kallsyms.h>
> @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
>  	return err;
>  }
>  
> -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> +		struct thread *thread, int usr_idx)
>  {
> -	return 0;
> +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> +	else
> +		return 0;
>  }
>  
>  static int thread__resolve_callchain_sample(struct thread *thread,
> @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
>  }
>  
>  /*
> - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> - * normalized arch is needed.
> + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> + * machine__normalize_is() if a normalized arch is needed.
>   */
>  bool machine__is(struct machine *machine, const char *arch)
>  {
>  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
>  }
>  
> +bool machine__normalize_is(struct machine *machine, const char *arch)
> +{
> +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> +}
> +
>  int machine__nr_cpus_avail(struct machine *machine)
>  {
>  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a143087eeb47..665535153411 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
>  }
>  
>  bool machine__is(struct machine *machine, const char *arch);
> +bool machine__normalize_is(struct machine *machine, const char *arch);
>  int machine__nr_cpus_avail(struct machine *machine);
>  
>  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> -- 
> 2.25.1
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-21  9:32     ` Jiri Olsa
@ 2021-12-21 14:17       ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-12-21 14:17 UTC
  To: Jiri Olsa
  Cc: German Gomez, linux-kernel, linux-perf-users, Alexandre Truong,
	John Garry, Will Deacon, Mathieu Poirier, Leo Yan, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, linux-arm-kernel

On Tue, Dec 21, 2021 at 10:32:50AM +0100, Jiri Olsa wrote:
> On Fri, Dec 17, 2021 at 03:45:20PM +0000, German Gomez wrote:
> 
> SNIP
> 
> > +}
> > +
> > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
> > +{
> > +	int ret;
> > +	struct entries entries = {};
> > +	struct regs_dump old_regs = sample->user_regs;
> > +
> > +	if (!get_leaf_frame_caller_enabled(sample))
> > +		return 0;
> > +
> > +	/*
> > +	 * If PC and SP are not recorded, get the value of PC from the stack
> > +	 * and set its mask. SP is not used when doing the unwinding but it
> > +	 * still needs to be set to prevent failures.
> > +	 */
> > +
> > +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
> > +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
> > +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
> > +	}
> > +
> > +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
> > +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
> > +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
> > +	}
> > +
> > +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
> 
> just curious, did you try this with both unwinders libunwind/libdw?
> 
> any chance you could add arm specific test for this?
> 
> otherwise it looks good to me

Whole patchkit?
 
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> 
> thanks,
> jirka
> 
> 
> > +	sample->user_regs = old_regs;
> > +
> > +	if (ret || entries.length != 2)
> > +		return ret;
> > +
> > +	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
> > +}
> > diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> > new file mode 100644
> > index 000000000000..32af9ce94398
> > --- /dev/null
> > +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> > +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> > +
> > +#include "event.h"
> > +#include "thread.h"
> > +
> > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
> > +
> > +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 3eddad009f78..a00fd6796b35 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -34,6 +34,7 @@
> >  #include "bpf-event.h"
> >  #include <internal/lib.h> // page_size
> >  #include "cgroup.h"
> > +#include "arm64-frame-pointer-unwind-support.h"
> >  
> >  #include <linux/ctype.h>
> >  #include <symbol/kallsyms.h>
> > @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
> >  	return err;
> >  }
> >  
> > -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> > -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> > +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> > +		struct thread *thread, int usr_idx)
> >  {
> > -	return 0;
> > +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> > +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> > +	else
> > +		return 0;
> >  }
> >  
> >  static int thread__resolve_callchain_sample(struct thread *thread,
> > @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
> >  }
> >  
> >  /*
> > - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> > - * normalized arch is needed.
> > + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> > + * machine__normalize_is() if a normalized arch is needed.
> >   */
> >  bool machine__is(struct machine *machine, const char *arch)
> >  {
> >  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
> >  }
> >  
> > +bool machine__normalize_is(struct machine *machine, const char *arch)
> > +{
> > +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> > +}
> > +
> >  int machine__nr_cpus_avail(struct machine *machine)
> >  {
> >  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> > index a143087eeb47..665535153411 100644
> > --- a/tools/perf/util/machine.h
> > +++ b/tools/perf/util/machine.h
> > @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
> >  }
> >  
> >  bool machine__is(struct machine *machine, const char *arch);
> > +bool machine__normalize_is(struct machine *machine, const char *arch);
> >  int machine__nr_cpus_avail(struct machine *machine);
> >  
> >  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> > -- 
> > 2.25.1
> > 

-- 

- Arnaldo

* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-21 14:17       ` Arnaldo Carvalho de Melo
@ 2021-12-21 15:06         ` Jiri Olsa
  -1 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2021-12-21 15:06 UTC
  To: Arnaldo Carvalho de Melo
  Cc: German Gomez, linux-kernel, linux-perf-users, Alexandre Truong,
	John Garry, Will Deacon, Mathieu Poirier, Leo Yan, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, linux-arm-kernel

On Tue, Dec 21, 2021 at 11:17:03AM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Dec 21, 2021 at 10:32:50AM +0100, Jiri Olsa wrote:
> > On Fri, Dec 17, 2021 at 03:45:20PM +0000, German Gomez wrote:
> > 
> > SNIP
> > 
> > > +}
> > > +
> > > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
> > > +{
> > > +	int ret;
> > > +	struct entries entries = {};
> > > +	struct regs_dump old_regs = sample->user_regs;
> > > +
> > > +	if (!get_leaf_frame_caller_enabled(sample))
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * If PC and SP are not recorded, get the value of PC from the stack
> > > +	 * and set its mask. SP is not used when doing the unwinding but it
> > > +	 * still needs to be set to prevent failures.
> > > +	 */
> > > +
> > > +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
> > > +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
> > > +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
> > > +	}
> > > +
> > > +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
> > > +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
> > > +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
> > > +	}
> > > +
> > > +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
> > 
> > just curious, did you try this with both unwinders libunwind/libdw?
> > 
> > any chance you could add arm specific test for this?
> > 
> > otherwise it looks good to me
> 
> Whole patchkit?

yes, it's for the patchset

jirka

>  
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > 
> > thanks,
> > jirka
> > 
> > 
> > > +	sample->user_regs = old_regs;
> > > +
> > > +	if (ret || entries.length != 2)
> > > +		return ret;
> > > +
> > > +	return callchain_param.order == ORDER_CALLER ? entries.stack[0] : entries.stack[1];
> > > +}
> > > diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.h b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> > > new file mode 100644
> > > index 000000000000..32af9ce94398
> > > --- /dev/null
> > > +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.h
> > > @@ -0,0 +1,10 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> > > +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H
> > > +
> > > +#include "event.h"
> > > +#include "thread.h"
> > > +
> > > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int user_idx);
> > > +
> > > +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index 3eddad009f78..a00fd6796b35 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -34,6 +34,7 @@
> > >  #include "bpf-event.h"
> > >  #include <internal/lib.h> // page_size
> > >  #include "cgroup.h"
> > > +#include "arm64-frame-pointer-unwind-support.h"
> > >  
> > >  #include <linux/ctype.h>
> > >  #include <symbol/kallsyms.h>
> > > @@ -2710,10 +2711,13 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
> > >  	return err;
> > >  }
> > >  
> > > -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused,
> > > -		struct thread *thread __maybe_unused, int usr_idx __maybe_unused)
> > > +static u64 get_leaf_frame_caller(struct perf_sample *sample,
> > > +		struct thread *thread, int usr_idx)
> > >  {
> > > -	return 0;
> > > +	if (machine__normalize_is(thread->maps->machine, "arm64"))
> > > +		return get_leaf_frame_caller_aarch64(sample, thread, usr_idx);
> > > +	else
> > > +		return 0;
> > >  }
> > >  
> > >  static int thread__resolve_callchain_sample(struct thread *thread,
> > > @@ -3114,14 +3118,19 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
> > >  }
> > >  
> > >  /*
> > > - * Compares the raw arch string. N.B. see instead perf_env__arch() if a
> > > - * normalized arch is needed.
> > > + * Compares the raw arch string. N.B. see instead perf_env__arch() or
> > > + * machine__normalize_is() if a normalized arch is needed.
> > >   */
> > >  bool machine__is(struct machine *machine, const char *arch)
> > >  {
> > >  	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
> > >  }
> > >  
> > > +bool machine__normalize_is(struct machine *machine, const char *arch)
> > > +{
> > > +	return machine && !strcmp(perf_env__arch(machine->env), arch);
> > > +}
> > > +
> > >  int machine__nr_cpus_avail(struct machine *machine)
> > >  {
> > >  	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> > > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> > > index a143087eeb47..665535153411 100644
> > > --- a/tools/perf/util/machine.h
> > > +++ b/tools/perf/util/machine.h
> > > @@ -208,6 +208,7 @@ static inline bool machine__is_host(struct machine *machine)
> > >  }
> > >  
> > >  bool machine__is(struct machine *machine, const char *arch);
> > > +bool machine__normalize_is(struct machine *machine, const char *arch);
> > >  int machine__nr_cpus_avail(struct machine *machine);
> > >  
> > >  struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
> > > -- 
> > > 2.25.1
> > > 
> 
> -- 
> 
> - Arnaldo
> 


* Re: [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp"
  2021-12-21  9:32     ` Jiri Olsa
@ 2022-01-10 10:48       ` German Gomez
  -1 siblings, 0 replies; 26+ messages in thread
From: German Gomez @ 2022-01-10 10:48 UTC
  To: Jiri Olsa
  Cc: linux-kernel, linux-perf-users, acme, Alexandre Truong,
	John Garry, Will Deacon, Mathieu Poirier, Leo Yan, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, linux-arm-kernel

Hi Jiri,

On 21/12/2021 09:32, Jiri Olsa wrote:
> On Fri, Dec 17, 2021 at 03:45:20PM +0000, German Gomez wrote:
>
> SNIP
>
>> +}
>> +
>> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
>> +{
>> +	int ret;
>> +	struct entries entries = {};
>> +	struct regs_dump old_regs = sample->user_regs;
>> +
>> +	if (!get_leaf_frame_caller_enabled(sample))
>> +		return 0;
>> +
>> +	/*
>> +	 * If PC and SP are not recorded, get the value of PC from the stack
>> +	 * and set its mask. SP is not used when doing the unwinding but it
>> +	 * still needs to be set to prevent failures.
>> +	 */
>> +
>> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_PC))) {
>> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_PC);
>> +		sample->user_regs.cache_regs[PERF_REG_ARM64_PC] = sample->callchain->ips[usr_idx+1];
>> +	}
>> +
>> +	if (!(sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_SP))) {
>> +		sample->user_regs.cache_mask |= SMPL_REG_MASK(PERF_REG_ARM64_SP);
>> +		sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0;
>> +	}
>> +
>> +	ret = unwind__get_entries(add_entry, &entries, thread, sample, 2);
> just curious, did you try this with both unwinders libunwind/libdw?

Yes, I did.

This is the program I was using:

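  int a = 0;

  /* Hot loop: nearly all samples should land here. */
  void leaf(void) {
    for (int i = 0; i < 10000000; i++)
      a *= a;
  }

  /* The caller of leaf(), i.e. the frame that "--call-graph=fp"
   * call graphs were missing before this series. */
  void parent(void) {
    leaf();
  }

  int main(void) {
    parent();
    return 0;
  }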

  ... compiled with "gcc -O0 -fno-omit-frame-pointer -fno-inline program.c"

>
> any chance you could add arm specific test for this?

I don't see a reason not to. I'll make a note for a separate patch.

>
> otherwise it looks good to me
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>

Thanks for the review,
German

end of thread

Thread overview: 26+ messages
2021-12-17 15:45 [PATCH v5 0/6] perf tools/arm64: Fix missing leaf-function callers in ARM64 when using "--call-graph=fp" German Gomez
2021-12-17 15:45 ` [PATCH v5 1/6] perf tools: record ARM64 LR register automatically German Gomez
2021-12-17 15:45 ` [PATCH v5 2/6] perf tools: add a mechanism to inject stack frames German Gomez
2021-12-17 15:45 ` [PATCH v5 3/6] perf tools: Refactor script__setup_sample_type() German Gomez
2021-12-17 15:45 ` [PATCH v5 4/6] perf tools: enable dwarf_callchain_users on arm64 German Gomez
2021-12-17 15:45 ` [PATCH v5 5/6] perf tools: Refactor SMPL_REG macro in perf_regs.h German Gomez
2021-12-17 15:45 ` [PATCH v5 6/6] perf arm64: inject missing frames if perf-record used "--call-graph=fp" German Gomez
2021-12-17 16:01   ` James Clark
2021-12-18 11:35     ` Arnaldo Carvalho de Melo
2021-12-21  9:32   ` Jiri Olsa
2021-12-21 14:17     ` Arnaldo Carvalho de Melo
2021-12-21 15:06       ` Jiri Olsa
2022-01-10 10:48     ` German Gomez
