linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain
@ 2020-02-03  2:07 Leo Yan
  2020-02-03  2:07 ` [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling Leo Yan
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Leo Yan @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

This patch series adds support for thread stack and callchain; this is a
sequential version from v3 [1] but reorgnized the patches, some changes
have been refactored into the instruction sample fix patch set [2], and
this patch set is only to focus on thread stack and callchain support.

Patch 01 is to refactor the instruction size calculation; this patch is
used by patch 02.

Patch 02 is to add thread stack support, after applying this patch the
option '-F,+callindent' can be used by perf script tool; patch 03 is to
add branch filter thus the Perf tool can display branch samples only
for function calls and returns after enable the call indentation or call
chain related options.

Patch 04 is to synthesize call chain for the instruction samples.

Patch 05 allows the instruction sample can be handled synchronously with
the thread stack, thus it fixes an error for the callchain generation.

This patch set has been tested on Juno-r2 after applied on perf/core
branch with latest commit 85fc95d75970 ("perf maps: Add missing unlock
to maps__insert() error case"), and this patch set is dependent on the
instruction sample fix patch set [2].


Test for option '-F,+callindent':

Before:

  # perf script -F,+callindent
            main   840          1          branches: main                                 ffffa2319d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
            main   840          1          branches:                                      aaaab94cb7d0 main+0xc (/root/coresight_test/main)
            main   840          1          branches:                                      aaaab94cb808 main+0x44 (/root/coresight_test/main)
            main   840          1          branches: lib_loop_test@plt                    aaaab94cb7dc main+0x18 (/root/coresight_test/main)
            main   840          1          branches: lib_loop_test@plt                    aaaab94cb67c lib_loop_test@plt+0xc (/root/coresight_test/main)
            main   840          1          branches: _init                                aaaab94cb650 _init+0x30 (/root/coresight_test/main)
            main   840          1          branches: _dl_fixup                            ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches: _dl_lookup_symbol_x                  ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  [...]

After:

  # perf script -F,+callindent
            main   840          1          branches:             main                                                     ffffa2319d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
            main   840          1          branches:                 lib_loop_test@plt                                    aaaab94cb7dc main+0x18 (/root/coresight_test/main)
            main   840          1          branches:                     _dl_fixup                                        ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                         _dl_lookup_symbol_x                          ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                             do_lookup_x                              ffffa249c4a4 _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                 check_match                          ffffa249bbf8 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa249b890 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                 printf@plt                                           aaaab94cb7f0 main+0x2c (/root/coresight_test/main)
            main   840          1          branches:                     _dl_fixup                                        ffffa24a5b44 _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                         _dl_lookup_symbol_x                          ffffa24a0070 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                             do_lookup_x                              ffffa249c4a4 _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                 _dl_name_match_p                     ffffa249baf8 do_lookup_x+0x138 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa24a17e8 _dl_name_match_p+0x18 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa24a180c _dl_name_match_p+0x3c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                 _dl_name_match_p                     ffffa249baf8 do_lookup_x+0x138 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa24a17e8 _dl_name_match_p+0x18 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa24a180c _dl_name_match_p+0x3c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                 check_match                          ffffa249bbf8 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa249b890 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main   840          1          branches:                                     strcmp                           ffffa249b968 check_match+0x148 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  [...]


Test for option '--itrace=g':

Before:

  # perf script --itrace=g16l64i100
            main   840        100      instructions:  ffff8000102642c0 event_sched_in.isra.119+0x140 ([kernel.kallsyms])
            main   840        100      instructions:  ffff800010264794 flexible_sched_in+0xe4 ([kernel.kallsyms])
            main   840        100      instructions:  ffff800010263024 perf_pmu_disable+0x4 ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001026b0e0 perf_swevent_add+0xb8 ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001025b504 calc_timer_values+0x34 ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001019bd24 clocks_calc_mult_shift+0x3c ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001026556c perf_event_update_userpage+0xec ([kernel.kallsyms])
            main   840        100      instructions:  ffff80001025c5e4 visit_groups_merge+0x194 ([kernel.kallsyms])

  [...]

After:

  # perf script --itrace=g16l64i100

  main   840        100      instructions: 
  	ffff800010264794 flexible_sched_in+0xe4 ([kernel.kallsyms])
  	ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])

  main   840        100      instructions: 
  	ffff800010263024 perf_pmu_disable+0x4 ([kernel.kallsyms])
  	ffff8000102641f0 event_sched_in.isra.119+0x70 ([kernel.kallsyms])
  	ffff8000102643d8 group_sched_in+0x60 ([kernel.kallsyms])
  	ffff8000102647b0 flexible_sched_in+0x100 ([kernel.kallsyms])
  	ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])

  main   840        100      instructions: 
  	ffff80001026b0e0 perf_swevent_add+0xb8 ([kernel.kallsyms])
  	ffff80001026423c event_sched_in.isra.119+0xbc ([kernel.kallsyms])
  	ffff8000102643d8 group_sched_in+0x60 ([kernel.kallsyms])
  	ffff8000102647b0 flexible_sched_in+0x100 ([kernel.kallsyms])
  	ffff80001025c57c visit_groups_merge+0x12c ([kernel.kallsyms])

  [...]


Changes from v3:
* Splitted out separate patch set for instruction samples fixing.
* Rebased on latest perf/core branch.

Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
  (Suzuki).
* Refined commit logs.

Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one is for support thread stack and
  another is for callchain support (Mathieu).
* Added a new patch to support branch filter.

[1] https://lkml.org/lkml/2019/10/5/51
[2] https://lkml.org/lkml/2020/2/2/228


Leo Yan (5):
  perf cs-etm: Refactor instruction size handling
  perf cs-etm: Support thread stack
  perf cs-etm: Support branch filter
  perf cs-etm: Support callchain for instruction sample
  perf cs-etm: Synchronize instruction sample with the thread stack

 tools/perf/util/cs-etm.c | 145 ++++++++++++++++++++++++++++++++-------
 1 file changed, 121 insertions(+), 24 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling
  2020-02-03  2:07 [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain Leo Yan
@ 2020-02-03  2:07 ` Leo Yan
  2020-02-06 12:36   ` Mike Leach
  2020-02-03  2:07 ` [PATCH v4 2/5] perf cs-etm: Support thread stack Leo Yan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Leo Yan @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

cs-etm.c has several functions which need to know instruction size
based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn()
two functions both calculate the instruction size separately with its
duplicated code.  Furthermore, adding new features later which might
require to calculate instruction size as well.

For this reason, this patch refactors the code to introduce a new
function cs_etm__instr_size(), this function is central place to
calculate the instruction size based on ISA type and instruction
address.

For a neat implementation, cs_etm__instr_addr() will always execute the
loop without checking ISA type, this allows cs_etm__instr_size() and
cs_etm__instr_addr() have no any duplicate code with each other and both
functions are independent and can be changed separately without breaking
anything.  As a side effect, cs_etm__instr_addr() will do a few more
iterations for A32/A64 instructions, this would be fine if consider perf
is a tool running in the user space.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 48 +++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 720108bd8dba..cb6fcc2acca0 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -918,6 +918,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
 	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
+static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
+				     u8 trace_chan_id,
+				     enum cs_etm_isa isa,
+				     u64 addr)
+{
+	int insn_len;
+
+	/*
+	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
+	 * cs_etm__t32_instr_size().
+	 */
+	if (isa == CS_ETM_ISA_T32)
+		insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
+	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
+	else
+		insn_len = 4;
+
+	return insn_len;
+}
+
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 {
 	/* Returns 0 for the CS_ETM_DISCONTINUITY packet */
@@ -942,19 +962,15 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	if (packet->isa == CS_ETM_ISA_T32) {
-		u64 addr = packet->start_addr;
+	u64 addr = packet->start_addr;
 
-		while (offset) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
-			offset--;
-		}
-		return addr;
+	while (offset) {
+		addr += cs_etm__instr_size(etmq, trace_chan_id,
+					   packet->isa, addr);
+		offset--;
 	}
 
-	/* Assume a 4 byte instruction size (A32/A64) */
-	return packet->start_addr + offset * 4;
+	return addr;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
@@ -1094,16 +1110,8 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 		return;
 	}
 
-	/*
-	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
-	 * cs_etm__t32_instr_size().
-	 */
-	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
-							  sample->ip);
-	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
-	else
-		sample->insn_len = 4;
+	sample->insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      packet->isa, sample->ip);
 
 	cs_etm__mem_access(etmq, trace_chan_id, sample->ip,
 			   sample->insn_len, (void *)sample->insn);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/5] perf cs-etm: Support thread stack
  2020-02-03  2:07 [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain Leo Yan
  2020-02-03  2:07 ` [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling Leo Yan
@ 2020-02-03  2:07 ` Leo Yan
  2020-02-03  2:07 ` [PATCH v4 3/5] perf cs-etm: Support branch filter Leo Yan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Leo Yan @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

Since Arm CoreSight doesn't support thread stack, the decoding cannot
display symbols with indented spaces to reflect the stack depth.

This patch adds support thread stack for Arm CoreSight, this allows
'perf script' to display properly for option '-F,+callindent'.

Before:

  # perf script -F,+callindent
            main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index cb6fcc2acca0..4d289ecf49e2 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 			   sample->insn_len, (void *)sample->insn);
 }
 
+static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
+				    struct cs_etm_traceid_queue *tidq)
+{
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	u8 trace_chan_id = tidq->trace_chan_id;
+	int insn_len;
+	u64 from_ip, to_ip;
+
+	if (etm->synth_opts.thread_stack) {
+		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
+		to_ip = cs_etm__first_executed_instr(tidq->packet);
+
+		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      tidq->prev_packet->isa, from_ip);
+
+		/*
+		 * Create thread stacks by keeping track of calls and returns;
+		 * any call pushes thread stack, return pops the stack, and
+		 * flush stack when the trace is discontinuous.
+		 */
+		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
+				    tidq->prev_packet->flags,
+				    from_ip, to_ip, insn_len,
+				    etmq->buffer->buffer_nr + 1);
+	} else {
+		/*
+		 * The thread stack can be output via thread_stack__process();
+		 * thus the detailed information about paired calls and returns
+		 * will be facilitated by Python script for the db-export.
+		 *
+		 * Need to set trace buffer number and flush thread stack if the
+		 * trace buffer number has been alternate.
+		 */
+		thread_stack__set_trace_nr(tidq->thread,
+					   tidq->prev_packet->cpu,
+					   etmq->buffer->buffer_nr + 1);
+	}
+}
+
 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 					    struct cs_etm_traceid_queue *tidq,
 					    u64 addr, u64 period)
@@ -1471,6 +1510,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		tidq->period_instructions = instrs_over;
 	}
 
+	if (tidq->prev_packet->last_instr_taken_branch)
+		cs_etm__add_stack_event(etmq, tidq);
+
 	if (etm->sample_branches) {
 		bool generate_sample = false;
 
@@ -2687,6 +2729,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
 		etm->synth_opts.callchain = false;
+		etm->synth_opts.thread_stack =
+				session->itrace_synth_opts->thread_stack;
 	}
 
 	err = cs_etm__synth_events(etm, session);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/5] perf cs-etm: Support branch filter
  2020-02-03  2:07 [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain Leo Yan
  2020-02-03  2:07 ` [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling Leo Yan
  2020-02-03  2:07 ` [PATCH v4 2/5] perf cs-etm: Support thread stack Leo Yan
@ 2020-02-03  2:07 ` Leo Yan
  2020-02-03  2:07 ` [PATCH v4 4/5] perf cs-etm: Support callchain for instruction sample Leo Yan
  2020-02-03  2:07 ` [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
  4 siblings, 0 replies; 11+ messages in thread
From: Leo Yan @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

If user specifies option '-F,+callindent' or call chain related options,
it means users only care about function calls and returns; for these
cases, it's pointless to generate samples for the branches within
function.  But unlike other hardware trace handling (e.g. Intel's pt or
bts), Arm CoreSight doesn't filter branch types for these options and
generate samples for all branches, this causes Perf to output many
spurious blanks if the branch is not a function call or return.

To only output pairs of calls and returns, this patch introduces branch
filter and the filter is set according to synthetic options.  Finally,
Perf can output only for calls and returns and avoid to output other
unnecessary blanks.

Before:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     coresight_test1@plt                              aaaaba8d367c coresight_test1@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 check_match                          ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 _dl_name_match_p                     ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4d289ecf49e2..617facef24cc 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -56,6 +56,7 @@ struct cs_etm_auxtrace {
 
 	int num_cpu;
 	u32 auxtrace_type;
+	u32 branches_filter;
 	u64 branches_sample_type;
 	u64 branches_id;
 	u64 instructions_sample_type;
@@ -1218,6 +1219,10 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	} dummy_bs;
 	u64 ip;
 
+	if (etm->branches_filter &&
+	    !(etm->branches_filter & tidq->prev_packet->flags))
+		return 0;
+
 	ip = cs_etm__last_executed_instr(tidq->prev_packet);
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
@@ -2733,6 +2738,13 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 				session->itrace_synth_opts->thread_stack;
 	}
 
+	if (etm->synth_opts.calls)
+		etm->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+					PERF_IP_FLAG_TRACE_END;
+	if (etm->synth_opts.returns)
+		etm->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN;
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/5] perf cs-etm: Support callchain for instruction sample
  2020-02-03  2:07 [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (2 preceding siblings ...)
  2020-02-03  2:07 ` [PATCH v4 3/5] perf cs-etm: Support branch filter Leo Yan
@ 2020-02-03  2:07 ` Leo Yan
  2020-02-03  2:07 ` [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
  4 siblings, 0 replies; 11+ messages in thread
From: Leo Yan @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

Now CoreSight has supported the thread stack; based on the thread stack
we can synthesize call chain for the instruction sample; the call chain
can be injected by option '--itrace=g'.

Before:

  # perf script --itrace=g16l64i100
            main  1579        100      instructions:  ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  [...]

After:

  # perf script --itrace=g16l64i100

  main  1579        100      instructions:
          ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
          ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 617facef24cc..8f805657658d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -17,6 +17,7 @@
 #include <stdlib.h>
 
 #include "auxtrace.h"
+#include "callchain.h"
 #include "color.h"
 #include "cs-etm.h"
 #include "cs-etm-decoder/cs-etm-decoder.h"
@@ -74,6 +75,7 @@ struct cs_etm_traceid_queue {
 	size_t last_branch_pos;
 	union perf_event *event_buf;
 	struct thread *thread;
+	struct ip_callchain *chain;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
@@ -251,6 +253,16 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	if (!tidq->prev_packet)
 		goto out_free;
 
+	if (etm->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (etm->synth_opts.callchain_sz + 1) * sizeof(u64);
+		tidq->chain = zalloc(sz);
+		if (!tidq->chain)
+			goto out_free;
+	}
+
 	if (etm->synth_opts.last_branch) {
 		size_t sz = sizeof(struct branch_stack);
 
@@ -273,6 +285,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 out_free:
 	zfree(&tidq->last_branch_rb);
 	zfree(&tidq->last_branch);
+	zfree(&tidq->chain);
 	zfree(&tidq->prev_packet);
 	zfree(&tidq->packet);
 out:
@@ -544,6 +557,7 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->last_branch_rb);
+		zfree(&tidq->chain);
 		zfree(&tidq->prev_packet);
 		zfree(&tidq->packet);
 		zfree(&tidq);
@@ -1126,7 +1140,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 	int insn_len;
 	u64 from_ip, to_ip;
 
-	if (etm->synth_opts.thread_stack) {
+	if (etm->synth_opts.callchain || etm->synth_opts.thread_stack) {
 		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
 		to_ip = cs_etm__first_executed_instr(tidq->packet);
 
@@ -1182,6 +1196,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
 
+	if (etm->synth_opts.callchain) {
+		thread_stack__sample(tidq->thread, tidq->packet->cpu,
+				     tidq->chain,
+				     etm->synth_opts.callchain_sz + 1,
+				     sample.ip, etm->kernel_start);
+		sample.callchain = tidq->chain;
+	}
+
 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
 
@@ -1364,6 +1386,8 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
 	}
 
+	if (etm->synth_opts.callchain)
+		attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
 	if (etm->synth_opts.last_branch)
 		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 
@@ -2733,7 +2757,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	} else {
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
-		etm->synth_opts.callchain = false;
 		etm->synth_opts.thread_stack =
 				session->itrace_synth_opts->thread_stack;
 	}
@@ -2745,6 +2768,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 		etm->branches_filter |= PERF_IP_FLAG_RETURN |
 					PERF_IP_FLAG_TRACE_BEGIN;
 
+	if (etm->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			etm->synth_opts.callchain = false;
+		}
+	}
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack
  2020-02-03  2:07 [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (3 preceding siblings ...)
  2020-02-03  2:07 ` [PATCH v4 4/5] perf cs-etm: Support callchain for instruction sample Leo Yan
@ 2020-02-03  2:07 ` Leo Yan
  2020-02-06 15:01   ` Mike Leach
  4 siblings, 1 reply; 11+ messages in thread
From: Leo Yan @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

The synthesized flow use 'tidq->packet' for instruction samples; on the
other hand, 'tidp->prev_packet' is used to generate the thread stack and
the branch samples, this results in the instruction samples using one
packet ahead than thread stack and branch samples ('tidp->prev_packet'
vs 'tidq->packet').

This leads to an instruction's callchain error as shows in below
example:

  main  1579        100      instructions:
  	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  	ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
  	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
  	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
  	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
  	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
  	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

In the callchain log, for the two continuous lines the up line contains
one child function info and the followed line contains the caller
function info, and so forth.  So the first two lines are:

  perf_event_update_userpage+0x4c  => the sampled instruction
  perf_event_update_userpage+0x48  => the parent function's calling

The child function and parent function both are the same function
perf_event_update_userpage(), but this isn't a recursive function, thus
the sequence for perf_event_update_userpage() calling itself shouldn't
never happen.  This callchain error is caused by the instruction sample
using an ahead packet than the thread stack, the thread stack is deferred
to process the new packet and misses to pop stack if it is just a return
packet.

To fix this issue, we can simply change to use 'tidq->prev_packet' to
generate the instruction samples, this allows the thread stack to push
and pop synchronously with instruction sample.  Finally, the callchain
can be displayed correctly as below:

  main  1579        100      instructions:
	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8f805657658d..410e40ce19f2 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	struct cs_etm_packet *tmp;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
-	u64 instrs_executed = tidq->packet->instr_count;
+	u64 instrs_executed = tidq->prev_packet->instr_count;
 
 	tidq->period_instructions += instrs_executed;
 
@@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * instruction)
 			 */
 			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+						  tidq->prev_packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);
@@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * instruction)
 			 */
 			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+						  tidq->prev_packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling
  2020-02-03  2:07 ` [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling Leo Yan
@ 2020-02-06 12:36   ` Mike Leach
  2020-02-10  5:46     ` Leo Yan
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Leach @ 2020-02-06 12:36 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Leo,

On Mon, 3 Feb 2020 at 02:07, Leo Yan <leo.yan@linaro.org> wrote:
>
> cs-etm.c has several functions which need to know instruction size
> based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn()
> two functions both calculate the instruction size separately with its
> duplicated code.  Furthermore, adding new features later which might
> require to calculate instruction size as well.
>
> For this reason, this patch refactors the code to introduce a new
> function cs_etm__instr_size(), this function is central place to
> calculate the instruction size based on ISA type and instruction
> address.
>
> For a neat implementation, cs_etm__instr_addr() will always execute the
> loop without checking ISA type, this allows cs_etm__instr_size() and
> cs_etm__instr_addr() have no any duplicate code with each other and both
> functions are independent and can be changed separately without breaking
> anything.  As a side effect, cs_etm__instr_addr() will do a few more
> iterations for A32/A64 instructions, this would be fine if consider perf
> is a tool running in the user space.
>

I prefer to take the optimisation win where I can - I always do in the
trace decoder when counting instructions over a range.
Consider that you can be processing MB of trace data, and most likely
that will be A64/A32 on a lot of the current and future platforms.

Therefore I would keep the useful cs_etm__instr_size() function, but
also keep a single ISA check in cs_etm__instr_addr() to do
the (addr + offset * 4) calculation for non T32.

Regards

Mike

> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 48 +++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 20 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 720108bd8dba..cb6fcc2acca0 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -918,6 +918,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
>         return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
>  }
>
> +static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
> +                                    u8 trace_chan_id,
> +                                    enum cs_etm_isa isa,
> +                                    u64 addr)
> +{
> +       int insn_len;
> +
> +       /*
> +        * T32 instruction size might be 32-bit or 16-bit, decide by calling
> +        * cs_etm__t32_instr_size().
> +        */
> +       if (isa == CS_ETM_ISA_T32)
> +               insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
> +       /* Otherwise, A64 and A32 instruction size are always 32-bit. */
> +       else
> +               insn_len = 4;
> +
> +       return insn_len;
> +}
> +
>  static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
>  {
>         /* Returns 0 for the CS_ETM_DISCONTINUITY packet */
> @@ -942,19 +962,15 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
>                                      const struct cs_etm_packet *packet,
>                                      u64 offset)
>  {
> -       if (packet->isa == CS_ETM_ISA_T32) {
> -               u64 addr = packet->start_addr;
> +       u64 addr = packet->start_addr;
>
> -               while (offset) {
> -                       addr += cs_etm__t32_instr_size(etmq,
> -                                                      trace_chan_id, addr);
> -                       offset--;
> -               }
> -               return addr;
> +       while (offset) {
> +               addr += cs_etm__instr_size(etmq, trace_chan_id,
> +                                          packet->isa, addr);
> +               offset--;
>         }
>
> -       /* Assume a 4 byte instruction size (A32/A64) */
> -       return packet->start_addr + offset * 4;
> +       return addr;
>  }
>
>  static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
> @@ -1094,16 +1110,8 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
>                 return;
>         }
>
> -       /*
> -        * T32 instruction size might be 32-bit or 16-bit, decide by calling
> -        * cs_etm__t32_instr_size().
> -        */
> -       if (packet->isa == CS_ETM_ISA_T32)
> -               sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
> -                                                         sample->ip);
> -       /* Otherwise, A64 and A32 instruction size are always 32-bit. */
> -       else
> -               sample->insn_len = 4;
> +       sample->insn_len = cs_etm__instr_size(etmq, trace_chan_id,
> +                                             packet->isa, sample->ip);
>
>         cs_etm__mem_access(etmq, trace_chan_id, sample->ip,
>                            sample->insn_len, (void *)sample->insn);
> --
> 2.17.1
>


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack
  2020-02-03  2:07 ` [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
@ 2020-02-06 15:01   ` Mike Leach
  2020-02-13  9:08     ` Leo Yan
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Leach @ 2020-02-06 15:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Leo,

On Mon, 3 Feb 2020 at 02:08, Leo Yan <leo.yan@linaro.org> wrote:
>
> The synthesized flow use 'tidq->packet' for instruction samples; on the
> other hand, 'tidp->prev_packet' is used to generate the thread stack and
> the branch samples, this results in the instruction samples using one
> packet ahead than thread stack and branch samples ('tidp->prev_packet'
> vs 'tidq->packet').
>
> This leads to an instruction's callchain error as shows in below
> example:
>
>   main  1579        100      instructions:
>         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
>         ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
>         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
>         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
>         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
>
> In the callchain log, for the two continuous lines the up line contains
> one child function info and the followed line contains the caller
> function info, and so forth.  So the first two lines are:
>
>   perf_event_update_userpage+0x4c  => the sampled instruction
>   perf_event_update_userpage+0x48  => the parent function's calling
>
> The child function and parent function both are the same function
> perf_event_update_userpage(), but this isn't a recursive function, thus
> the sequence for perf_event_update_userpage() calling itself shouldn't
> never happen.  This callchain error is caused by the instruction sample
> using an ahead packet than the thread stack, the thread stack is deferred
> to process the new packet and misses to pop stack if it is just a return
> packet.
>
> To fix this issue, we can simply change to use 'tidq->prev_packet' to
> generate the instruction samples, this allows the thread stack to push
> and pop synchronously with instruction sample.  Finally, the callchain
> can be displayed correctly as below:
>
>   main  1579        100      instructions:
>         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
>         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
>         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
>         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8f805657658d..410e40ce19f2 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>         struct cs_etm_packet *tmp;
>         int ret;
>         u8 trace_chan_id = tidq->trace_chan_id;
> -       u64 instrs_executed = tidq->packet->instr_count;
> +       u64 instrs_executed = tidq->prev_packet->instr_count;
>
>         tidq->period_instructions += instrs_executed;
>
> @@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                          * instruction)
>                          */
>                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -                                                 tidq->packet, offset - 1);
> +                                                 tidq->prev_packet,
> +                                                 offset - 1);
>                         ret = cs_etm__synth_instruction_sample(
>                                 etmq, tidq, addr,
>                                 etm->instructions_sample_period);
> @@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                          * instruction)
>                          */
>                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -                                                 tidq->packet, offset - 1);
> +                                                 tidq->prev_packet,
> +                                                 offset - 1);
>                         ret = cs_etm__synth_instruction_sample(
>                                 etmq, tidq, addr,
>                                 etm->instructions_sample_period);
> --
> 2.17.1
>
I am really not convinced that this is a correct solution.

Consider a set of trace range packet inputs:
current: 0x3000-0x3050
prev:  0x2000-0x2100
prev-1: 0x1020-0x1080

Before your modification.....
cs_etm__sample()  processes the current packet....

On entry, the branch stack will contain:0x1080=>0x2000;

We add to this from the current packet to get: 0x1080=>0x2000; 0x2100=>0x3000;

This is then copied by cs_etm__copy_last_branch_rb()

We find the instruction sample address in the range 0x3000 to 0x3050,
e.g. 0x3010.
cs_etm__synth_instruction_sample() will then generate a sample with values

sample.ip = 0x3010
sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;

to be passed to the perf session / injected as required.
This sample has the correct branch context for the sampled address -
i.e. how the code arrived @0x3010

After the modification.....
The branch stack will be the same, but the sample address will be from
the range 0x2000-0x2010, e.g. 0x2008 to give a sample in
cs_etm__synth_instruction_sample() of
sample.ip = 0x2008
sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;

This really does not make much sense  - the branch stack no longer
relates to the sample.ip.

Further - cs_etm__synth_instruction_sample() calls cs_etm__copy_insn()
using the _current_ packet and sample.ip. This is a clear mismatch.

I don't know what is causing the apparent error in the callchain, but
given that the previous features added in this set, work without this
alteration, I feel there must be another solution.

Regards

Mike

-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling
  2020-02-06 12:36   ` Mike Leach
@ 2020-02-10  5:46     ` Leo Yan
  0 siblings, 0 replies; 11+ messages in thread
From: Leo Yan @ 2020-02-10  5:46 UTC (permalink / raw)
  To: Mike Leach
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Mike,

On Thu, Feb 06, 2020 at 12:36:43PM +0000, Mike Leach wrote:
> Hi Leo,
> 
> On Mon, 3 Feb 2020 at 02:07, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > cs-etm.c has several functions which need to know instruction size
> > based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn()
> > two functions both calculate the instruction size separately with its
> > duplicated code.  Furthermore, adding new features later which might
> > require to calculate instruction size as well.
> >
> > For this reason, this patch refactors the code to introduce a new
> > function cs_etm__instr_size(), this function is central place to
> > calculate the instruction size based on ISA type and instruction
> > address.
> >
> > For a neat implementation, cs_etm__instr_addr() will always execute the
> > loop without checking ISA type, this allows cs_etm__instr_size() and
> > cs_etm__instr_addr() have no any duplicate code with each other and both
> > functions are independent and can be changed separately without breaking
> > anything.  As a side effect, cs_etm__instr_addr() will do a few more
> > iterations for A32/A64 instructions, this would be fine if consider perf
> > is a tool running in the user space.
> >
> 
> I prefer to take the optimisation win where I can - I always do in the
> trace decoder when counting instructions over a range.
> Consider that you can be processing MB of trace data, and most likely
> that will be A64/A32 on a lot of the current and future platforms.
> 
> Therefore I would keep the useful cs_etm__instr_size() function, but
> also keep a single ISA check in cs_etm__instr_addr() to do
> the (addr + offset * 4) calculation for non T32.

Understand.  Will refine the code by following this suggestion.

Thanks,
Leo Yan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack
  2020-02-06 15:01   ` Mike Leach
@ 2020-02-13  9:08     ` Leo Yan
  2020-02-13 15:00       ` Mike Leach
  0 siblings, 1 reply; 11+ messages in thread
From: Leo Yan @ 2020-02-13  9:08 UTC (permalink / raw)
  To: Mike Leach
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Mike,

On Thu, Feb 06, 2020 at 03:01:52PM +0000, Mike Leach wrote:
> Hi Leo,
> 
> On Mon, 3 Feb 2020 at 02:08, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > The synthesized flow use 'tidq->packet' for instruction samples; on the
> > other hand, 'tidp->prev_packet' is used to generate the thread stack and
> > the branch samples, this results in the instruction samples using one
> > packet ahead than thread stack and branch samples ('tidp->prev_packet'
> > vs 'tidq->packet').
> >
> > This leads to an instruction's callchain error as shows in below
> > example:
> >
> >   main  1579        100      instructions:
> >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> >         ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
> >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> >
> > In the callchain log, for the two continuous lines the up line contains
> > one child function info and the followed line contains the caller
> > function info, and so forth.  So the first two lines are:
> >
> >   perf_event_update_userpage+0x4c  => the sampled instruction
> >   perf_event_update_userpage+0x48  => the parent function's calling
> >
> > The child function and parent function both are the same function
> > perf_event_update_userpage(), but this isn't a recursive function, thus
> > the sequence for perf_event_update_userpage() calling itself shouldn't
> > never happen.  This callchain error is caused by the instruction sample
> > using an ahead packet than the thread stack, the thread stack is deferred
> > to process the new packet and misses to pop stack if it is just a return
> > packet.
> >
> > To fix this issue, we can simply change to use 'tidq->prev_packet' to
> > generate the instruction samples, this allows the thread stack to push
> > and pop synchronously with instruction sample.  Finally, the callchain
> > can be displayed correctly as below:
> >
> >   main  1579        100      instructions:
> >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 8f805657658d..410e40ce19f2 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >         struct cs_etm_packet *tmp;
> >         int ret;
> >         u8 trace_chan_id = tidq->trace_chan_id;
> > -       u64 instrs_executed = tidq->packet->instr_count;
> > +       u64 instrs_executed = tidq->prev_packet->instr_count;
> >
> >         tidq->period_instructions += instrs_executed;
> >
> > @@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >                          * instruction)
> >                          */
> >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -                                                 tidq->packet, offset - 1);
> > +                                                 tidq->prev_packet,
> > +                                                 offset - 1);
> >                         ret = cs_etm__synth_instruction_sample(
> >                                 etmq, tidq, addr,
> >                                 etm->instructions_sample_period);
> > @@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >                          * instruction)
> >                          */
> >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -                                                 tidq->packet, offset - 1);
> > +                                                 tidq->prev_packet,
> > +                                                 offset - 1);
> >                         ret = cs_etm__synth_instruction_sample(
> >                                 etmq, tidq, addr,
> >                                 etm->instructions_sample_period);
> > --
> > 2.17.1
> >
> I am really not convinced that this is a correct solution.
> 
> Consider a set of trace range packet inputs:
> current: 0x3000-0x3050
> prev:  0x2000-0x2100
> prev-1: 0x1020-0x1080
> 
> Before your modification.....
> cs_etm__sample()  processes the current packet....
> 
> On entry, the branch stack will contain:0x1080=>0x2000;
> 
> We add to this from the current packet to get: 0x1080=>0x2000; 0x2100=>0x3000;
> 
> This is then copied by cs_etm__copy_last_branch_rb()
> 
> We find the instruction sample address in the range 0x3000 to 0x3050,
> e.g. 0x3010.
> cs_etm__synth_instruction_sample() will then generate a sample with values
> 
> sample.ip = 0x3010
> sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> 
> to be passed to the perf session / injected as required.
> This sample has the correct branch context for the sampled address -
> i.e. how the code arrived @0x3010
> 
> After the modification.....
> The branch stack will be the same, but the sample address will be from
> the range 0x2000-0x2010, e.g. 0x2008 to give a sample in
> cs_etm__synth_instruction_sample() of
> sample.ip = 0x2008
> sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> 
> This really does not make much sense  - the branch stack no longer
> relates to the sample.ip.
> 
> Further - cs_etm__synth_instruction_sample() calls cs_etm__copy_insn()
> using the _current_ packet and sample.ip. This is a clear mismatch.
> 
> I don't know what is causing the apparent error in the callchain, but
> given that the previous features added in this set, work without this
> alteration, I feel there must be another solution.

Good catch!  Thanks a lot for very detailed analysis.

I root caused this issue is relevant with the sequence between two
functions thread_stack__event() and thread_stack__sample().

In this series, thread_stack__sample() is prior to thread_stack__event(),
thus the thread stack event cannot be handled before thread stack
generation.

If move the function thread_stack__event() up and place it before
instruction sample synthesizing; thread_stack__event() can be invoked
prior to thread_stack__sample(), then I can see the thread stack can
be popped properly and the issue can be fixed.  Simply to say, patch
0002 should change the code as below:

         /*
          * Record a branch when the last instruction in
          * PREV_PACKET is a branch.
          */
         if (etm->synth_opts.last_branch &&                                 
             tidq->prev_packet->sample_type == CS_ETM_RANGE &&              
             tidq->prev_packet->last_instr_taken_branch)                    
                 cs_etm__update_last_branch_rb(etmq, tidq);                 
                                                                            
         /*                                                                 
          * The stack event must be processed prior to synthesizing         
          * instruction sample; this can ensure the instruction samples     
          * to generate correct thread stack.                               
          */                                                                
         if (tidq->prev_packet->last_instr_taken_branch)                    
                 cs_etm__add_stack_event(etmq, tidq);                       
                                                                            
         if (etm->sample_instructions &&                                    
             tidq->period_instructions >= etm->instructions_sample_period) {

                cs_etm__synth_instruction_sample();
                    `-> thread_stack__sample();

         }

Does this make sense for you?

Thanks,
Leo Yan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack
  2020-02-13  9:08     ` Leo Yan
@ 2020-02-13 15:00       ` Mike Leach
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Leach @ 2020-02-13 15:00 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Leo,

On Thu, 13 Feb 2020 at 09:08, Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Mike,
>
> On Thu, Feb 06, 2020 at 03:01:52PM +0000, Mike Leach wrote:
> > Hi Leo,
> >
> > On Mon, 3 Feb 2020 at 02:08, Leo Yan <leo.yan@linaro.org> wrote:
> > >
> > > The synthesized flow use 'tidq->packet' for instruction samples; on the
> > > other hand, 'tidp->prev_packet' is used to generate the thread stack and
> > > the branch samples, this results in the instruction samples using one
> > > packet ahead than thread stack and branch samples ('tidp->prev_packet'
> > > vs 'tidq->packet').
> > >
> > > This leads to an instruction's callchain error as shows in below
> > > example:
> > >
> > >   main  1579        100      instructions:
> > >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> > >         ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
> > >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> > >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> > >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> > >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> > >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > >
> > > In the callchain log, for the two continuous lines the up line contains
> > > one child function info and the followed line contains the caller
> > > function info, and so forth.  So the first two lines are:
> > >
> > >   perf_event_update_userpage+0x4c  => the sampled instruction
> > >   perf_event_update_userpage+0x48  => the parent function's calling
> > >
> > > The child function and parent function both are the same function
> > > perf_event_update_userpage(), but this isn't a recursive function, thus
> > > the sequence for perf_event_update_userpage() calling itself shouldn't
> > > never happen.  This callchain error is caused by the instruction sample
> > > using an ahead packet than the thread stack, the thread stack is deferred
> > > to process the new packet and misses to pop stack if it is just a return
> > > packet.
> > >
> > > To fix this issue, we can simply change to use 'tidq->prev_packet' to
> > > generate the instruction samples, this allows the thread stack to push
> > > and pop synchronously with instruction sample.  Finally, the callchain
> > > can be displayed correctly as below:
> > >
> > >   main  1579        100      instructions:
> > >         ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> > >         ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> > >         ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> > >         ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> > >         ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> > >         ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >  tools/perf/util/cs-etm.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > index 8f805657658d..410e40ce19f2 100644
> > > --- a/tools/perf/util/cs-etm.c
> > > +++ b/tools/perf/util/cs-etm.c
> > > @@ -1414,7 +1414,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >         struct cs_etm_packet *tmp;
> > >         int ret;
> > >         u8 trace_chan_id = tidq->trace_chan_id;
> > > -       u64 instrs_executed = tidq->packet->instr_count;
> > > +       u64 instrs_executed = tidq->prev_packet->instr_count;
> > >
> > >         tidq->period_instructions += instrs_executed;
> > >
> > > @@ -1505,7 +1505,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >                          * instruction)
> > >                          */
> > >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > > -                                                 tidq->packet, offset - 1);
> > > +                                                 tidq->prev_packet,
> > > +                                                 offset - 1);
> > >                         ret = cs_etm__synth_instruction_sample(
> > >                                 etmq, tidq, addr,
> > >                                 etm->instructions_sample_period);
> > > @@ -1525,7 +1526,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >                          * instruction)
> > >                          */
> > >                         addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > > -                                                 tidq->packet, offset - 1);
> > > +                                                 tidq->prev_packet,
> > > +                                                 offset - 1);
> > >                         ret = cs_etm__synth_instruction_sample(
> > >                                 etmq, tidq, addr,
> > >                                 etm->instructions_sample_period);
> > > --
> > > 2.17.1
> > >
> > I am really not convinced that this is a correct solution.
> >
> > Consider a set of trace range packet inputs:
> > current: 0x3000-0x3050
> > prev:  0x2000-0x2100
> > prev-1: 0x1020-0x1080
> >
> > Before your modification.....
> > cs_etm__sample()  processes the current packet....
> >
> > On entry, the branch stack will contain:0x1080=>0x2000;
> >
> > We add to this from the current packet to get: 0x1080=>0x2000; 0x2100=>0x3000;
> >
> > This is then copied by cs_etm__copy_last_branch_rb()
> >
> > We find the instruction sample address in the range 0x3000 to 0x3050,
> > e.g. 0x3010.
> > cs_etm__synth_instruction_sample() will then generate a sample with values
> >
> > sample.ip = 0x3010
> > sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> >
> > to be passed to the perf session / injected as required.
> > This sample has the correct branch context for the sampled address -
> > i.e. how the code arrived @0x3010
> >
> > After the modification.....
> > The branch stack will be the same, but the sample address will be from
> > the range 0x2000-0x2010, e.g. 0x2008 to give a sample in
> > cs_etm__synth_instruction_sample() of
> > sample.ip = 0x2008
> > sample.branch_stack = 0x1080=>0x2000; 0x2100=>0x3000;
> >
> > This really does not make much sense  - the branch stack no longer
> > relates to the sample.ip.
> >
> > Further - cs_etm__synth_instruction_sample() calls cs_etm__copy_insn()
> > using the _current_ packet and sample.ip. This is a clear mismatch.
> >
> > I don't know what is causing the apparent error in the callchain, but
> > given that the previous features added in this set, work without this
> > alteration, I feel there must be another solution.
>
> Good catch!  Thanks a lot for very detailed analysis.
>
> I root caused this issue is relevant with the sequence between two
> functions thread_stack__event() and thread_stack__sample().
>
> In this series, thread_stack__sample() is prior to thread_stack__event(),
> thus the thread stack event cannot be handled before thread stack
> generation.
>
> If move the function thread_stack__event() up and place it before
> instruction sample synthesizing; thread_stack__event() can be invoked
> prior to thread_stack__sample(), then I can see the thread stack can
> be popped properly and the issue can be fixed.  Simply to say, patch
> 0002 should change the code as below:
>
>          /*
>           * Record a branch when the last instruction in
>           * PREV_PACKET is a branch.
>           */
>          if (etm->synth_opts.last_branch &&
>              tidq->prev_packet->sample_type == CS_ETM_RANGE &&
>              tidq->prev_packet->last_instr_taken_branch)
>                  cs_etm__update_last_branch_rb(etmq, tidq);
>
>          /*
>           * The stack event must be processed prior to synthesizing
>           * instruction sample; this can ensure the instruction samples
>           * to generate correct thread stack.
>           */
>          if (tidq->prev_packet->last_instr_taken_branch)
>                  cs_etm__add_stack_event(etmq, tidq);
>
>          if (etm->sample_instructions &&
>              tidq->period_instructions >= etm->instructions_sample_period) {
>
>                 cs_etm__synth_instruction_sample();
>                     `-> thread_stack__sample();
>
>          }
>
> Does this make sense for you?
>
This looks good.

Regards

Mike



> Thanks,
> Leo Yan



-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-02-13 15:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-03  2:07 [PATCH v4 0/5] perf cs-etm: Support thread stack and callchain Leo Yan
2020-02-03  2:07 ` [PATCH v4 1/5] perf cs-etm: Refactor instruction size handling Leo Yan
2020-02-06 12:36   ` Mike Leach
2020-02-10  5:46     ` Leo Yan
2020-02-03  2:07 ` [PATCH v4 2/5] perf cs-etm: Support thread stack Leo Yan
2020-02-03  2:07 ` [PATCH v4 3/5] perf cs-etm: Support branch filter Leo Yan
2020-02-03  2:07 ` [PATCH v4 4/5] perf cs-etm: Support callchain for instruction sample Leo Yan
2020-02-03  2:07 ` [PATCH v4 5/5] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
2020-02-06 15:01   ` Mike Leach
2020-02-13  9:08     ` Leo Yan
2020-02-13 15:00       ` Mike Leach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).