linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain
@ 2019-10-05  9:16 Leo Yan
  2019-10-05  9:16 ` [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

This patch series adds support for thread stack and callchain.

Patch 01 is to fix the unsigned variable comparison to zero; patch 02 is
to refactor the instruction size calculation; these two patches are
preparation for patch 03.

Patch 03 is to add thread stack support, after applying this patch the
option '-F,+callindent' can be used by perf script tool; patch 04 is to
add branch filter thus the Perf tool can display branch samples only
for function calls and returns after enable the call indentation or call
chain related options.

Patch 05 is to synthesize call chain for the instruction samples.

Patch 06 allows the instruction sample can be handled synchronously with
the thread stack, thus it fixes an error for the callchain generation.

This patch set has been tested on 96boards Hikey620 after applied on
perf/core branch with latest commit f7bf75a78095 ("perf annotate: Don't
return -1 for error when doing BPF disassembly").


Test for option '-F,+callindent':

Before:

  # perf script -F,+callindent
            main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 check_match                          ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 _dl_name_match_p                     ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]


Test for option '--itrace=g':

Before:

  # perf script --itrace=g16l64i100
            main  1579        100      instructions:  ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  [...]

After:

  # perf script --itrace=g16l64i100

  main  1579        100      instructions:
          ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
          ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
  [...]


Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
  (Suzuki).
* Refined commit logs.

Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one is for support thread stack and
  another is for callchain support (Mathieu).
* Added a new patch to support branch filter.


Leo Yan (6):
  perf cs-etm: Fix unsigned variable comparison to zero
  perf cs-etm: Refactor instruction size handling
  perf cs-etm: Support thread stack
  perf cs-etm: Support branch filter
  perf cs-etm: Support callchain for instruction sample
  perf cs-etm: Synchronize instruction sample with the thread stack

 tools/perf/util/cs-etm.c | 145 ++++++++++++++++++++++++++++++++-------
 1 file changed, 120 insertions(+), 25 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero
  2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
@ 2019-10-05  9:16 ` Leo Yan
  2019-10-11 20:16   ` Mathieu Poirier
  2019-10-05  9:16 ` [PATCH v3 2/6] perf cs-etm: Refactor instruction size handling Leo Yan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

If the u64 variable 'offset' is a negative integer, comparison it with
bigger than zero is always going to be true because it is unsigned.
Fix this by using s64 type for variable 'offset'.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4ba0f871f086..4bc2d9709d4f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -940,7 +940,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 				     u64 trace_chan_id,
 				     const struct cs_etm_packet *packet,
-				     u64 offset)
+				     s64 offset)
 {
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;
@@ -1372,7 +1372,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		 * sample is reported as though instruction has just been
 		 * executed, but PC has not advanced to next instruction)
 		 */
-		u64 offset = (instrs_executed - instrs_over - 1);
+		s64 offset = (instrs_executed - instrs_over - 1);
 		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
 					      tidq->packet, offset);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 2/6] perf cs-etm: Refactor instruction size handling
  2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
  2019-10-05  9:16 ` [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
@ 2019-10-05  9:16 ` Leo Yan
  2019-10-05  9:16 ` [PATCH v3 3/6] perf cs-etm: Support thread stack Leo Yan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

In cs-etm.c there have several functions need to know instruction size
based on address, e.g. cs_etm__instr_addr() and cs_etm__copy_insn()
these two functions both calculate the instruction size separately.
Furthermore, if we consider to add new features later which also might
require to calculate instruction size.

For this reason, this patch refactors the code to introduce a new
function cs_etm__instr_size(), it will be a central place to calculate
the instruction size based on ISA type and instruction address.

For a neat implementation, cs_etm__instr_addr() will always execute the
loop without checking ISA type, this allows cs_etm__instr_size() and
cs_etm__instr_addr() have no any duplicate code with each other and both
functions can be changed independently later without breaking anything.
As a side effect, cs_etm__instr_addr() will do a few more iterations for
A32/A64 instructions, this would be fine if consider perf tool runs in
the user space.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 48 +++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4bc2d9709d4f..58ceba7b91d5 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -918,6 +918,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
 	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
+static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
+				     u8 trace_chan_id,
+				     enum cs_etm_isa isa,
+				     u64 addr)
+{
+	int insn_len;
+
+	/*
+	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
+	 * cs_etm__t32_instr_size().
+	 */
+	if (isa == CS_ETM_ISA_T32)
+		insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
+	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
+	else
+		insn_len = 4;
+
+	return insn_len;
+}
+
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 {
 	/* Returns 0 for the CS_ETM_DISCONTINUITY packet */
@@ -942,19 +962,15 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 				     const struct cs_etm_packet *packet,
 				     s64 offset)
 {
-	if (packet->isa == CS_ETM_ISA_T32) {
-		u64 addr = packet->start_addr;
+	u64 addr = packet->start_addr;
 
-		while (offset > 0) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
-			offset--;
-		}
-		return addr;
+	while (offset > 0) {
+		addr += cs_etm__instr_size(etmq, trace_chan_id,
+					   packet->isa, addr);
+		offset--;
 	}
 
-	/* Assume a 4 byte instruction size (A32/A64) */
-	return packet->start_addr + offset * 4;
+	return addr;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
@@ -1094,16 +1110,8 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 		return;
 	}
 
-	/*
-	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
-	 * cs_etm__t32_instr_size().
-	 */
-	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
-							  sample->ip);
-	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
-	else
-		sample->insn_len = 4;
+	sample->insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      packet->isa, sample->ip);
 
 	cs_etm__mem_access(etmq, trace_chan_id, sample->ip,
 			   sample->insn_len, (void *)sample->insn);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 3/6] perf cs-etm: Support thread stack
  2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
  2019-10-05  9:16 ` [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
  2019-10-05  9:16 ` [PATCH v3 2/6] perf cs-etm: Refactor instruction size handling Leo Yan
@ 2019-10-05  9:16 ` Leo Yan
  2019-10-11 17:53   ` Mathieu Poirier
  2019-10-05  9:16 ` [PATCH v3 4/6] perf cs-etm: Support branch filter Leo Yan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

Since Arm CoreSight doesn't support thread stack, the decoding cannot
display symbols with indented spaces to reflect the stack depth.

This patch adds support thread stack for Arm CoreSight, this allows
'perf script' to display properly for option '-F,+callindent'.

Before:

  # perf script -F,+callindent
            main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 58ceba7b91d5..780abbfd1833 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 			   sample->insn_len, (void *)sample->insn);
 }
 
+static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
+				    struct cs_etm_traceid_queue *tidq)
+{
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	u8 trace_chan_id = tidq->trace_chan_id;
+	int insn_len;
+	u64 from_ip, to_ip;
+
+	if (etm->synth_opts.thread_stack) {
+		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
+		to_ip = cs_etm__first_executed_instr(tidq->packet);
+
+		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      tidq->prev_packet->isa, from_ip);
+
+		/*
+		 * Create thread stacks by keeping track of calls and returns;
+		 * any call pushes thread stack, return pops the stack, and
+		 * flush stack when the trace is discontinuous.
+		 */
+		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
+				    tidq->prev_packet->flags,
+				    from_ip, to_ip, insn_len,
+				    etmq->buffer->buffer_nr);
+	} else {
+		/*
+		 * The thread stack can be output via thread_stack__process();
+		 * thus the detailed information about paired calls and returns
+		 * will be facilitated by Python script for the db-export.
+		 *
+		 * Need to set trace buffer number and flush thread stack if the
+		 * trace buffer number has been alternate.
+		 */
+		thread_stack__set_trace_nr(tidq->thread,
+					   tidq->prev_packet->cpu,
+					   etmq->buffer->buffer_nr);
+	}
+}
+
 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 					    struct cs_etm_traceid_queue *tidq,
 					    u64 addr, u64 period)
@@ -1393,6 +1432,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		tidq->period_instructions = instrs_over;
 	}
 
+	if (tidq->prev_packet->last_instr_taken_branch)
+		cs_etm__add_stack_event(etmq, tidq);
+
 	if (etm->sample_branches) {
 		bool generate_sample = false;
 
@@ -2593,6 +2635,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
 		etm->synth_opts.callchain = false;
+		etm->synth_opts.thread_stack =
+				session->itrace_synth_opts->thread_stack;
 	}
 
 	err = cs_etm__synth_events(etm, session);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 4/6] perf cs-etm: Support branch filter
  2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (2 preceding siblings ...)
  2019-10-05  9:16 ` [PATCH v3 3/6] perf cs-etm: Support thread stack Leo Yan
@ 2019-10-05  9:16 ` Leo Yan
  2019-10-05  9:16 ` [PATCH v3 5/6] perf cs-etm: Support callchain for instruction sample Leo Yan
  2019-10-05  9:16 ` [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
  5 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

If user specifies option '-F,+callindent' or call chain related options,
it means users only care about function calls and returns; for these
cases, it's pointless to generate samples for the branches within
function.  But unlike other hardware trace handling (e.g. Intel's pt or
bts), Arm CoreSight doesn't filter branch types for these options and
generate samples for all branches, this causes Perf to output many
spurious blanks if the branch is not a function call or return.

To only output pairs of calls and returns, this patch introduces branch
filter and the filter is set according to synthetic options.  Finally,
Perf can output only for calls and returns and avoid to output other
unnecessary blanks.

Before:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     coresight_test1@plt                              aaaaba8d367c coresight_test1@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 check_match                          ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 _dl_name_match_p                     ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 780abbfd1833..4b42f9c9bd34 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -56,6 +56,7 @@ struct cs_etm_auxtrace {
 
 	int num_cpu;
 	u32 auxtrace_type;
+	u32 branches_filter;
 	u64 branches_sample_type;
 	u64 branches_id;
 	u64 instructions_sample_type;
@@ -1223,6 +1224,10 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	} dummy_bs;
 	u64 ip;
 
+	if (etm->branches_filter &&
+	    !(etm->branches_filter & tidq->prev_packet->flags))
+		return 0;
+
 	ip = cs_etm__last_executed_instr(tidq->prev_packet);
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
@@ -2639,6 +2644,13 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 				session->itrace_synth_opts->thread_stack;
 	}
 
+	if (etm->synth_opts.calls)
+		etm->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+					PERF_IP_FLAG_TRACE_END;
+	if (etm->synth_opts.returns)
+		etm->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN;
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 5/6] perf cs-etm: Support callchain for instruction sample
  2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (3 preceding siblings ...)
  2019-10-05  9:16 ` [PATCH v3 4/6] perf cs-etm: Support branch filter Leo Yan
@ 2019-10-05  9:16 ` Leo Yan
  2019-10-11 19:59   ` Mathieu Poirier
  2019-10-05  9:16 ` [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
  5 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

Now CoreSight has supported the thread stack; based on the thread stack
we can synthesize call chain for the instruction sample; the call chain
can be injected by option '--itrace=g'.

Before:

  # perf script --itrace=g16l64i100
            main  1579        100      instructions:  ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  [...]

After:

  # perf script --itrace=g16l64i100

  main  1579        100      instructions:
          ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
          ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4b42f9c9bd34..56e501cd2f5f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -17,6 +17,7 @@
 #include <stdlib.h>
 
 #include "auxtrace.h"
+#include "callchain.h"
 #include "color.h"
 #include "cs-etm.h"
 #include "cs-etm-decoder/cs-etm-decoder.h"
@@ -74,6 +75,7 @@ struct cs_etm_traceid_queue {
 	size_t last_branch_pos;
 	union perf_event *event_buf;
 	struct thread *thread;
+	struct ip_callchain *chain;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
@@ -251,6 +253,16 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	if (!tidq->prev_packet)
 		goto out_free;
 
+	if (etm->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (etm->synth_opts.callchain_sz + 1) * sizeof(u64);
+		tidq->chain = zalloc(sz);
+		if (!tidq->chain)
+			goto out_free;
+	}
+
 	if (etm->synth_opts.last_branch) {
 		size_t sz = sizeof(struct branch_stack);
 
@@ -275,6 +287,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	zfree(&tidq->last_branch);
 	zfree(&tidq->prev_packet);
 	zfree(&tidq->packet);
+	zfree(&tidq->chain);
 out:
 	return rc;
 }
@@ -546,6 +559,7 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 		zfree(&tidq->last_branch_rb);
 		zfree(&tidq->prev_packet);
 		zfree(&tidq->packet);
+		zfree(&tidq->chain);
 		zfree(&tidq);
 
 		/*
@@ -1126,7 +1140,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 	int insn_len;
 	u64 from_ip, to_ip;
 
-	if (etm->synth_opts.thread_stack) {
+	if (etm->synth_opts.callchain || etm->synth_opts.thread_stack) {
 		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
 		to_ip = cs_etm__first_executed_instr(tidq->packet);
 
@@ -1182,6 +1196,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
 
+	if (etm->synth_opts.callchain) {
+		thread_stack__sample(tidq->thread, tidq->packet->cpu,
+				     tidq->chain,
+				     etm->synth_opts.callchain_sz + 1,
+				     sample.ip, etm->kernel_start);
+		sample.callchain = tidq->chain;
+	}
+
 	if (etm->synth_opts.last_branch) {
 		cs_etm__copy_last_branch_rb(etmq, tidq);
 		sample.branch_stack = tidq->last_branch;
@@ -1369,6 +1391,8 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
 	}
 
+	if (etm->synth_opts.callchain)
+		attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
 	if (etm->synth_opts.last_branch)
 		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 
@@ -2639,7 +2663,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	} else {
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
-		etm->synth_opts.callchain = false;
 		etm->synth_opts.thread_stack =
 				session->itrace_synth_opts->thread_stack;
 	}
@@ -2651,6 +2674,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 		etm->branches_filter |= PERF_IP_FLAG_RETURN |
 					PERF_IP_FLAG_TRACE_BEGIN;
 
+	if (etm->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			etm->synth_opts.callchain = false;
+		}
+	}
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack
  2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (4 preceding siblings ...)
  2019-10-05  9:16 ` [PATCH v3 5/6] perf cs-etm: Support callchain for instruction sample Leo Yan
@ 2019-10-05  9:16 ` Leo Yan
  2019-10-11 20:17   ` Mathieu Poirier
  5 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2019-10-05  9:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar
  Cc: Leo Yan

The synthesized flow use 'tidq->packet' for instruction samples; on the
other hand, 'tidp->prev_packet' is used to generate the thread stack and
the branch samples, this results in the instruction samples using one
packet ahead than thread stack and branch samples ('tidp->prev_packet'
vs 'tidq->packet').

This leads to an instruction's callchain error as shows in below
example:

  main  1579        100      instructions:
  	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  	ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
  	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
  	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
  	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
  	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
  	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

In the callchain log, for the two continuous lines the up line contains
one child function info and the followed line contains the caller
function info, and so forth.  So the first two lines are:

  perf_event_update_userpage+0x4c  => the sampled instruction
  perf_event_update_userpage+0x48  => the parent function's calling

The child function and parent function both are the same function
perf_event_update_userpage(), but this isn't a recursive function, thus
the sequence for perf_event_update_userpage() calling itself shouldn't
never happen.  This callchain error is caused by the instruction sample
using an ahead packet than the thread stack, the thread stack is deferred
to process the new packet and misses to pop stack if it is just a return
packet.

To fix this issue, we can simply change to use 'tidq->prev_packet' to
generate the instruction samples, this allows the thread stack to push
and pop synchronously with instruction sample.  Finally, the callchain
can be displayed correctly as below:

  main  1579        100      instructions:
	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 56e501cd2f5f..fa969dcb45d2 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1419,7 +1419,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	struct cs_etm_packet *tmp;
 	int ret;
 	u8 trace_chan_id = tidq->trace_chan_id;
-	u64 instrs_executed = tidq->packet->instr_count;
+	u64 instrs_executed = tidq->prev_packet->instr_count;
 
 	tidq->period_instructions += instrs_executed;
 
@@ -1450,7 +1450,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		 */
 		s64 offset = (instrs_executed - instrs_over - 1);
 		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
-					      tidq->packet, offset);
+					      tidq->prev_packet, offset);
 
 		ret = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr, etm->instructions_sample_period);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/6] perf cs-etm: Support thread stack
  2019-10-05  9:16 ` [PATCH v3 3/6] perf cs-etm: Support thread stack Leo Yan
@ 2019-10-11 17:53   ` Mathieu Poirier
  2019-10-15  3:33     ` Leo Yan
  2019-10-22  5:03     ` Leo Yan
  0 siblings, 2 replies; 20+ messages in thread
From: Mathieu Poirier @ 2019-10-11 17:53 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

On Sat, Oct 05, 2019 at 05:16:11PM +0800, Leo Yan wrote:
> Since Arm CoreSight doesn't support thread stack, the decoding cannot
> display symbols with indented spaces to reflect the stack depth.
> 
> This patch adds support thread stack for Arm CoreSight, this allows
> 'perf script' to display properly for option '-F,+callindent'.
> 
> Before:
> 
>   # perf script -F,+callindent
>             main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
>             main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
>             main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
>             main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
>             main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
>             main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
>   [...]
> 
> After:
> 
>   # perf script -F,+callindent
>             main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
>             main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
>             main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
>             main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
>             main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
>             main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
>   [...]
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 58ceba7b91d5..780abbfd1833 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
>  			   sample->insn_len, (void *)sample->insn);
>  }
>  
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> +				    struct cs_etm_traceid_queue *tidq)
> +{
> +	struct cs_etm_auxtrace *etm = etmq->etm;
> +	u8 trace_chan_id = tidq->trace_chan_id;
> +	int insn_len;
> +	u64 from_ip, to_ip;
> +
> +	if (etm->synth_opts.thread_stack) {
> +		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
> +		to_ip = cs_etm__first_executed_instr(tidq->packet);
> +
> +		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
> +					      tidq->prev_packet->isa, from_ip);
> +
> +		/*
> +		 * Create thread stacks by keeping track of calls and returns;
> +		 * any call pushes thread stack, return pops the stack, and
> +		 * flush stack when the trace is discontinuous.
> +		 */
> +		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> +				    tidq->prev_packet->flags,
> +				    from_ip, to_ip, insn_len,
> +				    etmq->buffer->buffer_nr);

Details are a little fuzzy in my head but I'm pretty sure
we want trace_chan_id here.  


> +	} else {
> +		/*
> +		 * The thread stack can be output via thread_stack__process();
> +		 * thus the detailed information about paired calls and returns
> +		 * will be facilitated by Python script for the db-export.
> +		 *
> +		 * Need to set trace buffer number and flush thread stack if the
> +		 * trace buffer number has been alternate.
> +		 */
> +		thread_stack__set_trace_nr(tidq->thread,
> +					   tidq->prev_packet->cpu,
> +					   etmq->buffer->buffer_nr);

Same here.

> +	}
> +}
> +
>  static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  					    struct cs_etm_traceid_queue *tidq,
>  					    u64 addr, u64 period)
> @@ -1393,6 +1432,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  		tidq->period_instructions = instrs_over;
>  	}
>  
> +	if (tidq->prev_packet->last_instr_taken_branch)
> +		cs_etm__add_stack_event(etmq, tidq);
> +
>  	if (etm->sample_branches) {
>  		bool generate_sample = false;
>  
> @@ -2593,6 +2635,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>  		itrace_synth_opts__set_default(&etm->synth_opts,
>  				session->itrace_synth_opts->default_no_sample);
>  		etm->synth_opts.callchain = false;
> +		etm->synth_opts.thread_stack =
> +				session->itrace_synth_opts->thread_stack;
>  	}
>  
>  	err = cs_etm__synth_events(etm, session);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 5/6] perf cs-etm: Support callchain for instruction sample
  2019-10-05  9:16 ` [PATCH v3 5/6] perf cs-etm: Support callchain for instruction sample Leo Yan
@ 2019-10-11 19:59   ` Mathieu Poirier
  0 siblings, 0 replies; 20+ messages in thread
From: Mathieu Poirier @ 2019-10-11 19:59 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

On Sat, Oct 05, 2019 at 05:16:13PM +0800, Leo Yan wrote:
> Now CoreSight has supported the thread stack; based on the thread stack
> we can synthesize call chain for the instruction sample; the call chain
> can be injected by option '--itrace=g'.
> 
> Before:
> 
>   # perf script --itrace=g16l64i100
>             main  1579        100      instructions:  ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms])
>             main  1579        100      instructions:  ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
>             main  1579        100      instructions:  ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
>             main  1579        100      instructions:  ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
>             main  1579        100      instructions:  ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
>   [...]
> 
> After:
> 
>   # perf script --itrace=g16l64i100
> 
>   main  1579        100      instructions:
>           ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
>           ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> 
>   main  1579        100      instructions:
>           ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
>           ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>           ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>           ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> 
>   main  1579        100      instructions:
>           ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
>           ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
>           ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>           ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>           ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
>   [...]
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4b42f9c9bd34..56e501cd2f5f 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -17,6 +17,7 @@
>  #include <stdlib.h>
>  
>  #include "auxtrace.h"
> +#include "callchain.h"
>  #include "color.h"
>  #include "cs-etm.h"
>  #include "cs-etm-decoder/cs-etm-decoder.h"
> @@ -74,6 +75,7 @@ struct cs_etm_traceid_queue {
>  	size_t last_branch_pos;
>  	union perf_event *event_buf;
>  	struct thread *thread;
> +	struct ip_callchain *chain;
>  	struct branch_stack *last_branch;
>  	struct branch_stack *last_branch_rb;
>  	struct cs_etm_packet *prev_packet;
> @@ -251,6 +253,16 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
>  	if (!tidq->prev_packet)
>  		goto out_free;
>  
> +	if (etm->synth_opts.callchain) {
> +		size_t sz = sizeof(struct ip_callchain);
> +
> +		/* Add 1 to callchain_sz for callchain context */
> +		sz += (etm->synth_opts.callchain_sz + 1) * sizeof(u64);
> +		tidq->chain = zalloc(sz);
> +		if (!tidq->chain)
> +			goto out_free;
> +	}
> +
>  	if (etm->synth_opts.last_branch) {
>  		size_t sz = sizeof(struct branch_stack);
>  
> @@ -275,6 +287,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
>  	zfree(&tidq->last_branch);
>  	zfree(&tidq->prev_packet);
>  	zfree(&tidq->packet);
> +	zfree(&tidq->chain);

Theoretically this should go two lines up, i.e just below
zfree(&tidq->prev_packet).  If you agree with the comment I did in 3/6 then it
is worth doing the above change, otherwise it can stay that way.

>  out:
>  	return rc;
>  }
> @@ -546,6 +559,7 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
>  		zfree(&tidq->last_branch_rb);
>  		zfree(&tidq->prev_packet);
>  		zfree(&tidq->packet);
> +		zfree(&tidq->chain);

Same comment as above.  The rest looks good to me.

Mathieu

>  		zfree(&tidq);
>  
>  		/*
> @@ -1126,7 +1140,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
>  	int insn_len;
>  	u64 from_ip, to_ip;
>  
> -	if (etm->synth_opts.thread_stack) {
> +	if (etm->synth_opts.callchain || etm->synth_opts.thread_stack) {
>  		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
>  		to_ip = cs_etm__first_executed_instr(tidq->packet);
>  
> @@ -1182,6 +1196,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  
>  	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>  
> +	if (etm->synth_opts.callchain) {
> +		thread_stack__sample(tidq->thread, tidq->packet->cpu,
> +				     tidq->chain,
> +				     etm->synth_opts.callchain_sz + 1,
> +				     sample.ip, etm->kernel_start);
> +		sample.callchain = tidq->chain;
> +	}
> +
>  	if (etm->synth_opts.last_branch) {
>  		cs_etm__copy_last_branch_rb(etmq, tidq);
>  		sample.branch_stack = tidq->last_branch;
> @@ -1369,6 +1391,8 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
>  		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
>  	}
>  
> +	if (etm->synth_opts.callchain)
> +		attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
>  	if (etm->synth_opts.last_branch)
>  		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
>  
> @@ -2639,7 +2663,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>  	} else {
>  		itrace_synth_opts__set_default(&etm->synth_opts,
>  				session->itrace_synth_opts->default_no_sample);
> -		etm->synth_opts.callchain = false;
>  		etm->synth_opts.thread_stack =
>  				session->itrace_synth_opts->thread_stack;
>  	}
> @@ -2651,6 +2674,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>  		etm->branches_filter |= PERF_IP_FLAG_RETURN |
>  					PERF_IP_FLAG_TRACE_BEGIN;
>  
> +	if (etm->synth_opts.callchain && !symbol_conf.use_callchain) {
> +		symbol_conf.use_callchain = true;
> +		if (callchain_register_param(&callchain_param) < 0) {
> +			symbol_conf.use_callchain = false;
> +			etm->synth_opts.callchain = false;
> +		}
> +	}
> +
>  	err = cs_etm__synth_events(etm, session);
>  	if (err)
>  		goto err_delete_thread;
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero
  2019-10-05  9:16 ` [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
@ 2019-10-11 20:16   ` Mathieu Poirier
  2019-10-22  5:10     ` Leo Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Poirier @ 2019-10-11 20:16 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

On Sat, Oct 05, 2019 at 05:16:09PM +0800, Leo Yan wrote:
> If the u64 variable 'offset' is a negative integer, comparison it with
> bigger than zero is always going to be true because it is unsigned.
> Fix this by using s64 type for variable 'offset'.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4ba0f871f086..4bc2d9709d4f 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -940,7 +940,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
>  static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
>  				     u64 trace_chan_id,
>  				     const struct cs_etm_packet *packet,
> -				     u64 offset)
> +				     s64 offset)

In Suzuki's reply there was two choices, 1) move the while(offset > 0) to
while (offset) or change the type of @offset to an s64.  Here we know offset
can't be negative because of the 
        tidq->period_instructions >= etm->instructions_sample_period 

in function cs_etm__sample().  As such I think option #1 is the right way to
deal with this rather than changing the type of the variable.

Mathieu

>  {
>  	if (packet->isa == CS_ETM_ISA_T32) {
>  		u64 addr = packet->start_addr;
> @@ -1372,7 +1372,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  		 * sample is reported as though instruction has just been
>  		 * executed, but PC has not advanced to next instruction)
>  		 */
> -		u64 offset = (instrs_executed - instrs_over - 1);
> +		s64 offset = (instrs_executed - instrs_over - 1);
>  		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
>  					      tidq->packet, offset);
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack
  2019-10-05  9:16 ` [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
@ 2019-10-11 20:17   ` Mathieu Poirier
  2019-10-15  3:44     ` Leo Yan
  2019-10-22  4:50     ` Leo Yan
  0 siblings, 2 replies; 20+ messages in thread
From: Mathieu Poirier @ 2019-10-11 20:17 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

On Sat, Oct 05, 2019 at 05:16:14PM +0800, Leo Yan wrote:
> The synthesized flow use 'tidq->packet' for instruction samples; on the
> other hand, 'tidp->prev_packet' is used to generate the thread stack and
> the branch samples, this results in the instruction samples using one
> packet ahead than thread stack and branch samples ('tidp->prev_packet'
> vs 'tidq->packet').
> 
> This leads to an instruction's callchain error as shows in below
> example:
> 
>   main  1579        100      instructions:
>   	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
>   	ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
>   	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
>   	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
>   	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
>   	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
>   	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> 
> In the callchain log, for the two continuous lines the up line contains
> one child function info and the followed line contains the caller
> function info, and so forth.  So the first two lines are:
> 
>   perf_event_update_userpage+0x4c  => the sampled instruction
>   perf_event_update_userpage+0x48  => the parent function's calling
> 
> The child function and parent function both are the same function
> perf_event_update_userpage(), but this isn't a recursive function, thus
> the sequence for perf_event_update_userpage() calling itself shouldn't
> never happen.  This callchain error is caused by the instruction sample
> using an ahead packet than the thread stack, the thread stack is deferred
> to process the new packet and misses to pop stack if it is just a return
> packet.
> 
> To fix this issue, we can simply change to use 'tidq->prev_packet' to
> generate the instruction samples, this allows the thread stack to push
> and pop synchronously with instruction sample.  Finally, the callchain
> can be displayed correctly as below:
> 
>   main  1579        100      instructions:
> 	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> 	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> 	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> 	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> 	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> 	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 56e501cd2f5f..fa969dcb45d2 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1419,7 +1419,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  	struct cs_etm_packet *tmp;
>  	int ret;
>  	u8 trace_chan_id = tidq->trace_chan_id;
> -	u64 instrs_executed = tidq->packet->instr_count;
> +	u64 instrs_executed = tidq->prev_packet->instr_count;
>  
>  	tidq->period_instructions += instrs_executed;
>  
> @@ -1450,7 +1450,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  		 */
>  		s64 offset = (instrs_executed - instrs_over - 1);
>  		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -					      tidq->packet, offset);
> +					      tidq->prev_packet, offset);

I have tested this set in --per-thread mode and things are working as
advertised. Did you see how things look like in CPU-wide scenarios?

Thanks,
Mathieu

>  
>  		ret = cs_etm__synth_instruction_sample(
>  			etmq, tidq, addr, etm->instructions_sample_period);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/6] perf cs-etm: Support thread stack
  2019-10-11 17:53   ` Mathieu Poirier
@ 2019-10-15  3:33     ` Leo Yan
  2019-10-22  5:03     ` Leo Yan
  1 sibling, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-15  3:33 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

Hi Mathieu,

On Fri, Oct 11, 2019 at 11:53:53AM -0600, Mathieu Poirier wrote:
> On Sat, Oct 05, 2019 at 05:16:11PM +0800, Leo Yan wrote:
> > Since Arm CoreSight doesn't support thread stack, the decoding cannot
> > display symbols with indented spaces to reflect the stack depth.
> > 
> > This patch adds support thread stack for Arm CoreSight, this allows
> > 'perf script' to display properly for option '-F,+callindent'.
> > 
> > Before:
> > 
> >   # perf script -F,+callindent
> >             main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> >             main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> >             main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> >             main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> >             main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
> >             main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> >   [...]
> > 
> > After:
> > 
> >   # perf script -F,+callindent
> >             main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> >             main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> >             main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> >             main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> >             main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
> >             main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> >   [...]
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 58ceba7b91d5..780abbfd1833 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
> >  			   sample->insn_len, (void *)sample->insn);
> >  }
> >  
> > +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> > +				    struct cs_etm_traceid_queue *tidq)
> > +{
> > +	struct cs_etm_auxtrace *etm = etmq->etm;
> > +	u8 trace_chan_id = tidq->trace_chan_id;
> > +	int insn_len;
> > +	u64 from_ip, to_ip;
> > +
> > +	if (etm->synth_opts.thread_stack) {
> > +		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
> > +		to_ip = cs_etm__first_executed_instr(tidq->packet);
> > +
> > +		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
> > +					      tidq->prev_packet->isa, from_ip);
> > +
> > +		/*
> > +		 * Create thread stacks by keeping track of calls and returns;
> > +		 * any call pushes thread stack, return pops the stack, and
> > +		 * flush stack when the trace is discontinuous.
> > +		 */
> > +		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> > +				    tidq->prev_packet->flags,
> > +				    from_ip, to_ip, insn_len,
> > +				    etmq->buffer->buffer_nr);
> 
> Details are a little fuzzy in my head but I'm pretty sure
> we want trace_chan_id here.  

Thanks a lot for reviewing!

This is good point.  After a quick look for the code, seems
TO_CS_QUEUE_NR() is the right thing to use at here for buffer
number.

Will look more closely to the code and apply changes in next
version.

Thanks,
Leo Yan

> > +	} else {
> > +		/*
> > +		 * The thread stack can be output via thread_stack__process();
> > +		 * thus the detailed information about paired calls and returns
> > +		 * will be facilitated by Python script for the db-export.
> > +		 *
> > +		 * Need to set trace buffer number and flush thread stack if the
> > +		 * trace buffer number has been alternate.
> > +		 */
> > +		thread_stack__set_trace_nr(tidq->thread,
> > +					   tidq->prev_packet->cpu,
> > +					   etmq->buffer->buffer_nr);
> 
> Same here.
> 
> > +	}
> > +}
> > +
> >  static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> >  					    struct cs_etm_traceid_queue *tidq,
> >  					    u64 addr, u64 period)
> > @@ -1393,6 +1432,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  		tidq->period_instructions = instrs_over;
> >  	}
> >  
> > +	if (tidq->prev_packet->last_instr_taken_branch)
> > +		cs_etm__add_stack_event(etmq, tidq);
> > +
> >  	if (etm->sample_branches) {
> >  		bool generate_sample = false;
> >  
> > @@ -2593,6 +2635,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
> >  		itrace_synth_opts__set_default(&etm->synth_opts,
> >  				session->itrace_synth_opts->default_no_sample);
> >  		etm->synth_opts.callchain = false;
> > +		etm->synth_opts.thread_stack =
> > +				session->itrace_synth_opts->thread_stack;
> >  	}
> >  
> >  	err = cs_etm__synth_events(etm, session);
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack
  2019-10-11 20:17   ` Mathieu Poirier
@ 2019-10-15  3:44     ` Leo Yan
  2019-10-22  4:50     ` Leo Yan
  1 sibling, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-15  3:44 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

On Fri, Oct 11, 2019 at 02:17:50PM -0600, Mathieu Poirier wrote:
> On Sat, Oct 05, 2019 at 05:16:14PM +0800, Leo Yan wrote:
> > The synthesized flow use 'tidq->packet' for instruction samples; on the
> > other hand, 'tidp->prev_packet' is used to generate the thread stack and
> > the branch samples, this results in the instruction samples using one
> > packet ahead than thread stack and branch samples ('tidp->prev_packet'
> > vs 'tidq->packet').
> > 
> > This leads to an instruction's callchain error as shows in below
> > example:
> > 
> >   main  1579        100      instructions:
> >   	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> >   	ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
> >   	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> >   	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> >   	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> >   	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> >   	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > 
> > In the callchain log, for the two continuous lines the up line contains
> > one child function info and the followed line contains the caller
> > function info, and so forth.  So the first two lines are:
> > 
> >   perf_event_update_userpage+0x4c  => the sampled instruction
> >   perf_event_update_userpage+0x48  => the parent function's calling
> > 
> > The child function and parent function both are the same function
> > perf_event_update_userpage(), but this isn't a recursive function, thus
> > the sequence for perf_event_update_userpage() calling itself shouldn't
> > never happen.  This callchain error is caused by the instruction sample
> > using an ahead packet than the thread stack, the thread stack is deferred
> > to process the new packet and misses to pop stack if it is just a return
> > packet.
> > 
> > To fix this issue, we can simply change to use 'tidq->prev_packet' to
> > generate the instruction samples, this allows the thread stack to push
> > and pop synchronously with instruction sample.  Finally, the callchain
> > can be displayed correctly as below:
> > 
> >   main  1579        100      instructions:
> > 	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> > 	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> > 	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> > 	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> > 	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> > 	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 56e501cd2f5f..fa969dcb45d2 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1419,7 +1419,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  	struct cs_etm_packet *tmp;
> >  	int ret;
> >  	u8 trace_chan_id = tidq->trace_chan_id;
> > -	u64 instrs_executed = tidq->packet->instr_count;
> > +	u64 instrs_executed = tidq->prev_packet->instr_count;
> >  
> >  	tidq->period_instructions += instrs_executed;
> >  
> > @@ -1450,7 +1450,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  		 */
> >  		s64 offset = (instrs_executed - instrs_over - 1);
> >  		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -					      tidq->packet, offset);
> > +					      tidq->prev_packet, offset);
> 
> I have tested this set in --per-thread mode and things are working as
> advertised. Did you see how things look like in CPU-wide scenarios?

I am not sure I checked the callchain for CPU-wide scenarios, but I
didn't pay attention for the case when the trace data is switching
between CPUs, especially if connect with your comment in patch 03,
with wrong buffer ID it might cause the wrong callchain for CPU-wide
scenarios.

Will do more testing for CPU-wide scenarios in next spin.

Thanks for the suggestions.
Leo Yan

> Thanks,
> Mathieu
> 
> >  
> >  		ret = cs_etm__synth_instruction_sample(
> >  			etmq, tidq, addr, etm->instructions_sample_period);
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack
  2019-10-11 20:17   ` Mathieu Poirier
  2019-10-15  3:44     ` Leo Yan
@ 2019-10-22  4:50     ` Leo Yan
  1 sibling, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-22  4:50 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

Hi Mathieu,

On Fri, Oct 11, 2019 at 02:17:50PM -0600, Mathieu Poirier wrote:
> On Sat, Oct 05, 2019 at 05:16:14PM +0800, Leo Yan wrote:
> > The synthesized flow use 'tidq->packet' for instruction samples; on the
> > other hand, 'tidp->prev_packet' is used to generate the thread stack and
> > the branch samples, this results in the instruction samples using one
> > packet ahead than thread stack and branch samples ('tidp->prev_packet'
> > vs 'tidq->packet').
> > 
> > This leads to an instruction's callchain error as shows in below
> > example:
> > 
> >   main  1579        100      instructions:
> >   	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> >   	ffff000010214850 perf_event_update_userpage+0x48 ([kernel.kallsyms])
> >   	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> >   	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> >   	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> >   	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> >   	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > 
> > In the callchain log, for the two continuous lines the up line contains
> > one child function info and the followed line contains the caller
> > function info, and so forth.  So the first two lines are:
> > 
> >   perf_event_update_userpage+0x4c  => the sampled instruction
> >   perf_event_update_userpage+0x48  => the parent function's calling
> > 
> > The child function and parent function both are the same function
> > perf_event_update_userpage(), but this isn't a recursive function, thus
> > the sequence for perf_event_update_userpage() calling itself shouldn't
> > never happen.  This callchain error is caused by the instruction sample
> > using an ahead packet than the thread stack, the thread stack is deferred
> > to process the new packet and misses to pop stack if it is just a return
> > packet.
> > 
> > To fix this issue, we can simply change to use 'tidq->prev_packet' to
> > generate the instruction samples, this allows the thread stack to push
> > and pop synchronously with instruction sample.  Finally, the callchain
> > can be displayed correctly as below:
> > 
> >   main  1579        100      instructions:
> > 	ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
> > 	ffff000010219360 perf_swevent_add+0x88 ([kernel.kallsyms])
> > 	ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
> > 	ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
> > 	ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
> > 	ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 56e501cd2f5f..fa969dcb45d2 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1419,7 +1419,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  	struct cs_etm_packet *tmp;
> >  	int ret;
> >  	u8 trace_chan_id = tidq->trace_chan_id;
> > -	u64 instrs_executed = tidq->packet->instr_count;
> > +	u64 instrs_executed = tidq->prev_packet->instr_count;
> >  
> >  	tidq->period_instructions += instrs_executed;
> >  
> > @@ -1450,7 +1450,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  		 */
> >  		s64 offset = (instrs_executed - instrs_over - 1);
> >  		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -					      tidq->packet, offset);
> > +					      tidq->prev_packet, offset);
> 
> I have tested this set in --per-thread mode and things are working as
> advertised. Did you see how things look like in CPU-wide scenarios?

After some testing, I can confirm this patch set can works well for
CPU-wide trace; the reason is in the arch/arm/util/cs-etm.c, function
cs_etm_recording_options() has enabled option 'ETM_OPT_CTXTID' for
CPU-wide trace:

         /*
          * In the case of per-cpu mmaps, we need the CPU on the
          * AUX event.  We also need the contextID in order to be notified
          * when a context switch happened.
          */
         if (!perf_cpu_map__empty(cpus)) {
                 perf_evsel__set_sample_bit(cs_etm_evsel, CPU);

                 err = cs_etm_set_option(itr, cs_etm_evsel,
                                         ETM_OPT_CTXTID | ETM_OPT_TS);
                 if (err)
                         goto out;
         }

As result, we don't need to specify extra option to enable CTXID
configuration.  So below two commands have the same behaviour:

  # perf record -e cs_etm/@tmc_etr0/ -a -- sh test.sh
  # perf record -e cs_etm/@tmc_etr0,config=0x4000/ -a -- sh test.sh
                                       `-> bit 14: for ETM_OPT_CTXTID

Since the decoding will set tid when receive the packet
'OCSD_GEN_TRC_ELEM_PE_CONTEXT', thus it can give the correct tid/pid
info for threads.  This allows to generate per thread stack base on
thread->tid and avoid mixing info cross different threads.

Thanks,
Leo Yan

> >  
> >  		ret = cs_etm__synth_instruction_sample(
> >  			etmq, tidq, addr, etm->instructions_sample_period);
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/6] perf cs-etm: Support thread stack
  2019-10-11 17:53   ` Mathieu Poirier
  2019-10-15  3:33     ` Leo Yan
@ 2019-10-22  5:03     ` Leo Yan
  2019-10-28 22:43       ` Mathieu Poirier
  1 sibling, 1 reply; 20+ messages in thread
From: Leo Yan @ 2019-10-22  5:03 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

Hi Mathieu,

On Fri, Oct 11, 2019 at 11:53:53AM -0600, Mathieu Poirier wrote:
> On Sat, Oct 05, 2019 at 05:16:11PM +0800, Leo Yan wrote:
> > Since Arm CoreSight doesn't support thread stack, the decoding cannot
> > display symbols with indented spaces to reflect the stack depth.
> > 
> > This patch adds support thread stack for Arm CoreSight, this allows
> > 'perf script' to display properly for option '-F,+callindent'.
> > 
> > Before:
> > 
> >   # perf script -F,+callindent
> >             main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> >             main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> >             main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> >             main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> >             main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
> >             main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> >   [...]
> > 
> > After:
> > 
> >   # perf script -F,+callindent
> >             main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> >             main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> >             main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> >             main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> >             main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
> >             main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> >   [...]
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 58ceba7b91d5..780abbfd1833 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
> >  			   sample->insn_len, (void *)sample->insn);
> >  }
> >  
> > +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> > +				    struct cs_etm_traceid_queue *tidq)
> > +{
> > +	struct cs_etm_auxtrace *etm = etmq->etm;
> > +	u8 trace_chan_id = tidq->trace_chan_id;
> > +	int insn_len;
> > +	u64 from_ip, to_ip;
> > +
> > +	if (etm->synth_opts.thread_stack) {
> > +		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
> > +		to_ip = cs_etm__first_executed_instr(tidq->packet);
> > +
> > +		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
> > +					      tidq->prev_packet->isa, from_ip);
> > +
> > +		/*
> > +		 * Create thread stacks by keeping track of calls and returns;
> > +		 * any call pushes thread stack, return pops the stack, and
> > +		 * flush stack when the trace is discontinuous.
> > +		 */
> > +		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> > +				    tidq->prev_packet->flags,
> > +				    from_ip, to_ip, insn_len,
> > +				    etmq->buffer->buffer_nr);
> 
> Details are a little fuzzy in my head but I'm pretty sure
> we want trace_chan_id here.  

I spent some time to look into this question, and I think we don't
need to add extra info for trace_chan_id.

The main reason is for CPU wide tracing, if one task is migrated from
CPU_a to CPU_b, if we append 'trace_chan_id' for the buffer number, then
it will tell the thread_stack that the buffer has been changed (or it
will be considered the trace is discontinuous), then thread stack will
be flushed.  Actually, this is not what we want; if a task is migrated
from one CPU to another, we still need to keep its thread stack if the
trace data comes from the same buffer_nr.

To be honest, I struggled to understand what's the purpose for
'buffer->buffer_nr', from the code, I think 'buffer->buffer_nr' is
mainly used to trace the splitted buffers (e.g. the buffers are splitted
into different queues so the trace data coming from different trace
chunk?).  Now I observe 'buffer->buffer_nr' is always zero since the
buffer is not used with splitted mode.  If later we support 1:1 map
between tracers and sinks, then we need to set 'buffer->buffer_nr' so
can reflect the correct buffer mapping, but we don't need to use
trace_chan_id as extra info at here.

Please let me know what you think about this?  If you agree with this,
I will send out patch v4 soon with addressing other comments.

Thanks,
Leo Yan

> > +	} else {
> > +		/*
> > +		 * The thread stack can be output via thread_stack__process();
> > +		 * thus the detailed information about paired calls and returns
> > +		 * will be facilitated by Python script for the db-export.
> > +		 *
> > +		 * Need to set trace buffer number and flush thread stack if the
> > +		 * trace buffer number has been alternate.
> > +		 */
> > +		thread_stack__set_trace_nr(tidq->thread,
> > +					   tidq->prev_packet->cpu,
> > +					   etmq->buffer->buffer_nr);
> 
> Same here.
> 
> > +	}
> > +}
> > +
> >  static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> >  					    struct cs_etm_traceid_queue *tidq,
> >  					    u64 addr, u64 period)
> > @@ -1393,6 +1432,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  		tidq->period_instructions = instrs_over;
> >  	}
> >  
> > +	if (tidq->prev_packet->last_instr_taken_branch)
> > +		cs_etm__add_stack_event(etmq, tidq);
> > +
> >  	if (etm->sample_branches) {
> >  		bool generate_sample = false;
> >  
> > @@ -2593,6 +2635,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
> >  		itrace_synth_opts__set_default(&etm->synth_opts,
> >  				session->itrace_synth_opts->default_no_sample);
> >  		etm->synth_opts.callchain = false;
> > +		etm->synth_opts.thread_stack =
> > +				session->itrace_synth_opts->thread_stack;
> >  	}
> >  
> >  	err = cs_etm__synth_events(etm, session);
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero
  2019-10-11 20:16   ` Mathieu Poirier
@ 2019-10-22  5:10     ` Leo Yan
  2019-10-22 23:36       ` Mike Leach
  0 siblings, 1 reply; 20+ messages in thread
From: Leo Yan @ 2019-10-22  5:10 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-kernel, Mike Leach, Coresight ML, Peter Zijlstra,
	Ingo Molnar

Hi Mathieu,

On Fri, Oct 11, 2019 at 02:16:06PM -0600, Mathieu Poirier wrote:
> On Sat, Oct 05, 2019 at 05:16:09PM +0800, Leo Yan wrote:
> > If the u64 variable 'offset' is a negative integer, comparison it with
> > bigger than zero is always going to be true because it is unsigned.
> > Fix this by using s64 type for variable 'offset'.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 4ba0f871f086..4bc2d9709d4f 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -940,7 +940,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
> >  static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
> >  				     u64 trace_chan_id,
> >  				     const struct cs_etm_packet *packet,
> > -				     u64 offset)
> > +				     s64 offset)
> 
> In Suzuki's reply there was two choices, 1) move the while(offset > 0) to
> while (offset) or change the type of @offset to an s64.  Here we know offset
> can't be negative because of the 
>         tidq->period_instructions >= etm->instructions_sample_period 
> 
> in function cs_etm__sample().  As such I think option #1 is the right way to
> deal with this rather than changing the type of the variable.

I took sometime to use formulas to prove that 'offset' is possible to
be a negative value :)

Just paste the updated commit log at here for review:

  Pi: period_instructions
  Ie: instrs_executed
  Io: instrs_over
  Ip: instructions_sample_period

  Pi' = Pi + Ie   -> New period_instructions equals to the old
                     period_instructions + instrs_executed
  Io  = Pi' - Ip  -> period_instructions - instructions_sample_period

  offset = Ie - Io - 1
         = Ie - (Pi' - Ip) -1
	 = Ie - (Pi + Ie - Ip) -1
	 = Ip - Pi - 1

In theory, if Ip (instructions_sample_period) is small enough and Pi
(period_instructions) is bigger than Ip, then it will lead to the
negative value for 'offset'.

So let's see below command:

  perf inject --itrace=i1il128 -i perf.data -o perf.data.new

With this command, 'offset' is very easily to be a negative value when
handling packets; this is because if use the inject option 'i1', then
instructions_sample_period equals to 1; so:

  offset = 1 - Pi - 1
         = -Pi

Any Pi bigger than zero leads 'offset' to a negative value.

Thanks,
Leo Yan

> >  {
> >  	if (packet->isa == CS_ETM_ISA_T32) {
> >  		u64 addr = packet->start_addr;
> > @@ -1372,7 +1372,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  		 * sample is reported as though instruction has just been
> >  		 * executed, but PC has not advanced to next instruction)
> >  		 */
> > -		u64 offset = (instrs_executed - instrs_over - 1);
> > +		s64 offset = (instrs_executed - instrs_over - 1);
> >  		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> >  					      tidq->packet, offset);
> >  
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero
  2019-10-22  5:10     ` Leo Yan
@ 2019-10-22 23:36       ` Mike Leach
  2019-10-23  6:49         ` Leo Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Leach @ 2019-10-22 23:36 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mathieu Poirier, Arnaldo Carvalho de Melo, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML, Peter Zijlstra,
	Ingo Molnar

Hi Leo,

Two points here - both related.

On Tue, 22 Oct 2019 at 06:10, Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Mathieu,
>
> On Fri, Oct 11, 2019 at 02:16:06PM -0600, Mathieu Poirier wrote:
> > On Sat, Oct 05, 2019 at 05:16:09PM +0800, Leo Yan wrote:
> > > If the u64 variable 'offset' is a negative integer, comparison it with
> > > bigger than zero is always going to be true because it is unsigned.
> > > Fix this by using s64 type for variable 'offset'.
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >  tools/perf/util/cs-etm.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > index 4ba0f871f086..4bc2d9709d4f 100644
> > > --- a/tools/perf/util/cs-etm.c
> > > +++ b/tools/perf/util/cs-etm.c
> > > @@ -940,7 +940,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
> > >  static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
> > >                                  u64 trace_chan_id,
> > >                                  const struct cs_etm_packet *packet,
> > > -                                u64 offset)
> > > +                                s64 offset)
> >
Issue 1:

OK - it appears that cs_etm__instr_addr() is supposed to be returning
the address within the current trace sample of the instruction related
to offset.
For T32 - then if offset < 0, packet->start_addr is returned - not
good but at least within the current trace range
For A32/A64 - if offset < 0 then an address _before_
packet->start_addr is returned - clearly wrong and possibly a
completely invalid address that was never actually traced.

> > In Suzuki's reply there was two choices, 1) move the while(offset > 0) to
> > while (offset) or change the type of @offset to an s64.  Here we know offset
> > can't be negative because of the
> >         tidq->period_instructions >= etm->instructions_sample_period
> >
> > in function cs_etm__sample().  As such I think option #1 is the right way to
> > deal with this rather than changing the type of the variable.
>
> I took sometime to use formulas to prove that 'offset' is possible to
> be a negative value :)
>
> Just paste the updated commit log at here for review:
>
>   Pi: period_instructions
>   Ie: instrs_executed
>   Io: instrs_over
>   Ip: instructions_sample_period
>
>   Pi' = Pi + Ie   -> New period_instructions equals to the old
>                      period_instructions + instrs_executed
>   Io  = Pi' - Ip  -> period_instructions - instructions_sample_period
>
>   offset = Ie - Io - 1
>          = Ie - (Pi' - Ip) -1
>          = Ie - (Pi + Ie - Ip) -1
>          = Ip - Pi - 1
>
> In theory, if Ip (instructions_sample_period) is small enough and Pi
> (period_instructions) is bigger than Ip, then it will lead to the
> negative value for 'offset'.
>
> So let's see below command:
>
>   perf inject --itrace=i1il128 -i perf.data -o perf.data.new
>
> With this command, 'offset' is very easily to be a negative value when
> handling packets; this is because if use the inject option 'i1', then
> instructions_sample_period equals to 1; so:
>
>   offset = 1 - Pi - 1
>          = -Pi
>
> Any Pi bigger than zero leads 'offset' to a negative value.
>
> Thanks,
> Leo Yan
>

Issue 2:

Assuming I have understood the logic of this code correctly - there is
an issue were sample_period < period_instructions as you say -
but I believe the problem is in the logic of the sampling function itself.

Suppose we have a sample_period of 4.

Now on an initial pass through the function, period_instructions must
be 0. (i.e. none left over from the previous pass.)
Suppose also that the number of instructions executed in this sample
is 10 - thus updating period_instructions.
Therefore:
instr_over = 10 - 4 -> 6
offset = 10 - 6 - 1 -> 3.
We therefore call cs_etm_instr_addr to find the address an offset of 3
instructions from the start of the trace sample and synthesize the
sample.
After this we set period_instructions to the instr_over value of 6.

Next pass, assume 10 instructions in the trace sample again.
period_instructions = 6 + 10 -> 16
instr_over = 16 - 4 -> 12
offset = 10 - 12 - 1 -> -3  - the negative value your formulae predict.

This implies that the sample we want is actually in the previous trace
packet - which I believe is in fact the case - as explained below.

My reading of the code is that cs_etm__sample() is called once per
trace range packet extracted from the decoder - and a trace range
packet represents N instructions_executed.
Further I am assuming that instructions_sample_period represents the
desired periodicity of the instruction samples - i.e. 1 sample every
instructions_sample_period number of instructions.

Thus my conclusion here is that where M = instructions_executed +
period_instructions, the function should generate quotient ( M /
instructions_sample_period ) samples and set period_instructions to M
mod instructions_sample_period on exit, ensuring period_instructions
is never larger than the sample_period.

i.e. loop to generate multiple samples until instr_over and therefore
the output value of period_instructions is less than the value of
instructions_sample_period - for the example above, with 10
instructions and a periodicity of 4, we generate 2 samples with a
remainder of 2 instructions carried forwards.

In short leave offset as unsigned and fix the logic of the
cs_etm__sample() function.

Regards

Mike

> > >  {
> > >     if (packet->isa == CS_ETM_ISA_T32) {
> > >             u64 addr = packet->start_addr;
> > > @@ -1372,7 +1372,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >              * sample is reported as though instruction has just been
> > >              * executed, but PC has not advanced to next instruction)
> > >              */
> > > -           u64 offset = (instrs_executed - instrs_over - 1);
> > > +           s64 offset = (instrs_executed - instrs_over - 1);
> > >             u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > >                                           tidq->packet, offset);
> > >
> > > --
> > > 2.17.1
> > >



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero
  2019-10-22 23:36       ` Mike Leach
@ 2019-10-23  6:49         ` Leo Yan
  0 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-23  6:49 UTC (permalink / raw)
  To: Mike Leach
  Cc: Mathieu Poirier, Arnaldo Carvalho de Melo, Suzuki K Poulose,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML, Peter Zijlstra,
	Ingo Molnar

Hi Mike,

On Wed, Oct 23, 2019 at 12:36:39AM +0100, Mike Leach wrote:
> Hi Leo,
> 
> Two points here - both related.
> 
> On Tue, 22 Oct 2019 at 06:10, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Hi Mathieu,
> >
> > On Fri, Oct 11, 2019 at 02:16:06PM -0600, Mathieu Poirier wrote:
> > > On Sat, Oct 05, 2019 at 05:16:09PM +0800, Leo Yan wrote:
> > > > If the u64 variable 'offset' is a negative integer, comparison it with
> > > > bigger than zero is always going to be true because it is unsigned.
> > > > Fix this by using s64 type for variable 'offset'.
> > > >
> > > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > > ---
> > > >  tools/perf/util/cs-etm.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > > index 4ba0f871f086..4bc2d9709d4f 100644
> > > > --- a/tools/perf/util/cs-etm.c
> > > > +++ b/tools/perf/util/cs-etm.c
> > > > @@ -940,7 +940,7 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
> > > >  static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
> > > >                                  u64 trace_chan_id,
> > > >                                  const struct cs_etm_packet *packet,
> > > > -                                u64 offset)
> > > > +                                s64 offset)
> > >
> Issue 1:
> 
> OK - it appears that cs_etm__instr_addr() is supposed to be returning
> the address within the current trace sample of the instruction related
> to offset.
> For T32 - then if offset < 0, packet->start_addr is returned - not
> good but at least within the current trace range
> For A32/A64 - if offset < 0 then an address _before_
> packet->start_addr is returned - clearly wrong and possibly a
> completely invalid address that was never actually traced.

Exactly, if offset < 0 it might output the incorrect trace.

> > > In Suzuki's reply there was two choices, 1) move the while(offset > 0) to
> > > while (offset) or change the type of @offset to an s64.  Here we know offset
> > > can't be negative because of the
> > >         tidq->period_instructions >= etm->instructions_sample_period
> > >
> > > in function cs_etm__sample().  As such I think option #1 is the right way to
> > > deal with this rather than changing the type of the variable.
> >
> > I took sometime to use formulas to prove that 'offset' is possible to
> > be a negative value :)
> >
> > Just paste the updated commit log at here for review:
> >
> >   Pi: period_instructions
> >   Ie: instrs_executed
> >   Io: instrs_over
> >   Ip: instructions_sample_period
> >
> >   Pi' = Pi + Ie   -> New period_instructions equals to the old
> >                      period_instructions + instrs_executed
> >   Io  = Pi' - Ip  -> period_instructions - instructions_sample_period
> >
> >   offset = Ie - Io - 1
> >          = Ie - (Pi' - Ip) -1
> >          = Ie - (Pi + Ie - Ip) -1
> >          = Ip - Pi - 1
> >
> > In theory, if Ip (instructions_sample_period) is small enough and Pi
> > (period_instructions) is bigger than Ip, then it will lead to the
> > negative value for 'offset'.
> >
> > So let's see below command:
> >
> >   perf inject --itrace=i1il128 -i perf.data -o perf.data.new
> >
> > With this command, 'offset' is very easily to be a negative value when
> > handling packets; this is because if use the inject option 'i1', then
> > instructions_sample_period equals to 1; so:
> >
> >   offset = 1 - Pi - 1
> >          = -Pi
> >
> > Any Pi bigger than zero leads 'offset' to a negative value.
> >
> > Thanks,
> > Leo Yan
> >
> 
> Issue 2:
> 
> Assuming I have understood the logic of this code correctly - there is
> an issue were sample_period < period_instructions as you say -
> but I believe the problem is in the logic of the sampling function itself.
> 
> Suppose we have a sample_period of 4.
> 
> Now on an initial pass through the function, period_instructions must
> be 0. (i.e. none left over from the previous pass.)
> Suppose also that the number of instructions executed in this sample
> is 10 - thus updating period_instructions.
> Therefore:
> instr_over = 10 - 4 -> 6
> offset = 10 - 6 - 1 -> 3.
> We therefore call cs_etm_instr_addr to find the address an offset of 3
> instructions from the start of the trace sample and synthesize the
> sample.
> After this we set period_instructions to the instr_over value of 6.
> 
> Next pass, assume 10 instructions in the trace sample again.
> period_instructions = 6 + 10 -> 16
> instr_over = 16 - 4 -> 12
> offset = 10 - 12 - 1 -> -3  - the negative value your formulae predict.
> 
> This implies that the sample we want is actually in the previous trace
> packet - which I believe is in fact the case - as explained below.
> 
> My reading of the code is that cs_etm__sample() is called once per
> trace range packet extracted from the decoder - and a trace range
> packet represents N instructions_executed.
> Further I am assuming that instructions_sample_period represents the
> desired periodicity of the instruction samples - i.e. 1 sample every
> instructions_sample_period number of instructions.

Good point.  Yeah, this is the root cause.

> Thus my conclusion here is that where M = instructions_executed +
> period_instructions, the function should generate quotient ( M /
> instructions_sample_period ) samples and set period_instructions to M
> mod instructions_sample_period on exit, ensuring period_instructions
> is never larger than the sample_period.

Totally agree with this; we should generate synthetic samples without
dropping trace data.

> i.e. loop to generate multiple samples until instr_over and therefore
> the output value of period_instructions is less than the value of
> instructions_sample_period - for the example above, with 10
> instructions and a periodicity of 4, we generate 2 samples with a
> remainder of 2 instructions carried forwards.
> 
> In short leave offset as unsigned and fix the logic of the
> cs_etm__sample() function.

Will follow up this suggestion.

Very appreciate your time to review and gave out much reasonable
solution!

Thanks,
Leo Yan

> > > >  {
> > > >     if (packet->isa == CS_ETM_ISA_T32) {
> > > >             u64 addr = packet->start_addr;
> > > > @@ -1372,7 +1372,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > > >              * sample is reported as though instruction has just been
> > > >              * executed, but PC has not advanced to next instruction)
> > > >              */
> > > > -           u64 offset = (instrs_executed - instrs_over - 1);
> > > > +           s64 offset = (instrs_executed - instrs_over - 1);
> > > >             u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > > >                                           tidq->packet, offset);
> > > >
> > > > --
> > > > 2.17.1
> > > >
> 
> 
> 
> --
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/6] perf cs-etm: Support thread stack
  2019-10-22  5:03     ` Leo Yan
@ 2019-10-28 22:43       ` Mathieu Poirier
  2019-10-29  4:11         ` Leo Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Mathieu Poirier @ 2019-10-28 22:43 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	Linux Kernel Mailing List, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar

On Mon, 21 Oct 2019 at 23:03, Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Mathieu,
>
> On Fri, Oct 11, 2019 at 11:53:53AM -0600, Mathieu Poirier wrote:
> > On Sat, Oct 05, 2019 at 05:16:11PM +0800, Leo Yan wrote:
> > > Since Arm CoreSight doesn't support thread stack, the decoding cannot
> > > display symbols with indented spaces to reflect the stack depth.
> > >
> > > This patch adds support thread stack for Arm CoreSight, this allows
> > > 'perf script' to display properly for option '-F,+callindent'.
> > >
> > > Before:
> > >
> > >   # perf script -F,+callindent
> > >             main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> > >             main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> > >             main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> > >             main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> > >             main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
> > >             main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> > >   [...]
> > >
> > > After:
> > >
> > >   # perf script -F,+callindent
> > >             main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> > >             main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> > >             main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> > >             main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> > >             main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
> > >             main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> > >   [...]
> > >
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >  tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 44 insertions(+)
> > >
> > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > index 58ceba7b91d5..780abbfd1833 100644
> > > --- a/tools/perf/util/cs-etm.c
> > > +++ b/tools/perf/util/cs-etm.c
> > > @@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
> > >                        sample->insn_len, (void *)sample->insn);
> > >  }
> > >
> > > +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> > > +                               struct cs_etm_traceid_queue *tidq)
> > > +{
> > > +   struct cs_etm_auxtrace *etm = etmq->etm;
> > > +   u8 trace_chan_id = tidq->trace_chan_id;
> > > +   int insn_len;
> > > +   u64 from_ip, to_ip;
> > > +
> > > +   if (etm->synth_opts.thread_stack) {
> > > +           from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
> > > +           to_ip = cs_etm__first_executed_instr(tidq->packet);
> > > +
> > > +           insn_len = cs_etm__instr_size(etmq, trace_chan_id,
> > > +                                         tidq->prev_packet->isa, from_ip);
> > > +
> > > +           /*
> > > +            * Create thread stacks by keeping track of calls and returns;
> > > +            * any call pushes thread stack, return pops the stack, and
> > > +            * flush stack when the trace is discontinuous.
> > > +            */
> > > +           thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> > > +                               tidq->prev_packet->flags,
> > > +                               from_ip, to_ip, insn_len,
> > > +                               etmq->buffer->buffer_nr);
> >
> > Details are a little fuzzy in my head but I'm pretty sure
> > we want trace_chan_id here.
>
> I spent some time to look into this question, and I think we don't
> need to add extra info for trace_chan_id.
>
> The main reason is for CPU wide tracing, if one task is migrated from
> CPU_a to CPU_b, if we append 'trace_chan_id' for the buffer number, then
> it will tell the thread_stack that the buffer has been changed (or it
> will be considered the trace is discontinuous), then thread stack will
> be flushed.  Actually, this is not what we want; if a task is migrated
> from one CPU to another, we still need to keep its thread stack if the
> trace data comes from the same buffer_nr.

After reviewing the code I conclude that using etmq->buffer->buffer_nr
is the correct way to proceed.

That being said you have sent this new set [1], which is a rework of
some of the code you have in the current set.  As such the only way
forward is for you to wait until [1] I has been applied and rebase the
remaining work in this set on top of it.

Let me know if you have questions.

Thanks,
Mathieu

[1]. https://patchwork.kernel.org/cover/11130213/

>
> To be honest, I struggled to understand what's the purpose for
> 'buffer->buffer_nr', from the code, I think 'buffer->buffer_nr' is
> mainly used to trace the splitted buffers (e.g. the buffers are splitted
> into different queues so the trace data coming from different trace
> chunk?).  Now I observe 'buffer->buffer_nr' is always zero since the
> buffer is not used with splitted mode.  If later we support 1:1 map
> between tracers and sinks, then we need to set 'buffer->buffer_nr' so
> can reflect the correct buffer mapping, but we don't need to use
> trace_chan_id as extra info at here.
>
> Please let me know what you think about this?  If you agree with this,
> I will send out patch v4 soon with addressing other comments.
>
> Thanks,
> Leo Yan
>
> > > +   } else {
> > > +           /*
> > > +            * The thread stack can be output via thread_stack__process();
> > > +            * thus the detailed information about paired calls and returns
> > > +            * will be facilitated by Python script for the db-export.
> > > +            *
> > > +            * Need to set trace buffer number and flush thread stack if the
> > > +            * trace buffer number has been alternate.
> > > +            */
> > > +           thread_stack__set_trace_nr(tidq->thread,
> > > +                                      tidq->prev_packet->cpu,
> > > +                                      etmq->buffer->buffer_nr);
> >
> > Same here.
> >
> > > +   }
> > > +}
> > > +
> > >  static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> > >                                         struct cs_etm_traceid_queue *tidq,
> > >                                         u64 addr, u64 period)
> > > @@ -1393,6 +1432,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > >             tidq->period_instructions = instrs_over;
> > >     }
> > >
> > > +   if (tidq->prev_packet->last_instr_taken_branch)
> > > +           cs_etm__add_stack_event(etmq, tidq);
> > > +
> > >     if (etm->sample_branches) {
> > >             bool generate_sample = false;
> > >
> > > @@ -2593,6 +2635,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
> > >             itrace_synth_opts__set_default(&etm->synth_opts,
> > >                             session->itrace_synth_opts->default_no_sample);
> > >             etm->synth_opts.callchain = false;
> > > +           etm->synth_opts.thread_stack =
> > > +                           session->itrace_synth_opts->thread_stack;
> > >     }
> > >
> > >     err = cs_etm__synth_events(etm, session);
> > > --
> > > 2.17.1
> > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/6] perf cs-etm: Support thread stack
  2019-10-28 22:43       ` Mathieu Poirier
@ 2019-10-29  4:11         ` Leo Yan
  0 siblings, 0 replies; 20+ messages in thread
From: Leo Yan @ 2019-10-29  4:11 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	Linux Kernel Mailing List, Mike Leach, Coresight ML,
	Peter Zijlstra, Ingo Molnar

Hi Mathieu,

On Mon, Oct 28, 2019 at 04:43:57PM -0600, Mathieu Poirier wrote:
> On Mon, 21 Oct 2019 at 23:03, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Hi Mathieu,
> >
> > On Fri, Oct 11, 2019 at 11:53:53AM -0600, Mathieu Poirier wrote:
> > > On Sat, Oct 05, 2019 at 05:16:11PM +0800, Leo Yan wrote:
> > > > Since Arm CoreSight doesn't support thread stack, the decoding cannot
> > > > display symbols with indented spaces to reflect the stack depth.
> > > >
> > > > This patch adds support thread stack for Arm CoreSight, this allows
> > > > 'perf script' to display properly for option '-F,+callindent'.
> > > >
> > > > Before:
> > > >
> > > >   # perf script -F,+callindent
> > > >             main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> > > >             main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> > > >             main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> > > >             main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> > > >             main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
> > > >             main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> > > >   [...]
> > > >
> > > > After:
> > > >
> > > >   # perf script -F,+callindent
> > > >             main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
> > > >             main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
> > > >             main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
> > > >             main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
> > > >             main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
> > > >             main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
> > > >   [...]
> > > >
> > > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > > ---
> > > >  tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 44 insertions(+)
> > > >
> > > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > > index 58ceba7b91d5..780abbfd1833 100644
> > > > --- a/tools/perf/util/cs-etm.c
> > > > +++ b/tools/perf/util/cs-etm.c
> > > > @@ -1117,6 +1117,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
> > > >                        sample->insn_len, (void *)sample->insn);
> > > >  }
> > > >
> > > > +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> > > > +                               struct cs_etm_traceid_queue *tidq)
> > > > +{
> > > > +   struct cs_etm_auxtrace *etm = etmq->etm;
> > > > +   u8 trace_chan_id = tidq->trace_chan_id;
> > > > +   int insn_len;
> > > > +   u64 from_ip, to_ip;
> > > > +
> > > > +   if (etm->synth_opts.thread_stack) {
> > > > +           from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
> > > > +           to_ip = cs_etm__first_executed_instr(tidq->packet);
> > > > +
> > > > +           insn_len = cs_etm__instr_size(etmq, trace_chan_id,
> > > > +                                         tidq->prev_packet->isa, from_ip);
> > > > +
> > > > +           /*
> > > > +            * Create thread stacks by keeping track of calls and returns;
> > > > +            * any call pushes thread stack, return pops the stack, and
> > > > +            * flush stack when the trace is discontinuous.
> > > > +            */
> > > > +           thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> > > > +                               tidq->prev_packet->flags,
> > > > +                               from_ip, to_ip, insn_len,
> > > > +                               etmq->buffer->buffer_nr);
> > >
> > > Details are a little fuzzy in my head but I'm pretty sure
> > > we want trace_chan_id here.
> >
> > I spent some time to look into this question, and I think we don't
> > need to add extra info for trace_chan_id.
> >
> > The main reason is for CPU wide tracing, if one task is migrated from
> > CPU_a to CPU_b, if we append 'trace_chan_id' for the buffer number, then
> > it will tell the thread_stack that the buffer has been changed (or it
> > will be considered the trace is discontinuous), then thread stack will
> > be flushed.  Actually, this is not what we want; if a task is migrated
> > from one CPU to another, we still need to keep its thread stack if the
> > trace data comes from the same buffer_nr.
> 
> After reviewing the code I conclude that using etmq->buffer->buffer_nr
> is the correct way to proceed.

Thanks for reviewing and confirmation.

> That being said you have sent this new set [1], which is a rework of
> some of the code you have in the current set.  As such the only way
> forward is for you to wait until [1] I has been applied and rebase the
> remaining work in this set on top of it.

Right.

Seems the shared link is incorrect :)  Let's firstly focus on the patch
set: 'perf cs-etm: Fix synthesizing instruction samples' [2] and after
it is merged I will send new patch set for cs-etm callchain support as
soon as possible.

Thanks,
Leo Yan

[2] https://patchwork.kernel.org/cover/11209991/

> Let me know if you have questions.
> 
> Thanks,
> Mathieu
> 
> [1]. https://patchwork.kernel.org/cover/11130213/
> 
> >
> > To be honest, I struggled to understand what's the purpose for
> > 'buffer->buffer_nr', from the code, I think 'buffer->buffer_nr' is
> > mainly used to trace the splitted buffers (e.g. the buffers are splitted
> > into different queues so the trace data coming from different trace
> > chunk?).  Now I observe 'buffer->buffer_nr' is always zero since the
> > buffer is not used with splitted mode.  If later we support 1:1 map
> > between tracers and sinks, then we need to set 'buffer->buffer_nr' so
> > can reflect the correct buffer mapping, but we don't need to use
> > trace_chan_id as extra info at here.
> >
> > Please let me know what you think about this?  If you agree with this,
> > I will send out patch v4 soon with addressing other comments.
> >
> > Thanks,
> > Leo Yan
> >
> > > > +   } else {
> > > > +           /*
> > > > +            * The thread stack can be output via thread_stack__process();
> > > > +            * thus the detailed information about paired calls and returns
> > > > +            * will be facilitated by Python script for the db-export.
> > > > +            *
> > > > +            * Need to set trace buffer number and flush thread stack if the
> > > > +            * trace buffer number has been alternate.
> > > > +            */
> > > > +           thread_stack__set_trace_nr(tidq->thread,
> > > > +                                      tidq->prev_packet->cpu,
> > > > +                                      etmq->buffer->buffer_nr);
> > >
> > > Same here.
> > >
> > > > +   }
> > > > +}
> > > > +
> > > >  static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> > > >                                         struct cs_etm_traceid_queue *tidq,
> > > >                                         u64 addr, u64 period)
> > > > @@ -1393,6 +1432,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > > >             tidq->period_instructions = instrs_over;
> > > >     }
> > > >
> > > > +   if (tidq->prev_packet->last_instr_taken_branch)
> > > > +           cs_etm__add_stack_event(etmq, tidq);
> > > > +
> > > >     if (etm->sample_branches) {
> > > >             bool generate_sample = false;
> > > >
> > > > @@ -2593,6 +2635,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
> > > >             itrace_synth_opts__set_default(&etm->synth_opts,
> > > >                             session->itrace_synth_opts->default_no_sample);
> > > >             etm->synth_opts.callchain = false;
> > > > +           etm->synth_opts.thread_stack =
> > > > +                           session->itrace_synth_opts->thread_stack;
> > > >     }
> > > >
> > > >     err = cs_etm__synth_events(etm, session);
> > > > --
> > > > 2.17.1
> > > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-10-29  4:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-05  9:16 [PATCH v3 0/6] perf cs-etm: Support thread stack and callchain Leo Yan
2019-10-05  9:16 ` [PATCH v3 1/6] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
2019-10-11 20:16   ` Mathieu Poirier
2019-10-22  5:10     ` Leo Yan
2019-10-22 23:36       ` Mike Leach
2019-10-23  6:49         ` Leo Yan
2019-10-05  9:16 ` [PATCH v3 2/6] perf cs-etm: Refactor instruction size handling Leo Yan
2019-10-05  9:16 ` [PATCH v3 3/6] perf cs-etm: Support thread stack Leo Yan
2019-10-11 17:53   ` Mathieu Poirier
2019-10-15  3:33     ` Leo Yan
2019-10-22  5:03     ` Leo Yan
2019-10-28 22:43       ` Mathieu Poirier
2019-10-29  4:11         ` Leo Yan
2019-10-05  9:16 ` [PATCH v3 4/6] perf cs-etm: Support branch filter Leo Yan
2019-10-05  9:16 ` [PATCH v3 5/6] perf cs-etm: Support callchain for instruction sample Leo Yan
2019-10-11 19:59   ` Mathieu Poirier
2019-10-05  9:16 ` [PATCH v3 6/6] perf cs-etm: Synchronize instruction sample with the thread stack Leo Yan
2019-10-11 20:17   ` Mathieu Poirier
2019-10-15  3:44     ` Leo Yan
2019-10-22  4:50     ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).