linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain
@ 2020-02-20  5:26 Leo Yan
  2020-02-20  5:26 ` [PATCH v5 1/9] perf cs-etm: Defer to assign exception sample flag Leo Yan
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

This patch series adds support for thread stack and callchain; this patch
set depends on the instruction sample fix patch set [1].

This patch set has become more complex, so before dividing it into
smaller groups, I'd like this version to include all relevant patches;
I hope this gives the whole context for the related code changes.

Briefly, this patch set can be divided into three parts, each of which
can be reviewed separately:

Patches 01, 02 fix the samples for one corner case: accessing the
branch's target address triggers an exception.  Essentially, an extra
branch sample is added to reflect the intermediate branch between the
previous branch and the exception entry.

Patches 03, 04, 05, 06 come from patch set v4 and add support for
thread stack and callchain.

Patches 07, 08, 09 fix up exception entry and exit.  They mainly cover
two cases: one is to fix up the thread stack and callchain when
accessing the branch target address triggers an exception; the other is
to fix up the thread stack for instruction emulation (and other single
step cases).

This patch set has been tested on Juno-r2 after being applied on the
perf/core branch at latest commit 85fc95d75970 ("perf maps: Add missing
unlock to maps__insert() error case"), on top of the instruction sample
fix patch set [1].


Test for option '-F,+callindent':

  # perf script -F,+callindent
            main  3258          1          branches:         main                                                         ffffad684d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:                 _dl_fixup                                            ffffad811b4c _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                     _dl_lookup_symbol_x                              ffffad80c078 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                         do_lookup_x                                  ffffad80849c _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                             check_match                              ffffad807bf0 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                                 strcmp                               ffffad807888 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)

  [...]


Test for option '--itrace=g':

  # perf script --itrace=g16l64i100

main  3258        100      instructions: 
	    ffffad816a80 memcpy+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad809468 _dl_new_object+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

main  3258        100      instructions: 
	    ffffad80952c _dl_new_object+0x16c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

main  3258        100      instructions: 
	    ffffad8018dc dl_main+0x814 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

main  3258        100      instructions: 
	ffff8000100878d0 el0_sync_handler+0x168 ([kernel.kallsyms])
	ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
	    ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  [...]

Changes from v4:
* Addressed Mike's suggestion to improve the performance of
  cs_etm__instr_addr() with a quick calculation for non-T32;
* Removed the patch 'perf cs-etm: Synchronize instruction sample with
  the thread stack' (Mike);
* Fixed the branch sample and thread stack handling for the case where
  an exception is taken when accessing the branch target address; the
  related patches are 01, 02, 07;
* Fixed the thread stack handling for instruction emulation and single
  step with patches 08, 09.

Changes from v3:
* Split out a separate patch set for the instruction sample fixes.
* Rebased on latest perf/core branch.

Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
  (Suzuki).
* Refined commit logs.

Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one to support thread stack and
  another to support callchain (Mathieu).
* Added a new patch to support branch filter.

[1] https://lkml.org/lkml/2020/2/18/1406


Leo Yan (9):
  perf cs-etm: Defer to assign exception sample flag
  perf cs-etm: Reflect branch prior to exception
  perf cs-etm: Refactor instruction size handling
  perf cs-etm: Support thread stack
  perf cs-etm: Support branch filter
  perf cs-etm: Support callchain for instruction sample
  perf cs-etm: Fixup exception entry for thread stack
  perf thread: Add helper to get top return address
  perf cs-etm: Fixup exception exit for thread stack

 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   1 +
 tools/perf/util/cs-etm.c                      | 290 ++++++++++++++++--
 tools/perf/util/thread-stack.c                |  10 +
 tools/perf/util/thread-stack.h                |   1 +
 4 files changed, 268 insertions(+), 34 deletions(-)

-- 
2.17.1



* [PATCH v5 1/9] perf cs-etm: Defer to assign exception sample flag
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:26 ` [PATCH v5 2/9] perf cs-etm: Reflect branch prior to exception Leo Yan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

Currently, neither the exception entry packet nor the exception return
packet is used to generate samples; the exception packet is only used
as an affiliate packet, and the exception sample flag is assigned to
its previous range packet.  This is done in the function
cs_etm__set_sample_flags().

This patch moves the exception sample flag assignment from
cs_etm__set_sample_flags() to cs_etm__exception().  Essentially, it
defers assigning the exception sample flag to the previous range
packet, which gives us a chance to keep the previous range packet's
original sample flag.

So this patch is only a preparation for later patches and doesn't
introduce any functional change; based on it, we can add extra
processing between the exception packet and its previous range packet.

To reduce the indenting, this patch bails out directly at the entry of
cs_etm__exception() if it detects that the previous packet is not a
range packet.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index bba969d48076..48932a7a933f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1479,6 +1479,13 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 
 static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 {
+	/*
+	 * Usually the exception packet follows a range packet, if it's not the
+	 * case, directly bail out.
+	 */
+	if (tidq->prev_packet->sample_type != CS_ETM_RANGE)
+		return 0;
+
 	/*
 	 * When the exception packet is inserted, whether the last instruction
 	 * in previous range packet is taken branch or not, we need to force
@@ -1490,8 +1497,16 @@ static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 	 * swap PACKET with PREV_PACKET.  This keeps PREV_PACKET to be useful
 	 * for generating instruction and branch samples.
 	 */
-	if (tidq->prev_packet->sample_type == CS_ETM_RANGE)
-		tidq->prev_packet->last_instr_taken_branch = true;
+	tidq->prev_packet->last_instr_taken_branch = true;
+
+	/*
+	 * Since the exception packet is not used standalone for generating
+	 * samples and it's affiliation to the previous instruction range
+	 * packet; so set previous range packet flags to tell perf it is an
+	 * exception taken branch.
+	 */
+	if (tidq->packet->sample_type == CS_ETM_EXCEPTION)
+		tidq->prev_packet->flags = tidq->packet->flags;
 
 	return 0;
 }
@@ -1916,15 +1931,6 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 					PERF_IP_FLAG_CALL |
 					PERF_IP_FLAG_INTERRUPT;
 
-		/*
-		 * When the exception packet is inserted, since exception
-		 * packet is not used standalone for generating samples
-		 * and it's affiliation to the previous instruction range
-		 * packet; so set previous range packet flags to tell perf
-		 * it is an exception taken branch.
-		 */
-		if (prev_packet->sample_type == CS_ETM_RANGE)
-			prev_packet->flags = packet->flags;
 		break;
 	case CS_ETM_EXCEPTION_RET:
 		/*
-- 
2.17.1



* [PATCH v5 2/9] perf cs-etm: Reflect branch prior to exception
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
  2020-02-20  5:26 ` [PATCH v5 1/9] perf cs-etm: Defer to assign exception sample flag Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:26 ` [PATCH v5 3/9] perf cs-etm: Refactor instruction size handling Leo Yan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

When a branch instruction is to be executed and the branch target
address is not mapped into the virtual address space, the branch
instruction triggers an exception with a data abort.  In this case, the
CoreSight decoding flow cannot reflect the complete branch flow prior
to the exception, which makes the user space addresses inconsistent
before and after the exception handling.

Here is a detailed explanation of the issue with an example:

  Packet 0: range packet
            start_addr=0xffffad8018a4     end_addr=0xffffad8018ec
  Packet 1: exception packet
            start_addr=0xffffad8018a4     end_addr=0xffffad801910
  Packet 2: range packet
            start_addr=0xffff800010081c00 end_addr=0xffff800010081c18

Three packets arrive; from packet 0 to packet 1, the CPU tries to
branch from 0xffffad8018ec-4 to 0xffffad801910.  Accessing the address
0xffffad801910 causes the data abort, so this branch is not taken; an
exception is triggered instead and jumps to 0xffff800010081c00 in
packet 2.

When handling this sequence, a range packet is missing for the branch
between 0xffffad8018ec-4 and 0xffffad801910, so the perf tool cannot
generate a branch sample for it.  This might cause confusion about the
addresses before and after exception handling, since the exception
return address is 0xffffad801910, which does not sequentially follow
the address 0xffffad8018ec-4 seen before the exception was taken.

  0xffffad8018ec-4 -> 0xffff800010081c00: exception is taken ...
  ... exception return back -> 0xffffad801910

To fix this issue, we first need to decide which conditions can be
used to detect that a branch triggers an exception.  The conditions
below are used to make the decision:

  - Check if the exception is a trap by comparing the specific sample
    flag for the exception packet;
  - The exception packet's end address is not the same as its previous
    range packet's end address, which implies that a branch triggered
    the exception and that the branch target address is contained in
    the exception packet's end address.

This patch changes the exception packet to a 'fake' range packet; this
allows an extra branch sample to be generated for the branch
instruction prior to the exception (between 0xffffad8018ec-4 and
0xffffad801910).  The resulting samples are:

  0xffffad8018ec-4 -> 0xffffad801910: branch
  0xffffad801910 -> 0xffff800010081c00: exception is taken ...
  ... exception return back -> 0xffffad801910

Note, this 'fake' range packet adds an extra record to the last branch
array and changes the thread stack pushing and popping (if later
supported).  But since the 'fake' range packet's instruction count is
set to zero, it doesn't introduce any change for instruction samples.

Before:

  # perf script -F,+flags

             main  3258          1          branches:   int                      ffffad8018e8 dl_main+0x820 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffff800010081c00 vectors+0x400 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010081c20 vectors+0x420 ([kernel.kallsyms]) => ffff800010082bc0 el0_sync+0x0 ([kernel.kallsyms])
             main  3258          1          branches:   jcc                  ffff800010082c8c el0_sync+0xcc ([kernel.kallsyms]) => ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms]) => ffff800010082ccc el0_sync+0x10c ([kernel.kallsyms])
             [...]
             main  3258          1          branches:   jcc                  ffff800010083574 finish_ret_to_user+0x34 ([kernel.kallsyms]) => ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms]) => ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms]) => ffff8000100835c4 finish_ret_to_user+0x84 ([kernel.kallsyms])
             main  3258          1          branches:   iret                 ffff800010083610 finish_ret_to_user+0xd0 ([kernel.kallsyms]) =>     ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

After:

  # perf script -F,+flags

             main  3258          1          branches:   jmp                      ffffad8018e8 dl_main+0x820 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) =>     ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
             main  3258          1          branches:   int                      ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so) => ffff800010081c00 vectors+0x400 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010081c20 vectors+0x420 ([kernel.kallsyms]) => ffff800010082bc0 el0_sync+0x0 ([kernel.kallsyms])
             main  3258          1          branches:   jcc                  ffff800010082c8c el0_sync+0xcc ([kernel.kallsyms]) => ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010082ca0 el0_sync+0xe0 ([kernel.kallsyms]) => ffff800010082ccc el0_sync+0x10c ([kernel.kallsyms])
             [...]
             main  3258          1          branches:   jcc                  ffff800010083574 finish_ret_to_user+0x34 ([kernel.kallsyms]) => ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083580 finish_ret_to_user+0x40 ([kernel.kallsyms]) => ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms])
             main  3258          1          branches:   jmp                  ffff800010083598 finish_ret_to_user+0x58 ([kernel.kallsyms]) => ffff8000100835c4 finish_ret_to_user+0x84 ([kernel.kallsyms])
             main  3258          1          branches:   iret                 ffff800010083610 finish_ret_to_user+0xd0 ([kernel.kallsyms]) =>     ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

Suggested-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  1 +
 tools/perf/util/cs-etm.c                      | 66 ++++++++++++++++++-
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index cd92a99eb89d..f1f66d883391 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -482,6 +482,7 @@ cs_etm_decoder__buffer_exception(struct cs_etm_packet_queue *queue,
 
 	packet = &queue->packet_buffer[queue->tail];
 	packet->exception_number = elem->exception_number;
+	packet->end_addr = elem->en_addr;
 
 	return ret;
 }
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 48932a7a933f..7cf30b5e0e20 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1477,8 +1477,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	return 0;
 }
 
-static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
+static int cs_etm__exception(struct cs_etm_queue *etmq,
+			     struct cs_etm_traceid_queue *tidq)
 {
+	u32 flags;
+
 	/*
 	 * Usually the exception packet follows a range packet, if it's not the
 	 * case, directly bail out.
@@ -1486,6 +1489,65 @@ static int cs_etm__exception(struct cs_etm_traceid_queue *tidq)
 	if (tidq->prev_packet->sample_type != CS_ETM_RANGE)
 		return 0;
 
+	/*
+	 * If the exception is a trap and its end_addr is not same with its
+	 * previous range packet's end_addr, this implies the exception is
+	 * triggered by a branch and the exception packet's end_addr is the
+	 * branch target address from the previous range packet.
+	 *
+	 * Below is an example with three packets:
+	 *   Packet 0: range packet
+	 *             start_addr=0xffffad8018a4     end_addr=0xffffad8018ec
+	 *   Packet 1: exception packet
+	 *             start_addr=0xffffad8018a4     end_addr=0xffffad801910
+	 *   Packet 2: range packet
+	 *             start_addr=0xffff800010081c00 end_addr=0xffff800010081c18
+	 *
+	 * CPU tries to branch from 0xffffad8018ec-4 (packet 0) to
+	 * 0xffffad801910 (packet 1), accessing the address 0xffffad801910
+	 * causes data abort, so the branch is not taken and an exception is
+	 * triggered and jump to 0xffff800010081c00 (packet 2).
+	 *
+	 * For this case, it misses a range packet for the branch between
+	 * 0xffffad8018ec-4 and 0xffffad801910, so perf tool cannot generate
+	 * branch sample and introduces confusion for exception return parsing:
+	 *
+	 *   0xffffad8018ec-4 -> 0xffff800010081c00: exception is taken
+	 *   ... exception return back ... -> 0xffffad801910
+	 *
+	 * To fix this issue, the exception packet is changed to a 'fake'
+	 * range packet.  This can allow to generate a branch sample between
+	 * 0xffffad8018ec-4 and 0xffffad801910.  Finally get below samples:
+	 *
+	 *   0xffffad8018ec-4 -> 0xffffad801910: branch
+	 *   0xffffad801910 -> 0xffff800010081c00: exception is taken
+	 *   ... exception return back ... -> 0xffffad801910
+	 */
+
+	/* Use flags to check if the exception is trap */
+	flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		PERF_IP_FLAG_INTERRUPT;
+
+	if (tidq->packet->sample_type == CS_ETM_EXCEPTION &&
+	    tidq->packet->flags == flags &&
+	    tidq->packet->end_addr != tidq->prev_packet->end_addr) {
+		/*
+		 * Change the exception packet to a range packet, so can reflect
+		 * branch from prev_packet::end_addr-4 to packet::start_addr;
+		 *
+		 * This branch is not taken yet, so set its instruction count
+		 * to zero.  Set 'last_instr_taken_branch' to true, so allow
+		 * it to generate samples with its seqential range packet.
+		 */
+		tidq->packet->sample_type = CS_ETM_RANGE;
+		tidq->packet->start_addr = tidq->packet->end_addr;
+		tidq->packet->instr_count = 0;
+		tidq->packet->last_instr_taken_branch = true;
+
+		/* Generate sample with the previous range packet */
+		return cs_etm__sample(etmq, tidq);
+	}
+
 	/*
 	 * When the exception packet is inserted, whether the last instruction
 	 * in previous range packet is taken branch or not, we need to force
@@ -2045,7 +2107,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 			 * make sure the previous instruction
 			 * range packet to be handled properly.
 			 */
-			cs_etm__exception(tidq);
+			cs_etm__exception(etmq, tidq);
 			break;
 		case CS_ETM_DISCONTINUITY:
 			/*
-- 
2.17.1



* [PATCH v5 3/9] perf cs-etm: Refactor instruction size handling
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
  2020-02-20  5:26 ` [PATCH v5 1/9] perf cs-etm: Defer to assign exception sample flag Leo Yan
  2020-02-20  5:26 ` [PATCH v5 2/9] perf cs-etm: Reflect branch prior to exception Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:26 ` [PATCH v5 4/9] perf cs-etm: Support thread stack Leo Yan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

cs-etm.c has several functions which need to know the instruction size
based on an address; e.g. cs_etm__instr_addr() and cs_etm__copy_insn()
both calculate the instruction size separately with duplicated code.
Furthermore, new features added later might need to calculate the
instruction size as well.

For this reason, this patch refactors the code to introduce a new
function cs_etm__instr_size(), which is the central place to calculate
the instruction size based on ISA type and instruction address.

Given that the trace data can be megabytes in size and will most
likely be A64/A32 on a lot of current and future platforms,
cs_etm__instr_addr() keeps a single ISA type check for non-T32; in this
case it executes an optimized calculation (addr + offset * 4).
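
The address calculation described above can be sketched outside of
perf as follows.  This is a simplified model, not the patch code: the
sketch_ names are hypothetical, and the code bytes come from a plain
buffer instead of cs_etm__mem_access().  The T32 size test mirrors the
one used by cs_etm__t32_instr_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for enum cs_etm_isa; names are assumptions. */
enum sketch_isa { SKETCH_ISA_A64, SKETCH_ISA_A32, SKETCH_ISA_T32 };

/*
 * Size of the instruction at byte offset 'off' in 'code'.  A64/A32
 * instructions are always 4 bytes; a T32 instruction is 4 bytes when
 * the top five bits of its first halfword are 0b11101 or above, else 2.
 */
static inline int sketch_instr_size(enum sketch_isa isa,
				    const uint8_t *code, uint64_t off)
{
	if (isa != SKETCH_ISA_T32)
		return 4;

	return ((code[off + 1] & 0xF8) >= 0xE8) ? 4 : 2;
}

/* Address of the n-th instruction counted from start_addr. */
static inline uint64_t sketch_instr_addr(enum sketch_isa isa,
					 const uint8_t *code, size_t code_len,
					 uint64_t start_addr, uint64_t n)
{
	uint64_t off = 0;

	/* Fast path for fixed-width ISAs: no memory access needed. */
	if (isa != SKETCH_ISA_T32)
		return start_addr + n * 4;

	assert(code != NULL);

	/* T32 mixes 2- and 4-byte encodings, so walk instruction by instruction. */
	while (n) {
		assert(off + 1 < code_len);
		off += sketch_instr_size(isa, code, off);
		n--;
	}

	return start_addr + off;
}
```

The fast path is why a single ISA check is worthwhile: for A64/A32 the
result is one multiply-add, while T32 has to read the code bytes for
every instruction it skips.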

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 52 ++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 7cf30b5e0e20..f3ba2cfb634f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -935,6 +935,26 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
 	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
+static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
+				     u8 trace_chan_id,
+				     enum cs_etm_isa isa,
+				     u64 addr)
+{
+	int insn_len;
+
+	/*
+	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
+	 * cs_etm__t32_instr_size().
+	 */
+	if (isa == CS_ETM_ISA_T32)
+		insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
+	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
+	else
+		insn_len = 4;
+
+	return insn_len;
+}
+
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 {
 	/* Returns 0 for the CS_ETM_DISCONTINUITY packet */
@@ -959,19 +979,19 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	if (packet->isa == CS_ETM_ISA_T32) {
-		u64 addr = packet->start_addr;
+	u64 addr = packet->start_addr;
 
-		while (offset) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
-			offset--;
-		}
-		return addr;
+	/* Optimize calculation for non T32 */
+	if (packet->isa != CS_ETM_ISA_T32)
+		return addr + offset * 4;
+
+	while (offset) {
+		addr += cs_etm__instr_size(etmq, trace_chan_id,
+					   packet->isa, addr);
+		offset--;
 	}
 
-	/* Assume a 4 byte instruction size (A32/A64) */
-	return packet->start_addr + offset * 4;
+	return addr;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
@@ -1111,16 +1131,8 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 		return;
 	}
 
-	/*
-	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
-	 * cs_etm__t32_instr_size().
-	 */
-	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
-							  sample->ip);
-	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
-	else
-		sample->insn_len = 4;
+	sample->insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      packet->isa, sample->ip);
 
 	cs_etm__mem_access(etmq, trace_chan_id, sample->ip,
 			   sample->insn_len, (void *)sample->insn);
-- 
2.17.1



* [PATCH v5 4/9] perf cs-etm: Support thread stack
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (2 preceding siblings ...)
  2020-02-20  5:26 ` [PATCH v5 3/9] perf cs-etm: Refactor instruction size handling Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:26 ` [PATCH v5 5/9] perf cs-etm: Support branch filter Leo Yan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

Since Arm CoreSight doesn't support thread stack, the decoding cannot
display symbols with indented spaces to reflect the stack depth.

This patch adds thread stack support for Arm CoreSight, which allows
'perf script' to display output properly with the option
'-F,+callindent'.
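
For readers unfamiliar with the idea, a thread stack can be modelled
as a small stack of return addresses that is pushed on calls, popped
on returns and flushed when the trace is discontinuous; the stack
depth then gives the indentation level.  The sketch below is a
deliberately simplified model with hypothetical sketch_ names; it is
not perf's thread_stack__event(), which handles many more cases.  It
assumes the return address of a call is from_ip + insn_len, which is
the reason the instruction size is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEPTH 64

/* Branch kinds the model distinguishes; names are assumptions. */
enum sketch_branch {
	SKETCH_JUMP,
	SKETCH_CALL,
	SKETCH_RETURN,
	SKETCH_TRACE_GAP,
};

struct sketch_stack {
	uint64_t ret_addr[SKETCH_MAX_DEPTH];
	size_t depth;	/* doubles as the call indentation level */
};

static inline void sketch_stack_event(struct sketch_stack *ts,
				      enum sketch_branch kind,
				      uint64_t from_ip, uint64_t to_ip,
				      int insn_len)
{
	assert(ts != NULL);

	switch (kind) {
	case SKETCH_CALL:
		/* Push the address the callee is expected to return to. */
		if (ts->depth < SKETCH_MAX_DEPTH)
			ts->ret_addr[ts->depth++] = from_ip + insn_len;
		break;
	case SKETCH_RETURN:
		/* Pop only when the return lands on the recorded address. */
		if (ts->depth > 0 && ts->ret_addr[ts->depth - 1] == to_ip)
			ts->depth--;
		break;
	case SKETCH_TRACE_GAP:
		/* Calls and returns may have been lost: start over. */
		ts->depth = 0;
		break;
	case SKETCH_JUMP:
		/* Plain branches do not change the depth. */
		break;
	}
}
```

A caller would feed one event per taken branch and indent each printed
symbol by the current depth, which is the effect '-F,+callindent'
shows in the output below.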

Before:

  # perf script -F,+callindent
            main  2808          1          branches: coresight_test1                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches: printf@plt                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches: printf@plt                           aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches: _init                                aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches: _dl_fixup                            ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches: _dl_lookup_symbol_x                  ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1                                      ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     printf@plt                                       aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f3ba2cfb634f..08ca919aa2b1 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1138,6 +1138,45 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 			   sample->insn_len, (void *)sample->insn);
 }
 
+static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
+				    struct cs_etm_traceid_queue *tidq)
+{
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	u8 trace_chan_id = tidq->trace_chan_id;
+	int insn_len;
+	u64 from_ip, to_ip;
+
+	if (etm->synth_opts.thread_stack) {
+		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
+		to_ip = cs_etm__first_executed_instr(tidq->packet);
+
+		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      tidq->prev_packet->isa, from_ip);
+
+		/*
+		 * Create thread stacks by keeping track of calls and returns;
+		 * any call pushes thread stack, return pops the stack, and
+		 * flush stack when the trace is discontinuous.
+		 */
+		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
+				    tidq->prev_packet->flags,
+				    from_ip, to_ip, insn_len,
+				    etmq->buffer->buffer_nr + 1);
+	} else {
+		/*
+		 * The thread stack can be output via thread_stack__process();
+		 * thus the detailed information about paired calls and returns
+		 * will be facilitated by Python script for the db-export.
+		 *
+		 * Need to set trace buffer number and flush thread stack if the
+		 * trace buffer number has changed.
+		 */
+		thread_stack__set_trace_nr(tidq->thread,
+					   tidq->prev_packet->cpu,
+					   etmq->buffer->buffer_nr + 1);
+	}
+}
+
 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 					    struct cs_etm_traceid_queue *tidq,
 					    u64 addr, u64 period)
@@ -1382,6 +1421,9 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	    tidq->prev_packet->last_instr_taken_branch)
 		cs_etm__update_last_branch_rb(etmq, tidq);
 
+	if (tidq->prev_packet->last_instr_taken_branch)
+		cs_etm__add_stack_event(etmq, tidq);
+
 	if (etm->sample_instructions &&
 	    tidq->period_instructions >= etm->instructions_sample_period) {
 		/*
@@ -2730,6 +2772,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
 		etm->synth_opts.callchain = false;
+		etm->synth_opts.thread_stack =
+				session->itrace_synth_opts->thread_stack;
 	}
 
 	err = cs_etm__synth_events(etm, session);
-- 
2.17.1



* [PATCH v5 5/9] perf cs-etm: Support branch filter
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (3 preceding siblings ...)
  2020-02-20  5:26 ` [PATCH v5 4/9] perf cs-etm: Support thread stack Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:26 ` [PATCH v5 6/9] perf cs-etm: Support callchain for instruction sample Leo Yan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

If the user specifies the option '-F,+callindent' or call chain related
options, the user only cares about function calls and returns; in these
cases it is pointless to generate samples for branches within a
function.  But unlike other hardware trace handling (e.g. Intel PT or
BTS), Arm CoreSight doesn't filter branch types for these options and
generates samples for all branches; this causes perf to output many
spurious blank entries for branches that are not function calls or
returns.

To output only pairs of calls and returns, this patch introduces a
branch filter, which is set according to the synthesis options.  With
the filter, perf outputs samples only for calls and returns and avoids
the unnecessary blank entries.
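
The filtering rule can be sketched in isolation.  This is a minimal
sketch, not the perf implementation: the DEMO_FLAG_* names and bit values
are assumptions that only mirror perf's PERF_IP_FLAG_* flags, and
demo_build_filter(), demo_keep_branch() and demo_filter_selfcheck() are
hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative flag bits. The names mirror perf's PERF_IP_FLAG_* flags,
 * but the values are assumptions made for this sketch.
 */
enum {
	DEMO_FLAG_BRANCH      = 1u << 0,
	DEMO_FLAG_CALL        = 1u << 1,
	DEMO_FLAG_RETURN      = 1u << 2,
	DEMO_FLAG_CONDITIONAL = 1u << 3,
	DEMO_FLAG_ASYNC       = 1u << 5,
	DEMO_FLAG_TRACE_BEGIN = 1u << 8,
	DEMO_FLAG_TRACE_END   = 1u << 9,
};

/* Build the filter mask from the "calls" and "returns" options. */
uint32_t demo_build_filter(bool calls, bool returns)
{
	uint32_t filter = 0;

	if (calls)
		filter |= DEMO_FLAG_CALL | DEMO_FLAG_ASYNC |
			  DEMO_FLAG_TRACE_END;
	if (returns)
		filter |= DEMO_FLAG_RETURN | DEMO_FLAG_TRACE_BEGIN;

	return filter;
}

/*
 * A zero filter keeps every branch; otherwise a branch is kept only if
 * its flags share at least one bit with the filter.
 */
bool demo_keep_branch(uint32_t filter, uint32_t branch_flags)
{
	return !filter || (filter & branch_flags) != 0;
}

/* Usage example: a plain conditional branch is dropped, a call is kept. */
void demo_filter_selfcheck(void)
{
	uint32_t filter = demo_build_filter(true, true);

	assert(demo_keep_branch(filter, DEMO_FLAG_BRANCH | DEMO_FLAG_CALL));
	assert(!demo_keep_branch(filter,
				 DEMO_FLAG_BRANCH | DEMO_FLAG_CONDITIONAL));
}
```

Treating a zero mask as "keep everything" means the default behaviour is
unchanged when neither option is set.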

Before:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     coresight_test1@plt                              aaaaba8d367c coresight_test1@plt+0xc (/root/coresight_test/main)
            main  2808          1          branches:                     _init                                            aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                                                                      ffff8636a3f4 _dl_lookup_symbol_x+0x5c (/lib/aarch64-linux-gnu/ld-2.28.s
  [...]

After:

  # perf script -F,+callindent
            main  2808          1          branches:                 coresight_test1@plt                                  aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 check_match                          ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                 printf@plt                                           aaaaba8d37ec main+0x28 (/root/coresight_test/main)
            main  2808          1          branches:                     _dl_fixup                                        ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
            main  2808          1          branches:                         _dl_lookup_symbol_x                          ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                             do_lookup_x                              ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
            main  2808          1          branches:                                 _dl_name_match_p                     ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so)
            main  2808          1          branches:                                     strcmp                           ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so)
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 08ca919aa2b1..1b08b650b090 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -56,6 +56,7 @@ struct cs_etm_auxtrace {
 
 	int num_cpu;
 	u32 auxtrace_type;
+	u32 branches_filter;
 	u64 branches_sample_type;
 	u64 branches_id;
 	u64 instructions_sample_type;
@@ -1239,6 +1240,10 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	} dummy_bs;
 	u64 ip;
 
+	if (etm->branches_filter &&
+	    !(etm->branches_filter & tidq->prev_packet->flags))
+		return 0;
+
 	ip = cs_etm__last_executed_instr(tidq->prev_packet);
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
@@ -2776,6 +2781,13 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 				session->itrace_synth_opts->thread_stack;
 	}
 
+	if (etm->synth_opts.calls)
+		etm->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+					PERF_IP_FLAG_TRACE_END;
+	if (etm->synth_opts.returns)
+		etm->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN;
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.17.1



* [PATCH v5 6/9] perf cs-etm: Support callchain for instruction sample
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (4 preceding siblings ...)
  2020-02-20  5:26 ` [PATCH v5 5/9] perf cs-etm: Support branch filter Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:26 ` [PATCH v5 7/9] perf cs-etm: Fixup exception entry for thread stack Leo Yan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

Now that CoreSight supports the thread stack, we can synthesize a call
chain for instruction samples based on it; the call chain is enabled
with the option '--itrace=g'.

Note that the stack event must be processed before synthesizing the
instruction sample; this ensures the thread stack is pushed and popped
in sync with the instruction samples, so that the call chain generated
for each instruction sample is correct.  Add a comment to document this.
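
Why the ordering matters can be seen with a toy stack.  This is a
minimal sketch under stated assumptions, not perf's thread-stack code:
struct demo_stack and the demo_*() helpers are hypothetical, and the
context markers that perf places in a real callchain are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STACK_MAX 16

/* Toy thread stack holding only expected return addresses. */
struct demo_stack {
	uint64_t ret_addr[DEMO_STACK_MAX];
	size_t cnt;
};

/* A call pushes the address the callee is expected to return to. */
void demo_call(struct demo_stack *ts, uint64_t from_ip, uint64_t insn_len)
{
	assert(ts->cnt < DEMO_STACK_MAX);
	ts->ret_addr[ts->cnt++] = from_ip + insn_len;
}

/* A return pops the top entry when it lands on the expected address. */
void demo_return(struct demo_stack *ts, uint64_t to_ip)
{
	if (ts->cnt && ts->ret_addr[ts->cnt - 1] == to_ip)
		ts->cnt--;
}

/*
 * Fill chain[] with the sampled ip followed by the return addresses,
 * innermost first. Returns the number of entries written.
 */
size_t demo_sample(const struct demo_stack *ts, uint64_t ip,
		   uint64_t *chain, size_t sz)
{
	size_t n = 0;
	size_t i = ts->cnt;

	if (sz == 0)
		return 0;

	chain[n++] = ip;
	while (i > 0 && n < sz)
		chain[n++] = ts->ret_addr[--i];

	return n;
}

/* Usage example: the stack event is applied first, then the sample. */
void demo_callchain_selfcheck(void)
{
	struct demo_stack ts = { .cnt = 0 };
	uint64_t chain[4];

	demo_call(&ts, 0x1000, 4);
	assert(demo_sample(&ts, 0x2000, chain, 4) == 2);
	assert(chain[0] == 0x2000 && chain[1] == 0x1004);

	demo_return(&ts, 0x1004);
	assert(demo_sample(&ts, 0x1004, chain, 4) == 1);
}
```

If the sample were taken before demo_call() had been applied, the chain
for an instruction inside the callee would miss the caller's frame.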

Before:

  # perf script --itrace=g16l64i100
            main  1579        100      instructions:  ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
            main  1579        100      instructions:  ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
  [...]

After:

  # perf script --itrace=g16l64i100

  main  1579        100      instructions:
          ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])

  main  1579        100      instructions:
          ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
          ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
          ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
          ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
          ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 1b08b650b090..d9c22c145307 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -17,6 +17,7 @@
 #include <stdlib.h>
 
 #include "auxtrace.h"
+#include "callchain.h"
 #include "color.h"
 #include "cs-etm.h"
 #include "cs-etm-decoder/cs-etm-decoder.h"
@@ -74,6 +75,7 @@ struct cs_etm_traceid_queue {
 	size_t last_branch_pos;
 	union perf_event *event_buf;
 	struct thread *thread;
+	struct ip_callchain *chain;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
@@ -251,6 +253,16 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	if (!tidq->prev_packet)
 		goto out_free;
 
+	if (etm->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (etm->synth_opts.callchain_sz + 1) * sizeof(u64);
+		tidq->chain = zalloc(sz);
+		if (!tidq->chain)
+			goto out_free;
+	}
+
 	if (etm->synth_opts.last_branch) {
 		size_t sz = sizeof(struct branch_stack);
 
@@ -273,6 +285,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 out_free:
 	zfree(&tidq->last_branch_rb);
 	zfree(&tidq->last_branch);
+	zfree(&tidq->chain);
 	zfree(&tidq->prev_packet);
 	zfree(&tidq->packet);
 out:
@@ -561,6 +574,7 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->last_branch_rb);
+		zfree(&tidq->chain);
 		zfree(&tidq->prev_packet);
 		zfree(&tidq->packet);
 		zfree(&tidq);
@@ -1147,7 +1161,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 	int insn_len;
 	u64 from_ip, to_ip;
 
-	if (etm->synth_opts.thread_stack) {
+	if (etm->synth_opts.callchain || etm->synth_opts.thread_stack) {
 		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
 		to_ip = cs_etm__first_executed_instr(tidq->packet);
 
@@ -1203,6 +1217,14 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
 
+	if (etm->synth_opts.callchain) {
+		thread_stack__sample(tidq->thread, tidq->packet->cpu,
+				     tidq->chain,
+				     etm->synth_opts.callchain_sz + 1,
+				     sample.ip, etm->kernel_start);
+		sample.callchain = tidq->chain;
+	}
+
 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
 
@@ -1385,6 +1407,8 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
 	}
 
+	if (etm->synth_opts.callchain)
+		attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
 	if (etm->synth_opts.last_branch)
 		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 
@@ -1426,6 +1450,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	    tidq->prev_packet->last_instr_taken_branch)
 		cs_etm__update_last_branch_rb(etmq, tidq);
 
+	/*
+	 * The stack event must be processed prior to synthesizing the
+	 * instruction sample; this ensures the thread stack is up to
+	 * date when the instruction sample's callchain is generated.
+	 */
 	if (tidq->prev_packet->last_instr_taken_branch)
 		cs_etm__add_stack_event(etmq, tidq);
 
@@ -2776,7 +2805,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	} else {
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
-		etm->synth_opts.callchain = false;
 		etm->synth_opts.thread_stack =
 				session->itrace_synth_opts->thread_stack;
 	}
@@ -2788,6 +2816,14 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 		etm->branches_filter |= PERF_IP_FLAG_RETURN |
 					PERF_IP_FLAG_TRACE_BEGIN;
 
+	if (etm->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			etm->synth_opts.callchain = false;
+		}
+	}
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.17.1



* [PATCH v5 7/9] perf cs-etm: Fixup exception entry for thread stack
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (5 preceding siblings ...)
  2020-02-20  5:26 ` [PATCH v5 6/9] perf cs-etm: Support callchain for instruction sample Leo Yan
@ 2020-02-20  5:26 ` Leo Yan
  2020-02-20  5:27 ` [PATCH v5 8/9] perf thread: Add helper to get top return address Leo Yan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

In theory, when an exception is taken, the thread stack is pushed with an
expected return address (ret_addr): from_ip + insn_len; later, when the
exception returns, the return address (the new packet's to_ip) is
compared with the ret_addr at the top of the thread stack, and if the
values are the same the thread stack is popped.

When a branch instruction's target address triggers an exception, the
thread stack's ret_addr at exception entry is the branch target address
plus the instruction length; but since this branch instruction is not
taken, the exception return address is the branch target address.  Thus
the thread stack's ret_addr cannot match the exception return address,
and the thread stack cannot be popped properly.

This patch fixes up the ret_addr at exception entry: when it detects
that the exception is triggered by a branch target address, it sets
'insn_len' to zero.  This allows the thread stack to be popped properly
on return from the exception.
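
The mismatch and the effect of the fixup can be reproduced with a
one-entry model of the stack top.  This is a minimal sketch, not the
perf code: struct exc_frame and the exc_*() helpers are hypothetical,
and the 4-byte instruction length assumes AArch64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One-entry stand-in for the top of a thread stack. */
struct exc_frame {
	uint64_t ret_addr;
	bool valid;
};

/* Exception entry: record where the exception is expected to return. */
void exc_enter(struct exc_frame *f, uint64_t from_ip, uint64_t insn_len)
{
	f->ret_addr = from_ip + insn_len;
	f->valid = true;
}

/* Exception return: pop only when the return lands on ret_addr. */
bool exc_return(struct exc_frame *f, uint64_t to_ip)
{
	if (!f->valid || f->ret_addr != to_ip)
		return false;

	f->valid = false;
	return true;
}

/* Usage example, using memcpy+0x0 from this patch's example output. */
void exc_entry_selfcheck(void)
{
	const uint64_t target = 0xffffad816a10ULL;
	struct exc_frame f;

	/* Without the fixup: insn_len = 4, so ret_addr is target + 4. */
	exc_enter(&f, target, 4);
	assert(!exc_return(&f, target));	/* mismatch, entry goes stale */

	/* With the fixup: insn_len = 0, so ret_addr is the target itself. */
	exc_enter(&f, target, 0);
	assert(exc_return(&f, target));		/* match, entry is popped */
}
```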

Before:

  # perf script --itrace=g16l64i100

  main  3258        100      instructions:
          ffff800010082c1c el0_sync+0x5c ([kernel.kallsyms])
              ffffad816a14 memcpy+0x4 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800820 _dl_start_final+0x48 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800044 _start+0x4 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

The issues in the output:
memcpy+0x4 => The call to memcpy() causes an exception; its return
              address should be memcpy+0x0.
_start+0x4 => The thread stack is not popped correctly; this is stale
              data left over from the previous exception flow.

After:

  # perf script --itrace=g16l64i100

  main  3258        100      instructions:
          ffff800010082c1c el0_sync+0x5c ([kernel.kallsyms])
              ffffad816a10 memcpy+0x0 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800820 _dl_start_final+0x48 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index d9c22c145307..4800daf0dc3d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1160,6 +1160,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 	u8 trace_chan_id = tidq->trace_chan_id;
 	int insn_len;
 	u64 from_ip, to_ip;
+	u32 flags;
 
 	if (etm->synth_opts.callchain || etm->synth_opts.thread_stack) {
 		from_ip = cs_etm__last_executed_instr(tidq->prev_packet);
@@ -1168,6 +1169,27 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 		insn_len = cs_etm__instr_size(etmq, trace_chan_id,
 					      tidq->prev_packet->isa, from_ip);
 
+		/*
+		 * Fixup the exception entry.
+		 *
+		 * If the packet's start_addr is the same as its end_addr, this
+		 * packet was altered from an exception to a range packet;
+		 * the detailed info is described in cs_etm__exception(), which
+		 * handles the case where a branch instruction is not taken but
+		 * the branch triggers an exception.
+		 *
+		 * In this case, fixup 'insn_len' to zero so that the thread
+		 * stack's return address can match the exception return
+		 * address, and the thread stack can be popped properly on
+		 * return from the exception.
+		 */
+		flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+			PERF_IP_FLAG_INTERRUPT;
+		if (tidq->prev_packet->flags == flags &&
+		    tidq->prev_packet->start_addr ==
+		    tidq->prev_packet->end_addr)
+			insn_len = 0;
+
 		/*
 		 * Create thread stacks by keeping track of calls and returns;
 		 * any call pushes thread stack, return pops the stack, and
-- 
2.17.1



* [PATCH v5 8/9] perf thread: Add helper to get top return address
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (6 preceding siblings ...)
  2020-02-20  5:26 ` [PATCH v5 7/9] perf cs-etm: Fixup exception entry for thread stack Leo Yan
@ 2020-02-20  5:27 ` Leo Yan
  2020-02-20  5:27 ` [PATCH v5 9/9] perf cs-etm: Fixup exception exit for thread stack Leo Yan
  2020-11-05 22:50 ` [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Stephen Boyd
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

For instruction emulation or single step in the kernel, when returning
to user space, the return address does not match the ret_addr in the
thread stack.

This patch adds a helper to read out the top return address from the
thread stack; this can be used for specific calibration in this case.
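
What such a read-only helper needs to guarantee can be sketched on a toy
stack.  This is a minimal sketch, not perf's thread-stack code: struct
peek_stack and peek_top_ret_addr() are hypothetical, and the sketch
assumes the top entry lives at index cnt - 1 and that reading it should
leave the stack unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PEEK_STACK_MAX 8

struct peek_stack {
	uint64_t ret_addr[PEEK_STACK_MAX];
	size_t cnt;	/* number of valid entries; top is at cnt - 1 */
};

/* Return the top return address without popping; UINT64_MAX if empty. */
uint64_t peek_top_ret_addr(const struct peek_stack *ts)
{
	if (!ts || !ts->cnt)
		return UINT64_MAX;

	return ts->ret_addr[ts->cnt - 1];
}

/* Usage example: reading the top twice gives the same answer. */
void peek_selfcheck(void)
{
	struct peek_stack ts = { .ret_addr = { 0x1004, 0x2008 }, .cnt = 2 };

	assert(peek_top_ret_addr(&ts) == 0x2008);
	assert(peek_top_ret_addr(&ts) == 0x2008);
	assert(ts.cnt == 2);	/* reading must not pop */
	assert(peek_top_ret_addr(NULL) == UINT64_MAX);
}
```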

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/thread-stack.c | 10 ++++++++++
 tools/perf/util/thread-stack.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 0885967d5bc3..60cd6fdca8de 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -497,6 +497,16 @@ void thread_stack__sample(struct thread *thread, int cpu,
 	chain->nr = i;
 }
 
+u64 thread_stack__get_top_ret_addr(struct thread *thread, int cpu)
+{
+	struct thread_stack *ts = thread__stack(thread, cpu);
+
+	if (!ts || !ts->cnt)
+		return UINT64_MAX;
+
+	return ts->stack[ts->cnt - 1].ret_addr;
+}
+
 struct call_return_processor *
 call_return_processor__new(int (*process)(struct call_return *cr, u64 *parent_db_id, void *data),
 			   void *data)
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index e1ec5a58f1b2..b9d07a3be6c2 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -88,6 +88,7 @@ void thread_stack__sample(struct thread *thread, int cpu, struct ip_callchain *c
 int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
 size_t thread_stack__depth(struct thread *thread, int cpu);
+u64 thread_stack__get_top_ret_addr(struct thread *thread, int cpu);
 
 struct call_return_processor *
 call_return_processor__new(int (*process)(struct call_return *cr, u64 *parent_db_id, void *data),
-- 
2.17.1



* [PATCH v5 9/9] perf cs-etm: Fixup exception exit for thread stack
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (7 preceding siblings ...)
  2020-02-20  5:27 ` [PATCH v5 8/9] perf thread: Add helper to get top return address Leo Yan
@ 2020-02-20  5:27 ` Leo Yan
  2020-11-05 22:50 ` [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Stephen Boyd
  9 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-02-20  5:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mark Rutland, Mike Leach, Robert Walker, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-kernel, Coresight ML
  Cc: Leo Yan

For instruction emulation or other cases (like ptrace), the program
traps into the kernel and the kernel executes the instruction with
single step, so the exception return address is one instruction ahead
of the 'ret_addr' recorded in the thread stack.  As a result, the thread
stack cannot be popped due to the mismatch between the return address
and 'ret_addr'.

To fix this issue, this patch reads the 'ret_addr' from the top of the
thread stack; if it detects that the exception return address is one
instruction ahead of 'ret_addr', this implies the kernel has executed a
single step.  In this case, calibrate 'to_ip' to 'ret_addr' so that the
thread stack can be popped.
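
The calibration rule itself is small enough to show on its own.  This is
a minimal sketch, not the perf code: exit_calibrate_to_ip() is a
hypothetical helper, and the 4-byte instruction length assumes AArch64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * If the exception return lands exactly one instruction past the
 * recorded ret_addr, treat it as a return to ret_addr so that the
 * stack entry can be matched and popped. Otherwise leave to_ip alone.
 */
uint64_t exit_calibrate_to_ip(uint64_t to_ip, uint64_t ret_addr,
			      uint64_t ret_insn_len)
{
	if (to_ip == ret_addr + ret_insn_len)
		return ret_addr;

	return to_ip;
}

/* Usage example, using _dl_sysdep_start+0x2f8 from the example output. */
void exit_selfcheck(void)
{
	const uint64_t ret_addr = 0xffffad8137d8ULL;

	/* Emulated instruction was skipped: calibrate back to ret_addr. */
	assert(exit_calibrate_to_ip(ret_addr + 4, ret_addr, 4) == ret_addr);

	/* Normal return: nothing to change. */
	assert(exit_calibrate_to_ip(ret_addr, ret_addr, 4) == ret_addr);

	/* Unrelated address: left untouched. */
	assert(exit_calibrate_to_ip(ret_addr + 8, ret_addr, 4) == ret_addr + 8);
}
```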

Before:

  main  3258        100      instructions:
          ffff800010095f48 do_emulate_mrs+0x48 ([kernel.kallsyms])
          ffff800010096060 emulate_mrs+0x48 ([kernel.kallsyms])
          ffff8000100904ec do_undefinstr+0x1f4 ([kernel.kallsyms])
          ffff80001008788c el0_sync_handler+0x124 ([kernel.kallsyms])
          ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
              ffffad8137d8 _dl_sysdep_start+0x2f8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  main  3258        100      instructions:
          ffff8000100835fc finish_ret_to_user+0xbc ([kernel.kallsyms])
              ffffad8137d8 _dl_sysdep_start+0x2f8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  main  3258        100      instructions:
              ffffad801138 dl_main+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad8137d8 _dl_sysdep_start+0x2f8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

Note: after returning from instruction emulation with emulate_mrs(),
_dl_sysdep_start+0x2f8 is not popped.

After:

  main  3258        100      instructions:
          ffff800010095f48 do_emulate_mrs+0x48 ([kernel.kallsyms])
          ffff800010096060 emulate_mrs+0x48 ([kernel.kallsyms])
          ffff8000100904ec do_undefinstr+0x1f4 ([kernel.kallsyms])
          ffff80001008788c el0_sync_handler+0x124 ([kernel.kallsyms])
          ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
              ffffad8137d8 _dl_sysdep_start+0x2f8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  main  3258        100      instructions:
          ffff8000100835fc finish_ret_to_user+0xbc ([kernel.kallsyms])
              ffffad8137d8 _dl_sysdep_start+0x2f8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  main  3258        100      instructions:
              ffffad801138 dl_main+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
              ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 4800daf0dc3d..7ff55704de5c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1190,6 +1190,34 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 		    tidq->prev_packet->end_addr)
 			insn_len = 0;
 
+		/*
+		 * Fixup the exception exit.
+		 *
+		 * For instruction emulation or single step cases, when return
+		 * from exception, since an extra instruction has been executed
+		 * in kernel, the exception return address 'to_ip' is one
+		 * instruction ahead of the expected address 'ret_addr' in the
+		 * thread stack.
+		 *
+		 * When this case is detected, calibrate 'to_ip' to 'ret_addr'
+		 * so that the thread stack can be popped.
+		 */
+		flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+			PERF_IP_FLAG_INTERRUPT;
+		if (tidq->prev_packet->flags == flags) {
+			u64 ret_addr;
+			int ret_insn_len;
+
+			ret_addr = thread_stack__get_top_ret_addr(tidq->thread,
+						tidq->prev_packet->cpu);
+			ret_insn_len = cs_etm__instr_size(etmq,
+							  trace_chan_id,
+							  tidq->packet->isa,
+							  ret_addr);
+			if (to_ip == ret_addr + ret_insn_len)
+				to_ip = ret_addr;
+		}
+
 		/*
 		 * Create thread stacks by keeping track of calls and returns;
 		 * any call pushes thread stack, return pops the stack, and
-- 
2.17.1



* Re: [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain
  2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (8 preceding siblings ...)
  2020-02-20  5:27 ` [PATCH v5 9/9] perf cs-etm: Fixup exception exit for thread stack Leo Yan
@ 2020-11-05 22:50 ` Stephen Boyd
  2020-11-06  2:09   ` Leo Yan
  9 siblings, 1 reply; 12+ messages in thread
From: Stephen Boyd @ 2020-11-05 22:50 UTC (permalink / raw)
  To: Alexander Shishkin, Arnaldo Carvalho de Melo, Coresight ML,
	Ingo Molnar, Jiri Olsa, Leo Yan, Mark Rutland, Mathieu Poirier,
	Mike Leach, Namhyung Kim, Peter Zijlstra, Robert Walker,
	Suzuki K Poulose, linux-arm-kernel, linux-kernel
  Cc: Leo Yan

Quoting Leo Yan (2020-02-19 21:26:52)
> This patch series adds support for thread stack and callchain; this patch
> set depends on the instruction sample fix patch set [1].
> 
> This patch set get more complex, so before divide into small groups, I'd
> like to use this patch set version to include all relevant patches, hope
> this can give whole context for related code change.

Was this split up into small groups and sent again? I didn't see
anything when searching lkml.


* Re: [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain
  2020-11-05 22:50 ` [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Stephen Boyd
@ 2020-11-06  2:09   ` Leo Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Leo Yan @ 2020-11-06  2:09 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Coresight ML,
	Ingo Molnar, Jiri Olsa, Mark Rutland, Mathieu Poirier,
	Mike Leach, Namhyung Kim, Peter Zijlstra, Robert Walker,
	Suzuki K Poulose, linux-arm-kernel, linux-kernel

Hi Stephen,

On Thu, Nov 05, 2020 at 02:50:56PM -0800, Stephen Boyd wrote:
> Quoting Leo Yan (2020-02-19 21:26:52)
> > This patch series adds support for thread stack and callchain; this patch
> > set depends on the instruction sample fix patch set [1].
> > 
> > This patch set get more complex, so before divide into small groups, I'd
> > like to use this patch set version to include all relevant patches, hope
> > this can give whole context for related code change.
> 
> Was this split up into small groups and sent again? I didn't see
> anything when searching lkml.

No, this patch series is the last one sent for upstreaming; since I was
working on other things, I didn't continue pushing it to the mainline
kernel.

IIRC, there was a concern about a potential breakage in perf cs-etm
testing, so it fell into the backlog.  Let me check with Mathieu/Mike
offline on how to proceed with this patch set.  Thanks for bringing this
up.

Leo


end of thread

Thread overview: 12+ messages
2020-02-20  5:26 [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
2020-02-20  5:26 ` [PATCH v5 1/9] perf cs-etm: Defer to assign exception sample flag Leo Yan
2020-02-20  5:26 ` [PATCH v5 2/9] perf cs-etm: Reflect branch prior to exception Leo Yan
2020-02-20  5:26 ` [PATCH v5 3/9] perf cs-etm: Refactor instruction size handling Leo Yan
2020-02-20  5:26 ` [PATCH v5 4/9] perf cs-etm: Support thread stack Leo Yan
2020-02-20  5:26 ` [PATCH v5 5/9] perf cs-etm: Support branch filter Leo Yan
2020-02-20  5:26 ` [PATCH v5 6/9] perf cs-etm: Support callchain for instruction sample Leo Yan
2020-02-20  5:26 ` [PATCH v5 7/9] perf cs-etm: Fixup exception entry for thread stack Leo Yan
2020-02-20  5:27 ` [PATCH v5 8/9] perf thread: Add helper to get top return address Leo Yan
2020-02-20  5:27 ` [PATCH v5 9/9] perf cs-etm: Fixup exception exit for thread stack Leo Yan
2020-11-05 22:50 ` [PATCH v5 0/9] perf cs-etm: Support thread stack and callchain Stephen Boyd
2020-11-06  2:09   ` Leo Yan
