* [PATCH v1 0/3] collect LBR callstack together with thread stack data @ 2019-08-09 15:16 Alexey Budankov 2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Alexey Budankov @ 2019-08-09 15:16 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, linux-kernel Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Andi Kleen, Kan Liang, Jin, Yao The patch set unblocks collection of LBR call stack data simultaneously with raw thread stack data by --call-graph dwarf,SIZE option: $perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test Collected LBR call stack can be used to augment dwarf call stack calculated from the raw thread stack data and to provide more comprehensive call stack information for cases when collected SIZE is not enough to cover complete thread stack. Such cases are typical for workloads that allocate large arrays of data on its threads stacks or the possible SIZE to collect can't be large enough due to workload nature or system configuration and this is where hardware captured LBR call stacks can provide missing stack frames. Possible dwarf plus LBR call stacks consolidation algorithm description follows. With this patch set perf report command UI currently ignores collected LBR call stack data and still provides dwarf based call stacks information. =========================================================================== Overview: Legend: THS - thread stack CTX - thread register context SWS - software stack SSF - skipped stack frames PSS - Perf sample stack ip,sp,bp - HW registers values d - allocated stack regions kip - ip address in the kernel space K - captured thread stack size THS ----- | |<-stack bottom ... |---| |ip4| |---| PSS = SWS(THS(K)) | | --> | | | |d3 | user/ | |---| user PSS kernel PSS | |ip3| ------ ------ | |---| |SSF | |SSF | | | | .... .... | | | ------ ------ | |d2 | | -1 | | -1 | |---| user ------ ------ K |ip2| CTX |ip3 | |ip3 | |---| |----| |----| | |d1 | ... |ip2 | , |ip2 | | |---| |---| |----| |----| | |ip1| |bp0| |ip1 | |ip1 | | |---| |---| |----| |----| | | | |ip0|->|ip0 | |ip0 |<-user stack top | | | |---| ------ ------ | | |<-|sp0|<-stack |kip0|<-kernel stack bottom --> ----- ----- top |----| |kip1| |----| |kip2| |----| .... | |<-kernel stack top ------ Algorithm details: Legend: HWS - hardware stack K-SWS - kernel software stack BRANCH TABLE HWS ip ip from to ------ ----------- |ip7`| |ip7`| | |----| |----|----| |ip6`| |ip6`| | user PSS |----| |----|----| |ip5`| |ip5`| | ------ |----| |----|----| | -1 | |ip4`| |ip4`| | ------ |----| |----|----| |ip3 |~~~|ip3`| |ip3`| | |----| |----| |----|----| |ip2 |~~~|ip2`| |ip2`| | |----| |----| |----|----| |ip1 |~~~|ip1`| |ip1`|ip0`| |----| |----| ----------- |ip0 |~~~|ip0`|<---------' ------ ------ 1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS 2. ipj` , j=4-7 ===> user PSS Augmented PSS = A_SWS(SWS(THS(K)), HWS): user/ user PSS kernel PSS ------ ------ |ip7`| |ip7`|<-user PSS bottom |----| |----| |ip6`| |ip6`| |----| |----| HWS |ip5`| |ip5`| |----| |----| |ip4`| |ip4`| ------ ------ |ip3 | |ip3 | |----| |----| SWS |ip2 | |ip2 | |----| |----| |ip1 | |ip1 | |----| |----| |ip0 | |ip0 |<-user PSS top ------ ------ |kip0|<-kernel PSS bottom |----| |kip1| K-SWS |----| |kip2| |----| |kip3|<-kernel PSS top ------ APSS =========================================================================== --- Alexey Budankov (3): perf record: enable LBR callstack capture jointly with thread stack perf report: dump LBR callstack data by -D jointly with thread stack perf report: prefer dwarf callstacks to LBR ones when captured both tools/perf/builtin-report.c | 2 ++ tools/perf/util/parse-branch-options.c | 1 + tools/perf/util/session.c | 31 ++++++++++++++++---------- 3 files changed, 22 insertions(+), 12 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack 2019-08-09 15:16 [PATCH v1 0/3] collect LBR callstack together with thread stack data Alexey Budankov @ 2019-08-09 15:23 ` Alexey Budankov 2019-08-23 2:28 ` [tip: perf/core] perf record: Enable " tip-bot2 for Alexey Budankov 2019-08-09 15:26 ` [PATCH v1 2/3] perf report: dump LBR callstack data by -D " Alexey Budankov 2019-08-09 15:31 ` [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both Alexey Budankov 2 siblings, 1 reply; 7+ messages in thread From: Alexey Budankov @ 2019-08-09 15:23 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Andi Kleen, Kan Liang, Jin, Yao, linux-kernel Enable '-j stack' applicability together with '--call-graph dwarf' option so thread stack data and LBR call stack could be captured jointly: $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> --- tools/perf/util/parse-branch-options.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c index 726e8d9e8c54..4ed20c833d44 100644 --- a/tools/perf/util/parse-branch-options.c +++ b/tools/perf/util/parse-branch-options.c @@ -30,6 +30,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP), BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL), BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE), + BRANCH_OPT("stack", PERF_SAMPLE_BRANCH_CALL_STACK), BRANCH_END }; -- 2.20.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [tip: perf/core] perf record: Enable LBR callstack capture jointly with thread stack 2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov @ 2019-08-23 2:28 ` tip-bot2 for Alexey Budankov 0 siblings, 0 replies; 7+ messages in thread From: tip-bot2 for Alexey Budankov @ 2019-08-23 2:28 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, Peter Zijlstra, Namhyung Kim, Kan Liang, Jiri Olsa, Jin Yao, Andi Kleen, Alexander Shishkin, Arnaldo Carvalho de Melo, Alexey Budankov The following commit has been merged into the perf/core branch of tip: Commit-ID: 2566349648b40aa3f5edf6af8f7f893ccd6e4eae Gitweb: https://git.kernel.org/tip/2566349648b40aa3f5edf6af8f7f893ccd6e4eae Author: Alexey Budankov <alexey.budankov@linux.intel.com> AuthorDate: Fri, 09 Aug 2019 18:23:58 +03:00 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitterDate: Tue, 20 Aug 2019 12:18:58 -03:00 perf record: Enable LBR callstack capture jointly with thread stack Enable '-j stack' applicability together with '--call-graph dwarf' option so thread stack data and LBR call stack could be captured jointly: $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test Collected LBR call stack can be used to augment DWARF call stack calculated from the raw thread stack data and to provide more comprehensive call stack information for cases when collected SIZE is not enough to cover complete thread stack. Such cases are typical for workloads that allocate large arrays of data on its threads stacks or the possible SIZE to collect can't be large enough due to workload nature or system configuration and this is where hardware captured LBR call stacks can provide missing stack frames. Possible DWARF plus LBR call stacks consolidation algorithm description follows. With this patch set perf report command UI currently ignores collected LBR call stack data and still provides DWARF based call stacks information. =========================================================================== Overview: Legend: THS - thread stack CTX - thread register context SWS - software stack SSF - skipped stack frames PSS - Perf sample stack ip,sp,bp - HW registers values d - allocated stack regions kip - ip address in the kernel space K - captured thread stack size THS ----- | |<-stack bottom ... |---| |ip4| |---| PSS = SWS(THS(K)) | | --> | | | |d3 | user/ | |---| user PSS kernel PSS | |ip3| ------ ------ | |---| |SSF | |SSF | | | | .... .... | | | ------ ------ | |d2 | | -1 | | -1 | |---| user ------ ------ K |ip2| CTX |ip3 | |ip3 | |---| |----| |----| | |d1 | ... |ip2 | , |ip2 | | |---| |---| |----| |----| | |ip1| |bp0| |ip1 | |ip1 | | |---| |---| |----| |----| | | | |ip0|->|ip0 | |ip0 |<-user stack top | | | |---| ------ ------ | | |<-|sp0|<-stack |kip0|<-kernel stack bottom --> ----- ----- top |----| |kip1| |----| |kip2| |----| .... | |<-kernel stack top ------ Algorithm details: Legend: HWS - hardware stack K-SWS - kernel software stack BRANCH TABLE HWS ip ip from to ------ ----------- |ip7`| |ip7`| | |----| |----|----| |ip6`| |ip6`| | user PSS |----| |----|----| |ip5`| |ip5`| | ------ |----| |----|----| | -1 | |ip4`| |ip4`| | ------ |----| |----|----| |ip3 |~~~|ip3`| |ip3`| | |----| |----| |----|----| |ip2 |~~~|ip2`| |ip2`| | |----| |----| |----|----| |ip1 |~~~|ip1`| |ip1`|ip0`| |----| |----| ----------- |ip0 |~~~|ip0`|<---------' ------ ------ 1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS 2. ipj` , j=4-7 ===> user PSS Augmented PSS = A_SWS(SWS(THS(K)), HWS): user/ user PSS kernel PSS ------ ------ |ip7`| |ip7`|<-user PSS bottom |----| |----| |ip6`| |ip6`| |----| |----| HWS |ip5`| |ip5`| |----| |----| |ip4`| |ip4`| ------ ------ |ip3 | |ip3 | |----| |----| SWS |ip2 | |ip2 | |----| |----| |ip1 | |ip1 | |----| |----| |ip0 | |ip0 |<-user PSS top ------ ------ |kip0|<-kernel PSS bottom |----| |kip1| K-SWS |----| |kip2| |----| |kip3|<-kernel PSS top ------ APSS Committer testing: Before: # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null unknown branch filter stack, check man page Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -j, --branch-filter <branch filter mask> branch stack filter modes # perf record -g --call-graph dwarf,1024 -j u ls > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.054 MB perf.data (12 samples) ] # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY, sample_regs_user: 0xff0fff, sample_stack_user: 1024 # After: # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.044 MB perf.data (11 samples) ] [root@quaco ~]# perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK, sample_regs_user: 0xff0fff, sample_stack_user: 1024 # Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/e9e00090-66fb-d2a4-c90f-1d12344f7788@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/parse-branch-options.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c index 726e8d9..4ed20c8 100644 --- a/tools/perf/util/parse-branch-options.c +++ b/tools/perf/util/parse-branch-options.c @@ -30,6 +30,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP), BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL), BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE), + BRANCH_OPT("stack", PERF_SAMPLE_BRANCH_CALL_STACK), BRANCH_END }; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v1 2/3] perf report: dump LBR callstack data by -D jointly with thread stack 2019-08-09 15:16 [PATCH v1 0/3] collect LBR callstack together with thread stack data Alexey Budankov 2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov @ 2019-08-09 15:26 ` Alexey Budankov 2019-08-23 2:28 ` [tip: perf/core] perf report: Dump " tip-bot2 for Alexey Budankov 2019-08-09 15:31 ` [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both Alexey Budankov 2 siblings, 1 reply; 7+ messages in thread From: Alexey Budankov @ 2019-08-09 15:26 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Andi Kleen, Kan Liang, Jin, Yao, linux-kernel Make perf report -D command print captured LBR callstack chain when it is collected together with raw thread stack data: 2752673087247083 0x5d10 [0x548]: PERF_RECORD_SAMPLE(IP, 0x4002): 5841/5841: 0x40121f period: 1543862 addr: 0 ... FP chain: nr:0 ... branch callstack: nr:3 ..... 0: 00000000004011d0 ..... 1: 00007f393c388411 ..... 2: 0000000000401098 ... user regs: mask 0xff0fff ABI 64-bit .... AX 0x34e7 .... BX 0x7fff5f6dd3c0 .... CX 0xffffffff .... DX 0x34e6 .... SI 0x7f393c5268d0 .... DI 0x0 .... BP 0x401260 .... SP 0x7fff5f6dd3c0 .... IP 0x40121f .... FLAGS 0x29f .... CS 0x33 .... SS 0x2b .... R8 0x7f393c526800 .... R9 0x7f393c525da0 .... R10 0xfffffffffffff70a .... R11 0x246 .... R12 0x401070 .... R13 0x7fff5f6ddcb0 .... R14 0x0 .... R15 0x0 ... ustack: size 1024, offset 0x130 . data_src: 0x5080021 ... thread: stack_test:5841 ...... dso: /root/abudanko/stacks/stack_test Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> --- tools/perf/util/session.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index b9fe71d11bf6..82e0438a9160 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1051,23 +1051,30 @@ static void callchain__printf(struct evsel *evsel, i, callchain->ips[i]); } -static void branch_stack__printf(struct perf_sample *sample) +static void branch_stack__printf(struct perf_sample *sample, bool callstack) { uint64_t i; - printf("... branch stack: nr:%" PRIu64 "\n", sample->branch_stack->nr); + printf("%s: nr:%" PRIu64 "\n", + !callstack ? "... branch stack" : "... branch callstack", + sample->branch_stack->nr); for (i = 0; i < sample->branch_stack->nr; i++) { struct branch_entry *e = &sample->branch_stack->entries[i]; - printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n", - i, e->from, e->to, - (unsigned short)e->flags.cycles, - e->flags.mispred ? "M" : " ", - e->flags.predicted ? "P" : " ", - e->flags.abort ? "A" : " ", - e->flags.in_tx ? "T" : " ", - (unsigned)e->flags.reserved); + if (!callstack) { + printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n", + i, e->from, e->to, + (unsigned short)e->flags.cycles, + e->flags.mispred ? "M" : " ", + e->flags.predicted ? "P" : " ", + e->flags.abort ? "A" : " ", + e->flags.in_tx ? "T" : " ", + (unsigned)e->flags.reserved); + } else { + printf("..... %2"PRIu64": %016" PRIx64 "\n", + i, i > 0 ? e->from : e->to); + } } } @@ -1217,8 +1224,8 @@ static void dump_sample(struct evsel *evsel, union perf_event *event, if (evsel__has_callchain(evsel)) callchain__printf(evsel, sample); - if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !perf_evsel__has_branch_callstack(evsel)) - branch_stack__printf(sample); + if (sample_type & PERF_SAMPLE_BRANCH_STACK) + branch_stack__printf(sample, perf_evsel__has_branch_callstack(evsel)); if (sample_type & PERF_SAMPLE_REGS_USER) regs_user__printf(sample); -- 2.20.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [tip: perf/core] perf report: Dump LBR callstack data by -D jointly with thread stack 2019-08-09 15:26 ` [PATCH v1 2/3] perf report: dump LBR callstack data by -D " Alexey Budankov @ 2019-08-23 2:28 ` tip-bot2 for Alexey Budankov 0 siblings, 0 replies; 7+ messages in thread From: tip-bot2 for Alexey Budankov @ 2019-08-23 2:28 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, Peter Zijlstra, Namhyung Kim, Kan Liang, Jiri Olsa, Jin Yao, Andi Kleen, Alexander Shishkin, Arnaldo Carvalho de Melo, Alexey Budankov The following commit has been merged into the perf/core branch of tip: Commit-ID: d2720c3dad58def723f9617f7cf2a48c752ef50a Gitweb: https://git.kernel.org/tip/d2720c3dad58def723f9617f7cf2a48c752ef50a Author: Alexey Budankov <alexey.budankov@linux.intel.com> AuthorDate: Fri, 09 Aug 2019 18:26:30 +03:00 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitterDate: Tue, 20 Aug 2019 12:19:44 -03:00 perf report: Dump LBR callstack data by -D jointly with thread stack Make perf report -D command print captured LBR callstack chain when it is collected together with raw thread stack data: 2752673087247083 0x5d10 [0x548]: PERF_RECORD_SAMPLE(IP, 0x4002): 5841/5841: 0x40121f period: 1543862 addr: 0 ... FP chain: nr:0 ... branch callstack: nr:3 ..... 0: 00000000004011d0 ..... 1: 00007f393c388411 ..... 2: 0000000000401098 ... user regs: mask 0xff0fff ABI 64-bit .... AX 0x34e7 .... BX 0x7fff5f6dd3c0 .... CX 0xffffffff .... DX 0x34e6 .... SI 0x7f393c5268d0 .... DI 0x0 .... BP 0x401260 .... SP 0x7fff5f6dd3c0 .... IP 0x40121f .... FLAGS 0x29f .... CS 0x33 .... SS 0x2b .... R8 0x7f393c526800 .... R9 0x7f393c525da0 .... R10 0xfffffffffffff70a .... R11 0x246 .... R12 0x401070 .... R13 0x7fff5f6ddcb0 .... R14 0x0 .... R15 0x0 ... ustack: size 1024, offset 0x130 . data_src: 0x5080021 ... thread: stack_test:5841 ...... dso: /root/abudanko/stacks/stack_test Committer testing: # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.042 MB perf.data (10 samples) ] # Before: # perf report -D |& grep PERF_RECORD_SAMPLE -A28 | tail -29 67538909824483 0xa7a0 [0x560]: PERF_RECORD_SAMPLE(IP, 0x4002): 9721/9721: 0x7f441b2b1e20 period: 1376095 addr: 0 ... FP chain: nr:0 ... user regs: mask 0xff0fff ABI 64-bit .... AX 0x7f441b2b1000 .... BX 0x7f441b55b970 .... CX 0x7fff6e2db218 .... DX 0x7fff6e2db218 .... SI 0x7fff6e2db208 .... DI 0x1 .... BP 0x1 .... SP 0x7fff6e2db178 .... IP 0x7f441b2b1e20 .... FLAGS 0x20a .... CS 0x33 .... SS 0x2b .... R8 0x1 .... R9 0x7f441b371c18 .... R10 0x7f441b5a5f10 .... R11 0x202 .... R12 0x7fff6e2db208 .... R13 0x7fff6e2db218 .... R14 0x7f441b5a7150 .... R15 0x0 ... ustack: size 1024, offset 0x148 . data_src: 0x5080021 ... thread: ls:9721 ...... dso: /usr/lib64/libpthread-2.29.so 0xad00 [0x60]: event: 10 # After: # perf report -D |& grep PERF_RECORD_SAMPLE -A31 | tail -32 67538909824483 0xa7a0 [0x560]: PERF_RECORD_SAMPLE(IP, 0x4002): 9721/9721: 0x7f441b2b1e20 period: 1376095 addr: 0 ... FP chain: nr:0 ... branch callstack: nr:4 ..... 0: 00007f441b2b1e20 ..... 1: 00007f441b58af1a ..... 2: 00007f441b58b0e1 ..... 3: 00007f441b57c145 ... user regs: mask 0xff0fff ABI 64-bit .... AX 0x7f441b2b1000 .... BX 0x7f441b55b970 .... CX 0x7fff6e2db218 .... DX 0x7fff6e2db218 .... SI 0x7fff6e2db208 .... DI 0x1 .... BP 0x1 .... SP 0x7fff6e2db178 .... IP 0x7f441b2b1e20 .... FLAGS 0x20a .... CS 0x33 .... SS 0x2b .... R8 0x1 .... R9 0x7f441b371c18 .... R10 0x7f441b5a5f10 .... R11 0x202 .... R12 0x7fff6e2db208 .... R13 0x7fff6e2db218 .... R14 0x7f441b5a7150 .... R15 0x0 ... ustack: size 1024, offset 0x148 . data_src: 0x5080021 ... thread: ls:9721 ...... dso: /usr/lib64/libpthread-2.29.so # Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/aa82e5dd-def2-0ca8-a064-db9e2e8ad076@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/util/session.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index b9fe71d..82e0438 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1051,23 +1051,30 @@ static void callchain__printf(struct evsel *evsel, i, callchain->ips[i]); } -static void branch_stack__printf(struct perf_sample *sample) +static void branch_stack__printf(struct perf_sample *sample, bool callstack) { uint64_t i; - printf("... branch stack: nr:%" PRIu64 "\n", sample->branch_stack->nr); + printf("%s: nr:%" PRIu64 "\n", + !callstack ? "... branch stack" : "... branch callstack", + sample->branch_stack->nr); for (i = 0; i < sample->branch_stack->nr; i++) { struct branch_entry *e = &sample->branch_stack->entries[i]; - printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n", - i, e->from, e->to, - (unsigned short)e->flags.cycles, - e->flags.mispred ? "M" : " ", - e->flags.predicted ? "P" : " ", - e->flags.abort ? "A" : " ", - e->flags.in_tx ? "T" : " ", - (unsigned)e->flags.reserved); + if (!callstack) { + printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n", + i, e->from, e->to, + (unsigned short)e->flags.cycles, + e->flags.mispred ? "M" : " ", + e->flags.predicted ? "P" : " ", + e->flags.abort ? "A" : " ", + e->flags.in_tx ? "T" : " ", + (unsigned)e->flags.reserved); + } else { + printf("..... %2"PRIu64": %016" PRIx64 "\n", + i, i > 0 ? e->from : e->to); + } } } @@ -1217,8 +1224,8 @@ static void dump_sample(struct evsel *evsel, union perf_event *event, if (evsel__has_callchain(evsel)) callchain__printf(evsel, sample); - if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !perf_evsel__has_branch_callstack(evsel)) - branch_stack__printf(sample); + if (sample_type & PERF_SAMPLE_BRANCH_STACK) + branch_stack__printf(sample, perf_evsel__has_branch_callstack(evsel)); if (sample_type & PERF_SAMPLE_REGS_USER) regs_user__printf(sample); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both 2019-08-09 15:16 [PATCH v1 0/3] collect LBR callstack together with thread stack data Alexey Budankov 2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov 2019-08-09 15:26 ` [PATCH v1 2/3] perf report: dump LBR callstack data by -D " Alexey Budankov @ 2019-08-09 15:31 ` Alexey Budankov 2019-08-23 2:28 ` [tip: perf/core] perf report: Prefer DWARF " tip-bot2 for Alexey Budankov 2 siblings, 1 reply; 7+ messages in thread From: Alexey Budankov @ 2019-08-09 15:31 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Andi Kleen, Kan Liang, Jin, Yao, linux-kernel Display dwarf based callchains when Perf trace contains as raw thread stack data as LBR callstack data. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> --- tools/perf/builtin-report.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index d4288dcce156..5ecac3a1ed5b 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1271,6 +1271,8 @@ int cmd_report(int argc, const char **argv) has_br_stack = perf_header__has_feat(&session->header, HEADER_BRANCH_STACK); + if (perf_evlist__combined_sample_type(session->evlist) & PERF_SAMPLE_STACK_USER) + has_br_stack = false; setup_forced_leader(&report, session->evlist); -- 2.20.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [tip: perf/core] perf report: Prefer DWARF callstacks to LBR ones when captured both 2019-08-09 15:31 ` [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both Alexey Budankov @ 2019-08-23 2:28 ` tip-bot2 for Alexey Budankov 0 siblings, 0 replies; 7+ messages in thread From: tip-bot2 for Alexey Budankov @ 2019-08-23 2:28 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, Peter Zijlstra, Namhyung Kim, Kan Liang, Jiri Olsa, Jin Yao, Andi Kleen, Alexander Shishkin, Arnaldo Carvalho de Melo, Alexey Budankov The following commit has been merged into the perf/core branch of tip: Commit-ID: 10ccbc1cc0b8a05a5c8491630d36d1e2672036c1 Gitweb: https://git.kernel.org/tip/10ccbc1cc0b8a05a5c8491630d36d1e2672036c1 Author: Alexey Budankov <alexey.budankov@linux.intel.com> AuthorDate: Fri, 09 Aug 2019 18:31:28 +03:00 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitterDate: Tue, 20 Aug 2019 12:20:16 -03:00 perf report: Prefer DWARF callstacks to LBR ones when captured both Display DWARF based callchains when the perf.data file contains raw thread stack data as LBR callstack data. Commiter testing: This changes the output from the branch stack based one, i.e. without this patch, for the same file as in the previous csets: # perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 13 of event 'cycles' # Event count (approx.): 13 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ........................... ......................................... .................. # 7.69% ls libpthread-2.29.so [.] _init [.] __pthread_initialize_minimal_internal 6827 7.69% ls ld-2.29.so [k] _start [k] _dl_start - 7.69% ls ld-2.29.so [.] _dl_start_user [.] _dl_init -24790 7.69% ls ld-2.29.so [k] _dl_start [k] _dl_sysdep_start 278 7.69% ls ld-2.29.so [k] dl_main [k] _dl_map_object_deps 15581 7.69% ls ld-2.29.so [k] open_verify.constprop.0 [k] lseek64 4228 7.69% ls ld-2.29.so [k] _dl_map_object [k] open_verify.constprop.0 55 7.69% ls ld-2.29.so [k] openaux [k] _dl_map_object 67 7.69% ls ld-2.29.so [k] _dl_map_object_deps [k] 0x00007f441b57c090 112 7.69% ls ld-2.29.so [.] call_init.part.0 [.] _init 334 7.69% ls ld-2.29.so [.] _dl_init [.] call_init.part.0 383 7.69% ls ld-2.29.so [k] _dl_sysdep_start [k] dl_main 45 7.69% ls ld-2.29.so [k] _dl_catch_exception [k] openaux 116 # # (Tip: For memory address profiling, try: perf mem record / perf mem report) # To the one that shows call chains: # perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 10 of event 'cycles' # Event count (approx.): 3204047 # # Children Self Command Shared Object Symbol # ........ ........ ....... .................. ......................................... # 55.01% 0.00% ls [kernel.vmlinux] [k] entry_SYSCALL_64_after_hwframe | ---entry_SYSCALL_64_after_hwframe do_syscall_64 | --16.01%--__x64_sys_execve __do_execve_file.isra.0 search_binary_handler load_elf_binary elf_map vm_mmap_pgoff do_mmap mmap_region perf_event_mmap perf_iterate_sb perf_iterate_ctx perf_event_mmap_output perf_output_copy memcpy_erms 55.01% 39.00% ls [kernel.vmlinux] [k] do_syscall_64 | |--39.00%--0xffffffffffffffff | _dl_map_object | open_verify.constprop.0 | __lseek64 (inlined) | entry_SYSCALL_64_after_hwframe | do_syscall_64 | --16.01%--do_syscall_64 __x64_sys_execve __do_execve_file.isra.0 search_binary_handler load_elf_binary elf_map vm_mmap_pgoff do_mmap mmap_region perf_event_mmap perf_iterate_sb perf_iterate_ctx perf_event_mmap_output perf_output_copy memcpy_erms 42.95% 42.95% ls libpthread-2.29.so [.] __pthread_initialize_minimal_internal | ---_init __pthread_initialize_minimal_internal 42.95% 0.00% ls libpthread-2.29.so [.] _init | ---_init __pthread_initialize_minimal_internal <SNIP> # # (Tip: Profiling branch (mis)predictions with: perf record -b / perf report) # # The branch stack view be explicitely selected using: # perf report -h branch-stack Usage: perf report [<options>] -b, --branch-stack use branch records for per branch histogram filling # I.e. after this patch: # perf report -b --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 13 of event 'cycles' # Event count (approx.): 13 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ........................... ......................................... .................. # 7.69% ls libpthread-2.29.so [.] _init [.] __pthread_initialize_minimal_internal 6827 7.69% ls ld-2.29.so [k] _start [k] _dl_start - 7.69% ls ld-2.29.so [.] _dl_start_user [.] _dl_init -24790 7.69% ls ld-2.29.so [k] _dl_start [k] _dl_sysdep_start 278 7.69% ls ld-2.29.so [k] dl_main [k] _dl_map_object_deps 15581 7.69% ls ld-2.29.so [k] open_verify.constprop.0 [k] lseek64 4228 7.69% ls ld-2.29.so [k] _dl_map_object [k] open_verify.constprop.0 55 7.69% ls ld-2.29.so [k] openaux [k] _dl_map_object 67 7.69% ls ld-2.29.so [k] _dl_map_object_deps [k] 0x00007f441b57c090 112 7.69% ls ld-2.29.so [.] call_init.part.0 [.] _init 334 7.69% ls ld-2.29.so [.] _dl_init [.] call_init.part.0 383 7.69% ls ld-2.29.so [k] _dl_sysdep_start [k] dl_main 45 7.69% ls ld-2.29.so [k] _dl_catch_exception [k] openaux 116 # # (Tip: Show current config key-value pairs: perf config --list) # # Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ccbd9583-82f4-dec5-7e84-64bf56e351fb@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-report.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 5e003d0..79dfb11 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1281,6 +1281,8 @@ int cmd_report(int argc, const char **argv) has_br_stack = perf_header__has_feat(&session->header, HEADER_BRANCH_STACK); + if (perf_evlist__combined_sample_type(session->evlist) & PERF_SAMPLE_STACK_USER) + has_br_stack = false; setup_forced_leader(&report, session->evlist); ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-08-23 2:28 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-09 15:16 [PATCH v1 0/3] collect LBR callstack together with thread stack data Alexey Budankov 2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov 2019-08-23 2:28 ` [tip: perf/core] perf record: Enable " tip-bot2 for Alexey Budankov 2019-08-09 15:26 ` [PATCH v1 2/3] perf report: dump LBR callstack data by -D " Alexey Budankov 2019-08-23 2:28 ` [tip: perf/core] perf report: Dump " tip-bot2 for Alexey Budankov 2019-08-09 15:31 ` [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both Alexey Budankov 2019-08-23 2:28 ` [tip: perf/core] perf report: Prefer DWARF " tip-bot2 for Alexey Budankov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).