From: Alexey Budankov <alexey.budankov@linux.intel.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Andi Kleen <ak@linux.intel.com>,
Kan Liang <kan.liang@linux.intel.com>,
"Jin, Yao" <yao.jin@linux.intel.com>
Subject: [PATCH v1 0/3] collect LBR callstack together with thread stack data
Date: Fri, 9 Aug 2019 18:16:02 +0300
Message-ID: <ec5fe6b1-a116-fb60-42c6-dc8a9dedfc15@linux.intel.com>

This patch set enables collection of LBR call stack data simultaneously
with the raw thread stack data requested by the --call-graph dwarf,SIZE
option:

  $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test

The collected LBR call stack can be used to augment the DWARF call stack
computed from the raw thread stack data, providing more complete call stack
information when the collected SIZE bytes do not cover the whole thread
stack.
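
For illustration only, here is a minimal sketch of the perf_event_attr
settings that such a combined request roughly corresponds to. The helper name
setup_dwarf_plus_lbr() and its parameters are hypothetical and this is not the
code from the patches; the register mask is architecture specific and is left
to the caller.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): request a raw user stack dump of
 * stack_size bytes together with an LBR call stack for user space branches.
 * user_regs_mask is an assumption supplied by the caller, since the set of
 * registers needed for DWARF unwinding depends on the architecture.
 */
static void setup_dwarf_plus_lbr(struct perf_event_attr *attr,
				 uint32_t stack_size, uint64_t user_regs_mask)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;

	/* Data for DWARF unwinding: user registers plus raw user stack. */
	attr->sample_type = PERF_SAMPLE_IP |
			    PERF_SAMPLE_REGS_USER |
			    PERF_SAMPLE_STACK_USER;
	attr->sample_regs_user = user_regs_mask;
	attr->sample_stack_user = stack_size;

	/* Data for the hardware stack: LBR in call stack mode, user only. */
	attr->sample_type |= PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_CALL_STACK;
}
```

A caller would pass the filled attr to perf_event_open(2); with 1024 as
stack_size this mirrors the dwarf,1024 part of the command line above.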

Such cases are typical for workloads that allocate large arrays of data on
their thread stacks, or when SIZE cannot be made large enough because of the
nature of the workload or the system configuration. This is where hardware
captured LBR call stacks can supply the missing stack frames. A description
of a possible DWARF plus LBR call stack consolidation algorithm follows.

With this patch set, the perf report UI currently ignores the collected LBR
call stack data and still presents DWARF based call stack information.

===========================================================================
Overview:
Legend:
THS - thread stack
CTX - thread register context
SWS - software stack
SSF - skipped stack frames
PSS - Perf sample stack
ip,sp,bp - HW register values
d - allocated stack regions
kip - ip address in the kernel space
K - captured thread stack size

        THS
       -----
       |   |<-stack bottom
        ...
       |---|
       |ip4|
       |---|         PSS = SWS(THS(K))
       |   |
 -->   |   |
 |     |d3 |                      user/
 |     |---|        user PSS    kernel PSS
 |     |ip3|         ------       ------
 |     |---|         |SSF |       |SSF |
 |     |   |          ....         ....
 |     |   |         ------       ------
 |     |d2 |         | -1 |       | -1 |
       |---|  user   ------       ------
 K     |ip2|  CTX    |ip3 |       |ip3 |
       |---|         |----|       |----|
 |     |d1 |   ...   |ip2 |   ,   |ip2 |
 |     |---|  |---|  |----|       |----|
 |     |ip1|  |bp0|  |ip1 |       |ip1 |
 |     |---|  |---|  |----|       |----|
 |     |   |  |ip0|->|ip0 |       |ip0 |<-user stack top
 |     |   |  |---|  ------       ------
 |     |   |<-|sp0|<-stack        |kip0|<-kernel stack bottom
 -->   -----  -----  top          |----|
                                  |kip1|
                                  |----|
                                  |kip2|
                                  |----|
                                   ....
                                  |    |<-kernel stack top
                                  ------

Algorithm details:
Legend:
HWS - hardware stack
K-SWS - kernel software stack

                    BRANCH
                     TABLE
          HWS       ip   ip
                   from  to
         ------   -----------
         |ip7`|   |ip7`|    |
         |----|   |----|----|
         |ip6`|   |ip6`|    |
user PSS |----|   |----|----|
         |ip5`|   |ip5`|    |
------   |----|   |----|----|
| -1 |   |ip4`|   |ip4`|    |
------   |----|   |----|----|
|ip3 |~~~|ip3`|   |ip3`|    |
|----|   |----|   |----|----|
|ip2 |~~~|ip2`|   |ip2`|    |
|----|   |----|   |----|----|
|ip1 |~~~|ip1`|   |ip1`|ip0`|
|----|   |----|   -----------
|ip0 |~~~|ip0`|<---------'
------   ------

1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS
2. ipj` , j=4-7 ===> user PSS
Augmented PSS = A_SWS(SWS(THS(K)), HWS):

                   user/
     user PSS    kernel PSS
      ------       ------
      |ip7`|       |ip7`|<-user PSS bottom
      |----|       |----|
      |ip6`|       |ip6`|
      |----|       |----|
 HWS  |ip5`|       |ip5`|
      |----|       |----|
      |ip4`|       |ip4`|
      ------       ------
      |ip3 |       |ip3 |
      |----|       |----|
 SWS  |ip2 |       |ip2 |
      |----|       |----|
      |ip1 |       |ip1 |
      |----|       |----|
      |ip0 |       |ip0 |<-user PSS top
      ------       ------
                   |kip0|<-kernel PSS bottom
                   |----|
                   |kip1|
             K-SWS |----|
                   |kip2|
                   |----|
                   |kip3|<-kernel PSS top
                   ------
                    APSS
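
The two consolidation steps above could be sketched in C roughly as follows.
This is only an illustration under stated assumptions, not code from the
patches: augment_stack(), sym_fn and toy_sym() are hypothetical names, frames
are stored innermost first (ip0 at index 0), and the sketch assumes that a
symbol mismatch in the overlapping frames means the LBR data is not appended
at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resolver type: maps an ip to an id of its enclosing symbol. */
typedef uint64_t (*sym_fn)(uint64_t ip);

/* Toy resolver for demonstration: every 4 KiB range is one "function". */
static uint64_t toy_sym(uint64_t ip)
{
	return ip >> 12;
}

/*
 * Build an augmented user stack from the DWARF unwound frames (sws) and the
 * LBR call stack frames (hws), both ordered innermost frame first.
 * Returns the number of frames written to out (at most out_cap).
 */
static size_t augment_stack(const uint64_t *sws, size_t n_sws,
			    const uint64_t *hws, size_t n_hws,
			    sym_fn sym, uint64_t *out, size_t out_cap)
{
	/* Augmenting only makes sense if the LBR stack is the deeper one. */
	int match = n_hws > n_sws;
	size_t n = 0, j;

	/* Step 1: overlapping frames must resolve to the same symbols. */
	for (j = 0; match && j < n_sws; j++)
		if (sym(sws[j]) != sym(hws[j]))
			match = 0;

	/* The software frames always form the top of the result. */
	for (j = 0; j < n_sws && n < out_cap; j++)
		out[n++] = sws[j];

	/* Step 2: append the frames that only the hardware stack has. */
	if (match)
		for (j = n_sws; j < n_hws && n < out_cap; j++)
			out[n++] = hws[j];

	return n;
}
```

With four software frames and eight matching hardware frames, as in the
diagrams, the result holds ip0..ip3 followed by ip4`..ip7`.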
===========================================================================
---
Alexey Budankov (3):
  perf record: enable LBR callstack capture jointly with thread stack
  perf report: dump LBR callstack data by -D jointly with thread stack
  perf report: prefer dwarf callstacks to LBR ones when captured both

 tools/perf/builtin-report.c            |  2 ++
 tools/perf/util/parse-branch-options.c |  1 +
 tools/perf/util/session.c              | 31 ++++++++++++++++----------
 3 files changed, 22 insertions(+), 12 deletions(-)
--
2.20.1

Thread overview: 7+ messages
2019-08-09 15:16 Alexey Budankov [this message]
2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov
2019-08-23 2:28 ` [tip: perf/core] perf record: Enable " tip-bot2 for Alexey Budankov
2019-08-09 15:26 ` [PATCH v1 2/3] perf report: dump LBR callstack data by -D " Alexey Budankov
2019-08-23 2:28 ` [tip: perf/core] perf report: Dump " tip-bot2 for Alexey Budankov
2019-08-09 15:31 ` [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both Alexey Budankov
2019-08-23 2:28 ` [tip: perf/core] perf report: Prefer DWARF " tip-bot2 for Alexey Budankov