linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/3] collect LBR callstack together with thread stack data
@ 2019-08-09 15:16 Alexey Budankov
  2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Alexey Budankov @ 2019-08-09 15:16 UTC
  To: Arnaldo Carvalho de Melo, linux-kernel
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, Kan Liang, Jin, Yao


This patch set enables collection of LBR call stack data simultaneously with
the raw thread stack data captured by the --call-graph dwarf,SIZE option:

  $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test

The collected LBR call stack can be used to augment the dwarf call stack
calculated from the raw thread stack data, providing more comprehensive call
stack information when the collected SIZE is not enough to cover the complete
thread stack.

Such cases are typical for workloads that allocate large arrays of data on
their thread stacks, or when SIZE can't be made large enough due to the
nature of the workload or the system configuration. This is where hardware
captured LBR call stacks can provide the missing stack frames. A description
of a possible algorithm for consolidating dwarf and LBR call stacks follows.

With this patch set applied, the perf report command UI still ignores the
collected LBR call stack data and provides dwarf based call stack
information only.

===========================================================================

Overview:

   Legend:

   THS - thread stack
   CTX - thread register context
   SWS - software stack
   SSF - skipped stack frames
   PSS - Perf sample stack

   ip,sp,bp - HW register values
   d        - allocated stack regions
   kip      - ip address in the kernel space
   K        - captured thread stack size

        THS

       -----
       |   |<-stack bottom
        ... 
       |---|
       |ip4|
       |---|         PSS = SWS(THS(K))
       |   |
   --> |   |
   |   |d3 |                  user/
   |   |---|         user PSS kernel PSS
   |   |ip3|         ------   ------
   |   |---|         |SSF |   |SSF |
   |   |   |          ....     ....
   |   |   |         ------   ------
   |   |d2 |         | -1 |   | -1 |
       |---|   user  ------   ------
   K   |ip2|   CTX   |ip3 |   |ip3 |
       |---|         |----|   |----|
   |   |d1 |   ...   |ip2 | , |ip2 |
   |   |---|  |---|  |----|   |----|
   |   |ip1|  |bp0|  |ip1 |   |ip1 |
   |   |---|  |---|  |----|   |----|
   |   |   |  |ip0|->|ip0 |   |ip0 |<-user stack top
   |   |   |  |---|  ------   ------
   |   |   |<-|sp0|<-stack    |kip0|<-kernel stack bottom
   --> -----  -----   top     |----|
                              |kip1|
                              |----|
                              |kip2|
                              |----|
                               ....
                              |    |<-kernel stack top
                              ------

Algorithm details:

   Legend:

   HWS - hardware stack
   K-SWS - kernel software stack

			 BRANCH
			 TABLE

		 HWS      ip   ip
			  from to
		 ------  -----------
		 |ip7`|  |ip7`|    |
		 |----|  |----|----|
		 |ip6`|  |ip6`|    |
	user PSS |----|  |----|----|
		 |ip5`|  |ip5`|    |
	------   |----|  |----|----|
	| -1 |   |ip4`|  |ip4`|    |
	------   |----|  |----|----|
	|ip3 |~~~|ip3`|  |ip3`|    |
	|----|   |----|  |----|----|
	|ip2 |~~~|ip2`|  |ip2`|    |
	|----| 	 |----|  |----|----|
	|ip1 |~~~|ip1`|  |ip1`|ip0`|
	|----| 	 |----|  -----------
	|ip0 |~~~|ip0`|<---------'
	------   ------

	1. ipj  if sym(ipj) == sym(ipj`), j=0-3 ===> user PSS
	2. ipj`                         , j=4-7 ===> user PSS

Augmented PSS = A_SWS(SWS(THS(K)), HWS):

	         user/
       user PSS  kernel PSS

	------   ------
	|ip7`|   |ip7`|<-user PSS bottom
	|----|   |----|
	|ip6`|   |ip6`|
	|----|   |----|
    HWS	|ip5`|   |ip5`|
	|----|   |----|
	|ip4`|   |ip4`|
	------   ------
	|ip3 |   |ip3 |
	|----|   |----|
    SWS |ip2 |   |ip2 |
	|----|   |----|
	|ip1 |   |ip1 |
	|----|   |----|
	|ip0 |   |ip0 |<-user PSS top
	------   ------
		 |kip0|<-kernel PSS bottom
		 |----|
		 |kip1|
	   K-SWS |----|
		 |kip2|
		 |----|
		 |kip3|<-kernel PSS top
		 ------

                  APSS
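
The two consolidation rules above can be sketched in C roughly as follows.
This is only an illustration of the idea, not code from the patches: the
toy symbol table, toy_sym(), same_sym() and augment_user_pss() are all
hypothetical names, and the behaviour on a symbol mismatch (keep the dwarf
frames only) is an assumption, since the description does not say what
should happen in that case. Frames are compared by symbol rather than by
raw address because the ipj` entries come from the branch table while the
ipj entries come from unwinding, so the addresses themselves may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy symbol table standing in for real symbol resolution. */
struct toy_sym {
	uint64_t start;		/* first address of the function */
	uint64_t end;		/* one past the last address */
	const char *name;
};

static const struct toy_sym toy_syms[] = {
	{ 0x1000, 0x1100, "f0" }, { 0x1100, 0x1200, "f1" },
	{ 0x1200, 0x1300, "f2" }, { 0x1300, 0x1400, "f3" },
	{ 0x1400, 0x1500, "f4" }, { 0x1500, 0x1600, "f5" },
	{ 0x1600, 0x1700, "f6" }, { 0x1700, 0x1800, "f7" },
};

/* Return the name of the function containing ip, or NULL if unknown. */
static const char *toy_sym(uint64_t ip)
{
	size_t i;

	for (i = 0; i < sizeof(toy_syms) / sizeof(toy_syms[0]); i++)
		if (ip >= toy_syms[i].start && ip < toy_syms[i].end)
			return toy_syms[i].name;
	return NULL;
}

/* Two addresses match when both resolve to the same known symbol. */
static int same_sym(uint64_t a, uint64_t b)
{
	const char *sa = toy_sym(a), *sb = toy_sym(b);

	return sa && sb && strcmp(sa, sb) == 0;
}

/*
 * Build the augmented user PSS from the software stack (sws, dwarf based)
 * and the hardware stack (hws, LBR based). Index 0 is the stack top in
 * both arrays. Returns the number of entries written to out.
 */
static size_t augment_user_pss(const uint64_t *sws, size_t nr_sws,
			       const uint64_t *hws, size_t nr_hws,
			       uint64_t *out, size_t out_cap)
{
	size_t n = 0, j;
	int matched = 1;

	assert(sws && hws && out);

	/* Rule 1: overlapping frames must resolve to the same symbol. */
	for (j = 0; j < nr_sws && j < nr_hws; j++)
		if (!same_sym(sws[j], hws[j]))
			matched = 0;

	/* The dwarf based frames are always emitted first. */
	for (j = 0; j < nr_sws && n < out_cap; j++)
		out[n++] = sws[j];

	/*
	 * Rule 2: append the deeper frames known only to the hardware
	 * stack. Assumption: skip this when the overlap did not match.
	 */
	if (matched)
		for (j = nr_sws; j < nr_hws && n < out_cap; j++)
			out[n++] = hws[j];

	return n;
}
```

With four dwarf frames (ip0-ip3) and eight LBR frames (ip0`-ip7`) whose
first four resolve to the same symbols, the sketch yields eight entries:
ip0-ip3 followed by ip4`-ip7`, which corresponds to the user PSS column of
the APSS diagram.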

===========================================================================

---
Alexey Budankov (3):
  perf record: enable LBR callstack capture jointly with thread stack
  perf report: dump LBR callstack data by -D jointly with thread stack
  perf report: prefer dwarf callstacks to LBR ones when captured both

 tools/perf/builtin-report.c            |  2 ++
 tools/perf/util/parse-branch-options.c |  1 +
 tools/perf/util/session.c              | 31 ++++++++++++++++----------
 3 files changed, 22 insertions(+), 12 deletions(-)

-- 
2.20.1



Thread overview: 7+ messages
2019-08-09 15:16 [PATCH v1 0/3] collect LBR callstack together with thread stack data Alexey Budankov
2019-08-09 15:23 ` [PATCH v1 1/3] perf record: enable LBR callstack capture jointly with thread stack Alexey Budankov
2019-08-23  2:28   ` [tip: perf/core] perf record: Enable " tip-bot2 for Alexey Budankov
2019-08-09 15:26 ` [PATCH v1 2/3] perf report: dump LBR callstack data by -D " Alexey Budankov
2019-08-23  2:28   ` [tip: perf/core] perf report: Dump " tip-bot2 for Alexey Budankov
2019-08-09 15:31 ` [PATCH v1 3/3] perf report: prefer dwarf callstacks to LBR ones when captured both Alexey Budankov
2019-08-23  2:28   ` [tip: perf/core] perf report: Prefer DWARF " tip-bot2 for Alexey Budankov
