linux-kernel.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Clark Williams <williams@redhat.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Alexey Budankov <alexey.budankov@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Jin Yao <yao.jin@linux.intel.com>, Jiri Olsa <jolsa@redhat.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: [PATCH 08/17] perf record: Enable LBR callstack capture jointly with thread stack
Date: Tue, 20 Aug 2019 16:27:24 -0300
Message-ID: <20190820192733.19180-9-acme@kernel.org>
In-Reply-To: <20190820192733.19180-1-acme@kernel.org>

From: Alexey Budankov <alexey.budankov@linux.intel.com>

Add a 'stack' mode to the '-j' branch filter so that it can be combined
with the '--call-graph dwarf' option, allowing thread stack data and the
LBR call stack to be captured jointly:

  $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test

The collected LBR call stack can be used to augment the DWARF call stack
computed from the raw thread stack data, providing more complete call
stack information when the collected SIZE is not large enough to cover
the complete thread stack.

Such cases are typical for workloads that allocate large arrays of data
on their thread stacks, or where the SIZE that can be collected cannot
be made large enough because of the nature of the workload or the system
configuration. This is where hardware-captured LBR call stacks can
provide the missing stack frames. A possible algorithm for consolidating
DWARF and LBR call stacks is described below.

With this patch set, the perf report UI still ignores the collected LBR
call stack data and shows only DWARF-based call stack information.

  ===========================================================================

  Overview:

   Legend:

   THS - thread stack
   CTX - thread register context
   SWS - software stack
   SSF - skipped stack frames
   PSS - Perf sample stack

   ip,sp,bp - HW register values
   d        - allocated stack regions
   kip      - ip (instruction pointer) value in kernel space
   K        - captured thread stack size

        THS

       -----
       |   |<-stack bottom
        ...
       |---|
       |ip4|
       |---|         PSS = SWS(THS(K))
       |   |
   --> |   |
   |   |d3 |                  user/
   |   |---|         user PSS kernel PSS
   |   |ip3|         ------   ------
   |   |---|         |SSF |   |SSF |
   |   |   |          ....     ....
   |   |   |         ------   ------
   |   |d2 |         | -1 |   | -1 |
       |---|   user  ------   ------
   K   |ip2|   CTX   |ip3 |   |ip3 |
       |---|         |----|   |----|
   |   |d1 |   ...   |ip2 | , |ip2 |
   |   |---|  |---|  |----|   |----|
   |   |ip1|  |bp0|  |ip1 |   |ip1 |
   |   |---|  |---|  |----|   |----|
   |   |   |  |ip0|->|ip0 |   |ip0 |<-user stack top
   |   |   |  |---|  ------   ------
   |   |   |<-|sp0|<-stack    |kip0|<-kernel stack bottom
   --> -----  -----   top     |----|
                              |kip1|
                              |----|
		              |kip2|
		              |----|
                               ....
			      |    |<-kernel stack top
                              ------
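The overview's PSS = SWS(THS(K)) step can be illustrated with a small C
sketch. It is not perf's unwinder: for brevity it swaps DWARF CFI
unwinding for an x86-64 frame-pointer walk, and the function names
read_captured() and sws_unwind_fp() are hypothetical. What it shows is
the effect the diagram describes: frames are recovered only while each
frame lies inside the K bytes copied from sp0, and the chain ends with
the -1 marker as soon as the walk leaves the captured region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSS_UNKNOWN ((uint64_t)-1) /* the "-1" entry in the PSS diagrams */

/*
 * Read one u64 at user address 'addr' from THS(K): 'stack' holds the
 * 'k' bytes that were copied starting at stack top 'sp0'.
 * Returns -1 if the value lies outside the captured window.
 */
static int read_captured(const uint8_t *stack, uint64_t sp0, size_t k,
			 uint64_t addr, uint64_t *val)
{
	uint64_t off;

	if (addr < sp0)
		return -1;
	off = addr - sp0;
	if (off > k || k - off < sizeof(*val))
		return -1;
	memcpy(val, stack + off, sizeof(*val));
	return 0;
}

/*
 * SWS(THS(K)) sketch using a frame-pointer walk instead of DWARF CFI.
 * Assumed x86-64 frame layout: [bp] = caller's bp, [bp + 8] = return ip.
 * Fills 'pss' from the stack top (ip0) towards the bottom and returns the
 * number of entries, ending with PSS_UNKNOWN if the walk left THS(K).
 */
static size_t sws_unwind_fp(const uint8_t *stack, uint64_t sp0, size_t k,
			    uint64_t ip0, uint64_t bp0,
			    uint64_t *pss, size_t max)
{
	uint64_t bp = bp0, next_bp, ret;
	size_t n = 0;

	assert(max >= 2); /* room for ip0 and a possible -1 marker */
	pss[n++] = ip0;
	while (n < max - 1) {
		if (read_captured(stack, sp0, k, bp, &next_bp) ||
		    read_captured(stack, sp0, k, bp + 8, &ret)) {
			pss[n++] = PSS_UNKNOWN; /* frame beyond the K bytes */
			break;
		}
		if (ret == 0)
			break;          /* assumed end-of-chain marker */
		pss[n++] = ret;
		if (next_bp <= bp)
			break;          /* stack grows down: caller bp must be higher */
		bp = next_bp;
	}
	return n;
}
```

For example, with K = 64 bytes copied from sp0 = 0x7000 and saved frame
pointers chained 0x7010 -> 0x7020 -> 0x7030 -> 0x7050, the walk yields
ip0 plus three return addresses and then -1, because the frame at 0x7050
is outside the captured window. This mirrors the ip0..ip3, -1 shape of
the user PSS in the diagram.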

  Algorithm details:

   Legend:

   HWS - hardware stack
   K-SWS - kernel software stack

			 BRANCH
			 TABLE

		 HWS      ip   ip
			  from to
		 ------  -----------
		 |ip7`|  |ip7`|    |
		 |----|  |----|----|
		 |ip6`|  |ip6`|    |
	user PSS |----|  |----|----|
		 |ip5`|  |ip5`|    |
	------   |----|  |----|----|
	| -1 |   |ip4`|  |ip4`|    |
	------   |----|  |----|----|
	|ip3 |~~~|ip3`|  |ip3`|    |
	|----|   |----|  |----|----|
	|ip2 |~~~|ip2`|  |ip2`|    |
	|----| 	 |----|  |----|----|
	|ip1 |~~~|ip1`|  |ip1`|ip0`|
	|----| 	 |----|  -----------
	|ip0 |~~~|ip0`|<---------'
	------   ------

	1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS
	2. ipj`                      , j=4-7 ===> user PSS
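The two consolidation rules can be sketched in C as follows. This is an
illustration under stated assumptions, not code from this patch set
(which, as noted above, leaves perf report on DWARF-only call stacks).
Following the branch table diagram, HWS is assumed to be built from LBR
call stack entries ordered most recent first: HWS[0] is the 'to' address
of the newest entry and HWS[j] is the 'from' address of entry j-1. The
rules do not say what happens when a resolved DWARF frame disagrees with
the LBR frame at the same depth; this sketch then keeps the DWARF chain
unchanged. The names build_hws(), augment_user_pss() and toy_sym_of()
are hypothetical, and toy_sym_of() merely treats each 0x100-byte address
range as one function in place of a real symbol lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PSS_UNKNOWN ((uint64_t)-1) /* the "-1" entry in the diagrams */

struct lbr_entry {          /* one row of the BRANCH TABLE */
	uint64_t from, to;
};

typedef int (*sym_of_fn)(uint64_t ip);

/* Toy symbol lookup for illustration only: one "function" per 0x100 bytes. */
static int toy_sym_of(uint64_t ip)
{
	return (int)(ip >> 8);
}

/* HWS[0] = 'to' of the newest entry, HWS[j] = 'from' of entry j-1. */
static size_t build_hws(const struct lbr_entry *lbr, size_t nr,
			uint64_t *hws, size_t max)
{
	size_t i, n = 0;

	if (nr == 0 || max == 0)
		return 0;
	hws[n++] = lbr[0].to;
	for (i = 0; i < nr && n < max; i++)
		hws[n++] = lbr[i].from;
	return n;
}

/* Apply rules 1 and 2 to a user PSS 'sws' and an LBR call stack 'hws'. */
static size_t augment_user_pss(const uint64_t *sws, size_t nr_sws,
			       const uint64_t *hws, size_t nr_hws,
			       sym_of_fn sym_of, uint64_t *out, size_t max)
{
	size_t known = 0, n = 0, j;
	int can_augment;

	assert(max >= nr_sws);
	while (known < nr_sws && sws[known] != PSS_UNKNOWN)
		known++;

	/* Augment only a truncated chain for which LBR has deeper frames. */
	can_augment = known < nr_sws && known < nr_hws;
	/* Rule 1: every resolved DWARF frame must match the LBR frame's symbol. */
	for (j = 0; can_augment && j < known; j++)
		if (sym_of(sws[j]) != sym_of(hws[j]))
			can_augment = 0;

	if (!can_augment) {             /* keep the DWARF chain as is */
		for (j = 0; j < nr_sws; j++)
			out[n++] = sws[j];
		return n;
	}

	for (j = 0; j < known; j++)     /* ip0..ipN taken from SWS */
		out[n++] = sws[j];
	/* Rule 2: LBR frames beyond the DWARF depth replace the -1 marker. */
	for (j = known; j < nr_hws && n < max; j++)
		out[n++] = hws[j];
	return n;
}
```

With a DWARF chain ip0..ip3 followed by -1 and an eight-entry HWS whose
first four symbols match, the result has eight frames: ip0..ip3 from SWS,
then ip4`..ip7` from HWS, which is the augmented user PSS layout in the
diagram that follows.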

  Augmented PSS = A_SWS(SWS(THS(K)), HWS):

	         user/
       user PSS  kernel PSS

	------   ------
	|ip7`|   |ip7`|<-user PSS bottom
	|----|   |----|
	|ip6`|   |ip6`|
	|----|   |----|
    HWS	|ip5`|   |ip5`|
	|----|   |----|
	|ip4`|   |ip4`|
	------   ------
	|ip3 |   |ip3 |
	|----|   |----|
    SWS |ip2 |   |ip2 |
	|----|   |----|
	|ip1 |   |ip1 |
	|----|   |----|
	|ip0 |   |ip0 |<-user PSS top
	------   ------
		 |kip0|<-kernel PSS bottom
		 |----|
		 |kip1|
	   K-SWS |----|
		 |kip2|
		 |----|
		 |kip3|<-kernel PSS top
		 ------

                  APSS

Committer testing:

Before:

  # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
  unknown branch filter stack, check man page

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -j, --branch-filter <branch filter mask>
                            branch stack filter modes
  # perf record -g --call-graph dwarf,1024 -j u ls > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.054 MB perf.data (12 samples) ]
  # perf evlist -v
  cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY, sample_regs_user: 0xff0fff, sample_stack_user: 1024
   #

After:

  # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.044 MB perf.data (11 samples) ]
  [root@quaco ~]# perf evlist -v
  cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK, sample_regs_user: 0xff0fff, sample_stack_user: 1024
  #
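For reference, the relevant 'After' attributes can be reproduced with a
direct perf_event_attr setup. The sketch below is an illustration, not
perf's own code: it fills only a subset of the fields shown by 'perf
evlist -v' and never calls perf_event_open(2). The register mask 0xff0fff
is copied from the x86_64 evlist output and is architecture specific, the
8-byte alignment check on the stack size is an assumption, and the helper
name setup_dwarf_plus_lbr_attr() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Mirror the 'After' evlist output for
 *   perf record -g --call-graph dwarf,<stack_bytes> -j stack,u
 * Only the fields relevant to joint DWARF + LBR capture are set.
 */
static void setup_dwarf_plus_lbr_attr(struct perf_event_attr *attr,
				      uint32_t stack_bytes)
{
	/* Assumed constraint: the dump size is a multiple of 8 bytes. */
	assert(stack_bytes % sizeof(uint64_t) == 0);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;   /* "cycles" */
	attr->freq = 1;
	attr->sample_freq = 4000;

	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_TIME | PERF_SAMPLE_CALLCHAIN |
			    PERF_SAMPLE_PERIOD |
			    PERF_SAMPLE_BRANCH_STACK | /* LBR call stack (HWS) */
			    PERF_SAMPLE_REGS_USER |    /* register context (CTX) */
			    PERF_SAMPLE_STACK_USER;    /* thread stack copy THS(K) */

	/* '-j stack,u' after this patch: USER|CALL_STACK */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_CALL_STACK;

	attr->sample_regs_user = 0xff0fff;     /* x86_64 value from the evlist */
	attr->sample_stack_user = stack_bytes; /* K, 1024 in the test above */
	attr->exclude_callchain_user = 1;      /* user frames unwound from the dump */
}
```

Comparing with the 'Before' run, the only difference of interest is
branch_sample_type: '-j u' left it at ANY, while '-j stack,u' selects
USER|CALL_STACK, i.e. LBR call stack mode.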

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/e9e00090-66fb-d2a4-c90f-1d12344f7788@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-branch-options.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index 726e8d9e8c54..4ed20c833d44 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -30,6 +30,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP),
 	BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL),
 	BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE),
+	BRANCH_OPT("stack", PERF_SAMPLE_BRANCH_CALL_STACK),
 	BRANCH_END
 };
 
-- 
2.21.0
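The patch itself only adds a table entry: 'stack' becomes a recognised
name in branch_modes[] and maps to PERF_SAMPLE_BRANCH_CALL_STACK, which
is why '-j stack,u' is accepted after the change instead of failing with
'unknown branch filter stack'. The C sketch below shows the kind of
lookup such a table feeds: split the comma-separated '-j' argument and OR
together the matching bits. It is a simplified, hypothetical stand-in for
perf's parser (parse_branch_filter_sketch() and branch_modes_sketch[] are
not perf names), it covers only a subset of the modes, and it leaves out
any defaulting perf may apply.

```c
#define _POSIX_C_SOURCE 200809L /* strdup(), strtok_r() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <linux/perf_event.h>

struct branch_mode_sketch {
	const char *name;
	uint64_t mode;
};

/* Subset of the '-j' names; the last row is the one this patch adds. */
static const struct branch_mode_sketch branch_modes_sketch[] = {
	{ "u",         PERF_SAMPLE_BRANCH_USER },
	{ "k",         PERF_SAMPLE_BRANCH_KERNEL },
	{ "any",       PERF_SAMPLE_BRANCH_ANY },
	{ "ind_jmp",   PERF_SAMPLE_BRANCH_IND_JUMP },
	{ "call",      PERF_SAMPLE_BRANCH_CALL },
	{ "save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE },
	{ "stack",     PERF_SAMPLE_BRANCH_CALL_STACK },
	{ NULL, 0 },
};

/* Parse e.g. "stack,u" into *mode; returns 0, or -1 on an unknown name. */
static int parse_branch_filter_sketch(const char *str, uint64_t *mode)
{
	char *copy, *tok, *save = NULL;
	int ret = 0;

	assert(str != NULL && mode != NULL);
	copy = strdup(str);   /* strtok_r() modifies its input */
	if (!copy)
		return -1;

	*mode = 0;
	for (tok = strtok_r(copy, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		const struct branch_mode_sketch *br;

		for (br = branch_modes_sketch; br->name; br++)
			if (strcmp(br->name, tok) == 0)
				break;
		if (!br->name) {
			fprintf(stderr, "unknown branch filter %s\n", tok);
			ret = -1;
			break;
		}
		*mode |= br->mode;
	}
	free(copy);
	return ret;
}
```

With this table, parsing "stack,u" yields PERF_SAMPLE_BRANCH_CALL_STACK |
PERF_SAMPLE_BRANCH_USER, matching the USER|CALL_STACK branch_sample_type
in the 'After' evlist output; without the "stack" row the same input is
rejected, as in the 'Before' run.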


Thread overview: 20+ messages
2019-08-20 19:27 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 01/17] tools headers: Add limits.h to access __WORDSIZE Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 02/17] perf tools: tools/include should come before tools/uapi/include Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 03/17] tools headers: Grab copy of linux/const.h, needed by linux/bits.h Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 04/17] tools headers: Synchronize linux/bits.h with the kernel sources Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 05/17] tools arch x86: Sync asm/cpufeatures.h with the with the kernel Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 06/17] perf evsel: Add comment for 'idx' member in 'struct perf_sample_id Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 07/17] tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file Arnaldo Carvalho de Melo
2019-08-20 19:27 ` Arnaldo Carvalho de Melo [this message]
2019-08-20 19:27 ` [PATCH 09/17] perf report: Dump LBR callstack data by -D jointly with thread stack Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 10/17] perf report: Prefer DWARF callstacks to LBR ones when captured both Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 11/17] perf cs-etm: Support sample flags 'insn' and 'insnlen' Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 12/17] perf ui: Make 'exit_msg' optional in ui__question_window() Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 13/17] perf ui: Introduce non-interactive ui__info_window() function Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 14/17] perf ui browser: Allow specifying message to show when no samples are available to display Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 15/17] perf top: Show info message while collecting samples Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 16/17] tools headers: Fixup bitsperlong per arch includes Arnaldo Carvalho de Melo
2019-08-20 19:27 ` [PATCH 17/17] libperf: Fix arch include paths Arnaldo Carvalho de Melo
2019-08-20 19:39 ` [GIT PULL] perf/core improvements and fixes Ingo Molnar
2019-08-20 19:44   ` Arnaldo Carvalho de Melo
