linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return()
Date: Fri, 22 Feb 2019 11:39:00 -0300	[thread overview]
Message-ID: <20190222143900.GA13225@kernel.org> (raw)
In-Reply-To: <ef029200-4eb7-3fb5-fcb2-64f5b129a6bf@intel.com>

Em Fri, Feb 22, 2019 at 11:48:51AM +0200, Adrian Hunter escreveu:
> On 9/01/19 11:18 AM, Adrian Hunter wrote:
> > Improve thread_stack__no_call_return() to better handle 'returns' that do
> > not match the stack i.e. 'no call'. See code comments for details.
> 
> This patch and patch 6 of the series do not seem to have been applied.
> Can they be applied?

I think there was some python3 oddity and I ended up letting this fell
thru the cracks, now I applied and tested it, see my results below, ints
in my perf/core branch now, will go together with your new batch in the
next pull req to Ingo,

Thanks,

- Arnaldo

commit 4ad79244f4899815a904f6e6adb71d24216b9b1d
Author: Adrian Hunter <adrian.hunter@intel.com>
Date:   Wed Jan 9 11:18:34 2019 +0200

    perf thread-stack: Improve thread_stack__no_call_return()
    
    Improve thread_stack__no_call_return() to better handle 'returns' that
    do not match the stack i.e. 'no call'. See code comments for details.
    The example below shows how retpolines are affected:
    
    Example:
    
      $ cat simple-retpoline.c
      __attribute__((noinline)) int bar(void)
      {
              return -1;
      }
    
      int foo(void)
      {
              return bar() + 1;
      }
    
      __attribute__((indirect_branch("thunk"))) int main()
      {
              int (*volatile fn)(void) = foo;
    
              fn();
              return fn();
      }
      $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
      $ objdump -d simple-retpoline
      <SNIP>
      0000000000001040 <main>:
          1040:       48 83 ec 18             sub    $0x18,%rsp
          1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
          104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
          1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
          1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
          105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
          105f:       48 83 c4 18             add    $0x18,%rsp
          1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
      <SNIP>
      0000000000001160 <bar>:
          1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
          1165:       c3                      retq
      <SNIP>
      0000000000001170 <foo>:
          1170:       e8 eb ff ff ff          callq  1160 <bar>
          1175:       83 c0 01                add    $0x1,%eax
          1178:       c3                      retq
      0000000000001179 <__x86_indirect_thunk_rax>:
          1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
          117e:       f3 90                   pause
          1180:       0f ae e8                lfence
          1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
          1185:       48 89 04 24             mov    %rax,(%rsp)
          1189:       c3                      retq
      <SNIP>
      $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
      $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
      2019-01-08 14:03:37.851655 Creating database...
      2019-01-08 14:03:37.863256 Writing records...
      2019-01-08 14:03:38.069750 Adding indexes
      2019-01-08 14:03:38.078799 Done
      $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
    
    Before:
    
        main
            -> __x86_indirect_thunk_rax
                -> __x86_indirect_thunk_rax
                    -> __x86_indirect_thunk_rax
                        -> bar
    
    After:
    
        main
            -> __x86_indirect_thunk_rax
                -> __x86_indirect_thunk_rax
                    -> foo
                        -> bar
    
    Committer testing:
    
    Chose "Reports", Then "Context-Sensitive Call Graph" and then go on
    expanding:
    
    Before:
    
    simple-retpolin
       PID:PID
          _start
             _start
                __libc_start_main
                   main
                       __x86_indirect_thunk_rax
                          __x86_indirect_thunk_rax
                          bar
    
    After:
    
    Remove the "simple.retpoline.db" file, run again the 'perf script' line
    to regenerate the .db file and run the exported-sql-viewer.py again to
    get the same all the way to 'main', then, from there, including 'main':
    
                   main
                       __x86_indirect_thunk_rax
                           __x86_indirect_thunk_rax
                               foo
                                   bar
    
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f52c0f90915d..632c07a125ab 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -638,14 +638,57 @@ static int thread_stack__no_call_return(struct thread *thread,
 	else
 		parent = root;
 
-	/* This 'return' had no 'call', so push and pop top of stack */
-	cp = call_path__findnew(cpr, parent, fsym, ip, ks);
+	if (parent->sym == from_al->sym) {
+		/*
+		 * At the bottom of the stack, assume the missing 'call' was
+		 * before the trace started. So, pop the current symbol and push
+		 * the 'to' symbol.
+		 */
+		if (ts->cnt == 1) {
+			err = thread_stack__call_return(thread, ts, --ts->cnt,
+							tm, ref, false);
+			if (err)
+				return err;
+		}
+
+		if (!ts->cnt) {
+			cp = call_path__findnew(cpr, root, tsym, addr, ks);
+
+			return thread_stack__push_cp(ts, addr, tm, ref, cp,
+						     true, false);
+		}
+
+		/*
+		 * Otherwise assume the 'return' is being used as a jump (e.g.
+		 * retpoline) and just push the 'to' symbol.
+		 */
+		cp = call_path__findnew(cpr, parent, tsym, addr, ks);
+
+		err = thread_stack__push_cp(ts, 0, tm, ref, cp, true, false);
+		if (!err)
+			ts->stack[ts->cnt - 1].non_call = true;
+
+		return err;
+	}
+
+	/*
+	 * Assume 'parent' has not yet returned, so push 'to', and then push and
+	 * pop 'from'.
+	 */
+
+	cp = call_path__findnew(cpr, parent, tsym, addr, ks);
 
 	err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
 	if (err)
 		return err;
 
-	return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
+	cp = call_path__findnew(cpr, cp, fsym, ip, ks);
+
+	err = thread_stack__push_cp(ts, ip, tm, ref, cp, true, false);
+	if (err)
+		return err;
+
+	return thread_stack__call_return(thread, ts, --ts->cnt, tm, ref, false);
 }
 
 static int thread_stack__trace_begin(struct thread *thread,

  reply	other threads:[~2019-02-22 14:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-09  9:18 [PATCH 0/6] perf thread-stack: x86 retpolines Adrian Hunter
2019-01-09  9:18 ` [PATCH 1/6] perf tools: Fix split_kallsyms_for_kcore for trampoline symbols Adrian Hunter
2019-02-09 12:57   ` [tip:perf/core] perf tools: Fix split_kallsyms_for_kcore() " tip-bot for Adrian Hunter
2019-01-09  9:18 ` [PATCH 2/6] perf thread-stack: Tidy thread_stack__push_cp() usage Adrian Hunter
2019-02-09 12:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09  9:18 ` [PATCH 3/6] perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables Adrian Hunter
2019-02-09 12:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09  9:18 ` [PATCH 4/6] perf thread-stack: Represent jmps to the start of a different symbol Adrian Hunter
2019-02-06 12:39   ` Arnaldo Carvalho de Melo
2019-02-06 13:25     ` Adrian Hunter
2019-02-09 12:59   ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09  9:18 ` [PATCH 5/6] perf thread-stack: Improve thread_stack__no_call_return() Adrian Hunter
2019-02-22  9:48   ` Adrian Hunter
2019-02-22 14:39     ` Arnaldo Carvalho de Melo [this message]
2019-02-28  7:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-09  9:18 ` [PATCH 6/6] perf thread-stack: Hide x86 retpolines Adrian Hunter
2019-01-09 15:38   ` Jiri Olsa
2019-01-10  7:52     ` Adrian Hunter
2019-02-22 19:42   ` Arnaldo Carvalho de Melo
2019-02-22 20:53     ` Arnaldo Carvalho de Melo
2019-02-28  7:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2019-01-10  9:55 ` [PATCH 0/6] perf thread-stack: " Jiri Olsa
2019-02-06  9:10   ` Adrian Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190222143900.GA13225@kernel.org \
    --to=arnaldo.melo@gmail.com \
    --cc=adrian.hunter@intel.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).