linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] perf report: append inlines to non-dwarf callchains
@ 2023-03-16 13:35 Artem Savkov
  2023-03-16 13:35 ` [PATCH 1/1] " Artem Savkov
  2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
  0 siblings, 2 replies; 14+ messages in thread
From: Artem Savkov @ 2023-03-16 13:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, linux-perf-users, linux-kernel
  Cc: Milian Wolff, Masami Hiramatsu, Andrii Nakryiko, Artem Savkov

In an email to Arnaldo Andrii Nakryiko suggested that perf can get
information about inlined functions from dwarf when available and then
add it to userspace stacktraces even in framepointer or lbr mode.
Looking closer at perf it turned out all required bits and pieces are
already there and inline information can be easily added to both
framepointer and lbr callchains by adding an append_inlines() call to
add_callchain_ip().

I used the following small program for testing:

======
#include <stdio.h>
#include <stdint.h>

static __attribute__((noinline)) uint32_t func_noinline2(uint32_t i)
{
        return i + 10;
}

static inline uint32_t func_declared_inlined(uint32_t i)
{
        return func_noinline2(i + 4);
}

static __attribute__((noinline)) uint32_t func_noinline(uint32_t i)
{
        return func_declared_inlined(i + 3);
}

static uint32_t func_inlined(uint32_t i)
{
        return func_noinline(i + 2);
}

int main(int argc, char **argv)
{
        uint64_t ret = 0;
        uint32_t i = 0;

        for (i = 0; i < 10000000; i++) {
                ret += func_inlined(i);
                ret -= func_noinline(i);
                ret -= func_noinline2(i);
                ret += func_declared_inlined(i);
        }

        printf("%s: %ld\n", __func__, ret);
        return ret;
}
======

When built with "gcc -Wall -fno-omit-frame-pointer -O2 -g" (gcc
version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)) it results in
func_inlined() and func_declared_inlined() being inlined into main()
and func_declared_inlined() also being inlined into func_noinline().

Here are the diffs for 'perf report --call-graph --stdio' output before
and after the patch:

Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
======
--- unpatched.fp	2023-03-16 13:27:22.724017900 +0100
+++ patched.fp	2023-03-16 13:27:28.608857921 +0100
@@ -14,18 +14,24 @@
             ---__libc_start_call_main
                |          
                |--46.31%--main
+               |          |          
+               |           --11.16%--func_declared_inlined (inlined)
                |          
                |--27.41%--func_noinline
+               |          func_declared_inlined (inlined)
                |          
                 --25.68%--func_noinline2
 
     85.71%    39.64%  a.out    a.out                 [.] main
             |          
             |--46.07%--main
+            |          |          
+            |           --11.16%--func_declared_inlined (inlined)
             |          
              --39.64%--__libc_start_call_main
                        |          
                        |--27.17%--func_noinline
+                       |          func_declared_inlined (inlined)
                        |          
                         --12.23%--func_noinline2
 
@@ -34,31 +40,47 @@
             |--46.31%--__libc_start_call_main
             |          |          
             |           --46.07%--main
+            |                     |          
+            |                      --11.16%--func_declared_inlined (inlined)
             |          
              --25.44%--func_noinline2
 
     40.56%    13.39%  a.out    a.out                 [.] func_noinline
             |          
             |--27.17%--func_noinline
+            |          func_declared_inlined (inlined)
             |          
              --13.39%--__libc_start_call_main
                        |          
                         --13.15%--func_noinline2
 
+    27.41%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+
+    11.16%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+
      0.30%     0.30%  a.out    [unknown]             [k] 0xffffffff81e011b7
      0.27%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_user
      0.25%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_sysdep_start
      0.20%     0.00%  a.out    ld-linux-x86-64.so.2  [.] dl_main
      0.19%     0.19%  a.out    [unknown]             [k] 0xffffffff81e00193
+     0.18%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
      0.10%     0.05%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
      0.09%     0.00%  a.out    [unknown]             [k] 0x00007fef811be880
      0.09%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object
      0.06%     0.00%  a.out    libc.so.6             [.] sysmalloc
      0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] munmap
-     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mprotect
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_accum (inlined)
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_stop (inlined)
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mprotect (inlined)
      0.05%     0.00%  a.out    libc.so.6             [.] __x86_cacheinfo_ifunc
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_dynamic_do_Rela (inlined)
      0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object_from_fd
-     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mmap64
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_get_dynamic_info (inlined)
+     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mmap64 (inlined)
      0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] _dl_cache_libcmp
      0.04%     0.00%  a.out    [unknown]             [.] 0x000000000001b19a
      0.04%     0.00%  a.out    [unknown]             [k] 0000000000000000
@@ -66,13 +88,15 @@
      0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] do_lookup_x
      0.03%     0.03%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word.constprop.0
      0.03%     0.00%  a.out    [unknown]             [.] 0x0000000004000000
+     0.03%     0.00%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word (inlined)
      0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start
      0.02%     0.02%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init
      0.02%     0.00%  a.out    [unknown]             [.] 0x722f3d4b434f535f
      0.02%     0.00%  a.out    [unknown]             [.] 0x00007ffebdd69f59
+     0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] tunable_is_name (inlined)
      0.01%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
======

LBR mode recorded with 'perf record --call-graph=lbr -- ./a.out'
======
--- unpatched.lbr	2023-03-16 13:27:50.853253211 +0100
+++ patched.lbr	2023-03-16 13:27:56.312104813 +0100
@@ -6,58 +6,95 @@
 # Samples: 238  of event 'cycles:u'
 # Event count (approx.): 193485873
 #
-# Children      Self  Command  Shared Object         Symbol                           
-# ........  ........  .......  ....................  .................................
+# Children      Self  Command  Shared Object         Symbol                       
+# ........  ........  .......  ....................  .............................
 #
     99.11%    36.87%  a.out    a.out                 [.] main
             |          
             |--62.24%--main
             |          |          
-            |           --17.47%--func_noinline
+            |          |--11.05%--func_inlined (inlined)
+            |          |          func_noinline
+            |          |          func_declared_inlined (inlined)
+            |          |          
+            |           --6.42%--func_noinline
+            |                     func_declared_inlined (inlined)
             |          
              --36.87%--_start
-                       __libc_start_main@@GLIBC_2.34
+                       __libc_start_main_alias_2 (inlined)
                        __libc_start_call_main
                        main
                        |          
-                       |--24.52%--func_noinline
+                       |--17.51%--func_inlined (inlined)
+                       |          func_noinline
+                       |          func_declared_inlined (inlined)
+                       |          
+                       |--8.45%--func_noinline2
                        |          
-                        --10.01%--func_noinline2
+                       |--7.01%--func_noinline
+                       |          func_declared_inlined (inlined)
+                       |          
+                        --1.57%--func_declared_inlined (inlined)
+                                  func_noinline2
 
     99.11%     0.00%  a.out    a.out                 [.] _start
             |
             ---_start
-               __libc_start_main@@GLIBC_2.34
+               __libc_start_main_alias_2 (inlined)
                __libc_start_call_main
                main
                |          
-               |--41.99%--func_noinline
+               |--28.55%--func_inlined (inlined)
+               |          func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--13.43%--func_noinline
+               |          func_declared_inlined (inlined)
                |          
-                --10.37%--func_noinline2
+               |--8.81%--func_noinline2
+               |          
+                --1.57%--func_declared_inlined (inlined)
+                          func_noinline2
 
-    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main@@GLIBC_2.34
+    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main_alias_2 (inlined)
             |
-            ---__libc_start_main@@GLIBC_2.34
+            ---__libc_start_main_alias_2 (inlined)
                __libc_start_call_main
                main
                |          
-               |--41.99%--func_noinline
+               |--28.55%--func_inlined (inlined)
+               |          func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--13.43%--func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--8.81%--func_noinline2
                |          
-                --10.37%--func_noinline2
+                --1.57%--func_declared_inlined (inlined)
+                          func_noinline2
 
     99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_call_main
             |
             ---__libc_start_call_main
                main
                |          
-               |--41.99%--func_noinline
+               |--28.55%--func_inlined (inlined)
+               |          func_noinline
+               |          func_declared_inlined (inlined)
                |          
-                --10.37%--func_noinline2
+               |--13.43%--func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--8.81%--func_noinline2
+               |          
+                --1.57%--func_declared_inlined (inlined)
+                          func_noinline2
 
     54.44%    44.43%  a.out    a.out                 [.] func_noinline2
             |          
             |--44.43%--_start
-            |          __libc_start_main@@GLIBC_2.34
+            |          __libc_start_main_alias_2 (inlined)
             |          __libc_start_call_main
             |          main
             |          
@@ -66,19 +103,42 @@
     41.99%    17.47%  a.out    a.out                 [.] func_noinline
             |          
             |--24.52%--func_noinline
+            |          func_declared_inlined (inlined)
             |          
              --17.47%--_start
-                       __libc_start_main@@GLIBC_2.34
+                       __libc_start_main_alias_2 (inlined)
                        __libc_start_call_main
                        main
-                       func_noinline
+                       |          
+                       |--11.05%--func_inlined (inlined)
+                       |          func_noinline
+                       |          func_declared_inlined (inlined)
+                       |          
+                        --6.42%--func_noinline
+                                  func_declared_inlined (inlined)
+
+    41.99%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+
+    28.55%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
+            |
+            ---func_inlined (inlined)
+               func_noinline
+               func_declared_inlined (inlined)
+
+     1.57%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+               func_noinline2
 
      0.88%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
             |
             ---_start
                _dl_start
                |          
-                --0.71%--_dl_sysdep_start
+                --0.71%--_dl_start_final (inlined)
+                          _dl_sysdep_start
                           dl_main
                           _dl_map_object_deps
                           _dl_catch_exception
@@ -92,7 +152,8 @@
             |
             ---_dl_start
                |          
-                --0.71%--_dl_sysdep_start
+                --0.71%--_dl_start_final (inlined)
+                          _dl_sysdep_start
                           dl_main
                           _dl_map_object_deps
                           _dl_catch_exception
@@ -106,6 +167,20 @@
             |
             ---_start
                _dl_start
+               _dl_start_final (inlined)
+               _dl_sysdep_start
+               dl_main
+               _dl_map_object_deps
+               _dl_catch_exception
+               openaux
+               _dl_map_object
+               _dl_map_object_from_fd
+               _dl_new_object
+               strlen
+
+     0.71%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_final (inlined)
+            |
+            ---_dl_start_final (inlined)
                _dl_sysdep_start
                dl_main
                _dl_map_object_deps
======

No meaningful differences in output with callchains in dwarf mode or when
tracing a binary that was compiled with no dwarf debuginfo.

Artem Savkov (1):
  perf report: append inlines to non-dwarf callchains

 tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 39 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-04 12:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-16 13:35 [PATCH 0/1] perf report: append inlines to non-dwarf callchains Artem Savkov
2023-03-16 13:35 ` [PATCH 1/1] " Artem Savkov
2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
2023-03-17  7:41   ` Artem Savkov
2023-03-22 18:18     ` Namhyung Kim
2023-03-22 19:44       ` Arnaldo Carvalho de Melo
2023-03-23  8:22         ` Artem Savkov
2023-03-30  5:06         ` Adrian Hunter
2023-03-31  8:52           ` Artem Savkov
2023-04-03 20:30             ` Arnaldo Carvalho de Melo
2023-04-04  5:47               ` Namhyung Kim
2023-04-04  6:57                 ` Adrian Hunter
2023-04-04  6:58                 ` Artem Savkov
2023-04-04 12:22                   ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).