linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] perf report: append inlines to non-dwarf callchains
@ 2023-03-16 13:35 Artem Savkov
  2023-03-16 13:35 ` [PATCH 1/1] " Artem Savkov
  2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
  0 siblings, 2 replies; 14+ messages in thread
From: Artem Savkov @ 2023-03-16 13:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, linux-perf-users, linux-kernel
  Cc: Milian Wolff, Masami Hiramatsu, Andrii Nakryiko, Artem Savkov

In an email to Arnaldo Andrii Nakryiko suggested that perf can get
information about inlined functions from dwarf when available and then
add it to userspace stacktraces even in framepointer or lbr mode.
Looking closer at perf it turned out all required bits and pieces are
already there and inline information can be easily added to both
framepointer and lbr callchains by adding an append_inlines() call to
add_callchain_ip().

I used the following small program for testing:

======
#include <stdio.h>
#include <stdint.h>

static __attribute__((noinline)) uint32_t func_noinline2(uint32_t i)
{
        return i + 10;
}

static inline uint32_t func_declared_inlined(uint32_t i)
{
        return func_noinline2(i + 4);
}

static __attribute__((noinline)) uint32_t func_noinline(uint32_t i)
{
        return func_declared_inlined(i + 3);
}

static uint32_t func_inlined(uint32_t i)
{
        return func_noinline(i + 2);
}

int main(int argc, char **argv)
{
        uint64_t ret = 0;
        uint32_t i = 0;

        for (i = 0; i < 10000000; i++) {
                ret += func_inlined(i);
                ret -= func_noinline(i);
                ret -= func_noinline2(i);
                ret += func_declared_inlined(i);
        }

        printf("%s: %ld\n", __func__, ret);
        return ret;
}
======

When built with "gcc -Wall -fno-omit-frame-pointer -O2 -g" (gcc
version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)) it results in
func_inlined() and func_declared_inlined() being inlined into main()
and func_declared_inlined() also being inlined into func_noinline().

Here are the diffs for 'perf report --call-graph --stdio' output before
and after the patch:

Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
======
--- unpatched.fp	2023-03-16 13:27:22.724017900 +0100
+++ patched.fp	2023-03-16 13:27:28.608857921 +0100
@@ -14,18 +14,24 @@
             ---__libc_start_call_main
                |          
                |--46.31%--main
+               |          |          
+               |           --11.16%--func_declared_inlined (inlined)
                |          
                |--27.41%--func_noinline
+               |          func_declared_inlined (inlined)
                |          
                 --25.68%--func_noinline2
 
     85.71%    39.64%  a.out    a.out                 [.] main
             |          
             |--46.07%--main
+            |          |          
+            |           --11.16%--func_declared_inlined (inlined)
             |          
              --39.64%--__libc_start_call_main
                        |          
                        |--27.17%--func_noinline
+                       |          func_declared_inlined (inlined)
                        |          
                         --12.23%--func_noinline2
 
@@ -34,31 +40,47 @@
             |--46.31%--__libc_start_call_main
             |          |          
             |           --46.07%--main
+            |                     |          
+            |                      --11.16%--func_declared_inlined (inlined)
             |          
              --25.44%--func_noinline2
 
     40.56%    13.39%  a.out    a.out                 [.] func_noinline
             |          
             |--27.17%--func_noinline
+            |          func_declared_inlined (inlined)
             |          
              --13.39%--__libc_start_call_main
                        |          
                         --13.15%--func_noinline2
 
+    27.41%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+
+    11.16%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+
      0.30%     0.30%  a.out    [unknown]             [k] 0xffffffff81e011b7
      0.27%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_user
      0.25%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_sysdep_start
      0.20%     0.00%  a.out    ld-linux-x86-64.so.2  [.] dl_main
      0.19%     0.19%  a.out    [unknown]             [k] 0xffffffff81e00193
+     0.18%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
      0.10%     0.05%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
      0.09%     0.00%  a.out    [unknown]             [k] 0x00007fef811be880
      0.09%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object
      0.06%     0.00%  a.out    libc.so.6             [.] sysmalloc
      0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] munmap
-     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mprotect
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_accum (inlined)
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_stop (inlined)
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mprotect (inlined)
      0.05%     0.00%  a.out    libc.so.6             [.] __x86_cacheinfo_ifunc
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_dynamic_do_Rela (inlined)
      0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object_from_fd
-     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mmap64
+     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_get_dynamic_info (inlined)
+     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mmap64 (inlined)
      0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] _dl_cache_libcmp
      0.04%     0.00%  a.out    [unknown]             [.] 0x000000000001b19a
      0.04%     0.00%  a.out    [unknown]             [k] 0000000000000000
@@ -66,13 +88,15 @@
      0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] do_lookup_x
      0.03%     0.03%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word.constprop.0
      0.03%     0.00%  a.out    [unknown]             [.] 0x0000000004000000
+     0.03%     0.00%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word (inlined)
      0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start
      0.02%     0.02%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init
      0.02%     0.00%  a.out    [unknown]             [.] 0x722f3d4b434f535f
      0.02%     0.00%  a.out    [unknown]             [.] 0x00007ffebdd69f59
+     0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] tunable_is_name (inlined)
      0.01%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
======

LBR mode recorded with 'perf record --call-graph=lbr -- ./a.out'
======
--- unpatched.lbr	2023-03-16 13:27:50.853253211 +0100
+++ patched.lbr	2023-03-16 13:27:56.312104813 +0100
@@ -6,58 +6,95 @@
 # Samples: 238  of event 'cycles:u'
 # Event count (approx.): 193485873
 #
-# Children      Self  Command  Shared Object         Symbol                           
-# ........  ........  .......  ....................  .................................
+# Children      Self  Command  Shared Object         Symbol                       
+# ........  ........  .......  ....................  .............................
 #
     99.11%    36.87%  a.out    a.out                 [.] main
             |          
             |--62.24%--main
             |          |          
-            |           --17.47%--func_noinline
+            |          |--11.05%--func_inlined (inlined)
+            |          |          func_noinline
+            |          |          func_declared_inlined (inlined)
+            |          |          
+            |           --6.42%--func_noinline
+            |                     func_declared_inlined (inlined)
             |          
              --36.87%--_start
-                       __libc_start_main@@GLIBC_2.34
+                       __libc_start_main_alias_2 (inlined)
                        __libc_start_call_main
                        main
                        |          
-                       |--24.52%--func_noinline
+                       |--17.51%--func_inlined (inlined)
+                       |          func_noinline
+                       |          func_declared_inlined (inlined)
+                       |          
+                       |--8.45%--func_noinline2
                        |          
-                        --10.01%--func_noinline2
+                       |--7.01%--func_noinline
+                       |          func_declared_inlined (inlined)
+                       |          
+                        --1.57%--func_declared_inlined (inlined)
+                                  func_noinline2
 
     99.11%     0.00%  a.out    a.out                 [.] _start
             |
             ---_start
-               __libc_start_main@@GLIBC_2.34
+               __libc_start_main_alias_2 (inlined)
                __libc_start_call_main
                main
                |          
-               |--41.99%--func_noinline
+               |--28.55%--func_inlined (inlined)
+               |          func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--13.43%--func_noinline
+               |          func_declared_inlined (inlined)
                |          
-                --10.37%--func_noinline2
+               |--8.81%--func_noinline2
+               |          
+                --1.57%--func_declared_inlined (inlined)
+                          func_noinline2
 
-    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main@@GLIBC_2.34
+    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main_alias_2 (inlined)
             |
-            ---__libc_start_main@@GLIBC_2.34
+            ---__libc_start_main_alias_2 (inlined)
                __libc_start_call_main
                main
                |          
-               |--41.99%--func_noinline
+               |--28.55%--func_inlined (inlined)
+               |          func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--13.43%--func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--8.81%--func_noinline2
                |          
-                --10.37%--func_noinline2
+                --1.57%--func_declared_inlined (inlined)
+                          func_noinline2
 
     99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_call_main
             |
             ---__libc_start_call_main
                main
                |          
-               |--41.99%--func_noinline
+               |--28.55%--func_inlined (inlined)
+               |          func_noinline
+               |          func_declared_inlined (inlined)
                |          
-                --10.37%--func_noinline2
+               |--13.43%--func_noinline
+               |          func_declared_inlined (inlined)
+               |          
+               |--8.81%--func_noinline2
+               |          
+                --1.57%--func_declared_inlined (inlined)
+                          func_noinline2
 
     54.44%    44.43%  a.out    a.out                 [.] func_noinline2
             |          
             |--44.43%--_start
-            |          __libc_start_main@@GLIBC_2.34
+            |          __libc_start_main_alias_2 (inlined)
             |          __libc_start_call_main
             |          main
             |          
@@ -66,19 +103,42 @@
     41.99%    17.47%  a.out    a.out                 [.] func_noinline
             |          
             |--24.52%--func_noinline
+            |          func_declared_inlined (inlined)
             |          
              --17.47%--_start
-                       __libc_start_main@@GLIBC_2.34
+                       __libc_start_main_alias_2 (inlined)
                        __libc_start_call_main
                        main
-                       func_noinline
+                       |          
+                       |--11.05%--func_inlined (inlined)
+                       |          func_noinline
+                       |          func_declared_inlined (inlined)
+                       |          
+                        --6.42%--func_noinline
+                                  func_declared_inlined (inlined)
+
+    41.99%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+
+    28.55%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
+            |
+            ---func_inlined (inlined)
+               func_noinline
+               func_declared_inlined (inlined)
+
+     1.57%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
+            |
+            ---func_declared_inlined (inlined)
+               func_noinline2
 
      0.88%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
             |
             ---_start
                _dl_start
                |          
-                --0.71%--_dl_sysdep_start
+                --0.71%--_dl_start_final (inlined)
+                          _dl_sysdep_start
                           dl_main
                           _dl_map_object_deps
                           _dl_catch_exception
@@ -92,7 +152,8 @@
             |
             ---_dl_start
                |          
-                --0.71%--_dl_sysdep_start
+                --0.71%--_dl_start_final (inlined)
+                          _dl_sysdep_start
                           dl_main
                           _dl_map_object_deps
                           _dl_catch_exception
@@ -106,6 +167,20 @@
             |
             ---_start
                _dl_start
+               _dl_start_final (inlined)
+               _dl_sysdep_start
+               dl_main
+               _dl_map_object_deps
+               _dl_catch_exception
+               openaux
+               _dl_map_object
+               _dl_map_object_from_fd
+               _dl_new_object
+               strlen
+
+     0.71%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_final (inlined)
+            |
+            ---_dl_start_final (inlined)
                _dl_sysdep_start
                dl_main
                _dl_map_object_deps
======

No meaningful differences in output with callchains in dwarf mode or when
tracing a binary that was compiled with no dwarf debuginfo.

Artem Savkov (1):
  perf report: append inlines to non-dwarf callchains

 tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 39 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/1] perf report: append inlines to non-dwarf callchains
  2023-03-16 13:35 [PATCH 0/1] perf report: append inlines to non-dwarf callchains Artem Savkov
@ 2023-03-16 13:35 ` Artem Savkov
  2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
  1 sibling, 0 replies; 14+ messages in thread
From: Artem Savkov @ 2023-03-16 13:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, linux-perf-users, linux-kernel
  Cc: Milian Wolff, Masami Hiramatsu, Andrii Nakryiko, Artem Savkov

Append information about inlined functions to fp and lbr callchains
from dwarf debuginfo when available. Do so by calling append_inlines()
from add_callchain_ip(). This requires append_inlies to  be moved up a
bit.

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Artem Savkov <asavkov@redhat.com>
---
 tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 39 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 803c9d1803dd..2dd46b9f0083 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2249,6 +2249,45 @@ struct iterations {
 	u64 cycles;
 };
 
+static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip)
+{
+	struct symbol *sym = ms->sym;
+	struct map *map = ms->map;
+	struct inline_node *inline_node;
+	struct inline_list *ilist;
+	u64 addr;
+	int ret = 1;
+
+	if (!symbol_conf.inline_name || !map || !sym)
+		return ret;
+
+	addr = map__map_ip(map, ip);
+	addr = map__rip_2objdump(map, addr);
+
+	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
+	if (!inline_node) {
+		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
+		if (!inline_node)
+			return ret;
+		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
+	}
+
+	list_for_each_entry(ilist, &inline_node->val, list) {
+		struct map_symbol ilist_ms = {
+			.maps = ms->maps,
+			.map = map,
+			.sym = ilist->symbol,
+		};
+		ret = callchain_cursor_append(cursor, ip, &ilist_ms, false,
+					      NULL, 0, 0, 0, ilist->srcline);
+
+		if (ret != 0)
+			return ret;
+	}
+
+	return ret;
+}
+
 static int add_callchain_ip(struct thread *thread,
 			    struct callchain_cursor *cursor,
 			    struct symbol **parent,
@@ -2322,6 +2361,10 @@ static int add_callchain_ip(struct thread *thread,
 	ms.maps = al.maps;
 	ms.map = al.map;
 	ms.sym = al.sym;
+
+	if (append_inlines(cursor, &ms, ip) == 0)
+		return 0;
+
 	srcline = callchain_srcline(&ms, al.addr);
 	return callchain_cursor_append(cursor, ip, &ms,
 				       branch, flags, nr_loop_iter,
@@ -3008,45 +3051,6 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	return 0;
 }
 
-static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip)
-{
-	struct symbol *sym = ms->sym;
-	struct map *map = ms->map;
-	struct inline_node *inline_node;
-	struct inline_list *ilist;
-	u64 addr;
-	int ret = 1;
-
-	if (!symbol_conf.inline_name || !map || !sym)
-		return ret;
-
-	addr = map__map_ip(map, ip);
-	addr = map__rip_2objdump(map, addr);
-
-	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
-	if (!inline_node) {
-		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
-		if (!inline_node)
-			return ret;
-		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
-	}
-
-	list_for_each_entry(ilist, &inline_node->val, list) {
-		struct map_symbol ilist_ms = {
-			.maps = ms->maps,
-			.map = map,
-			.sym = ilist->symbol,
-		};
-		ret = callchain_cursor_append(cursor, ip, &ilist_ms, false,
-					      NULL, 0, 0, 0, ilist->srcline);
-
-		if (ret != 0)
-			return ret;
-	}
-
-	return ret;
-}
-
 static int unwind_entry(struct unwind_entry *entry, void *arg)
 {
 	struct callchain_cursor *cursor = arg;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-16 13:35 [PATCH 0/1] perf report: append inlines to non-dwarf callchains Artem Savkov
  2023-03-16 13:35 ` [PATCH 1/1] " Artem Savkov
@ 2023-03-16 21:26 ` Namhyung Kim
  2023-03-17  7:41   ` Artem Savkov
  1 sibling, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2023-03-16 21:26 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, linux-perf-users, linux-kernel, Milian Wolff,
	Masami Hiramatsu, Andrii Nakryiko

Hello,

On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
>
> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> information about inlined functions from dwarf when available and then
> add it to userspace stacktraces even in framepointer or lbr mode.
> Looking closer at perf it turned out all required bits and pieces are
> already there and inline information can be easily added to both
> framepointer and lbr callchains by adding an append_inlines() call to
> add_callchain_ip().

Looks great!  Have you checked it with perf report -g callee ?
I'm not sure the ordering of inlined functions is maintained
properly.  Maybe you can use --no-children too to simplify
the output.

Thanks,
Namhyung

>
> I used the following small program for testing:
>
> ======
> #include <stdio.h>
> #include <stdint.h>
>
> static __attribute__((noinline)) uint32_t func_noinline2(uint32_t i)
> {
>         return i + 10;
> }
>
> static inline uint32_t func_declared_inlined(uint32_t i)
> {
>         return func_noinline2(i + 4);
> }
>
> static __attribute__((noinline)) uint32_t func_noinline(uint32_t i)
> {
>         return func_declared_inlined(i + 3);
> }
>
> static uint32_t func_inlined(uint32_t i)
> {
>         return func_noinline(i + 2);
> }
>
> int main(int argc, char **argv)
> {
>         uint64_t ret = 0;
>         uint32_t i = 0;
>
>         for (i = 0; i < 10000000; i++) {
>                 ret += func_inlined(i);
>                 ret -= func_noinline(i);
>                 ret -= func_noinline2(i);
>                 ret += func_declared_inlined(i);
>         }
>
>         printf("%s: %ld\n", __func__, ret);
>         return ret;
> }
> ======
>
> When built with "gcc -Wall -fno-omit-frame-pointer -O2 -g" (gcc
> version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)) it results in
> func_inlined() and func_declared_inlined() being inlined into main()
> and func_declared_inlined() also being inlined into func_noinline().
>
> Here are the diffs for 'perf report --call-graph --stdio' output before
> and after the patch:
>
> Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
> ======
> --- unpatched.fp        2023-03-16 13:27:22.724017900 +0100
> +++ patched.fp  2023-03-16 13:27:28.608857921 +0100
> @@ -14,18 +14,24 @@
>              ---__libc_start_call_main
>                 |
>                 |--46.31%--main
> +               |          |
> +               |           --11.16%--func_declared_inlined (inlined)
>                 |
>                 |--27.41%--func_noinline
> +               |          func_declared_inlined (inlined)
>                 |
>                  --25.68%--func_noinline2
>
>      85.71%    39.64%  a.out    a.out                 [.] main
>              |
>              |--46.07%--main
> +            |          |
> +            |           --11.16%--func_declared_inlined (inlined)
>              |
>               --39.64%--__libc_start_call_main
>                         |
>                         |--27.17%--func_noinline
> +                       |          func_declared_inlined (inlined)
>                         |
>                          --12.23%--func_noinline2
>
> @@ -34,31 +40,47 @@
>              |--46.31%--__libc_start_call_main
>              |          |
>              |           --46.07%--main
> +            |                     |
> +            |                      --11.16%--func_declared_inlined (inlined)
>              |
>               --25.44%--func_noinline2
>
>      40.56%    13.39%  a.out    a.out                 [.] func_noinline
>              |
>              |--27.17%--func_noinline
> +            |          func_declared_inlined (inlined)
>              |
>               --13.39%--__libc_start_call_main
>                         |
>                          --13.15%--func_noinline2
>
> +    27.41%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> +            |
> +            ---func_declared_inlined (inlined)
> +
> +    11.16%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> +            |
> +            ---func_declared_inlined (inlined)
> +
>       0.30%     0.30%  a.out    [unknown]             [k] 0xffffffff81e011b7
>       0.27%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_user
>       0.25%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_sysdep_start
>       0.20%     0.00%  a.out    ld-linux-x86-64.so.2  [.] dl_main
>       0.19%     0.19%  a.out    [unknown]             [k] 0xffffffff81e00193
> +     0.18%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
>       0.10%     0.05%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
>       0.09%     0.00%  a.out    [unknown]             [k] 0x00007fef811be880
>       0.09%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object
>       0.06%     0.00%  a.out    libc.so.6             [.] sysmalloc
>       0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] munmap
> -     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mprotect
> +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_accum (inlined)
> +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_stop (inlined)
> +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mprotect (inlined)
>       0.05%     0.00%  a.out    libc.so.6             [.] __x86_cacheinfo_ifunc
> +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_dynamic_do_Rela (inlined)
>       0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object_from_fd
> -     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mmap64
> +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_get_dynamic_info (inlined)
> +     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mmap64 (inlined)
>       0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] _dl_cache_libcmp
>       0.04%     0.00%  a.out    [unknown]             [.] 0x000000000001b19a
>       0.04%     0.00%  a.out    [unknown]             [k] 0000000000000000
> @@ -66,13 +88,15 @@
>       0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] do_lookup_x
>       0.03%     0.03%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word.constprop.0
>       0.03%     0.00%  a.out    [unknown]             [.] 0x0000000004000000
> +     0.03%     0.00%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word (inlined)
>       0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start
>       0.02%     0.02%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init
>       0.02%     0.00%  a.out    [unknown]             [.] 0x722f3d4b434f535f
>       0.02%     0.00%  a.out    [unknown]             [.] 0x00007ffebdd69f59
> +     0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] tunable_is_name (inlined)
>       0.01%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
> ======
>
> LBR mode recorded with 'perf record --call-graph=lbr -- ./a.out'
> ======
> --- unpatched.lbr       2023-03-16 13:27:50.853253211 +0100
> +++ patched.lbr 2023-03-16 13:27:56.312104813 +0100
> @@ -6,58 +6,95 @@
>  # Samples: 238  of event 'cycles:u'
>  # Event count (approx.): 193485873
>  #
> -# Children      Self  Command  Shared Object         Symbol
> -# ........  ........  .......  ....................  .................................
> +# Children      Self  Command  Shared Object         Symbol
> +# ........  ........  .......  ....................  .............................
>  #
>      99.11%    36.87%  a.out    a.out                 [.] main
>              |
>              |--62.24%--main
>              |          |
> -            |           --17.47%--func_noinline
> +            |          |--11.05%--func_inlined (inlined)
> +            |          |          func_noinline
> +            |          |          func_declared_inlined (inlined)
> +            |          |
> +            |           --6.42%--func_noinline
> +            |                     func_declared_inlined (inlined)
>              |
>               --36.87%--_start
> -                       __libc_start_main@@GLIBC_2.34
> +                       __libc_start_main_alias_2 (inlined)
>                         __libc_start_call_main
>                         main
>                         |
> -                       |--24.52%--func_noinline
> +                       |--17.51%--func_inlined (inlined)
> +                       |          func_noinline
> +                       |          func_declared_inlined (inlined)
> +                       |
> +                       |--8.45%--func_noinline2
>                         |
> -                        --10.01%--func_noinline2
> +                       |--7.01%--func_noinline
> +                       |          func_declared_inlined (inlined)
> +                       |
> +                        --1.57%--func_declared_inlined (inlined)
> +                                  func_noinline2
>
>      99.11%     0.00%  a.out    a.out                 [.] _start
>              |
>              ---_start
> -               __libc_start_main@@GLIBC_2.34
> +               __libc_start_main_alias_2 (inlined)
>                 __libc_start_call_main
>                 main
>                 |
> -               |--41.99%--func_noinline
> +               |--28.55%--func_inlined (inlined)
> +               |          func_noinline
> +               |          func_declared_inlined (inlined)
> +               |
> +               |--13.43%--func_noinline
> +               |          func_declared_inlined (inlined)
>                 |
> -                --10.37%--func_noinline2
> +               |--8.81%--func_noinline2
> +               |
> +                --1.57%--func_declared_inlined (inlined)
> +                          func_noinline2
>
> -    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main@@GLIBC_2.34
> +    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main_alias_2 (inlined)
>              |
> -            ---__libc_start_main@@GLIBC_2.34
> +            ---__libc_start_main_alias_2 (inlined)
>                 __libc_start_call_main
>                 main
>                 |
> -               |--41.99%--func_noinline
> +               |--28.55%--func_inlined (inlined)
> +               |          func_noinline
> +               |          func_declared_inlined (inlined)
> +               |
> +               |--13.43%--func_noinline
> +               |          func_declared_inlined (inlined)
> +               |
> +               |--8.81%--func_noinline2
>                 |
> -                --10.37%--func_noinline2
> +                --1.57%--func_declared_inlined (inlined)
> +                          func_noinline2
>
>      99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_call_main
>              |
>              ---__libc_start_call_main
>                 main
>                 |
> -               |--41.99%--func_noinline
> +               |--28.55%--func_inlined (inlined)
> +               |          func_noinline
> +               |          func_declared_inlined (inlined)
>                 |
> -                --10.37%--func_noinline2
> +               |--13.43%--func_noinline
> +               |          func_declared_inlined (inlined)
> +               |
> +               |--8.81%--func_noinline2
> +               |
> +                --1.57%--func_declared_inlined (inlined)
> +                          func_noinline2
>
>      54.44%    44.43%  a.out    a.out                 [.] func_noinline2
>              |
>              |--44.43%--_start
> -            |          __libc_start_main@@GLIBC_2.34
> +            |          __libc_start_main_alias_2 (inlined)
>              |          __libc_start_call_main
>              |          main
>              |
> @@ -66,19 +103,42 @@
>      41.99%    17.47%  a.out    a.out                 [.] func_noinline
>              |
>              |--24.52%--func_noinline
> +            |          func_declared_inlined (inlined)
>              |
>               --17.47%--_start
> -                       __libc_start_main@@GLIBC_2.34
> +                       __libc_start_main_alias_2 (inlined)
>                         __libc_start_call_main
>                         main
> -                       func_noinline
> +                       |
> +                       |--11.05%--func_inlined (inlined)
> +                       |          func_noinline
> +                       |          func_declared_inlined (inlined)
> +                       |
> +                        --6.42%--func_noinline
> +                                  func_declared_inlined (inlined)
> +
> +    41.99%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> +            |
> +            ---func_declared_inlined (inlined)
> +
> +    28.55%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
> +            |
> +            ---func_inlined (inlined)
> +               func_noinline
> +               func_declared_inlined (inlined)
> +
> +     1.57%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> +            |
> +            ---func_declared_inlined (inlined)
> +               func_noinline2
>
>       0.88%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
>              |
>              ---_start
>                 _dl_start
>                 |
> -                --0.71%--_dl_sysdep_start
> +                --0.71%--_dl_start_final (inlined)
> +                          _dl_sysdep_start
>                            dl_main
>                            _dl_map_object_deps
>                            _dl_catch_exception
> @@ -92,7 +152,8 @@
>              |
>              ---_dl_start
>                 |
> -                --0.71%--_dl_sysdep_start
> +                --0.71%--_dl_start_final (inlined)
> +                          _dl_sysdep_start
>                            dl_main
>                            _dl_map_object_deps
>                            _dl_catch_exception
> @@ -106,6 +167,20 @@
>              |
>              ---_start
>                 _dl_start
> +               _dl_start_final (inlined)
> +               _dl_sysdep_start
> +               dl_main
> +               _dl_map_object_deps
> +               _dl_catch_exception
> +               openaux
> +               _dl_map_object
> +               _dl_map_object_from_fd
> +               _dl_new_object
> +               strlen
> +
> +     0.71%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_final (inlined)
> +            |
> +            ---_dl_start_final (inlined)
>                 _dl_sysdep_start
>                 dl_main
>                 _dl_map_object_deps
> ======
>
> No meaningful differences in output with callchains in dwarf mode or when
> tracing a binary that was compiled with no dwarf debuginfo.
>
> Artem Savkov (1):
>   perf report: append inlines to non-dwarf callchains
>
>  tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
>  1 file changed, 43 insertions(+), 39 deletions(-)
>
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
@ 2023-03-17  7:41   ` Artem Savkov
  2023-03-22 18:18     ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Artem Savkov @ 2023-03-17  7:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, linux-perf-users, linux-kernel, Milian Wolff,
	Masami Hiramatsu, Andrii Nakryiko

On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> Hello,
> 
> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> >
> > In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > information about inlined functions from dwarf when available and then
> > add it to userspace stacktraces even in framepointer or lbr mode.
> > Looking closer at perf it turned out all required bits and pieces are
> > already there and inline information can be easily added to both
> > framepointer and lbr callchains by adding an append_inlines() call to
> > add_callchain_ip().
> 
> Looks great!  Have you checked it with perf report -g callee ?
> I'm not sure the ordering of inlined functions is maintained
> properly.  Maybe you can use --no-children too to simplify
> the output.

Thanks for the suggestion. I actually have another test program with
functions being numbered rather than (creatively) named, so it might be
easier to use it to figure out ordering. Here's the code:

======
#include <stdio.h>
#include <stdint.h>

static __attribute__((noinline)) uint32_t func5(uint32_t i)
{
        return i + 10;
}

static uint32_t func4(uint32_t i)
{
        return func5(i + 5);
}

static inline uint32_t func3(uint32_t i)
{
        return func4(i + 4);
}

static __attribute__((noinline)) uint32_t func2(uint32_t i)
{
        return func3(i + 3);
}

static uint32_t func1(uint32_t i)
{
        return func2(i + 2);
}

__attribute__((noinline)) uint64_t entry(void)
{
        uint64_t ret = 0;
        uint32_t i = 0;
        for (i = 0; i < 1000000; i++) {
                ret += func1(i);
                ret -= func2(i);
                ret += func3(i);
                ret += func4(i);
                ret -= func5(i);
        }
        return ret;
}

int main(int argc, char **argv)
{
        printf("%s\n", __func__);
        return entry();
}
======

Here is the output I get with '--call-graph callee --no-children'
======
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 250  of event 'cycles:u'
# Event count (approx.): 26819859
#
# Overhead  Command  Shared Object         Symbol                               
# ........  .......  ....................  .....................................
#
    43.58%  a.out    a.out                 [.] func5
            |          
            |--28.93%--entry
            |          main
            |          __libc_start_call_main
            |          
             --14.65%--func4 (inlined)
                       |          
                       |--10.45%--entry
                       |          main
                       |          __libc_start_call_main
                       |          
                        --4.20%--func3 (inlined)
                                  entry
                                  main
                                  __libc_start_call_main

    38.80%  a.out    a.out                 [.] entry
            |          
            |--23.27%--func4 (inlined)
            |          |          
            |          |--20.28%--func3 (inlined)
            |          |          func2
            |          |          main
            |          |          __libc_start_call_main
            |          |          
            |           --2.99%--entry
            |                     main
            |                     __libc_start_call_main
            |          
            |--8.17%--func5
            |          main
            |          __libc_start_call_main
            |          
            |--3.89%--func1 (inlined)
            |          entry
            |          main
            |          __libc_start_call_main
            |          
             --3.48%--entry
                       main
                       __libc_start_call_main

    13.07%  a.out    a.out                 [.] func2
            |
            ---func5
               main
               __libc_start_call_main

     1.54%  a.out    [unknown]             [k] 0xffffffff81e011b7
     1.16%  a.out    [unknown]             [k] 0xffffffff81e00193
            |          
             --0.57%--__mmap64 (inlined)
                       __mmap64 (inlined)

     0.34%  a.out    ld-linux-x86-64.so.2  [.] __tunable_get_val
     0.34%  a.out    ld-linux-x86-64.so.2  [.] strcmp
     0.32%  a.out    libc.so.6             [.] strchr
     0.31%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
     0.22%  a.out    ld-linux-x86-64.so.2  [.] _dl_init_paths
     0.18%  a.out    ld-linux-x86-64.so.2  [.] get_common_cache_info.constprop.0
     0.14%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init


#
# (Tip: Show individual samples with: perf script)
#
======

It does not seem to be out of order, or at least it is consistent with
what I get with dwarf unwinders.

> 
> Thanks,
> Namhyung
> 
> >
> > I used the following small program for testing:
> >
> > ======
> > #include <stdio.h>
> > #include <stdint.h>
> >
> > static __attribute__((noinline)) uint32_t func_noinline2(uint32_t i)
> > {
> >         return i + 10;
> > }
> >
> > static inline uint32_t func_declared_inlined(uint32_t i)
> > {
> >         return func_noinline2(i + 4);
> > }
> >
> > static __attribute__((noinline)) uint32_t func_noinline(uint32_t i)
> > {
> >         return func_declared_inlined(i + 3);
> > }
> >
> > static uint32_t func_inlined(uint32_t i)
> > {
> >         return func_noinline(i + 2);
> > }
> >
> > int main(int argc, char **argv)
> > {
> >         uint64_t ret = 0;
> >         uint32_t i = 0;
> >
> >         for (i = 0; i < 10000000; i++) {
> >                 ret += func_inlined(i);
> >                 ret -= func_noinline(i);
> >                 ret -= func_noinline2(i);
> >                 ret += func_declared_inlined(i);
> >         }
> >
> >         printf("%s: %ld\n", __func__, ret);
> >         return ret;
> > }
> > ======
> >
> > When built with "gcc -Wall -fno-omit-frame-pointer -O2 -g" (gcc
> > version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)) it results in
> > func_inlined() and func_declared_inlined() being inlined into main()
> > and func_declared_inlined() also being inlined into func_noinline().
> >
> > Here are the diffs for 'perf report --call-graph --stdio' output before
> > and after the patch:
> >
> > Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
> > ======
> > --- unpatched.fp        2023-03-16 13:27:22.724017900 +0100
> > +++ patched.fp  2023-03-16 13:27:28.608857921 +0100
> > @@ -14,18 +14,24 @@
> >              ---__libc_start_call_main
> >                 |
> >                 |--46.31%--main
> > +               |          |
> > +               |           --11.16%--func_declared_inlined (inlined)
> >                 |
> >                 |--27.41%--func_noinline
> > +               |          func_declared_inlined (inlined)
> >                 |
> >                  --25.68%--func_noinline2
> >
> >      85.71%    39.64%  a.out    a.out                 [.] main
> >              |
> >              |--46.07%--main
> > +            |          |
> > +            |           --11.16%--func_declared_inlined (inlined)
> >              |
> >               --39.64%--__libc_start_call_main
> >                         |
> >                         |--27.17%--func_noinline
> > +                       |          func_declared_inlined (inlined)
> >                         |
> >                          --12.23%--func_noinline2
> >
> > @@ -34,31 +40,47 @@
> >              |--46.31%--__libc_start_call_main
> >              |          |
> >              |           --46.07%--main
> > +            |                     |
> > +            |                      --11.16%--func_declared_inlined (inlined)
> >              |
> >               --25.44%--func_noinline2
> >
> >      40.56%    13.39%  a.out    a.out                 [.] func_noinline
> >              |
> >              |--27.17%--func_noinline
> > +            |          func_declared_inlined (inlined)
> >              |
> >               --13.39%--__libc_start_call_main
> >                         |
> >                          --13.15%--func_noinline2
> >
> > +    27.41%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > +            |
> > +            ---func_declared_inlined (inlined)
> > +
> > +    11.16%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > +            |
> > +            ---func_declared_inlined (inlined)
> > +
> >       0.30%     0.30%  a.out    [unknown]             [k] 0xffffffff81e011b7
> >       0.27%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_user
> >       0.25%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_sysdep_start
> >       0.20%     0.00%  a.out    ld-linux-x86-64.so.2  [.] dl_main
> >       0.19%     0.19%  a.out    [unknown]             [k] 0xffffffff81e00193
> > +     0.18%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
> >       0.10%     0.05%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
> >       0.09%     0.00%  a.out    [unknown]             [k] 0x00007fef811be880
> >       0.09%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object
> >       0.06%     0.00%  a.out    libc.so.6             [.] sysmalloc
> >       0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] munmap
> > -     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mprotect
> > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_accum (inlined)
> > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_stop (inlined)
> > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mprotect (inlined)
> >       0.05%     0.00%  a.out    libc.so.6             [.] __x86_cacheinfo_ifunc
> > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_dynamic_do_Rela (inlined)
> >       0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object_from_fd
> > -     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mmap64
> > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_get_dynamic_info (inlined)
> > +     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mmap64 (inlined)
> >       0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] _dl_cache_libcmp
> >       0.04%     0.00%  a.out    [unknown]             [.] 0x000000000001b19a
> >       0.04%     0.00%  a.out    [unknown]             [k] 0000000000000000
> > @@ -66,13 +88,15 @@
> >       0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] do_lookup_x
> >       0.03%     0.03%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word.constprop.0
> >       0.03%     0.00%  a.out    [unknown]             [.] 0x0000000004000000
> > +     0.03%     0.00%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word (inlined)
> >       0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start
> >       0.02%     0.02%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init
> >       0.02%     0.00%  a.out    [unknown]             [.] 0x722f3d4b434f535f
> >       0.02%     0.00%  a.out    [unknown]             [.] 0x00007ffebdd69f59
> > +     0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] tunable_is_name (inlined)
> >       0.01%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
> > ======
> >
> > LBR mode recorded with 'perf record --call-graph=lbr -- ./a.out'
> > ======
> > --- unpatched.lbr       2023-03-16 13:27:50.853253211 +0100
> > +++ patched.lbr 2023-03-16 13:27:56.312104813 +0100
> > @@ -6,58 +6,95 @@
> >  # Samples: 238  of event 'cycles:u'
> >  # Event count (approx.): 193485873
> >  #
> > -# Children      Self  Command  Shared Object         Symbol
> > -# ........  ........  .......  ....................  .................................
> > +# Children      Self  Command  Shared Object         Symbol
> > +# ........  ........  .......  ....................  .............................
> >  #
> >      99.11%    36.87%  a.out    a.out                 [.] main
> >              |
> >              |--62.24%--main
> >              |          |
> > -            |           --17.47%--func_noinline
> > +            |          |--11.05%--func_inlined (inlined)
> > +            |          |          func_noinline
> > +            |          |          func_declared_inlined (inlined)
> > +            |          |
> > +            |           --6.42%--func_noinline
> > +            |                     func_declared_inlined (inlined)
> >              |
> >               --36.87%--_start
> > -                       __libc_start_main@@GLIBC_2.34
> > +                       __libc_start_main_alias_2 (inlined)
> >                         __libc_start_call_main
> >                         main
> >                         |
> > -                       |--24.52%--func_noinline
> > +                       |--17.51%--func_inlined (inlined)
> > +                       |          func_noinline
> > +                       |          func_declared_inlined (inlined)
> > +                       |
> > +                       |--8.45%--func_noinline2
> >                         |
> > -                        --10.01%--func_noinline2
> > +                       |--7.01%--func_noinline
> > +                       |          func_declared_inlined (inlined)
> > +                       |
> > +                        --1.57%--func_declared_inlined (inlined)
> > +                                  func_noinline2
> >
> >      99.11%     0.00%  a.out    a.out                 [.] _start
> >              |
> >              ---_start
> > -               __libc_start_main@@GLIBC_2.34
> > +               __libc_start_main_alias_2 (inlined)
> >                 __libc_start_call_main
> >                 main
> >                 |
> > -               |--41.99%--func_noinline
> > +               |--28.55%--func_inlined (inlined)
> > +               |          func_noinline
> > +               |          func_declared_inlined (inlined)
> > +               |
> > +               |--13.43%--func_noinline
> > +               |          func_declared_inlined (inlined)
> >                 |
> > -                --10.37%--func_noinline2
> > +               |--8.81%--func_noinline2
> > +               |
> > +                --1.57%--func_declared_inlined (inlined)
> > +                          func_noinline2
> >
> > -    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main@@GLIBC_2.34
> > +    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main_alias_2 (inlined)
> >              |
> > -            ---__libc_start_main@@GLIBC_2.34
> > +            ---__libc_start_main_alias_2 (inlined)
> >                 __libc_start_call_main
> >                 main
> >                 |
> > -               |--41.99%--func_noinline
> > +               |--28.55%--func_inlined (inlined)
> > +               |          func_noinline
> > +               |          func_declared_inlined (inlined)
> > +               |
> > +               |--13.43%--func_noinline
> > +               |          func_declared_inlined (inlined)
> > +               |
> > +               |--8.81%--func_noinline2
> >                 |
> > -                --10.37%--func_noinline2
> > +                --1.57%--func_declared_inlined (inlined)
> > +                          func_noinline2
> >
> >      99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_call_main
> >              |
> >              ---__libc_start_call_main
> >                 main
> >                 |
> > -               |--41.99%--func_noinline
> > +               |--28.55%--func_inlined (inlined)
> > +               |          func_noinline
> > +               |          func_declared_inlined (inlined)
> >                 |
> > -                --10.37%--func_noinline2
> > +               |--13.43%--func_noinline
> > +               |          func_declared_inlined (inlined)
> > +               |
> > +               |--8.81%--func_noinline2
> > +               |
> > +                --1.57%--func_declared_inlined (inlined)
> > +                          func_noinline2
> >
> >      54.44%    44.43%  a.out    a.out                 [.] func_noinline2
> >              |
> >              |--44.43%--_start
> > -            |          __libc_start_main@@GLIBC_2.34
> > +            |          __libc_start_main_alias_2 (inlined)
> >              |          __libc_start_call_main
> >              |          main
> >              |
> > @@ -66,19 +103,42 @@
> >      41.99%    17.47%  a.out    a.out                 [.] func_noinline
> >              |
> >              |--24.52%--func_noinline
> > +            |          func_declared_inlined (inlined)
> >              |
> >               --17.47%--_start
> > -                       __libc_start_main@@GLIBC_2.34
> > +                       __libc_start_main_alias_2 (inlined)
> >                         __libc_start_call_main
> >                         main
> > -                       func_noinline
> > +                       |
> > +                       |--11.05%--func_inlined (inlined)
> > +                       |          func_noinline
> > +                       |          func_declared_inlined (inlined)
> > +                       |
> > +                        --6.42%--func_noinline
> > +                                  func_declared_inlined (inlined)
> > +
> > +    41.99%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > +            |
> > +            ---func_declared_inlined (inlined)
> > +
> > +    28.55%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
> > +            |
> > +            ---func_inlined (inlined)
> > +               func_noinline
> > +               func_declared_inlined (inlined)
> > +
> > +     1.57%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > +            |
> > +            ---func_declared_inlined (inlined)
> > +               func_noinline2
> >
> >       0.88%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
> >              |
> >              ---_start
> >                 _dl_start
> >                 |
> > -                --0.71%--_dl_sysdep_start
> > +                --0.71%--_dl_start_final (inlined)
> > +                          _dl_sysdep_start
> >                            dl_main
> >                            _dl_map_object_deps
> >                            _dl_catch_exception
> > @@ -92,7 +152,8 @@
> >              |
> >              ---_dl_start
> >                 |
> > -                --0.71%--_dl_sysdep_start
> > +                --0.71%--_dl_start_final (inlined)
> > +                          _dl_sysdep_start
> >                            dl_main
> >                            _dl_map_object_deps
> >                            _dl_catch_exception
> > @@ -106,6 +167,20 @@
> >              |
> >              ---_start
> >                 _dl_start
> > +               _dl_start_final (inlined)
> > +               _dl_sysdep_start
> > +               dl_main
> > +               _dl_map_object_deps
> > +               _dl_catch_exception
> > +               openaux
> > +               _dl_map_object
> > +               _dl_map_object_from_fd
> > +               _dl_new_object
> > +               strlen
> > +
> > +     0.71%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_final (inlined)
> > +            |
> > +            ---_dl_start_final (inlined)
> >                 _dl_sysdep_start
> >                 dl_main
> >                 _dl_map_object_deps
> > ======
> >
> > No meaningful differences in output with callchains in dwarf mode or when
> > tracing a binary that was compiled with no dwarf debuginfo.
> >
> > Artem Savkov (1):
> >   perf report: append inlines to non-dwarf callchains
> >
> >  tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
> >  1 file changed, 43 insertions(+), 39 deletions(-)
> >
> > --
> > 2.39.2
> >
> 

-- 
 Artem


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-17  7:41   ` Artem Savkov
@ 2023-03-22 18:18     ` Namhyung Kim
  2023-03-22 19:44       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2023-03-22 18:18 UTC (permalink / raw)
  To: Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, linux-perf-users,
	linux-kernel, Milian Wolff, Masami Hiramatsu, Andrii Nakryiko

On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
>
> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > >
> > > In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > > information about inlined functions from dwarf when available and then
> > > add it to userspace stacktraces even in framepointer or lbr mode.
> > > Looking closer at perf it turned out all required bits and pieces are
> > > already there and inline information can be easily added to both
> > > framepointer and lbr callchains by adding an append_inlines() call to
> > > add_callchain_ip().
> >
> > Looks great!  Have you checked it with perf report -g callee ?
> > I'm not sure the ordering of inlined functions is maintained
> > properly.  Maybe you can use --no-children too to simplify
> > the output.
>
> Thanks for the suggestion. I actually have another test program with
> functions being numbered rather than (creatively) named, so it might be
> easier to use it to figure out ordering. Here's the code:

Yep, looks good.

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


>
> ======
> #include <stdio.h>
> #include <stdint.h>
>
> static __attribute__((noinline)) uint32_t func5(uint32_t i)
> {
>         return i + 10;
> }
>
> static uint32_t func4(uint32_t i)
> {
>         return func5(i + 5);
> }
>
> static inline uint32_t func3(uint32_t i)
> {
>         return func4(i + 4);
> }
>
> static __attribute__((noinline)) uint32_t func2(uint32_t i)
> {
>         return func3(i + 3);
> }
>
> static uint32_t func1(uint32_t i)
> {
>         return func2(i + 2);
> }
>
> __attribute__((noinline)) uint64_t entry(void)
> {
>         uint64_t ret = 0;
>         uint32_t i = 0;
>         for (i = 0; i < 1000000; i++) {
>                 ret += func1(i);
>                 ret -= func2(i);
>                 ret += func3(i);
>                 ret += func4(i);
>                 ret -= func5(i);
>         }
>         return ret;
> }
>
> int main(int argc, char **argv)
> {
>         printf("%s\n", __func__);
>         return entry();
> }
> ======
>
> Here is the output I get with '--call-graph callee --no-children'
> ======
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 250  of event 'cycles:u'
> # Event count (approx.): 26819859
> #
> # Overhead  Command  Shared Object         Symbol
> # ........  .......  ....................  .....................................
> #
>     43.58%  a.out    a.out                 [.] func5
>             |
>             |--28.93%--entry
>             |          main
>             |          __libc_start_call_main
>             |
>              --14.65%--func4 (inlined)
>                        |
>                        |--10.45%--entry
>                        |          main
>                        |          __libc_start_call_main
>                        |
>                         --4.20%--func3 (inlined)
>                                   entry
>                                   main
>                                   __libc_start_call_main
>
>     38.80%  a.out    a.out                 [.] entry
>             |
>             |--23.27%--func4 (inlined)
>             |          |
>             |          |--20.28%--func3 (inlined)
>             |          |          func2
>             |          |          main
>             |          |          __libc_start_call_main
>             |          |
>             |           --2.99%--entry
>             |                     main
>             |                     __libc_start_call_main
>             |
>             |--8.17%--func5
>             |          main
>             |          __libc_start_call_main
>             |
>             |--3.89%--func1 (inlined)
>             |          entry
>             |          main
>             |          __libc_start_call_main
>             |
>              --3.48%--entry
>                        main
>                        __libc_start_call_main
>
>     13.07%  a.out    a.out                 [.] func2
>             |
>             ---func5
>                main
>                __libc_start_call_main
>
>      1.54%  a.out    [unknown]             [k] 0xffffffff81e011b7
>      1.16%  a.out    [unknown]             [k] 0xffffffff81e00193
>             |
>              --0.57%--__mmap64 (inlined)
>                        __mmap64 (inlined)
>
>      0.34%  a.out    ld-linux-x86-64.so.2  [.] __tunable_get_val
>      0.34%  a.out    ld-linux-x86-64.so.2  [.] strcmp
>      0.32%  a.out    libc.so.6             [.] strchr
>      0.31%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
>      0.22%  a.out    ld-linux-x86-64.so.2  [.] _dl_init_paths
>      0.18%  a.out    ld-linux-x86-64.so.2  [.] get_common_cache_info.constprop.0
>      0.14%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init
>
>
> #
> # (Tip: Show individual samples with: perf script)
> #
> ======
>
> It does not seem to be out of order, or at least it is consistent with
> what I get with dwarf unwinders.
>
> >
> > Thanks,
> > Namhyung
> >
> > >
> > > I used the following small program for testing:
> > >
> > > ======
> > > #include <stdio.h>
> > > #include <stdint.h>
> > >
> > > static __attribute__((noinline)) uint32_t func_noinline2(uint32_t i)
> > > {
> > >         return i + 10;
> > > }
> > >
> > > static inline uint32_t func_declared_inlined(uint32_t i)
> > > {
> > >         return func_noinline2(i + 4);
> > > }
> > >
> > > static __attribute__((noinline)) uint32_t func_noinline(uint32_t i)
> > > {
> > >         return func_declared_inlined(i + 3);
> > > }
> > >
> > > static uint32_t func_inlined(uint32_t i)
> > > {
> > >         return func_noinline(i + 2);
> > > }
> > >
> > > int main(int argc, char **argv)
> > > {
> > >         uint64_t ret = 0;
> > >         uint32_t i = 0;
> > >
> > >         for (i = 0; i < 10000000; i++) {
> > >                 ret += func_inlined(i);
> > >                 ret -= func_noinline(i);
> > >                 ret -= func_noinline2(i);
> > >                 ret += func_declared_inlined(i);
> > >         }
> > >
> > >         printf("%s: %ld\n", __func__, ret);
> > >         return ret;
> > > }
> > > ======
> > >
> > > When built with "gcc -Wall -fno-omit-frame-pointer -O2 -g" (gcc
> > > version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)) it results in
> > > func_inlined() and func_declared_inlined() being inlined into main()
> > > and func_declared_inlined() also being inlined into func_noinline().
> > >
> > > Here are the diffs for 'perf report --call-graph --stdio' output before
> > > and after the patch:
> > >
> > > Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
> > > ======
> > > --- unpatched.fp        2023-03-16 13:27:22.724017900 +0100
> > > +++ patched.fp  2023-03-16 13:27:28.608857921 +0100
> > > @@ -14,18 +14,24 @@
> > >              ---__libc_start_call_main
> > >                 |
> > >                 |--46.31%--main
> > > +               |          |
> > > +               |           --11.16%--func_declared_inlined (inlined)
> > >                 |
> > >                 |--27.41%--func_noinline
> > > +               |          func_declared_inlined (inlined)
> > >                 |
> > >                  --25.68%--func_noinline2
> > >
> > >      85.71%    39.64%  a.out    a.out                 [.] main
> > >              |
> > >              |--46.07%--main
> > > +            |          |
> > > +            |           --11.16%--func_declared_inlined (inlined)
> > >              |
> > >               --39.64%--__libc_start_call_main
> > >                         |
> > >                         |--27.17%--func_noinline
> > > +                       |          func_declared_inlined (inlined)
> > >                         |
> > >                          --12.23%--func_noinline2
> > >
> > > @@ -34,31 +40,47 @@
> > >              |--46.31%--__libc_start_call_main
> > >              |          |
> > >              |           --46.07%--main
> > > +            |                     |
> > > +            |                      --11.16%--func_declared_inlined (inlined)
> > >              |
> > >               --25.44%--func_noinline2
> > >
> > >      40.56%    13.39%  a.out    a.out                 [.] func_noinline
> > >              |
> > >              |--27.17%--func_noinline
> > > +            |          func_declared_inlined (inlined)
> > >              |
> > >               --13.39%--__libc_start_call_main
> > >                         |
> > >                          --13.15%--func_noinline2
> > >
> > > +    27.41%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > > +            |
> > > +            ---func_declared_inlined (inlined)
> > > +
> > > +    11.16%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > > +            |
> > > +            ---func_declared_inlined (inlined)
> > > +
> > >       0.30%     0.30%  a.out    [unknown]             [k] 0xffffffff81e011b7
> > >       0.27%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_user
> > >       0.25%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_sysdep_start
> > >       0.20%     0.00%  a.out    ld-linux-x86-64.so.2  [.] dl_main
> > >       0.19%     0.19%  a.out    [unknown]             [k] 0xffffffff81e00193
> > > +     0.18%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
> > >       0.10%     0.05%  a.out    ld-linux-x86-64.so.2  [.] _dl_relocate_object
> > >       0.09%     0.00%  a.out    [unknown]             [k] 0x00007fef811be880
> > >       0.09%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object
> > >       0.06%     0.00%  a.out    libc.so.6             [.] sysmalloc
> > >       0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] munmap
> > > -     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mprotect
> > > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_accum (inlined)
> > > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] rtld_timer_stop (inlined)
> > > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mprotect (inlined)
> > >       0.05%     0.00%  a.out    libc.so.6             [.] __x86_cacheinfo_ifunc
> > > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_dynamic_do_Rela (inlined)
> > >       0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_map_object_from_fd
> > > -     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] mmap64
> > > +     0.05%     0.00%  a.out    ld-linux-x86-64.so.2  [.] elf_get_dynamic_info (inlined)
> > > +     0.04%     0.00%  a.out    ld-linux-x86-64.so.2  [.] __mmap64 (inlined)
> > >       0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] _dl_cache_libcmp
> > >       0.04%     0.00%  a.out    [unknown]             [.] 0x000000000001b19a
> > >       0.04%     0.00%  a.out    [unknown]             [k] 0000000000000000
> > > @@ -66,13 +88,15 @@
> > >       0.04%     0.04%  a.out    ld-linux-x86-64.so.2  [.] do_lookup_x
> > >       0.03%     0.03%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word.constprop.0
> > >       0.03%     0.00%  a.out    [unknown]             [.] 0x0000000004000000
> > > +     0.03%     0.00%  a.out    ld-linux-x86-64.so.2  [.] intel_check_word (inlined)
> > >       0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start
> > >       0.02%     0.02%  a.out    ld-linux-x86-64.so.2  [.] __GI___tunables_init
> > >       0.02%     0.00%  a.out    [unknown]             [.] 0x722f3d4b434f535f
> > >       0.02%     0.00%  a.out    [unknown]             [.] 0x00007ffebdd69f59
> > > +     0.02%     0.00%  a.out    ld-linux-x86-64.so.2  [.] tunable_is_name (inlined)
> > >       0.01%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
> > > ======
> > >
> > > LBR mode recorded with 'perf record --call-graph=lbr -- ./a.out'
> > > ======
> > > --- unpatched.lbr       2023-03-16 13:27:50.853253211 +0100
> > > +++ patched.lbr 2023-03-16 13:27:56.312104813 +0100
> > > @@ -6,58 +6,95 @@
> > >  # Samples: 238  of event 'cycles:u'
> > >  # Event count (approx.): 193485873
> > >  #
> > > -# Children      Self  Command  Shared Object         Symbol
> > > -# ........  ........  .......  ....................  .................................
> > > +# Children      Self  Command  Shared Object         Symbol
> > > +# ........  ........  .......  ....................  .............................
> > >  #
> > >      99.11%    36.87%  a.out    a.out                 [.] main
> > >              |
> > >              |--62.24%--main
> > >              |          |
> > > -            |           --17.47%--func_noinline
> > > +            |          |--11.05%--func_inlined (inlined)
> > > +            |          |          func_noinline
> > > +            |          |          func_declared_inlined (inlined)
> > > +            |          |
> > > +            |           --6.42%--func_noinline
> > > +            |                     func_declared_inlined (inlined)
> > >              |
> > >               --36.87%--_start
> > > -                       __libc_start_main@@GLIBC_2.34
> > > +                       __libc_start_main_alias_2 (inlined)
> > >                         __libc_start_call_main
> > >                         main
> > >                         |
> > > -                       |--24.52%--func_noinline
> > > +                       |--17.51%--func_inlined (inlined)
> > > +                       |          func_noinline
> > > +                       |          func_declared_inlined (inlined)
> > > +                       |
> > > +                       |--8.45%--func_noinline2
> > >                         |
> > > -                        --10.01%--func_noinline2
> > > +                       |--7.01%--func_noinline
> > > +                       |          func_declared_inlined (inlined)
> > > +                       |
> > > +                        --1.57%--func_declared_inlined (inlined)
> > > +                                  func_noinline2
> > >
> > >      99.11%     0.00%  a.out    a.out                 [.] _start
> > >              |
> > >              ---_start
> > > -               __libc_start_main@@GLIBC_2.34
> > > +               __libc_start_main_alias_2 (inlined)
> > >                 __libc_start_call_main
> > >                 main
> > >                 |
> > > -               |--41.99%--func_noinline
> > > +               |--28.55%--func_inlined (inlined)
> > > +               |          func_noinline
> > > +               |          func_declared_inlined (inlined)
> > > +               |
> > > +               |--13.43%--func_noinline
> > > +               |          func_declared_inlined (inlined)
> > >                 |
> > > -                --10.37%--func_noinline2
> > > +               |--8.81%--func_noinline2
> > > +               |
> > > +                --1.57%--func_declared_inlined (inlined)
> > > +                          func_noinline2
> > >
> > > -    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main@@GLIBC_2.34
> > > +    99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_main_alias_2 (inlined)
> > >              |
> > > -            ---__libc_start_main@@GLIBC_2.34
> > > +            ---__libc_start_main_alias_2 (inlined)
> > >                 __libc_start_call_main
> > >                 main
> > >                 |
> > > -               |--41.99%--func_noinline
> > > +               |--28.55%--func_inlined (inlined)
> > > +               |          func_noinline
> > > +               |          func_declared_inlined (inlined)
> > > +               |
> > > +               |--13.43%--func_noinline
> > > +               |          func_declared_inlined (inlined)
> > > +               |
> > > +               |--8.81%--func_noinline2
> > >                 |
> > > -                --10.37%--func_noinline2
> > > +                --1.57%--func_declared_inlined (inlined)
> > > +                          func_noinline2
> > >
> > >      99.11%     0.00%  a.out    libc.so.6             [.] __libc_start_call_main
> > >              |
> > >              ---__libc_start_call_main
> > >                 main
> > >                 |
> > > -               |--41.99%--func_noinline
> > > +               |--28.55%--func_inlined (inlined)
> > > +               |          func_noinline
> > > +               |          func_declared_inlined (inlined)
> > >                 |
> > > -                --10.37%--func_noinline2
> > > +               |--13.43%--func_noinline
> > > +               |          func_declared_inlined (inlined)
> > > +               |
> > > +               |--8.81%--func_noinline2
> > > +               |
> > > +                --1.57%--func_declared_inlined (inlined)
> > > +                          func_noinline2
> > >
> > >      54.44%    44.43%  a.out    a.out                 [.] func_noinline2
> > >              |
> > >              |--44.43%--_start
> > > -            |          __libc_start_main@@GLIBC_2.34
> > > +            |          __libc_start_main_alias_2 (inlined)
> > >              |          __libc_start_call_main
> > >              |          main
> > >              |
> > > @@ -66,19 +103,42 @@
> > >      41.99%    17.47%  a.out    a.out                 [.] func_noinline
> > >              |
> > >              |--24.52%--func_noinline
> > > +            |          func_declared_inlined (inlined)
> > >              |
> > >               --17.47%--_start
> > > -                       __libc_start_main@@GLIBC_2.34
> > > +                       __libc_start_main_alias_2 (inlined)
> > >                         __libc_start_call_main
> > >                         main
> > > -                       func_noinline
> > > +                       |
> > > +                       |--11.05%--func_inlined (inlined)
> > > +                       |          func_noinline
> > > +                       |          func_declared_inlined (inlined)
> > > +                       |
> > > +                        --6.42%--func_noinline
> > > +                                  func_declared_inlined (inlined)
> > > +
> > > +    41.99%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > > +            |
> > > +            ---func_declared_inlined (inlined)
> > > +
> > > +    28.55%     0.00%  a.out    a.out                 [.] func_inlined (inlined)
> > > +            |
> > > +            ---func_inlined (inlined)
> > > +               func_noinline
> > > +               func_declared_inlined (inlined)
> > > +
> > > +     1.57%     0.00%  a.out    a.out                 [.] func_declared_inlined (inlined)
> > > +            |
> > > +            ---func_declared_inlined (inlined)
> > > +               func_noinline2
> > >
> > >       0.88%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _start
> > >              |
> > >              ---_start
> > >                 _dl_start
> > >                 |
> > > -                --0.71%--_dl_sysdep_start
> > > +                --0.71%--_dl_start_final (inlined)
> > > +                          _dl_sysdep_start
> > >                            dl_main
> > >                            _dl_map_object_deps
> > >                            _dl_catch_exception
> > > @@ -92,7 +152,8 @@
> > >              |
> > >              ---_dl_start
> > >                 |
> > > -                --0.71%--_dl_sysdep_start
> > > +                --0.71%--_dl_start_final (inlined)
> > > +                          _dl_sysdep_start
> > >                            dl_main
> > >                            _dl_map_object_deps
> > >                            _dl_catch_exception
> > > @@ -106,6 +167,20 @@
> > >              |
> > >              ---_start
> > >                 _dl_start
> > > +               _dl_start_final (inlined)
> > > +               _dl_sysdep_start
> > > +               dl_main
> > > +               _dl_map_object_deps
> > > +               _dl_catch_exception
> > > +               openaux
> > > +               _dl_map_object
> > > +               _dl_map_object_from_fd
> > > +               _dl_new_object
> > > +               strlen
> > > +
> > > +     0.71%     0.00%  a.out    ld-linux-x86-64.so.2  [.] _dl_start_final (inlined)
> > > +            |
> > > +            ---_dl_start_final (inlined)
> > >                 _dl_sysdep_start
> > >                 dl_main
> > >                 _dl_map_object_deps
> > > ======
> > >
> > > No meaningful differences in output with callchains in dwarf mode or when
> > > tracing a binary that was compiled with no dwarf debuginfo.
> > >
> > > Artem Savkov (1):
> > >   perf report: append inlines to non-dwarf callchains
> > >
> > >  tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
> > >  1 file changed, 43 insertions(+), 39 deletions(-)
> > >
> > > --
> > > 2.39.2
> > >
> >
>
> --
>  Artem
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-22 18:18     ` Namhyung Kim
@ 2023-03-22 19:44       ` Arnaldo Carvalho de Melo
  2023-03-23  8:22         ` Artem Savkov
  2023-03-30  5:06         ` Adrian Hunter
  0 siblings, 2 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-03-22 19:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, linux-perf-users,
	linux-kernel, Milian Wolff, Masami Hiramatsu, Andrii Nakryiko

Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > > Hello,
> > >
> > > On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > >
> > > > In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > > > information about inlined functions from dwarf when available and then
> > > > add it to userspace stacktraces even in framepointer or lbr mode.
> > > > Looking closer at perf it turned out all required bits and pieces are
> > > > already there and inline information can be easily added to both
> > > > framepointer and lbr callchains by adding an append_inlines() call to
> > > > add_callchain_ip().
> > >
> > > Looks great!  Have you checked it with perf report -g callee ?
> > > I'm not sure the ordering of inlined functions is maintained
> > > properly.  Maybe you can use --no-children too to simplify
> > > the output.
> >
> > Thanks for the suggestion. I actually have another test program with
> > functions being numbered rather than (creatively) named, so it might be
> > easier to use it to figure out ordering. Here's the code:
> 
> Yep, looks good.
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

So, I'll apply this shorter patch instead, ok?

- Arnaldo

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 803c9d1803dd26ef..abf6167f28217fe6 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -44,6 +44,7 @@
 #include <linux/zalloc.h>
 
 static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
+static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
 
 static struct dso *machine__kernel_dso(struct machine *machine)
 {
@@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
 	ms.maps = al.maps;
 	ms.map = al.map;
 	ms.sym = al.sym;
+
+	if (append_inlines(cursor, &ms, ip) == 0)
+		return 0;
+
 	srcline = callchain_srcline(&ms, al.addr);
 	return callchain_cursor_append(cursor, ip, &ms,
 				       branch, flags, nr_loop_iter,

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-22 19:44       ` Arnaldo Carvalho de Melo
@ 2023-03-23  8:22         ` Artem Savkov
  2023-03-30  5:06         ` Adrian Hunter
  1 sibling, 0 replies; 14+ messages in thread
From: Artem Savkov @ 2023-03-23  8:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	linux-perf-users, linux-kernel, Milian Wolff, Masami Hiramatsu,
	Andrii Nakryiko

On Wed, Mar 22, 2023 at 04:44:20PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> > On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> > >
> > > On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > > >
> > > > > In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > > > > information about inlined functions from dwarf when available and then
> > > > > add it to userspace stacktraces even in framepointer or lbr mode.
> > > > > Looking closer at perf it turned out all required bits and pieces are
> > > > > already there and inline information can be easily added to both
> > > > > framepointer and lbr callchains by adding an append_inlines() call to
> > > > > add_callchain_ip().
> > > >
> > > > Looks great!  Have you checked it with perf report -g callee ?
> > > > I'm not sure the ordering of inlined functions is maintained
> > > > properly.  Maybe you can use --no-children too to simplify
> > > > the output.
> > >
> > > Thanks for the suggestion. I actually have another test program with
> > > functions being numbered rather than (creatively) named, so it might be
> > > easier to use it to figure out ordering. Here's the code:
> > 
> > Yep, looks good.
> > 
> > Acked-by: Namhyung Kim <namhyung@kernel.org>
> 
> So, I'll apply this shorter patch instead, ok?

Looks good to me, much cleaner. Thanks.

> - Arnaldo
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 803c9d1803dd26ef..abf6167f28217fe6 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -44,6 +44,7 @@
>  #include <linux/zalloc.h>
>  
>  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>  
>  static struct dso *machine__kernel_dso(struct machine *machine)
>  {
> @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
>  	ms.maps = al.maps;
>  	ms.map = al.map;
>  	ms.sym = al.sym;
> +
> +	if (append_inlines(cursor, &ms, ip) == 0)
> +		return 0;
> +
>  	srcline = callchain_srcline(&ms, al.addr);
>  	return callchain_cursor_append(cursor, ip, &ms,
>  				       branch, flags, nr_loop_iter,

-- 
 Artem


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-22 19:44       ` Arnaldo Carvalho de Melo
  2023-03-23  8:22         ` Artem Savkov
@ 2023-03-30  5:06         ` Adrian Hunter
  2023-03-31  8:52           ` Artem Savkov
  1 sibling, 1 reply; 14+ messages in thread
From: Adrian Hunter @ 2023-03-30  5:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Namhyung Kim, asavkov
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, linux-perf-users, linux-kernel,
	Milian Wolff, Masami Hiramatsu, Andrii Nakryiko

On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
>> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
>>>
>>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
>>>> Hello,
>>>>
>>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
>>>>>
>>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
>>>>> information about inlined functions from dwarf when available and then
>>>>> add it to userspace stacktraces even in framepointer or lbr mode.
>>>>> Looking closer at perf it turned out all required bits and pieces are
>>>>> already there and inline information can be easily added to both
>>>>> framepointer and lbr callchains by adding an append_inlines() call to
>>>>> add_callchain_ip().
>>>>
>>>> Looks great!  Have you checked it with perf report -g callee ?
>>>> I'm not sure the ordering of inlined functions is maintained
>>>> properly.  Maybe you can use --no-children too to simplify
>>>> the output.
>>>
>>> Thanks for the suggestion. I actually have another test program with
>>> functions being numbered rather than (creatively) named, so it might be
>>> easier to use it to figure out ordering. Here's the code:
>>
>> Yep, looks good.
>>
>> Acked-by: Namhyung Kim <namhyung@kernel.org>
> 
> So, I'll apply this shorter patch instead, ok?
> 
> - Arnaldo
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 803c9d1803dd26ef..abf6167f28217fe6 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -44,6 +44,7 @@
>  #include <linux/zalloc.h>
>  
>  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>  
>  static struct dso *machine__kernel_dso(struct machine *machine)
>  {
> @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
>  	ms.maps = al.maps;
>  	ms.map = al.map;
>  	ms.sym = al.sym;
> +
> +	if (append_inlines(cursor, &ms, ip) == 0)
> +		return 0;
> +
>  	srcline = callchain_srcline(&ms, al.addr);
>  	return callchain_cursor_append(cursor, ip, &ms,
>  				       branch, flags, nr_loop_iter,

This seems to be breaking --branch-history.  I am not sure
append_inlines() makes sense for branches.  Maybe this should be:

	if (!branch && !append_inlines(cursor, &ms, ip))
		return 0;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-30  5:06         ` Adrian Hunter
@ 2023-03-31  8:52           ` Artem Savkov
  2023-04-03 20:30             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 14+ messages in thread
From: Artem Savkov @ 2023-03-31  8:52 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, linux-perf-users, linux-kernel, Milian Wolff,
	Masami Hiramatsu, Andrii Nakryiko

On Thu, Mar 30, 2023 at 08:06:20AM +0300, Adrian Hunter wrote:
> On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> >> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> >>>
> >>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> >>>> Hello,
> >>>>
> >>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> >>>>>
> >>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> >>>>> information about inlined functions from dwarf when available and then
> >>>>> add it to userspace stacktraces even in framepointer or lbr mode.
> >>>>> Looking closer at perf it turned out all required bits and pieces are
> >>>>> already there and inline information can be easily added to both
> >>>>> framepointer and lbr callchains by adding an append_inlines() call to
> >>>>> add_callchain_ip().
> >>>>
> >>>> Looks great!  Have you checked it with perf report -g callee ?
> >>>> I'm not sure the ordering of inlined functions is maintained
> >>>> properly.  Maybe you can use --no-children too to simplify
> >>>> the output.
> >>>
> >>> Thanks for the suggestion. I actually have another test program with
> >>> functions being numbered rather than (creatively) named, so it might be
> >>> easier to use it to figure out ordering. Here's the code:
> >>
> >> Yep, looks good.
> >>
> >> Acked-by: Namhyung Kim <namhyung@kernel.org>
> > 
> > So, I'll apply this shorter patch instead, ok?
> > 
> > - Arnaldo
> > 
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 803c9d1803dd26ef..abf6167f28217fe6 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -44,6 +44,7 @@
> >  #include <linux/zalloc.h>
> >  
> >  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> > +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> >  
> >  static struct dso *machine__kernel_dso(struct machine *machine)
> >  {
> > @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
> >  	ms.maps = al.maps;
> >  	ms.map = al.map;
> >  	ms.sym = al.sym;
> > +
> > +	if (append_inlines(cursor, &ms, ip) == 0)
> > +		return 0;
> > +
> >  	srcline = callchain_srcline(&ms, al.addr);
> >  	return callchain_cursor_append(cursor, ip, &ms,
> >  				       branch, flags, nr_loop_iter,
> 
> This seems to be breaking --branch-history.  I am not sure
> append_inlines() makes sense for branches.  Maybe this should be:
> 
> 	if (!branch && !append_inlines(cursor, &ms, ip))
> 		return 0;
> 

Right. So when cllchain_cursor is appended through append_inlines it
always discards branch information, even for the non-inlined function.
So adding !branch makes sense to me. Does anyone else see any problems
with that?

-- 
 Artem


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-03-31  8:52           ` Artem Savkov
@ 2023-04-03 20:30             ` Arnaldo Carvalho de Melo
  2023-04-04  5:47               ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-04-03 20:30 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Adrian Hunter, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	linux-perf-users, linux-kernel, Milian Wolff, Masami Hiramatsu,
	Andrii Nakryiko

Em Fri, Mar 31, 2023 at 10:52:24AM +0200, Artem Savkov escreveu:
> On Thu, Mar 30, 2023 at 08:06:20AM +0300, Adrian Hunter wrote:
> > On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> > >> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> > >>>
> > >>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > >>>> Hello,
> > >>>>
> > >>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > >>>>>
> > >>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > >>>>> information about inlined functions from dwarf when available and then
> > >>>>> add it to userspace stacktraces even in framepointer or lbr mode.
> > >>>>> Looking closer at perf it turned out all required bits and pieces are
> > >>>>> already there and inline information can be easily added to both
> > >>>>> framepointer and lbr callchains by adding an append_inlines() call to
> > >>>>> add_callchain_ip().
> > >>>>
> > >>>> Looks great!  Have you checked it with perf report -g callee ?
> > >>>> I'm not sure the ordering of inlined functions is maintained
> > >>>> properly.  Maybe you can use --no-children too to simplify
> > >>>> the output.
> > >>>
> > >>> Thanks for the suggestion. I actually have another test program with
> > >>> functions being numbered rather than (creatively) named, so it might be
> > >>> easier to use it to figure out ordering. Here's the code:
> > >>
> > >> Yep, looks good.
> > >>
> > >> Acked-by: Namhyung Kim <namhyung@kernel.org>
> > > 
> > > So, I'll apply this shorter patch instead, ok?
> > > 
> > > - Arnaldo
> > > 
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index 803c9d1803dd26ef..abf6167f28217fe6 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -44,6 +44,7 @@
> > >  #include <linux/zalloc.h>
> > >  
> > >  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> > > +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > >  
> > >  static struct dso *machine__kernel_dso(struct machine *machine)
> > >  {
> > > @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
> > >  	ms.maps = al.maps;
> > >  	ms.map = al.map;
> > >  	ms.sym = al.sym;
> > > +
> > > +	if (append_inlines(cursor, &ms, ip) == 0)
> > > +		return 0;
> > > +
> > >  	srcline = callchain_srcline(&ms, al.addr);
> > >  	return callchain_cursor_append(cursor, ip, &ms,
> > >  				       branch, flags, nr_loop_iter,
> > 
> > This seems to be breaking --branch-history.  I am not sure
> > append_inlines() makes sense for branches.  Maybe this should be:
> > 
> > 	if (!branch && !append_inlines(cursor, &ms, ip))
> > 		return 0;
> > 
> 
> Right. So when cllchain_cursor is appended through append_inlines it
> always discards branch information, even for the non-inlined function.
> So adding !branch makes sense to me. Does anyone else see any problems
> with that?

I'm no expert in this specific area, so for now till we get to a
conclusion on this, I'll follow Andi's suggestion and revert this patch.

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-04-03 20:30             ` Arnaldo Carvalho de Melo
@ 2023-04-04  5:47               ` Namhyung Kim
  2023-04-04  6:57                 ` Adrian Hunter
  2023-04-04  6:58                 ` Artem Savkov
  0 siblings, 2 replies; 14+ messages in thread
From: Namhyung Kim @ 2023-04-04  5:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Artem Savkov, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	linux-perf-users, linux-kernel, Milian Wolff, Masami Hiramatsu,
	Andrii Nakryiko

On Mon, Apr 3, 2023 at 1:30 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Fri, Mar 31, 2023 at 10:52:24AM +0200, Artem Savkov escreveu:
> > On Thu, Mar 30, 2023 at 08:06:20AM +0300, Adrian Hunter wrote:
> > > On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
> > > > Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> > > >> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > >>>
> > > >>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > > >>>> Hello,
> > > >>>>
> > > >>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > >>>>>
> > > >>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > > >>>>> information about inlined functions from dwarf when available and then
> > > >>>>> add it to userspace stacktraces even in framepointer or lbr mode.
> > > >>>>> Looking closer at perf it turned out all required bits and pieces are
> > > >>>>> already there and inline information can be easily added to both
> > > >>>>> framepointer and lbr callchains by adding an append_inlines() call to
> > > >>>>> add_callchain_ip().
> > > >>>>
> > > >>>> Looks great!  Have you checked it with perf report -g callee ?
> > > >>>> I'm not sure the ordering of inlined functions is maintained
> > > >>>> properly.  Maybe you can use --no-children too to simplify
> > > >>>> the output.
> > > >>>
> > > >>> Thanks for the suggestion. I actually have another test program with
> > > >>> functions being numbered rather than (creatively) named, so it might be
> > > >>> easier to use it to figure out ordering. Here's the code:
> > > >>
> > > >> Yep, looks good.
> > > >>
> > > >> Acked-by: Namhyung Kim <namhyung@kernel.org>
> > > >
> > > > So, I'll apply this shorter patch instead, ok?
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 803c9d1803dd26ef..abf6167f28217fe6 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -44,6 +44,7 @@
> > > >  #include <linux/zalloc.h>
> > > >
> > > >  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> > > > +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > >
> > > >  static struct dso *machine__kernel_dso(struct machine *machine)
> > > >  {
> > > > @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
> > > >   ms.maps = al.maps;
> > > >   ms.map = al.map;
> > > >   ms.sym = al.sym;
> > > > +
> > > > + if (append_inlines(cursor, &ms, ip) == 0)
> > > > +         return 0;
> > > > +
> > > >   srcline = callchain_srcline(&ms, al.addr);
> > > >   return callchain_cursor_append(cursor, ip, &ms,
> > > >                                  branch, flags, nr_loop_iter,
> > >
> > > This seems to be breaking --branch-history.  I am not sure
> > > append_inlines() makes sense for branches.  Maybe this should be:
> > >
> > >     if (!branch && !append_inlines(cursor, &ms, ip))
> > >             return 0;
> > >
> >
> > Right. So when cllchain_cursor is appended through append_inlines it
> > always discards branch information, even for the non-inlined function.
> > So adding !branch makes sense to me. Does anyone else see any problems
> > with that?
>
> I'm no expert in this specific area, so for now till we get to a
> conclusion on this, I'll follow Andi's suggestion and revert this patch.

I think we can simply apply Adrian's patch above.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-04-04  5:47               ` Namhyung Kim
@ 2023-04-04  6:57                 ` Adrian Hunter
  2023-04-04  6:58                 ` Artem Savkov
  1 sibling, 0 replies; 14+ messages in thread
From: Adrian Hunter @ 2023-04-04  6:57 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Artem Savkov, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, linux-perf-users,
	linux-kernel, Milian Wolff, Masami Hiramatsu, Andrii Nakryiko

On 4/04/23 08:47, Namhyung Kim wrote:
> On Mon, Apr 3, 2023 at 1:30 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>
>> Em Fri, Mar 31, 2023 at 10:52:24AM +0200, Artem Savkov escreveu:
>>> On Thu, Mar 30, 2023 at 08:06:20AM +0300, Adrian Hunter wrote:
>>>> On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
>>>>> Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
>>>>>> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
>>>>>>>
>>>>>>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
>>>>>>>>> information about inlined functions from dwarf when available and then
>>>>>>>>> add it to userspace stacktraces even in framepointer or lbr mode.
>>>>>>>>> Looking closer at perf it turned out all required bits and pieces are
>>>>>>>>> already there and inline information can be easily added to both
>>>>>>>>> framepointer and lbr callchains by adding an append_inlines() call to
>>>>>>>>> add_callchain_ip().
>>>>>>>>
>>>>>>>> Looks great!  Have you checked it with perf report -g callee ?
>>>>>>>> I'm not sure the ordering of inlined functions is maintained
>>>>>>>> properly.  Maybe you can use --no-children too to simplify
>>>>>>>> the output.
>>>>>>>
>>>>>>> Thanks for the suggestion. I actually have another test program with
>>>>>>> functions being numbered rather than (creatively) named, so it might be
>>>>>>> easier to use it to figure out ordering. Here's the code:
>>>>>>
>>>>>> Yep, looks good.
>>>>>>
>>>>>> Acked-by: Namhyung Kim <namhyung@kernel.org>
>>>>>
>>>>> So, I'll apply this shorter patch instead, ok?
>>>>>
>>>>> - Arnaldo
>>>>>
>>>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>>>> index 803c9d1803dd26ef..abf6167f28217fe6 100644
>>>>> --- a/tools/perf/util/machine.c
>>>>> +++ b/tools/perf/util/machine.c
>>>>> @@ -44,6 +44,7 @@
>>>>>  #include <linux/zalloc.h>
>>>>>
>>>>>  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
>>>>> +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>>>>>
>>>>>  static struct dso *machine__kernel_dso(struct machine *machine)
>>>>>  {
>>>>> @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
>>>>>   ms.maps = al.maps;
>>>>>   ms.map = al.map;
>>>>>   ms.sym = al.sym;
>>>>> +
>>>>> + if (append_inlines(cursor, &ms, ip) == 0)
>>>>> +         return 0;
>>>>> +
>>>>>   srcline = callchain_srcline(&ms, al.addr);
>>>>>   return callchain_cursor_append(cursor, ip, &ms,
>>>>>                                  branch, flags, nr_loop_iter,
>>>>
>>>> This seems to be breaking --branch-history.  I am not sure
>>>> append_inlines() makes sense for branches.  Maybe this should be:
>>>>
>>>>     if (!branch && !append_inlines(cursor, &ms, ip))
>>>>             return 0;
>>>>
>>>
>>> Right. So when cllchain_cursor is appended through append_inlines it
>>> always discards branch information, even for the non-inlined function.
>>> So adding !branch makes sense to me. Does anyone else see any problems
>>> with that?
>>
>> I'm no expert in this specific area, so for now till we get to a
>> conclusion on this, I'll follow Andi's suggestion and revert this patch.
> 
> I think we can simply apply Adrian's patch above.

Yes.  The thing is, inserting inline functions into a branch
stack doesn't work very well.  For example, there can be multiple
branches in the same inline function, so adding the inlines on
every branch makes a mess.  Then there is how to handle branching
out of, and then back into, the same inline function - or is it
a second instance of the inline function...  All too hard, so just
don't try.  At least for now.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-04-04  5:47               ` Namhyung Kim
  2023-04-04  6:57                 ` Adrian Hunter
@ 2023-04-04  6:58                 ` Artem Savkov
  2023-04-04 12:22                   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 14+ messages in thread
From: Artem Savkov @ 2023-04-04  6:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, linux-perf-users, linux-kernel, Milian Wolff,
	Masami Hiramatsu, Andrii Nakryiko

On Mon, Apr 03, 2023 at 10:47:37PM -0700, Namhyung Kim wrote:
> On Mon, Apr 3, 2023 at 1:30 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >
> > Em Fri, Mar 31, 2023 at 10:52:24AM +0200, Artem Savkov escreveu:
> > > On Thu, Mar 30, 2023 at 08:06:20AM +0300, Adrian Hunter wrote:
> > > > On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
> > > > > Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> > > > >> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > > >>>
> > > > >>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > > > >>>> Hello,
> > > > >>>>
> > > > >>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > > >>>>>
> > > > >>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > > > >>>>> information about inlined functions from dwarf when available and then
> > > > >>>>> add it to userspace stacktraces even in framepointer or lbr mode.
> > > > >>>>> Looking closer at perf it turned out all required bits and pieces are
> > > > >>>>> already there and inline information can be easily added to both
> > > > >>>>> framepointer and lbr callchains by adding an append_inlines() call to
> > > > >>>>> add_callchain_ip().
> > > > >>>>
> > > > >>>> Looks great!  Have you checked it with perf report -g callee ?
> > > > >>>> I'm not sure the ordering of inlined functions is maintained
> > > > >>>> properly.  Maybe you can use --no-children too to simplify
> > > > >>>> the output.
> > > > >>>
> > > > >>> Thanks for the suggestion. I actually have another test program with
> > > > >>> functions being numbered rather than (creatively) named, so it might be
> > > > >>> easier to use it to figure out ordering. Here's the code:
> > > > >>
> > > > >> Yep, looks good.
> > > > >>
> > > > >> Acked-by: Namhyung Kim <namhyung@kernel.org>
> > > > >
> > > > > So, I'll apply this shorter patch instead, ok?
> > > > >
> > > > > - Arnaldo
> > > > >
> > > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > > index 803c9d1803dd26ef..abf6167f28217fe6 100644
> > > > > --- a/tools/perf/util/machine.c
> > > > > +++ b/tools/perf/util/machine.c
> > > > > @@ -44,6 +44,7 @@
> > > > >  #include <linux/zalloc.h>
> > > > >
> > > > >  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> > > > > +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > > >
> > > > >  static struct dso *machine__kernel_dso(struct machine *machine)
> > > > >  {
> > > > > @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
> > > > >   ms.maps = al.maps;
> > > > >   ms.map = al.map;
> > > > >   ms.sym = al.sym;
> > > > > +
> > > > > + if (append_inlines(cursor, &ms, ip) == 0)
> > > > > +         return 0;
> > > > > +
> > > > >   srcline = callchain_srcline(&ms, al.addr);
> > > > >   return callchain_cursor_append(cursor, ip, &ms,
> > > > >                                  branch, flags, nr_loop_iter,
> > > >
> > > > This seems to be breaking --branch-history.  I am not sure
> > > > append_inlines() makes sense for branches.  Maybe this should be:
> > > >
> > > >     if (!branch && !append_inlines(cursor, &ms, ip))
> > > >             return 0;
> > > >
> > >
> > > Right. So when cllchain_cursor is appended through append_inlines it
> > > always discards branch information, even for the non-inlined function.
> > > So adding !branch makes sense to me. Does anyone else see any problems
> > > with that?
> >
> > I'm no expert in this specific area, so for now till we get to a
> > conclusion on this, I'll follow Andi's suggestion and revert this patch.
> 
> I think we can simply apply Adrian's patch above.

I can send a v2 with this fix included if that'll be more convenient.

-- 
 Artem


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
  2023-04-04  6:58                 ` Artem Savkov
@ 2023-04-04 12:22                   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-04-04 12:22 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Namhyung Kim, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	linux-perf-users, linux-kernel, Milian Wolff, Masami Hiramatsu,
	Andrii Nakryiko

Em Tue, Apr 04, 2023 at 08:58:07AM +0200, Artem Savkov escreveu:
> On Mon, Apr 03, 2023 at 10:47:37PM -0700, Namhyung Kim wrote:
> > On Mon, Apr 3, 2023 at 1:30 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > >
> > > Em Fri, Mar 31, 2023 at 10:52:24AM +0200, Artem Savkov escreveu:
> > > > On Thu, Mar 30, 2023 at 08:06:20AM +0300, Adrian Hunter wrote:
> > > > > On 22/03/23 21:44, Arnaldo Carvalho de Melo wrote:
> > > > > > Em Wed, Mar 22, 2023 at 11:18:49AM -0700, Namhyung Kim escreveu:
> > > > > >> On Fri, Mar 17, 2023 at 12:41 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > > > >>>
> > > > > >>> On Thu, Mar 16, 2023 at 02:26:18PM -0700, Namhyung Kim wrote:
> > > > > >>>> Hello,
> > > > > >>>>
> > > > > >>>> On Thu, Mar 16, 2023 at 6:36 AM Artem Savkov <asavkov@redhat.com> wrote:
> > > > > >>>>>
> > > > > >>>>> In an email to Arnaldo Andrii Nakryiko suggested that perf can get
> > > > > >>>>> information about inlined functions from dwarf when available and then
> > > > > >>>>> add it to userspace stacktraces even in framepointer or lbr mode.
> > > > > >>>>> Looking closer at perf it turned out all required bits and pieces are
> > > > > >>>>> already there and inline information can be easily added to both
> > > > > >>>>> framepointer and lbr callchains by adding an append_inlines() call to
> > > > > >>>>> add_callchain_ip().
> > > > > >>>>
> > > > > >>>> Looks great!  Have you checked it with perf report -g callee ?
> > > > > >>>> I'm not sure the ordering of inlined functions is maintained
> > > > > >>>> properly.  Maybe you can use --no-children too to simplify
> > > > > >>>> the output.
> > > > > >>>
> > > > > >>> Thanks for the suggestion. I actually have another test program with
> > > > > >>> functions being numbered rather than (creatively) named, so it might be
> > > > > >>> easier to use it to figure out ordering. Here's the code:
> > > > > >>
> > > > > >> Yep, looks good.
> > > > > >>
> > > > > >> Acked-by: Namhyung Kim <namhyung@kernel.org>
> > > > > >
> > > > > > So, I'll apply this shorter patch instead, ok?
> > > > > >
> > > > > > - Arnaldo
> > > > > >
> > > > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > > > index 803c9d1803dd26ef..abf6167f28217fe6 100644
> > > > > > --- a/tools/perf/util/machine.c
> > > > > > +++ b/tools/perf/util/machine.c
> > > > > > @@ -44,6 +44,7 @@
> > > > > >  #include <linux/zalloc.h>
> > > > > >
> > > > > >  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
> > > > > > +static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > > > >
> > > > > >  static struct dso *machine__kernel_dso(struct machine *machine)
> > > > > >  {
> > > > > > @@ -2322,6 +2323,10 @@ static int add_callchain_ip(struct thread *thread,
> > > > > >   ms.maps = al.maps;
> > > > > >   ms.map = al.map;
> > > > > >   ms.sym = al.sym;
> > > > > > +
> > > > > > + if (append_inlines(cursor, &ms, ip) == 0)
> > > > > > +         return 0;
> > > > > > +
> > > > > >   srcline = callchain_srcline(&ms, al.addr);
> > > > > >   return callchain_cursor_append(cursor, ip, &ms,
> > > > > >                                  branch, flags, nr_loop_iter,
> > > > >
> > > > > This seems to be breaking --branch-history.  I am not sure
> > > > > append_inlines() makes sense for branches.  Maybe this should be:
> > > > >
> > > > >     if (!branch && !append_inlines(cursor, &ms, ip))
> > > > >             return 0;
> > > > >
> > > >
> > > > Right. So when cllchain_cursor is appended through append_inlines it
> > > > always discards branch information, even for the non-inlined function.
> > > > So adding !branch makes sense to me. Does anyone else see any problems
> > > > with that?
> > >
> > > I'm no expert in this specific area, so for now till we get to a
> > > conclusion on this, I'll follow Andi's suggestion and revert this patch.
> > 
> > I think we can simply apply Adrian's patch above.
> 
> I can send a v2 with this fix included if that'll be more convenient.

No need, I ammended your original patch with Adrian's change.

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-04 12:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-16 13:35 [PATCH 0/1] perf report: append inlines to non-dwarf callchains Artem Savkov
2023-03-16 13:35 ` [PATCH 1/1] " Artem Savkov
2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
2023-03-17  7:41   ` Artem Savkov
2023-03-22 18:18     ` Namhyung Kim
2023-03-22 19:44       ` Arnaldo Carvalho de Melo
2023-03-23  8:22         ` Artem Savkov
2023-03-30  5:06         ` Adrian Hunter
2023-03-31  8:52           ` Artem Savkov
2023-04-03 20:30             ` Arnaldo Carvalho de Melo
2023-04-04  5:47               ` Namhyung Kim
2023-04-04  6:57                 ` Adrian Hunter
2023-04-04  6:58                 ` Artem Savkov
2023-04-04 12:22                   ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).