From: Artem Savkov <asavkov@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Milian Wolff <milian.wolff@kdab.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Artem Savkov <asavkov@redhat.com>
Subject: [PATCH 0/1] perf report: append inlines to non-dwarf callchains
Date: Thu, 16 Mar 2023 14:35:56 +0100 [thread overview]
Message-ID: <20230316133557.868731-1-asavkov@redhat.com> (raw)
In an email to Arnaldo Andrii Nakryiko suggested that perf can get
information about inlined functions from dwarf when available and then
add it to userspace stacktraces even in framepointer or lbr mode.
Looking closer at perf it turned out all required bits and pieces are
already there and inline information can be easily added to both
framepointer and lbr callchains by adding an append_inlines() call to
add_callchain_ip().
I used the following small program for testing:
======
#include <stdio.h>
#include <stdint.h>
static __attribute__((noinline)) uint32_t func_noinline2(uint32_t i)
{
return i + 10;
}
static inline uint32_t func_declared_inlined(uint32_t i)
{
return func_noinline2(i + 4);
}
static __attribute__((noinline)) uint32_t func_noinline(uint32_t i)
{
return func_declared_inlined(i + 3);
}
static uint32_t func_inlined(uint32_t i)
{
return func_noinline(i + 2);
}
int main(int argc, char **argv)
{
uint64_t ret = 0;
uint32_t i = 0;
for (i = 0; i < 10000000; i++) {
ret += func_inlined(i);
ret -= func_noinline(i);
ret -= func_noinline2(i);
ret += func_declared_inlined(i);
}
printf("%s: %ld\n", __func__, ret);
return ret;
}
======
When built with "gcc -Wall -fno-omit-frame-pointer -O2 -g" (gcc
version 12.2.1 20221121 (Red Hat 12.2.1-4) (GCC)) it results in
func_inlined() and func_declared_inlined() being inlined into main()
and func_declared_inlined() also being inlined into func_noinline().
Here are the diffs for 'perf report --call-graph --stdio' output before
and after the patch:
Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
======
--- unpatched.fp 2023-03-16 13:27:22.724017900 +0100
+++ patched.fp 2023-03-16 13:27:28.608857921 +0100
@@ -14,18 +14,24 @@
---__libc_start_call_main
|
|--46.31%--main
+ | |
+ | --11.16%--func_declared_inlined (inlined)
|
|--27.41%--func_noinline
+ | func_declared_inlined (inlined)
|
--25.68%--func_noinline2
85.71% 39.64% a.out a.out [.] main
|
|--46.07%--main
+ | |
+ | --11.16%--func_declared_inlined (inlined)
|
--39.64%--__libc_start_call_main
|
|--27.17%--func_noinline
+ | func_declared_inlined (inlined)
|
--12.23%--func_noinline2
@@ -34,31 +40,47 @@
|--46.31%--__libc_start_call_main
| |
| --46.07%--main
+ | |
+ | --11.16%--func_declared_inlined (inlined)
|
--25.44%--func_noinline2
40.56% 13.39% a.out a.out [.] func_noinline
|
|--27.17%--func_noinline
+ | func_declared_inlined (inlined)
|
--13.39%--__libc_start_call_main
|
--13.15%--func_noinline2
+ 27.41% 0.00% a.out a.out [.] func_declared_inlined (inlined)
+ |
+ ---func_declared_inlined (inlined)
+
+ 11.16% 0.00% a.out a.out [.] func_declared_inlined (inlined)
+ |
+ ---func_declared_inlined (inlined)
+
0.30% 0.30% a.out [unknown] [k] 0xffffffff81e011b7
0.27% 0.00% a.out ld-linux-x86-64.so.2 [.] _dl_start_user
0.25% 0.00% a.out ld-linux-x86-64.so.2 [.] _dl_sysdep_start
0.20% 0.00% a.out ld-linux-x86-64.so.2 [.] dl_main
0.19% 0.19% a.out [unknown] [k] 0xffffffff81e00193
+ 0.18% 0.00% a.out a.out [.] func_inlined (inlined)
0.10% 0.05% a.out ld-linux-x86-64.so.2 [.] _dl_relocate_object
0.09% 0.00% a.out [unknown] [k] 0x00007fef811be880
0.09% 0.00% a.out ld-linux-x86-64.so.2 [.] _dl_map_object
0.06% 0.00% a.out libc.so.6 [.] sysmalloc
0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] munmap
- 0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] mprotect
+ 0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] rtld_timer_accum (inlined)
+ 0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] rtld_timer_stop (inlined)
+ 0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] __mprotect (inlined)
0.05% 0.00% a.out libc.so.6 [.] __x86_cacheinfo_ifunc
+ 0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] elf_dynamic_do_Rela (inlined)
0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] _dl_map_object_from_fd
- 0.04% 0.00% a.out ld-linux-x86-64.so.2 [.] mmap64
+ 0.05% 0.00% a.out ld-linux-x86-64.so.2 [.] elf_get_dynamic_info (inlined)
+ 0.04% 0.00% a.out ld-linux-x86-64.so.2 [.] __mmap64 (inlined)
0.04% 0.04% a.out ld-linux-x86-64.so.2 [.] _dl_cache_libcmp
0.04% 0.00% a.out [unknown] [.] 0x000000000001b19a
0.04% 0.00% a.out [unknown] [k] 0000000000000000
@@ -66,13 +88,15 @@
0.04% 0.04% a.out ld-linux-x86-64.so.2 [.] do_lookup_x
0.03% 0.03% a.out ld-linux-x86-64.so.2 [.] intel_check_word.constprop.0
0.03% 0.00% a.out [unknown] [.] 0x0000000004000000
+ 0.03% 0.00% a.out ld-linux-x86-64.so.2 [.] intel_check_word (inlined)
0.02% 0.00% a.out ld-linux-x86-64.so.2 [.] _dl_start
0.02% 0.02% a.out ld-linux-x86-64.so.2 [.] __GI___tunables_init
0.02% 0.00% a.out [unknown] [.] 0x722f3d4b434f535f
0.02% 0.00% a.out [unknown] [.] 0x00007ffebdd69f59
+ 0.02% 0.00% a.out ld-linux-x86-64.so.2 [.] tunable_is_name (inlined)
0.01% 0.00% a.out ld-linux-x86-64.so.2 [.] _start
======
LBR mode recorded with 'perf record --call-graph=lbr -- ./a.out'
======
--- unpatched.lbr 2023-03-16 13:27:50.853253211 +0100
+++ patched.lbr 2023-03-16 13:27:56.312104813 +0100
@@ -6,58 +6,95 @@
# Samples: 238 of event 'cycles:u'
# Event count (approx.): 193485873
#
-# Children Self Command Shared Object Symbol
-# ........ ........ ....... .................... .................................
+# Children Self Command Shared Object Symbol
+# ........ ........ ....... .................... .............................
#
99.11% 36.87% a.out a.out [.] main
|
|--62.24%--main
| |
- | --17.47%--func_noinline
+ | |--11.05%--func_inlined (inlined)
+ | | func_noinline
+ | | func_declared_inlined (inlined)
+ | |
+ | --6.42%--func_noinline
+ | func_declared_inlined (inlined)
|
--36.87%--_start
- __libc_start_main@@GLIBC_2.34
+ __libc_start_main_alias_2 (inlined)
__libc_start_call_main
main
|
- |--24.52%--func_noinline
+ |--17.51%--func_inlined (inlined)
+ | func_noinline
+ | func_declared_inlined (inlined)
+ |
+ |--8.45%--func_noinline2
|
- --10.01%--func_noinline2
+ |--7.01%--func_noinline
+ | func_declared_inlined (inlined)
+ |
+ --1.57%--func_declared_inlined (inlined)
+ func_noinline2
99.11% 0.00% a.out a.out [.] _start
|
---_start
- __libc_start_main@@GLIBC_2.34
+ __libc_start_main_alias_2 (inlined)
__libc_start_call_main
main
|
- |--41.99%--func_noinline
+ |--28.55%--func_inlined (inlined)
+ | func_noinline
+ | func_declared_inlined (inlined)
+ |
+ |--13.43%--func_noinline
+ | func_declared_inlined (inlined)
|
- --10.37%--func_noinline2
+ |--8.81%--func_noinline2
+ |
+ --1.57%--func_declared_inlined (inlined)
+ func_noinline2
- 99.11% 0.00% a.out libc.so.6 [.] __libc_start_main@@GLIBC_2.34
+ 99.11% 0.00% a.out libc.so.6 [.] __libc_start_main_alias_2 (inlined)
|
- ---__libc_start_main@@GLIBC_2.34
+ ---__libc_start_main_alias_2 (inlined)
__libc_start_call_main
main
|
- |--41.99%--func_noinline
+ |--28.55%--func_inlined (inlined)
+ | func_noinline
+ | func_declared_inlined (inlined)
+ |
+ |--13.43%--func_noinline
+ | func_declared_inlined (inlined)
+ |
+ |--8.81%--func_noinline2
|
- --10.37%--func_noinline2
+ --1.57%--func_declared_inlined (inlined)
+ func_noinline2
99.11% 0.00% a.out libc.so.6 [.] __libc_start_call_main
|
---__libc_start_call_main
main
|
- |--41.99%--func_noinline
+ |--28.55%--func_inlined (inlined)
+ | func_noinline
+ | func_declared_inlined (inlined)
|
- --10.37%--func_noinline2
+ |--13.43%--func_noinline
+ | func_declared_inlined (inlined)
+ |
+ |--8.81%--func_noinline2
+ |
+ --1.57%--func_declared_inlined (inlined)
+ func_noinline2
54.44% 44.43% a.out a.out [.] func_noinline2
|
|--44.43%--_start
- | __libc_start_main@@GLIBC_2.34
+ | __libc_start_main_alias_2 (inlined)
| __libc_start_call_main
| main
|
@@ -66,19 +103,42 @@
41.99% 17.47% a.out a.out [.] func_noinline
|
|--24.52%--func_noinline
+ | func_declared_inlined (inlined)
|
--17.47%--_start
- __libc_start_main@@GLIBC_2.34
+ __libc_start_main_alias_2 (inlined)
__libc_start_call_main
main
- func_noinline
+ |
+ |--11.05%--func_inlined (inlined)
+ | func_noinline
+ | func_declared_inlined (inlined)
+ |
+ --6.42%--func_noinline
+ func_declared_inlined (inlined)
+
+ 41.99% 0.00% a.out a.out [.] func_declared_inlined (inlined)
+ |
+ ---func_declared_inlined (inlined)
+
+ 28.55% 0.00% a.out a.out [.] func_inlined (inlined)
+ |
+ ---func_inlined (inlined)
+ func_noinline
+ func_declared_inlined (inlined)
+
+ 1.57% 0.00% a.out a.out [.] func_declared_inlined (inlined)
+ |
+ ---func_declared_inlined (inlined)
+ func_noinline2
0.88% 0.00% a.out ld-linux-x86-64.so.2 [.] _start
|
---_start
_dl_start
|
- --0.71%--_dl_sysdep_start
+ --0.71%--_dl_start_final (inlined)
+ _dl_sysdep_start
dl_main
_dl_map_object_deps
_dl_catch_exception
@@ -92,7 +152,8 @@
|
---_dl_start
|
- --0.71%--_dl_sysdep_start
+ --0.71%--_dl_start_final (inlined)
+ _dl_sysdep_start
dl_main
_dl_map_object_deps
_dl_catch_exception
@@ -106,6 +167,20 @@
|
---_start
_dl_start
+ _dl_start_final (inlined)
+ _dl_sysdep_start
+ dl_main
+ _dl_map_object_deps
+ _dl_catch_exception
+ openaux
+ _dl_map_object
+ _dl_map_object_from_fd
+ _dl_new_object
+ strlen
+
+ 0.71% 0.00% a.out ld-linux-x86-64.so.2 [.] _dl_start_final (inlined)
+ |
+ ---_dl_start_final (inlined)
_dl_sysdep_start
dl_main
_dl_map_object_deps
======
No meaningful differences in output with callchains in dwarf mode or when
tracing a binary that was compiled with no dwarf debuginfo.
Artem Savkov (1):
perf report: append inlines to non-dwarf callchains
tools/perf/util/machine.c | 82 ++++++++++++++++++++-------------------
1 file changed, 43 insertions(+), 39 deletions(-)
--
2.39.2
next reply other threads:[~2023-03-16 13:37 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-16 13:35 Artem Savkov [this message]
2023-03-16 13:35 ` [PATCH 1/1] perf report: append inlines to non-dwarf callchains Artem Savkov
2023-03-16 21:26 ` [PATCH 0/1] " Namhyung Kim
2023-03-17 7:41 ` Artem Savkov
2023-03-22 18:18 ` Namhyung Kim
2023-03-22 19:44 ` Arnaldo Carvalho de Melo
2023-03-23 8:22 ` Artem Savkov
2023-03-30 5:06 ` Adrian Hunter
2023-03-31 8:52 ` Artem Savkov
2023-04-03 20:30 ` Arnaldo Carvalho de Melo
2023-04-04 5:47 ` Namhyung Kim
2023-04-04 6:57 ` Adrian Hunter
2023-04-04 6:58 ` Artem Savkov
2023-04-04 12:22 ` Arnaldo Carvalho de Melo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230316133557.868731-1-asavkov@redhat.com \
--to=asavkov@redhat.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=andrii.nakryiko@gmail.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mhiramat@kernel.org \
--cc=milian.wolff@kdab.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.