linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/14] generate full callchain cursor entries for inlined frames
@ 2017-08-06 21:24 Milian Wolff
  2017-08-06 21:24 ` [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers Milian Wolff
                   ` (13 more replies)
  0 siblings, 14 replies; 34+ messages in thread
From: Milian Wolff @ 2017-08-06 21:24 UTC (permalink / raw)
  To: acme, Jin Yao, Linux-kernel, Namhyung Kim; +Cc: linux-perf-users, Milian Wolff

This series of patches completely reworks the way inline frames are handled.
Instead of querying for the inline nodes on-demand in the individual tools,
we now create proper callchain nodes for inlined frames. The advantages this
approach brings are numerous:

- less duplicated code in the individual browser
- aggregated cost for inlined frames for the --children top-down list
- various bug fixes that arose from querying for a srcline/symbol based on
  the IP of a sample, which will always point to the last inlined frame
  instead of the corresponding non-inlined frame
- overall much better support for visualizing cost for heavily-inlined C++
  code, which simply was confusing and unreliably before
- srcline honors the global setting as to whether full paths or basenames
  should be shown
- caches for inlined frames and srcline information, which allow us to
  enable inline frame handling by default

For comparison, below lists the output before and after for `perf script`
and `perf report`. The example file I used to generate the perf data is:

~~~~~
$ cat inlining.cpp
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
$ g++ -O2 -g -o inlining inlining.cpp
$ perf record --call-graph dwarf ./inlining
~~~~~

Now, the (broken) status-quo looks like this. Look for "NOTE:" to see some
of my comments that outline the various issues I'm trying to solve by this
patch series.

~~~~~
$ perf script --inline
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                         std::__complex_abs
                         std::abs<double>
                         std::_Norm_helper<true>::_S_do_it<double>
                         std::norm<double>
                         main
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
# NOTE: the above inlined stack is confusing: the a4a is an address into main,
#       which is the non-inlined symbol. the entry with the address should be
#       at the end of the stack, where it's actually duplicated once more but
#       there it's missing the address
...
$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio
...
             --38.86%--_start
                       __libc_start_main
                       |
                       |--15.68%--main random.tcc:3326
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
                       |
                       |--10.36%--main random.h:143
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:332 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:151 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:143 (inline)
                       |
                       |--5.66%--main random.tcc:3332
                       |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
...
# NOTE: the grouping is totally off because the first and last frame of the
        inline nodes is completely bogus, since the IP is used to find the sym/srcline
        which is different from the actual inlined sym/srcline.
        also, the code currently displays either the inlined function name or
        the corresponding filename (but in full length, instead of just the basename).

$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--main random.tcc:3326
            |          /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
            |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
            |          /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
            |          __libc_start_main
            |          _start
...
# NOTE: the srcline for main is wrong, it should be inlining.cpp:14,
        i.e. what is displayed in the line below (see also perf script issue above)
~~~~~

Afterwards, all of the above issues are resolved (and inlined frames are
displayed by default):

~~~~~
$ perf script
...
inlining 11083 97459.356656:      33680 cycles:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)
...
# NOTE: only one main entry, at the correct position.
        we do display the (repeated) instruction pointer as that ensures
        interoperability with e.g. the stackcollapse-perf.pl script

$ perf report -s sym -g srcline -i perf.inlining.data --stdio
...
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664 (inlined)
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
            |          std::abs<double> complex:597 (inlined)
            |          std::__complex_abs complex:589 (inlined)
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
            |
             --38.86%--_start
                       __libc_start_main
                       |
                       |--38.19%--main inlining.cpp:14
                       |          |
                       |          |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
                       |          |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
                       |          |          |
                       |          |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
                       |          |                     |
                       |          |                     |--17.91%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 (inlined)
                       |          |                     |          |
                       |          |                     |           --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 (inlined)
                       |          |                     |                     std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 (inlined)
                       |          |                     |                     |
                       |          |                     |                     |--10.36%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 (inlined)
                       |          |                     |                     |
                       |          |                     |                      --1.88%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:141 (inlined)
                       |          |                     |
                       |          |                     |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
                       |          |                     |
                       |          |                      --0.79%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3335 (inlined)
                       |          |
                       |           --1.99%--std::norm<double> complex:664 (inlined)
                       |                     std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
                       |                     std::abs<double> complex:597 (inlined)
                       |                     std::__complex_abs complex:589 (inlined)
                       |
                        --0.67%--main inlining.cpp:13
...

# NOTE: still somewhat confusing due to the _start and __libc_start_main frames
        that actually are *above* the main frame. But at least the stuff below
        properly splits up and shows that mutiple functions got inlined into
        inlining.cpp:14, not just one as before.

$ perf report -s sym -g srcline -i perf.inlining.data --stdio --no-children
...
    38.86%  [.] main
            |
            |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
            |          std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
            |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
            |          main inlining.cpp:14
            |          __libc_start_main
            |          _start
...
# NOTE: the first and last entry of the inline stack have the correct symbol and srcline now
        both function and srcline is shown, as well as the (inlined) suffix
        only the basename of the srcline is shown

v2 fixes some issues reported by Namhyung or found by me in further
testing, adds caching and enables inline frames by default.

Milian Wolff (14):
  perf report: remove code to handle inline frames from browsers
  perf util: take elf_name as const string in dso__demangle_sym
  perf report: create real callchain entries for inlined frames
  perf report: fall-back to function name comparison for -g srcline
  perf report: mark inlined frames in output by " (inlined)" suffix
  perf script: mark inlined frames and do not print DSO for them
  perf report: compare symbol name for inlined frames when matching
  perf report: compare symbol name for inlined frames when sorting
  perf report: properly handle branch count in match_chain
  perf report: cache failed lookups of inlined frames
  perf report: cache srclines for callchain nodes
  perf report: use srcline from callchain for hist entries
  perf util: do not consider empty files as valid srclines
  perf util: enable handling of inlined frames by default

 tools/perf/Documentation/perf-report.txt |   3 +-
 tools/perf/Documentation/perf-script.txt |   3 +-
 tools/perf/ui/browsers/hists.c           | 180 ++------------------
 tools/perf/ui/stdio/hist.c               |  77 +--------
 tools/perf/util/callchain.c              | 151 ++++++++---------
 tools/perf/util/callchain.h              |   6 +-
 tools/perf/util/dso.c                    |   3 +
 tools/perf/util/dso.h                    |   2 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/evsel_fprintf.c          |  37 +----
 tools/perf/util/hist.c                   |   7 +-
 tools/perf/util/machine.c                |  66 +++++++-
 tools/perf/util/sort.c                   |   6 +
 tools/perf/util/sort.h                   |   1 -
 tools/perf/util/srcline.c                | 271 +++++++++++++++++++++++++------
 tools/perf/util/srcline.h                |  26 ++-
 tools/perf/util/symbol-elf.c             |   2 +-
 tools/perf/util/symbol-minimal.c         |   2 +-
 tools/perf/util/symbol.c                 |   1 +
 tools/perf/util/symbol.h                 |   4 +-
 20 files changed, 426 insertions(+), 423 deletions(-)

-- 
2.13.3

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-09-07 15:16 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-06 21:24 [PATCH v2 00/14] generate full callchain cursor entries for inlined frames Milian Wolff
2017-08-06 21:24 ` [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers Milian Wolff
2017-08-07 15:07   ` Arnaldo Carvalho de Melo
2017-08-07 19:22     ` Milian Wolff
2017-08-07 20:16       ` Arnaldo Carvalho de Melo
2017-08-06 21:24 ` [PATCH v2 02/14] perf util: take elf_name as const string in dso__demangle_sym Milian Wolff
2017-08-07 15:10   ` Arnaldo Carvalho de Melo
2017-08-14 17:48   ` [tip:perf/core] perf util: Take " tip-bot for Milian Wolff
2017-08-06 21:24 ` [PATCH v2 03/14] perf report: create real callchain entries for inlined frames Milian Wolff
2017-08-16  7:53   ` Namhyung Kim
2017-08-20 20:57     ` Milian Wolff
2017-08-28 12:18       ` Namhyung Kim
2017-09-06 13:13       ` Milian Wolff
2017-09-07 14:58         ` Namhyung Kim
2017-09-07 15:05           ` Milian Wolff
2017-09-07 15:16             ` Namhyung Kim
2017-08-06 21:24 ` [PATCH v2 04/14] perf report: fall-back to function name comparison for -g srcline Milian Wolff
2017-08-06 21:24 ` [PATCH v2 05/14] perf report: mark inlined frames in output by " (inlined)" suffix Milian Wolff
2017-08-06 21:24 ` [PATCH v2 06/14] perf script: mark inlined frames and do not print DSO for them Milian Wolff
2017-08-06 21:24 ` [PATCH v2 07/14] perf report: compare symbol name for inlined frames when matching Milian Wolff
2017-08-06 21:24 ` [PATCH v2 08/14] perf report: compare symbol name for inlined frames when sorting Milian Wolff
2017-08-06 21:24 ` [PATCH v2 09/14] perf report: properly handle branch count in match_chain Milian Wolff
2017-08-06 21:24 ` [PATCH v2 10/14] perf report: cache failed lookups of inlined frames Milian Wolff
2017-08-06 21:24 ` [PATCH v2 11/14] perf report: cache srclines for callchain nodes Milian Wolff
2017-08-10  2:13   ` Namhyung Kim
2017-08-10 11:51     ` Milian Wolff
2017-08-10 14:56       ` Namhyung Kim
2017-08-10 17:58         ` Arnaldo Carvalho de Melo
2017-08-11 11:28           ` Milian Wolff
2017-08-06 21:24 ` [PATCH v2 12/14] perf report: use srcline from callchain for hist entries Milian Wolff
2017-08-06 21:24 ` [PATCH v2 13/14] perf util: do not consider empty files as valid srclines Milian Wolff
2017-08-07 15:21   ` Arnaldo Carvalho de Melo
2017-08-14 17:48   ` [tip:perf/core] perf srcline: Do " tip-bot for Milian Wolff
2017-08-06 21:24 ` [PATCH v2 14/14] perf util: enable handling of inlined frames by default Milian Wolff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).