linux-kernel.vger.kernel.org archive mirror
* [PATCHSET 0/9] perf report: Improve srcline sort performance (v1)
@ 2022-12-15 19:28 Namhyung Kim
From: Namhyung Kim @ 2022-12-15 19:28 UTC
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Andi Kleen, Milian Wolff, Leo Yan

Hello,

I noticed a performance problem in the srcline/srcfile processing during
perf report when it uses an external addr2line process.  I guess the fix
also helps when perf uses libbfd to get the srcline info.

Also note that the slowdown mostly shows up with large (static) binaries,
but smaller binaries should also benefit from the fix if they have a lot
of samples.

The first 5 patches are general fixes and updates.  The last 4 patches
implement the actual speed-up.

Let's test it with the perf tool itself.  Build a static binary as below.

  $ cd tools/perf
  $ make NO_JVMTI=1 LDFLAGS=-static

Then run the perf test workload.

  $ ./perf record -- ./perf test -w noploop

Then run perf report with the srcline sort key like this.

  $ ./perf report -n -s srcline --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4K of event 'cycles:u'
  # Event count (approx.): 3572938596
  #
  # Overhead       Samples  Source:Line
  # ........  ............  ............
  #
      99.94%          4010  noploop.c:26
       0.03%            14  ??:0
       0.03%             1  perf.c:330
       0.00%             1  wcscpy.o:0

The problem is that perf report runs addr2line for every sample it
processes.  But as you can see, many samples have the same result.  IOW,
if samples have the same address, we don't need to run addr2line each time.

So I changed sort_key->cmp() to compare the addresses only and moved the
addr2line call into sort_key->collapse(), so that it runs only after
samples with the same address have been merged.
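
The two-phase idea above can be sketched roughly as follows.  This is an
illustration in Python, not the perf code: report_by_srcline(),
CountingResolver, the addresses and the address-to-line mapping are all
made up for the example, and the resolver only stands in for an addr2line
lookup.

```python
def report_by_srcline(sample_addrs, resolve):
    """Count samples per source line, resolving each unique address once."""
    # Phase 1 (the role of cmp()): merge samples by address only.
    # No source-line lookup happens here.
    samples_per_addr = {}
    for addr in sample_addrs:
        samples_per_addr[addr] = samples_per_addr.get(addr, 0) + 1

    # Phase 2 (the role of collapse()): one lookup per merged entry, then
    # merge entries that resolve to the same source line.
    samples_per_srcline = {}
    for addr, nr_samples in samples_per_addr.items():
        srcline = resolve(addr)
        samples_per_srcline[srcline] = (
            samples_per_srcline.get(srcline, 0) + nr_samples
        )
    return samples_per_srcline


class CountingResolver:
    """Hypothetical stand-in for addr2line that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, addr):
        self.calls += 1
        # Made-up mapping: low addresses belong to one line, the rest to another.
        return "noploop.c:26" if addr < 0x2000 else "perf.c:330"


resolver = CountingResolver()
# 4011 samples, but only three distinct addresses.
sample_addrs = [0x1000] * 4000 + [0x1008] * 10 + [0x2000]
print(report_by_srcline(sample_addrs, resolver))
print(resolver.calls)  # 3 lookups instead of 4011
```

The number of lookups then scales with the number of distinct addresses
rather than with the number of samples, which is where the saving comes
from when a hot loop produces thousands of samples at a few addresses.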

With the change, processing srcline info gets a huge speed-up (about
15.5 s down to about 0.1 s elapsed in the runs below) while generating
the same output.

Before:

  $ ./perf stat -- ./perf report -s srcline > /dev/null

   Performance counter stats for './perf report -s srcline':

           15,397.13 msec task-clock:u                     #    0.993 CPUs utilized
                   0      context-switches:u               #    0.000 /sec
                   0      cpu-migrations:u                 #    0.000 /sec
               3,810      page-faults:u                    #  247.449 /sec
      54,516,351,820      cycles:u                         #    3.541 GHz
      31,494,118,293      instructions:u                   #    0.58  insn per cycle
       8,577,271,187      branches:u                       #  557.069 M/sec
       1,216,165,520      branch-misses:u                  #   14.18% of all branches

        15.505066606 seconds time elapsed

        15.094122000 seconds user
         0.396962000 seconds sys

After:

  $ ./perf stat -- ./perf report -s srcline > /dev/null

   Performance counter stats for './perf report -s srcline':

              105.66 msec task-clock:u                     #    0.994 CPUs utilized
                   0      context-switches:u               #    0.000 /sec
                   0      cpu-migrations:u                 #    0.000 /sec
               3,275      page-faults:u                    #   30.995 K/sec
         185,063,407      cycles:u                         #    1.751 GHz
         142,470,215      instructions:u                   #    0.77  insn per cycle
          34,584,038      branches:u                       #  327.311 M/sec
           3,226,005      branch-misses:u                  #    9.33% of all branches

         0.106270464 seconds time elapsed

         0.074254000 seconds user
         0.032871000 seconds sys

The code is available in the 'perf/srcline-v1' branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (9):
  perf srcline: Do not return NULL for srcline
  perf report: Ignore SIGPIPE for srcline
  perf symbol: Add filename__has_section()
  perf srcline: Skip srcline if .debug_line is missing
  perf srcline: Conditionally suppress addr2line warnings
  perf hist: Add perf_hpp_fmt->init() callback
  perf hist: Improve srcline sort key performance
  perf hist: Improve srcfile sort key performance
  perf hist: Improve srcline_{from,to} sort key performance

 tools/perf/builtin-report.c      |   1 +
 tools/perf/util/hist.c           |  10 +--
 tools/perf/util/hist.h           |   1 +
 tools/perf/util/sort.c           | 129 ++++++++++++++++++++++++++++---
 tools/perf/util/sort.h           |   1 +
 tools/perf/util/srcline.c        |  20 +++--
 tools/perf/util/symbol-elf.c     |  28 +++++++
 tools/perf/util/symbol-minimal.c |   5 ++
 tools/perf/util/symbol.h         |   1 +
 9 files changed, 176 insertions(+), 20 deletions(-)


base-commit: 818448e9cf92e5c6b3c10320372eefcbe4174e4f
-- 
2.39.0.314.g84b9a713c41-goog

