From: kan.liang@intel.com
To: acme@kernel.org, peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org
Cc: jolsa@kernel.org, namhyung@kernel.org, adrian.hunter@intel.com,
	lukasz.odzioba@intel.com, ak@linux.intel.com,
	Kan Liang <kan.liang@intel.com>
Subject: [PATCH RFC 00/10] perf top optimization
Date: Thu,  7 Sep 2017 10:55:44 -0700
Message-ID: <1504806954-150842-1-git-send-email-kan.liang@intel.com>

From: Kan Liang <kan.liang@intel.com>

This patch series aims to fix a severe performance issue on
Knights Landing/Mill when monitoring a heavily loaded system.
perf top takes a few minutes to show the first result, which is
unacceptable.
With the patch series applied, the latency is reduced to
several seconds.

machine__synthesize_threads and perf_top__mmap_read account for most
of the perf top time (> 99%).
Patches 1-9 optimize machine__synthesize_threads.
Patch 10 optimizes perf_top__mmap_read.

Optimization for machine__synthesize_threads
  - Multithread the whole process.
  - The number of threads defaults to the number of online CPUs.
    Users can change the number of threads through the new option.
  - Introduce a hashtable for machine threads to reduce lock
    contention.
  - The optimization can also benefit other platforms and other
    perf tools, like perf record. This patch series does not
    optimize the other tools; that can be done separately later.
  - With this optimization applied, there is a 1.56x speedup on
    Knights Mill under heavy workload.
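
To illustrate why a hashtable helps with lock contention, here is a
minimal sketch, not the code from this series. It assumes a fixed
number of buckets, each with its own mutex, so that worker threads
looking up different tids rarely wait on the same lock. All names
(toy_machine, toy_thread, toy_machine_findnew, THREADS_TABLE_BITS,
toy_default_nr_threads) are hypothetical and are not perf's real
structures or functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define THREADS_TABLE_BITS 8	/* hypothetical: 256 buckets */
#define THREADS_TABLE_SIZE (1 << THREADS_TABLE_BITS)

/* Hypothetical, simplified stand-in for a per-thread record. */
struct toy_thread {
	pid_t tid;
	struct toy_thread *next;
};

/* One bucket = one list + one lock. */
struct toy_bucket {
	pthread_mutex_t lock;
	struct toy_thread *head;
};

struct toy_machine {
	struct toy_bucket buckets[THREADS_TABLE_SIZE];
};

static void toy_machine_init(struct toy_machine *m)
{
	assert(m);
	for (int i = 0; i < THREADS_TABLE_SIZE; i++) {
		pthread_mutex_init(&m->buckets[i].lock, NULL);
		m->buckets[i].head = NULL;
	}
}

/* Map a tid to its bucket. */
static struct toy_bucket *toy_machine_bucket(struct toy_machine *m, pid_t tid)
{
	return &m->buckets[(unsigned int)tid % THREADS_TABLE_SIZE];
}

/*
 * Find the record for tid, creating it if it does not exist yet.
 * Only the bucket owning this tid is locked, not the whole table.
 * Returns NULL if allocation fails.
 */
static struct toy_thread *toy_machine_findnew(struct toy_machine *m, pid_t tid)
{
	struct toy_bucket *b = toy_machine_bucket(m, tid);
	struct toy_thread *t;

	pthread_mutex_lock(&b->lock);
	for (t = b->head; t; t = t->next)
		if (t->tid == tid)
			break;
	if (!t) {
		t = malloc(sizeof(*t));
		if (t) {
			t->tid = tid;
			t->next = b->head;
			b->head = t;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return t;
}

/* Default worker count: number of online CPUs, falling back to 1. */
static long toy_default_nr_threads(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;
}
```

With a single lock around one global structure, every worker would
serialize on that lock. With per-bucket locks, two workers contend
only when their tids hash to the same bucket.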

Optimization for perf_top__mmap_read
  - Switch back to overwrite mode.
    In non-overwrite mode, perf top tries to read everything in the ring
    buffer and does not check for mess-ups. When lots of samples are
    delivered in a short period, the processing time can be very long.
    Given the real-time requirement of perf top, it should switch
    back to overwrite mode.
  - With this optimization applied, there is a 53.8x speedup on
    Knights Mill under heavy workload.
  - With this optimization applied, the latency of perf_top__mmap_read is
    less than the default perf top refresh time (2s) on Knights Mill under
    heavy workload.
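
The reason overwrite mode bounds the read cost can be shown with a
toy model. This is only a sketch under simplifying assumptions
(fixed-size integer samples, a single writer, no concurrency); it is
neither the kernel's perf ring buffer nor the implementation in
patch 10. The names toy_ring, toy_ring_write, toy_ring_read_latest
and RING_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical capacity, in samples */

struct toy_ring {
	unsigned long head;	/* total number of samples ever written */
	int slots[RING_SIZE];
};

/* Overwrite mode: the writer never waits; the oldest slot is reused. */
static void toy_ring_write(struct toy_ring *r, int sample)
{
	assert(r);
	r->slots[r->head % RING_SIZE] = sample;
	r->head++;
}

/*
 * Copy the newest samples into out[], oldest first.
 * At most RING_SIZE samples are returned no matter how many were
 * written since the last read, so the work per read is bounded.
 */
static size_t toy_ring_read_latest(const struct toy_ring *r, int *out,
				   size_t max)
{
	size_t avail, n;
	unsigned long start;

	assert(r && out);
	avail = r->head < RING_SIZE ? r->head : RING_SIZE;
	n = avail < max ? avail : max;
	start = r->head - n;

	for (size_t i = 0; i < n; i++)
		out[i] = r->slots[(start + i) % RING_SIZE];
	return n;
}
```

In this model a reader that wakes up once per refresh interval
processes at most RING_SIZE samples, while older samples are simply
dropped. That trade-off fits a tool like perf top, where recent data
matters more than a complete record.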

Here are the perf top latency test results on Knights Mill and Skylake server.

The heavy workload is a Linux kernel build, started as below:
"sudo nice make -j$(grep -c '^processor' /proc/cpuinfo)"
Then run "sudo perf top".

The latency is the time between launching perf top and the first
profiling result being shown.

- Latency on Knights Mill (272 CPUs)

Original(s)     With patch(s)   Speedup
272.68          16.48           16.54x

- Latency on Skylake server (192 CPUs)

Original(s)     With patch(s)   Speedup
12.28           2.96            4.15x

Kan Liang (10):
  perf tools: hashtable for machine threads
  perf tools: using scandir to replace readdir
  perf tools: using comm_str to replace comm in hist_entry
  perf tools: introduce a new function to set namespaces id
  perf tools: lock to protect thread list
  perf tools: lock to protect comm_str rb tree
  perf tools: change machine comm_exec type to atomic
  perf top: implement multithreading for perf_event__synthesize_threads
  perf top: add option to set the number of thread for event synthesize
  perf top: switch back to overwrite mode

 tools/perf/builtin-kvm.c              |   3 +-
 tools/perf/builtin-record.c           |   2 +-
 tools/perf/builtin-top.c              |   9 +-
 tools/perf/builtin-trace.c            |  21 +++--
 tools/perf/tests/mmap-thread-lookup.c |   2 +-
 tools/perf/ui/browsers/hists.c        |   2 +-
 tools/perf/util/comm.c                |  18 +++-
 tools/perf/util/event.c               | 142 ++++++++++++++++++++++++------
 tools/perf/util/event.h               |  14 ++-
 tools/perf/util/evlist.c              |   5 +-
 tools/perf/util/hist.c                |  10 +--
 tools/perf/util/machine.c             | 158 +++++++++++++++++++++-------------
 tools/perf/util/machine.h             |  34 ++++++--
 tools/perf/util/rb_resort.h           |   5 +-
 tools/perf/util/sort.c                |   8 +-
 tools/perf/util/sort.h                |   2 +-
 tools/perf/util/thread.c              |  68 ++++++++++++---
 tools/perf/util/thread.h              |   6 +-
 tools/perf/util/top.h                 |   1 +
 19 files changed, 368 insertions(+), 142 deletions(-)

-- 
2.5.5

Thread overview: 15+ messages
2017-09-07 17:55 kan.liang [this message]
2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
2017-09-08 14:18   ` Arnaldo Carvalho de Melo
2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
2017-09-08 14:11   ` Arnaldo Carvalho de Melo
2017-09-22 16:35   ` [tip:perf/core] perf tools: Use scandir() to replace readdir() tip-bot for Kan Liang
2017-09-07 17:55 ` [PATCH RFC 03/10] perf tools: using comm_str to replace comm in hist_entry kan.liang
2017-09-07 17:55 ` [PATCH RFC 04/10] perf tools: introduce a new function to set namespaces id kan.liang
2017-09-07 17:55 ` [PATCH RFC 05/10] perf tools: lock to protect thread list kan.liang
2017-09-07 17:55 ` [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree kan.liang
2017-09-08 14:27   ` Arnaldo Carvalho de Melo
2017-09-07 17:55 ` [PATCH RFC 07/10] perf tools: change machine comm_exec type to atomic kan.liang
2017-09-07 17:55 ` [PATCH RFC 08/10] perf top: implement multithreading for perf_event__synthesize_threads kan.liang
2017-09-07 17:55 ` [PATCH RFC 09/10] perf top: add option to set the number of thread for event synthesize kan.liang
2017-09-07 17:55 ` [PATCH RFC 10/10] perf top: switch back to overwrite mode kan.liang
