From: Feng Tang <feng.tang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Jonathan Corbet <corbet@lwn.net>, Ingo Molnar <mingo@kernel.org>,
	Eric W Biederman <ebiederm@xmission.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andy Lutomirski <luto@kernel.org>,
	Ying Huang <ying.huang@intel.com>,
	linux-kernel@vger.kernel.org
Cc: Feng Tang <feng.tang@intel.com>
Subject: [RFC PATCH 0/3] latencytop lock usage improvement
Date: Mon, 29 Apr 2019 16:03:28 +0800
Message-ID: <1556525011-28022-1-git-send-email-feng.tang@intel.com>

Hi All,

latencytop is a very nice tool for tracing system latency hotspots, and
we heavily use it in our LKP test suites.

However, we found that in some benchmark tests there is very severe lock
contention which consumes 70%+ of CPU cycles in the perf profile,
especially for benchmarks involving massive process scheduling on
platforms with many CPUs, like running hackbench
  "hackbench -g 1408 --process --pipe -l 1875 -s 1024"
on a 2-socket Xeon E5-2699 machine (44 cores/88 CPUs) with 64GB RAM.

Due to that, we have to explicitly disable latencytop for those test
cases.

By checking the code, we found that latencytop uses one global spinlock
to protect the latency data updates for both the global records and the
per-task ones.
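
For reference, a simplified sketch of the current scheme in
kernel/latencytop.c (not verbatim; the per-task update is shown here as
a helper, account_task_latency(), for brevity, while in the real code it
is open-coded in the same function):

static DEFINE_RAW_SPINLOCK(latency_lock);
static struct latency_record latency_record[MAXLR];	/* global table */

void __sched
__account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
{
	struct latency_record lat;
	unsigned long flags;

	/* ... fill 'lat' (count, time, max, backtrace) outside the lock ... */

	raw_spin_lock_irqsave(&latency_lock, flags);

	/* One lock serializes updates to both shared structures. */
	account_global_scheduler_latency(tsk, &lat);	/* global table */
	account_task_latency(tsk, &lat);		/* tsk->latency_record[] */

	raw_spin_unlock_irqrestore(&latency_lock, flags);
}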

So initially we tried splitting the single lock into one global lock and
one per-task lock for better lock granularity, but the lock contention
was only reduced slightly (only a 1% drop in the perf profile). The
contention stays severe because the benchmark causes massive scheduling
across the 88 CPUs, and every schedule-in still has to acquire the
global lock to update the global data.
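
Roughly, the split-lock variant looks like this (the per-task lock field
name, latency_record_lock, and the helper names are illustrative, not
necessarily what patch 2/3 uses):

static DEFINE_RAW_SPINLOCK(latency_lock);	/* now covers only the global table */

void __sched
__account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
{
	struct latency_record lat;
	unsigned long flags;

	/* ... fill 'lat' as before ... */

	/* Still taken on every schedule-in, so the hot spot remains. */
	raw_spin_lock_irqsave(&latency_lock, flags);
	account_global_scheduler_latency(tsk, &lat);
	raw_spin_unlock_irqrestore(&latency_lock, flags);

	/* New finer-grained lock for tsk->latency_record[]. */
	raw_spin_lock_irqsave(&tsk->latency_record_lock, flags);
	account_task_latency(tsk, &lat);
	raw_spin_unlock_irqrestore(&tsk->latency_record_lock, flags);
}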

Then we tried to reduce the operations inside latency_lock's protection
(between the raw_spin_lock_irqsave/raw_spin_unlock_irqrestore pair),
but that also gave only a very small improvement, and the lock
contention stayed high.
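
Even with the preparation of the new sample moved out of the critical
section, what must stay under latency_lock is the merge into the shared
table, and that still runs on every schedule-in. A rough illustration of
that irreducible part (inside __account_scheduler_latency(); the
same_backtrace() helper is hypothetical, the real code compares the
backtrace entries inline):

	raw_spin_lock_irqsave(&latency_lock, flags);
	for (i = 0; i < MAXLR; i++) {
		/* Fold the sample into a record with the same backtrace. */
		if (same_backtrace(&latency_record[i], &lat)) {
			latency_record[i].count++;
			latency_record[i].time += lat.time;
			if (lat.time > latency_record[i].max)
				latency_record[i].max = lat.time;
			break;
		}
		/* ... otherwise claim a free slot for it ... */
	}
	raw_spin_unlock_irqrestore(&latency_lock, flags);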

At last, we tried adding an extra lazy mode which only updates the
global latency data when a task exits, while still updating the per-task
data in real time. This reduces the lock contention from 70%+ to less
than 5%, while boosting that hackbench case's throughput by 276%.
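
Conceptually, the lazy mode moves the global-table merge out of the
scheduler hot path and into the exit path. A rough sketch, with
illustrative names (not necessarily those used in patch 3/3):

/* Scheduler hot path: in lazy mode, touch only the task's own records,
 * so no global lock is taken on every schedule-in. */
void __sched
__account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
{
	struct latency_record lat;

	/* ... fill 'lat' (count, time, max, backtrace) ... */
	account_task_latency(tsk, &lat);	/* tsk->latency_record[] */
}

/* Exit path, e.g. hooked from do_exit(): fold the task's records into
 * the global table, so latency_lock is taken once per task lifetime
 * instead of once per schedule-in. */
void account_global_latency_on_exit(struct task_struct *tsk)
{
	unsigned long flags;
	int i;

	raw_spin_lock_irqsave(&latency_lock, flags);
	for (i = 0; i < tsk->latency_record_count; i++)
		/* illustrative helper: accumulate into latency_record[] */
		merge_global_latency_record(&tsk->latency_record[i]);
	raw_spin_unlock_irqrestore(&latency_lock, flags);
}

The implied tradeoff is that the global /proc/latency_stats view lags
behind still-running tasks until they exit, which is presumably why this
is an extra opt-in mode rather than a change to the default behaviour.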

Please help to review, thanks!

Patch 1/3 adds the missing sysctl description for "latencytop" and I
think it could be merged independently.

Patch 2/3 splits latency_lock into a global lock and a per-task lock.
Actually, a more aggressive thought is that the per-task lock may not be
needed at all, as the per-task data is only updated at enqueue time for
a task, which implies there is no race condition on it.

Patch 3/3 implements the lazy mode and updates the documentation.

Thanks,
Feng


Feng Tang (3):
  kernel/sysctl: add description for "latencytop"
  latencytop: split latency_lock to global lock and per task lock
  latencytop: add a lazy mode for updating global data

 Documentation/sysctl/kernel.txt | 23 ++++++++++++++++++++
 include/linux/latencytop.h      |  5 +++++
 include/linux/sched.h           |  1 +
 init/init_task.c                |  3 +++
 kernel/exit.c                   |  2 ++
 kernel/fork.c                   |  4 ++++
 kernel/latencytop.c             | 47 +++++++++++++++++++++++++++++++++++------
 7 files changed, 78 insertions(+), 7 deletions(-)

-- 
2.7.4


Thread overview: 8+ messages
2019-04-29  8:03 Feng Tang [this message]
2019-04-29  8:03 ` [RFC PATCH 1/3] kernel/sysctl: add description for "latencytop" Feng Tang
2019-04-29  8:03 ` [RFC PATCH 2/3] latencytop: split latency_lock to global lock and per task lock Feng Tang
2019-04-29  8:03 ` [RFC PATCH 3/3] latencytop: add a lazy mode for updating global latency data Feng Tang
2019-04-30  8:09 ` [RFC PATCH 0/3] latencytop lock usage improvement Peter Zijlstra
2019-04-30  8:35   ` Feng Tang
2019-04-30  9:10     ` Peter Zijlstra
2019-04-30  9:22       ` Feng Tang
