All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <Waiman.Long@hp.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jeff Layton <jlayton@redhat.com>,
	Miklos Szeredi <mszeredi@suse.cz>, Ian Kent <raven@themaw.net>,
	Sage Weil <sage@inktank.com>, Steve French <sfrench@samba.org>,
	Trond Myklebust <Trond.Myklebust@netapp.com>,
	Eric Paris <eparis@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	autofs@vger.kernel.org, ceph-devel@vger.kernel.org,
	linux-cifs@vger.kernel.org, samba-technical@lists.samba.org,
	linux-nfs@vger.kernel.org, "Chandramouleeswaran,
	Aswin" <aswin@hp.com>, "Norton, Scott J" <scott.norton@hp.com>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <david@fromorbit.com>
Subject: [PATCH 0/3 v3] dcache: make it more scalable on large system
Date: Wed, 22 May 2013 21:37:25 -0400	[thread overview]
Message-ID: <1369273048-60256-1-git-send-email-Waiman.Long@hp.com> (raw)

Change log:

v2->v3
 - Fix the RCU lock problem found by Al Viro.
 - Rebase the code to the latest v3.10-rc1 linux mainline.
 - Remove patch 4 which may be problematic if the dentry is deleted.
 - Rerun performance measurement using 3.10-rc1 kernel.

v1->v2
 - Include performance improvement in the AIM7 benchmark results because
   of this patch.
 - Modify dget_parent() to avoid taking the lock, if possible, to further
   improve AIM7 benchmark results.

During some perf-record sessions of the kernel running the high_systime
workload of the AIM7 benchmark, it was found that quite a large portion
of the spinlock contention was due to the perf_event_mmap_event()
function itself. This perf kernel function calls d_path() which,
in turn, call path_get() and dput() indirectly. These 3 functions
were the hottest functions shown in the perf-report output of
the _raw_spin_lock() function in an 8-socket system with 80 cores
(hyperthreading off) with a 3.10-rc1 kernel:

-  13.91%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
   - _raw_spin_lock
      + 35.54% path_get
      + 34.85% dput
      + 19.49% d_path

In fact, the output of the "perf record -s -a" (without call-graph)
showed:

 13.37%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
  7.61%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
  3.54%            true  [kernel.kallsyms]     [k] _raw_spin_lock

Without using the perf monitoring tool, the actual execution profile
will be quite different. In fact, with this patch set applied, the
output of the same "perf record -s -a" command became:

  2.82%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
  1.11%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
  0.26%            true  [kernel.kallsyms]     [k] _raw_spin_lock

So the time spent on _raw_spin_lock() function went down from 24.52%
to 4.19%. It can be seen that the performance data collected by the
perf-record command can be heavily skewed in some cases on a system
with a large number of CPUs. This set of patches enables the perf
command to give a more accurate and reliable picture of what is really
happening in the system. At the same time, they can also improve the
general performance of systems especially those with a large number
of CPUs.

The d_path() function takes the following two locks:

1. dentry->d_lock [spinlock] from dget()/dget_parent()/dput()
2. rename_lock    [seqlock]  from d_path()

This set of patches were designed to minimize the locking overhead
of these code paths.

The current kernel takes the dentry->d_lock lock whenever it wants to
increment or decrement the d_count reference count. However, nothing
big will really happen until the reference count goes all the way to 1
or 0.  Actually, we don't need to take the lock when reference count
is bigger than 1. Instead, atomic cmpxchg() function can be used to
increment or decrement the count in these situations. For safety,
other reference count update operations have to be changed to use
atomic instruction as well.

The rename_lock is a sequence lock. The d_path() function takes the
writer lock because it needs to traverse different dentries through
pointers to get the full path name. Hence it can't tolerate changes
in those pointers. But taking the writer lock also prevent multiple
d_path() calls to proceed concurrently.

A solution is to introduce a new lock type where there will be a
second type of reader which can block the writers - the sequence
read/write lock (seqrwlock). The d_path() and related functions will
then be changed to take the reader lock instead of the writer lock.
This will allow multiple d_path() operations to proceed concurrently.

Additional performance testing was conducted using the AIM7
benchmark.  It is mainly the first patch that has impact on the AIM7
benchmark. Please see the patch description of the first patch on
more information about the benchmark results.

Incidentally, these patches also have a favorable impact on Oracle
database performance when measured by the Oracle SLOB benchmark.

The following tests with multiple threads were also run on kernels
with and without the patch on an 8-socket 80-core system and a PC
with 4-core i5 processor:

1. find $HOME -size 0b
2. cat /proc/*/maps /proc/*/numa_maps
3. git diff

For both the find-size and cat-maps tests, the performance difference
with hot cache was within a few percentage points and hence within
the margin of error. Single-thread performance was slightly worse,
but multithread performance was generally a bit better. Apparently,
reference count update isn't a significant factor in those tests. Their
perf traces indicates that there was less spinlock content in
functions like dput(), but the function itself ran a little bit longer
on average.

The git-diff test showed no difference in performance. There is a
slight increase in system time compensated by a slight decrease in
user time.

Of the 3 patches, patch 3 is dependent on patch 2. The first patch
is independent can be applied individually.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>

Waiman Long (3):
  dcache: Don't take unnecessary lock in d_count update
  dcache: introduce a new sequence read/write lock type
  dcache: change rename_lock to a sequence read/write lock

 fs/autofs4/waitq.c        |    6 +-
 fs/ceph/mds_client.c      |    4 +-
 fs/cifs/dir.c             |    4 +-
 fs/dcache.c               |  120 ++++++++++++++++++++--------------------
 fs/namei.c                |    2 +-
 fs/nfs/namespace.c        |    6 +-
 include/linux/dcache.h    |  105 +++++++++++++++++++++++++++++++++--
 include/linux/seqrwlock.h |  137 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/auditsc.c          |    4 +-
 9 files changed, 310 insertions(+), 78 deletions(-)
 create mode 100644 include/linux/seqrwlock.h


             reply	other threads:[~2013-05-23  1:37 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-23  1:37 Waiman Long [this message]
2013-05-23  1:37 ` [PATCH 1/3 v3] dcache: Don't take unnecessary lock in d_count update Waiman Long
2013-05-23  1:37   ` Waiman Long
2013-05-23  1:37 ` Waiman Long
2013-05-23  1:37 ` [PATCH 2/3 v3] dcache: introduce a new sequence read/write lock type Waiman Long
2013-05-23  1:37   ` Waiman Long
2013-05-23  1:37 ` Waiman Long
2013-05-23  1:37 ` [PATCH 3/3 v3] dcache: change rename_lock to a sequence read/write lock Waiman Long
2013-05-23  1:37 ` Waiman Long
2013-05-23  1:37   ` Waiman Long
2013-05-23  9:42 ` [PATCH 0/3 v3] dcache: make it more scalable on large system Dave Chinner
2013-05-23 21:34   ` Waiman Long
2013-05-23 21:34     ` Waiman Long
2013-05-27  2:09     ` Dave Chinner
2013-05-29 15:55       ` Waiman Long
     [not found]         ` <51A624E2.3000301-VXdhtT5mjnY@public.gmane.org>
2013-05-29 16:13           ` Andi Kleen
2013-05-29 16:13             ` Andi Kleen
     [not found]             ` <20130529161358.GJ6123-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2013-05-29 20:23               ` Waiman Long
2013-05-29 20:23                 ` Waiman Long
2013-05-29 16:18           ` Simo Sorce
2013-05-29 16:18             ` Simo Sorce
2013-05-29 16:56             ` Andi Kleen
2013-05-29 17:03               ` Simo Sorce
2013-05-29 20:37               ` Waiman Long
     [not found]             ` <1369844289.2769.146.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
2013-05-29 20:32               ` Waiman Long
2013-05-29 20:32                 ` Waiman Long
2013-05-29 18:46           ` J. Bruce Fields
2013-05-29 18:46             ` J. Bruce Fields
2013-05-29 20:37             ` Andi Kleen
     [not found]               ` <20130529203700.GM6123-1g7Xle2YJi4/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2013-05-29 20:43                 ` J. Bruce Fields
2013-05-29 20:43                   ` J. Bruce Fields
2013-05-29 21:01                   ` Andi Kleen
2013-05-29 21:19               ` Jörn Engel
2013-05-29 21:19                 ` Jörn Engel
2013-05-30 15:48                 ` Waiman Long
2013-05-30 15:48                   ` Waiman Long
2013-05-30 15:11                   ` Jörn Engel
2013-05-30 15:11                     ` Jörn Engel
2013-06-06  3:48               ` Dave Chinner
2013-05-29 20:40             ` Waiman Long
2013-05-23  1:37 Waiman Long
2013-05-23  1:37 ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1369273048-60256-1-git-send-email-Waiman.Long@hp.com \
    --to=waiman.long@hp.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=andi@firstfloor.org \
    --cc=aswin@hp.com \
    --cc=autofs@vger.kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=david@fromorbit.com \
    --cc=eparis@redhat.com \
    --cc=jlayton@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mszeredi@suse.cz \
    --cc=raven@themaw.net \
    --cc=sage@inktank.com \
    --cc=samba-technical@lists.samba.org \
    --cc=scott.norton@hp.com \
    --cc=sfrench@samba.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.