linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <Waiman.Long@hp.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jeff Layton <jlayton@redhat.com>,
	Miklos Szeredi <mszeredi@suse.cz>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Andi Kleen <andi@firstfloor.org>,
	"Chandramouleeswaran, Aswin" <aswin@hp.com>,
	"Norton, Scott J" <scott.norton@hp.com>
Subject: [PATCH v7 4/4] dcache: Enable lockless update of dentry's refcount
Date: Mon,  5 Aug 2013 23:12:39 -0400	[thread overview]
Message-ID: <1375758759-29629-5-git-send-email-Waiman.Long@hp.com> (raw)
In-Reply-To: <1375758759-29629-1-git-send-email-Waiman.Long@hp.com>

The current code takes the dentry's d_lock lock whenever the refcnt
is being updated. In reality, nothing big really happens until refcnt
goes to 0 in dput(). So it is not necessary to take the lock if the
reference count won't go to 0. On the other hand, there are cases
where refcnt should not be updated or was not expected to be updated
while d_lock was acquired by another thread.

This patch changes the code in dput(), dget(), __dget() and
dget_parent() to use lockless reference count update function calls.

This patch has a particular big impact on the short workload of the
AIM7 benchmark with ramdisk filesystem. The table below show the
performance improvement to the JPM (jobs per minutes) throughput due
to this patch on an 8-socket 80-core x86-64 system with a 3.11-rc3
kernel in a 1/2/4/8 node configuration by using numactl to restrict
the execution of the workload on certain nodes.

+-----------------+----------------+-----------------+----------+
|  Configuration  |    Mean JPM    |    Mean JPM     | % Change |
|                 | Rate w/o patch | Rate with patch |          |
+-----------------+---------------------------------------------+
|                 |              User Range 10 - 100            |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1760523     |     4225737     | +140.0%  |
| 4 nodes, HT off |    2020076     |     3206202     |  +58.7%  |
| 2 nodes, HT off |    2391359     |     2654701     |  +11.0%  |
| 1 node , HT off |    2302912     |     2302433     |    0.0%  |
+-----------------+---------------------------------------------+
|                 |              User Range 200 - 1000          |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1078421     |     7380760     | +584.4%  |
| 4 nodes, HT off |    1371040     |     4212007     | +207.2%  |
| 2 nodes, HT off |    2844720     |     2783442     |   -2.2%  |
| 1 node , HT off |    2433443     |     2415590     |   -0.7%  |
+-----------------+---------------------------------------------+
|                 |              User Range 1100 - 2000         |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1055626     |     7118985     | +574.4%  |
| 4 nodes, HT off |    1352329     |     4512914     | +233.7%  |
| 2 nodes, HT off |    2793037     |     2758652     |   -1.2%  |
| 1 node , HT off |    2458125     |     2445069     |   -0.5%  |
+-----------------+----------------+-----------------+----------+

With 4 nodes and above, there are significant performance improvement
with this patch. With only 1 or 2 nodes, the performance is very close.
Because of variability of the AIM7 benchmark, a few percent difference
may not indicate a real performance gain or loss.

A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
79% of the workload's total time were spent in the _raw_spin_lock()
function. Almost all of which can be attributed to the following 2
kernel functions:
 1. dget_parent (49.92%)
 2. dput (49.84%)

The relevant perf report lines are:
+  78.76%      reaim  [kernel.kallsyms]   [k] _raw_spin_lock
+   0.05%      reaim  [kernel.kallsyms]   [k] dput
+   0.01%      reaim  [kernel.kallsyms]   [k] dget_parent

With this patch installed, the new perf report lines are:
+  19.66%      reaim  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
+   2.46%      reaim  [kernel.kallsyms]   [k] _raw_spin_lock
+   2.23%      reaim  [kernel.kallsyms]   [k] lockref_get_not_zero
+   0.50%      reaim  [kernel.kallsyms]   [k] dput
+   0.32%      reaim  [kernel.kallsyms]   [k] lockref_put_or_lock
+   0.30%      reaim  [kernel.kallsyms]   [k] lockref_get
+   0.01%      reaim  [kernel.kallsyms]   [k] dget_parent

-   2.46%      reaim  [kernel.kallsyms]   [k] _raw_spin_lock
   - _raw_spin_lock
      + 23.89% sys_getcwd
      + 23.60% d_path
      + 8.01% prepend_path
      + 5.18% complete_walk
      + 4.21% __rcu_process_callbacks
      + 3.08% inet_twsk_schedule
      + 2.36% do_anonymous_page
      + 2.24% unlazy_walk
      + 2.02% sem_lock
      + 1.82% process_backlog
      + 1.62% selinux_inode_free_security
      + 1.54% task_rq_lock
      + 1.45% unix_dgram_sendmsg
      + 1.18% enqueue_to_backlog
      + 1.06% unix_stream_sendmsg
      + 0.94% tcp_v4_rcv
      + 0.87% unix_create1
      + 0.71% scheduler_tick
      + 0.60% unix_release_sock
      + 0.59% do_wp_page
      + 0.59% unix_stream_recvmsg
      + 0.58% handle_pte_fault
      + 0.57% __do_fault
      + 0.53% unix_peer_get

The dput() and dget_parent() functions didn't show up in the
_raw_spin_lock callers at all.

This impact of this patch on other AIM7 workloads were much more
modest. Besides short, the other AIM7 workload that showed consistent
improvement is the high_systime workload. For the other workloads,
the changes were so minor that they are no significant difference
with and without the patch.

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime |     +0.1%     |     +1.1%      |     +3.4%       |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 fs/dcache.c            |   26 ++++++++++++++++++--------
 include/linux/dcache.h |    7 ++-----
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3adb6aa..9a4cf30 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -513,9 +513,15 @@ void dput(struct dentry *dentry)
 		return;
 
 repeat:
-	if (d_count(dentry) == 1)
-		might_sleep();
-	spin_lock(&dentry->d_lock);
+	if (d_count(dentry) > 1) {
+		if (lockref_put_or_lock(&dentry->d_lockcnt))
+			return;
+		/* dentry's lock taken */
+	} else {
+		if (d_count(dentry) == 1)
+			might_sleep();
+		spin_lock(&dentry->d_lock);
+	}
 	BUG_ON(!d_count(dentry));
 	if (d_count(dentry) > 1) {
 		dentry->d_lockcnt.refcnt--;
@@ -611,26 +617,30 @@ static inline void __dget_dlock(struct dentry *dentry)
 
 static inline void __dget(struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
-	__dget_dlock(dentry);
-	spin_unlock(&dentry->d_lock);
+	lockref_get(&dentry->d_lockcnt);
 }
 
 struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
 
+	rcu_read_lock();
+	ret = rcu_dereference(dentry->d_parent);
+	if (lockref_get_not_zero(&ret->d_lockcnt)) {
+		rcu_read_unlock();
+		return ret;
+	}
 repeat:
 	/*
 	 * Don't need rcu_dereference because we re-check it was correct under
 	 * the lock.
 	 */
-	rcu_read_lock();
-	ret = dentry->d_parent;
+	ret = ACCESS_ONCE(dentry->d_parent);
 	spin_lock(&ret->d_lock);
 	if (unlikely(ret != dentry->d_parent)) {
 		spin_unlock(&ret->d_lock);
 		rcu_read_unlock();
+		rcu_read_lock();
 		goto repeat;
 	}
 	rcu_read_unlock();
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 20e6f2e..ec9206e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -367,11 +367,8 @@ static inline struct dentry *dget_dlock(struct dentry *dentry)
 
 static inline struct dentry *dget(struct dentry *dentry)
 {
-	if (dentry) {
-		spin_lock(&dentry->d_lock);
-		dget_dlock(dentry);
-		spin_unlock(&dentry->d_lock);
-	}
+	if (dentry)
+		lockref_get(&dentry->d_lockcnt);
 	return dentry;
 }
 
-- 
1.7.1

  parent reply	other threads:[~2013-08-06  3:12 UTC|newest]

Thread overview: 144+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06  3:12 [PATCH v7 0/4] Lockless update of reference count protected by spinlock Waiman Long
2013-08-06  3:12 ` [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount Waiman Long
2013-08-29  1:40   ` Linus Torvalds
2013-08-29  4:44     ` Benjamin Herrenschmidt
2013-08-29  7:00       ` Ingo Molnar
2013-08-29 16:43         ` Linus Torvalds
2013-08-29 19:25           ` Linus Torvalds
2013-08-29 23:42             ` Linus Torvalds
2013-08-30  0:26               ` Benjamin Herrenschmidt
2013-08-30  0:49                 ` Linus Torvalds
2013-08-30  2:06                   ` Michael Neuling
2013-08-30  2:30                     ` Benjamin Herrenschmidt
2013-08-30  2:35                       ` Linus Torvalds
2013-08-30  2:45                         ` Benjamin Herrenschmidt
2013-08-30  2:31                     ` Linus Torvalds
2013-08-30  2:43                       ` Benjamin Herrenschmidt
2013-08-30  7:16                   ` Ingo Molnar
2013-08-30 15:28                     ` Linus Torvalds
2013-08-30  3:12               ` Waiman Long
2013-08-30  3:54                 ` Linus Torvalds
2013-08-30  7:55                   ` Sedat Dilek
2013-08-30  8:10                     ` Sedat Dilek
2013-08-30  9:27                     ` Sedat Dilek
2013-08-30  9:48                       ` Ingo Molnar
2013-08-30  9:56                         ` Sedat Dilek
2013-08-30  9:58                           ` Sedat Dilek
2013-08-30 10:29                             ` Sedat Dilek
2013-08-30 10:36                               ` Peter Zijlstra
2013-08-30 10:44                                 ` Sedat Dilek
2013-08-30 10:46                                   ` Sedat Dilek
2013-08-30 10:52                                   ` Peter Zijlstra
2013-08-30 10:57                                     ` Sedat Dilek
2013-08-30 14:05                                       ` Sedat Dilek
2013-08-30 11:19                                 ` Sedat Dilek
2013-08-30 10:38                               ` Sedat Dilek
2013-08-30 15:34                       ` Linus Torvalds
2013-08-30 15:38                         ` Sedat Dilek
2013-08-30 16:12                           ` Steven Rostedt
2013-08-30 16:16                             ` Sedat Dilek
2013-08-30 18:42                             ` Linus Torvalds
2013-08-30 16:32                           ` Linus Torvalds
2013-08-30 16:37                             ` Sedat Dilek
2013-08-30 16:52                               ` Linus Torvalds
2013-08-30 17:11                                 ` Sedat Dilek
2013-08-30 17:26                                   ` Linus Torvalds
2013-09-01 10:01                                 ` Sedat Dilek
2013-09-01 10:33                                   ` Sedat Dilek
2013-09-01 15:32                                   ` Linus Torvalds
2013-09-01 15:45                                     ` Sedat Dilek
2013-09-01 15:55                                       ` Linus Torvalds
2013-09-02 10:30                                         ` Sedat Dilek
2013-09-02 16:09                                           ` David Ahern
2013-09-01 20:59                                     ` Linus Torvalds
2013-09-01 21:23                                       ` Al Viro
2013-09-01 22:16                                         ` Linus Torvalds
2013-09-01 22:35                                           ` Al Viro
2013-09-01 22:44                                             ` Al Viro
2013-09-01 22:58                                               ` Linus Torvalds
2013-09-01 22:48                                           ` Linus Torvalds
2013-09-01 23:30                                             ` Al Viro
2013-09-02  0:12                                               ` Linus Torvalds
2013-09-02  0:50                                                 ` Linus Torvalds
2013-09-02  7:05                                                   ` Ingo Molnar
2013-09-02 16:44                                                     ` Linus Torvalds
2013-09-03 10:15                                                       ` Ingo Molnar
2013-09-03 15:41                                                         ` Linus Torvalds
2013-09-03 18:34                                                           ` Linus Torvalds
2013-09-03 19:19                                                             ` Ingo Molnar
2013-09-03 21:05                                                               ` Linus Torvalds
2013-09-03 21:13                                                                 ` Linus Torvalds
2013-09-03 21:34                                                                   ` Linus Torvalds
2013-09-03 21:39                                                                     ` Linus Torvalds
2013-09-03 14:08                                                       ` Pavel Machek
2013-09-03 22:37                                     ` Sedat Dilek
2013-09-03 22:55                                       ` Dave Jones
2013-09-03 23:05                                         ` Sedat Dilek
2013-09-03 23:15                                           ` Dave Jones
2013-09-03 23:20                                             ` Sedat Dilek
2013-09-03 23:45                                       ` Sedat Dilek
2013-08-30 18:33                   ` Waiman Long
2013-08-30 18:53                     ` Linus Torvalds
2013-08-30 19:20                       ` Waiman Long
2013-08-30 19:33                         ` Linus Torvalds
2013-08-30 20:15                           ` Waiman Long
2013-08-30 20:43                             ` Linus Torvalds
2013-08-30 20:54                               ` Al Viro
2013-08-30 21:03                                 ` Linus Torvalds
2013-08-30 21:44                                   ` Al Viro
2013-08-30 22:30                                     ` Linus Torvalds
2013-08-31 21:23                                       ` Al Viro
2013-08-31 22:49                                         ` Linus Torvalds
2013-08-31 23:27                                           ` Al Viro
2013-09-01  0:13                                             ` Al Viro
2013-09-01 17:48                                               ` Al Viro
2013-09-09  8:30                                               ` Peter Zijlstra
2013-08-30 21:10                                 ` Waiman Long
2013-08-30 21:22                                   ` Linus Torvalds
2013-08-30 21:30                                   ` Al Viro
2013-08-30 21:42                                     ` Waiman Long
2013-08-30 19:40                         ` Al Viro
2013-08-30 19:52                           ` Waiman Long
2013-08-30 20:26                             ` Al Viro
2013-08-30 20:35                               ` Waiman Long
2013-08-30 20:48                                 ` Al Viro
2013-08-31  2:02                                   ` Waiman Long
2013-08-31  2:35                                     ` Al Viro
2013-08-31  2:42                                       ` Al Viro
2013-09-02 19:25                                         ` Waiman Long
2013-09-03  6:01                                           ` Ingo Molnar
2013-09-03  7:24                                             ` Sedat Dilek
2013-09-03 15:38                                               ` Linus Torvalds
2013-09-03 15:14                                             ` Waiman Long
2013-09-03 15:34                                               ` Linus Torvalds
2013-09-03 19:09                                                 ` Linus Torvalds
2013-09-03 21:01                                                   ` Waiman Long
2013-09-04 14:52                                                   ` Waiman Long
2013-09-04 15:14                                                     ` Linus Torvalds
2013-09-04 19:25                                                       ` Waiman Long
2013-09-04 21:34                                                         ` Linus Torvalds
2013-09-05  2:35                                                           ` Waiman Long
2013-09-05 13:31                                                     ` Ingo Molnar
2013-09-05 17:33                                                       ` Waiman Long
2013-09-05 17:40                                                         ` Ingo Molnar
2013-09-03 22:41                                               ` Sedat Dilek
2013-09-03 23:11                                                 ` Sedat Dilek
2013-09-08 21:45               ` Linus Torvalds
2013-09-09  0:03                 ` Al Viro
2013-09-09  0:25                   ` Linus Torvalds
2013-09-09  0:35                     ` Al Viro
2013-09-09  0:38                       ` Linus Torvalds
2013-09-09  0:57                         ` Al Viro
2013-09-09  2:09                     ` Ramkumar Ramachandra
2013-09-09  0:30                   ` Al Viro
2013-09-09  3:32                   ` Linus Torvalds
2013-09-09  4:06                     ` Ramkumar Ramachandra
2013-09-09  5:44                     ` Al Viro
2013-08-30 17:17           ` Peter Zijlstra
2013-08-30 17:28             ` Linus Torvalds
2013-08-30 17:33               ` Linus Torvalds
2013-08-29 15:20     ` Waiman Long
2013-08-06  3:12 ` [PATCH v7 2/4] spinlock: Enable x86 architecture to do lockless refcount update Waiman Long
2013-08-06  3:12 ` [PATCH v7 3/4] dcache: replace d_lock/d_count by d_lockcnt Waiman Long
2013-08-06  3:12 ` Waiman Long [this message]
2013-08-13 18:03 ` [PATCH v7 0/4] Lockless update of reference count protected by spinlock Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1375758759-29629-5-git-send-email-Waiman.Long@hp.com \
    --to=waiman.long@hp.com \
    --cc=andi@firstfloor.org \
    --cc=aswin@hp.com \
    --cc=benh@kernel.crashing.org \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mszeredi@suse.cz \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=scott.norton@hp.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).