linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/6]  rwsem: performance optimizations
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-25  3:32   ` Davidlohr Bueso
  2013-09-24 22:22 ` [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock Tim Chen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

We have incorporated various suggestions from Ingo for version 5 of this
patchset and would like to have it merged if there are no objections.

In this patchset, we introduce two categories of optimizations to the
read-write semaphore.  The first four patches, from Alex Shi, reduce
cache-line bouncing of the sem->count field by pre-reading sem->count
and avoiding the cmpxchg when possible.
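
The pattern, in sketch form (illustrative only, mirroring what patch 1
does to __down_write_trylock), is to read the count before attempting
the atomic read-modify-write; a plain read keeps the cache line in
shared state, while a cmpxchg pulls it exclusive even when it fails:

	/* pre-read: if the lock looks taken, skip the costly cmpxchg */
	if (sem->count != RWSEM_UNLOCKED_VALUE)
		return 0;
	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
		       RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;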

The last two patches introduce optimistic spinning logic, similar to
that in the mutex code, for the writer lock acquisition of rwsem.
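
In sketch form, a writer that fails the fastpath spins as long as the
current lock owner is itself running on a CPU, trying to take the lock
without going to sleep (illustrative pseudo-code; owner_on_cpu() is a
made-up helper here, the real implementation is in patch 6):

	while (owner && owner_on_cpu(sem, owner)) {
		if (rwsem_try_write_lock_unqueued(sem))
			return 1;	/* lock taken without sleeping */
		arch_mutex_cpu_relax();
	}
	return 0;	/* owner slept or changed: fall back to wait queue */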

Without these optimizations, Davidlohr Bueso saw an 8% regression in
aim7's shared and high_systime workloads when he switched i_mmap_mutex to
an rwsem.  Tests were on an 8-socket, 80-core system.  With the patchset,
he instead gets significant improvements across the aim7 suite:

alltests (+16.3%), custom (+20%), disk (+19.5%), high_systime (+7%),
shared (+18.4%) and short (+6.3%).

Tim Chen also got a +5% improvement in the exim mail server workload on a
40-core system.

Thanks to Ingo Molnar, Peter Hurley and Peter Zijlstra for reviewing this patchset.

Regards,
Tim Chen

Changelog:

v5:
1. Try optimistic spinning before putting the writer on the wait queue,
to avoid bottlenecking at the wait queue.  This provides a 5% boost to
the exim workload and a 2% to 8% boost to aim7.
2. Put the MCS locking code into its own mcslock.h file for better reuse
between mutex.c and rwsem.c.
3. Remove the RWSEM_SPIN_ON_WRITE_OWNER config option and make the
optimistic spinning behavior the default, per Ingo's suggestion.

v4:
1. Fixed a bug in the task_struct definition in rwsem_can_spin_on_owner.
2. Fixed another typo in the RWSEM_SPIN_ON_WRITE_OWNER config option.

v3:
1. Added ACCESS_ONCE to sem->count access in rwsem_can_spin_on_owner.
2. Fixed a typo in the RWSEM_SPIN_ON_WRITE_OWNER option in init/Kconfig.

v2:
1. Reorganized the changes to down_write_trylock and do_wake into 4 patches
   and fixed a bug referencing &sem->count where sem->count was intended.
2. Fixed an unsafe sem->owner dereference in rwsem_can_spin_on_owner.
3. Defaulted the RWSEM_SPIN_ON_WRITE_OWNER option to on for more seasoning;
   it can be turned off should it prove detrimental.
4. Various patch comment updates.



Alex Shi (4):
  rwsem: check the lock before cmpxchg in down_write_trylock
  rwsem: remove 'out' label in do_wake
  rwsem: remove try_reader_grant label in do_wake
  rwsem/wake: check lock before doing atomic update

Tim Chen (2):
  MCS Lock: Restructure the MCS lock defines and locking code into its
    own file
  rwsem: do optimistic spinning for writer lock acquisition

 include/asm-generic/rwsem.h |    8 +-
 include/linux/rwsem.h       |    8 +-
 kernel/mutex.c              |   58 ++----------
 kernel/rwsem.c              |   19 ++++-
 lib/rwsem.c                 |  218 ++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 228 insertions(+), 83 deletions(-)

-- 
1.7.4.4



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
  2013-09-24 22:22 ` [PATCH v5 0/6] rwsem: performance optimizations Tim Chen
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-24 23:22   ` Jason Low
  2013-09-24 22:22 ` [PATCH v5 2/6] rwsem: remove 'out' label in do_wake Tim Chen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

Cmpxchg causes cache-line bouncing while doing the value check, which
causes scalability issues on a large machine (like an 80-core box).

So a pre-read of the lock can relieve this contention.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 include/asm-generic/rwsem.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index bb1e2cd..5ba80e7 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -70,11 +70,11 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
+	if (unlikely(sem->count != RWSEM_UNLOCKED_VALUE))
+		return 0;
 
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
+	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
+		      RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
 }
 
 /*
-- 
1.7.4.4




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 2/6] rwsem: remove 'out' label in do_wake
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
  2013-09-24 22:22 ` [PATCH v5 0/6] rwsem: performance optimizations Tim Chen
  2013-09-24 22:22 ` [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock Tim Chen
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-24 22:22 ` [PATCH v5 3/6] rwsem: remove try_reader_grant label in do_wake Tim Chen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

This makes the code simpler and more readable.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 lib/rwsem.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			 * will block as they will notice the queued writer.
 			 */
 			wake_up_process(waiter->task);
-		goto out;
+		return sem;
 	}
 
 	/* Writers might steal the lock before we grant it to the next reader.
@@ -91,7 +91,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			/* A writer stole the lock. Undo our reader grant. */
 			if (rwsem_atomic_update(-adjustment, sem) &
 						RWSEM_ACTIVE_MASK)
-				goto out;
+				return sem;
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
 		}
@@ -136,7 +136,6 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	sem->wait_list.next = next;
 	next->prev = &sem->wait_list;
 
- out:
 	return sem;
 }
 
-- 
1.7.4.4




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 3/6] rwsem: remove try_reader_grant label in do_wake
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
                   ` (2 preceding siblings ...)
  2013-09-24 22:22 ` [PATCH v5 2/6] rwsem: remove 'out' label in do_wake Tim Chen
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-24 22:22 ` [PATCH v5 4/6] rwsem/wake: check lock before doing atomic update Tim Chen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

This makes the code simpler and more readable.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 lib/rwsem.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	adjustment = 0;
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
 		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/* A writer stole the lock. Undo our reader grant. */
+		while (1) {
+			oldcount = rwsem_atomic_update(adjustment, sem)
+								- adjustment;
+			if (likely(oldcount >= RWSEM_WAITING_BIAS))
+				break;
+
+			 /* A writer stole the lock.  Undo our reader grant. */
 			if (rwsem_atomic_update(-adjustment, sem) &
 						RWSEM_ACTIVE_MASK)
 				return sem;
 			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
 		}
 	}
 
-- 
1.7.4.4




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 4/6] rwsem/wake: check lock before doing atomic update
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
                   ` (3 preceding siblings ...)
  2013-09-24 22:22 ` [PATCH v5 3/6] rwsem: remove try_reader_grant label in do_wake Tim Chen
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-24 22:22 ` [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
  2013-09-24 22:22 ` [PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition Tim Chen
  6 siblings, 0 replies; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

Atomically updating the lock and rolling it back causes cache bouncing on
a large machine.  A pre-read of the lock status can relieve this problem.

Suggested-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 lib/rwsem.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index a8055cf..1d6e6e8 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -64,7 +64,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
 	struct list_head *next;
-	long oldcount, woken, loop, adjustment;
+	long woken, loop, adjustment;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
@@ -86,6 +86,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
 		adjustment = RWSEM_ACTIVE_READ_BIAS;
 		while (1) {
+			long oldcount;
+
+			/* A writer stole the lock. */
+			if (unlikely(sem->count < RWSEM_WAITING_BIAS))
+				return sem;
+
 			oldcount = rwsem_atomic_update(adjustment, sem)
 								- adjustment;
 			if (likely(oldcount >= RWSEM_WAITING_BIAS))
-- 
1.7.4.4




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
                   ` (4 preceding siblings ...)
  2013-09-24 22:22 ` [PATCH v5 4/6] rwsem/wake: check lock before doing atomic update Tim Chen
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-25  5:55   ` Peter Zijlstra
  2013-09-24 22:22 ` [PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition Tim Chen
  6 siblings, 1 reply; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

We will need the MCS lock code for doing optimistic spinning for rwsem.
Extracting the MCS code from mutex.c and putting it into its own file
allows us to reuse this code easily for rwsem.
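
(The new include/linux/mcslock.h itself is not shown in the diff below;
a minimal sketch of what it would contain, assuming it is a straight
move of the mspin code being removed from mutex.c with an mcs_spin_
prefix, would be:)

#ifndef __LINUX_MCSLOCK_H
#define __LINUX_MCSLOCK_H

struct mcs_spin_node {
	struct mcs_spin_node *next;
	int		      locked;	/* 1 if lock acquired */
};

/*
 * We don't inline mcs_spin_lock() so that perf can correctly account for
 * the time spent in this lock function.
 */
static noinline
void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
{
	struct mcs_spin_node *prev;

	/* Init node */
	node->locked = 0;
	node->next   = NULL;

	prev = xchg(lock, node);
	if (likely(prev == NULL)) {
		/* Lock acquired */
		node->locked = 1;
		return;
	}
	/* Queue behind the previous waiter and spin on our own flag */
	ACCESS_ONCE(prev->next) = node;
	smp_wmb();
	/* Wait until the lock holder passes the lock down */
	while (!ACCESS_ONCE(node->locked))
		arch_mutex_cpu_relax();
}

static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
{
	struct mcs_spin_node *next = ACCESS_ONCE(node->next);

	if (likely(!next)) {
		/* Release the lock by setting it to NULL */
		if (cmpxchg(lock, node, NULL) == node)
			return;
		/* Wait until the next pointer is set */
		while (!(next = ACCESS_ONCE(node->next)))
			arch_mutex_cpu_relax();
	}
	/* Pass the lock to the next waiter in the queue */
	ACCESS_ONCE(next->locked) = 1;
	smp_wmb();
}

#endif /* __LINUX_MCSLOCK_H */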

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 kernel/mutex.c |   58 ++++++-------------------------------------------------
 1 files changed, 7 insertions(+), 51 deletions(-)

diff --git a/kernel/mutex.c b/kernel/mutex.c
index 6d647ae..1b6ba3f 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -25,6 +25,7 @@
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/debug_locks.h>
+#include <linux/mcslock.h>
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
@@ -111,54 +112,9 @@ EXPORT_SYMBOL(mutex_lock);
  * more or less simultaneously, the spinners need to acquire a MCS lock
  * first before spinning on the owner field.
  *
- * We don't inline mspin_lock() so that perf can correctly account for the
- * time spent in this lock function.
  */
-struct mspin_node {
-	struct mspin_node *next ;
-	int		  locked;	/* 1 if lock acquired */
-};
-#define	MLOCK(mutex)	((struct mspin_node **)&((mutex)->spin_mlock))
 
-static noinline
-void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
-{
-	struct mspin_node *prev;
-
-	/* Init node */
-	node->locked = 0;
-	node->next   = NULL;
-
-	prev = xchg(lock, node);
-	if (likely(prev == NULL)) {
-		/* Lock acquired */
-		node->locked = 1;
-		return;
-	}
-	ACCESS_ONCE(prev->next) = node;
-	smp_wmb();
-	/* Wait until the lock holder passes the lock down */
-	while (!ACCESS_ONCE(node->locked))
-		arch_mutex_cpu_relax();
-}
-
-static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
-{
-	struct mspin_node *next = ACCESS_ONCE(node->next);
-
-	if (likely(!next)) {
-		/*
-		 * Release the lock by setting it to NULL
-		 */
-		if (cmpxchg(lock, node, NULL) == node)
-			return;
-		/* Wait until the next pointer is set */
-		while (!(next = ACCESS_ONCE(node->next)))
-			arch_mutex_cpu_relax();
-	}
-	ACCESS_ONCE(next->locked) = 1;
-	smp_wmb();
-}
+#define	MLOCK(mutex)	((struct mcs_spin_node **)&((mutex)->spin_mlock))
 
 /*
  * Mutex spinning code migrated from kernel/sched/core.c
@@ -448,7 +404,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 
 	for (;;) {
 		struct task_struct *owner;
-		struct mspin_node  node;
+		struct mcs_spin_node  node;
 
 		if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
@@ -470,10 +426,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		mspin_lock(MLOCK(lock), &node);
+		mcs_spin_lock(MLOCK(lock), &node);
 		owner = ACCESS_ONCE(lock->owner);
 		if (owner && !mutex_spin_on_owner(lock, owner)) {
-			mspin_unlock(MLOCK(lock), &node);
+			mcs_spin_unlock(MLOCK(lock), &node);
 			goto slowpath;
 		}
 
@@ -488,11 +444,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			}
 
 			mutex_set_owner(lock);
-			mspin_unlock(MLOCK(lock), &node);
+			mcs_spin_unlock(MLOCK(lock), &node);
 			preempt_enable();
 			return 0;
 		}
-		mspin_unlock(MLOCK(lock), &node);
+		mcs_spin_unlock(MLOCK(lock), &node);
 
 		/*
 		 * When there's no owner, we might have preempted between the
-- 
1.7.4.4




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition
       [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
                   ` (5 preceding siblings ...)
  2013-09-24 22:22 ` [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
@ 2013-09-24 22:22 ` Tim Chen
  2013-09-25  6:52   ` Ingo Molnar
  6 siblings, 1 reply; 13+ messages in thread
From: Tim Chen @ 2013-09-24 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

We want to add optimistic spinning to rwsems because the writer side of
rwsem does not perform as well as mutexes.  Tim noticed this for exim
(mail server) workloads when reverting commit 4fc3f1d6, and Davidlohr
noticed it when converting the i_mmap_mutex to an rwsem in some aim7
workloads.  The biggest difference is that when we fail to acquire the
lock in the fastpath, optimistic spinning comes into play and we can
avoid a large amount of unnecessary sleeping and the overhead of moving
tasks in and out of the wait queue.

Allowing optimistic spinning before putting the writer on the wait queue
reduces wait queue contention and provides a greater chance for the rwsem
to be acquired.  With these changes, rwsem is on par with mutex.

Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 include/linux/rwsem.h |    8 ++-
 kernel/rwsem.c        |   19 +++++-
 lib/rwsem.c           |  193 +++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 0616ffe..4dfc6ae 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -29,6 +29,8 @@ struct rw_semaphore {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 #endif
+	struct task_struct	*owner;
+	void			*spin_mlock;
 };
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
@@ -58,8 +60,10 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 #define __RWSEM_INITIALIZER(name)			\
 	{ RWSEM_UNLOCKED_VALUE,				\
 	  __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock),	\
-	  LIST_HEAD_INIT((name).wait_list)		\
-	  __RWSEM_DEP_MAP_INIT(name) }
+	  LIST_HEAD_INIT((name).wait_list),		\
+	  __RWSEM_DEP_MAP_INIT(name)			\
+	  NULL,						\
+	  NULL }
 
 #define DECLARE_RWSEM(name) \
 	struct rw_semaphore name = __RWSEM_INITIALIZER(name)
diff --git a/kernel/rwsem.c b/kernel/rwsem.c
index cfff143..d74d1c9 100644
--- a/kernel/rwsem.c
+++ b/kernel/rwsem.c
@@ -12,6 +12,16 @@
 
 #include <linux/atomic.h>
 
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	sem->owner = current;
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	sem->owner = NULL;
+}
+
 /*
  * lock for reading
  */
@@ -48,6 +58,7 @@ void __sched down_write(struct rw_semaphore *sem)
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write);
@@ -59,8 +70,10 @@ int down_write_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_write_trylock(sem);
 
-	if (ret == 1)
+	if (ret == 1) {
 		rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_);
+		rwsem_set_owner(sem);
+	}
 	return ret;
 }
 
@@ -86,6 +99,7 @@ void up_write(struct rw_semaphore *sem)
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
 
 	__up_write(sem);
+	rwsem_clear_owner(sem);
 }
 
 EXPORT_SYMBOL(up_write);
@@ -100,6 +114,7 @@ void downgrade_write(struct rw_semaphore *sem)
 	 * dependency.
 	 */
 	__downgrade_write(sem);
+	rwsem_clear_owner(sem);
 }
 
 EXPORT_SYMBOL(downgrade_write);
@@ -122,6 +137,7 @@ void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 	rwsem_acquire_nest(&sem->dep_map, 0, 0, nest, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(_down_write_nest_lock);
@@ -141,6 +157,7 @@ void down_write_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write_nested);
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 1d6e6e8..1752ec9 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -10,6 +10,8 @@
 #include <linux/sched.h>
 #include <linux/init.h>
 #include <linux/export.h>
+#include <linux/sched/rt.h>
+#include <linux/mcslock.h>
 
 /*
  * Initialize an rwsem:
@@ -27,6 +29,8 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	sem->count = RWSEM_UNLOCKED_VALUE;
 	raw_spin_lock_init(&sem->wait_lock);
 	INIT_LIST_HEAD(&sem->wait_list);
+	sem->owner = NULL;
+	sem->spin_mlock = NULL;
 }
 
 EXPORT_SYMBOL(__init_rwsem);
@@ -194,14 +198,177 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 	return sem;
 }
 
+static inline int rwsem_try_write_lock(long count, struct rw_semaphore *sem)
+{
+	if (!(count & RWSEM_ACTIVE_MASK)) {
+		/* Try acquiring the write lock. */
+		if (sem->count == RWSEM_WAITING_BIAS &&
+		    cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
+			    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
+			if (!list_is_singular(&sem->wait_list))
+				rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static inline int rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
+{
+	long count;
+
+	count = ACCESS_ONCE(sem->count);
+retry:
+	if (count == RWSEM_WAITING_BIAS) {
+		count = cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
+			    RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS);
+		/* allow write lock stealing, try acquiring the write lock. */
+		if (count == RWSEM_WAITING_BIAS)
+			goto acquired;
+		else if (count == 0)
+			goto retry;
+	} else if (count == 0) {
+		count = cmpxchg(&sem->count, 0,
+			    RWSEM_ACTIVE_WRITE_BIAS);
+		if (count == 0)
+			goto acquired;
+		else if (count == RWSEM_WAITING_BIAS)
+			goto retry;
+	}
+	return 0;
+
+acquired:
+	return 1;
+}
+
+
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+	int retval;
+	struct task_struct *owner;
+
+	rcu_read_lock();
+	owner = ACCESS_ONCE(sem->owner);
+
+	/* Spin only if active writer running */
+	if (owner)
+		retval = owner->on_cpu;
+	else
+		retval = false;
+
+	rcu_read_unlock();
+	/*
+	 * if lock->owner is not set, the sem owner may have just acquired
+	 * it and not set the owner yet, or the sem has been released, or
+	 * reader active.
+	 */
+	return retval;
+}
+
+static inline bool owner_running(struct rw_semaphore *lock,
+				struct task_struct *owner)
+{
+	if (lock->owner != owner)
+		return false;
+
+	/*
+	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
+	 * lock->owner still matches owner, if that fails, owner might
+	 * point to free()d memory, if it still matches, the rcu_read_lock()
+	 * ensures the memory stays valid.
+	 */
+	barrier();
+
+	return owner->on_cpu;
+}
+
+static noinline
+int rwsem_spin_on_owner(struct rw_semaphore *lock, struct task_struct *owner)
+{
+	rcu_read_lock();
+	while (owner_running(lock, owner)) {
+		if (need_resched())
+			break;
+
+		arch_mutex_cpu_relax();
+	}
+	rcu_read_unlock();
+
+	/*
+	 * We break out the loop above on need_resched() or when the
+	 * owner changed, which is a sign for heavy contention. Return
+	 * success only when lock->owner is NULL.
+	 */
+	return lock->owner == NULL;
+}
+
+#define MLOCK(rwsem)    ((struct mcs_spin_node **)&((rwsem)->spin_mlock))
+
+int rwsem_optimistic_spin(struct rw_semaphore *sem)
+{
+	struct	task_struct	*owner;
+	int	ret = 0;
+
+	/* sem->wait_lock should not be held when doing optimistic spinning */
+	if (!rwsem_can_spin_on_owner(sem))
+		return ret;
+
+	preempt_disable();
+	for (;;) {
+		struct mcs_spin_node node;
+
+		mcs_spin_lock(MLOCK(sem), &node);
+		owner = ACCESS_ONCE(sem->owner);
+		if (owner && !rwsem_spin_on_owner(sem, owner)) {
+			mcs_spin_unlock(MLOCK(sem), &node);
+			break;
+		}
+
+		/* wait_lock will be acquired if write_lock is obtained */
+		if (rwsem_try_write_lock_unqueued(sem)) {
+			mcs_spin_unlock(MLOCK(sem), &node);
+			ret = 1;
+			break;
+		}
+		mcs_spin_unlock(MLOCK(sem), &node);
+
+		/*
+		 * When there's no owner, we might have preempted between the
+		 * owner acquiring the lock and setting the owner field. If
+		 * we're an RT task that will live-lock because we won't let
+		 * the owner complete.
+		 */
+		if (!owner && (need_resched() || rt_task(current)))
+			break;
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		arch_mutex_cpu_relax();
+
+	}
+
+	preempt_enable();
+	return ret;
+}
+
 /*
  * wait until we successfully acquire the write lock
  */
 struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 {
-	long count, adjustment = -RWSEM_ACTIVE_WRITE_BIAS;
+	long count;
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk = current;
+	bool waiting = true;
+
+	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
+	/* do optimistic spinning */
+	if (rwsem_optimistic_spin(sem))
+		goto done;
 
 	/* set up my own style of waitqueue */
 	waiter.task = tsk;
@@ -209,33 +376,26 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
+		waiting = false;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	count = rwsem_atomic_update(adjustment, sem);
+	if (waiting)
+		count = ACCESS_ONCE(sem->count);
+	else
+		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* If there were already threads queued before us and there are no
 	 * active writers, the lock must be read owned; so we try to wake
 	 * any read locks that were queued ahead of us. */
-	if (count > RWSEM_WAITING_BIAS &&
-	    adjustment == -RWSEM_ACTIVE_WRITE_BIAS)
+	if ((count > RWSEM_WAITING_BIAS) && waiting)
 		sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
 
 	/* wait until we successfully acquire the lock */
 	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 	while (true) {
-		if (!(count & RWSEM_ACTIVE_MASK)) {
-			/* Try acquiring the write lock. */
-			count = RWSEM_ACTIVE_WRITE_BIAS;
-			if (!list_is_singular(&sem->wait_list))
-				count += RWSEM_WAITING_BIAS;
-
-			if (sem->count == RWSEM_WAITING_BIAS &&
-			    cmpxchg(&sem->count, RWSEM_WAITING_BIAS, count) ==
-							RWSEM_WAITING_BIAS)
-				break;
-		}
+		if (rwsem_try_write_lock(count, sem))
+			break;
 
 		raw_spin_unlock_irq(&sem->wait_lock);
 
@@ -250,6 +410,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
+done:
 	tsk->state = TASK_RUNNING;
 
 	return sem;
-- 
1.7.4.4



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock
  2013-09-24 22:22 ` [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock Tim Chen
@ 2013-09-24 23:22   ` Jason Low
  2013-09-24 23:30     ` Tim Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Low @ 2013-09-24 23:22 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

Should we do something similar with __down_read_trylock, such as
the following?


Signed-off-by: Jason Low <jason.low2@hp.com>
---
 include/asm-generic/rwsem.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index bb1e2cd..47990dc 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -42,6 +42,9 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
        long tmp;

        while ((tmp = sem->count) >= 0) {
+               if (sem->count != tmp)
+                       continue;
+
                if (tmp == cmpxchg(&sem->count, tmp,
                                   tmp + RWSEM_ACTIVE_READ_BIAS)) {
                        return 1;
-- 
1.7.1

On Tue, Sep 24, 2013 at 3:22 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> Cmpxchg causes cache-line bouncing while doing the value check, which
> causes scalability issues on a large machine (like an 80-core box).
>
> So a pre-read of the lock can relieve this contention.
>
> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  include/asm-generic/rwsem.h |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
> index bb1e2cd..5ba80e7 100644
> --- a/include/asm-generic/rwsem.h
> +++ b/include/asm-generic/rwsem.h
> @@ -70,11 +70,11 @@ static inline void __down_write(struct rw_semaphore *sem)
>
>  static inline int __down_write_trylock(struct rw_semaphore *sem)
>  {
> -       long tmp;
> +       if (unlikely(sem->count != RWSEM_UNLOCKED_VALUE))
> +               return 0;
>
> -       tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
> -                     RWSEM_ACTIVE_WRITE_BIAS);
> -       return tmp == RWSEM_UNLOCKED_VALUE;
> +       return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
> +                     RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
>  }
>
>  /*
> --
> 1.7.4.4
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock
  2013-09-24 23:22   ` Jason Low
@ 2013-09-24 23:30     ` Tim Chen
  0 siblings, 0 replies; 13+ messages in thread
From: Tim Chen @ 2013-09-24 23:30 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Tue, 2013-09-24 at 16:22 -0700, Jason Low wrote:
> Should we do something similar with __down_read_trylock, such as
> the following?
> 
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> ---
>  include/asm-generic/rwsem.h |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
> index bb1e2cd..47990dc 100644
> --- a/include/asm-generic/rwsem.h
> +++ b/include/asm-generic/rwsem.h
> @@ -42,6 +42,9 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
>         long tmp;
> 
>         while ((tmp = sem->count) >= 0) {
> +               if (sem->count != tmp)
> +                       continue;
> +

Considering that tmp has just been assigned the value of sem->count, the
added check will rarely fail and is probably not needed.  We should
proceed directly to the cmpxchg below.

>                 if (tmp == cmpxchg(&sem->count, tmp,
>                                    tmp + RWSEM_ACTIVE_READ_BIAS)) {
>                         return 1;

Tim


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 0/6]  rwsem: performance optimizations
  2013-09-24 22:22 ` [PATCH v5 0/6] rwsem: performance optimizations Tim Chen
@ 2013-09-25  3:32   ` Davidlohr Bueso
  0 siblings, 0 replies; 13+ messages in thread
From: Davidlohr Bueso @ 2013-09-25  3:32 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Tue, 2013-09-24 at 15:22 -0700, Tim Chen wrote:
> We have incorporated various suggestions from Ingo for version 5 of this
> patchset and would like to have it merged if there are no objections.
> 
> In this patchset, we introduce two categories of optimizations to the
> read-write semaphore.  The first four patches, from Alex Shi, reduce
> cache-line bouncing of the sem->count field by pre-reading sem->count
> and avoiding the cmpxchg when possible.
> 
> The last two patches introduce optimistic spinning logic, similar to
> that in the mutex code, for the writer lock acquisition of rwsem.

Right. We address the general 'mutexes outperform writer-rwsems'
situation that has been seen in more than one case. Users now need not
worry about performance issues when choosing between these two locking
mechanisms.

Thanks,
Davidlohr



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-24 22:22 ` [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
@ 2013-09-25  5:55   ` Peter Zijlstra
  2013-09-25 15:58     ` Tim Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2013-09-25  5:55 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Tue, Sep 24, 2013 at 03:22:46PM -0700, Tim Chen wrote:
> We will need the MCS lock code for doing optimistic spinning for rwsem.
> Extracting the MCS code from mutex.c and putting it into its own file
> allows us to reuse this code easily for rwsem.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  kernel/mutex.c |   58 ++++++-------------------------------------------------
>  1 files changed, 7 insertions(+), 51 deletions(-)

Wasn't this patch supposed to add include/linux/mcslock.h ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition
  2013-09-24 22:22 ` [PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition Tim Chen
@ 2013-09-25  6:52   ` Ingo Molnar
  0 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2013-09-25  6:52 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm, Linus Torvalds


just a few style nitpicks:

>  
> +static inline int rwsem_try_write_lock(long count, struct rw_semaphore *sem)
> +{
> +	if (!(count & RWSEM_ACTIVE_MASK)) {
> +		/* Try acquiring the write lock. */
> +		if (sem->count == RWSEM_WAITING_BIAS &&
> +		    cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> +			    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
> +			if (!list_is_singular(&sem->wait_list))
> +				rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +static inline int rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> +{
> +	long count;
> +
> +	count = ACCESS_ONCE(sem->count);
> +retry:
> +	if (count == RWSEM_WAITING_BIAS) {
> +		count = cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> +			    RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS);
> +		/* allow write lock stealing, try acquiring the write lock. */
> +		if (count == RWSEM_WAITING_BIAS)
> +			goto acquired;
> +		else if (count == 0)
> +			goto retry;
> +	} else if (count == 0) {
> +		count = cmpxchg(&sem->count, 0,
> +			    RWSEM_ACTIVE_WRITE_BIAS);

So, you factored this out from within a deeply nested piece of code - but 
the ugly line-break still remained - that can now be put into a single 
line.
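
I.e., presumably just:

		count = cmpxchg(&sem->count, 0, RWSEM_ACTIVE_WRITE_BIAS);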

> +		if (count == 0)
> +			goto acquired;
> +		else if (count == RWSEM_WAITING_BIAS)
> +			goto retry;
> +	}
> +	return 0;
> +
> +acquired:
> +	return 1;
> +}
> +
> +

Unnecessary newline.

> +static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> +{
> +	int retval;
> +	struct task_struct *owner;
> +
> +	rcu_read_lock();
> +	owner = ACCESS_ONCE(sem->owner);
> +
> +	/* Spin only if active writer running */
> +	if (owner)
> +		retval = owner->on_cpu;
> +	else
> +		retval = false;
> +
> +	rcu_read_unlock();
> +	/*
> +	 * if lock->owner is not set, the sem owner may have just acquired
> +	 * it and not set the owner yet, or the sem has been released, or
> +	 * reader active.
> +	 */
> +	return retval;
> +}
> +
> +static inline bool owner_running(struct rw_semaphore *lock,
> +				struct task_struct *owner)
> +{
> +	if (lock->owner != owner)
> +		return false;
> +
> +	/*
> +	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
> +	 * lock->owner still matches owner, if that fails, owner might
> +	 * point to free()d memory, if it still matches, the rcu_read_lock()
> +	 * ensures the memory stays valid.
> +	 */
> +	barrier();
> +
> +	return owner->on_cpu;
> +}
> +
> +static noinline
> +int rwsem_spin_on_owner(struct rw_semaphore *lock, struct task_struct *owner)
> +{
> +	rcu_read_lock();
> +	while (owner_running(lock, owner)) {
> +		if (need_resched())
> +			break;
> +
> +		arch_mutex_cpu_relax();
> +	}
> +	rcu_read_unlock();
> +
> +	/*
> +	 * We break out the loop above on need_resched() or when the
> +	 * owner changed, which is a sign for heavy contention. Return
> +	 * success only when lock->owner is NULL.
> +	 */
> +	return lock->owner == NULL;
> +}
> +
> +#define MLOCK(rwsem)    ((struct mcs_spin_node **)&((rwsem)->spin_mlock))
> +
> +int rwsem_optimistic_spin(struct rw_semaphore *sem)
> +{
> +	struct	task_struct	*owner;
> +	int	ret = 0;

Those tabs look weird.
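
I.e., presumably just:

	struct task_struct *owner;
	int ret = 0;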

> +
> +	/* sem->wait_lock should not be held when doing optimistic spinning */
> +	if (!rwsem_can_spin_on_owner(sem))
> +		return ret;
> +
> +	preempt_disable();
> +	for (;;) {
> +		struct mcs_spin_node node;
> +
> +		mcs_spin_lock(MLOCK(sem), &node);
> +		owner = ACCESS_ONCE(sem->owner);
> +		if (owner && !rwsem_spin_on_owner(sem, owner)) {
> +			mcs_spin_unlock(MLOCK(sem), &node);
> +			break;
> +		}
> +
> +		/* wait_lock will be acquired if write_lock is obtained */
> +		if (rwsem_try_write_lock_unqueued(sem)) {
> +			mcs_spin_unlock(MLOCK(sem), &node);
> +			ret = 1;
> +			break;
> +		}
> +		mcs_spin_unlock(MLOCK(sem), &node);
> +
> +		/*
> +		 * When there's no owner, we might have preempted between the
> +		 * owner acquiring the lock and setting the owner field. If
> +		 * we're an RT task that will live-lock because we won't let
> +		 * the owner complete.
> +		 */
> +		if (!owner && (need_resched() || rt_task(current)))
> +			break;
> +
> +		/*
> +		 * The cpu_relax() call is a compiler barrier which forces
> +		 * everything in this loop to be re-loaded. We don't need
> +		 * memory barriers as we'll eventually observe the right
> +		 * values at the cost of a few extra spins.
> +		 */
> +		arch_mutex_cpu_relax();
> +

Unnecessary newline.

> +	}
> +
> +	preempt_enable();
> +	return ret;
> +}

Please move the preempt_enable() one line higher, with the extra newline 
after it - it pairs with the loop, not with the return statement.
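
I.e.:

	}
	preempt_enable();

	return ret;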

> +
>  /*
>   * wait until we successfully acquire the write lock
>   */
>  struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>  {
> -	long count, adjustment = -RWSEM_ACTIVE_WRITE_BIAS;
> +	long count;
>  	struct rwsem_waiter waiter;
>  	struct task_struct *tsk = current;
> +	bool waiting = true;
> +
> +	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
> +	/* do optimistic spinning */
> +	if (rwsem_optimistic_spin(sem))
> +		goto done;
>  
>  	/* set up my own style of waitqueue */
>  	waiter.task = tsk;
> @@ -209,33 +376,26 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>  
>  	raw_spin_lock_irq(&sem->wait_lock);
>  	if (list_empty(&sem->wait_list))
> -		adjustment += RWSEM_WAITING_BIAS;
> +		waiting = false;
>  	list_add_tail(&waiter.list, &sem->wait_list);
>  
>  	/* we're now waiting on the lock, but no longer actively locking */
> -	count = rwsem_atomic_update(adjustment, sem);
> +	if (waiting)
> +		count = ACCESS_ONCE(sem->count);
> +	else
> +		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
>  
>  	/* If there were already threads queued before us and there are no
>  	 * active writers, the lock must be read owned; so we try to wake
>  	 * any read locks that were queued ahead of us. */

Please convert this comment to the usual style while touching the code so 
heavily.
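
I.e., the usual:

	/*
	 * If there were already threads queued before us and there are no
	 * active writers, the lock must be read owned; so we try to wake
	 * any read locks that were queued ahead of us.
	 */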

> -	if (count > RWSEM_WAITING_BIAS &&
> -	    adjustment == -RWSEM_ACTIVE_WRITE_BIAS)
> +	if ((count > RWSEM_WAITING_BIAS) && waiting)
>  		sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
>  
>  	/* wait until we successfully acquire the lock */
>  	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>  	while (true) {

the pattern we typically use in that case is:

	for (;;) {

Like you did in rwsem_optimistic_spin().

There's a lot of style leeway in the kernel, but we try to 'harmonize' 
such patterns as much as possible within the same subsystem, especially 
when changing the code radically, which will probably necessitate future
detective work on this code.

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-25  5:55   ` Peter Zijlstra
@ 2013-09-25 15:58     ` Tim Chen
  0 siblings, 0 replies; 13+ messages in thread
From: Tim Chen @ 2013-09-25 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Wed, 2013-09-25 at 07:55 +0200, Peter Zijlstra wrote:
> On Tue, Sep 24, 2013 at 03:22:46PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and putting it into its own file
> > allows us to reuse this code easily for rwsem.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > ---
> >  kernel/mutex.c |   58 ++++++-------------------------------------------------
> >  1 files changed, 7 insertions(+), 51 deletions(-)
> 
> Wasn't this patch supposed to add include/linux/mcslock.h ?

Thanks for catching it.  I will correct it.

Tim


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-09-25 15:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1380057198.git.tim.c.chen@linux.intel.com>
2013-09-24 22:22 ` [PATCH v5 0/6] rwsem: performance optimizations Tim Chen
2013-09-25  3:32   ` Davidlohr Bueso
2013-09-24 22:22 ` [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock Tim Chen
2013-09-24 23:22   ` Jason Low
2013-09-24 23:30     ` Tim Chen
2013-09-24 22:22 ` [PATCH v5 2/6] rwsem: remove 'out' label in do_wake Tim Chen
2013-09-24 22:22 ` [PATCH v5 3/6] rwsem: remove try_reader_grant label in do_wake Tim Chen
2013-09-24 22:22 ` [PATCH v5 4/6] rwsem/wake: check lock before doing atomic update Tim Chen
2013-09-24 22:22 ` [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
2013-09-25  5:55   ` Peter Zijlstra
2013-09-25 15:58     ` Tim Chen
2013-09-24 22:22 ` [PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition Tim Chen
2013-09-25  6:52   ` Ingo Molnar
