linux-kernel.vger.kernel.org archive mirror
* [PATCH -tip v2 0/5] rwsem: Fine tuning
@ 2015-01-30  9:14 Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 1/5] locking/rwsem: Use task->state helpers Davidlohr Bueso
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso

Hello,

The first two patches are self-descriptive, obvious add-ons.

The rest are performance enhancements for write-mostly workloads.
While this is not our priority (use mutexes instead!!), there are
cases where heavy use of writers can severely hurt rwsem performance.
For instance, we got reports[1] of writer-only issues when converting
the i_mmap_rwsem from a mutex, on a workload that pathologically pounds
on this lock for vma operations:

-     81.20%           execl  [kernel.kallsyms]     [k] osq_lock
      - 100.00% mutex_optimistic_spin
           __mutex_lock_slowpath
         - mutex_lock
            + 47.71% unlink_file_vma
            + 34.91% vma_adjust
            + 17.38% vma_link

This is enough to make small differences painfully evident. These changes
(particularly patch 5/5) recover most (~75%) of the performance regression.
Patches 3-5 deal with optimistic-spinning fine tuning. Details are in the
individual patches. The series passes multiple x86-64 tests.

Thanks!

Changes since v1 (https://lkml.org/lkml/2015/1/26/34):

- Redid patch 2.

- Improved "Check for active lock before bailing on spinning" patch
to consider checking if the lock has a new owner.

- Dropped patch 5; will send a new patch later disabling preemption in the
slowpath to get rid of the barriers (assuming testing goes well) without
missing wakeups. It's not very related to the patches in this series anyway.

Davidlohr Bueso (5):
  locking/rwsem: Use task->state helpers
  locking/rwsem: Document barrier need when waking tasks
  locking/rwsem: Set lock ownership ASAP
  locking/rwsem: Avoid deceiving lock spinners
  locking/rwsem: Check for active lock before bailing on spinning

 kernel/locking/mutex.c          |  2 +-
 kernel/locking/rwsem-spinlock.c |  9 +++++-
 kernel/locking/rwsem-xadd.c     | 66 ++++++++++++++++++++++++++++-------------
 kernel/locking/rwsem.c          | 22 +-------------
 kernel/locking/rwsem.h          | 20 +++++++++++++
 5 files changed, 76 insertions(+), 43 deletions(-)
 create mode 100644 kernel/locking/rwsem.h

-- 
2.1.4



* [PATCH 1/5] locking/rwsem: Use task->state helpers
  2015-01-30  9:14 [PATCH -tip v2 0/5] rwsem: Fine tuning Davidlohr Bueso
@ 2015-01-30  9:14 ` Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 2/5] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

Call __set_task_state() instead of assigning the new state
directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP
environments by keeping track of who last changed the state.
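
For reference, the debug flavour of the helper looks roughly like this
(a sketch modelled on the era's include/linux/sched.h, not the exact
kernel text; the task_state_change bookkeeping only exists under
CONFIG_DEBUG_ATOMIC_SLEEP):

	#define __set_task_state(tsk, state_value)		\
		do {						\
			/* record who last changed the state */	\
			(tsk)->task_state_change = _THIS_IP_;	\
			(tsk)->state = (state_value);		\
		} while (0)

A direct assignment to tsk->state bypasses that bookkeeping entirely.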

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-spinlock.c | 2 +-
 kernel/locking/rwsem-xadd.c     | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 2c93571..2555ae1 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -154,7 +154,7 @@ void __sched __down_read(struct rw_semaphore *sem)
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 	}
 
-	tsk->state = TASK_RUNNING;
+	__set_task_state(tsk, TASK_RUNNING);
  out:
 	;
 }
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7628c3f..2f7cc40 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -242,8 +242,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 		schedule();
 	}
 
-	tsk->state = TASK_RUNNING;
-
+	__set_task_state(tsk, TASK_RUNNING);
 	return sem;
 }
 EXPORT_SYMBOL(rwsem_down_read_failed);
-- 
2.1.4



* [PATCH 2/5] locking/rwsem: Document barrier need when waking tasks
  2015-01-30  9:14 [PATCH -tip v2 0/5] rwsem: Fine tuning Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 1/5] locking/rwsem: Use task->state helpers Davidlohr Bueso
@ 2015-01-30  9:14 ` Davidlohr Bueso
  2015-02-18 17:11   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 3/5] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

The need for the smp_mb() in __rwsem_do_wake() should be
properly documented. Applies to both xadd and spinlock
variants.
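
For context, the reader side that this barrier pairs with looks roughly
like the following (a simplified sketch of rwsem_down_read_failed(), not
the exact kernel text); the !waiter.task check is the "nil condition"
the new comment refers to:

	/* wait to be given the lock */
	while (true) {
		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
		if (!waiter.task)	/* cleared by __rwsem_do_wake() */
			break;
		schedule();
	}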

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-spinlock.c | 7 +++++++
 kernel/locking/rwsem-xadd.c     | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 2555ae1..3a50485 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -85,6 +85,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 
 		list_del(&waiter->list);
 		tsk = waiter->task;
+		/*
+		 * Make sure we do not wakeup the next reader before
+		 * setting the nil condition to grant the next reader;
+		 * otherwise we could miss the wakeup on the other
+		 * side and end up sleeping again. See the pairing
+		 * in rwsem_down_read_failed().
+		 */
 		smp_mb();
 		waiter->task = NULL;
 		wake_up_process(tsk);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2f7cc40..82aba46 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -186,6 +186,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 		waiter = list_entry(next, struct rwsem_waiter, list);
 		next = waiter->list.next;
 		tsk = waiter->task;
+		/*
+		 * Make sure we do not wakeup the next reader before
+		 * setting the nil condition to grant the next reader;
+		 * otherwise we could miss the wakeup on the other
+		 * side and end up sleeping again. See the pairing
+		 * in rwsem_down_read_failed().
+		 */
 		smp_mb();
 		waiter->task = NULL;
 		wake_up_process(tsk);
-- 
2.1.4



* [PATCH 3/5] locking/rwsem: Set lock ownership ASAP
  2015-01-30  9:14 [PATCH -tip v2 0/5] rwsem: Fine tuning Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 1/5] locking/rwsem: Use task->state helpers Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 2/5] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
@ 2015-01-30  9:14 ` Davidlohr Bueso
  2015-02-18 17:11   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
  2015-01-30  9:14 ` [PATCH 5/5] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
  4 siblings, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

In order to optimize the spinning step, we need to set the lock
owner as soon as the lock is acquired; after a successful counter
cmpxchg operation, that is. This is particularly useful as rwsems
need to set the owner to nil for readers, so there is a greater
chance of falling out of the spinning. Currently we only set the
owner much later in the game, at the more generic level -- latency
can be especially bad when waiting for a node->next pointer when
releasing the osq in up_write() calls.

As such, update the owner inside rwsem_try_write_lock (when the
lock is obtained after blocking) and rwsem_try_write_lock_unqueued
(when the lock is obtained while spinning). This requires creating
a new internal rwsem.h header to share the owner related calls.

Also cleanup some headers for mutex and rwsem.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/mutex.c      |  2 +-
 kernel/locking/rwsem-xadd.c |  8 ++++++--
 kernel/locking/rwsem.c      | 22 +---------------------
 kernel/locking/rwsem.h      | 20 ++++++++++++++++++++
 4 files changed, 28 insertions(+), 24 deletions(-)
 create mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index c67a60b..166355e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -25,7 +25,7 @@
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/debug_locks.h>
-#include "mcs_spinlock.h"
+#include <linux/osq_lock.h>
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 82aba46..07713e5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -14,8 +14,9 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/sched/rt.h>
+#include <linux/osq_lock.h>
 
-#include "mcs_spinlock.h"
+#include "rwsem.h"
 
 /*
  * Guide to the rw_semaphore's count field for common values.
@@ -265,6 +266,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
 		if (!list_is_singular(&sem->wait_list))
 			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
+		rwsem_set_owner(sem);
 		return true;
 	}
 
@@ -284,8 +286,10 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 			return false;
 
 		old = cmpxchg(&sem->count, count, count + RWSEM_ACTIVE_WRITE_BIAS);
-		if (old == count)
+		if (old == count) {
+			rwsem_set_owner(sem);
 			return true;
+		}
 
 		count = old;
 	}
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e2d3bc7..205be0c 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -9,29 +9,9 @@
 #include <linux/sched.h>
 #include <linux/export.h>
 #include <linux/rwsem.h>
-
 #include <linux/atomic.h>
 
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-	sem->owner = current;
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-	sem->owner = NULL;
-}
-
-#else
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-}
-#endif
+#include "rwsem.h"
 
 /*
  * lock for reading
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
new file mode 100644
index 0000000..870ed9a
--- /dev/null
+++ b/kernel/locking/rwsem.h
@@ -0,0 +1,20 @@
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	sem->owner = current;
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	sem->owner = NULL;
+}
+
+#else
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+}
+#endif
-- 
2.1.4



* [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-30  9:14 [PATCH -tip v2 0/5] rwsem: Fine tuning Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2015-01-30  9:14 ` [PATCH 3/5] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
@ 2015-01-30  9:14 ` Davidlohr Bueso
  2015-01-31  1:51   ` Tim Chen
                     ` (2 more replies)
  2015-01-30  9:14 ` [PATCH 5/5] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
  4 siblings, 3 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

When readers hold the semaphore, the ->owner is nil. As such,
and unlike mutexes, '!owner' does not necessarily imply that
the lock is free. This will cause writers to potentially spin
excessively as they've been misled into thinking they have a
chance of acquiring the lock, instead of blocking.

This patch therefore enhances the counter check when the owner
is not set by the time we've broken out of the loop. Otherwise
we can return true, as a new owner has the lock and thus we want
to continue spinning. While at it, we can make rwsem_spin_on_owner()
less ambiguous and return right away under need_resched() conditions.
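
For context, the two counter values accepted below correspond, per the
count-field guide at the top of rwsem-xadd.c (x86-64 values, noted here
as a reading aid), to:

	0x0000000000000000	lock is free, no waiters
	0xffffffff00000000	RWSEM_WAITING_BIAS: tasks are queued,
				but none currently hold the lock

Any other value implies at least one active locker -- and with ->owner
nil, that would be readers -- so the spinner should block instead.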

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 07713e5..1c0d11e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -337,21 +337,30 @@ static inline bool owner_running(struct rw_semaphore *sem,
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
+	long count;
+
 	rcu_read_lock();
 	while (owner_running(sem, owner)) {
-		if (need_resched())
-			break;
+		/* abort spinning when need_resched */
+		if (need_resched()) {
+			rcu_read_unlock();
+			return false;
+		}
 
 		cpu_relax_lowlatency();
 	}
 	rcu_read_unlock();
 
+	if (READ_ONCE(sem->owner))
+		return true; /* new owner, continue spinning */
+
 	/*
-	 * We break out the loop above on need_resched() or when the
-	 * owner changed, which is a sign for heavy contention. Return
-	 * success only when sem->owner is NULL.
+	 * When the owner is not set, the lock could be free or
+	 * held by readers. Check the counter to verify the
+	 * state.
 	 */
-	return sem->owner == NULL;
+	count = READ_ONCE(sem->count);
+	return (count == 0 || count == RWSEM_WAITING_BIAS);
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
-- 
2.1.4



* [PATCH 5/5] locking/rwsem: Check for active lock before bailing on spinning
  2015-01-30  9:14 [PATCH -tip v2 0/5] rwsem: Fine tuning Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2015-01-30  9:14 ` [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
@ 2015-01-30  9:14 ` Davidlohr Bueso
  2015-02-18 17:12   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  4 siblings, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

37e9562453b ("locking/rwsem: Allow conservative optimistic
spinning when readers have lock") forced the default for
optimistic spinning to be disabled if the lock owner was
nil, which makes much sense for readers. However, while
it is not our priority, we can make some optimizations
for write-mostly workloads. We can bail the spinning step
and still be conservative if there are any active tasks,
otherwise there's really no reason not to spin, as the
semaphore is most likely unlocked.
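
In counter terms (x86-64 values, noted as a reading aid): the low word
of sem->count holds the active-locker count, masked by

	#define RWSEM_ACTIVE_MASK	0xffffffffL

so "count & RWSEM_ACTIVE_MASK" being nonzero while ->owner is nil means
reader(s) may hold the lock, and we bail; zero means the semaphore is
most likely unlocked, and we spin.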

This patch recovers most of the Unixbench 'execl' benchmark's
throughput by sleeping less and making better average system
usage:

before:
CPU     %user     %nice   %system   %iowait    %steal     %idle
all      0.60      0.00      8.02      0.00      0.00     91.38

after:
CPU     %user     %nice   %system   %iowait    %steal     %idle
all      1.22      0.00     70.18      0.00      0.00     28.60

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1c0d11e..e4ad019 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -298,23 +298,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
-	bool on_cpu = false;
+	bool ret = true;
 
 	if (need_resched())
 		return false;
 
 	rcu_read_lock();
 	owner = ACCESS_ONCE(sem->owner);
-	if (owner)
-		on_cpu = owner->on_cpu;
-	rcu_read_unlock();
+	if (!owner) {
+		long count = ACCESS_ONCE(sem->count);
+		/*
+		 * If sem->owner is not set, yet we have just recently entered the
+		 * slowpath with the lock being active, then there is a possibility
+		 * reader(s) may have the lock. To be safe, bail spinning in these
+		 * situations.
+		 */
+		if (count & RWSEM_ACTIVE_MASK)
+			ret = false;
+		goto done;
+	}
 
-	/*
-	 * If sem->owner is not set, yet we have just recently entered the
-	 * slowpath, then there is a possibility reader(s) may have the lock.
-	 * To be safe, avoid spinning in these situations.
-	 */
-	return on_cpu;
+	ret = owner->on_cpu;
+done:
+	rcu_read_unlock();
+	return ret;
 }
 
 static inline bool owner_running(struct rw_semaphore *sem,
-- 
2.1.4



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-30  9:14 ` [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
@ 2015-01-31  1:51   ` Tim Chen
  2015-01-31  2:28     ` Davidlohr Bueso
  2015-01-31  9:29   ` Peter Zijlstra
  2015-02-18 17:12   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  2 siblings, 1 reply; 22+ messages in thread
From: Tim Chen @ 2015-01-31  1:51 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Jason Low,
	Michel Lespinasse, linux-kernel, Davidlohr Bueso

On Fri, 2015-01-30 at 01:14 -0800, Davidlohr Bueso wrote:
> When readers hold the semaphore, the ->owner is nil. As such,
> and unlike mutexes, '!owner' does not necessarily imply that
> the lock is free. This will cause writers to potentially spin
> excessively as they've been misled into thinking they have a
> chance of acquiring the lock, instead of blocking.
> 
> This patch therefore enhances the counter check when the owner
> is not set by the time we've broken out of the loop. Otherwise
> we can return true as a new owner has the lock and thus we want
> to continue spinning. While at it, we can make rwsem_spin_on_owner()
> less ambiguous and return right away under need_resched() conditions.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 07713e5..1c0d11e 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -337,21 +337,30 @@ static inline bool owner_running(struct rw_semaphore *sem,
>  static noinline
>  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>  {
> +	long count;
> +
>  	rcu_read_lock();
>  	while (owner_running(sem, owner)) {
> -		if (need_resched())
> -			break;
> +		/* abort spinning when need_resched */
> +		if (need_resched()) {
> +			rcu_read_unlock();
> +			return false;
> +		}
>  
>  		cpu_relax_lowlatency();
>  	}
>  	rcu_read_unlock();
>  
> +	if (READ_ONCE(sem->owner))
> +		return true; /* new owner, continue spinning */
> +

Do you have some comparison data on whether it is more advantageous
to continue spinning when the owner changes?  After the above change,
rwsem will behave more like a spinlock for write locks and
will keep spinning when the lock changes ownership. Now during heavy
lock contention, if we don't continue spinning and sleep, we may use the
clock cycles for actually running other threads.  This was the
assumption in the older code.  The trade-off may or may not be worth it,
depending on how big the thread-switching overhead is and how long the
lock is held.

It would be good to have a few data points to make sure
that this change is beneficial.

>  	/*
> -	 * We break out the loop above on need_resched() or when the
> -	 * owner changed, which is a sign for heavy contention. Return
> -	 * success only when sem->owner is NULL.
> +	 * When the owner is not set, the lock could be free or
> +	 * held by readers. Check the counter to verify the
> +	 * state.
>  	 */
> -	return sem->owner == NULL;
> +	count = READ_ONCE(sem->count);
> +	return (count == 0 || count == RWSEM_WAITING_BIAS);
>  }
>  
>  static bool rwsem_optimistic_spin(struct rw_semaphore *sem)

Thanks.

Tim



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-31  1:51   ` Tim Chen
@ 2015-01-31  2:28     ` Davidlohr Bueso
  2015-02-03 17:16       ` Tim Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-31  2:28 UTC (permalink / raw)
  To: Tim Chen
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Jason Low,
	Michel Lespinasse, linux-kernel

On Fri, 2015-01-30 at 17:51 -0800, Tim Chen wrote:
> On Fri, 2015-01-30 at 01:14 -0800, Davidlohr Bueso wrote:
> > When readers hold the semaphore, the ->owner is nil. As such,
> > and unlike mutexes, '!owner' does not necessarily imply that
> > the lock is free. This will cause writers to potentially spin
> > excessively as they've been misled into thinking they have a
> > chance of acquiring the lock, instead of blocking.
> > 
> > This patch therefore enhances the counter check when the owner
> > is not set by the time we've broken out of the loop. Otherwise
> > we can return true as a new owner has the lock and thus we want
> > to continue spinning. While at it, we can make rwsem_spin_on_owner()
> > less ambiguous and return right away under need_resched() conditions.
> > 
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > ---
> >  kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
> >  1 file changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index 07713e5..1c0d11e 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -337,21 +337,30 @@ static inline bool owner_running(struct rw_semaphore *sem,
> >  static noinline
> >  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> >  {
> > +	long count;
> > +
> >  	rcu_read_lock();
> >  	while (owner_running(sem, owner)) {
> > -		if (need_resched())
> > -			break;
> > +		/* abort spinning when need_resched */
> > +		if (need_resched()) {
> > +			rcu_read_unlock();
> > +			return false;
> > +		}
> >  
> >  		cpu_relax_lowlatency();
> >  	}
> >  	rcu_read_unlock();
> >  
> > +	if (READ_ONCE(sem->owner))
> > +		return true; /* new owner, continue spinning */
> > +
> 
> Do you have some comparison data of whether it is more advantageous
> to continue spinning when owner changes?  After the above change, 
> rwsem will behave more like a spin lock for write lock and 
> will keep spinning when the lock changes ownership.

But recall we still abort when need_resched, so the spinning isn't
infinite. Never has been.

>  Now during heavy
> lock contention, if we don't continue spinning and sleep, we may use the
> clock cycles for actually running other threads. 

Under heavy contention, time spinning will force us to ultimately block
anyway.

Thanks,
Davidlohr



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-30  9:14 ` [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
  2015-01-31  1:51   ` Tim Chen
@ 2015-01-31  9:29   ` Peter Zijlstra
  2015-01-31 21:14     ` Davidlohr Bueso
  2015-02-18 17:12   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  2 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-01-31  9:29 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel, Davidlohr Bueso

On Fri, Jan 30, 2015 at 01:14:26AM -0800, Davidlohr Bueso wrote:
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -337,21 +337,30 @@ static inline bool owner_running(struct rw_semaphore *sem,
>  static noinline
>  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>  {
> +	long count;
> +
>  	rcu_read_lock();
>  	while (owner_running(sem, owner)) {
> +		/* abort spinning when need_resched */
> +		if (need_resched()) {
> +			rcu_read_unlock();
> +			return false;
> +		}
>  
>  		cpu_relax_lowlatency();
>  	}
>  	rcu_read_unlock();
>  
> +	if (READ_ONCE(sem->owner))
> +		return true; /* new owner, continue spinning */
> +

Same concern as Tim; also the mutex code seems to terminate the spin
when owner changes. And I think we want to have writers behave similar
to mutexes, no?

Does it make sense to change things to allow owner changes from NULL,
but not to NULL?

>  	/*
> +	 * When the owner is not set, the lock could be free or
> +	 * held by readers. Check the counter to verify the
> +	 * state.
>  	 */
> -	return sem->owner == NULL;
> +	count = READ_ONCE(sem->count);
> +	return (count == 0 || count == RWSEM_WAITING_BIAS);
>  }


* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-31  9:29   ` Peter Zijlstra
@ 2015-01-31 21:14     ` Davidlohr Bueso
  2015-01-31 21:17       ` Davidlohr Bueso
  0 siblings, 1 reply; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-31 21:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel

On Sat, 2015-01-31 at 10:29 +0100, Peter Zijlstra wrote:
> On Fri, Jan 30, 2015 at 01:14:26AM -0800, Davidlohr Bueso wrote:
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -337,21 +337,30 @@ static inline bool owner_running(struct rw_semaphore *sem,
> >  static noinline
> >  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> >  {
> > +	long count;
> > +
> >  	rcu_read_lock();
> >  	while (owner_running(sem, owner)) {
> > +		/* abort spinning when need_resched */
> > +		if (need_resched()) {
> > +			rcu_read_unlock();
> > +			return false;
> > +		}
> >  
> >  		cpu_relax_lowlatency();
> >  	}
> >  	rcu_read_unlock();
> >  
> > +	if (READ_ONCE(sem->owner))
> > +		return true; /* new owner, continue spinning */
> > +
> 
> Same concern as Tim; also the mutex code seems to terminate the spin
> when owner changes. And I think we want to have writers behave similar
> to mutexes, no?
> 
> Does it make sense to change things to allow owner changes from NULL,
> but not to NULL?

I think it does, yes:

- owner changes to nil: readers can hold the lock. We know the rest.

- owner changes from nil: implies that a writer now holds the mutex. Why
should we want to block? We continue to apply the same reasoning as to
why we're spinning in the first place. This is very beneficial if, for
instance, we began polling on the owner when the lock is just about to
be released. So a few iterations later, the lock owner changes on us, and
with the current code that will make us sleep. With this change, after a
few spins it is very likely we'll get the lock. And if not, need_resched
will ultimately kick in and we'll block. Additionally, as Jason pointed
out, with the osq there is no concern about multiple threads spinning on
the owner at the same time.
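
In short (a reading aid mirroring the patch's logic, not kernel text):

	sem->owner != NULL  ->  a writer holds the lock; keep spinning
				while owner->on_cpu and !need_resched()
	sem->owner == NULL  ->  the lock is free OR readers hold it;
				consult sem->count to tell them apart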

Or am I missing something?

Thanks,
Davidlohr



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-31 21:14     ` Davidlohr Bueso
@ 2015-01-31 21:17       ` Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2015-01-31 21:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel

On Sat, 2015-01-31 at 13:14 -0800, Davidlohr Bueso wrote:
> - owner changes from nil: implies that a writer now holds the mutex. Why

Err... that should say rwsem.



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-01-31  2:28     ` Davidlohr Bueso
@ 2015-02-03 17:16       ` Tim Chen
  2015-02-03 17:54         ` Jason Low
  0 siblings, 1 reply; 22+ messages in thread
From: Tim Chen @ 2015-02-03 17:16 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Jason Low,
	Michel Lespinasse, linux-kernel


> > >  
> > > +	if (READ_ONCE(sem->owner))
> > > +		return true; /* new owner, continue spinning */
> > > +
> > 
> > Do you have some comparison data of whether it is more advantageous
> > to continue spinning when owner changes?  After the above change, 
> > rwsem will behave more like a spin lock for write lock and 
> > will keep spinning when the lock changes ownership.
> 
> But recall we still abort when need_resched, so the spinning isn't
> infinite. Never has been.
> 
> >  Now during heavy
> > lock contention, if we don't continue spinning and sleep, we may use the
> > clock cycles for actually running other threads. 
> 
> Under heavy contention, time spinning will force us to ultimately block
> anyway.

The question is: under heavy contention, if we are going to block anyway,
won't it be more advantageous not to continue spinning, so we can use
the cycles for useful tasks?  The original code assumes that if the lock
has switched owner, then we are under heavy contention and we can stop
spinning and block.  I think it'll be useful to have some
data comparing the two behaviors.

Thanks.

Tim




* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-02-03 17:16       ` Tim Chen
@ 2015-02-03 17:54         ` Jason Low
  2015-02-03 19:43           ` Tim Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Low @ 2015-02-03 17:54 UTC (permalink / raw)
  To: Tim Chen
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel, jason.low2

On Tue, 2015-02-03 at 09:16 -0800, Tim Chen wrote:
> > > >  
> > > > +	if (READ_ONCE(sem->owner))
> > > > +		return true; /* new owner, continue spinning */
> > > > +
> > > 
> > > Do you have some comparison data of whether it is more advantageous
> > > to continue spinning when owner changes?  After the above change, 
> > > rwsem will behave more like a spin lock for write lock and 
> > > will keep spinning when the lock changes ownership.
> > 
> > But recall we still abort when need_resched, so the spinning isn't
> > infinite. Never has been.
> > 
> > >  Now during heavy
> > > lock contention, if we don't continue spinning and sleep, we may use the
> > > clock cycles for actually running other threads. 
> > 
> > Under heavy contention, time spinning will force us to ultimately block
> > anyway.
> 
> The question is under heavy contention, if we are going to block anyway,
> won't it be more advantageous not to continue spinning so we can use
> the cycles for useful tasks?

Hi Tim,

Now that we have the OSQ logic, under heavy contention there will still
only be 1 thread spinning on the owner at a time. So if another
thread is able to obtain the lock before the spinner, we're only sending
the top spinner of the lock to the slowpath. As long as the new lock
owner is running, there is a chance for this top spinner to obtain the
lock, and spinning would be useful.

Since we have the need_resched() checks, this thread will block when
there really is another task that should run instead of the spinning
thread.
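
For readers following the thread, the spin path being discussed is
roughly the below -- a simplified sketch of rwsem_optimistic_spin()
as of this series, not the exact code:

	static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
	{
		struct task_struct *owner;
		bool taken = false;

		preempt_disable();
		if (!rwsem_can_spin_on_owner(sem))
			goto done;
		/* only one spinner polls the owner at a time */
		if (!osq_lock(&sem->osq))
			goto done;

		while (true) {
			owner = READ_ONCE(sem->owner);
			if (owner && !rwsem_spin_on_owner(sem, owner))
				break;
			/* try to acquire; wait_lock taken on success */
			if (rwsem_try_write_lock_unqueued(sem)) {
				taken = true;
				break;
			}
			if (!owner && (need_resched() || rt_task(current)))
				break;
			cpu_relax_lowlatency();
		}
		osq_unlock(&sem->osq);
	done:
		preempt_enable();
		return taken;
	}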



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-02-03 17:54         ` Jason Low
@ 2015-02-03 19:43           ` Tim Chen
  2015-02-03 21:04             ` Jason Low
  2015-02-04 12:06             ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Tim Chen @ 2015-02-03 19:43 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel

On Tue, 2015-02-03 at 09:54 -0800, Jason Low wrote:
> On Tue, 2015-02-03 at 09:16 -0800, Tim Chen wrote:
> > > > >  
> > > > > +	if (READ_ONCE(sem->owner))
> > > > > +		return true; /* new owner, continue spinning */
> > > > > +
> > > > 
> > > > Do you have some comparison data of whether it is more advantageous
> > > > to continue spinning when owner changes?  After the above change, 
> > > > rwsem will behave more like a spin lock for write lock and 
> > > > will keep spinning when the lock changes ownership.
> > > 
> > > But recall we still abort when need_resched, so the spinning isn't
> > > infinite. Never has been.
> > > 
> > > >  Now during heavy
> > > > lock contention, if we don't continue spinning and sleep, we may use the
> > > > clock cycles for actually running other threads. 
> > > 
> > > Under heavy contention, time spinning will force us to ultimately block
> > > anyway.
> > 
> > The question is under heavy contention, if we are going to block anyway,
> > won't it be more advantageous not to continue spinning so we can use
> > the cycles for useful tasks?
> 
> Hi Tim,
> 
> Now that we have the OSQ logic, under heavy contention, there will still
> only be 1 thread that is spinning on owner at a time. 

That's true.  We cannot have the lock grabbed by a new write
contender, as any new write contender of the lock will be
queued by the OSQ logic. Only the
thread doing the optimistic spin is attempting the write lock.
In other words, switching of the rwsem's write owner to a new
owner cannot happen.  Either the write owner stays the original one, or
we don't have a write owner.  So using the write-owner switch
as an indicator of congestion is incorrect.

If my reasoning above is sound, then the check 

+       if (READ_ONCE(sem->owner))
+               return true; /* new owner, continue spinning */
+

is unnecessary and can be removed, as we cannot have a 
new write owner of the rwsem, other than the thread
doing optimistic spinning.

Tim




* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-02-03 19:43           ` Tim Chen
@ 2015-02-03 21:04             ` Jason Low
  2015-02-03 21:48               ` Tim Chen
  2015-02-04 12:06             ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Low @ 2015-02-03 21:04 UTC (permalink / raw)
  To: Tim Chen
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel, jason.low2

On Tue, 2015-02-03 at 11:43 -0800, Tim Chen wrote:
> On Tue, 2015-02-03 at 09:54 -0800, Jason Low wrote:
> > On Tue, 2015-02-03 at 09:16 -0800, Tim Chen wrote:
> > > > > >  
> > > > > > +	if (READ_ONCE(sem->owner))
> > > > > > +		return true; /* new owner, continue spinning */
> > > > > > +
> > > > > 
> > > > > Do you have some comparison data of whether it is more advantageous
> > > > > to continue spinning when owner changes?  After the above change, 
> > > > > rwsem will behave more like a spin lock for write lock and 
> > > > > will keep spinning when the lock changes ownership.
> > > > 
> > > > But recall we still abort when need_resched, so the spinning isn't
> > > > infinite. Never has been.
> > > > 
> > > > >  Now during heavy
> > > > > lock contention, if we don't continue spinning and sleep, we may use the
> > > > > clock cycles for actually running other threads. 
> > > > 
> > > > Under heavy contention, time spinning will force us to ultimately block
> > > > anyway.
> > > 
> > > The question is under heavy contention, if we are going to block anyway,
> > > won't it be more advantageous not to continue spinning so we can use
> > > the cycles for useful tasks?
> > 
> > Hi Tim,
> > 
> > Now that we have the OSQ logic, under heavy contention, there will still
> > only be 1 thread that is spinning on owner at a time. 
> 
> That's true.  We cannot have the lock grabbed by a new write 
> contender as any new writer contender of the lock will be 
> queued by the OSQ logic. Only the
> thread doing the optimistic spin is attempting write lock.  
> In other word, switching of write owner of the rwsem to a new
> owner cannot happen.

Another thread can still obtain the write lock in the fast path though,
right? We try to obtain the write lock once before calling
rwsem_down_write_failed().




* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-02-03 21:04             ` Jason Low
@ 2015-02-03 21:48               ` Tim Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2015-02-03 21:48 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel

On Tue, 2015-02-03 at 13:04 -0800, Jason Low wrote:
> On Tue, 2015-02-03 at 11:43 -0800, Tim Chen wrote:
> > On Tue, 2015-02-03 at 09:54 -0800, Jason Low wrote:
> > > On Tue, 2015-02-03 at 09:16 -0800, Tim Chen wrote:
> > > > > > >  
> > > > > > > +	if (READ_ONCE(sem->owner))
> > > > > > > +		return true; /* new owner, continue spinning */
> > > > > > > +
> > > > > > 
> > > > > > Do you have some comparison data of whether it is more advantageous
> > > > > > to continue spinning when owner changes?  After the above change, 
> > > > > > rwsem will behave more like a spin lock for write lock and 
> > > > > > will keep spinning when the lock changes ownership.
> > > > > 
> > > > > But recall we still abort when need_resched, so the spinning isn't
> > > > > infinite. Never has been.
> > > > > 
> > > > > >  Now during heavy
> > > > > > lock contention, if we don't continue spinning and sleep, we may use the
> > > > > > clock cycles for actually running other threads. 
> > > > > 
> > > > > Under heavy contention, time spinning will force us to ultimately block
> > > > > anyway.
> > > > 
> > > > The question is under heavy contention, if we are going to block anyway,
> > > > won't it be more advantageous not to continue spinning so we can use
> > > > the cycles for useful tasks?
> > > 
> > > Hi Tim,
> > > 
> > > Now that we have the OSQ logic, under heavy contention, there will still
> > > only be 1 thread that is spinning on owner at a time. 
> > 
> > That's true.  We cannot have the lock grabbed by a new write 
> > contender as any new writer contender of the lock will be 
> > queued by the OSQ logic. Only the
> > thread doing the optimistic spin is attempting write lock.  
> > In other word, switching of write owner of the rwsem to a new
> > owner cannot happen.
> 
> Another thread can still obtain the write lock in the fast path though
> right? We try to obtain the write lock once before calling
> rwsem_down_write_failed().
> 
> 

True. The owner-change check is still needed then.

Thinking more about this, I now agree that continuing to spin is the
right thing. The possible number of threads contending for the write
lock has been greatly reduced by the OSQ logic.  Most of the time,
any new thread attempting a write lock will do so only once
and then go directly to the OSQ. The probability that a write-lock retry
by the thread at the head of the OSQ succeeds is high, so we should do it.

Davidlohr, you can add my Ack for this patch.

Thanks.

Tim



* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-02-03 19:43           ` Tim Chen
  2015-02-03 21:04             ` Jason Low
@ 2015-02-04 12:06             ` Peter Zijlstra
  2015-02-04 17:39               ` Tim Chen
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-02-04 12:06 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Davidlohr Bueso, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel

On Tue, Feb 03, 2015 at 11:43:36AM -0800, Tim Chen wrote:
> That's true.  We cannot have the lock grabbed by a new write 
> contender as any new writer contender of the lock will be 
> queued by the OSQ logic. Only the
> thread doing the optimistic spin is attempting write lock.  
> In other word, switching of write owner of the rwsem to a new
> owner cannot happen.  Either write owner stay as the original one, or
> we don't have a write owner.  So using test of write owner
> switching as an indicator of congestion is incorrect.
> 
> If my reasoning above is sound, then the check 
> 
> +       if (READ_ONCE(sem->owner))
> +               return true; /* new owner, continue spinning */
> +
> 
> is unnecessary and can be removed, as we cannot have a 
> new write owner of the rwsem, other than the thread
> doing optimistic spinning.

I have read the rest of the thread; but the one thing that I didn't see
is trylocks. Trylocks can always come in and steal things regardless of
the OSQ stuff.
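
The write-trylock fast path in question is roughly the following (a
sketch along the lines of asm-generic/rwsem.h; note it never touches
the OSQ):

	static inline int __down_write_trylock(struct rw_semaphore *sem)
	{
		long tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
				   RWSEM_ACTIVE_WRITE_BIAS);
		/* the lock is stolen without ever queueing on the OSQ */
		return tmp == RWSEM_UNLOCKED_VALUE;
	}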


* Re: [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners
  2015-02-04 12:06             ` Peter Zijlstra
@ 2015-02-04 17:39               ` Tim Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Chen @ 2015-02-04 17:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Davidlohr Bueso, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel

On Wed, 2015-02-04 at 13:06 +0100, Peter Zijlstra wrote:
> On Tue, Feb 03, 2015 at 11:43:36AM -0800, Tim Chen wrote:
> > That's true.  We cannot have the lock grabbed by a new write 
> > contender as any new writer contender of the lock will be 
> > queued by the OSQ logic. Only the
> > thread doing the optimistic spin is attempting write lock.  
> > In other word, switching of write owner of the rwsem to a new
> > owner cannot happen.  Either write owner stay as the original one, or
> > we don't have a write owner.  So using test of write owner
> > switching as an indicator of congestion is incorrect.
> > 
> > If my reasoning above is sound, then the check 
> > 
> > +       if (READ_ONCE(sem->owner))
> > +               return true; /* new owner, continue spinning */
> > +
> > 
> > is unnecessary and can be removed, as we cannot have a 
> > new write owner of the rwsem, other than the thread
> > doing optimistic spinning.
> 
> I have read the rest of the thread; but the one thing that I didn't see
> is trylocks, trylocks can always come in an steal things regardless of
> the OSQ stuff.

Jason also pointed that out.  So the owner-change check is needed
after all.  Now, because of the OSQ logic, even if the owner has changed,
the likelihood that the spinner at the head of the OSQ will acquire the
lock is high.  So it should continue to spin.

That's because any new threads coming in will trylock only
once, and then go to the OSQ. It is unlikely that they will trylock at
the precise moment when the owner releases the lock, as they do not
continue to spin on the lock.  The contention from new threads
is low.

So letting the thread at the head of the OSQ continue to spin is probably
the right thing to do.

Tim



* [tip:locking/core] locking/rwsem: Document barrier need when waking tasks
  2015-01-30  9:14 ` [PATCH 2/5] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
@ 2015-02-18 17:11   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2015-02-18 17:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tim.c.chen, linux-kernel, mingo, jason.low2, hpa, walken,
	torvalds, paulmck, dbueso, peterz, tglx, dave

Commit-ID:  49e4b2bcf7b812e985e65b6c8a0255b1520a6e7e
Gitweb:     http://git.kernel.org/tip/49e4b2bcf7b812e985e65b6c8a0255b1520a6e7e
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Fri, 30 Jan 2015 01:14:24 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Feb 2015 16:57:10 +0100

locking/rwsem: Document barrier need when waking tasks

The need for the smp_mb() in __rwsem_do_wake() should be
properly documented. Applies to both xadd and spinlock
variants.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1422609267-15102-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-spinlock.c | 7 +++++++
 kernel/locking/rwsem-xadd.c     | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 2555ae1..3a50485 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -85,6 +85,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 
 		list_del(&waiter->list);
 		tsk = waiter->task;
+		/*
+		 * Make sure we do not wakeup the next reader before
+		 * setting the nil condition to grant the next reader;
+		 * otherwise we could miss the wakeup on the other
+		 * side and end up sleeping again. See the pairing
+		 * in rwsem_down_read_failed().
+		 */
 		smp_mb();
 		waiter->task = NULL;
 		wake_up_process(tsk);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2f7cc40..82aba46 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -186,6 +186,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 		waiter = list_entry(next, struct rwsem_waiter, list);
 		next = waiter->list.next;
 		tsk = waiter->task;
+		/*
+		 * Make sure we do not wakeup the next reader before
+		 * setting the nil condition to grant the next reader;
+		 * otherwise we could miss the wakeup on the other
+		 * side and end up sleeping again. See the pairing
+		 * in rwsem_down_read_failed().
+		 */
 		smp_mb();
 		waiter->task = NULL;
 		wake_up_process(tsk);


* [tip:locking/core] locking/rwsem: Set lock ownership ASAP
  2015-01-30  9:14 ` [PATCH 3/5] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
@ 2015-02-18 17:11   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2015-02-18 17:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: paulmck, dave, mingo, dbueso, torvalds, tim.c.chen, linux-kernel,
	hpa, peterz, tglx, jason.low2, walken

Commit-ID:  7a215f89a0335582292ec6f3edaa3abd570da75a
Gitweb:     http://git.kernel.org/tip/7a215f89a0335582292ec6f3edaa3abd570da75a
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Fri, 30 Jan 2015 01:14:25 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Feb 2015 16:57:13 +0100

locking/rwsem: Set lock ownership ASAP

In order to optimize the spinning step, we need to set the lock
owner as soon as the lock is acquired; after a successful counter
cmpxchg operation, that is. This is particularly useful as rwsems
need to set the owner to nil for readers, so there is a greater
chance of falling out of the spinning. Currently we only set the
owner much later in the game, at the more generic level -- latency
can be especially bad when waiting for a node->next pointer when
releasing the osq in up_write() calls.

As such, update the owner inside rwsem_try_write_lock (when the
lock is obtained after blocking) and rwsem_try_write_lock_unqueued
(when the lock is obtained while spinning). This requires creating
a new internal rwsem.h header to share the owner related calls.

Also cleanup some headers for mutex and rwsem.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1422609267-15102-4-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex.c      |  2 +-
 kernel/locking/rwsem-xadd.c |  8 ++++++--
 kernel/locking/rwsem.c      | 22 +---------------------
 kernel/locking/rwsem.h      | 20 ++++++++++++++++++++
 4 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 59cd6c3..43bf25e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -25,7 +25,7 @@
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/debug_locks.h>
-#include "mcs_spinlock.h"
+#include <linux/osq_lock.h>
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 82aba46..07713e5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -14,8 +14,9 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/sched/rt.h>
+#include <linux/osq_lock.h>
 
-#include "mcs_spinlock.h"
+#include "rwsem.h"
 
 /*
  * Guide to the rw_semaphore's count field for common values.
@@ -265,6 +266,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
 		if (!list_is_singular(&sem->wait_list))
 			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
+		rwsem_set_owner(sem);
 		return true;
 	}
 
@@ -284,8 +286,10 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 			return false;
 
 		old = cmpxchg(&sem->count, count, count + RWSEM_ACTIVE_WRITE_BIAS);
-		if (old == count)
+		if (old == count) {
+			rwsem_set_owner(sem);
 			return true;
+		}
 
 		count = old;
 	}
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e2d3bc7..205be0c 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -9,29 +9,9 @@
 #include <linux/sched.h>
 #include <linux/export.h>
 #include <linux/rwsem.h>
-
 #include <linux/atomic.h>
 
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-	sem->owner = current;
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-	sem->owner = NULL;
-}
-
-#else
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-}
-#endif
+#include "rwsem.h"
 
 /*
  * lock for reading
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
new file mode 100644
index 0000000..870ed9a
--- /dev/null
+++ b/kernel/locking/rwsem.h
@@ -0,0 +1,20 @@
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	sem->owner = current;
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	sem->owner = NULL;
+}
+
+#else
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+}
+#endif


* [tip:locking/core] locking/rwsem: Avoid deceiving lock spinners
  2015-01-30  9:14 ` [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
  2015-01-31  1:51   ` Tim Chen
  2015-01-31  9:29   ` Peter Zijlstra
@ 2015-02-18 17:12   ` tip-bot for Davidlohr Bueso
  2 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2015-02-18 17:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, dave, tim.c.chen, hpa, tglx, walken, peterz,
	paulmck, dbueso, jason.low2, torvalds

Commit-ID:  b3fd4f03ca0b9952221f39ae6790e698bf4b39e7
Gitweb:     http://git.kernel.org/tip/b3fd4f03ca0b9952221f39ae6790e698bf4b39e7
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Fri, 30 Jan 2015 01:14:26 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Feb 2015 16:57:16 +0100

locking/rwsem: Avoid deceiving lock spinners

When readers hold the semaphore, the ->owner is nil. As such,
and unlike mutexes, '!owner' does not necessarily imply that
the lock is free. This will cause writers to potentially spin
excessively as they've been misled into thinking they have a
chance of acquiring the lock, instead of blocking.

This patch therefore enhances the counter check when the owner
is not set by the time we've broken out of the loop. Otherwise
we can return true, as a new owner has the lock and thus we want
to continue spinning. While at it, we can make rwsem_spin_on_owner()
less ambiguous and return right away under need_resched() conditions.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1422609267-15102-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 07713e5..1c0d11e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -337,21 +337,30 @@ static inline bool owner_running(struct rw_semaphore *sem,
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
+	long count;
+
 	rcu_read_lock();
 	while (owner_running(sem, owner)) {
-		if (need_resched())
-			break;
+		/* abort spinning when need_resched */
+		if (need_resched()) {
+			rcu_read_unlock();
+			return false;
+		}
 
 		cpu_relax_lowlatency();
 	}
 	rcu_read_unlock();
 
+	if (READ_ONCE(sem->owner))
+		return true; /* new owner, continue spinning */
+
 	/*
-	 * We break out the loop above on need_resched() or when the
-	 * owner changed, which is a sign for heavy contention. Return
-	 * success only when sem->owner is NULL.
+	 * When the owner is not set, the lock could be free or
+	 * held by readers. Check the counter to verify the
+	 * state.
 	 */
-	return sem->owner == NULL;
+	count = READ_ONCE(sem->count);
+	return (count == 0 || count == RWSEM_WAITING_BIAS);
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)


* [tip:locking/core] locking/rwsem: Check for active lock before bailing on spinning
  2015-01-30  9:14 ` [PATCH 5/5] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
@ 2015-02-18 17:12   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2015-02-18 17:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, torvalds, tim.c.chen, peterz, dave, tglx,
	hpa, walken, dbueso, paulmck, jason.low2

Commit-ID:  1a99367023f6ac664365a37fa508b059e31d0e88
Gitweb:     http://git.kernel.org/tip/1a99367023f6ac664365a37fa508b059e31d0e88
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Fri, 30 Jan 2015 01:14:27 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Feb 2015 16:57:18 +0100

locking/rwsem: Check for active lock before bailing on spinning

37e9562453b ("locking/rwsem: Allow conservative optimistic
spinning when readers have lock") forced the default for
optimistic spinning to be disabled if the lock owner was
nil, which makes much sense for readers. However, while
it is not our priority, we can make some optimizations
for write-mostly workloads. We can bail the spinning step
and still be conservative if there are any active tasks,
otherwise there's really no reason not to spin, as the
semaphore is most likely unlocked.

This patch recovers most of the Unixbench 'execl' benchmark's
throughput by sleeping less and making better average system
usage:

  before:
  CPU     %user     %nice   %system   %iowait    %steal     %idle
  all      0.60      0.00      8.02      0.00      0.00     91.38

  after:
  CPU     %user     %nice   %system   %iowait    %steal     %idle
  all      1.22      0.00     70.18      0.00      0.00     28.60

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1422609267-15102-6-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1c0d11e..e4ad019 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -298,23 +298,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
-	bool on_cpu = false;
+	bool ret = true;
 
 	if (need_resched())
 		return false;
 
 	rcu_read_lock();
 	owner = ACCESS_ONCE(sem->owner);
-	if (owner)
-		on_cpu = owner->on_cpu;
-	rcu_read_unlock();
+	if (!owner) {
+		long count = ACCESS_ONCE(sem->count);
+		/*
+		 * If sem->owner is not set, yet we have just recently entered the
+		 * slowpath with the lock being active, then there is a possibility
+		 * reader(s) may have the lock. To be safe, bail spinning in these
+		 * situations.
+		 */
+		if (count & RWSEM_ACTIVE_MASK)
+			ret = false;
+		goto done;
+	}
 
-	/*
-	 * If sem->owner is not set, yet we have just recently entered the
-	 * slowpath, then there is a possibility reader(s) may have the lock.
-	 * To be safe, avoid spinning in these situations.
-	 */
-	return on_cpu;
+	ret = owner->on_cpu;
+done:
+	rcu_read_unlock();
+	return ret;
 }
 
 static inline bool owner_running(struct rw_semaphore *sem,



Thread overview: 22+ messages
2015-01-30  9:14 [PATCH -tip v2 0/5] rwsem: Fine tuning Davidlohr Bueso
2015-01-30  9:14 ` [PATCH 1/5] locking/rwsem: Use task->state helpers Davidlohr Bueso
2015-01-30  9:14 ` [PATCH 2/5] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
2015-02-18 17:11   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2015-01-30  9:14 ` [PATCH 3/5] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
2015-02-18 17:11   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2015-01-30  9:14 ` [PATCH 4/5] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
2015-01-31  1:51   ` Tim Chen
2015-01-31  2:28     ` Davidlohr Bueso
2015-02-03 17:16       ` Tim Chen
2015-02-03 17:54         ` Jason Low
2015-02-03 19:43           ` Tim Chen
2015-02-03 21:04             ` Jason Low
2015-02-03 21:48               ` Tim Chen
2015-02-04 12:06             ` Peter Zijlstra
2015-02-04 17:39               ` Tim Chen
2015-01-31  9:29   ` Peter Zijlstra
2015-01-31 21:14     ` Davidlohr Bueso
2015-01-31 21:17       ` Davidlohr Bueso
2015-02-18 17:12   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2015-01-30  9:14 ` [PATCH 5/5] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
2015-02-18 17:12   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
