* [PATCH -tip 0/6] rwsem: Fine tuning
@ 2015-01-26  7:36 Davidlohr Bueso
  2015-01-26  7:36 ` [PATCH 1/6] locking/rwsem: Use task->state helpers Davidlohr Bueso
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso

Hello,

The first two patches are self-descriptive, obvious add-ons.

The rest are performance enhancements for write-mostly workloads.
While this is not our priority (use mutexes instead!!), there are
cases where heavy use of writers can severely hurt rwsem performance.
For instance, we got reports[1] of writer-only issues when converting
the i_mmap_rwsem from a mutex, on a workload that pathologically pounds
on this lock for vma operations:

-     81.20%           execl  [kernel.kallsyms]     [k] osq_lock
      - 100.00% mutex_optimistic_spin
           __mutex_lock_slowpath
         - mutex_lock
            + 47.71% unlink_file_vma
            + 34.91% vma_adjust
            + 17.38% vma_link

This is enough to make small differences painfully evident. These changes
(particularly patch 6/6) recover most (~75%) of the performance regression.
Patches 4 and 6 deal with optimistic spinning fine tuning, while patch
5 is an attempt to get rid of two barriers when blocking. While I believe
this is safe, it certainly needs more eyeballs; I could have easily overlooked
something. Most of these changes are straightforward, but have various implications.

Passes multiple x86-64 tests.

Thanks!

[1] https://lkml.org/lkml/2015/1/7/884

Davidlohr Bueso (6):
  locking/rwsem: Use task->state helpers
  locking/rwsem: Document barrier need when waking tasks
  locking/rwsem: Set lock ownership ASAP
  locking/rwsem: Avoid deceiving lock spinners
  locking/rwsem: Optimize slowpath/sleeping
  locking/rwsem: Check for active lock before bailing on spinning

 kernel/locking/mutex.c          |  2 +-
 kernel/locking/rwsem-spinlock.c |  7 +++-
 kernel/locking/rwsem-xadd.c     | 71 +++++++++++++++++++++++++++++------------
 kernel/locking/rwsem.c          | 22 +------------
 kernel/locking/rwsem.h          | 20 ++++++++++++
 5 files changed, 78 insertions(+), 44 deletions(-)
 create mode 100644 kernel/locking/rwsem.h

-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] locking/rwsem: Use task->state helpers
  2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
@ 2015-01-26  7:36 ` Davidlohr Bueso
  2015-02-04 14:38   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  2015-01-26  7:36 ` [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

Call __set_task_state() instead of assigning the new state
directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP
environments, keeping track of who last changed the state.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-spinlock.c | 2 +-
 kernel/locking/rwsem-xadd.c     | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 2c93571..2555ae1 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -154,7 +154,7 @@ void __sched __down_read(struct rw_semaphore *sem)
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 	}
 
-	tsk->state = TASK_RUNNING;
+	__set_task_state(tsk, TASK_RUNNING);
  out:
 	;
 }
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7628c3f..2f7cc40 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -242,8 +242,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 		schedule();
 	}
 
-	tsk->state = TASK_RUNNING;
-
+	__set_task_state(tsk, TASK_RUNNING);
 	return sem;
 }
 EXPORT_SYMBOL(rwsem_down_read_failed);
-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks
  2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
  2015-01-26  7:36 ` [PATCH 1/6] locking/rwsem: Use task->state helpers Davidlohr Bueso
@ 2015-01-26  7:36 ` Davidlohr Bueso
  2015-01-27 17:07   ` Peter Zijlstra
  2015-01-26  7:36 ` [PATCH 3/6] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

The need for the smp_mb in __rwsem_do_wake should be
properly documented. Applies to both xadd and spinlock
variants.
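
As a rough illustration of what the barrier protects, here is a userspace
C11 sketch of the handoff (all names are stand-ins for the kernel
primitives, the spin loop models the waiter's sleep loop, and smp_mb() is
modeled with a seq_cst fence; this is not the kernel code itself):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Simplified model of the rwsem wakeup handshake: the waiter waits
 * (the kernel sleeps; we just spin) until the waker clears
 * waiter->task, which is what actually grants it the lock. */
struct waiter {
	_Atomic(void *) task;	/* non-NULL while still waiting */
	int granted;		/* written by the waker before the handoff */
};

void *waker(void *arg)
{
	struct waiter *w = arg;

	w->granted = 1;
	/* Models smp_mb(): the grant must be visible before ->task is
	 * cleared, since clearing ->task releases the waiter. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&w->task, NULL, memory_order_relaxed);
	return NULL;
}

int wait_for_grant(struct waiter *w)
{
	/* Models the wait loop in rwsem_down_read_failed(). */
	while (atomic_load_explicit(&w->task, memory_order_relaxed))
		;		/* the kernel calls schedule() here */
	atomic_thread_fence(memory_order_seq_cst);
	return w->granted;	/* must observe the waker's write */
}
```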

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-spinlock.c | 5 +++++
 kernel/locking/rwsem-xadd.c     | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 2555ae1..54f7a17 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -85,6 +85,11 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 
 		list_del(&waiter->list);
 		tsk = waiter->task;
+		/*
+		 * Ensure that all cores see the read before
+		 * setting it to the waiter task to nil, as that
+		 * grants the read lock to the next task.
+		 */
 		smp_mb();
 		waiter->task = NULL;
 		wake_up_process(tsk);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2f7cc40..149e8c7 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -186,6 +186,11 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 		waiter = list_entry(next, struct rwsem_waiter, list);
 		next = waiter->list.next;
 		tsk = waiter->task;
+		/*
+		 * Ensure that all cores see the read before
+		 * setting it to the waiter task to nil, as that
+		 * grants the read lock to the next task.
+		 */
 		smp_mb();
 		waiter->task = NULL;
 		wake_up_process(tsk);
-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 3/6] locking/rwsem: Set lock ownership ASAP
  2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
  2015-01-26  7:36 ` [PATCH 1/6] locking/rwsem: Use task->state helpers Davidlohr Bueso
  2015-01-26  7:36 ` [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
@ 2015-01-26  7:36 ` Davidlohr Bueso
  2015-01-27 17:10   ` Peter Zijlstra
  2015-01-26  7:36 ` [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

In order to optimize the spinning step, we need to set the lock
owner as soon as the lock is acquired; after a successful counter
cmpxchg operation, that is. This is particularly useful as rwsems
need to set the owner to nil for readers, so there is a greater
chance of falling out of the spinning. Currently we only set the
owner much later in the game, at the more generic level -- latency
can be especially bad when waiting for a node->next pointer when
releasing the osq in up_write calls.

As such, update the owner inside rwsem_try_write_lock (when the
lock is obtained after blocking) and rwsem_try_write_lock_unqueued
(when the lock is obtained while spinning). This requires creating
a new internal rwsem.h header to share the owner related calls.

Also cleanup some headers for mutex and rwsem.
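
The idea can be sketched in userspace C11 as follows (a simplified model
with illustrative constants and names, not the kernel's actual code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of rwsem_try_write_lock_unqueued(): the owner
 * field is set immediately after the winning cmpxchg, narrowing the
 * window in which spinners see a locked sem with a NULL owner. */
#define MODEL_ACTIVE_WRITE_BIAS 1L	/* illustrative value only */

struct model_sem {
	_Atomic long count;		/* 0 == unlocked */
	_Atomic(void *) owner;
};

int try_write_lock_unqueued(struct model_sem *s, void *me)
{
	long count = atomic_load(&s->count);

	while (count == 0) {	/* only attempt while it looks free */
		if (atomic_compare_exchange_weak(&s->count, &count,
					count + MODEL_ACTIVE_WRITE_BIAS)) {
			/* Set ownership ASAP, right after the cmpxchg wins. */
			atomic_store(&s->owner, me);
			return 1;
		}
		/* count was reloaded by the failed cmpxchg; retry or bail. */
	}
	return 0;
}
```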

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/mutex.c      |  2 +-
 kernel/locking/rwsem-xadd.c |  8 ++++++--
 kernel/locking/rwsem.c      | 22 +---------------------
 kernel/locking/rwsem.h      | 20 ++++++++++++++++++++
 4 files changed, 28 insertions(+), 24 deletions(-)
 create mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index c67a60b..166355e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -25,7 +25,7 @@
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/debug_locks.h>
-#include "mcs_spinlock.h"
+#include <linux/osq_lock.h>
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 149e8c7..5e425d8 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -14,8 +14,9 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/sched/rt.h>
+#include <linux/osq_lock.h>
 
-#include "mcs_spinlock.h"
+#include "rwsem.h"
 
 /*
  * Guide to the rw_semaphore's count field for common values.
@@ -263,6 +264,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
 		if (!list_is_singular(&sem->wait_list))
 			rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
+		rwsem_set_owner(sem);
 		return true;
 	}
 
@@ -282,8 +284,10 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 			return false;
 
 		old = cmpxchg(&sem->count, count, count + RWSEM_ACTIVE_WRITE_BIAS);
-		if (old == count)
+		if (old == count) {
+			rwsem_set_owner(sem);
 			return true;
+		}
 
 		count = old;
 	}
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e2d3bc7..205be0c 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -9,29 +9,9 @@
 #include <linux/sched.h>
 #include <linux/export.h>
 #include <linux/rwsem.h>
-
 #include <linux/atomic.h>
 
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-	sem->owner = current;
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-	sem->owner = NULL;
-}
-
-#else
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-}
-#endif
+#include "rwsem.h"
 
 /*
  * lock for reading
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
new file mode 100644
index 0000000..870ed9a
--- /dev/null
+++ b/kernel/locking/rwsem.h
@@ -0,0 +1,20 @@
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	sem->owner = current;
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	sem->owner = NULL;
+}
+
+#else
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+}
+#endif
-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2015-01-26  7:36 ` [PATCH 3/6] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
@ 2015-01-26  7:36 ` Davidlohr Bueso
  2015-01-27 17:23   ` Jason Low
  2015-01-26  7:36 ` [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping Davidlohr Bueso
  2015-01-26  7:36 ` [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
  5 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

When readers hold the semaphore, the ->owner is nil. As such,
and unlike mutexes, '!owner' does not necessarily imply that
the lock is free. This will cause writer spinners to potentially
spin excessively, as they have been misled into thinking they have
a chance of acquiring the lock, instead of blocking.

This patch therefore replaces this bogus check to solely rely on
the counter to know if the lock is available. Because we don't
hold the wait lock, we can obviously do this in an unqueued
manner.
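
The new availability check can be modeled as follows (the constants
mirror the 64-bit xadd scheme in spirit, but are illustrative rather
than authoritative):

```c
#include <assert.h>

/* Illustrative 64-bit rwsem count encodings: one unit per active
 * locker in the low bits, plus a large negative bias while anyone
 * is queued on the wait list. */
#define RWSEM_WAITING_BIAS	(-(1L << 32))
#define RWSEM_ACTIVE_BIAS	1L
#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

/* The reworked spinner check: the lock is only worth a trylock when
 * there are no active lockers -- either completely free (0), or free
 * but with sleepers still queued (RWSEM_WAITING_BIAS). */
int lock_available(long count)
{
	return count == 0 || count == RWSEM_WAITING_BIAS;
}
```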

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-xadd.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 5e425d8..18a50da 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -335,6 +335,8 @@ static inline bool owner_running(struct rw_semaphore *sem,
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
+	long count;
+
 	rcu_read_lock();
 	while (owner_running(sem, owner)) {
 		if (need_resched())
@@ -347,9 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	/*
 	 * We break out the loop above on need_resched() or when the
 	 * owner changed, which is a sign for heavy contention. Return
-	 * success only when sem->owner is NULL.
+	 * success only when the lock is available in order to attempt
+	 * another trylock.
 	 */
-	return sem->owner == NULL;
+	count = READ_ONCE(sem->count);
+	return count == 0 || count == RWSEM_WAITING_BIAS;
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping
  2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2015-01-26  7:36 ` [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
@ 2015-01-26  7:36 ` Davidlohr Bueso
  2015-01-27 17:34   ` Peter Zijlstra
  2015-01-26  7:36 ` [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
  5 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

When blocking, we incur multiple barriers when setting the
task's uninterruptible state. This is particularly bad when the
lock keeps getting stolen from the task trying to acquire the sem.
These changes propose delaying setting the task's new state until
we are sure that calling schedule() is inevitable.

This implies that we do the trylock and active check (both basically
->counter checks) as TASK_RUNNING. For the trylock we hold the wait
lock with interrupts disabled, so no risk there. And for the active
check, the window in which we could get interrupted is quite small
and makes no tangible difference.
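
The reworked loop can be sketched in userspace as follows (schedule()
and the task state are stubbed out; this only models the control flow,
not the kernel's wakeup machinery):

```c
#include <assert.h>

/* Userspace sketch of the reworked sleep loop: the task state is set
 * to "sleeping" only once schedule() is truly unavoidable, and after
 * a wakeup the counter is re-checked while RUNNING, before touching
 * the wait_lock. All names are stand-ins for kernel primitives. */
#define MODEL_ACTIVE_MASK 0xffffffffL

enum model_state { RUNNING, UNINTERRUPTIBLE };
enum model_state task_state = RUNNING;
long sem_count;
int schedules;

void schedule_stub(void)		/* models schedule() */
{
	schedules++;
	task_state = RUNNING;		/* we wake up running */
	sem_count--;			/* a locker eventually drops out */
}

void sleep_until_no_active(void)
{
	task_state = UNINTERRUPTIBLE;	/* set just before sleeping */
	for (;;) {
		schedule_stub();
		if (!(sem_count & MODEL_ACTIVE_MASK))
			break;		/* counter checked as RUNNING */
		task_state = UNINTERRUPTIBLE; /* re-set only to sleep again */
	}
}
```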

This patch increases Unixbench's 'execl' throughput by 25% on a 40
core machine.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-xadd.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 18a50da..88b3468 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -459,17 +459,27 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
 	/* wait until we successfully acquire the lock */
-	set_current_state(TASK_UNINTERRUPTIBLE);
 	while (true) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
+
+		__set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irq(&sem->wait_lock);
 
-		/* Block until there are no active lockers. */
-		do {
+		/*
+		 * When there are active locks after we wake up,
+		 * the lock was probably stolen from us. Thus,
+		 * go immediately back to sleep and avoid taking
+		 * the wait_lock.
+		 */
+		while (true) {
 			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
-		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
+
+			count = READ_ONCE(sem->count);
+			if (!(count & RWSEM_ACTIVE_MASK))
+				break;
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+		}
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning
  2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2015-01-26  7:36 ` [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping Davidlohr Bueso
@ 2015-01-26  7:36 ` Davidlohr Bueso
  2015-01-27 18:11   ` Jason Low
  5 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-26  7:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Paul E. McKenney, Jason Low, Michel Lespinasse, Tim Chen,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

37e9562453b (locking/rwsem: Allow conservative optimistic
spinning when readers have lock) forced the default for
optimistic spinning to be disabled if the lock owner was
nil, which makes much sense for readers. However, while
it is not our priority, we can make some optimizations
for write-mostly workloads. We can bail out of the spinning step
and still be conservative if there are any active tasks;
otherwise there's really no reason not to spin, as the
semaphore is most likely unlocked.
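
The resulting policy can be summarized with a small userspace model
(owner_on_cpu stands in for owner->on_cpu; the RCU and need_resched()
handling in the real rwsem_can_spin_on_owner() is omitted):

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ACTIVE_MASK 0xffffffffL	/* illustrative */

/* Sketch of the reworked spin-entry decision:
 *   - owner set  -> spin only if the owner is running on a CPU
 *   - owner NULL -> spin unless the count shows active lockers,
 *                   which would mean readers may hold the lock */
int can_spin(const void *owner, int owner_on_cpu, long count)
{
	if (owner)
		return owner_on_cpu;
	return !(count & MODEL_ACTIVE_MASK);
}
```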

This patch recovers most of a Unixbench 'execl' benchmark
throughput by sleeping less and making better average system
usage:

before:
CPU     %user     %nice   %system   %iowait    %steal     %idle
all      0.60      0.00      8.02      0.00      0.00     91.38

after:
CPU     %user     %nice   %system   %iowait    %steal     %idle
all      1.22      0.00     70.18      0.00      0.00     28.60

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 88b3468..e0e9738 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -296,23 +296,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
-	bool on_cpu = false;
+	bool ret = true;
 
 	if (need_resched())
 		return false;
 
 	rcu_read_lock();
 	owner = ACCESS_ONCE(sem->owner);
-	if (owner)
-		on_cpu = owner->on_cpu;
-	rcu_read_unlock();
+	if (!owner) {
+		long count = ACCESS_ONCE(sem->count);
+		/*
+		 * If sem->owner is not set, yet we have just recently entered the
+		 * slowpath with the lock being active, then there is a possibility
+		 * reader(s) may have the lock. To be safe, bail spinning in these
+		 * situations.
+		 */
+		if (count & RWSEM_ACTIVE_MASK)
+			ret = false;
+		goto done;
+	}
 
-	/*
-	 * If sem->owner is not set, yet we have just recently entered the
-	 * slowpath, then there is a possibility reader(s) may have the lock.
-	 * To be safe, avoid spinning in these situations.
-	 */
-	return on_cpu;
+	ret = owner->on_cpu;
+done:
+	rcu_read_unlock();
+	return ret;
 }
 
 static inline bool owner_running(struct rw_semaphore *sem,
-- 
2.1.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks
  2015-01-26  7:36 ` [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
@ 2015-01-27 17:07   ` Peter Zijlstra
  2015-01-27 20:30     ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-27 17:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel, Davidlohr Bueso

On Sun, Jan 25, 2015 at 11:36:05PM -0800, Davidlohr Bueso wrote:
> The need for the smp_mb in __rwsem_do_wake should be
> properly documented. Applies to both xadd and spinlock
> variants.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/locking/rwsem-spinlock.c | 5 +++++
>  kernel/locking/rwsem-xadd.c     | 5 +++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
> index 2555ae1..54f7a17 100644
> --- a/kernel/locking/rwsem-spinlock.c
> +++ b/kernel/locking/rwsem-spinlock.c
> @@ -85,6 +85,11 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
>  
>  		list_del(&waiter->list);
>  		tsk = waiter->task;
> +		/*
> +		 * Ensure that all cores see the read before
> +		 * setting it to the waiter task to nil, as that
> +		 * grants the read lock to the next task.
> +		 */
>  		smp_mb();
>  		waiter->task = NULL;
>  		wake_up_process(tsk);

Could you enhance that comment by pointing at the pairing code? Is that
the wait loop in rwsem_down_read_failed()?

Also, the comment confuses: how can all cores observe a read into a
local variable?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] locking/rwsem: Set lock ownership ASAP
  2015-01-26  7:36 ` [PATCH 3/6] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
@ 2015-01-27 17:10   ` Peter Zijlstra
  2015-01-27 19:18     ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-27 17:10 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel, Davidlohr Bueso

On Sun, Jan 25, 2015 at 11:36:06PM -0800, Davidlohr Bueso wrote:
> In order to optimize the spinning step, we need to set the lock
> owner as soon as the lock is acquired; after a successful counter
> cmpxchg operation, that is. This is particularly useful as rwsems
> need to set the owner to nil for readers, so there is a greater
> chance of falling out of the spinning. Currently we only set the
> owner much later in the game, at the more generic level -- latency
> can be especially bad when waiting for a node->next pointer when
> releasing the osq in up_write calls.
> 
> As such, update the owner inside rwsem_try_write_lock (when the
> lock is obtained after blocking) and rwsem_try_write_lock_unqueued
> (when the lock is obtained while spinning). This requires creating
> a new internal rwsem.h header to share the owner related calls.
> 
> Also cleanup some headers for mutex and rwsem.

This is the thing I suggested
lkml.kernel.org/r/20150108103708.GE29390@twins.programming.kicks-ass.net
there right?

Do you have numbers for how much this gained?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-26  7:36 ` [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
@ 2015-01-27 17:23   ` Jason Low
  2015-01-28  3:54     ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Low @ 2015-01-27 17:23 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel, Davidlohr Bueso, jason.low2

On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote:
> When readers hold the semaphore, the ->owner is nil. As such,
> and unlike mutexes, '!owner' does not necessarily imply that
> the lock is free. This will cause writer spinners to potentially
> spin excessively, as they have been misled into thinking they have
> a chance of acquiring the lock, instead of blocking.
> 
> This patch therefore replaces this bogus check to solely rely on
> the counter to know if the lock is available. Because we don't
> hold the wait lock, we can obviously do this in an unqueued
> manner.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/locking/rwsem-xadd.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 5e425d8..18a50da 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -335,6 +335,8 @@ static inline bool owner_running(struct rw_semaphore *sem,
>  static noinline
>  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>  {
> +	long count;
> +
>  	rcu_read_lock();
>  	while (owner_running(sem, owner)) {
>  		if (need_resched())
> @@ -347,9 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
>  	/*
>  	 * We break out the loop above on need_resched() or when the
>  	 * owner changed, which is a sign for heavy contention. Return
> -	 * success only when sem->owner is NULL.
> +	 * success only when the lock is available in order to attempt
> +	 * another trylock.
>  	 */
> -	return sem->owner == NULL;
> +	count = READ_ONCE(sem->count);
> +	return count == 0 || count == RWSEM_WAITING_BIAS;

If we clear the owner field right before unlocking, would this cause
some situations where we spin until the owner is cleared (about to
release the lock), and then the spinner return false from
rwsem_spin_on_owner?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping
  2015-01-26  7:36 ` [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping Davidlohr Bueso
@ 2015-01-27 17:34   ` Peter Zijlstra
  2015-01-27 21:57     ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-27 17:34 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel, Davidlohr Bueso

On Sun, Jan 25, 2015 at 11:36:08PM -0800, Davidlohr Bueso wrote:
> When blocking, we incur multiple barriers when setting the
> task's uninterruptible state. This is particularly bad when the
> lock keeps getting stolen from the task trying to acquire the sem.
> These changes propose delaying setting the task's new state until
> we are sure that calling schedule is inevitable.
> 
> This implies that we do the trylock and active check (both basically
> ->counter checks) as TASK_RUNNING. For the trylock we hold the wait
> lock with interrupts disabled, so no risk there. And for the active
> check, the window for which we could get interrupted is quite small
> and makes no tangible difference.
> 
> This patch increases Unixbench's 'execl' throughput by 25% on a 40
> core machine.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/locking/rwsem-xadd.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 18a50da..88b3468 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -459,17 +459,27 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>  		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
>  
>  	/* wait until we successfully acquire the lock */
>  	while (true) {
>  		if (rwsem_try_write_lock(count, sem))
>  			break;
> +
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&sem->wait_lock);
>  
> +		/*
> +		 * When there are active locks after we wake up,
> +		 * the lock was probably stolen from us. Thus,
> +		 * go immediately back to sleep and avoid taking
> +		 * the wait_lock.
> +		 */
> +		while (true) {
>  			schedule();
> +
> +			count = READ_ONCE(sem->count);
> +			if (!(count & RWSEM_ACTIVE_MASK))
> +				break;
> +			__set_current_state(TASK_UNINTERRUPTIBLE);
> +		}

So it's late and I'm not seeing it; why is this safe? How will we not
miss the wakeup that makes the condition true?

>  
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning
  2015-01-26  7:36 ` [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
@ 2015-01-27 18:11   ` Jason Low
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Low @ 2015-01-27 18:11 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel, Davidlohr Bueso, jason.low2

On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote:
> 37e9562453b (locking/rwsem: Allow conservative optimistic
> spinning when readers have lock) forced the default for
> optimistic spinning to be disabled if the lock owner was
> nil, which makes much sense for readers. However, while
> it is not our priority, we can make some optimizations
> for write-mostly workloads. We can bail the spinning step
> and still be conservative if there are any active tasks,
> otherwise there's really no reason not to spin, as the
> semaphore is most likely unlocked.
> 
> This patch recovers most of a Unixbench 'execl' benchmark
> throughput by sleeping less and making better average system
> usage:
> 
> before:
> CPU     %user     %nice   %system   %iowait    %steal     %idle
> all      0.60      0.00      8.02      0.00      0.00     91.38
> 
> after:
> CPU     %user     %nice   %system   %iowait    %steal     %idle
> all      1.22      0.00     70.18      0.00      0.00     28.60
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 88b3468..e0e9738 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -296,23 +296,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  {
>  	struct task_struct *owner;
> -	bool on_cpu = false;
> +	bool ret = true;
>  
>  	if (need_resched())
>  		return false;
>  
>  	rcu_read_lock();
>  	owner = ACCESS_ONCE(sem->owner);
> -	if (owner)
> -		on_cpu = owner->on_cpu;
> -	rcu_read_unlock();
> +	if (!owner) {
> +		long count = ACCESS_ONCE(sem->count);
> +		/*
> +		 * If sem->owner is not set, yet we have just recently entered the
> +		 * slowpath with the lock being active, then there is a possibility
> +		 * reader(s) may have the lock. To be safe, bail spinning in these
> +		 * situations.
> +		 */
> +		if (count & RWSEM_ACTIVE_MASK)
> +			ret = false;
> +		goto done;
> +	}
>  
> -	/*
> -	 * If sem->owner is not set, yet we have just recently entered the
> -	 * slowpath, then there is a possibility reader(s) may have the lock.
> -	 * To be safe, avoid spinning in these situations.
> -	 */
> -	return on_cpu;
> +	ret = owner->on_cpu;
> +done:
> +	rcu_read_unlock();
> +	return ret;
>  }

Acked-by: Jason Low <jason.low2@hp.com>




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] locking/rwsem: Set lock ownership ASAP
  2015-01-27 17:10   ` Peter Zijlstra
@ 2015-01-27 19:18     ` Davidlohr Bueso
  0 siblings, 0 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-27 19:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel

On Tue, 2015-01-27 at 18:10 +0100, Peter Zijlstra wrote:
> This is the thing I suggested
> lkml.kernel.org/r/20150108103708.GE29390@twins.programming.kicks-ass.net
> there right?

Yeah.

> Do you have numbers for how much this gained?

This is more of a correctness patch, nothing really tangible for
performance -- although I did note a ~5% throughput increase, as that
particular workload pounds on the osq and we have to wait for the
node->next pointer. Otherwise, the window between when the lock is
taken and when the owner is set is very small.

Thanks,
Davidlohr




* Re: [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks
  2015-01-27 17:07   ` Peter Zijlstra
@ 2015-01-27 20:30     ` Davidlohr Bueso
  0 siblings, 0 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-27 20:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel

On Tue, 2015-01-27 at 18:07 +0100, Peter Zijlstra wrote:
> On Sun, Jan 25, 2015 at 11:36:05PM -0800, Davidlohr Bueso wrote:
> > The need for the smp_mb in __rwsem_do_wake should be
> > properly documented. Applies to both xadd and spinlock
> > variants.
> > 
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > ---
> >  kernel/locking/rwsem-spinlock.c | 5 +++++
> >  kernel/locking/rwsem-xadd.c     | 5 +++++
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
> > index 2555ae1..54f7a17 100644
> > --- a/kernel/locking/rwsem-spinlock.c
> > +++ b/kernel/locking/rwsem-spinlock.c
> > @@ -85,6 +85,11 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
> >  
> >  		list_del(&waiter->list);
> >  		tsk = waiter->task;
> > +		/*
> > +		 * Ensure that all cores see the read before
> > +		 * setting it to the waiter task to nil, as that
> > +		 * grants the read lock to the next task.
> > +		 */
> >  		smp_mb();
> >  		waiter->task = NULL;
> >  		wake_up_process(tsk);
> 
> Could you enhance that comment by pointing at the pairing code? Is that
> the wait loop in rwsem_down_read_failed()?

Yep.

> Also, the comment confuses, how can all cores observe a read into a
> local variable?

Yep, I'll rephrase that ;)



* Re: [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping
  2015-01-27 17:34   ` Peter Zijlstra
@ 2015-01-27 21:57     ` Davidlohr Bueso
  0 siblings, 0 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-27 21:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul E. McKenney, Jason Low, Michel Lespinasse,
	Tim Chen, linux-kernel

On Tue, 2015-01-27 at 18:34 +0100, Peter Zijlstra wrote:
> On Sun, Jan 25, 2015 at 11:36:08PM -0800, Davidlohr Bueso wrote:
> > When blocking, we incur multiple barriers when setting the
> > task's uninterruptible state. This is particularly bad when the
> > lock keeps getting stolen from the task trying to acquire the sem.
> > These changes propose delaying setting the task's new state until
> > we are sure that calling schedule is inevitable.
> > 
> > This implies that we do the trylock and active check (both basically
> > ->counter checks) as TASK_RUNNING. For the trylock we hold the wait
> > lock with interrupts disabled, so no risk there. And for the active
> > check, the window for which we could get interrupted is quite small
> > and makes no tangible difference.
> > 
> > This patch increases Unixbench's 'execl' throughput by 25% on a 40
> > core machine.
> > 
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > ---
> >  kernel/locking/rwsem-xadd.c | 20 +++++++++++++++-----
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index 18a50da..88b3468 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -459,17 +459,27 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
> >  		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
> >  
> >  	/* wait until we successfully acquire the lock */
> >  	while (true) {
> >  		if (rwsem_try_write_lock(count, sem))
> >  			break;
> > +
> > +		__set_current_state(TASK_UNINTERRUPTIBLE);
> >  		raw_spin_unlock_irq(&sem->wait_lock);
> >  
> > +		/*
> > +		 * When there are active locks after we wake up,
> > +		 * the lock was probably stolen from us. Thus,
> > +		 * go immediately back to sleep and avoid taking
> > +		 * the wait_lock.
> > +		 */
> > +		while (true) {
> >  			schedule();
> > +
> > +			count = READ_ONCE(sem->count);
> > +			if (!(count & RWSEM_ACTIVE_MASK))
> > +				break;
> > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > +		}
> 
> So it's late and I'm not seeing it; why is this safe? How will we not
> miss the wakeup that makes the condition true?

I was thinking preemption was disabled. But actually yeah, that patch
is now stale. We should only get rid of the first barrier; that should
be safe.

Thanks,
Davidlohr



* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-27 17:23   ` Jason Low
@ 2015-01-28  3:54     ` Davidlohr Bueso
  2015-01-28 17:01       ` Tim Chen
  2015-01-28 21:03       ` Jason Low
  0 siblings, 2 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-28  3:54 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel

On Tue, 2015-01-27 at 09:23 -0800, Jason Low wrote:
> On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote:
> > When readers hold the semaphore, the ->owner is nil. As such,
> > and unlike mutexes, '!owner' does not necessarily imply that
> > the lock is free. This will cause writer spinners to potentially
> > spin excessively as they've been misled into thinking they have
> > a chance of acquiring the lock, instead of blocking.
> > 
> > This patch therefore replaces this bogus check to solely rely on
> > the counter to know if the lock is available. Because we don't
> > hold the wait lock, we can obviously do this in an unqueued
> > manner.
> > 
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > ---
> >  kernel/locking/rwsem-xadd.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index 5e425d8..18a50da 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -335,6 +335,8 @@ static inline bool owner_running(struct rw_semaphore *sem,
> >  static noinline
> >  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> >  {
> > +	long count;
> > +
> >  	rcu_read_lock();
> >  	while (owner_running(sem, owner)) {
> >  		if (need_resched())
> > @@ -347,9 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> >  	/*
> >  	 * We break out the loop above on need_resched() or when the
> >  	 * owner changed, which is a sign for heavy contention. Return
> > -	 * success only when sem->owner is NULL.
> > +	 * success only when the lock is available in order to attempt
> > +	 * another trylock.
> >  	 */
> > -	return sem->owner == NULL;
> > +	count = READ_ONCE(sem->count);
> > +	return count == 0 || count == RWSEM_WAITING_BIAS;
> 
> If we clear the owner field right before unlocking, would this cause
> some situations where we spin until the owner is cleared (about to
> release the lock), and then the spinner return false from
> rwsem_spin_on_owner?

I'm not sure I understand your concern ;) could you rephrase that? 

So I think you're referring to the window between when we 1) clear the
->owner and 2) update the ->counter in the unlocking paths. That would
lead the function to break out of the loop ("owner changed") and return
a bogus "sem is locked, thus taken by a new owner now, continue
spinning" verdict (counter != 0 yet, for example). 

And that's perfectly fine, really. We've never held a strict
owner-counter dependency, and the owner pointer is completely
unreliable. So all this would end up doing is causing us to perform an
extra iteration per race. This is a pretty good tradeoff for what the
patch addresses.

Thanks,
Davidlohr



* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-28  3:54     ` Davidlohr Bueso
@ 2015-01-28 17:01       ` Tim Chen
  2015-01-28 21:03       ` Jason Low
  1 sibling, 0 replies; 26+ messages in thread
From: Tim Chen @ 2015-01-28 17:01 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Jason Low, Peter Zijlstra, Ingo Molnar, Paul E. McKenney,
	Michel Lespinasse, linux-kernel

On Tue, 2015-01-27 at 19:54 -0800, Davidlohr Bueso wrote:
> On Tue, 2015-01-27 at 09:23 -0800, Jason Low wrote:
> > On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote:
> > > When readers hold the semaphore, the ->owner is nil. As such,
> > > and unlike mutexes, '!owner' does not necessarily imply that
> > > the lock is free. This will cause writer spinners to potentially
> > > spin excessively as they've been misled into thinking they have
> > > a chance of acquiring the lock, instead of blocking.
> > > 
> > > This patch therefore replaces this bogus check to solely rely on
> > > the counter to know if the lock is available. Because we don't
> > > hold the wait lock, we can obviously do this in an unqueued
> > > manner.
> > > 
> > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > > ---
> > >  kernel/locking/rwsem-xadd.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > > index 5e425d8..18a50da 100644
> > > --- a/kernel/locking/rwsem-xadd.c
> > > +++ b/kernel/locking/rwsem-xadd.c
> > > @@ -335,6 +335,8 @@ static inline bool owner_running(struct rw_semaphore *sem,
> > >  static noinline
> > >  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> > >  {
> > > +	long count;
> > > +
> > >  	rcu_read_lock();
> > >  	while (owner_running(sem, owner)) {
> > >  		if (need_resched())
> > > @@ -347,9 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> > >  	/*
> > >  	 * We break out the loop above on need_resched() or when the
> > >  	 * owner changed, which is a sign for heavy contention. Return
> > > -	 * success only when sem->owner is NULL.
> > > +	 * success only when the lock is available in order to attempt
> > > +	 * another trylock.
> > >  	 */
> > > -	return sem->owner == NULL;
> > > +	count = READ_ONCE(sem->count);
> > > +	return count == 0 || count == RWSEM_WAITING_BIAS;
> > 
> > If we clear the owner field right before unlocking, would this cause
> > some situations where we spin until the owner is cleared (about to
> > release the lock), and then the spinner return false from
> > rwsem_spin_on_owner?
> 
> I'm not sure I understand your concern ;) could you rephrase that? 
> 
> So I think you're referring to the window between when we 1) clear the
> ->owner and 2) update the ->counter in the unlocking paths. That would
> lead the function to break out of the loop ("owner changed") and return
> a bogus "sem is locked, thus taken by a new owner now, continue
> spinning" reason for it (counter !=0 yet, for example). 
> 
> And that's perfectly fine, really. We've never held a strict
> owner-counter dependency, and the owner pointer is completely
> unreliable. So all this would end up doing is causing us to perform an
> extra iteration per race. This is a pretty good tradeoff for what the
> patch addresses.

I agree.  The counter is a more accurate and immediate indicator
of whether the lock is available, which is what we want to find out
here.

Tim




* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-28  3:54     ` Davidlohr Bueso
  2015-01-28 17:01       ` Tim Chen
@ 2015-01-28 21:03       ` Jason Low
  2015-01-29  1:10         ` Davidlohr Bueso
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Low @ 2015-01-28 21:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel, jason.low2

On Tue, 2015-01-27 at 19:54 -0800, Davidlohr Bueso wrote:
> On Tue, 2015-01-27 at 09:23 -0800, Jason Low wrote:
> > On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote:
> > > When readers hold the semaphore, the ->owner is nil. As such,
> > > and unlike mutexes, '!owner' does not necessarily imply that
> > > the lock is free. This will cause writer spinners to potentially
> > > spin excessively as they've been misled into thinking they have
> > > a chance of acquiring the lock, instead of blocking.
> > > 
> > > This patch therefore replaces this bogus check to solely rely on
> > > the counter to know if the lock is available. Because we don't
> > > hold the wait lock, we can obviously do this in an unqueued
> > > manner.
> > > 
> > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > > ---
> > >  kernel/locking/rwsem-xadd.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > > index 5e425d8..18a50da 100644
> > > --- a/kernel/locking/rwsem-xadd.c
> > > +++ b/kernel/locking/rwsem-xadd.c
> > > @@ -335,6 +335,8 @@ static inline bool owner_running(struct rw_semaphore *sem,
> > >  static noinline
> > >  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> > >  {
> > > +	long count;
> > > +
> > >  	rcu_read_lock();
> > >  	while (owner_running(sem, owner)) {
> > >  		if (need_resched())
> > > @@ -347,9 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> > >  	/*
> > >  	 * We break out the loop above on need_resched() or when the
> > >  	 * owner changed, which is a sign for heavy contention. Return
> > > -	 * success only when sem->owner is NULL.
> > > +	 * success only when the lock is available in order to attempt
> > > +	 * another trylock.
> > >  	 */
> > > -	return sem->owner == NULL;
> > > +	count = READ_ONCE(sem->count);
> > > +	return count == 0 || count == RWSEM_WAITING_BIAS;
> > 
> > If we clear the owner field right before unlocking, would this cause
> > some situations where we spin until the owner is cleared (about to
> > release the lock), and then the spinner return false from
> > rwsem_spin_on_owner?
> 
> I'm not sure I understand your concern ;) could you rephrase that? 

Sure, let me try to elaborate on that  :)

Since the unlocker clears the owner field before actually unlocking, I'm
thinking that with this patch, the spinner in rwsem_spin_on_owner()
would often read the count before the unlocker sets count.

When the owner releases the lock, it sets the owner field to NULL. This
causes the spinner to break out of the loop as the owner changed. The
spinner would then proceed to read sem->count, but before the owner
modifies sem->count.

Thread 1(owner)		Thread 2 (spinning on owner)
---------------		----------------------------
up_write()
  rwsem_clear_owner()
			owner_running() // returns false
			count = READ_ONCE(sem->count)
  __up_write()
			return (count == 0 || count == RWSEM_WAITING_BIAS) // returns false
			// going to slowpath




* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-28 21:03       ` Jason Low
@ 2015-01-29  1:10         ` Davidlohr Bueso
  2015-01-29 20:13           ` Jason Low
  0 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-29  1:10 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel

On Wed, 2015-01-28 at 13:03 -0800, Jason Low wrote:
> On Tue, 2015-01-27 at 19:54 -0800, Davidlohr Bueso wrote:
> > On Tue, 2015-01-27 at 09:23 -0800, Jason Low wrote:
> > > On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote:
> > > > When readers hold the semaphore, the ->owner is nil. As such,
> > > > and unlike mutexes, '!owner' does not necessarily imply that
> > > > the lock is free. This will cause writer spinners to potentially
> > > > spin excessively as they've been misled into thinking they have
> > > > a chance of acquiring the lock, instead of blocking.
> > > > 
> > > > This patch therefore replaces this bogus check to solely rely on
> > > > the counter to know if the lock is available. Because we don't
> > > > hold the wait lock, we can obviously do this in an unqueued
> > > > manner.
> > > > 
> > > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > > > ---
> > > >  kernel/locking/rwsem-xadd.c | 8 ++++++--
> > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > > > index 5e425d8..18a50da 100644
> > > > --- a/kernel/locking/rwsem-xadd.c
> > > > +++ b/kernel/locking/rwsem-xadd.c
> > > > @@ -335,6 +335,8 @@ static inline bool owner_running(struct rw_semaphore *sem,
> > > >  static noinline
> > > >  bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> > > >  {
> > > > +	long count;
> > > > +
> > > >  	rcu_read_lock();
> > > >  	while (owner_running(sem, owner)) {
> > > >  		if (need_resched())
> > > > @@ -347,9 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
> > > >  	/*
> > > >  	 * We break out the loop above on need_resched() or when the
> > > >  	 * owner changed, which is a sign for heavy contention. Return
> > > > -	 * success only when sem->owner is NULL.
> > > > +	 * success only when the lock is available in order to attempt
> > > > +	 * another trylock.
> > > >  	 */
> > > > -	return sem->owner == NULL;
> > > > +	count = READ_ONCE(sem->count);
> > > > +	return count == 0 || count == RWSEM_WAITING_BIAS;
> > > 
> > > If we clear the owner field right before unlocking, would this cause
> > > some situations where we spin until the owner is cleared (about to
> > > release the lock), and then the spinner return false from
> > > rwsem_spin_on_owner?
> > 
> > I'm not sure I understand your concern ;) could you rephrase that? 
> 
> Sure, let me try to elaborate on that  :)
> 
> Since the unlocker clears the owner field before actually unlocking, I'm
> thinking that with this patch, the spinner in rwsem_spin_on_owner()
> would often read the count before the unlocker sets count.
> 
> When the owner releases the lock, it sets the owner field to NULL. This
> causes the spinner to break out of the loop as the owner changed. The
> spinner would then proceed to read sem->count, but before the owner
> modifies sem->count.
> 
> Thread 1(owner)		Thread 2 (spinning on owner)
> ---------------		----------------------------
> up_write()
>   rwsem_clear_owner()
> 			owner_running() // returns false
> 			count = READ_ONCE(sem->count)
>   __up_write()
> 			return (count == 0 || count == RWSEM_WAITING_BIAS) // returns false
> 			// going to slowpath

Yeah, I think that's something we'll just have to live with. Dealing
with owner and counter state will inevitably be imperfect at some point.
I think the race window is still quite small, and the change still deals
with a much more concerning problem.

After taking another look at the patch, I do think we should account for
the owner before returning, though: when the owner is set by the time we
break out of the loop, there's a new owner (i.e. the lock was stolen), so
we should return true as we want to continue spinning. In this patch it
would return false. We can also return immediately on need_resched(),
making rwsem_spin_on_owner() less ambiguous. What do you think of having
this instead?

bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
{
	long count;

	rcu_read_lock();
	while (owner_running(sem, owner)) {
		/* abort spinning when need_resched */
		if (need_resched()) {
			rcu_read_unlock();
			return false;
		}
		
		cpu_relax_lowlatency();
	}
	rcu_read_unlock();

	if (READ_ONCE(sem->owner))
		return true; /* new owner, continue spinning */
	
	/*
	 * When the owner is not set, the lock could be free or
	 * held by readers. Check the counter to verify the
	 * state.
	 */
	count = READ_ONCE(sem->count);
	return (count == 0 || count == RWSEM_WAITING_BIAS);
}

Thanks,
Davidlohr




* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-29  1:10         ` Davidlohr Bueso
@ 2015-01-29 20:13           ` Jason Low
  2015-01-29 20:18             ` Jason Low
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Low @ 2015-01-29 20:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel, jason.low2

On Wed, 2015-01-28 at 17:10 -0800, Davidlohr Bueso wrote:

> 	if (READ_ONCE(sem->owner))
> 		return true; /* new owner, continue spinning */

In terms of the sem->owner check, I agree. This also reminds me of a
similar patch I was going to send out, but for mutex. The idea is that
before we added all the MCS logic, we wanted to return false if owner
isn't NULL to prevent too many threads simultaneously spinning on the
owner at the same time.

With the new MCS queuing, we don't have to worry about that issue, so it
makes sense to continue spinning as long as owner is running and the
spinner doesn't need to reschedule.




* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-29 20:13           ` Jason Low
@ 2015-01-29 20:18             ` Jason Low
  2015-01-29 23:15               ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Low @ 2015-01-29 20:18 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel, jason.low2

On Thu, 2015-01-29 at 12:13 -0800, Jason Low wrote:
> On Wed, 2015-01-28 at 17:10 -0800, Davidlohr Bueso wrote:
> 
> > 	if (READ_ONCE(sem->owner))
> > 		return true; /* new owner, continue spinning */
> 
> In terms of the sem->owner check, I agree. This also reminds me of a
> similar patch I was going to send out, but for mutex. The idea is that
> before we added all the MCS logic, we wanted to return false if owner
> isn't NULL to prevent too many threads simultaneously spinning on the
> owner at the same time.
> 
> With the new MCS queuing, we don't have to worry about that issue, so it
> makes sense to continue spinning as long as owner is running and the
> spinner doesn't need to reschedule.

By the way, here was the patch that I was referring to:

-------------------------------------------------------

In the mutex_spin_on_owner(), we return true only if lock->owner == NULL.
This was beneficial in situations where there were multiple threads
simultaneously spinning for the mutex. If another thread got the lock
while other spinner(s) were also doing mutex_spin_on_owner(), then the
other spinners would stop spinning. This workaround helped reduce the
chance that many spinners were simultaneously spinning for the mutex,
which can help improve performance in highly contended cases.

However, recent changes were made to the optimistic spinning code such
that instead of having all spinners simultaneously spin for the mutex,
we queue the spinners with an MCS lock such that only one thread spins
for the mutex at a time.

Now, we don't have to worry about multiple threads spinning on the owner
at the same time, and if lock->owner is not NULL at this point, it likely
means another thread happened to obtain the lock in the fastpath.

This patch changes this so that mutex_spin_on_owner() return true when
the lock owner changes, which means a thread will only stop spinning
if it either needs to reschedule or if the lock owner is not running.

We saw up to a 5% performance improvement in the fserver workload with
this patch.

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/mutex.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 8d1e2c1..e94e6b8 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -163,11 +163,11 @@ int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 	rcu_read_unlock();
 
 	/*
-	 * We break out the loop above on need_resched() and when the
-	 * owner changed, which is a sign for heavy contention. Return
-	 * success only when lock->owner is NULL.
+	 * We break out the loop above on either need_resched(), when
+	 * the owner is not running, or when the lock owner changed.
+	 * Return success only when the lock owner changed.
 	 */
-	return lock->owner == NULL;
+	return lock->owner != owner;
 }
 
 /*
-- 
1.7.1





* Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners
  2015-01-29 20:18             ` Jason Low
@ 2015-01-29 23:15               ` Davidlohr Bueso
  2015-01-30  1:52                 ` Refactoring mutex spin on owner code Jason Low
  0 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-29 23:15 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel

On Thu, 2015-01-29 at 12:18 -0800, Jason Low wrote:
>  	/*
> -	 * We break out the loop above on need_resched() and when the
> -	 * owner changed, which is a sign for heavy contention. Return
> -	 * success only when lock->owner is NULL.
> +	 * We break out the loop above on either need_resched(), when
> +	 * the owner is not running, or when the lock owner changed.
> +	 * Return success only when the lock owner changed.
>  	 */
> -	return lock->owner == NULL;
> +	return lock->owner != owner;
>  }

Ideally we would refactor all this, along with getting rid of
owner_running() at some point. It no longer makes sense to split up
mutex_spin_on_owner() and we're doing duplicate owner checks. It would
also be simpler than having to guess why we broke out of the loop, for
example.



* Re: Refactoring mutex spin on owner code
  2015-01-29 23:15               ` Davidlohr Bueso
@ 2015-01-30  1:52                 ` Jason Low
  2015-01-30  7:14                   ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Low @ 2015-01-30  1:52 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel, jason.low2

On Thu, 2015-01-29 at 15:15 -0800, Davidlohr Bueso wrote:
> On Thu, 2015-01-29 at 12:18 -0800, Jason Low wrote:
> >  	/*
> > -	 * We break out the loop above on need_resched() and when the
> > -	 * owner changed, which is a sign for heavy contention. Return
> > -	 * success only when lock->owner is NULL.
> > +	 * We break out the loop above on either need_resched(), when
> > +	 * the owner is not running, or when the lock owner changed.
> > +	 * Return success only when the lock owner changed.
> >  	 */
> > -	return lock->owner == NULL;
> > +	return lock->owner != owner;
> >  }
> 
> Ideally we would refactor all this, along with getting rid of
> owner_running() at some point. It no longer makes sense to split up
> mutex_spin_on_owner() and we're doing duplicate owner checks. It would
> also be simpler than having to guess why we broke out of the loop, for
> example.

Sure, that makes sense. What do you think of this additional change for
refactoring the mutex version?

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 8711505..b6a8633 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -204,44 +204,45 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
  * Mutex spinning code migrated from kernel/sched/core.c
  */
 
-static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
-{
-	if (lock->owner != owner)
-		return false;
-
-	/*
-	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
-	 * lock->owner still matches owner, if that fails, owner might
-	 * point to free()d memory, if it still matches, the rcu_read_lock()
-	 * ensures the memory stays valid.
-	 */
-	barrier();
-
-	return owner->on_cpu;
-}
-
 /*
  * Look out! "owner" is an entirely speculative pointer
  * access and not reliable.
  */
 static noinline
-int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 {
+	bool ret;
+
 	rcu_read_lock();
-	while (owner_running(lock, owner)) {
-		if (need_resched())
+	while (true) {
+		/* Return success when the lock owner changed */
+		if (lock->owner != owner) {
+			ret = true;
 			break;
+		}
+
+		/*
+		 * Ensure we emit the owner->on_cpu, dereference _after_
+		 * checking lock->owner still matches owner, if that fails,
+		 * owner might point to free()d memory, if it still matches,
+		 * the rcu_read_lock() ensures the memory stays valid.
+		 */
+		barrier();
+
+		/*
+		 * Stop spinning if we need to reschedule or if owner is
+		 * not running.
+		 */
+		if (!owner->on_cpu || need_resched()) {
+			ret = false;
+			break;
+		}
 
 		cpu_relax_lowlatency();
 	}
 	rcu_read_unlock();
 
-	/*
-	 * We break out the loop above on either need_resched(), when
-	 * the owner is not running, or when the lock owner changed.
-	 * Return success only when the lock owner changed.
-	 */
-	return lock->owner != owner;
+	return ret;
 }
 
 /*




* Re: Refactoring mutex spin on owner code
  2015-01-30  1:52                 ` Refactoring mutex spin on owner code Jason Low
@ 2015-01-30  7:14                   ` Davidlohr Bueso
  2015-01-30  7:51                     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Davidlohr Bueso @ 2015-01-30  7:14 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel

On Thu, 2015-01-29 at 17:52 -0800, Jason Low wrote:
> On Thu, 2015-01-29 at 15:15 -0800, Davidlohr Bueso wrote:
> > On Thu, 2015-01-29 at 12:18 -0800, Jason Low wrote:
> > >  	/*
> > > -	 * We break out the loop above on need_resched() and when the
> > > -	 * owner changed, which is a sign for heavy contention. Return
> > > -	 * success only when lock->owner is NULL.
> > > +	 * We break out the loop above on either need_resched(), when
> > > +	 * the owner is not running, or when the lock owner changed.
> > > +	 * Return success only when the lock owner changed.
> > >  	 */
> > > -	return lock->owner == NULL;
> > > +	return lock->owner != owner;
> > >  }
> > 
> > Ideally we would refactor all this, along with getting rid of
> > owner_running() at some point. It no longer makes sense to split up
> > mutex_spin_on_owner() and we're doing duplicate owner checks. It would
> > also be simpler than having to guess why we broke out of the loop, for
> > example.
> 
> Sure, that makes sense. What do you think of this additional change for
> refactoring the mutex version?
> 
> ---
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 8711505..b6a8633 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -204,44 +204,45 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
>   * Mutex spinning code migrated from kernel/sched/core.c
>   */
>  
> -static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
> -{
> -	if (lock->owner != owner)
> -		return false;
> -
> -	/*
> -	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
> -	 * lock->owner still matches owner, if that fails, owner might
> -	 * point to free()d memory, if it still matches, the rcu_read_lock()
> -	 * ensures the memory stays valid.
> -	 */
> -	barrier();
> -
> -	return owner->on_cpu;
> -}
> -
>  /*
>   * Look out! "owner" is an entirely speculative pointer
>   * access and not reliable.
>   */
>  static noinline
> -int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
> +bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
>  {
> +	bool ret;
> +
>  	rcu_read_lock();
> -	while (owner_running(lock, owner)) {
> -		if (need_resched())
> +	while (true) {
> +		/* Return success when the lock owner changed */
> +		if (lock->owner != owner) {

Shouldn't this be a READ_ONCE(lock->owner)? We're in a loop and need to
avoid gcc giving us stale data if the owner is updated after a few
iterations, no?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Refactoring mutex spin on owner code
  2015-01-30  7:14                   ` Davidlohr Bueso
@ 2015-01-30  7:51                     ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-01-30  7:51 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Jason Low, Ingo Molnar, Paul E. McKenney, Michel Lespinasse,
	Tim Chen, linux-kernel

On Thu, Jan 29, 2015 at 11:14:40PM -0800, Davidlohr Bueso wrote:
> > +bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
> >  {
> > +	bool ret;
> > +
> >  	rcu_read_lock();
> > -	while (owner_running(lock, owner)) {
> > -		if (need_resched())
> > +	while (true) {
> > +		/* Return success when the lock owner changed */
> > +		if (lock->owner != owner) {
> 
> Shouldn't this be a READ_ONCE(lock->owner)? We're in a loop and need to
> avoid gcc giving us stale data if the owner is updated after a few
> iterations, no?

There's a barrier() in that loop, and cpu_relax() also implies
barrier(). I'm pretty sure that's more than sufficient to make GCC emit
loads.


* [tip:locking/core] locking/rwsem: Use task->state helpers
  2015-01-26  7:36 ` [PATCH 1/6] locking/rwsem: Use task->state helpers Davidlohr Bueso
@ 2015-02-04 14:38   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 26+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2015-02-04 14:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, jason.low2, paulmck, dave, tglx, torvalds, dbueso,
	hpa, peterz, walken, tim.c.chen, mingo

Commit-ID:  73105994c57d06e40a33ab5a716db04e898b4c05
Gitweb:     http://git.kernel.org/tip/73105994c57d06e40a33ab5a716db04e898b4c05
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Sun, 25 Jan 2015 23:36:04 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 4 Feb 2015 07:57:39 +0100

locking/rwsem: Use task->state helpers

Call __set_task_state() instead of assigning the new state
directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP
environments, keeping track of who last changed the state.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1422257769-14083-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-spinlock.c | 2 +-
 kernel/locking/rwsem-xadd.c     | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 2c93571..2555ae1 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -154,7 +154,7 @@ void __sched __down_read(struct rw_semaphore *sem)
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 	}
 
-	tsk->state = TASK_RUNNING;
+	__set_task_state(tsk, TASK_RUNNING);
  out:
 	;
 }
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7628c3f..2f7cc40 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -242,8 +242,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 		schedule();
 	}
 
-	tsk->state = TASK_RUNNING;
-
+	__set_task_state(tsk, TASK_RUNNING);
 	return sem;
 }
 EXPORT_SYMBOL(rwsem_down_read_failed);


end of thread, other threads:[~2015-02-04 14:39 UTC | newest]

Thread overview: 26+ messages
-- links below jump to the message on this page --
2015-01-26  7:36 [PATCH -tip 0/6] rwsem: Fine tuning Davidlohr Bueso
2015-01-26  7:36 ` [PATCH 1/6] locking/rwsem: Use task->state helpers Davidlohr Bueso
2015-02-04 14:38   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2015-01-26  7:36 ` [PATCH 2/6] locking/rwsem: Document barrier need when waking tasks Davidlohr Bueso
2015-01-27 17:07   ` Peter Zijlstra
2015-01-27 20:30     ` Davidlohr Bueso
2015-01-26  7:36 ` [PATCH 3/6] locking/rwsem: Set lock ownership ASAP Davidlohr Bueso
2015-01-27 17:10   ` Peter Zijlstra
2015-01-27 19:18     ` Davidlohr Bueso
2015-01-26  7:36 ` [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners Davidlohr Bueso
2015-01-27 17:23   ` Jason Low
2015-01-28  3:54     ` Davidlohr Bueso
2015-01-28 17:01       ` Tim Chen
2015-01-28 21:03       ` Jason Low
2015-01-29  1:10         ` Davidlohr Bueso
2015-01-29 20:13           ` Jason Low
2015-01-29 20:18             ` Jason Low
2015-01-29 23:15               ` Davidlohr Bueso
2015-01-30  1:52                 ` Refactoring mutex spin on owner code Jason Low
2015-01-30  7:14                   ` Davidlohr Bueso
2015-01-30  7:51                     ` Peter Zijlstra
2015-01-26  7:36 ` [PATCH 5/6] locking/rwsem: Optimize slowpath/sleeping Davidlohr Bueso
2015-01-27 17:34   ` Peter Zijlstra
2015-01-27 21:57     ` Davidlohr Bueso
2015-01-26  7:36 ` [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning Davidlohr Bueso
2015-01-27 18:11   ` Jason Low
