linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] mutex: Mutex scalability patches
@ 2014-01-28 19:13 Jason Low
  2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
                   ` (5 more replies)
  0 siblings, 6 replies; 51+ messages in thread
From: Jason Low @ 2014-01-28 19:13 UTC (permalink / raw)
  To: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx, jason.low2
  Cc: linux-kernel, riel, akpm, davidlohr, hpa, andi, aswin,
	scott.norton, chegu_vinod

v1->v2:
- Replace the previous patch that limits the # of times a thread can spin with
  !lock->owner with a patch that releases the mutex before holding the wait_lock
  in the __mutex_unlock_common_slowpath() function.
- Add a patch which allows a thread to attempt 1 mutex_spin_on_owner() without
  checking need_resched() if need_resched() triggered while in the MCS queue.
- Add a patch which disables preemption between modifying lock->owner and
  acquiring/releasing the mutex.

This patchset addresses a few scalability issues with mutexes.

Patch 1 has the mutex_can_spin_on_owner() function check for need_resched()
before the task is added to the MCS queue.

Patches 2 and 3 fix issues with threads spinning when there is no lock owner
while the mutex is under high contention.

Patches 4 and 5 are RFC patches. Patch 4 disables preemption between modifying
lock->owner and locking/unlocking the mutex. Patch 5 addresses the situation
where spinners can potentially wait a long time in the MCS queue for a chance
to spin on the mutex owner (without checking for need_resched()), yet end up
not getting to spin.

These changes benefit the AIM7 fserver and high_systime workloads (run on disk)
on an 8 socket, 80 core box. The tables below show the performance
improvements with 3.13 + patches 1, 2, 3 when compared to the 3.13 baseline,
and the performance improvements with 3.13 + all 5 patches compared to
the 3.13 baseline.

Note: I split the % improvement into these 2 categories because
patch 3 and patch 5 are the most interesting/important patches in
this patchset in terms of performance improvements.

---------------------------------------------------------------
		high_systime
---------------------------------------------------------------
# users   | avg % improvement with | avg % improvement with
          | 3.13 + patch 1, 2, 3   | 3.13 + patch 1, 2, 3, 4, 5
---------------------------------------------------------------
1000-2000 |    +27.05%             |    +53.35%
---------------------------------------------------------------
100-900   |    +36.11%             |    +52.56%
---------------------------------------------------------------
10-90     |     +2.47%             |     +4.05%
---------------------------------------------------------------


---------------------------------------------------------------
		fserver
---------------------------------------------------------------
# users   | avg % improvement with | avg % improvement with
          | 3.13 + patch 1, 2, 3   | 3.13 + patch 1, 2, 3, 4, 5
---------------------------------------------------------------
1000-2000 |    +18.31%             |    +37.65%
---------------------------------------------------------------
100-900   |     +5.99%             |    +17.50%
---------------------------------------------------------------
10-90     |     +2.47%             |     +6.10%





* [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched()
  2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
@ 2014-01-28 19:13 ` Jason Low
  2014-01-28 20:20   ` Paul E. McKenney
                     ` (2 more replies)
  2014-01-28 19:13 ` [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued Jason Low
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 51+ messages in thread
From: Jason Low @ 2014-01-28 19:13 UTC (permalink / raw)
  To: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx, jason.low2
  Cc: linux-kernel, riel, akpm, davidlohr, hpa, andi, aswin,
	scott.norton, chegu_vinod

The mutex_can_spin_on_owner() function should also return false if the
task needs to be rescheduled to avoid entering the MCS queue when it
needs to reschedule.
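
For reference, the caller in __mutex_lock_common() already bails out to the
slowpath when this function returns 0, so with this check the task no longer
touches the MCS queue when a reschedule is pending (sketch of the existing
call site, shown for illustration only):

	if (!mutex_can_spin_on_owner(lock))
		goto slowpath;	/* now also taken when need_resched() is set */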

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/mutex.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4dd6e4c..85c6be1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -212,6 +212,9 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
+	if (need_resched())
+		return 0;
+
 	rcu_read_lock();
 	owner = ACCESS_ONCE(lock->owner);
 	if (owner)
-- 
1.7.1



* [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued
  2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
  2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
@ 2014-01-28 19:13 ` Jason Low
  2014-01-28 20:23   ` Paul E. McKenney
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
  2014-01-28 19:13 ` [PATCH v2 3/5] mutex: Unlock the mutex without the wait_lock Jason Low
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 51+ messages in thread
From: Jason Low @ 2014-01-28 19:13 UTC (permalink / raw)
  To: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx, jason.low2
  Cc: linux-kernel, riel, akpm, davidlohr, hpa, andi, aswin,
	scott.norton, chegu_vinod

The mutex->spin_mlock was introduced in order to ensure that only 1 thread
spins for lock acquisition at a time to reduce cache line contention. When
lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
continually release and obtain the lock->spin_mlock. This can generate
quite a bit of overhead/contention, and also might just delay the spinner
from getting the lock.

This patch modifies the way optimistic spinners are queued by queuing before
entering the optimistic spinning loop, as opposed to acquiring the spin_mlock
before every call to mutex_spin_on_owner(). In situations where the spinner
requires a few extra spins before obtaining the lock, there will then only be
1 spinner trying to get the lock, and it avoids the overhead of unnecessarily
unlocking and locking the spin_mlock.
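
To make the resulting control flow easier to see, the optimistic spinning
loop after this change roughly looks like the sketch below (condensed from
the diff that follows; the ww_mutex handling and the details of the
lock-acquired path are elided into comments):

	mspin_lock(MLOCK(lock), &node);		/* queue once, before the loop */
	for (;;) {
		struct task_struct *owner;

		owner = ACCESS_ONCE(lock->owner);
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;			/* stop spinning */

		if ((atomic_read(&lock->count) == 1) &&
		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
			/* lock acquired: set owner, mspin_unlock(), return 0 */
		}

		if (!owner && (need_resched() || rt_task(task)))
			break;

		arch_mutex_cpu_relax();
	}
	mspin_unlock(MLOCK(lock), &node);	/* unqueue once, after the loop */
slowpath: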

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/mutex.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 85c6be1..7519d27 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -419,6 +419,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct mutex_waiter waiter;
 	unsigned long flags;
 	int ret;
+	struct mspin_node node;
 
 	preempt_disable();
 	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
@@ -449,9 +450,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (!mutex_can_spin_on_owner(lock))
 		goto slowpath;
 
+	mspin_lock(MLOCK(lock), &node);
 	for (;;) {
 		struct task_struct *owner;
-		struct mspin_node  node;
 
 		if (use_ww_ctx && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
@@ -466,19 +467,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			 * performed the optimistic spinning cannot be done.
 			 */
 			if (ACCESS_ONCE(ww->ctx))
-				goto slowpath;
+				break;
 		}
 
 		/*
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		mspin_lock(MLOCK(lock), &node);
 		owner = ACCESS_ONCE(lock->owner);
-		if (owner && !mutex_spin_on_owner(lock, owner)) {
-			mspin_unlock(MLOCK(lock), &node);
-			goto slowpath;
-		}
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
 
 		if ((atomic_read(&lock->count) == 1) &&
 		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
@@ -495,7 +493,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			preempt_enable();
 			return 0;
 		}
-		mspin_unlock(MLOCK(lock), &node);
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -504,7 +501,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * the owner complete.
 		 */
 		if (!owner && (need_resched() || rt_task(task)))
-			goto slowpath;
+			break;
 
 		/*
 		 * The cpu_relax() call is a compiler barrier which forces
@@ -514,6 +511,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 */
 		arch_mutex_cpu_relax();
 	}
+	mspin_unlock(MLOCK(lock), &node);
 slowpath:
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
-- 
1.7.1



* [PATCH v2 3/5] mutex: Unlock the mutex without the wait_lock
  2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
  2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
  2014-01-28 19:13 ` [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued Jason Low
@ 2014-01-28 19:13 ` Jason Low
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
  2014-01-28 19:13 ` [RFC][PATCH v2 4/5] mutex: Disable preemption between modifying lock->owner and locking/unlocking mutex Jason Low
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-28 19:13 UTC (permalink / raw)
  To: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx, jason.low2
  Cc: linux-kernel, riel, akpm, davidlohr, hpa, andi, aswin,
	scott.norton, chegu_vinod

When running workloads that have high contention in mutexes on an 8 socket
machine, mutex spinners would often spin for a long time with no lock owner.

The main reason this is occurring is that in __mutex_unlock_common_slowpath(),
if __mutex_slowpath_needs_to_unlock() is true, the owner needs to acquire the
mutex->wait_lock before releasing the mutex (setting lock->count to 1). When
the wait_lock is contended, this delays the mutex from being released.
We should be able to release the mutex without holding the wait_lock.
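
The reordering in __mutex_unlock_common_slowpath() is roughly the following
(sketch only; the lockdep/debug calls and the waiter wakeup are abbreviated):

	/* before: the release happens while holding the wait_lock */
	spin_lock_mutex(&lock->wait_lock, flags);
	if (__mutex_slowpath_needs_to_unlock())
		atomic_set(&lock->count, 1);	/* release under wait_lock */
	/* ... wake up the first waiter ... */
	spin_unlock_mutex(&lock->wait_lock, flags);

	/* after: release first, take the wait_lock only to wake a waiter */
	if (__mutex_slowpath_needs_to_unlock())
		atomic_set(&lock->count, 1);	/* lock now available to spinners */
	spin_lock_mutex(&lock->wait_lock, flags);
	/* ... wake up the first waiter ... */
	spin_unlock_mutex(&lock->wait_lock, flags);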

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/mutex.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 7519d27..6d85b08 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -718,10 +718,6 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 	unsigned long flags;
 
-	spin_lock_mutex(&lock->wait_lock, flags);
-	mutex_release(&lock->dep_map, nested, _RET_IP_);
-	debug_mutex_unlock(lock);
-
 	/*
 	 * some architectures leave the lock unlocked in the fastpath failure
 	 * case, others need to leave it locked. In the later case we have to
@@ -730,6 +726,10 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 	if (__mutex_slowpath_needs_to_unlock())
 		atomic_set(&lock->count, 1);
 
+	spin_lock_mutex(&lock->wait_lock, flags);
+	mutex_release(&lock->dep_map, nested, _RET_IP_);
+	debug_mutex_unlock(lock);
+
 	if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =
-- 
1.7.1



* [RFC][PATCH v2 4/5] mutex: Disable preemption between modifying lock->owner and locking/unlocking mutex
  2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
                   ` (2 preceding siblings ...)
  2014-01-28 19:13 ` [PATCH v2 3/5] mutex: Unlock the mutex without the wait_lock Jason Low
@ 2014-01-28 19:13 ` Jason Low
  2014-01-28 20:54   ` Peter Zijlstra
  2014-01-28 19:13 ` [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued Jason Low
  2014-01-28 21:08 ` [PATCH v2 0/5] mutex: Mutex scalability patches Davidlohr Bueso
  5 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-28 19:13 UTC (permalink / raw)
  To: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx, jason.low2
  Cc: linux-kernel, riel, akpm, davidlohr, hpa, andi, aswin,
	scott.norton, chegu_vinod

This RFC patch disables preemption between modifying lock->owner and
locking/unlocking the mutex. This prevents situations where the owner gets
preempted between those 2 operations, which defeats the optimistic spinners'
check of whether lock->owner is currently running on a CPU. As mentioned in
the thread for the v1 patchset, disabling preemption is a cheap operation.
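
The pattern applied to each fastpath is roughly the following (sketch, using
mutex_lock() as the example; see the diff for the other call sites):

	preempt_disable();
	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
	mutex_set_owner(lock);	/* owner can no longer be preempted in between */
	preempt_enable();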

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/mutex.c |   30 ++++++++++++++++++++++++++++--
 1 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6d85b08..cfaaf53 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -98,8 +98,10 @@ void __sched mutex_lock(struct mutex *lock)
 	 * The locking fastpath is the 1->0 transition from
 	 * 'unlocked' into 'locked' state.
 	 */
+	preempt_disable();
 	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
 	mutex_set_owner(lock);
+	preempt_enable();
 }
 
 EXPORT_SYMBOL(mutex_lock);
@@ -253,9 +255,13 @@ void __sched mutex_unlock(struct mutex *lock)
 	 * the slow path will always be taken, and that clears the owner field
 	 * after verifying that it was indeed current.
 	 */
+	preempt_disable();
 	mutex_clear_owner(lock);
 #endif
 	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
+#ifndef CONFIG_DEBUG_MUTEXES
+	preempt_enable();
+#endif
 }
 
 EXPORT_SYMBOL(mutex_unlock);
@@ -292,9 +298,13 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 	 * the slow path will always be taken, and that clears the owner field
 	 * after verifying that it was indeed current.
 	 */
+	preempt_disable();
 	mutex_clear_owner(&lock->base);
 #endif
 	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+#ifndef CONFIG_DEBUG_MUTEXES
+	preempt_enable();
+#endif
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
@@ -780,12 +790,16 @@ int __sched mutex_lock_interruptible(struct mutex *lock)
 	int ret;
 
 	might_sleep();
+	preempt_disable();
 	ret =  __mutex_fastpath_lock_retval(&lock->count);
 	if (likely(!ret)) {
 		mutex_set_owner(lock);
+		preempt_enable();
 		return 0;
-	} else
+	} else {
+		preempt_enable();
 		return __mutex_lock_interruptible_slowpath(lock);
+	}
 }
 
 EXPORT_SYMBOL(mutex_lock_interruptible);
@@ -795,12 +809,16 @@ int __sched mutex_lock_killable(struct mutex *lock)
 	int ret;
 
 	might_sleep();
+	preempt_disable();
 	ret = __mutex_fastpath_lock_retval(&lock->count);
 	if (likely(!ret)) {
 		mutex_set_owner(lock);
+		preempt_enable();
 		return 0;
-	} else
+	} else {
+		preempt_enable();
 		return __mutex_lock_killable_slowpath(lock);
+	}
 }
 EXPORT_SYMBOL(mutex_lock_killable);
 
@@ -889,9 +907,11 @@ int __sched mutex_trylock(struct mutex *lock)
 {
 	int ret;
 
+	preempt_disable();
 	ret = __mutex_fastpath_trylock(&lock->count, __mutex_trylock_slowpath);
 	if (ret)
 		mutex_set_owner(lock);
+	preempt_enable();
 
 	return ret;
 }
@@ -904,6 +924,7 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	int ret;
 
 	might_sleep();
+	preempt_disable();
 
 	ret = __mutex_fastpath_lock_retval(&lock->base.count);
 
@@ -912,6 +933,8 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 		mutex_set_owner(&lock->base);
 	} else
 		ret = __ww_mutex_lock_slowpath(lock, ctx);
+
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL(__ww_mutex_lock);
@@ -922,6 +945,7 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	int ret;
 
 	might_sleep();
+	preempt_disable();
 
 	ret = __mutex_fastpath_lock_retval(&lock->base.count);
 
@@ -930,6 +954,8 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 		mutex_set_owner(&lock->base);
 	} else
 		ret = __ww_mutex_lock_interruptible_slowpath(lock, ctx);
+
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL(__ww_mutex_lock_interruptible);
-- 
1.7.1



* [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
                   ` (3 preceding siblings ...)
  2014-01-28 19:13 ` [RFC][PATCH v2 4/5] mutex: Disable preemption between modifying lock->owner and locking/unlocking mutex Jason Low
@ 2014-01-28 19:13 ` Jason Low
  2014-01-28 21:07   ` Peter Zijlstra
  2014-01-28 21:08 ` [PATCH v2 0/5] mutex: Mutex scalability patches Davidlohr Bueso
  5 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-28 19:13 UTC (permalink / raw)
  To: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx, jason.low2
  Cc: linux-kernel, riel, akpm, davidlohr, hpa, andi, aswin,
	scott.norton, chegu_vinod

Before a thread attempts mutex spin on owner, it is first added to a queue
using an MCS lock so that only 1 thread spins on owner at a time. However, when
the spinner is queued, it is unable to check if it needs to reschedule and
will remain on the queue. This could cause the spinner to spin longer
than its allocated time. However, once it is the spinner's turn to spin on
owner, it would immediately go to slowpath if it need_resched() and gets no spin
time. In these situations, not only does the spinner take up more time for a
chance to spin for the mutex, it also ends up not getting to spin once it
gets its turn.

One solution would be to exit the MCS queue and go to mutex slowpath if
need_resched(). However, that may require a lot of overhead. For instance, if a
thread at the end of the queue need_resched(), in order to remove it from the
queue, we would have to unqueue and requeue all other spinners in front of it.

This RFC patch tries to address the issue in another context by avoiding
situations where spinners immediately get sent to the slowpath on
need_resched() upon getting to spin. We will first check for need_resched()
right after acquiring the MCS lock. If need_resched() is true, then
need_resched() triggered while the thread is waiting in the MCS queue (since
patch 1 makes the spinner check for need_resched() before entering the queue).
In this case, we will allow the thread to have at least 1 try to do
mutex_spin_on_owner() regardless of need_resched(). This patch also removes
the need_resched() in the outer loop in case we require a few extra spins to
observe lock->count == 1, and patch 4 ensures we won't be spinning with
lock owner preempted.

And if the need_resched() check after acquiring the MCS lock is false, then
we won't give the spinner any extra spinning.
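
Condensed, the changed pieces look roughly like this (sketch, simplified from
the diff below):

	/* in __mutex_lock_common(), right after queueing on the MCS lock: */
	mspin_lock(MLOCK(lock), &node);
	/* need_resched() here can only have been set while waiting in the queue */
	spin_grace_period = (need_resched()) ? 1 : 0;

	/* in mutex_spin_on_owner(): */
	while (owner_running(lock, owner)) {
		if (need_resched() && !spin_grace_period)
			break;
		arch_mutex_cpu_relax();
	}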

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/locking/mutex.c |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index cfaaf53..2281a48 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -187,11 +187,11 @@ static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
  * access and not reliable.
  */
 static noinline
-int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner, int spin_grace_period)
 {
 	rcu_read_lock();
 	while (owner_running(lock, owner)) {
-		if (need_resched())
+		if (need_resched() && !spin_grace_period)
 			break;
 
 		arch_mutex_cpu_relax();
@@ -428,7 +428,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
-	int ret;
+	int ret, spin_grace_period;
 	struct mspin_node node;
 
 	preempt_disable();
@@ -461,6 +461,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		goto slowpath;
 
 	mspin_lock(MLOCK(lock), &node);
+	/*
+	 * If need_resched() triggered while queued, then we will still give
+	 * this spinner a chance to spin for the mutex, rather than send this
+	 * immediately to slowpath after waiting for its turn.
+	 */
+	spin_grace_period = (need_resched()) ? 1 : 0;
 	for (;;) {
 		struct task_struct *owner;
 
@@ -485,8 +491,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * release the lock or go to sleep.
 		 */
 		owner = ACCESS_ONCE(lock->owner);
-		if (owner && !mutex_spin_on_owner(lock, owner))
-			break;
+		if (owner) {
+			if (!mutex_spin_on_owner(lock, owner, spin_grace_period))
+				break;
+
+			spin_grace_period = 0;
+		}
 
 		if ((atomic_read(&lock->count) == 1) &&
 		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
@@ -510,7 +520,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * we're an RT task that will live-lock because we won't let
 		 * the owner complete.
 		 */
-		if (!owner && (need_resched() || rt_task(task)))
+		if (!owner && rt_task(task))
 			break;
 
 		/*
-- 
1.7.1



* Re: [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched()
  2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
@ 2014-01-28 20:20   ` Paul E. McKenney
  2014-01-28 22:01     ` Jason Low
  2014-01-28 21:09   ` Davidlohr Bueso
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner() tip-bot for Jason Low
  2 siblings, 1 reply; 51+ messages in thread
From: Paul E. McKenney @ 2014-01-28 20:20 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, peterz, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 11:13:12AM -0800, Jason Low wrote:
> The mutex_can_spin_on_owner() function should also return false if the
> task needs to be rescheduled to avoid entering the MCS queue when it
> needs to reschedule.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

But I cannot help asking how this affects performance.  (My guess is
"not much", but always good to know.)

> ---
>  kernel/locking/mutex.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 4dd6e4c..85c6be1 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -212,6 +212,9 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
> 
> +	if (need_resched())
> +		return 0;
> +
>  	rcu_read_lock();
>  	owner = ACCESS_ONCE(lock->owner);
>  	if (owner)
> -- 
> 1.7.1
> 



* Re: [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued
  2014-01-28 19:13 ` [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued Jason Low
@ 2014-01-28 20:23   ` Paul E. McKenney
  2014-01-28 20:24     ` Paul E. McKenney
                       ` (2 more replies)
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
  1 sibling, 3 replies; 51+ messages in thread
From: Paul E. McKenney @ 2014-01-28 20:23 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, peterz, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 11:13:13AM -0800, Jason Low wrote:
> The mutex->spin_mlock was introduced in order to ensure that only 1 thread
> spins for lock acquisition at a time to reduce cache line contention. When
> lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
> continually release and obtain the lock->spin_mlock. This can generate
> quite a bit of overhead/contention, and also might just delay the spinner
> from getting the lock.
> 
> This patch modifies the way optimistic spinners are queued by queuing before
> entering the optimistic spinning loop as opposed to acquiring before every
> call to mutex_spin_on_owner(). So in situations where the spinner requires
> a few extra spins before obtaining the lock, then there will only be 1 spinner
> trying to get the lock and it will avoid the overhead from unnecessarily
> unlocking and locking the spin_mlock.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>

One question below.  Also, this might well have a visible effect on
performance, so would be good to see the numbers.

							Thanx, Paul

> ---
>  kernel/locking/mutex.c |   16 +++++++---------
>  1 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 85c6be1..7519d27 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -419,6 +419,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	struct mutex_waiter waiter;
>  	unsigned long flags;
>  	int ret;
> +	struct mspin_node node;
> 
>  	preempt_disable();
>  	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
> @@ -449,9 +450,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	if (!mutex_can_spin_on_owner(lock))
>  		goto slowpath;
> 
> +	mspin_lock(MLOCK(lock), &node);
>  	for (;;) {
>  		struct task_struct *owner;
> -		struct mspin_node  node;
> 
>  		if (use_ww_ctx && ww_ctx->acquired > 0) {
>  			struct ww_mutex *ww;
> @@ -466,19 +467,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  			 * performed the optimistic spinning cannot be done.
>  			 */
>  			if (ACCESS_ONCE(ww->ctx))
> -				goto slowpath;
> +				break;
>  		}
> 
>  		/*
>  		 * If there's an owner, wait for it to either
>  		 * release the lock or go to sleep.
>  		 */
> -		mspin_lock(MLOCK(lock), &node);
>  		owner = ACCESS_ONCE(lock->owner);
> -		if (owner && !mutex_spin_on_owner(lock, owner)) {
> -			mspin_unlock(MLOCK(lock), &node);
> -			goto slowpath;
> -		}
> +		if (owner && !mutex_spin_on_owner(lock, owner))
> +			break;
> 
>  		if ((atomic_read(&lock->count) == 1) &&
>  		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
> @@ -495,7 +493,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  			preempt_enable();
>  			return 0;
>  		}
> -		mspin_unlock(MLOCK(lock), &node);
> 
>  		/*
>  		 * When there's no owner, we might have preempted between the
> @@ -504,7 +501,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 * the owner complete.
>  		 */
>  		if (!owner && (need_resched() || rt_task(task)))
> -			goto slowpath;
> +			break;
> 
>  		/*
>  		 * The cpu_relax() call is a compiler barrier which forces
> @@ -514,6 +511,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 */
>  		arch_mutex_cpu_relax();
>  	}
> +	mspin_unlock(MLOCK(lock), &node);
>  slowpath:

Are there any remaining goto statements to slowpath?  If so, they need
to release the lock.  If not, this label should be removed.

>  #endif
>  	spin_lock_mutex(&lock->wait_lock, flags);
> -- 
> 1.7.1
> 



* Re: [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued
  2014-01-28 20:23   ` Paul E. McKenney
@ 2014-01-28 20:24     ` Paul E. McKenney
  2014-01-28 21:17     ` Davidlohr Bueso
  2014-01-28 22:10     ` Jason Low
  2 siblings, 0 replies; 51+ messages in thread
From: Paul E. McKenney @ 2014-01-28 20:24 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, peterz, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 12:23:34PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2014 at 11:13:13AM -0800, Jason Low wrote:
> > The mutex->spin_mlock was introduced in order to ensure that only 1 thread
> > spins for lock acquisition at a time to reduce cache line contention. When
> > lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
> > continually release and obtain the lock->spin_mlock. This can generate
> > quite a bit of overhead/contention, and also might just delay the spinner
> > from getting the lock.
> > 
> > This patch modifies the way optimistic spinners are queued by queuing before
> > entering the optimistic spinning loop as opposed to acquiring before every
> > call to mutex_spin_on_owner(). So in situations where the spinner requires
> > a few extra spins before obtaining the lock, then there will only be 1 spinner
> > trying to get the lock and it will avoid the overhead from unnecessarily
> > unlocking and locking the spin_mlock.
> > 
> > Signed-off-by: Jason Low <jason.low2@hp.com>
> 
> One question below.  Also, this might well have a visible effect on
> performance, so would be good to see the numbers.

Never mind, I see the numbers in your patch 0.  :-/

							Thanx, Paul

> > ---
> >  kernel/locking/mutex.c |   16 +++++++---------
> >  1 files changed, 7 insertions(+), 9 deletions(-)
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 85c6be1..7519d27 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -419,6 +419,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  	struct mutex_waiter waiter;
> >  	unsigned long flags;
> >  	int ret;
> > +	struct mspin_node node;
> > 
> >  	preempt_disable();
> >  	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
> > @@ -449,9 +450,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  	if (!mutex_can_spin_on_owner(lock))
> >  		goto slowpath;
> > 
> > +	mspin_lock(MLOCK(lock), &node);
> >  	for (;;) {
> >  		struct task_struct *owner;
> > -		struct mspin_node  node;
> > 
> >  		if (use_ww_ctx && ww_ctx->acquired > 0) {
> >  			struct ww_mutex *ww;
> > @@ -466,19 +467,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  			 * performed the optimistic spinning cannot be done.
> >  			 */
> >  			if (ACCESS_ONCE(ww->ctx))
> > -				goto slowpath;
> > +				break;
> >  		}
> > 
> >  		/*
> >  		 * If there's an owner, wait for it to either
> >  		 * release the lock or go to sleep.
> >  		 */
> > -		mspin_lock(MLOCK(lock), &node);
> >  		owner = ACCESS_ONCE(lock->owner);
> > -		if (owner && !mutex_spin_on_owner(lock, owner)) {
> > -			mspin_unlock(MLOCK(lock), &node);
> > -			goto slowpath;
> > -		}
> > +		if (owner && !mutex_spin_on_owner(lock, owner))
> > +			break;
> > 
> >  		if ((atomic_read(&lock->count) == 1) &&
> >  		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
> > @@ -495,7 +493,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  			preempt_enable();
> >  			return 0;
> >  		}
> > -		mspin_unlock(MLOCK(lock), &node);
> > 
> >  		/*
> >  		 * When there's no owner, we might have preempted between the
> > @@ -504,7 +501,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  		 * the owner complete.
> >  		 */
> >  		if (!owner && (need_resched() || rt_task(task)))
> > -			goto slowpath;
> > +			break;
> > 
> >  		/*
> >  		 * The cpu_relax() call is a compiler barrier which forces
> > @@ -514,6 +511,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  		 */
> >  		arch_mutex_cpu_relax();
> >  	}
> > +	mspin_unlock(MLOCK(lock), &node);
> >  slowpath:
> 
> Are there any remaining goto statements to slowpath?  If so, they need
> to release the lock.  If not, this label should be removed.
> 
> >  #endif
> >  	spin_lock_mutex(&lock->wait_lock, flags);
> > -- 
> > 1.7.1
> > 



* Re: [RFC][PATCH v2 4/5] mutex: Disable preemption between modifying lock->owner and locking/unlocking mutex
  2014-01-28 19:13 ` [RFC][PATCH v2 4/5] mutex: Disable preemption between modifying lock->owner and locking/unlocking mutex Jason Low
@ 2014-01-28 20:54   ` Peter Zijlstra
  2014-01-28 22:17     ` Jason Low
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-01-28 20:54 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 11:13:15AM -0800, Jason Low wrote:
> This RFC patch disables preemption between modifying lock->owner and
> locking/unlocking the mutex lock. This prevents situations where the owner
> can preempt between those 2 operations, which causes optimistic spinners to
> be unable to check if lock->owner is not on CPU. As mentioned in the
> thread for this v1 patchset, disabling preemption is a cheap operation.

In that same thread it was also said that this wasn't really an issue at
all. So what is the justification?

The patch is rather hideous.


* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-28 19:13 ` [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued Jason Low
@ 2014-01-28 21:07   ` Peter Zijlstra
  2014-01-28 22:51     ` Jason Low
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-01-28 21:07 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 11:13:16AM -0800, Jason Low wrote:
> Before a thread attempts mutex spin on owner, it is first added to a queue
> using an MCS lock so that only 1 thread spins on owner at a time. However, when
> the spinner is queued, it is unable to check if it needs to reschedule and
> will remain on the queue. This could cause the spinner to spin longer
> than its allocated time. 

what allocated time?

> However, once it is the spinner's turn to spin on
> owner, it would immediately go to slowpath if it need_resched() and gets no spin
> time. In these situations, not only does the spinner take up more time for a
> chance to spin for the mutex, it also ends up not getting to spin once it
> gets its turn.
> 
> One solution would be to exit the MCS queue and go to mutex slowpath if
> need_resched(). However, that may require a lot of overhead. For instance, if a
> thread at the end of the queue need_resched(), in order to remove it from the
> queue, we would have to unqueue and requeue all other spinners in front of it.

If you can do that, you can also walk the list and find prev and cmpxchg
the entry out. But I don't think you can do either, as we simply don't
have a head pointer.
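
For reference, the on-stack queue node used by the mutex spinners (struct
mspin_node in kernel/locking/mutex.c at this point) only carries a forward
pointer, roughly:

	struct mspin_node {
		struct mspin_node *next;
		int		   locked;	/* 1 if lock acquired */
	};

so a queued node has no cheap way to reach its predecessor, let alone the
head of the queue.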

> This RFC patch tries to address the issue in another context by avoiding
> situations where spinners immediately get sent to the slowpath on
> need_resched() upon getting to spin. 

> We will first check for need_resched()
> right after acquiring the MCS lock. If need_resched() is true, then
> need_resched() triggered while the thread is waiting in the MCS queue (since
> patch 1 makes the spinner check for need_resched() before entering the queue).

> In this case, we will allow the thread to have at least 1 try to do
> mutex_spin_on_owner() regardless of need_resched(). 

No! We should absolutely not ever willfully ignore need_resched(). Esp.
not for unbounded spins.

> This patch also removes
> the need_resched() in the outer loop in case we require a few extra spins to
> observe lock->count == 1, and patch 4 ensures we won't be spinning with
> lock owner preempted.
> 
> And if the need_resched() check after acquiring the MCS lock is false, then
> we won't give the spinner any extra spinning.

But urgh, nasty problem. Lemme ponder this a bit.


* Re: [PATCH v2 0/5] mutex: Mutex scalability patches
  2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
                   ` (4 preceding siblings ...)
  2014-01-28 19:13 ` [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued Jason Low
@ 2014-01-28 21:08 ` Davidlohr Bueso
  2014-01-28 23:11   ` Jason Low
  5 siblings, 1 reply; 51+ messages in thread
From: Davidlohr Bueso @ 2014-01-28 21:08 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx,
	linux-kernel, riel, akpm, hpa, andi, aswin, scott.norton,
	chegu_vinod

On Tue, 2014-01-28 at 11:13 -0800, Jason Low wrote:
> v1->v2:
> - Replace the previous patch that limits the # of times a thread can spin with
>   !lock->owner with a patch that releases the mutex before holding the wait_lock
>   in the __mutex_unlock_common_slowpath() function.
> - Add a patch which allows a thread to attempt 1 mutex_spin_on_owner() without
>   checking need_resched() if need_resched() triggered while in the MCS queue.
> - Add a patch which disables preemption between modifying lock->owner and
>   acquiring/releasing the mutex.
> 
> This patchset addresses a few scalability issues with mutexes.
> 
> Patch 1 has the mutex_can_spin_on_owner() function check for need_resched()
> before being added to MCS queue. 
> 
> Patches 2, 3 are to fix issues with threads spinning when
> there is no lock owner when the mutex is under high contention.
> 
> Patch 4 and 5 are RFC patches. Patch 4 disables preemption between modifying
> lock->owner and locking/unlocking the mutex. Patch 5 addresses the situation
> where spinners can potentially wait a long time in the MCS queue for a chance
> to spin on mutex owner (not checking for need_resched()), yet ends up not
> getting to spin.
> 
> These changes benefit the AIM7 fserver and high_systime workloads (run on disk)
> on an 8 socket, 80 core box. The table below shows the performance
> improvements with 3.13 + patches 1, 2, 3 when compared to the 3.13 baseline,
> and the performance improvements with 3.13 + all 5 patches compared to
> the 3.13 baseline.

A lot of these changes are quite subtle. It would be good to see how
smaller systems are impacted with other workloads, not only big servers.
Since you see improvement in fserver, perhaps similar workloads could
also be of use: fio, filebench, postmark, fstress, etc.

Thanks,
Davidlohr



* Re: [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched()
  2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
  2014-01-28 20:20   ` Paul E. McKenney
@ 2014-01-28 21:09   ` Davidlohr Bueso
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner() tip-bot for Jason Low
  2 siblings, 0 replies; 51+ messages in thread
From: Davidlohr Bueso @ 2014-01-28 21:09 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx,
	linux-kernel, riel, akpm, hpa, andi, aswin, scott.norton,
	chegu_vinod

On Tue, 2014-01-28 at 11:13 -0800, Jason Low wrote:
> The mutex_can_spin_on_owner() function should also return false if the
> task needs to be rescheduled to avoid entering the MCS queue when it
> needs to reschedule.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>

Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>



* Re: [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued
  2014-01-28 20:23   ` Paul E. McKenney
  2014-01-28 20:24     ` Paul E. McKenney
@ 2014-01-28 21:17     ` Davidlohr Bueso
  2014-01-28 22:10     ` Jason Low
  2 siblings, 0 replies; 51+ messages in thread
From: Davidlohr Bueso @ 2014-01-28 21:17 UTC (permalink / raw)
  To: paulmck
  Cc: Jason Low, mingo, peterz, Waiman.Long, torvalds, tglx,
	linux-kernel, riel, akpm, hpa, andi, aswin, scott.norton,
	chegu_vinod

On Tue, 2014-01-28 at 12:23 -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2014 at 11:13:13AM -0800, Jason Low wrote:
> > ...
> >  		if (!owner && (need_resched() || rt_task(task)))
> > -			goto slowpath;
> > +			break;
> > 
> >  		/*
> >  		 * The cpu_relax() call is a compiler barrier which forces
> > @@ -514,6 +511,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  		 */
> >  		arch_mutex_cpu_relax();
> >  	}
> > +	mspin_unlock(MLOCK(lock), &node);
> >  slowpath:
> 
> Are there any remaining goto statements to slowpath?  If so, they need
> to release the lock.  If not, this label should be removed.

We still have the !mutex_can_spin_on_owner case.

Thanks,
Davidlohr



* Re: [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched()
  2014-01-28 20:20   ` Paul E. McKenney
@ 2014-01-28 22:01     ` Jason Low
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Low @ 2014-01-28 22:01 UTC (permalink / raw)
  To: paulmck
  Cc: mingo, peterz, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, 2014-01-28 at 12:20 -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2014 at 11:13:12AM -0800, Jason Low wrote:
> > The mutex_can_spin_on_owner() function should also return false if the
> > task needs to be rescheduled to avoid entering the MCS queue when it
> > needs to reschedule.
> > 
> > Signed-off-by: Jason Low <jason.low2@hp.com>
> 
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> But I cannot help asking how this affects performance.  (My guess is
> "not much", but always good to know.)

Hi Paul,

In the past, when I tested this particular patch, I did not see any
noticeable performance gains. Patch 1 is really more of a "correctness"
change which was why I didn't retest this by itself. I can be sure to
include the benchmark numbers for this particular patch next time.

Thanks.



* Re: [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued
  2014-01-28 20:23   ` Paul E. McKenney
  2014-01-28 20:24     ` Paul E. McKenney
  2014-01-28 21:17     ` Davidlohr Bueso
@ 2014-01-28 22:10     ` Jason Low
  2014-02-02 21:58       ` Paul E. McKenney
  2 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-28 22:10 UTC (permalink / raw)
  To: paulmck
  Cc: mingo, peterz, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, 2014-01-28 at 12:23 -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2014 at 11:13:13AM -0800, Jason Low wrote:
> >  		/*
> >  		 * The cpu_relax() call is a compiler barrier which forces
> > @@ -514,6 +511,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  		 */
> >  		arch_mutex_cpu_relax();
> >  	}
> > +	mspin_unlock(MLOCK(lock), &node);
> >  slowpath:
> 
> Are there any remaining goto statements to slowpath?  If so, they need
> to release the lock.  If not, this label should be removed.

Yes, if mutex_can_spin_on_owner() returns false, then the thread goes
directly to the slowpath, bypassing the optimistic spinning loop. In
that case, the thread never acquires the MCS lock, so it doesn't need
to unlock it.
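
That is, the only remaining jump to the slowpath label happens before the
MCS lock is taken, so the flow is roughly (sketch of the code after patch 2,
for illustration):

	if (!mutex_can_spin_on_owner(lock))
		goto slowpath;			/* MCS lock not taken yet */

	mspin_lock(MLOCK(lock), &node);
	for (;;) {
		/* optimistic spinning; every exit path uses break */
	}
	mspin_unlock(MLOCK(lock), &node);	/* the only MCS unlock needed */
slowpath:
	spin_lock_mutex(&lock->wait_lock, flags);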

Thanks,
Jason



* Re: [RFC][PATCH v2 4/5] mutex: Disable preemption between modifying lock->owner and locking/unlocking mutex
  2014-01-28 20:54   ` Peter Zijlstra
@ 2014-01-28 22:17     ` Jason Low
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Low @ 2014-01-28 22:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, 2014-01-28 at 21:54 +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2014 at 11:13:15AM -0800, Jason Low wrote:
> > This RFC patch disables preemption between modifying lock->owner and
> > locking/unlocking the mutex lock. This prevents situations where the owner
> > can preempt between those 2 operations, which causes optimistic spinners to
> > be unable to check if lock->owner is not on CPU. As mentioned in the
> > thread for this v1 patchset, disabling preemption is a cheap operation.
> 
> In that same thread it was also said that this wasn't really an issue at
> all. So what is the justification?

This patch is mainly just a complementary patch for patch 5 so that we
can allow the spinner to wait slightly longer for lock->count to get set
without worrying about need_resched()/owner getting preempted.

Jason




* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-28 21:07   ` Peter Zijlstra
@ 2014-01-28 22:51     ` Jason Low
  2014-01-29 11:51       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-28 22:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, 2014-01-28 at 22:07 +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2014 at 11:13:16AM -0800, Jason Low wrote:
> > Before a thread attempts mutex spin on owner, it is first added to a queue
> > using an MCS lock so that only 1 thread spins on owner at a time. However, when
> > the spinner is queued, it is unable to check if it needs to reschedule and
> > will remain on the queue. This could cause the spinner to spin longer
> > than its allocated time. 
> 
> what allocated time?

Hi Peter,

By "spin longer than its allocated time", I meant to say that the thread
can continue spinning/waiting in the MCS queue after need_resched() has
been set.

> > However, once it is the spinner's turn to spin on
> > owner, it would immediately go to slowpath if it need_resched() and gets no spin
> > time. In these situations, not only does the spinner take up more time for a
> > chance to spin for the mutex, it also ends up not getting to spin once it
> > gets its turn.
> > 
> > One solution would be to exit the MCS queue and go to mutex slowpath if
> > need_resched(). However, that may require a lot of overhead. For instance, if a
> > thread at the end of the queue need_resched(), in order to remove it from the
> > queue, we would have to unqueue and requeue all other spinners in front of it.
> 
> If you can do that, you can also walk the list and find prev and cmpxchg
> the entry out. But I don't think you can do either, as we simply don't
> have a head pointer.
> 
> > This RFC patch tries to address the issue in another context by avoiding
> > situations where spinners immediately get sent to the slowpath on
> > need_resched() upon getting to spin. 
> 
> > We will first check for need_resched()
> > right after acquiring the MCS lock. If need_resched() is true, then
> > need_resched() triggered while the thread is waiting in the MCS queue (since
> > patch 1 makes the spinner check for need_resched() before entering the queue).
> 
> > In this case, we will allow the thread to have at least 1 try to do
> > mutex_spin_on_owner() regardless of need_resched(). 
> 
> No! We should absolutely not ever willfully ignore need_resched(). Esp.
> not for unbounded spins.

Ok. This was sort of a proof of concept patch to illustrate the type of
performance gains we can get by addressing this issue.

> > This patch also removes
> > the need_resched() in the outer loop in case we require a few extra spins to
> > observe lock->count == 1, and patch 4 ensures we won't be spinning with
> > lock owner preempted.
> > 
> > And if the need_resched() check after acquiring the MCS lock is false, then
> > we won't give the spinner any extra spinning.
> 
> But urgh, nasty problem. Lemme ponder this a bit.

Thanks,
Jason




* Re: [PATCH v2 0/5] mutex: Mutex scalability patches
  2014-01-28 21:08 ` [PATCH v2 0/5] mutex: Mutex scalability patches Davidlohr Bueso
@ 2014-01-28 23:11   ` Jason Low
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Low @ 2014-01-28 23:11 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: mingo, peterz, paulmck, Waiman.Long, torvalds, tglx,
	linux-kernel, riel, akpm, hpa, andi, aswin, scott.norton,
	chegu_vinod

On Tue, 2014-01-28 at 13:08 -0800, Davidlohr Bueso wrote:
> On Tue, 2014-01-28 at 11:13 -0800, Jason Low wrote:
> > v1->v2:
> > - Replace the previous patch that limits the # of times a thread can spin with
> >   !lock->owner with a patch that releases the mutex before holding the wait_lock
> >   in the __mutex_unlock_common_slowpath() function.
> > - Add a patch which allows a thread to attempt 1 mutex_spin_on_owner() without
> >   checking need_resched() if need_resched() triggered while in the MCS queue.
> > - Add a patch which disables preemption between modifying lock->owner and
> >   acquiring/releasing the mutex.
> > 
> > This patchset addresses a few scalability issues with mutexes.
> > 
> > Patch 1 has the mutex_can_spin_on_owner() function check for need_resched()
> > before being added to MCS queue. 
> > 
> > Patches 2, 3 are to fix issues with threads spinning when
> > there is no lock owner when the mutex is under high contention.
> > 
> > Patch 4 and 5 are RFC patches. Patch 4 disables preemption between modifying
> > lock->owner and locking/unlocking the mutex. Patch 5 addresses the situation
> > where spinners can potentially wait a long time in the MCS queue for a chance
> > to spin on mutex owner (not checking for need_resched()), yet ends up not
> > getting to spin.
> > 
> > These changes benefit the AIM7 fserver and high_systime workloads (run on disk)
> > on an 8 socket, 80 core box. The table below shows the performance
> > improvements with 3.13 + patches 1, 2, 3 when compared to the 3.13 baseline,
> > and the performance improvements with 3.13 + all 5 patches compared to
> > the 3.13 baseline.
> 
> A lot of these changes are quite subtle. It would be good to see how
> smaller systems are impacted with other workloads, not only big servers.
> Since you see improvement in fserver, perhaps similar workloads could
> also be of use: fio, filebench, postmark, fstress, etc.

Okay, I will include the numbers I collect on smaller systems next time
(even if the % difference is small).

Thanks,
Jason



* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-28 22:51     ` Jason Low
@ 2014-01-29 11:51       ` Peter Zijlstra
  2014-01-31  3:29         ` Jason Low
  2014-02-05 21:44         ` Waiman Long
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-01-29 11:51 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
> > But urgh, nasty problem. Lemme ponder this a bit.

OK, please have a very careful look at the below. It survived a boot
with udev -- which usually stresses mutex contention enough to explode
(in fact it did a few times when I got the contention/cancel path wrong),
however I have not run anything else on it.

The below is an MCS variant that allows relatively cheap unqueueing. But
it's somewhat tricky and I might have gotten a case wrong, esp. the
double concurrent cancel case got my head hurting (I didn't attempt a
triple unqueue).

Applies to tip/master but does generate a few (harmless) compile
warnings because I didn't fully clean up the mcs_spinlock vs m_spinlock
thing.

Also, there's a comment in the slowpath that bears consideration.

---
 kernel/locking/mutex.c | 158 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 148 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 45fe1b5293d6..4a69da73903c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -166,6 +166,9 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
+	if (need_resched())
+		return 0;
+
 	rcu_read_lock();
 	owner = ACCESS_ONCE(lock->owner);
 	if (owner)
@@ -358,6 +361,134 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
 	spin_unlock_mutex(&lock->base.wait_lock, flags);
 }
 
+struct m_spinlock {
+	struct m_spinlock *next, *prev;
+	int locked; /* 1 if lock acquired */
+};
+
+/*
+ * Using a single mcs node per CPU is safe because mutex_lock() should not be
+ * called from interrupt context and we have preemption disabled over the mcs
+ * lock usage.
+ */
+static DEFINE_PER_CPU(struct m_spinlock, m_node);
+
+static bool m_spin_lock(struct m_spinlock **lock)
+{
+	struct m_spinlock *node = this_cpu_ptr(&m_node);
+	struct m_spinlock *prev, *next;
+
+	node->locked = 0;
+	node->next = NULL;
+
+	node->prev = prev = xchg(lock, node);
+	if (likely(prev == NULL))
+		return true;
+
+	ACCESS_ONCE(prev->next) = node;
+
+	/*
+	 * Normally @prev is untouchable after the above store; because at that
+	 * moment unlock can proceed and wipe the node element from stack.
+	 *
+	 * However, since our nodes are static per-cpu storage, we're
+	 * guaranteed their existence -- this allows us to apply
+	 * cmpxchg in an attempt to undo our queueing.
+	 */
+
+	while (!smp_load_acquire(&node->locked)) {
+		if (need_resched())
+			goto unqueue;
+		arch_mutex_cpu_relax();
+	}
+	return true;
+
+unqueue:
+
+	/*
+	 * Undo our @prev->next assignment; this will make @prev's unlock()
+	 * wait for a next pointer since @lock points to us.
+	 */
+	if (cmpxchg(&prev->next, node, NULL) != node) { /* A -> B */
+		/*
+		 * @prev->next no longer pointed to us; either we hold the lock
+		 * or @prev cancelled the wait too and we need to reload and
+		 * retry.
+		 */
+		if (smp_load_acquire(&node->locked))
+			return true;
+
+		/*
+		 * Because we observed the new @prev->next, the smp_wmb() at (B)
+		 * ensures that we must now observe the new @node->prev.
+		 */
+		prev = ACCESS_ONCE(node->prev);
+		goto unqueue;
+	}
+
+	if (smp_load_acquire(&node->locked)) {
+		/*
+		 * OOPS, we were too late, we already got the lock.  No harm
+		 * done though; @prev is now unused and nobody cares we frobbed
+		 * it.
+		 */
+		return true;
+	}
+
+	/*
+	 * Per the above logic @prev's unlock() will now wait,
+	 * therefore @prev is now stable.
+	 */
+
+	if (cmpxchg(lock, node, prev) == node) {
+		/*
+		 * We were still the last queued, we moved @lock back.  @prev
+		 * will now observe @lock and will complete its unlock().
+		 */
+		return false;
+	}
+
+	/*
+	 * We're not the last to be queued, obtain ->next.
+	 */
+
+	while (!(next = ACCESS_ONCE(node->next)))
+		arch_mutex_cpu_relax();
+
+	ACCESS_ONCE(next->prev) = prev;
+
+	/*
+	 * Ensure that @next->prev is written before we write @prev->next,
+	 * this guarantees that when the cmpxchg at (A) fails we must
+	 * observe the new prev value.
+	 */
+	smp_wmb(); /* B -> A */
+
+	/*
+	 * And point @prev to our next, and we're unlinked. We can use a
+	 * non-atomic op because only we modify @prev->next.
+	 */
+	ACCESS_ONCE(prev->next) = next;
+
+	return false;
+}
+
+static void m_spin_unlock(struct m_spinlock **lock)
+{
+	struct m_spinlock *node = this_cpu_ptr(&m_node);
+	struct m_spinlock *next = ACCESS_ONCE(node->next);
+
+	while (likely(!next)) {
+		if (likely(cmpxchg(lock, node, NULL) == node))
+			return;
+
+		arch_mutex_cpu_relax();
+		next = ACCESS_ONCE(node->next);
+	}
+
+	smp_store_release(&next->locked, 1);
+}
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
@@ -400,9 +531,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (!mutex_can_spin_on_owner(lock))
 		goto slowpath;
 
+	if (!m_spin_lock(&lock->mcs_lock))
+		goto slowpath;
+
 	for (;;) {
 		struct task_struct *owner;
-		struct mcs_spinlock  node;
 
 		if (use_ww_ctx && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
@@ -417,19 +550,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			 * performed the optimistic spinning cannot be done.
 			 */
 			if (ACCESS_ONCE(ww->ctx))
-				goto slowpath;
+				break;
 		}
 
 		/*
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		mcs_spin_lock(&lock->mcs_lock, &node);
 		owner = ACCESS_ONCE(lock->owner);
-		if (owner && !mutex_spin_on_owner(lock, owner)) {
-			mcs_spin_unlock(&lock->mcs_lock, &node);
-			goto slowpath;
-		}
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
 
 		if ((atomic_read(&lock->count) == 1) &&
 		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
@@ -442,11 +572,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			}
 
 			mutex_set_owner(lock);
-			mcs_spin_unlock(&lock->mcs_lock, &node);
+			m_spin_unlock(&lock->mcs_lock);
 			preempt_enable();
 			return 0;
 		}
-		mcs_spin_unlock(&lock->mcs_lock, &node);
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -455,7 +584,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * the owner complete.
 		 */
 		if (!owner && (need_resched() || rt_task(task)))
-			goto slowpath;
+			break;
 
 		/*
 		 * The cpu_relax() call is a compiler barrier which forces
@@ -465,10 +594,19 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 */
 		arch_mutex_cpu_relax();
 	}
+	m_spin_unlock(&lock->mcs_lock);
 slowpath:
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
 
+	/*
+	 * XXX arguably, when need_resched() is set and there's other pending
+	 * owners we should not try-acquire and simply queue and schedule().
+	 *
+	 * There's nothing worse than obtaining a lock only to get immediately
+	 * scheduled out.
+	 */
+
 	/* once more, can we acquire the lock? */
 	if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, 0) == 1))
 		goto skip_wait;

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-29 11:51       ` Peter Zijlstra
@ 2014-01-31  3:29         ` Jason Low
  2014-01-31 14:09           ` Peter Zijlstra
  2014-02-05 21:44         ` Waiman Long
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-31  3:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Wed, 2014-01-29 at 12:51 +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
> > > But urgh, nasty problem. Lemme ponder this a bit.
> 
> OK, please have a very careful look at the below. It survived a boot
> with udev -- which usually stresses mutex contention enough to explode
> (in fact it did a few time when I got the contention/cancel path wrong),
> however I have not ran anything else on it.

I tested this patch on a 2-socket, 8-core machine with the AIM7 fserver
workload. After 100 users, the system gets soft lockups.

Some condition may be causing threads to never leave the "goto unqueue"
loop. I added a debug counter, and threads were able to reach more than
1,000,000,000 "goto unqueue" iterations.

I was also initially wondering whether there can be problems when multiple
threads need_resched() and unqueue at the same time. As an example, suppose
2 nodes that need to reschedule are next to each other in the middle of
the MCS queue. The 1st node executes "while (!(next =
ACCESS_ONCE(node->next)))" and exits the while loop because next is not
NULL. Then, the 2nd node executes its "if (cmpxchg(&prev->next, node,
NULL) != node)". We may then end up in a situation where the node before
the 1st node gets linked with the outdated 2nd node.
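
Spelled out as one possible interleaving against the code above (P is
the node ahead of the 1st node, N1/N2/N3 are the 1st/2nd/3rd nodes, and
the cmpxchg(lock, ...) attempts fail since neither node is the tail;
this is just my reading of the code, not something I actually captured):

	N1: cmpxchg(&P->next, N1, NULL) == N1	/* N1's undo of P->next succeeds */
	N1: next = ACCESS_ONCE(N1->next) == N2	/* N1 exits its while loop */
	N2: cmpxchg(&N1->next, N2, NULL) == N2	/* N2 starts cancelling as well */
	N2: next = ACCESS_ONCE(N2->next) == N3
	N2: ACCESS_ONCE(N3->prev) = N1		/* N2 unlinks itself */
	N2: ACCESS_ONCE(N1->next) = N3
	N1: ACCESS_ONCE(N2->prev) = P		/* N1 unlinks itself */
	N1: ACCESS_ONCE(P->next)  = N2		/* P now points at the stale N2 */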


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-31  3:29         ` Jason Low
@ 2014-01-31 14:09           ` Peter Zijlstra
  2014-01-31 20:01             ` Jason Low
  2014-02-02 20:02             ` Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-01-31 14:09 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Thu, Jan 30, 2014 at 07:29:37PM -0800, Jason Low wrote:
> On Wed, 2014-01-29 at 12:51 +0100, Peter Zijlstra wrote:
> > On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
> > > > But urgh, nasty problem. Lemme ponder this a bit.
> > 
> > OK, please have a very careful look at the below. It survived a boot
> > with udev -- which usually stresses mutex contention enough to explode
> > (in fact it did a few time when I got the contention/cancel path wrong),
> > however I have not ran anything else on it.
> 
> I tested this patch on a 2 socket, 8 core machine with the AIM7 fserver
> workload. After 100 users, the system gets soft lockups.
> 
> Some condition may be causing threads to not leave the "goto unqueue"
> loop. I added a debug counter, and threads were able to reach more than
> 1,000,000,000 "goto unqueue".
> 
> I also was initially thinking if there can be problems when multiple
> threads need_resched() and unqueue at the same time. As an example, 2
> nodes that need to reschedule are next to each other in the middle of
> the MCS queue. The 1st node executes "while (!(next =
> ACCESS_ONCE(node->next)))" and exits the while loop because next is not
> NULL. Then, the 2nd node execute its "if (cmpxchg(&prev->next, node,
> NULL) != node)". We may then end up in a situation where the node before
> the 1st node gets linked with the outdated 2nd node.

Yes indeed, I found two bugs. If you consider the MCS lock with 4 queued
nodes:

       .----.
 L ->  | N4 |
       `----'
        ^  V
       .----.
       | N3 |
       `----'
        ^  V
       .----.
       | N2 |
       `----'
        ^  V
       .----.
       | N1 |
       `----'

And look at the 3 unqueue steps in the patch below, and read 3A to
mean: Node 3 ran Step A.

Both bugs were in Step B, the first was triggered through:

  3A,4A,3B,4B

In this case the old code would have 3 spinning on @n3->next; however,
4B would have moved L to N3 and left @n3->next empty. FAIL.

The second was:

  2A,2B,3A,2C,3B,3C

In this case the cancellations of both 2 and 3 'succeed' and we end up
with N1 linked to N3 and N2 linked to N4. Total fail.


The first bug was fixed by having Step-B spin on both @n->next and @l
just like the unlink() path already did.

The second bug was fixed by having Step-B xchg(@n->next, NULL) instead
of only reading it; this way the 3A above cannot complete until after 2C
and we have a coherent chain back.

I've downloaded AIM7 from sf.net and I hope I'm running it with 100+
loads, but I'm not entirely sure I got this thing right; it's not really
making progress with or without the patch :/

---
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -166,6 +166,9 @@ static inline int mutex_can_spin_on_owne
 	struct task_struct *owner;
 	int retval = 1;
 
+	if (need_resched())
+		return 0;
+
 	rcu_read_lock();
 	owner = ACCESS_ONCE(lock->owner);
 	if (owner)
@@ -358,6 +361,166 @@ ww_mutex_set_context_fastpath(struct ww_
 	spin_unlock_mutex(&lock->base.wait_lock, flags);
 }
 
+struct m_spinlock {
+	struct m_spinlock *next, *prev;
+	int locked; /* 1 if lock acquired */
+};
+
+/*
+ * Using a single mcs node per CPU is safe because mutex_lock() should not be
+ * called from interrupt context and we have preemption disabled over the mcs
+ * lock usage.
+ */
+static DEFINE_PER_CPU(struct m_spinlock, m_node);
+
+static bool m_spin_lock(struct m_spinlock **lock)
+{
+	struct m_spinlock *node = this_cpu_ptr(&m_node);
+	struct m_spinlock *prev, *next;
+
+	node->locked = 0;
+	node->next = NULL;
+
+	node->prev = prev = xchg(lock, node);
+	if (likely(prev == NULL))
+		return true;
+
+	ACCESS_ONCE(prev->next) = node;
+
+	/*
+	 * Normally @prev is untouchable after the above store; because at that
+	 * moment unlock can proceed and wipe the node element from stack.
+	 *
+	 * However, since our nodes are static per-cpu storage, we're
+	 * guaranteed their existence -- this allows us to apply
+	 * cmpxchg in an attempt to undo our queueing.
+	 */
+
+	while (!smp_load_acquire(&node->locked)) {
+		if (need_resched())
+			goto unqueue;
+		arch_mutex_cpu_relax();
+	}
+	return true;
+
+unqueue:
+	/*
+	 * Step - A  -- stabilize @prev
+	 *
+	 * Undo our @prev->next assignment; this will make @prev's
+	 * unlock()/cancel() wait for a next pointer since @lock points to us
+	 * (or later).
+	 */
+
+	for (;;) {
+		next = cmpxchg(&prev->next, node, NULL); /* A -> B,C */
+
+		/*
+		 * Because the unlock() path retains @prev->next (for
+		 * performance) we must check @node->locked after clearing
+		 * @prev->next to see if we raced.
+		 *
+		 * Ordered by the cmpxchg() above and the conditional-store in
+		 * unlock().
+		 */
+		if (smp_load_acquire(&node->locked)) {
+			/*
+			 * OOPS, we were too late, we already got the lock. No
+			 * harm done though; @prev is now unused and nobody
+			 * cares we frobbed it.
+			 */
+			return true;
+		}
+
+		if (next == node)
+			break;
+
+		/*
+		 * @prev->next didn't point to us anymore, we didn't own the
+		 * lock, so reload and try again.
+		 *
+		 * Because we observed the new @prev->next, the smp_wmb() at
+		 * (C) ensures that we must now observe the new @node->prev.
+		 */
+		prev = ACCESS_ONCE(node->prev);
+	}
+
+	/*
+	 * Step - B -- stabilize @next
+	 *
+	 * Similar to unlock(), wait for @node->next or move @lock from @node
+	 * back to @prev.
+	 */
+
+	for (;;) {
+		if (*lock == node && cmpxchg(lock, node, prev) == node) {
+			/*
+			 * We were the last queued, we moved @lock back. @prev
+			 * will now observe @lock and will complete its
+			 * unlock()/cancel().
+			 */
+			return false;
+		}
+
+		/*
+		 * We must xchg() the @node->next value, because if we were to
+		 * leave it in, a concurrent cancel() from @node->next might
+		 * complete Step-A and think its @prev is still valid.
+		 *
+		 * If the concurrent cancel() wins the race, we'll wait for
+		 * either @lock to point to us, through its Step-B, or wait for
+		 * a new @node->next from its Step-C.
+		 */
+		next = xchg(&node->next, NULL); /* B -> A */
+		if (next)
+			break;
+
+		arch_mutex_cpu_relax();
+	}
+
+	/*
+	 * Step - C -- unlink
+	 *
+	 * @prev is stable because it's still waiting for a new @prev->next
+	 * pointer, @next is stable because our @node->next pointer is NULL and
+	 * it will wait in Step-A.
+	 */
+
+	ACCESS_ONCE(next->prev) = prev;
+
+	/*
+	 * Ensure that @next->prev is written before we write @prev->next,
+	 * this guarantees that when the cmpxchg at (A) fails we must
+	 * observe the new prev value.
+	 */
+	smp_wmb(); /* C -> A */
+
+	/*
+	 * And point @prev to our next, and we're unlinked.
+	 */
+	ACCESS_ONCE(prev->next) = next;
+
+	return false;
+}
+
+static void m_spin_unlock(struct m_spinlock **lock)
+{
+	struct m_spinlock *node = this_cpu_ptr(&m_node);
+	struct m_spinlock *next;
+
+	for (;;) {
+		if (likely(cmpxchg(lock, node, NULL) == node))
+			return;
+
+		next = ACCESS_ONCE(node->next);
+		if (unlikely(next))
+			break;
+
+		arch_mutex_cpu_relax();
+	}
+	smp_store_release(&next->locked, 1);
+}
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
@@ -400,9 +563,11 @@ __mutex_lock_common(struct mutex *lock,
 	if (!mutex_can_spin_on_owner(lock))
 		goto slowpath;
 
+	if (!m_spin_lock(&lock->mcs_lock))
+		goto slowpath;
+
 	for (;;) {
 		struct task_struct *owner;
-		struct mcs_spinlock  node;
 
 		if (use_ww_ctx && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
@@ -417,19 +582,16 @@ __mutex_lock_common(struct mutex *lock,
 			 * performed the optimistic spinning cannot be done.
 			 */
 			if (ACCESS_ONCE(ww->ctx))
-				goto slowpath;
+				break;
 		}
 
 		/*
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		mcs_spin_lock(&lock->mcs_lock, &node);
 		owner = ACCESS_ONCE(lock->owner);
-		if (owner && !mutex_spin_on_owner(lock, owner)) {
-			mcs_spin_unlock(&lock->mcs_lock, &node);
-			goto slowpath;
-		}
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
 
 		if ((atomic_read(&lock->count) == 1) &&
 		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
@@ -442,11 +604,10 @@ __mutex_lock_common(struct mutex *lock,
 			}
 
 			mutex_set_owner(lock);
-			mcs_spin_unlock(&lock->mcs_lock, &node);
+			m_spin_unlock(&lock->mcs_lock);
 			preempt_enable();
 			return 0;
 		}
-		mcs_spin_unlock(&lock->mcs_lock, &node);
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -455,7 +616,7 @@ __mutex_lock_common(struct mutex *lock,
 		 * the owner complete.
 		 */
 		if (!owner && (need_resched() || rt_task(task)))
-			goto slowpath;
+			break;
 
 		/*
 		 * The cpu_relax() call is a compiler barrier which forces
@@ -465,10 +626,19 @@ __mutex_lock_common(struct mutex *lock,
 		 */
 		arch_mutex_cpu_relax();
 	}
+	m_spin_unlock(&lock->mcs_lock);
 slowpath:
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
 
+	/*
+	 * XXX arguably, when need_resched() is set and there's other pending
+	 * owners we should not try-acquire and simply queue and schedule().
+	 *
+	 * There's nothing worse than obtaining a lock only to get immediately
+	 * scheduled out.
+	 */
+
 	/* once more, can we acquire the lock? */
 	if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, 0) == 1))
 		goto skip_wait;

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-31 14:09           ` Peter Zijlstra
@ 2014-01-31 20:01             ` Jason Low
  2014-01-31 20:08               ` Peter Zijlstra
  2014-02-02 20:02             ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-01-31 20:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Ingo Molnar, Paul McKenney, Waiman Long,
	Linus Torvalds, Thomas Gleixner, Linux Kernel Mailing List,
	Rik van Riel, Andrew Morton, Davidlohr Bueso, H. Peter Anvin,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	chegu_vinod

On Fri, Jan 31, 2014 at 6:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> I've downloaded AIM7 from sf.net and I hope I'm running it with 100+
> loads but I'm not entirely sure I got this thing right, its not really
> making progress with or without patch :/

Ingo's program http://lkml.org/lkml/2006/1/8/50 using the V option may
be able to generate similar mutex contention.

Currently still getting soft lockups with the updated version.
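
In case it's useful, a quick-and-dirty way to get heavy kernel mutex
contention from user space (rough sketch below, nothing to do with
Ingo's program) is to have a few hundred threads create and unlink
files in one shared directory; every create/unlink should serialize on
that directory's i_mutex:

/* gcc -O2 -pthread dir-stress.c -o dir-stress */
#include <pthread.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

#define NTHREADS	200
#define NLOOPS		10000

static void *stress(void *arg)
{
	long id = (long)arg;
	char name[64];
	int i, fd;

	for (i = 0; i < NLOOPS; i++) {
		snprintf(name, sizeof(name), "scratch/f-%ld-%d", id, i);
		fd = open(name, O_CREAT | O_RDWR, 0600);	/* parent dir i_mutex */
		if (fd >= 0)
			close(fd);
		unlink(name);					/* parent dir i_mutex */
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NTHREADS];
	long i;

	if (mkdir("scratch", 0700) && errno != EEXIST) {
		perror("mkdir");
		return 1;
	}
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, stress, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);
	return 0;
}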

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-31 20:01             ` Jason Low
@ 2014-01-31 20:08               ` Peter Zijlstra
  2014-02-02 21:01                 ` Jason Low
  2014-02-02 22:02                 ` Paul E. McKenney
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-01-31 20:08 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Fri, Jan 31, 2014 at 12:01:37PM -0800, Jason Low wrote:
> On Fri, Jan 31, 2014 at 6:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > I've downloaded AIM7 from sf.net and I hope I'm running it with 100+
> > loads but I'm not entirely sure I got this thing right, its not really
> > making progress with or without patch :/
> 
> Ingo's program http://lkml.org/lkml/2006/1/8/50 using the V option may
> be able to generate similar mutex contention.
> 
> Currently still getting soft lockups with the updated version.

Bugger.. ok clearly I need to think harder still. I'm fairly sure this
cancelation can work though, just seems tricky to get right :-)

I'll give that proglet from Ingo a go, although that might be Monday ere
I get to it.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-31 14:09           ` Peter Zijlstra
  2014-01-31 20:01             ` Jason Low
@ 2014-02-02 20:02             ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-02-02 20:02 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, paulmck, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Fri, Jan 31, 2014 at 03:09:41PM +0100, Peter Zijlstra wrote:
> +struct m_spinlock {
> +	struct m_spinlock *next, *prev;
> +	int locked; /* 1 if lock acquired */
> +};
> +
> +/*
> + * Using a single mcs node per CPU is safe because mutex_lock() should not be
> + * called from interrupt context and we have preemption disabled over the mcs
> + * lock usage.
> + */
> +static DEFINE_PER_CPU(struct m_spinlock, m_node);
> +
> +static bool m_spin_lock(struct m_spinlock **lock)
> +{
> +	struct m_spinlock *node = this_cpu_ptr(&m_node);
> +	struct m_spinlock *prev, *next;
> +
> +	node->locked = 0;
> +	node->next = NULL;
> +
> +	node->prev = prev = xchg(lock, node);
> +	if (likely(prev == NULL))
> +		return true;
> +
> +	ACCESS_ONCE(prev->next) = node;
> +
> +	/*
> +	 * Normally @prev is untouchable after the above store; because at that
> +	 * moment unlock can proceed and wipe the node element from stack.
> +	 *
> +	 * However, since our nodes are static per-cpu storage, we're
> +	 * guaranteed their existence -- this allows us to apply
> +	 * cmpxchg in an attempt to undo our queueing.
> +	 */
> +
> +	while (!smp_load_acquire(&node->locked)) {
> +		if (need_resched())
> +			goto unqueue;
> +		arch_mutex_cpu_relax();
> +	}
> +	return true;
> +
> +unqueue:
> +	/*
> +	 * Step - A  -- stabilize @prev
> +	 *
> +	 * Undo our @prev->next assignment; this will make @prev's
> +	 * unlock()/cancel() wait for a next pointer since @lock points to us
> +	 * (or later).
> +	 */
> +
> +	for (;;) {
> +		next = cmpxchg(&prev->next, node, NULL); /* A -> B,C */
> +
> +		/*
> +		 * Because the unlock() path retains @prev->next (for
> +		 * performance) we must check @node->locked after clearing
> +		 * @prev->next to see if we raced.
> +		 *
> +		 * Ordered by the cmpxchg() above and the conditional-store in
> +		 * unlock().
> +		 */
> +		if (smp_load_acquire(&node->locked)) {
> +			/*
> +			 * OOPS, we were too late, we already got the lock. No
> +			 * harm done though; @prev is now unused an nobody
> +			 * cares we frobbed it.
> +			 */
> +			return true;
> +		}
> +
> +		if (next == node)
> +			break;
> +
> +		/*
> +		 * @prev->next didn't point to us anymore, we didn't own the
> +		 * lock, so reload and try again.
> +		 *
> +		 * Because we observed the new @prev->next, the smp_wmb() at
> +		 * (C) ensures that we must now observe the new @node->prev.
> +		 */
> +		prev = ACCESS_ONCE(node->prev);
> +	}
> +
> +	/*
> +	 * Step - B -- stabilize @next
> +	 *
> +	 * Similar to unlock(), wait for @node->next or move @lock from @node
> +	 * back to @prev.
> +	 */
> +
> +	for (;;) {
> +		if (*lock == node && cmpxchg(lock, node, prev) == node) {
> +			/*
> +			 * We were the last queued, we moved @lock back. @prev
> +			 * will now observe @lock and will complete its
> +			 * unlock()/cancel().
> +			 */
> +			return false;
> +		}
> +
> +		/*
> +		 * We must xchg() the @node->next value, because if we were to
> +		 * leave it in, a concurrent cancel() from @node->next might
> +		 * complete Step-A and think its @prev is still valid.
> +		 *
> +		 * If the concurrent cancel() wins the race, we'll wait for
> +		 * either @lock to point to us, through its Step-B, or wait for
> +		 * a new @node->next from its Step-C.
> +		 */
> +		next = xchg(&node->next, NULL); /* B -> A */
> +		if (next)
> +			break;
> +
> +		arch_mutex_cpu_relax();
> +	}
> +
> +	/*
> +	 * Step - C -- unlink
> +	 *
> +	 * @prev is stable because its still waiting for a new @prev->next
> +	 * pointer, @next is stable because our @node->next pointer is NULL and
> +	 * it will wait in Step-A.
> +	 */
> +
> +	ACCESS_ONCE(next->prev) = prev;
> +
> +	/*
> +	 * Ensure that @next->prev is written before we write @prev->next,
> +	 * this guarantees that when the cmpxchg at (A) fails we must
> +	 * observe the new prev value.
> +	 */
> +	smp_wmb(); /* C -> A */

OK, I've definitely stared at this code for too long :/

I don't think the above barrier is right -- or even required for that
matter. At this point I can't see anything wrong with the order of
either of these stores.

If the latter store hits first, an unlock can happen and release the
@next entry, which is fine since the Step-A loop can deal with that; if
the unlock doesn't happen, we'll simply wait for the first store to
become visible before trying the undo again later.

If the former store hits first, we'll simply wait for the later store to
appear and we'll try to undo it.

This obviously doesn't explain lockups, but it does reduce the code to
stare at ever so slightly.

> +	/*
> +	 * And point @prev to our next, and we're unlinked.
> +	 */
> +	ACCESS_ONCE(prev->next) = next;
> +
> +	return false;
> +}
> +
> +static void m_spin_unlock(struct m_spinlock **lock)
> +{
> +	struct m_spinlock *node = this_cpu_ptr(&m_node);
> +	struct m_spinlock *next;
> +
> +	for (;;) {
> +		if (likely(cmpxchg(lock, node, NULL) == node))
> +			return;
> +
> +		next = ACCESS_ONCE(node->next);
> +		if (unlikely(next))
> +			break;
> +
> +		arch_mutex_cpu_relax();
> +	}
> +	smp_store_release(&next->locked, 1);
> +}

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-31 20:08               ` Peter Zijlstra
@ 2014-02-02 21:01                 ` Jason Low
  2014-02-02 21:12                   ` Peter Zijlstra
  2014-02-02 22:02                 ` Paul E. McKenney
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-02-02 21:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Fri, 2014-01-31 at 21:08 +0100, Peter Zijlstra wrote:
> On Fri, Jan 31, 2014 at 12:01:37PM -0800, Jason Low wrote:
> > Currently still getting soft lockups with the updated version.
> 
> Bugger.. ok clearly I need to think harder still. I'm fairly sure this
> cancelation can work though, just seems tricky to get right :-)

Ok, I believe I have found a race condition between m_spin_lock() and
m_spin_unlock().

In m_spin_unlock(), we do "next = ACCESS_ONCE(node->next)". Then, if
next is not NULL, we proceed to set next->locked to 1.

A thread in m_spin_lock() in the unqueue path could execute
"next = cmpxchg(&prev->next, node, NULL)" after the thread in
m_spin_unlock() accesses its node->next and finds that it is not NULL.
Then, the thread in m_spin_lock() could check !node->locked before
the thread in m_spin_unlock() sets next->locked to 1.

The following additional change was able to solve the initial lockups that
were occurring when running fserver on a 2-socket box.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 9eb4dbe..e71a84a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -513,8 +513,13 @@ static void m_spin_unlock(struct m_spinlock **lock)
 			return;
 
 		next = ACCESS_ONCE(node->next);
-		if (unlikely(next))
-			break;
+
+		if (unlikely(next)) {
+			next = cmpxchg(&node->next, next, NULL);
+
+			if (next)
+				break;
+		}
 
 		arch_mutex_cpu_relax();
 	}



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-02 21:01                 ` Jason Low
@ 2014-02-02 21:12                   ` Peter Zijlstra
  2014-02-03 18:39                     ` Jason Low
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-02-02 21:12 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Sun, Feb 02, 2014 at 01:01:23PM -0800, Jason Low wrote:
> On Fri, 2014-01-31 at 21:08 +0100, Peter Zijlstra wrote:
> > On Fri, Jan 31, 2014 at 12:01:37PM -0800, Jason Low wrote:
> > > Currently still getting soft lockups with the updated version.
> > 
> > Bugger.. ok clearly I need to think harder still. I'm fairly sure this
> > cancelation can work though, just seems tricky to get right :-)
> 
> Ok, I believe I have found a race condition between m_spin_lock() and
> m_spin_unlock().
> 
> In m_spin_unlock(), we do "next = ACCESS_ONCE(node->next)". Then, if
> next is not NULL, we proceed to set next->locked to 1.
> 
> A thread in m_spin_lock() in the unqueue path could execute
> "next = cmpxchg(&prev->next, node, NULL)" after the thread in
> m_spin_unlock() accesses its node->next and finds that it is not NULL.
> Then, the thread in m_spin_lock() could check !node->locked before
> the thread in m_spin_unlock() sets next->locked to 1.

Yes indeed. How silly of me to not spot that!

> The following addition change was able to solve the initial lockups that were
> occurring when running fserver on a 2 socket box.
> 
> ---
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 9eb4dbe..e71a84a 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -513,8 +513,13 @@ static void m_spin_unlock(struct m_spinlock **lock)
>  			return;
>  
>  		next = ACCESS_ONCE(node->next);
> -		if (unlikely(next))
> -			break;
> +
> +		if (unlikely(next)) {
> +			next = cmpxchg(&node->next, next, NULL);
> +
> +			if (next)

The cmpxchg could fail and next still be !NULL I suppose.

> +				break;
> +		}


The way I wrote that same loop in step-B, is:


	for (;;) {
		if (*lock == node && cmpxchg(lock, node, prev) == node)
			return

		next = xchg(&node->next, NULL); /* B -> A */
		if (next)
			break;

		arch_mutex_cpu_relax();
	}

I suppose we can make that something like:


	if (node->next) {
		next = xchg(&node->next, NULL);
		if (next)
			break
	}

To avoid the xchg on every loop.

I had wanted to avoid the additional locked op in the unlock path, but
yes that does make things easier.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued
  2014-01-28 22:10     ` Jason Low
@ 2014-02-02 21:58       ` Paul E. McKenney
  0 siblings, 0 replies; 51+ messages in thread
From: Paul E. McKenney @ 2014-02-02 21:58 UTC (permalink / raw)
  To: Jason Low
  Cc: mingo, peterz, Waiman.Long, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Tue, Jan 28, 2014 at 02:10:41PM -0800, Jason Low wrote:
> On Tue, 2014-01-28 at 12:23 -0800, Paul E. McKenney wrote:
> > On Tue, Jan 28, 2014 at 11:13:13AM -0800, Jason Low wrote:
> > >  		/*
> > >  		 * The cpu_relax() call is a compiler barrier which forces
> > > @@ -514,6 +511,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> > >  		 */
> > >  		arch_mutex_cpu_relax();
> > >  	}
> > > +	mspin_unlock(MLOCK(lock), &node);
> > >  slowpath:
> > 
> > Are there any remaining goto statements to slowpath?  If so, they need
> > to release the lock.  If not, this label should be removed.
> 
> Yes, if the mutex_can_spin_on_owner() returns false, then the thread
> goes to directly slowpath, bypassing the optimistic spinning loop. In
> that case, the thread avoids acquiring the MCS lock, and doesn't unlock
> the MCS lock.

Got it, apologies for my confusion!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-31 20:08               ` Peter Zijlstra
  2014-02-02 21:01                 ` Jason Low
@ 2014-02-02 22:02                 ` Paul E. McKenney
  1 sibling, 0 replies; 51+ messages in thread
From: Paul E. McKenney @ 2014-02-02 22:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Ingo Molnar, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Fri, Jan 31, 2014 at 09:08:25PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 31, 2014 at 12:01:37PM -0800, Jason Low wrote:
> > On Fri, Jan 31, 2014 at 6:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > I've downloaded AIM7 from sf.net and I hope I'm running it with 100+
> > > loads but I'm not entirely sure I got this thing right, its not really
> > > making progress with or without patch :/
> > 
> > Ingo's program http://lkml.org/lkml/2006/1/8/50 using the V option may
> > be able to generate similar mutex contention.
> > 
> > Currently still getting soft lockups with the updated version.
> 
> Bugger.. ok clearly I need to think harder still. I'm fairly sure this
> cancelation can work though, just seems tricky to get right :-)

We used to do something similar to avoid passing locks off to tasks that
had been interrupted while spinning, and it was a bit tricky.  But we
had it a bit easier, because we didn't actually have to remove the element
from the queue, just bypass it at lock-grant time.
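
Roughly like this (just a sketch of the idea, field and state names
made up, and ignoring how the abandoned entries eventually get
reclaimed):

	enum { WAITING = 0, GRANTED = 1, ABANDONED = -1 };

	struct qnode {
		struct qnode *next;
		int state;		/* one of the values above */
	};

	static void grant_to_next(struct qnode *node)
	{
		struct qnode *next = ACCESS_ONCE(node->next);

		/*
		 * Bypass waiters that marked themselves ABANDONED while
		 * queued; they stay linked in, they just never get the lock.
		 */
		while (next && ACCESS_ONCE(next->state) == ABANDONED)
			next = ACCESS_ONCE(next->next);

		if (next)
			smp_store_release(&next->state, GRANTED);
	}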

							Thanx, Paul

> I'll give that proglet from Ingo a go, although that might be Monday ere
> I get to it.
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-02 21:12                   ` Peter Zijlstra
@ 2014-02-03 18:39                     ` Jason Low
  2014-02-03 19:25                       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-02-03 18:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Sun, 2014-02-02 at 22:12 +0100, Peter Zijlstra wrote:
> The way I wrote that same loop in step-B, is:
> 
> 
> 	for (;;) {
> 		if (*lock == node && cmpxchg(lock, node, prev) == node)
> 			return
> 
> 		next = xchg(&node->next, NULL); /* B -> A */
> 		if (next)
> 			break;
> 
> 		arch_mutex_cpu_relax();
> 	}
> 
> I suppose we can make that something like:
> 
> 
> 	if (node->next) {
> 		next = xchg(&node->next, NULL);
> 		if (next)
> 			break
> 	}
> 
> To avoid the xchg on every loop.

Ah yes, we want to use xchg() on &node->next.

Since the cmpxchg() is now in a loop in the unlock function, an
additional (*lock == node) check before the cmpxchg() would also be nice
to avoid spinning on cmpxchg() there too.
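
I.e. the unlock loop would end up looking something like this (sketch):

	for (;;) {
		/*
		 * Read-only check first so we only issue the cmpxchg() when
		 * it can actually succeed; a failed cmpxchg() still pulls
		 * the cacheline in exclusive state.
		 */
		if (*lock == node && cmpxchg(lock, node, NULL) == node)
			return;

		if (node->next) {
			next = xchg(&node->next, NULL);
			if (next)
				break;
		}

		arch_mutex_cpu_relax();
	}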


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-03 18:39                     ` Jason Low
@ 2014-02-03 19:25                       ` Peter Zijlstra
  2014-02-03 20:55                         ` Jason Low
  2014-02-04  7:13                         ` Jason Low
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-02-03 19:25 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Mon, Feb 03, 2014 at 10:39:20AM -0800, Jason Low wrote:
> > To avoid the xchg on every loop.
> 
> Ah yes, we want to use xchg() on &node->next.
> 
> Since the cmpxchg() is now in a loop in the unlock function, an
> additional (*lock == node) check before the cmpxchg() would also be nice
> to avoid spinning on cmpxchg() there too.

Right, I have the below; you can find the patches this depends upon
here:

  http://programming.kicks-ass.net/sekrit/patches.tar.bz2

---
Subject: locking, mutex: Cancelable MCS lock for adaptive spinning
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 29 Jan 2014 12:51:42 +0100

Since we want a task waiting for a mutex_lock() to go to sleep and
reschedule on need_resched() we must be able to abort the
mcs_spin_lock() around the adaptive spin.

Therefore implement a cancelable mcs lock.

XXX: anybody got a better name than m_spinlock?

Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: mingo@redhat.com
Cc: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-7jr68p4f447w2e0ck7y1yl06@git.kernel.org
---
 include/linux/mutex.h         |    4 -
 kernel/locking/Makefile       |    2 
 kernel/locking/mcs_spinlock.c |  156 ++++++++++++++++++++++++++++++++++++++++++
 kernel/locking/mcs_spinlock.h |   18 ++++
 kernel/locking/mutex.c        |   10 +-
 5 files changed, 183 insertions(+), 7 deletions(-)

--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -46,7 +46,7 @@
  * - detects multi-task circular deadlocks and prints out all affected
  *   locks and tasks (and only those tasks)
  */
-struct mcs_spinlock;
+struct m_spinlock;
 struct mutex {
 	/* 1: unlocked, 0: locked, negative: locked, possible waiters */
 	atomic_t		count;
@@ -56,7 +56,7 @@ struct mutex {
 	struct task_struct	*owner;
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-	struct mcs_spinlock	*mcs_lock;	/* Spinner MCS lock */
+	struct m_spinlock	*m_lock;	/* Spinner MCS lock */
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	const char 		*name;
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -1,5 +1,5 @@
 
-obj-y += mutex.o semaphore.o rwsem.o lglock.o
+obj-y += mutex.o semaphore.o rwsem.o lglock.o mcs_spinlock.o
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_lockdep.o = -pg
--- /dev/null
+++ b/kernel/locking/mcs_spinlock.c
@@ -0,0 +1,156 @@
+
+#include <linux/percpu.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include "mcs_spinlock.h"
+
+#ifdef CONFIG_SMP
+
+/*
+ * Using a single mcs node per CPU is safe because mutex_lock() should not be
+ * called from interrupt context and we have preemption disabled over the mcs
+ * lock usage.
+ */
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct m_spinlock, m_node);
+
+/*
+ * Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
+ * Can return NULL in case we were the last queued and we updated @lock instead.
+ */
+static inline struct m_spinlock *
+m_spin_wait_next(struct m_spinlock **lock, struct m_spinlock *node,
+		 struct m_spinlock *prev)
+{
+	struct m_spinlock *next = NULL;
+
+	for (;;) {
+		if (*lock == node && cmpxchg(lock, node, prev) == node) {
+			/*
+			 * We were the last queued, we moved @lock back. @prev
+			 * will now observe @lock and will complete its
+			 * unlock()/unqueue().
+			 */
+			break;
+		}
+
+		/*
+		 * We must xchg() the @node->next value, because if we were to
+		 * leave it in, a concurrent unlock()/unqueue() from
+		 * @node->next might complete Step-A and think its @prev is
+		 * still valid.
+		 *
+		 * If the concurrent unlock()/unqueue() wins the race, we'll
+		 * wait for either @lock to point to us, through its Step-B, or
+		 * wait for a new @node->next from its Step-C.
+		 */
+		if (node->next) {
+			next = xchg(&node->next, NULL);
+			if (next)
+				break;
+		}
+
+		arch_mutex_cpu_relax();
+	}
+
+	return next;
+}
+
+bool m_spin_lock(struct m_spinlock **lock)
+{
+	struct m_spinlock *node = this_cpu_ptr(&m_node);
+	struct m_spinlock *prev, *next;
+
+	node->locked = 0;
+	node->next = NULL;
+
+	node->prev = prev = xchg(lock, node);
+	if (likely(prev == NULL))
+		return true;
+
+	ACCESS_ONCE(prev->next) = node;
+
+	/*
+	 * Normally @prev is untouchable after the above store; because at that
+	 * moment unlock can proceed and wipe the node element from stack.
+	 *
+	 * However, since our nodes are static per-cpu storage, we're
+	 * guaranteed their existence -- this allows us to apply
+	 * cmpxchg in an attempt to undo our queueing.
+	 */
+
+	while (!smp_load_acquire(&node->locked)) {
+		if (need_resched())
+			goto unqueue;
+		arch_mutex_cpu_relax();
+	}
+	return true;
+
+unqueue:
+	/*
+	 * Step - A  -- stabilize @prev
+	 *
+	 * Undo our @prev->next assignment; this will make @prev's
+	 * unlock()/unqueue() wait for a next pointer since @lock points to us
+	 * (or later).
+	 */
+
+	for (;;) {
+		if (prev->next == node &&
+		    cmpxchg(&prev->next, node, NULL) == node)
+			break;
+
+		/*
+		 * We can only fail the cmpxchg() racing against an unlock(),
+		 * in which case we should observe @node->locked becoming
+		 * true.
+		 */
+		if (smp_load_acquire(&node->locked))
+			return true;
+
+		/*
+		 * Or we race against a concurrent unqueue()'s step-B, in which
+		 * case its step-C will write us a new @node->prev pointer.
+		 */
+		prev = ACCESS_ONCE(node->prev);
+	}
+
+	/*
+	 * Step - B -- stabilize @next
+	 *
+	 * Similar to unlock(), wait for @node->next or move @lock from @node
+	 * back to @prev.
+	 */
+
+	next = m_spin_wait_next(lock, node, prev);
+	if (!next)
+		return false;
+
+	/*
+	 * Step - C -- unlink
+	 *
+	 * @prev is stable because it's still waiting for a new @prev->next
+	 * pointer, @next is stable because our @node->next pointer is NULL and
+	 * it will wait in Step-A.
+	 */
+
+	ACCESS_ONCE(next->prev) = prev;
+	ACCESS_ONCE(prev->next) = next;
+
+	return false;
+}
+
+void m_spin_unlock(struct m_spinlock **lock)
+{
+	struct m_spinlock *node = this_cpu_ptr(&m_node);
+	struct m_spinlock *next;
+
+	if (likely(cmpxchg(lock, node, NULL) == node))
+		return;
+
+	next = m_spin_wait_next(lock, node, NULL);
+	if (next)
+		ACCESS_ONCE(next->locked) = 1;
+}
+
+#endif
+
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -109,4 +109,22 @@ void mcs_spin_unlock(struct mcs_spinlock
 	arch_mcs_spin_unlock_contended(&next->locked);
 }
 
+/*
+ * Cancellable version of the MCS lock above.
+ *
+ * This version can fail acquisition and unqueue a spinner; it assumes no
+ * nesting.
+ *
+ * Intended for adaptive spinning of sleeping locks:
+ * mutex_lock()/rwsem_down_{read,write}() etc.
+ */
+
+struct m_spinlock {
+	struct m_spinlock *next, *prev;
+	int locked; /* 1 if lock acquired */
+};
+
+extern bool m_spin_lock(struct m_spinlock **lock);
+extern void m_spin_unlock(struct m_spinlock **lock);
+
 #endif /* __LINUX_MCS_SPINLOCK_H */
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -53,7 +53,7 @@ __mutex_init(struct mutex *lock, const c
 	INIT_LIST_HEAD(&lock->wait_list);
 	mutex_clear_owner(lock);
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-	lock->mcs_lock = NULL;
+	lock->m_lock = NULL;
 #endif
 
 	debug_mutex_init(lock, name, key);
@@ -403,7 +403,9 @@ __mutex_lock_common(struct mutex *lock,
 	if (!mutex_can_spin_on_owner(lock))
 		goto slowpath;
 
-	mcs_spin_lock(&lock->mcs_lock);
+	if (!m_spin_lock(&lock->m_lock))
+		goto slowpath;
+
 	for (;;) {
 		struct task_struct *owner;
 
@@ -442,7 +444,7 @@ __mutex_lock_common(struct mutex *lock,
 			}
 
 			mutex_set_owner(lock);
-			mcs_spin_unlock(&lock->mcs_lock);
+			m_spin_unlock(&lock->m_lock);
 			preempt_enable();
 			return 0;
 		}
@@ -464,7 +466,7 @@ __mutex_lock_common(struct mutex *lock,
 		 */
 		arch_mutex_cpu_relax();
 	}
-	mcs_spin_unlock(&lock->mcs_lock);
+	m_spin_unlock(&lock->m_lock);
 slowpath:
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-03 19:25                       ` Peter Zijlstra
@ 2014-02-03 20:55                         ` Jason Low
  2014-02-03 21:06                           ` Peter Zijlstra
  2014-02-04  7:13                         ` Jason Low
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-02-03 20:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Mon, 2014-02-03 at 20:25 +0100, Peter Zijlstra wrote:
> On Mon, Feb 03, 2014 at 10:39:20AM -0800, Jason Low wrote:
> > > To avoid the xchg on every loop.
> > 
> > Ah yes, we want to use xchg() on &node->next.
> > 
> > Since the cmpxchg() is now in a loop in the unlock function, an
> > additional (*lock == node) check before the cmpxchg() would also be nice
> > to avoid spinning on cmpxchg() there too.
> 
> Right, I have the below; you can find the patches this depends upon
> here:
> 
>   http://programming.kicks-ass.net/sekrit/patches.tar.bz2
> 
> ---
> Subject: locking, mutex: Cancelable MCS lock for adaptive spinning
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, 29 Jan 2014 12:51:42 +0100
> 
> Since we want a task waiting for a mutex_lock() to go to sleep and
> reschedule on need_resched() we must be able to abort the
> mcs_spin_lock() around the adaptive spin.
> 
> Therefore implement a cancelable mcs lock.
> 
> XXX: anybody got a better name than m_spinlock?

So I was thinking something along the lines of
mcs_spin_lock_cancelable() as that's essentially what this function
does.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-03 20:55                         ` Jason Low
@ 2014-02-03 21:06                           ` Peter Zijlstra
  2014-02-03 21:56                             ` Jason Low
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-02-03 21:06 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Mon, Feb 03, 2014 at 12:55:34PM -0800, Jason Low wrote:
> On Mon, 2014-02-03 at 20:25 +0100, Peter Zijlstra wrote:
> > XXX: anybody got a better name than m_spinlock?
> 
> So I was thinking something along the lines of
> mcs_spin_lock_cancelable() as that's essentially what this function
> does.

sure, but what do we call the data structure that goes with it? Can't
have two struct mcs_spinlock :/

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-03 21:06                           ` Peter Zijlstra
@ 2014-02-03 21:56                             ` Jason Low
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Low @ 2014-02-03 21:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Mon, 2014-02-03 at 22:06 +0100, Peter Zijlstra wrote:
> On Mon, Feb 03, 2014 at 12:55:34PM -0800, Jason Low wrote:
> > On Mon, 2014-02-03 at 20:25 +0100, Peter Zijlstra wrote:
> > > XXX: anybody got a better name than m_spinlock?
> > 
> > So I was thinking something along the lines of
> > mcs_spin_lock_cancelable() as that's essentially what this function
> > does.
> 
> sure, but what do we call the data structure that goes with it? Can't
> have two struct mcs_spinlock :/

If this structure is only going to be used for cancelable mcs locking,
what do you think of "struct cancelable_mcs_spinlock"?



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-03 19:25                       ` Peter Zijlstra
  2014-02-03 20:55                         ` Jason Low
@ 2014-02-04  7:13                         ` Jason Low
  1 sibling, 0 replies; 51+ messages in thread
From: Jason Low @ 2014-02-04  7:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul McKenney, Waiman Long, Linus Torvalds,
	Thomas Gleixner, Linux Kernel Mailing List, Rik van Riel,
	Andrew Morton, Davidlohr Bueso, H. Peter Anvin, Andi Kleen,
	Chandramouleeswaran, Aswin, Norton, Scott J, chegu_vinod

On Mon, 2014-02-03 at 20:25 +0100, Peter Zijlstra wrote:

> +void m_spin_unlock(struct m_spinlock **lock)
> +{
> +	struct m_spinlock *node = this_cpu_ptr(&m_node);
> +	struct m_spinlock *next;
> +
> +	if (likely(cmpxchg(lock, node, NULL) == node))
> +		return;

At this point, (node->next != NULL) is a likely scenario.
Perhaps we can also add the following code here:

	next = xchg(&node->next, NULL);
	if (next) {
		ACCESS_ONCE(next->locked) = 1;
		return;
	}

> +	next = m_spin_wait_next(lock, node, NULL);
> +	if (next)
> +		ACCESS_ONCE(next->locked) = 1;
> +}



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-01-29 11:51       ` Peter Zijlstra
  2014-01-31  3:29         ` Jason Low
@ 2014-02-05 21:44         ` Waiman Long
  2014-02-06 14:04           ` Peter Zijlstra
  2014-02-06 17:44           ` Jason Low
  1 sibling, 2 replies; 51+ messages in thread
From: Waiman Long @ 2014-02-05 21:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, mingo, paulmck, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On 01/29/2014 06:51 AM, Peter Zijlstra wrote:
> On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
>>> But urgh, nasty problem. Lemme ponder this a bit.
> OK, please have a very careful look at the below. It survived a boot
> with udev -- which usually stresses mutex contention enough to explode
> (in fact it did a few time when I got the contention/cancel path wrong),
> however I have not ran anything else on it.
>
> The below is an MCS variant that allows relatively cheap unqueueing. But
> its somewhat tricky and I might have gotten a case wrong, esp. the
> double concurrent cancel case got my head hurting (I didn't attempt a
> tripple unqueue).
>
> Applies to tip/master but does generate a few (harmless) compile
> warnings because I didn't fully clean up the mcs_spinlock vs m_spinlock
> thing.
>
> Also, there's a comment in the slowpath that bears consideration.
>
>

I have an alternative way of breaking out of the MCS lock waiting queue
when need_resched() is set. I overload the locked flag to indicate a
skipped node when it is negative. I ran the patch through the AIM7
high-systime workload on a 4-socket server and it seemed to run fine.
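
The whole thing hinges on the locked flag being a 3-way value (0 =
still waiting, 1 = lock granted, -1 = node skipped), with both sides
racing for the 0 state via cmpxchg, roughly (sketch; the real macros
are in the patch below):

	/* waiter side: only leave the queue if we flip 0 -> -1 first */
	if (cmpxchg(&node->locked, 0, -1) == 0)
		return;			/* the releaser will skip this node */

	/* releaser side: only grant the lock if the waiter did not bail */
	if (cmpxchg(&next->locked, 0, 1) == 0)
		return;			/* lock handed off */
	/* otherwise this node was skipped, move on to its successor */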

Please check the following POC patch to see if you have any comment.
diff --git a/include/linux/mcs_spinlock.h b/include/linux/mcs_spinlock.h
index e9a4d74..84a0b32 100644
--- a/include/linux/mcs_spinlock.h
+++ b/include/linux/mcs_spinlock.h
@@ -14,17 +14,45 @@

  struct mcs_spinlock {
      struct mcs_spinlock *next;
-    int locked; /* 1 if lock acquired */
+    int locked; /* 1 if lock acquired, -1 if skipping node */
  };

+/*
+ * The node skipping feature requires the MCS node to be persistent,
+ * i.e. not allocated on stack. In addition, the MCS node cannot be
+ * reused until after the locked flag is cleared. The mcs_skip_node()
+ * macro must be defined before including this header file.
+ */
+#ifdef mcs_skip_node
+#undef arch_mcs_spin_lock_contended
+#undef arch_mcs_spin_unlock_contended
+
+#define    arch_mcs_spin_lock_contended(n)                    \
+do {                                    \
+    while (!smp_load_acquire(&(n)->locked)) {            \
+        if (mcs_skip_node(n)) {                    \
+            if (cmpxchg(&(n)->locked, 0, -1) == 0)        \
+                return;                    \
+        }                            \
+        arch_mutex_cpu_relax();                    \
+    };                                \
+} while (0)
+
+#define    arch_mcs_spin_unlock_contended(n)                \
+    if (cmpxchg(&(n)->locked, 0, 1) == 0)                \
+        return
+
+#define mcs_set_locked(n, v)    (n)->locked = (v)
+#else    /* mcs_skip_node */
+
  #ifndef arch_mcs_spin_lock_contended
  /*
   * Using smp_load_acquire() provides a memory barrier that ensures
   * subsequent operations happen after the lock is acquired.
   */
-#define arch_mcs_spin_lock_contended(l)                    \
+#define arch_mcs_spin_lock_contended(n)                    \
  do {                                    \
-    while (!(smp_load_acquire(l)))                    \
+    while (!(smp_load_acquire(&(n)->locked)))            \
          arch_mutex_cpu_relax();                    \
  } while (0)
  #endif
@@ -35,9 +63,12 @@ do {                                \
   * operations in the critical section has been completed before
   * unlocking.
   */
-#define arch_mcs_spin_unlock_contended(l)                \
-    smp_store_release((l), 1)
+#define arch_mcs_spin_unlock_contended(n)                \
+    smp_store_release(&(n)->locked, 1)
+
+#define mcs_set_locked(n, v)
  #endif
+#endif    /* mcs_skip_node */

  /*
   * Note: the smp_load_acquire/smp_store_release pair is not
@@ -77,12 +108,13 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
           * value won't be used. If a debug mode is needed to
           * audit lock status, then set node->locked value here.
           */
+        mcs_set_locked(node, 1);
          return;
      }
      ACCESS_ONCE(prev->next) = node;

      /* Wait until the lock holder passes the lock down. */
-    arch_mcs_spin_lock_contended(&node->locked);
+    arch_mcs_spin_lock_contended(node);
  }

  /*
@@ -94,19 +126,35 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
  {
      struct mcs_spinlock *next = ACCESS_ONCE(node->next);

+#ifdef mcs_skip_node
+check_next_node:
+#endif
      if (likely(!next)) {
          /*
           * Release the lock by setting it to NULL
           */
-        if (likely(cmpxchg(lock, node, NULL) == node))
+        if (likely(cmpxchg(lock, node, NULL) == node)) {
+            mcs_set_locked(node, 0);
              return;
+        }
          /* Wait until the next pointer is set */
          while (!(next = ACCESS_ONCE(node->next)))
              arch_mutex_cpu_relax();
      }
+    /* Clear the lock flag to indicate the node can be used again. */
+    mcs_set_locked(node, 0);

      /* Pass lock to next waiter. */
-    arch_mcs_spin_unlock_contended(&next->locked);
+    arch_mcs_spin_unlock_contended(next);
+
+#ifdef mcs_skip_node
+    /*
+     * The next node should be skipped
+     */
+    node = next;
+    next = ACCESS_ONCE(node->next);
+    goto check_next_node;
+#endif
  }

  #endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 45fe1b5..6351eca 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -25,6 +25,10 @@
  #include <linux/spinlock.h>
  #include <linux/interrupt.h>
  #include <linux/debug_locks.h>
+/*
+ * Allowed an MCS node to be skipped if need_resched() is true.
+ */
+#define mcs_skip_node(n)    need_resched()
  #include <linux/mcs_spinlock.h>

  /*
@@ -45,6 +49,13 @@
   */
  #define    MUTEX_SHOW_NO_WAITER(mutex)    (atomic_read(&(mutex)->count) >= 0)

+/*
+ * Using a single mcs node per CPU is safe because mutex_lock() should not be
+ * called from interrupt context and we have preemption disabled over the mcs
+ * lock usage.
+ */
+static DEFINE_PER_CPU(struct mcs_spinlock, m_node);
+
  void
  __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
  {
@@ -166,6 +177,9 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
      struct task_struct *owner;
      int retval = 1;

+    if (need_resched())
+        return 0;
+
      rcu_read_lock();
      owner = ACCESS_ONCE(lock->owner);
      if (owner)
@@ -370,6 +384,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
      struct mutex_waiter waiter;
      unsigned long flags;
      int ret;
+    struct mcs_spinlock  *node;

      preempt_disable();
      mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
@@ -400,9 +415,13 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
      if (!mutex_can_spin_on_owner(lock))
          goto slowpath;

+    node = this_cpu_ptr(&m_node);
+    if (node->locked < 0)
+        /* MCS node not available yet */
+        goto slowpath;
+
      for (;;) {
          struct task_struct *owner;
-        struct mcs_spinlock  node;

          if (use_ww_ctx && ww_ctx->acquired > 0) {
              struct ww_mutex *ww;
@@ -424,10 +443,15 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
           * If there's an owner, wait for it to either
           * release the lock or go to sleep.
           */
-        mcs_spin_lock(&lock->mcs_lock, &node);
+        mcs_spin_lock(&lock->mcs_lock, node);
+        if (node->locked < 0)
+            /*
+             * need_resched() true, no unlock needed
+             */
+            goto slowpath;
          owner = ACCESS_ONCE(lock->owner);
          if (owner && !mutex_spin_on_owner(lock, owner)) {
-            mcs_spin_unlock(&lock->mcs_lock, &node);
+            mcs_spin_unlock(&lock->mcs_lock, node);
              goto slowpath;
          }

@@ -442,11 +466,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
              }

              mutex_set_owner(lock);
-            mcs_spin_unlock(&lock->mcs_lock, &node);
+            mcs_spin_unlock(&lock->mcs_lock, node);
              preempt_enable();
              return 0;
          }
-        mcs_spin_unlock(&lock->mcs_lock, &node);
+        mcs_spin_unlock(&lock->mcs_lock, node);

          /*
           * When there's no owner, we might have preempted between the




^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-05 21:44         ` Waiman Long
@ 2014-02-06 14:04           ` Peter Zijlstra
  2014-02-06 18:45             ` Waiman Long
  2014-02-06 17:44           ` Jason Low
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-02-06 14:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jason Low, mingo, paulmck, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On Wed, Feb 05, 2014 at 04:44:34PM -0500, Waiman Long wrote:
> I have an alternative way of breaking out of the MCS lock waiting queue when
> need_resched() is set. I overload the locked flag to indicate a skipped node
> if negative. 

I'm not quite seeing how it works (then again, I've not really read the
patch carefully).

Suppose you break out; at that point you get queued and go to sleep.
Suppose you get woken up while your MCS entry is still 'pending' and
magically win the race and acquire the lock.

At that point your MCS entry can be re-used while it's still part of the
list.

It's a fantastically small race window, but I don't see anything that
makes it impossible.

> I run the patch through the AIM7 high-systime workload on a
> 4-socket server and it seemed to run fine.

How do people run this AIM7 piece of shit? I let it run for over an hour
and it generated exactly 0 numbers; it just sits there eating cpu-time
and creating a racket from my pantry.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-05 21:44         ` Waiman Long
  2014-02-06 14:04           ` Peter Zijlstra
@ 2014-02-06 17:44           ` Jason Low
  2014-02-06 18:37             ` Waiman Long
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-02-06 17:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, mingo, paulmck, torvalds, tglx, linux-kernel,
	riel, akpm, davidlohr, hpa, andi, aswin, scott.norton,
	chegu_vinod

On Wed, 2014-02-05 at 16:44 -0500, Waiman Long wrote:
> On 01/29/2014 06:51 AM, Peter Zijlstra wrote:
> > On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
> >>> But urgh, nasty problem. Lemme ponder this a bit.
> > OK, please have a very careful look at the below. It survived a boot
> > with udev -- which usually stresses mutex contention enough to explode
> > (in fact it did a few time when I got the contention/cancel path wrong),
> > however I have not ran anything else on it.
> >
> > The below is an MCS variant that allows relatively cheap unqueueing. But
> > its somewhat tricky and I might have gotten a case wrong, esp. the
> > double concurrent cancel case got my head hurting (I didn't attempt a
> > tripple unqueue).
> >
> > Applies to tip/master but does generate a few (harmless) compile
> > warnings because I didn't fully clean up the mcs_spinlock vs m_spinlock
> > thing.
> >
> > Also, there's a comment in the slowpath that bears consideration.
> >
> >
> 
> I have an alternative way of breaking out of the MCS lock waiting queue 
> when need_resched() is set. I overload the locked flag to indicate a 
> skipped node if negative. I run the patch through the AIM7 high-systime 
> workload on a 4-socket server and it seemed to run fine.
> 
> Please check the following POC patch to see if you have any comment.

So one of the concerns I had with the approach of skipping nodes was
that, under heavy contention, we potentially could cause optimistic
spinning to be disabled on CPUs for a while since the nodes can't be
used until they have been released. One advantage of the unqueuing
method would be that nodes are usable after the spinners exit the MCS
queue and go to sleep.

Jason


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-06 17:44           ` Jason Low
@ 2014-02-06 18:37             ` Waiman Long
  0 siblings, 0 replies; 51+ messages in thread
From: Waiman Long @ 2014-02-06 18:37 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, mingo, paulmck, torvalds, tglx, linux-kernel,
	riel, akpm, davidlohr, hpa, andi, aswin, scott.norton,
	chegu_vinod

On 02/06/2014 12:44 PM, Jason Low wrote:
> On Wed, 2014-02-05 at 16:44 -0500, Waiman Long wrote:
>> On 01/29/2014 06:51 AM, Peter Zijlstra wrote:
>>> On Tue, Jan 28, 2014 at 02:51:35PM -0800, Jason Low wrote:
>>>>> But urgh, nasty problem. Lemme ponder this a bit.
>>> OK, please have a very careful look at the below. It survived a boot
>>> with udev -- which usually stresses mutex contention enough to explode
>>> (in fact it did a few time when I got the contention/cancel path wrong),
>>> however I have not ran anything else on it.
>>>
>>> The below is an MCS variant that allows relatively cheap unqueueing. But
>>> its somewhat tricky and I might have gotten a case wrong, esp. the
>>> double concurrent cancel case got my head hurting (I didn't attempt a
>>> tripple unqueue).
>>>
>>> Applies to tip/master but does generate a few (harmless) compile
>>> warnings because I didn't fully clean up the mcs_spinlock vs m_spinlock
>>> thing.
>>>
>>> Also, there's a comment in the slowpath that bears consideration.
>>>
>>>
>> I have an alternative way of breaking out of the MCS lock waiting queue
>> when need_resched() is set. I overload the locked flag to indicate a
>> skipped node if negative. I run the patch through the AIM7 high-systime
>> workload on a 4-socket server and it seemed to run fine.
>>
>> Please check the following POC patch to see if you have any comment.
> So one of the concerns I had with the approach of skipping nodes was
> that, under heavy contention, we potentially could cause optimistic
> spinning to be disabled on CPUs for a while since the nodes can't be
> used until they have been released. One advantage of the unqueuing
> method would be that nodes are usable after the spinners exit the MCS
> queue and go to sleep.
>
> Jason
>

Under heavy contention, when many threads are trying to acquire the
mutexes using optimistic spinning, this patch can actually reduce the
number of wasted CPU cycles spent waiting in the MCS spin loop and let
the CPUs do other useful work. So I don't see that as a negative. I think
this kind of self-tuning is actually good for the overall throughput of
the system.

-Longman

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-06 14:04           ` Peter Zijlstra
@ 2014-02-06 18:45             ` Waiman Long
  2014-02-06 20:10               ` Norton, Scott J
  0 siblings, 1 reply; 51+ messages in thread
From: Waiman Long @ 2014-02-06 18:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, mingo, paulmck, torvalds, tglx, linux-kernel, riel,
	akpm, davidlohr, hpa, andi, aswin, scott.norton, chegu_vinod

On 02/06/2014 09:04 AM, Peter Zijlstra wrote:
> On Wed, Feb 05, 2014 at 04:44:34PM -0500, Waiman Long wrote:
>> I have an alternative way of breaking out of the MCS lock waiting queue when
>> need_resched() is set. I overload the locked flag to indicate a skipped node
>> if negative.
> I'm not quite seeing how it works (then again, I've not really read the
> patch carefully).
>
> Suppose you break out; at that point you get queued and go to sleep.
> Suppose you got woken up while you MCS entry is still 'pending' and
> magically win the race and acquire the lock.
>
> At that point your MCS entry can be re-used while its still part of the
> list.

Actually the MCS node entry cannot be reused until an MCS queue head
task traverses that entry and clears the locked flag. So it is possible
that the affected CPU won't be able to participate in optimistic
spinning for a while if the mutex is heavily contended.
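
Concretely, the reclamation described here could look something like the
sketch below. The function name is made up, the node is an mcs_spinlock
style node (a next pointer plus an int locked flag), and tail handling,
races with concurrent enqueues and memory barriers are all omitted; it
only illustrates how abandoned (locked < 0) nodes get cleared as the
queue head hands the lock on.

static void m_spin_unlock_sketch(struct mcs_spinlock *node)
{
	struct mcs_spinlock *next = ACCESS_ONCE(node->next);

	while (next) {
		struct mcs_spinlock *succ = ACCESS_ONCE(next->next);

		if (ACCESS_ONCE(next->locked) >= 0) {
			/* First waiter still spinning: hand it the lock. */
			ACCESS_ONCE(next->locked) = 1;
			return;
		}
		/* Abandoned node: clear it so its CPU may queue up again. */
		ACCESS_ONCE(next->locked) = 0;
		next = succ;
	}
}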

> Its a fantastically small race window, but I don't see anything that
> makes it impossible.

I will send out an official patch with some performance data to solicit 
further feedback.

>> I run the patch through the AIM7 high-systime workload on a
>> 4-socket server and it seemed to run fine.
> How do people run this AIM7 piece of shit? I let it run for over an hour
> and it generated exactly 0 numbers, it just sits there eating cpu-time
> and creating a racket from my pantry.

AIM7 can be tricky to set up. Fortunately, someone on our team had done
the groundwork, so I could just grab and run it.

-Longman

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-06 18:45             ` Waiman Long
@ 2014-02-06 20:10               ` Norton, Scott J
  2014-02-10 17:01                 ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Norton, Scott J @ 2014-02-06 20:10 UTC (permalink / raw)
  To: Long, Wai Man, Peter Zijlstra
  Cc: Low, Jason, mingo, paulmck, torvalds, tglx, linux-kernel, riel,
	akpm, Bueso, Davidlohr, hpa, andi, Chandramouleeswaran, Aswin,
	Vinod, Chegu

>> I run the patch through the AIM7 high-systime workload on a
>> 4-socket server and it seemed to run fine.
> How do people run this AIM7 piece of shit? I let it run for over an hour
> and it generated exactly 0 numbers, it just sits there eating cpu-time
> and creating a racket from my pantry.

./reaim -s100  -e2000 -t -j100 -i100 -f workfile.high_systime

The reaim.config file contains:

FILESIZE 10k
POOLSIZE 1m
DISKDIR /t0
DISKDIR /t1
DISKDIR /t2
DISKDIR /t3
DISKDIR /t4
DISKDIR /t5
DISKDIR /t6
DISKDIR /t7
DISKDIR /t8
DISKDIR /t9
DISKDIR /t10
DISKDIR /t11
DISKDIR /t12
DISKDIR /t13
DISKDIR /t14
DISKDIR /t15

The way Longman uses this is to create 16 ramdisk filesystems through
/dev/ram* and then mount those filesystems on the /t* directories,
although you could also run it on a regular filesystem. It will use
whatever you place in the reaim.config file as DISKDIR; you can specify
one or more DISKDIR directories.


Scott


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued
  2014-02-06 20:10               ` Norton, Scott J
@ 2014-02-10 17:01                 ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-02-10 17:01 UTC (permalink / raw)
  To: Norton, Scott J
  Cc: Long, Wai Man, Low, Jason, mingo, paulmck, torvalds, tglx,
	linux-kernel, riel, akpm, Bueso, Davidlohr, hpa, andi,
	Chandramouleeswaran, Aswin, Vinod, Chegu

On Thu, Feb 06, 2014 at 08:10:02PM +0000, Norton, Scott J wrote:
> > How do people run this AIM7 piece of shit? I let it run for over an hour
> > and it generated exactly 0 numbers, it just sits there eating cpu-time
> > and creating a racket from my pantry.
> 
> ./reaim -s100  -e2000 -t -j100 -i100 -f workfile.high_systime
> 
> The reaim.config file contains:
> 
> FILESIZE 10k
> POOLSIZE 1m
> DISKDIR /t0
> DISKDIR /t1
> DISKDIR /t2
> DISKDIR /t3
> DISKDIR /t4
> DISKDIR /t5
> DISKDIR /t6
> DISKDIR /t7
> DISKDIR /t8
> DISKDIR /t9
> DISKDIR /t10
> DISKDIR /t11
> DISKDIR /t12
> DISKDIR /t13
> DISKDIR /t14
> DISKDIR /t15
> 
> The way Longman uses this is to create 16 ramdisk filesystems through
> /dev/ram* and then mount those filesystems to the /t* directories.
> Although you could run it through a regular filesystem also. It will use
> whatever you place in the reaim.config file as DISKDIR. You can specify
> one or more DISKDIR directories.

OK, and we're back to creating a racket but not producing useful
numbers; how long is that crap supposed to run before it gives a number?

Surely it can produce a useful number after a few minutes of runtime..
Letting it run for hours is just a waste of time and money.

Note that I'm running this on a WSM-EP with (2*6*2) 24 CPUs and 24G of ram.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [tip:core/locking] locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
  2014-01-28 20:20   ` Paul E. McKenney
  2014-01-28 21:09   ` Davidlohr Bueso
@ 2014-03-11 12:41   ` tip-bot for Jason Low
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for Jason Low @ 2014-03-11 12:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, jason.low2, tglx

Commit-ID:  46af29e479cc0c1c63633007993af5292c2c3e75
Gitweb:     http://git.kernel.org/tip/46af29e479cc0c1c63633007993af5292c2c3e75
Author:     Jason Low <jason.low2@hp.com>
AuthorDate: Tue, 28 Jan 2014 11:13:12 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 11 Mar 2014 12:14:52 +0100

locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()

The mutex_can_spin_on_owner() function should also return false if the
task needs to be rescheduled, so that a task with a pending reschedule
avoids entering the MCS queue.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1390936396-3962-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4f408be..e6d646b 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -166,6 +166,9 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
+	if (need_resched())
+		return 0;
+
 	rcu_read_lock();
 	owner = ACCESS_ONCE(lock->owner);
 	if (owner)
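
With the hunk applied, mutex_can_spin_on_owner() reads roughly as follows;
everything past the context lines above is reconstructed from the function
body of that era, so treat it as approximate rather than verbatim:

static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
	struct task_struct *owner;
	int retval = 1;

	/* Don't bother queueing for optimistic spinning if we must resched. */
	if (need_resched())
		return 0;

	rcu_read_lock();
	owner = ACCESS_ONCE(lock->owner);
	if (owner)
		retval = owner->on_cpu;	/* only spin if the owner is running */
	rcu_read_unlock();

	/*
	 * If lock->owner is not set, the mutex owner may have just acquired
	 * it and not yet set the owner field, or the mutex has been released.
	 */
	return retval;
}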

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [tip:core/locking] locking/mutexes: Modify the way optimistic spinners are queued
  2014-01-28 19:13 ` [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued Jason Low
  2014-01-28 20:23   ` Paul E. McKenney
@ 2014-03-11 12:41   ` tip-bot for Jason Low
  2014-03-11 15:24     ` Jason Low
  1 sibling, 1 reply; 51+ messages in thread
From: tip-bot for Jason Low @ 2014-03-11 12:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, jason.low2, tglx

Commit-ID:  47667fa1502e4d759df87e9cc7fbc0f202483361
Gitweb:     http://git.kernel.org/tip/47667fa1502e4d759df87e9cc7fbc0f202483361
Author:     Jason Low <jason.low2@hp.com>
AuthorDate: Tue, 28 Jan 2014 11:13:13 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 11 Mar 2014 12:14:53 +0100

locking/mutexes: Modify the way optimistic spinners are queued

The mutex->spin_mlock was introduced in order to ensure that only 1 thread
spins for lock acquisition at a time to reduce cache line contention. When
lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
continually release and obtain the lock->spin_mlock. This can generate
quite a bit of overhead/contention, and also might just delay the spinner
from getting the lock.

This patch modifies the way optimistic spinners are queued by queuing before
entering the optimistic spinning loop, as opposed to acquiring the spin_mlock
before every call to mutex_spin_on_owner(). So in situations where the spinner
requires a few extra spins before obtaining the lock, there will only be 1
spinner trying to get the lock, and it avoids the overhead of unnecessarily
unlocking and locking the spin_mlock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index e6d646b..82dad2c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -403,9 +403,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (!mutex_can_spin_on_owner(lock))
 		goto slowpath;
 
+	mcs_spin_lock(&lock->mcs_lock, &node);
 	for (;;) {
 		struct task_struct *owner;
-		struct mcs_spinlock  node;
 
 		if (use_ww_ctx && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
@@ -420,19 +420,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			 * performed the optimistic spinning cannot be done.
 			 */
 			if (ACCESS_ONCE(ww->ctx))
-				goto slowpath;
+				break;
 		}
 
 		/*
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		mcs_spin_lock(&lock->mcs_lock, &node);
 		owner = ACCESS_ONCE(lock->owner);
-		if (owner && !mutex_spin_on_owner(lock, owner)) {
-			mcs_spin_unlock(&lock->mcs_lock, &node);
-			goto slowpath;
-		}
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
 
 		if ((atomic_read(&lock->count) == 1) &&
 		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
@@ -449,7 +446,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			preempt_enable();
 			return 0;
 		}
-		mcs_spin_unlock(&lock->mcs_lock, &node);
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -458,7 +454,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * the owner complete.
 		 */
 		if (!owner && (need_resched() || rt_task(task)))
-			goto slowpath;
+			break;
 
 		/*
 		 * The cpu_relax() call is a compiler barrier which forces
@@ -468,6 +464,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 */
 		arch_mutex_cpu_relax();
 	}
+	mcs_spin_unlock(&lock->mcs_lock, &node);
 slowpath:
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
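
Pieced together, the optimistic-spinning path after this change reads
roughly as below; the ww_mutex check, the lockdep annotation and the
slowpath body are trimmed, so treat this as a condensed reading of the
hunks above rather than the verbatim result.

	if (!mutex_can_spin_on_owner(lock))
		goto slowpath;

	mcs_spin_lock(&lock->mcs_lock, &node);	/* queue once, up front */
	for (;;) {
		struct task_struct *owner;

		owner = ACCESS_ONCE(lock->owner);
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;			/* owner scheduled out */

		if ((atomic_read(&lock->count) == 1) &&
		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
			mutex_set_owner(lock);
			mcs_spin_unlock(&lock->mcs_lock, &node);
			preempt_enable();
			return 0;		/* acquired while spinning */
		}

		if (!owner && (need_resched() || rt_task(task)))
			break;			/* no owner and we should yield */

		arch_mutex_cpu_relax();
	}
	mcs_spin_unlock(&lock->mcs_lock, &node);	/* single exit to the slowpath */
slowpath: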

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [tip:core/locking] locking/mutexes: Unlock the mutex without the wait_lock
  2014-01-28 19:13 ` [PATCH v2 3/5] mutex: Unlock the mutex without the wait_lock Jason Low
@ 2014-03-11 12:41   ` tip-bot for Jason Low
  2014-03-12 12:24     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: tip-bot for Jason Low @ 2014-03-11 12:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, jason.low2, tglx

Commit-ID:  1d8fe7dc8078b23e060ec62ccb4cdc1ac3c41bf8
Gitweb:     http://git.kernel.org/tip/1d8fe7dc8078b23e060ec62ccb4cdc1ac3c41bf8
Author:     Jason Low <jason.low2@hp.com>
AuthorDate: Tue, 28 Jan 2014 11:13:14 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 11 Mar 2014 12:14:54 +0100

locking/mutexes: Unlock the mutex without the wait_lock

When running workloads that have high contention in mutexes on an 8 socket
machine, mutex spinners would often spin for a long time with no lock owner.

The main reason this is occurring is in __mutex_unlock_common_slowpath():
if __mutex_slowpath_needs_to_unlock(), then the owner needs to acquire the
mutex->wait_lock before releasing the mutex (setting lock->count to 1). When
the wait_lock is contended, this delays the mutex from being released.
We should be able to release the mutex without holding the wait_lock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 82dad2c..dc3d6f2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -671,10 +671,6 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 	unsigned long flags;
 
-	spin_lock_mutex(&lock->wait_lock, flags);
-	mutex_release(&lock->dep_map, nested, _RET_IP_);
-	debug_mutex_unlock(lock);
-
 	/*
 	 * some architectures leave the lock unlocked in the fastpath failure
 	 * case, others need to leave it locked. In the later case we have to
@@ -683,6 +679,10 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 	if (__mutex_slowpath_needs_to_unlock())
 		atomic_set(&lock->count, 1);
 
+	spin_lock_mutex(&lock->wait_lock, flags);
+	mutex_release(&lock->dep_map, nested, _RET_IP_);
+	debug_mutex_unlock(lock);
+
 	if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [tip:core/locking] locking/mutexes: Modify the way optimistic spinners are queued
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
@ 2014-03-11 15:24     ` Jason Low
  2014-03-11 15:33       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Low @ 2014-03-11 15:24 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, peterz, tglx; +Cc: linux-tip-commits

On Tue, 2014-03-11 at 05:41 -0700, tip-bot for Jason Low wrote:

> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index e6d646b..82dad2c 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -403,9 +403,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	if (!mutex_can_spin_on_owner(lock))
>  		goto slowpath;
>  
> +	mcs_spin_lock(&lock->mcs_lock, &node);
>  	for (;;) {
>  		struct task_struct *owner;
> -		struct mcs_spinlock  node;

Hi Peter, Ingo,

The "struct mcs_spinlock node" still needs to be moved to the beginning
of __mutex_lock_common(), right?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [tip:core/locking] locking/mutexes: Modify the way optimistic spinners are queued
  2014-03-11 15:24     ` Jason Low
@ 2014-03-11 15:33       ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-03-11 15:33 UTC (permalink / raw)
  To: Jason Low; +Cc: mingo, hpa, linux-kernel, tglx, linux-tip-commits

On Tue, Mar 11, 2014 at 08:24:20AM -0700, Jason Low wrote:
> On Tue, 2014-03-11 at 05:41 -0700, tip-bot for Jason Low wrote:
> 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index e6d646b..82dad2c 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -403,9 +403,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  	if (!mutex_can_spin_on_owner(lock))
> >  		goto slowpath;
> >  
> > +	mcs_spin_lock(&lock->mcs_lock, &node);
> >  	for (;;) {
> >  		struct task_struct *owner;
> > -		struct mcs_spinlock  node;
> 
> Hi Peter, Ingo,
> 
> The "struct mcs_spinlock node" still needs to be moved to the beginning
> of __mutex_lock_common() right?

Crud; I thought I fixed that since you pointed that out the last time.

Anyway, later patches make the argument go away, so it's a bisect fail at
worst; not sure that's worth fixing at this point :/

Sorry about that.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [tip:core/locking] locking/mutexes: Unlock the mutex without the wait_lock
  2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
@ 2014-03-12 12:24     ` Peter Zijlstra
  2014-03-12 18:44       ` Jason Low
  2014-03-13  7:28       ` [tip:core/locking] locking/mutex: Fix debug checks tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-03-12 12:24 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, tglx, jason.low2; +Cc: linux-tip-commits

On Tue, Mar 11, 2014 at 05:41:23AM -0700, tip-bot for Jason Low wrote:
>  kernel/locking/mutex.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 82dad2c..dc3d6f2 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -671,10 +671,6 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
>  	struct mutex *lock = container_of(lock_count, struct mutex, count);
>  	unsigned long flags;
>  
> -	spin_lock_mutex(&lock->wait_lock, flags);
> -	mutex_release(&lock->dep_map, nested, _RET_IP_);
> -	debug_mutex_unlock(lock);
> -
>  	/*
>  	 * some architectures leave the lock unlocked in the fastpath failure
>  	 * case, others need to leave it locked. In the later case we have to
> @@ -683,6 +679,10 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
>  	if (__mutex_slowpath_needs_to_unlock())
>  		atomic_set(&lock->count, 1);
>  
> +	spin_lock_mutex(&lock->wait_lock, flags);
> +	mutex_release(&lock->dep_map, nested, _RET_IP_);
> +	debug_mutex_unlock(lock);
> +
>  	if (!list_empty(&lock->wait_list)) {
>  		/* get the first entry from the wait-list: */
>  		struct mutex_waiter *waiter =

OK, so this patch generates:

WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180()
DEBUG_LOCKS_WARN_ON(lock->owner != current)

for kernels with CONFIG_DEBUG_MUTEXES=y

And that makes sense, because as soon as we release the lock a new owner
can come in.

One would think that !__mutex_slowpath_needs_to_unlock() implementations
suffer the same, but for DEBUG we fall back to mutex-null.h which has an
unconditional 1 for that.
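
The window is easiest to see as an interleaving; the timeline below is
illustrative, with the acquirer's steps paraphrased:

/*
 * CPU 0 (releasing owner)                  CPU 1 (incoming locker)
 * -----------------------                  -----------------------
 * __mutex_unlock_common_slowpath():
 *   atomic_set(&lock->count, 1);
 *                                          acquires the now-free mutex,
 *                                          mutex_set_owner() points
 *                                          lock->owner at CPU 1's task
 *   spin_lock_mutex(&lock->wait_lock, ...);
 *   debug_mutex_unlock(lock):
 *     DEBUG_LOCKS_WARN_ON(lock->owner != current)
 *     fires, since lock->owner is no longer the releasing task.
 */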

How about something like the below; will test after lunch.

---
Subject: locking/mutex: Fix debug checks

The mutex debug code requires the mutex to be unlocked after doing the
debug checks; otherwise it can find inconsistent state.

Fixes: 1d8fe7dc8078 ("locking/mutexes: Unlock the mutex without the wait_lock")
Almost-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/locking/mutex-debug.c | 6 ++++++
 kernel/locking/mutex.c       | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index faf6f5b53e77..e1191c996c59 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -83,6 +83,12 @@ void debug_mutex_unlock(struct mutex *lock)
 
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	mutex_clear_owner(lock);
+
+	/*
+	 * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
+	 * mutexes so that we can do it here after we've verified state.
+	 */
+	atomic_set(&lock->count, 1);
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 26c96142caac..e6fa88b64b17 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -34,6 +34,13 @@
 #ifdef CONFIG_DEBUG_MUTEXES
 # include "mutex-debug.h"
 # include <asm-generic/mutex-null.h>
+/*
+ * Must be 0 for the debug case so we do not do the unlock outside of the
+ * wait_lock region. debug_mutex_unlock() will do the actual unlock in this
+ * case.
+ */
+# undef __mutex_slowpath_needs_to_unlock
+# define  __mutex_slowpath_needs_to_unlock()	0
 #else
 # include "mutex.h"
 # include <asm/mutex.h>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [tip:core/locking] locking/mutexes: Unlock the mutex without the wait_lock
  2014-03-12 12:24     ` Peter Zijlstra
@ 2014-03-12 18:44       ` Jason Low
  2014-03-13  7:28       ` [tip:core/locking] locking/mutex: Fix debug checks tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Jason Low @ 2014-03-12 18:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, hpa, linux-kernel, tglx, linux-tip-commits

On Wed, 2014-03-12 at 13:24 +0100, Peter Zijlstra wrote:
> On Tue, Mar 11, 2014 at 05:41:23AM -0700, tip-bot for Jason Low wrote:
> >  kernel/locking/mutex.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 82dad2c..dc3d6f2 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -671,10 +671,6 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
> >  	struct mutex *lock = container_of(lock_count, struct mutex, count);
> >  	unsigned long flags;
> >  
> > -	spin_lock_mutex(&lock->wait_lock, flags);
> > -	mutex_release(&lock->dep_map, nested, _RET_IP_);
> > -	debug_mutex_unlock(lock);
> > -

With the latest patch, perhaps it would also be helpful to extend the
comment below to mention that we don't unlock here for debug mutexes,
since the following debug_mutex_unlock() handles the unlocking
instead.

> >  	/*
> >  	 * some architectures leave the lock unlocked in the fastpath failure
> >  	 * case, others need to leave it locked. In the later case we have to
> > @@ -683,6 +679,10 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
> >  	if (__mutex_slowpath_needs_to_unlock())
> >
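
For what it's worth, one possible shape for the extended comment
(illustrative wording only, not the text that ended up in the tree):

	/*
	 * Some architectures leave the lock unlocked in the fastpath failure
	 * case, others need to leave it locked.  With CONFIG_DEBUG_MUTEXES,
	 * __mutex_slowpath_needs_to_unlock() is 0, so the unlock is skipped
	 * here and done by debug_mutex_unlock() instead, under wait_lock and
	 * after the debug checks.
	 */
	if (__mutex_slowpath_needs_to_unlock())
		atomic_set(&lock->count, 1);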


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [tip:core/locking] locking/mutex: Fix debug checks
  2014-03-12 12:24     ` Peter Zijlstra
  2014-03-12 18:44       ` Jason Low
@ 2014-03-13  7:28       ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-03-13  7:28 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx

Commit-ID:  6f008e72cd111a119b5d8de8c5438d892aae99eb
Gitweb:     http://git.kernel.org/tip/6f008e72cd111a119b5d8de8c5438d892aae99eb
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 12 Mar 2014 13:24:42 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 12 Mar 2014 13:49:47 +0100

locking/mutex: Fix debug checks

OK, so commit:

  1d8fe7dc8078 ("locking/mutexes: Unlock the mutex without the wait_lock")

generates this boot warning when CONFIG_DEBUG_MUTEXES=y:

  WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180() DEBUG_LOCKS_WARN_ON(lock->owner != current)

And that makes sense, because as soon as we release the lock a
new owner can come in...

One would think that !__mutex_slowpath_needs_to_unlock()
implementations suffer the same, but for DEBUG we fall back to
mutex-null.h which has an unconditional 1 for that.

The mutex debug code requires the mutex to be unlocked after
doing the debug checks; otherwise it can find inconsistent
state.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: jason.low2@hp.com
Link: http://lkml.kernel.org/r/20140312122442.GB27965@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex-debug.c | 6 ++++++
 kernel/locking/mutex.c       | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index faf6f5b..e1191c9 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -83,6 +83,12 @@ void debug_mutex_unlock(struct mutex *lock)
 
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	mutex_clear_owner(lock);
+
+	/*
+	 * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
+	 * mutexes so that we can do it here after we've verified state.
+	 */
+	atomic_set(&lock->count, 1);
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 02c61a9..14fe72c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -34,6 +34,13 @@
 #ifdef CONFIG_DEBUG_MUTEXES
 # include "mutex-debug.h"
 # include <asm-generic/mutex-null.h>
+/*
+ * Must be 0 for the debug case so we do not do the unlock outside of the
+ * wait_lock region. debug_mutex_unlock() will do the actual unlock in this
+ * case.
+ */
+# undef __mutex_slowpath_needs_to_unlock
+# define  __mutex_slowpath_needs_to_unlock()	0
 #else
 # include "mutex.h"
 # include <asm/mutex.h>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH v2 5/5] mutex: Give spinners a chance to  spin_on_owner if need_resched() triggered while queued
@ 2014-02-06 14:52 Daniel J Blueman
  0 siblings, 0 replies; 51+ messages in thread
From: Daniel J Blueman @ 2014-02-06 14:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Waiman Long

On Thursday, 6 February 2014 22:10:01 UTC+8, Peter Zijlstra  wrote:
 > On Wed, Feb 05, 2014 at 04:44:34PM -0500, Waiman Long wrote:
 > > I have an alternative way of breaking out of the MCS lock waiting queue when
 > > need_resched() is set. I overload the locked flag to indicate a skipped node
 > > if negative.
 >
 > I'm not quite seeing how it works (then again, I've not really read the
 > patch carefully).
 >
 > Suppose you break out; at that point you get queued and go to sleep.
 > Suppose you got woken up while you MCS entry is still 'pending' and
 > magically win the race and acquire the lock.
 >
 > At that point your MCS entry can be re-used while its still part of the
 > list.
 >
 > Its a fantastically small race window, but I don't see anything that
 > makes it impossible.
 >
 > > I run the patch through the AIM7 high-systime workload on a
 > > 4-socket server and it seemed to run fine.
 >
 > How do people run this AIM7 piece of shit? I let it run for over an hour
 > and it generated exactly 0 numbers, it just sits there eating cpu-time
 > and creating a racket from my pantry.

Without any better advice, I was building the OSDL AIM7 [1], tweaking
DISKDIR in data/reaim.config and running (e.g. on a 384-core setup):
$ src/reaim -c data/reaim.config -f data/workfile.compute -i 16 -e 384

Thanks,
   Daniel

[1] http://sourceforge.net/projects/re-aim-7/
-- 
Daniel J Blueman
Principal Software Engineer, Numascale

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2014-03-13  7:28 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-28 19:13 [PATCH v2 0/5] mutex: Mutex scalability patches Jason Low
2014-01-28 19:13 ` [PATCH v2 1/5] mutex: In mutex_can_spin_on_owner(), return false if task need_resched() Jason Low
2014-01-28 20:20   ` Paul E. McKenney
2014-01-28 22:01     ` Jason Low
2014-01-28 21:09   ` Davidlohr Bueso
2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner() tip-bot for Jason Low
2014-01-28 19:13 ` [PATCH v2 2/5] mutex: Modify the way optimistic spinners are queued Jason Low
2014-01-28 20:23   ` Paul E. McKenney
2014-01-28 20:24     ` Paul E. McKenney
2014-01-28 21:17     ` Davidlohr Bueso
2014-01-28 22:10     ` Jason Low
2014-02-02 21:58       ` Paul E. McKenney
2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
2014-03-11 15:24     ` Jason Low
2014-03-11 15:33       ` Peter Zijlstra
2014-01-28 19:13 ` [PATCH v2 3/5] mutex: Unlock the mutex without the wait_lock Jason Low
2014-03-11 12:41   ` [tip:core/locking] locking/mutexes: " tip-bot for Jason Low
2014-03-12 12:24     ` Peter Zijlstra
2014-03-12 18:44       ` Jason Low
2014-03-13  7:28       ` [tip:core/locking] locking/mutex: Fix debug checks tip-bot for Peter Zijlstra
2014-01-28 19:13 ` [RFC][PATCH v2 4/5] mutex: Disable preemtion between modifying lock->owner and locking/unlocking mutex Jason Low
2014-01-28 20:54   ` Peter Zijlstra
2014-01-28 22:17     ` Jason Low
2014-01-28 19:13 ` [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued Jason Low
2014-01-28 21:07   ` Peter Zijlstra
2014-01-28 22:51     ` Jason Low
2014-01-29 11:51       ` Peter Zijlstra
2014-01-31  3:29         ` Jason Low
2014-01-31 14:09           ` Peter Zijlstra
2014-01-31 20:01             ` Jason Low
2014-01-31 20:08               ` Peter Zijlstra
2014-02-02 21:01                 ` Jason Low
2014-02-02 21:12                   ` Peter Zijlstra
2014-02-03 18:39                     ` Jason Low
2014-02-03 19:25                       ` Peter Zijlstra
2014-02-03 20:55                         ` Jason Low
2014-02-03 21:06                           ` Peter Zijlstra
2014-02-03 21:56                             ` Jason Low
2014-02-04  7:13                         ` Jason Low
2014-02-02 22:02                 ` Paul E. McKenney
2014-02-02 20:02             ` Peter Zijlstra
2014-02-05 21:44         ` Waiman Long
2014-02-06 14:04           ` Peter Zijlstra
2014-02-06 18:45             ` Waiman Long
2014-02-06 20:10               ` Norton, Scott J
2014-02-10 17:01                 ` Peter Zijlstra
2014-02-06 17:44           ` Jason Low
2014-02-06 18:37             ` Waiman Long
2014-01-28 21:08 ` [PATCH v2 0/5] mutex: Mutex scalability patches Davidlohr Bueso
2014-01-28 23:11   ` Jason Low
2014-02-06 14:52 [RFC][PATCH v2 5/5] mutex: Give spinners a chance to spin_on_owner if need_resched() triggered while queued Daniel J Blueman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).