* [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
@ 2016-01-21  9:29 Ding Tianhong
  2016-01-21 21:23 ` Tim Chen
  2016-01-21 23:02 ` Waiman Long
  0 siblings, 2 replies; 37+ messages in thread
From: Ding Tianhong @ 2016-01-21  9:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel
  Cc: Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Jason Low, Tim Chen, Waiman Long

I wrote a script that creates several processes which call an ioctl in a
loop; the ioctl ends up in a kernel function that looks roughly like:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
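
For illustration, the user-space side could look like the following
hypothetical sketch (this is not my original script; the device node and
ioctl command number are placeholders for the driver's private ioctl):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define XX_IOCTL_CMD	0x0000dead	/* placeholder for the driver's private ioctl */

int main(void)
{
	int i;

	/* several worker processes hammering the ioctl in a tight loop */
	for (i = 0; i < 8; i++) {
		if (fork() == 0) {
			int fd = open("/dev/xx_dev", O_RDWR);	/* placeholder device node */

			if (fd < 0) {
				perror("open");
				exit(1);
			}
			for (;;)
				ioctl(fd, XX_IOCTL_CMD, 0);	/* ends up in xx_ioctl() above */
		}
	}
	pause();	/* meanwhile, "ifconfig ethX down/up" runs in another shell */
	return 0;
}
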
The kernel-side function() may sleep for several milliseconds but never
blocks indefinitely. At the same time another user service may call
ifconfig to change the state of the ethernet interface, and after several
hours the hung task detector reported this problem:

========================================================================
[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
[149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0

================================ cut here ================================

I took a vmcore and found that ifconfig had been sitting in the wait_list
of the rtnl_lock for 120 seconds, while my processes could still acquire
and release the rtnl_lock several times per second. That means my
processes kept jumping the queue and ifconfig never got the rtnl_lock.
Looking at the mutex slow path, I found that a task may spin on the owner
regardless of whether the wait list is empty, so a task on the wait list
can be cut in line indefinitely. Add a check for pending waiters in
mutex_can_spin_on_owner() to avoid this problem.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/mutex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
-	if (need_resched())
+	if (need_resched() || atomic_read(&lock->count) == -1)
 		return 0;
 
 	rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to
-- 
2.5.0


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  9:29 [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Ding Tianhong
@ 2016-01-21 21:23 ` Tim Chen
  2016-01-22  2:41   ` Paul E. McKenney
  2016-01-21 23:02 ` Waiman Long
  1 sibling, 1 reply; 37+ messages in thread
From: Tim Chen @ 2016-01-21 21:23 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:

> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>  
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>  

One concern I have is that this change will eliminate any optimistic
spinning as long as there is a waiter.  Is there a middle ground where we
could allow only one spinner when there are waiters?

In other words, we would still allow spinning when
atomic_read(&lock->count) == -1 as long as no one is queued on the
osq lock (i.e. no other process is doing optimistic spinning).

This could allow a bit of spinning without starving out the waiters.
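
A minimal sketch of that middle ground, against the 4.4-era code the patch
touches, could look something like the following (untested; it assumes the
osq_is_locked() helper from <linux/osq_lock.h> is acceptable in this path,
the body outside the added check is paraphrased from that era's mutex.c,
and it is only a heuristic since another spinner can enter the OSQ right
after the check):

static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
	struct task_struct *owner;
	int retval = 1;

	if (need_resched())
		return 0;

	/*
	 * Middle ground: if there are already waiters (count == -1),
	 * keep spinning only while nobody else is queued on the OSQ,
	 * so at most one optimistic spinner competes with the wait list.
	 */
	if (atomic_read(&lock->count) == -1 && osq_is_locked(&lock->osq))
		return 0;

	rcu_read_lock();
	owner = READ_ONCE(lock->owner);
	if (owner)
		retval = owner->on_cpu;
	rcu_read_unlock();

	/*
	 * If lock->owner is not set, the owner may have just acquired
	 * the mutex and not yet set the owner field, or the mutex may
	 * have been released.
	 */
	return retval;
}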

Tim


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  9:29 [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Ding Tianhong
  2016-01-21 21:23 ` Tim Chen
@ 2016-01-21 23:02 ` Waiman Long
  2016-01-22  6:09   ` Davidlohr Bueso
  2016-01-22  8:54   ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
  1 sibling, 2 replies; 37+ messages in thread
From: Waiman Long @ 2016-01-21 23:02 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 4432 bytes --]

On 01/21/2016 04:29 AM, Ding Tianhong wrote:
> I wrote a script that creates several processes which call an ioctl in a
> loop; the ioctl ends up in a kernel function that looks roughly like:
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
> The kernel-side function() may sleep for several milliseconds but never
> blocks indefinitely. At the same time another user service may call
> ifconfig to change the state of the ethernet interface, and after several
> hours the hung task detector reported this problem:
>
> ========================================================================
> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
> [149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
> [149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
> [149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
> [149738.042290] Call Trace:
> [149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
> [149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
> [149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
> [149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
> [149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
> [149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
>
> ================================ cut here ================================
>
> I took a vmcore and found that ifconfig had been sitting in the wait_list
> of the rtnl_lock for 120 seconds, while my processes could still acquire
> and release the rtnl_lock several times per second. That means my
> processes kept jumping the queue and ifconfig never got the rtnl_lock.
> Looking at the mutex slow path, I found that a task may spin on the owner
> regardless of whether the wait list is empty, so a task on the wait list
> can be cut in line indefinitely. Add a check for pending waiters in
> mutex_can_spin_on_owner() to avoid this problem.
>
> Signed-off-by: Ding Tianhong<dingtianhong@huawei.com>
> Cc: Ingo Molnar<mingo@redhat.com>
> Cc: Peter Zijlstra<peterz@infradead.org>
> Cc: Davidlohr Bueso<dave@stgolabs.net>
> Cc: Linus Torvalds<torvalds@linux-foundation.org>
> Cc: Paul E. McKenney<paulmck@us.ibm.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Will Deacon<will.deacon@arm.com>
> Cc: Jason Low<jason.low2@hp.com>
> Cc: Tim Chen<tim.c.chen@linux.intel.com>
> Cc: Waiman Long<Waiman.Long@hp.com>
> ---
>   kernel/locking/mutex.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>   	struct task_struct *owner;
>   	int retval = 1;
>
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>   		return 0;
>
>   	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>   /*
>    * Optimistic spinning.
>    *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>    *
>    * Since this needs the lock owner, and this mutex implementation
>    * doesn't track the owner atomically in the lock field, we need to

This patch will largely defeat the performance benefit of optimistic 
spinning. I have an alternative solution to this live-lock problem. 
Would you mind trying out the attached patch to see if it can fix your 
problem?

Cheers,
Longman


[-- Attachment #2: 0001-locking-mutex-Enable-optimistic-spinning-of-woken-ta.patch --]
[-- Type: text/plain, Size: 5460 bytes --]

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long <Waiman.Long@hpe.com>
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself, with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 include/linux/mutex.h  |    2 +
 kernel/locking/mutex.c |   95 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..2c55ecd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -57,6 +57,8 @@ struct mutex {
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* Spinner MCS lock */
+	/* Set if wait list head actively spinning */
+	int			wlh_spinning;
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	void			*magic;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..8b27b03 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -55,6 +55,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 	mutex_clear_owner(lock);
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	osq_lock_init(&lock->osq);
+	lock->wlh_spinning = false;
 #endif
 
 	debug_mutex_init(lock, name, key);
@@ -346,8 +347,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
-		/* Try to acquire the mutex if it is unlocked. */
-		if (mutex_try_to_acquire(lock)) {
+		/*
+		 * Try to acquire the mutex if it is unlocked and the wait
+		 * list head isn't spinning on the lock.
+		 */
+		if (!READ_ONCE(lock->wlh_spinning) &&
+		    mutex_try_to_acquire(lock)) {
 			lock_acquired(&lock->dep_map, ip);
 
 			if (use_ww_ctx) {
@@ -398,12 +403,91 @@ done:
 
 	return false;
 }
+
+/*
+ * Wait list head optimistic spinning
+ *
+ * The wait list head, when woken up, will try to spin on the lock if the
+ * lock owner is active. It will also set the wlh_spinning flag to give
+ * itself a higher chance of getting the lock than the other optimistically
+ * spinning lockers in the OSQ.
+ */
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+			       struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+	struct task_struct *owner, *task = current;
+	int gotlock = false;
+
+	WRITE_ONCE(lock->wlh_spinning, true);
+	while (true) {
+		if (use_ww_ctx && ww_ctx->acquired > 0) {
+			struct ww_mutex *ww;
+
+			ww = container_of(lock, struct ww_mutex, base);
+			/*
+			 * If ww->ctx is set the contents are undefined, only
+			 * by acquiring wait_lock there is a guarantee that
+			 * they are not invalid when reading.
+			 *
+			 * As such, when deadlock detection needs to be
+			 * performed the optimistic spinning cannot be done.
+			 */
+			if (READ_ONCE(ww->ctx))
+				break;
+		}
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		owner = READ_ONCE(lock->owner);
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
+
+		/*
+		 * Try to acquire the mutex if it is unlocked. The mutex
+		 * value is set to -1 which will be changed to 0 later on
+		 * if the wait list becomes empty.
+		 */
+		if (!mutex_is_locked(lock) &&
+		   (atomic_cmpxchg_acquire(&lock->count, 1, -1) == 1)) {
+			gotlock = true;
+			break;
+		}
+
+		/*
+		 * When there's no owner, we might have preempted between the
+		 * owner acquiring the lock and setting the owner field. If
+		 * we're an RT task that will live-lock because we won't let
+		 * the owner complete.
+		 */
+		if (!owner && (need_resched() || rt_task(task)))
+			break;
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		cpu_relax_lowlatency();
+
+	}
+	WRITE_ONCE(lock->wlh_spinning, false);
+	return gotlock;
+}
 #else
 static bool mutex_optimistic_spin(struct mutex *lock,
 				  struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
 	return false;
 }
+
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+			       struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+	return false;
+}
 #endif
 
 __visible __used noinline
@@ -543,6 +627,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		int gotlock;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +663,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		/* optimistically spinning on the mutex without the wait lock */
+		gotlock = mutex_wlh_opt_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (gotlock)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
 
-- 
1.7.1



* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21 21:23 ` Tim Chen
@ 2016-01-22  2:41   ` Paul E. McKenney
  2016-01-22  2:48     ` Davidlohr Bueso
  0 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2016-01-22  2:41 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> 
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 0551c21..596b341 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
> >  	struct task_struct *owner;
> >  	int retval = 1;
> >  
> > -	if (need_resched())
> > +	if (need_resched() || atomic_read(&lock->count) == -1)
> >  		return 0;
> >  
> 
> One concern I have is this change will eliminate any optimistic spinning
> as long as there is a waiter.  Is there a middle ground that we
> can allow only one spinner if there are waiters?  
> 
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
> osq lock that queue up the spinners (i.e. no other process doing
> optimistic spinning).
> 
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)

							Thanx, Paul


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  2:41   ` Paul E. McKenney
@ 2016-01-22  2:48     ` Davidlohr Bueso
  2016-01-22  3:13       ` Paul E. McKenney
  0 siblings, 1 reply; 37+ messages in thread
From: Davidlohr Bueso @ 2016-01-22  2:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tim Chen, Ding Tianhong, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Linus Torvalds, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, 21 Jan 2016, Paul E. McKenney wrote:

>I did some testing, which exposed it to the 0day test robot, which
>did note some performance differences.  I was hoping that it would
>clear up some instability from other patches, but no such luck.  ;-)

Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  2:48     ` Davidlohr Bueso
@ 2016-01-22  3:13       ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2016-01-22  3:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Tim Chen, Ding Tianhong, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Linus Torvalds, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
> 
> >I did some testing, which exposed it to the 0day test robot, which
> >did note some performance differences.  I was hoping that it would
> >clear up some instability from other patches, but no such luck.  ;-)
> 
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email.  ;-)

							Thanx, Paul


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21 23:02 ` Waiman Long
@ 2016-01-22  6:09   ` Davidlohr Bueso
  2016-01-22 13:38     ` Waiman Long
  2016-01-22  8:54   ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Davidlohr Bueso @ 2016-01-22  6:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On Thu, 21 Jan 2016, Waiman Long wrote:

>On 01/21/2016 04:29 AM, Ding Tianhong wrote:

>>I took a vmcore and found that ifconfig had been sitting in the wait_list
>>of the rtnl_lock for 120 seconds, while my processes could still acquire
>>and release the rtnl_lock several times per second. That means my
>>processes kept jumping the queue and ifconfig never got the rtnl_lock.
>>Looking at the mutex slow path, I found that a task may spin on the owner
>>regardless of whether the wait list is empty, so a task on the wait list
>>can be cut in line indefinitely. Add a check for pending waiters in
>>mutex_can_spin_on_owner() to avoid this problem.

So this has been somewhat always known, at least in theory, until now. It's the cost
of spinning without going through the wait-queue, unlike other locks.

>> [...]

>From: Waiman Long <Waiman.Long@hpe.com>
>Date: Thu, 21 Jan 2016 17:53:14 -0500
>Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list
>
>Ding Tianhong reported a live-lock situation where a constant stream
>of incoming optimistic spinners blocked a task in the wait list from
>getting the mutex.
>
>This patch attempts to fix this live-lock condition by enabling a woken
>task in the wait list to enter the optimistic spinning loop itself, with
>precedence over the ones in the OSQ. This should prevent the live-lock
>condition from happening.

And one of the reasons why we never bothered 'fixing' things was the additional
branching out in the slowpath (and lack of real issue, although this one being so
damn pathological). I fear that your approach is one of those scenarios where the
code ends up being bloated, albeit most of it is actually duplicated and can be
refactored *sigh*. So now we'd spin, then sleep, then try spinning then sleep again...
phew. Not to mention the performance implications, i.e. losing the benefits of osq
over waiter spinning in scenarios that would otherwise have more osq spinners as
opposed to waiter spinners, or in setups where it is actually best to block instead
of spinning.

Thanks,
Davidlohr


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21 23:02 ` Waiman Long
  2016-01-22  6:09   ` Davidlohr Bueso
@ 2016-01-22  8:54   ` Peter Zijlstra
  2016-01-22 10:20     ` Jason Low
  2016-01-22 13:41     ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Waiman Long
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2016-01-22  8:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> This patch attempts to fix this live-lock condition by enabling a woken
> task in the wait list to enter the optimistic spinning loop itself, with
> precedence over the ones in the OSQ. This should prevent the live-lock
> condition from happening.


So I think having the top waiter going back in to contend on the OSQ is
an excellent idea, but I'm not sure the wlh_spinning thing is important.

The OSQ itself is FIFO fair, and the waiters retain the wait_list
position. So having the top wait_list entry contending on the OSQ
ensures we cannot starve (I think).

Also, as Davidlohr said, we cannot copy/paste this much code.


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  8:54   ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
@ 2016-01-22 10:20     ` Jason Low
  2016-01-22 10:53       ` Peter Zijlstra
  2016-01-22 13:41     ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Waiman Long
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Low @ 2016-01-22 10:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2

On Fri, 2016-01-22 at 09:54 +0100, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> > This patch attempts to fix this live-lock condition by enabling a woken
> > task in the wait list to enter the optimistic spinning loop itself, with
> > precedence over the ones in the OSQ. This should prevent the live-lock
> > condition from happening.
> 
> 
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.
> 
> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).

Right, and we can also avoid needing to add that extra field to the
mutex structure. Before calling optimistic spinning, we do want to check
if the lock is available to avoid unnecessary OSQ overhead though.

So maybe the following would be sufficient:

---
 kernel/locking/mutex.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..ead0bd1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		bool acquired = false;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (acquired)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
 
-- 
1.7.2.5


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 10:20     ` Jason Low
@ 2016-01-22 10:53       ` Peter Zijlstra
  2016-01-22 10:56         ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2016-01-22 10:53 UTC (permalink / raw)
  To: Jason Low
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Fri, Jan 22, 2016 at 02:20:19AM -0800, Jason Low wrote:

> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	lock_contended(&lock->dep_map, ip);
>  
>  	for (;;) {
> +		bool acquired = false;
> +
>  		/*
>  		 * Lets try to take the lock again - this is needed even if
>  		 * we get here for the first time (shortly after failing to
> @@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		/* didn't get the lock, go to sleep: */
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>  		spin_lock_mutex(&lock->wait_lock, flags);
> +		if (acquired)
> +			break;
>  	}
>  	__set_task_state(task, TASK_RUNNING);

I think the problem here is that mutex_optimistic_spin() leaves the
mutex->count == 0, even though we have waiters (us at the very least).

But this should be easily fixed, since if we acquired, we should be the
one releasing, so there's no race.

So something like so:

		if (acquired) {
			atomic_set(&mutex->count, -1);
			break;
		}

Should deal with that -- we'll set it to 0 again a little further down
if the list ends up empty.


There might be other details, but this is the one that stood out.


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 10:53       ` Peter Zijlstra
@ 2016-01-22 10:56         ` Peter Zijlstra
  2016-01-22 11:06           ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2016-01-22 10:56 UTC (permalink / raw)
  To: Jason Low
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:

> There might be other details, but this is the one that stood out.

I think this also does the wrong thing for use_ww_ctx.


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 10:56         ` Peter Zijlstra
@ 2016-01-22 11:06           ` Peter Zijlstra
  2016-01-22 13:59             ` Waiman Long
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2016-01-22 11:06 UTC (permalink / raw)
  To: Jason Low
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
> 
> > There might be other details, but this is the one that stood out.
> 
> I think this also does the wrong thing for use_ww_ctx.

Something like so? 

---
 kernel/locking/mutex.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c219c40e..070a0ac34aa7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
+	bool acquired;
 	int ret;
 
 	preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		acquired = false;
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
+
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (acquired) {
+			atomic_set(&lock->count, -1);
+			break;
+		}
 	}
 	__set_task_state(task, TASK_RUNNING);
 
@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		atomic_set(&lock->count, 0);
 	debug_mutex_free_waiter(&waiter);
 
+	if (acquired)
+		goto unlock;
+
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
 	}
 
+unlock:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
 	return 0;


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  6:09   ` Davidlohr Bueso
@ 2016-01-22 13:38     ` Waiman Long
  2016-01-22 16:46       ` Davidlohr Bueso
  0 siblings, 1 reply; 37+ messages in thread
From: Waiman Long @ 2016-01-22 13:38 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Waiman Long wrote:
>
>> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
>
>>> I took a vmcore and found that ifconfig had been sitting in the
>>> wait_list of the rtnl_lock for 120 seconds, while my processes could
>>> still acquire and release the rtnl_lock several times per second. That
>>> means my processes kept jumping the queue and ifconfig never got the
>>> rtnl_lock. Looking at the mutex slow path, I found that a task may spin
>>> on the owner regardless of whether the wait list is empty, so a task on
>>> the wait list can be cut in line indefinitely. Add a check for pending
>>> waiters in mutex_can_spin_on_owner() to avoid this problem.
>
> So this has been somewhat always known, at least in theory, until now. 
> It's the cost
> of spinning without going through the wait-queue, unlike other locks.
>
>>> [...]
>
>> From: Waiman Long <Waiman.Long@hpe.com>
>> Date: Thu, 21 Jan 2016 17:53:14 -0500
>> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken 
>> task in wait list
>>
>> Ding Tianhong reported a live-lock situation where a constant stream
>> of incoming optimistic spinners blocked a task in the wait list from
>> getting the mutex.
>>
>> This patch attempts to fix this live-lock condition by enabling a woken
>> task in the wait list to enter the optimistic spinning loop itself, with
>> precedence over the ones in the OSQ. This should prevent the live-lock
>> condition from happening.
>
> And one of the reasons why we never bothered 'fixing' things was the 
> additional
> branching out in the slowpath (and lack of real issue, although this 
> one being so
> damn pathological). I fear that your approach is one of those 
> scenarios where the
> code ends up being bloated, albeit most of it is actually duplicated 
> and can be
> refactored *sigh*. So now we'd spin, then sleep, then try spinning 
> then sleep again...
> phew. Not to mention the performance implications, i.e. losing the 
> benefits of osq
> over waiter spinning in scenarios that would otherwise have more osq 
> spinners as
> opposed to waiter spinners, or in setups where it is actually best to 
> block instead
> of spinning.

The patch that I sent out is just a proof of concept to make sure that 
it can fix that particular case. I do plan to refactor it if I decide to 
go ahead with an official one. Unlike the OSQ, there can be no more than 
one waiter spinner as the wakeup function is directed to only the first 
task in the wait list and the spinning won't happen until the task is 
first woken up. In the worst case scenario, there are only 2 spinners 
spinning on the lock and the owner field, one from the OSQ and one from
the wait list. That shouldn't put too much cacheline contention traffic
on the system.

Cheers,
Longman


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  8:54   ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
  2016-01-22 10:20     ` Jason Low
@ 2016-01-22 13:41     ` Waiman Long
  1 sibling, 0 replies; 37+ messages in thread
From: Waiman Long @ 2016-01-22 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On 01/22/2016 03:54 AM, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
>> This patch attempts to fix this live-lock condition by enabling a woken
>> task in the wait list to enter the optimistic spinning loop itself, with
>> precedence over the ones in the OSQ. This should prevent the live-lock
>> condition from happening.
>
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.

Yes, that is optional. I put it there just to make it more likely for 
the waiter spinner to get the lock. Without that, the chance will be 
50/50 on average. I can certainly take that out.

> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).
>
> Also, as Davidlohr said, we cannot copy/paste this much code.

As I said in the previous mail, I do intend to refactor it before 
sending out the official patch.

Cheers,
Longman


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 11:06           ` Peter Zijlstra
@ 2016-01-22 13:59             ` Waiman Long
  2016-01-24  8:03               ` Ding Tianhong
  0 siblings, 1 reply; 37+ messages in thread
From: Waiman Long @ 2016-01-22 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>
>>> There might be other details, but this is the one that stood out.
>> I think this also does the wrong thing for use_ww_ctx.
> Something like so?

I think that should work. My only minor concern is that putting the 
waiter spinner at the end of the OSQ will take it longer to get the 
lock, but that shouldn't be a big issue.

> ---
>   kernel/locking/mutex.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c219c40e..070a0ac34aa7 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	struct task_struct *task = current;
>   	struct mutex_waiter waiter;
>   	unsigned long flags;
> +	bool acquired;
>   	int ret;
>
>   	preempt_disable();
> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	lock_contended(&lock->dep_map, ip);
>
>   	for (;;) {
> +		acquired = false;
>   		/*
>   		 * Lets try to take the lock again - this is needed even if
>   		 * we get here for the first time (shortly after failing to
> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		/* didn't get the lock, go to sleep: */
>   		spin_unlock_mutex(&lock->wait_lock, flags);
>   		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
> +
>   		spin_lock_mutex(&lock->wait_lock, flags);
> +
> +		if (acquired) {
> +			atomic_set(&lock->count, -1);
> +			break;
> +		}
>   	}
>   	__set_task_state(task, TASK_RUNNING);
>
> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		atomic_set(&lock->count, 0);
>   	debug_mutex_free_waiter(&waiter);
>
> +	if (acquired)
> +		goto unlock;
> +
>   skip_wait:
>   	/* got the lock - cleanup and rejoice! */
>   	lock_acquired(&lock->dep_map, ip);
> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		ww_mutex_set_context_slowpath(ww, ww_ctx);
>   	}
>
> +unlock:
>   	spin_unlock_mutex(&lock->wait_lock, flags);
>   	preempt_enable();
>   	return 0;

Cheers,
Longman


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 13:38     ` Waiman Long
@ 2016-01-22 16:46       ` Davidlohr Bueso
  2016-01-25  2:23         ` [PATCH] locking/mutex: Allow next waiter lockless wakeup Davidlohr Bueso
  0 siblings, 1 reply; 37+ messages in thread
From: Davidlohr Bueso @ 2016-01-22 16:46 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On Fri, 22 Jan 2016, Waiman Long wrote:

>The patch that I sent out is just a proof of concept to make sure
>that it can fix that particular case. I do plan to refactor it if I
>decide to go ahead with an official one. Unlike the OSQ, there can be
>no more than one waiter spinner as the wakeup function is directed to
>only the first task in the wait list and the spinning won't happen
>until the task is first woken up. In the worst case scenario, there
>are only 2 spinners spinning on the lock and the owner field, one
>from OSQ and one from the wait list. That shouldn't put too much
>cacheline contention traffic to the system.

Similarly, I guess we should also wake up the next waiter in line after
releasing the wait_lock via wake_q. This would allow the woken waiter a
slightly better chance of finding the wait_lock free when continuing to
take the mutex.

Thanks,
Davidlohr


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 13:59             ` Waiman Long
@ 2016-01-24  8:03               ` Ding Tianhong
  2016-01-29  9:53                 ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Ding Tianhong @ 2016-01-24  8:03 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Jason Low, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On 2016/1/22 21:59, Waiman Long wrote:
> On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>>
>>>> There might be other details, but this is the one that stood out.
>>> I think this also does the wrong thing for use_ww_ctx.
>> Something like so?
> 
> I think that should work. My only minor concern is that putting the waiter spinner at the end of the OSQ will take it longer to get the lock, but that shouldn't be a big issue.
> 
>> ---
>>   kernel/locking/mutex.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c219c40e..070a0ac34aa7 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>       struct task_struct *task = current;
>>       struct mutex_waiter waiter;
>>       unsigned long flags;
>> +    bool acquired;
>>       int ret;
>>
>>       preempt_disable();
>> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>       lock_contended(&lock->dep_map, ip);
>>
>>       for (;;) {
>> +        acquired = false;
>>           /*
>>            * Lets try to take the lock again - this is needed even if
>>            * we get here for the first time (shortly after failing to
>> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           /* didn't get the lock, go to sleep: */
>>           spin_unlock_mutex(&lock->wait_lock, flags);
>>           schedule_preempt_disabled();
>> +
>> +        if (mutex_is_locked(lock))
>> +            acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>> +
>>           spin_lock_mutex(&lock->wait_lock, flags);
>> +
>> +        if (acquired) {
>> +            atomic_set(&lock->count, -1);
>> +            break;
>> +        }
>>       }
>>       __set_task_state(task, TASK_RUNNING);
>>
>> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           atomic_set(&lock->count, 0);
>>       debug_mutex_free_waiter(&waiter);
>>
>> +    if (acquired)
>> +        goto unlock;
>> +
>>   skip_wait:
>>       /* got the lock - cleanup and rejoice! */
>>       lock_acquired(&lock->dep_map, ip);
>> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           ww_mutex_set_context_slowpath(ww, ww_ctx);
>>       }
>>
>> +unlock:
>>       spin_unlock_mutex(&lock->wait_lock, flags);
>>       preempt_enable();
>>       return 0;
> 
> Cheers,
> Longman
> 

looks good to me, I will try this solution and report the result, thanks everyone.

Ding

> .
> 


* [PATCH] locking/mutex: Allow next waiter lockless wakeup
  2016-01-22 16:46       ` Davidlohr Bueso
@ 2016-01-25  2:23         ` Davidlohr Bueso
  2016-01-25 23:02           ` Waiman Long
  2016-02-29 11:21           ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  0 siblings, 2 replies; 37+ messages in thread
From: Davidlohr Bueso @ 2016-01-25  2:23 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

Make use of wake_q and enable the wakeup to occur after
releasing the wait_lock. This is similar to what we do
with the rtmutex top waiter, slightly shortening the critical
region and allowing other waiters to acquire the wait_lock
sooner. In low contention cases it can also help the
recently woken waiter to find the wait_lock available
(fastpath) when it continues execution.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
  kernel/locking/mutex.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..e364b42 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -716,6 +716,7 @@ static inline void
  __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
  {
  	unsigned long flags;
+	WAKE_Q(wake_q);
  
  	/*
  	 * As a performance measurement, release the lock before doing other
@@ -743,11 +744,11 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
  					   struct mutex_waiter, list);
  
  		debug_mutex_wake_waiter(lock, waiter);
-
-		wake_up_process(waiter->task);
+		wake_q_add(&wake_q, waiter->task);
  	}
  
  	spin_unlock_mutex(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
  }
  
  /*
-- 
2.1.4


* Re: [PATCH] locking/mutex: Allow next waiter lockless wakeup
  2016-01-25  2:23         ` [PATCH] locking/mutex: Allow next waiter lockless wakeup Davidlohr Bueso
@ 2016-01-25 23:02           ` Waiman Long
  2016-02-29 11:21           ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  1 sibling, 0 replies; 37+ messages in thread
From: Waiman Long @ 2016-01-25 23:02 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On 01/24/2016 09:23 PM, Davidlohr Bueso wrote:
> Make use of wake_q and enable the wakeup to occur after
> releasing the wait_lock. This is similar to what we do
> with the rtmutex top waiter, slightly shortening the critical
> region and allowing other waiters to acquire the wait_lock
> sooner. In low contention cases it can also help the
> recently woken waiter to find the wait_lock available
> (fastpath) when it continues execution.
>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/locking/mutex.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..e364b42 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -716,6 +716,7 @@ static inline void
>  __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
>  {
>      unsigned long flags;
> +    WAKE_Q(wake_q);
>
>      /*
>       * As a performance measurement, release the lock before doing other
> @@ -743,11 +744,11 @@ __mutex_unlock_common_slowpath(struct mutex 
> *lock, int nested)
>                         struct mutex_waiter, list);
>
>          debug_mutex_wake_waiter(lock, waiter);
> -
> -        wake_up_process(waiter->task);
> +        wake_q_add(&wake_q, waiter->task);
>      }
>
>      spin_unlock_mutex(&lock->wait_lock, flags);
> +    wake_up_q(&wake_q);
>  }
>
>  /*

This patch looks good to me.

Reviewed-by: Waiman Long <Waiman.Long@hpe.com>


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-24  8:03               ` Ding Tianhong
@ 2016-01-29  9:53                 ` Peter Zijlstra
  2016-01-30  1:18                   ` Ding Tianhong
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2016-01-29  9:53 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:

> looks good to me, I will try this solution and report the result, thanks everyone.

Did you get a chance to run with this?


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-29  9:53                 ` Peter Zijlstra
@ 2016-01-30  1:18                   ` Ding Tianhong
  2016-02-01  3:29                     ` huang ying
  2016-02-01 10:08                     ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
  0 siblings, 2 replies; 37+ messages in thread
From: Ding Tianhong @ 2016-01-30  1:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On 2016/1/29 17:53, Peter Zijlstra wrote:
> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> 
>> looks good to me, I will try this solution and report the result, thanks everyone.
> 
> Did you get a chance to run with this?
> 
> .
> 

I backported this patch to the 3.10 LTS kernel without changing any of the
logic. So far the patch works fine for me and nothing needed to be changed,
so I think this patch is fine. Could you formally post this patch for the
latest kernel? :)

Thanks.
Ding 


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-30  1:18                   ` Ding Tianhong
@ 2016-02-01  3:29                     ` huang ying
  2016-02-01  3:35                       ` Huang, Ying
  2016-02-01 10:08                     ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: huang ying @ 2016-02-01  3:29 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ingo Molnar,
	linux-kernel, Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, Huang Ying

On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> On 2016/1/29 17:53, Peter Zijlstra wrote:
>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>
>>> looks good to me, I will try this solution and report the result, thanks everyone.
>>
>> Did you get a chance to run with this?
>>
>> .
>>
>
> I backported this patch to the 3.10 LTS kernel without changing any of the
> logic. So far the patch works fine for me and nothing needed to be changed,
> so I think this patch is fine. Could you formally post this patch for the
> latest kernel? :)
>
> Thanks.
> Ding
>
>

The original patch from Tianhong triggered a performance regression
because it effectively turned off optimistic spinning.  I tested Peter's
patch with the same configuration and it shows no regression, so I think
the patch keeps optimistic spinning.  Detailed test results follow in the
next email.

Best Regards,
Huang, Ying


* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-02-01  3:29                     ` huang ying
@ 2016-02-01  3:35                       ` Huang, Ying
  0 siblings, 0 replies; 37+ messages in thread
From: Huang, Ying @ 2016-02-01  3:35 UTC (permalink / raw)
  To: huang ying
  Cc: Ding Tianhong, Peter Zijlstra, Waiman Long, Jason Low,
	Ingo Molnar, linux-kernel, Davidlohr Bueso, Linus Torvalds,
	Paul E. McKenney, Thomas Gleixner, Will Deacon, Tim Chen,
	Waiman Long, Huang Ying

huang ying <huang.ying.caritas@gmail.com> writes:

> On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>> On 2016/1/29 17:53, Peter Zijlstra wrote:
>>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>>
>>>> looks good to me, I will try this solution and report the result, thanks everyone.
>>>
>>> Did you get a chance to run with this?
>>>
>>> .
>>>
>>
>> I backported this patch to the 3.10 LTS kernel without changing any of
>> the logic. So far the patch works fine for me and nothing needed to be
>> changed,
>> so I think this patch is fine. Could you formally post this patch for the latest kernel? :)
>>
>> Thanks.
>> Ding
>>
>>
>
> The original patch from Tianhong triggered a performance regression
> because it effectively turned off optimistic spinning.  I tested
> Peter's patch with the same configuration and it shows no regression,
> so I think the patch keeps optimistic spinning.  Detailed test results
> follow in the next email.

Here is the detailed test result:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/lkp-ivb-d01/fstime/unixbench

commit: 
  v4.4
  1db66c17114d5437c0757d6792c0d8923192ecd6

            v4.4 1db66c17114d5437c0757d6792 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    371269 ± 10%     -93.2%      25080 ±  4%  unixbench.time.voluntary_context_switches
    371269 ± 10%     -93.2%      25080 ±  4%  time.voluntary_context_switches
      6189 ±  8%     -76.4%       1463 ±  6%  vmstat.system.cs
      5706 ±  0%      -1.7%       5608 ±  0%  vmstat.system.in
    113680 ± 12%     -73.5%      30086 ±  1%  cpuidle.C1-IVB.usage
   1515925 ± 20%     +68.3%    2552001 ± 11%  cpuidle.C1E-IVB.time
   1227221 ± 20%     -51.7%     592695 ± 17%  cpuidle.C3-IVB.time
      2697 ± 10%     -77.8%     598.33 ± 10%  cpuidle.C3-IVB.usage
     15173 ±  6%     -23.1%      11663 ±  1%  cpuidle.C6-IVB.usage
     34.38 ± 27%     -35.0%      22.33 ±  2%  cpuidle.POLL.usage
     61.92 ±  9%     +14.3%      70.78 ±  7%  sched_debug.cfs_rq:/.load_avg.min
     40.85 ± 29%     +27.3%      52.00 ± 10%  sched_debug.cfs_rq:/.runnable_load_avg.2
     -1949 ±-37%     -64.6%    -690.19 ±-134%  sched_debug.cfs_rq:/.spread0.4
     -1773 ±-29%     -85.6%    -256.00 ±-388%  sched_debug.cfs_rq:/.spread0.7
     -2478 ±-26%     -49.5%      -1251 ±-66%  sched_debug.cfs_rq:/.spread0.min
     61.95 ±  9%     +14.8%      71.11 ±  7%  sched_debug.cfs_rq:/.tg_load_avg_contrib.min
    396962 ± 12%     +27.9%     507573 ± 10%  sched_debug.cpu.avg_idle.0
    432973 ± 18%     +45.3%     629147 ± 12%  sched_debug.cpu.avg_idle.6
    448566 ±  3%     +11.5%     499990 ±  0%  sched_debug.cpu.avg_idle.avg
     45.31 ±  5%      -9.5%      41.00 ±  3%  sched_debug.cpu.cpu_load[3].7
     52204 ± 10%     -49.9%      26173 ± 34%  sched_debug.cpu.nr_switches.0
     50383 ± 12%     -57.6%      21353 ± 15%  sched_debug.cpu.nr_switches.1
     45425 ± 16%     -68.5%      14325 ± 28%  sched_debug.cpu.nr_switches.2
     43069 ± 20%     -65.5%      14852 ± 41%  sched_debug.cpu.nr_switches.3
     40285 ± 16%     -70.4%      11905 ± 47%  sched_debug.cpu.nr_switches.4
     40732 ± 13%     -75.8%       9872 ± 39%  sched_debug.cpu.nr_switches.5
     43011 ± 19%     -80.0%       8607 ± 42%  sched_debug.cpu.nr_switches.6
     38076 ± 12%     -75.9%       9167 ± 40%  sched_debug.cpu.nr_switches.7
     44148 ±  7%     -67.1%      14532 ±  6%  sched_debug.cpu.nr_switches.avg
     59877 ±  8%     -51.3%      29146 ± 21%  sched_debug.cpu.nr_switches.max
     33672 ±  9%     -83.0%       5718 ±  4%  sched_debug.cpu.nr_switches.min
     -0.62 ±-1411%   -2212.5%      13.00 ± 16%  sched_debug.cpu.nr_uninterruptible.0
     -1.23 ±-582%   -1318.8%      15.00 ± 35%  sched_debug.cpu.nr_uninterruptible.1
      2.54 ±263%    +267.7%       9.33 ± 91%  sched_debug.cpu.nr_uninterruptible.2
      0.31 ±2966%   -5841.7%     -17.67 ±-19%  sched_debug.cpu.nr_uninterruptible.5
      9.84 ± 19%     +70.4%      16.76 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
    116287 ±  4%     -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.0
     50411 ± 12%     -57.6%      21382 ± 15%  sched_debug.cpu.sched_count.1
     45453 ± 16%     -68.4%      14356 ± 28%  sched_debug.cpu.sched_count.2
     43098 ± 20%     -65.5%      14888 ± 41%  sched_debug.cpu.sched_count.3
     40314 ± 16%     -70.4%      11934 ± 47%  sched_debug.cpu.sched_count.4
     40761 ± 13%     -75.7%       9896 ± 39%  sched_debug.cpu.sched_count.5
     43041 ± 19%     -79.9%       8636 ± 42%  sched_debug.cpu.sched_count.6
     38105 ± 12%     -75.9%       9193 ± 40%  sched_debug.cpu.sched_count.7
     52184 ±  6%     -56.3%      22782 ±  4%  sched_debug.cpu.sched_count.avg
    116288 ±  4%     -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.max
     33701 ±  9%     -82.9%       5746 ±  4%  sched_debug.cpu.sched_count.min
     22760 ± 10%     -63.0%       8418 ± 40%  sched_debug.cpu.sched_goidle.0
     23319 ± 13%     -60.9%       9114 ± 22%  sched_debug.cpu.sched_goidle.1
     21273 ± 17%     -80.4%       4169 ± 13%  sched_debug.cpu.sched_goidle.2
     19993 ± 19%     -67.8%       6429 ± 45%  sched_debug.cpu.sched_goidle.3
     18614 ± 17%     -85.0%       2788 ± 29%  sched_debug.cpu.sched_goidle.4
     18921 ± 12%     -86.7%       2520 ± 15%  sched_debug.cpu.sched_goidle.5
     20131 ± 17%     -82.1%       3596 ± 52%  sched_debug.cpu.sched_goidle.6
     17861 ± 12%     -86.9%       2334 ± 14%  sched_debug.cpu.sched_goidle.7
     20359 ±  8%     -75.8%       4921 ±  5%  sched_debug.cpu.sched_goidle.avg
     26477 ± 10%     -60.2%      10539 ± 21%  sched_debug.cpu.sched_goidle.max
     15845 ± 10%     -87.2%       2033 ±  4%  sched_debug.cpu.sched_goidle.min
     29043 ± 15%     -58.8%      11958 ± 26%  sched_debug.cpu.ttwu_count.0
     24191 ± 10%     -68.8%       7547 ± 27%  sched_debug.cpu.ttwu_count.1
     21313 ± 11%     -72.7%       5819 ± 24%  sched_debug.cpu.ttwu_count.2
     21487 ± 13%     -61.4%       8296 ± 43%  sched_debug.cpu.ttwu_count.3
     19644 ± 15%     -54.4%       8967 ± 79%  sched_debug.cpu.ttwu_count.4
     20786 ± 15%     -69.2%       6409 ± 58%  sched_debug.cpu.ttwu_count.5
     20435 ± 17%     -79.3%       4231 ± 58%  sched_debug.cpu.ttwu_count.6
     19293 ± 17%     -77.0%       4432 ± 55%  sched_debug.cpu.ttwu_count.7
     22024 ±  7%     -67.3%       7207 ±  6%  sched_debug.cpu.ttwu_count.avg
     31009 ±  9%     -45.2%      17008 ± 17%  sched_debug.cpu.ttwu_count.max
     16791 ± 10%     -85.1%       2494 ±  5%  sched_debug.cpu.ttwu_count.min
      3084 ±  8%     +25.5%       3870 ±  9%  sched_debug.cpu.ttwu_local.avg

The URL of the original regression report email:

https://lists.01.org/pipermail/lkp/2016-January/003442.html

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-01-30  1:18                   ` Ding Tianhong
  2016-02-01  3:29                     ` huang ying
@ 2016-02-01 10:08                     ` Peter Zijlstra
  2016-02-02 21:19                       ` Davidlohr Bueso
  2016-02-04  1:35                       ` Jason Low
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2016-02-01 10:08 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Sat, Jan 30, 2016 at 09:18:44AM +0800, Ding Tianhong wrote:
> On 2016/1/29 17:53, Peter Zijlstra wrote:
> > On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> > 
> >> looks good to me, I will try this solution and report the result, thanks everyone.
> > 
> > Did you get a chance to run with this?
> > 
> > .
> > 
> 
> I backported this patch to the 3.10 LTS kernel and didn't change any logic.
> So far, the patch works fine for me and nothing needs to change,
> so I think this patch is fine.  Could you formally release this
> patch to the latest kernel? :)

Thanks for testing, I've queued the below patch.

---
Subject: locking/mutex: Avoid spinner vs waiter starvation
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 22 Jan 2016 12:06:53 +0100

Ding Tianhong reported that under his load the optimistic spinners
would totally starve a task that ended up on the wait list.

Fix this by ensuring the top waiter also partakes in the optimistic
spin queue.

There are a few subtle differences between the assumed state of
regular optimistic spinners and those already on the wait list, which
result in the @acquired complication of the acquire path.

Most notable are:

 - waiters are on the wait list and need to be taken off
 - mutex_optimistic_spin() sets the lock->count to 0 on acquire
   even though there might be more tasks on the wait list.

Cc: Jason Low <jason.low2@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Will Deacon <Will.Deacon@arm.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20160122110653.GF6375@twins.programming.kicks-ass.net
---
 kernel/locking/mutex.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
+	bool acquired;
 	int ret;
 
 	preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		acquired = false;
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
+
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (acquired) {
+			atomic_set(&lock->count, -1);
+			break;
+		}
 	}
 	__set_task_state(task, TASK_RUNNING);
 
@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock,
 		atomic_set(&lock->count, 0);
 	debug_mutex_free_waiter(&waiter);
 
+	if (acquired)
+		goto unlock;
+
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock,
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
 	}
 
+unlock:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
 	return 0;
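
For readers following the diff, the hunks above reshape the wait loop into
roughly the following form.  This is an illustrative sketch of the control
flow only, not the verbatim kernel code: the signal and ww-mutex checks are
elided, and try_to_take_mutex() is a hypothetical stand-in for the existing
lock-word retry at the top of the loop.

	for (;;) {
		acquired = false;

		if (try_to_take_mutex(lock))	/* existing retry, unchanged */
			break;

		/* existing signal / ww-mutex checks elided */

		spin_unlock_mutex(&lock->wait_lock, flags);
		schedule_preempt_disabled();

		/* new: the woken top waiter also joins the optimistic spin queue */
		if (mutex_is_locked(lock))
			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);

		spin_lock_mutex(&lock->wait_lock, flags);

		if (acquired) {
			/*
			 * mutex_optimistic_spin() left ->count at 0; put it
			 * back to -1 since there may still be tasks queued
			 * on the wait list.
			 */
			atomic_set(&lock->count, -1);
			break;
		}
	}

The existing cleanup after the loop then runs as before, with the new
"goto unlock" taking the acquired-by-spinning case straight to dropping
the wait_lock.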

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-01 10:08                     ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
@ 2016-02-02 21:19                       ` Davidlohr Bueso
  2016-02-03  7:10                         ` Ding Tianhong
  2016-02-03 22:07                         ` Waiman Long
  2016-02-04  1:35                       ` Jason Low
  1 sibling, 2 replies; 37+ messages in thread
From: Davidlohr Bueso @ 2016-02-02 21:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On Mon, 01 Feb 2016, Peter Zijlstra wrote:

>Subject: locking/mutex: Avoid spinner vs waiter starvation
>From: Peter Zijlstra <peterz@infradead.org>
>Date: Fri, 22 Jan 2016 12:06:53 +0100
>
>Ding Tianhong reported that under his load the optimistic spinners
>would totally starve a task that ended up on the wait list.
>
>Fix this by ensuring the top waiter also partakes in the optimistic
>spin queue.
>
>There are a few subtle differences between the assumed state of
>regular optimistic spinners and those already on the wait list, which
>result in the @acquired complication of the acquire path.
>
>Most notable are:
>
> - waiters are on the wait list and need to be taken off
> - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>   even though there might be more tasks on the wait list.

Right, the main impact I see from these complications is the window
between a waiter taking the lock via spinning and then acquiring the
wait_lock to remove itself from the list: it allows an unlocking
thread to mark the lock as available in the fastpath, which could in
turn allow a third thread to steal the lock.  With high contention,
this window becomes noticeably larger as we contend for the
wait_lock.

CPU-0                         CPU-1                         CPU-3
__mutex_lock_common
  mutex_optimistic_spin
  (->count now 0)
                              __mutex_fastpath_unlock
                              (->count now 1)
                                                            __mutex_fastpath_lock
                                                            (stolen)

spin_lock_mutex(&lock->wait_lock, flags);

But we've always been bad when it comes to counter and waiters.
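
For readers less familiar with the pre-4.10 mutex slowpath, here is a
minimal sketch of the ->count encoding the window above relies on.  The
helper names are hypothetical and added only for illustration (assumes
<linux/mutex.h>); the encoding itself is the one the patches in this
thread test and set.

	/*
	 *  ->count ==  1 : unlocked
	 *  ->count ==  0 : locked, no waiters recorded
	 *                  (what mutex_optimistic_spin() leaves behind)
	 *  ->count == -1 : locked, tasks queued on ->wait_list
	 */
	static inline bool example_mutex_is_free(struct mutex *lock)
	{
		return atomic_read(&lock->count) == 1;	/* what a fastpath steal sees */
	}

	static inline bool example_mutex_has_waiters(struct mutex *lock)
	{
		return atomic_read(&lock->count) == -1;	/* what the original RFC tested */
	}

The steal described above is possible exactly because a waiter that wins
via spinning leaves ->count at 0 rather than -1 for the time it takes to
re-acquire the wait_lock.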

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-02 21:19                       ` Davidlohr Bueso
@ 2016-02-03  7:10                         ` Ding Tianhong
  2016-02-03 19:24                           ` Davidlohr Bueso
  2016-02-03 22:07                         ` Waiman Long
  1 sibling, 1 reply; 37+ messages in thread
From: Ding Tianhong @ 2016-02-03  7:10 UTC (permalink / raw)
  To: Davidlohr Bueso, Peter Zijlstra
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On 2016/2/3 5:19, Davidlohr Bueso wrote:
> On Mon, 01 Feb 2016, Peter Zijlstra wrote:
> 
>> Subject: locking/mutex: Avoid spinner vs waiter starvation
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Fri, 22 Jan 2016 12:06:53 +0100
>>
>> Ding Tianhong reported that under his load the optimistic spinners
>> would totally starve a task that ended up on the wait list.
>>
>> Fix this by ensuring the top waiter also partakes in the optimistic
>> spin queue.
>>
>> There are a few subtle differences between the assumed state of
>> regular optimistic spinners and those already on the wait list, which
>> result in the @acquired complication of the acquire path.
>>
>> Most notable are:
>>
>> - waiters are on the wait list and need to be taken off
>> - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>>   even though there might be more tasks on the wait list.
> 
> Right, the main impact I see from these complications is the window
> between a waiter taking the lock via spinning and then acquiring the
> wait_lock to remove itself from the list: it allows an unlocking
> thread to mark the lock as available in the fastpath, which could in
> turn allow a third thread to steal the lock.  With high contention,
> this window becomes noticeably larger as we contend for the
> wait_lock.
> 
> CPU-0                         CPU-1                         CPU-3
> __mutex_lock_common
>   mutex_optimistic_spin
>   (->count now 0)
>                               __mutex_fastpath_unlock
>                               (->count now 1)
>                                                             __mutex_fastpath_lock
>                                                             (stolen)
>
> spin_lock_mutex(&lock->wait_lock, flags);
> 
> But we've always been bad when it comes to counter and waiters.
> 

Agreed, but this patch is meant to help the waiter in the wait list get
the lock; the probability of your scenario looks very low to me, so I
don't think it is a problem.

Thanks
Ding


> Thanks,
> Davidlohr
> 
> .
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-03  7:10                         ` Ding Tianhong
@ 2016-02-03 19:24                           ` Davidlohr Bueso
  2016-02-04  1:20                             ` Ding Tianhong
  0 siblings, 1 reply; 37+ messages in thread
From: Davidlohr Bueso @ 2016-02-03 19:24 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ingo Molnar,
	linux-kernel, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long

On Wed, 03 Feb 2016, Ding Tianhong wrote:

>Agreed, but this patch is meant to help the waiter in the wait list get
>the lock; the probability of your scenario looks very low to me, so I
>don't think it is a problem.

Sure, I was in fact implying it's not the end of the world,
although it will be interesting to see the impact on different
(non-pathological) workloads, even if it only affects a single
waiter.  Also, technically this issue can also affect rwsems if
only writers are used, but that's obviously pretty idiotic, so I
wouldn't worry about it.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-02 21:19                       ` Davidlohr Bueso
  2016-02-03  7:10                         ` Ding Tianhong
@ 2016-02-03 22:07                         ` Waiman Long
  1 sibling, 0 replies; 37+ messages in thread
From: Waiman Long @ 2016-02-03 22:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ding Tianhong, Jason Low, Ingo Molnar,
	linux-kernel, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long

On 02/02/2016 04:19 PM, Davidlohr Bueso wrote:
> On Mon, 01 Feb 2016, Peter Zijlstra wrote:
>
>> Subject: locking/mutex: Avoid spinner vs waiter starvation
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Fri, 22 Jan 2016 12:06:53 +0100
>>
>> Ding Tianhong reported that under his load the optimistic spinners
>> would totally starve a task that ended up on the wait list.
>>
>> Fix this by ensuring the top waiter also partakes in the optimistic
>> spin queue.
>>
>> There are a few subtle differences between the assumed state of
>> regular optimistic spinners and those already on the wait list, which
>> result in the @acquired complication of the acquire path.
>>
>> Most notable are:
>>
>> - waiters are on the wait list and need to be taken off
>> - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>>   even though there might be more tasks on the wait list.
>
> Right, the main impact I see from these complications is the window
> between a waiter taking the lock via spinning and then acquiring the
> wait_lock to remove itself from the list: it allows an unlocking
> thread to mark the lock as available in the fastpath, which could in
> turn allow a third thread to steal the lock.  With high contention,
> this window becomes noticeably larger as we contend for the
> wait_lock.
>
> CPU-0                         CPU-1                         CPU-3
> __mutex_lock_common
>   mutex_optimistic_spin
>   (->count now 0)
>                               __mutex_fastpath_unlock
>                               (->count now 1)
>                                                             __mutex_fastpath_lock
>                                                             (stolen)
>
> spin_lock_mutex(&lock->wait_lock, flags);
>
> But we've always been bad when it comes to counter and waiters.
>
> Thanks,
> Davidlohr

I don't quite get how that can happen.  CPU0 cannot change the count to 0
unless CPU1, the lock holder, does the unlock first.  Once CPU0 sees a
count of 1 and changes it to 0, it is the lock holder, and there can be no
other CPU that can do the unlock.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-03 19:24                           ` Davidlohr Bueso
@ 2016-02-04  1:20                             ` Ding Tianhong
  2016-02-12 18:33                               ` Waiman Long
  0 siblings, 1 reply; 37+ messages in thread
From: Ding Tianhong @ 2016-02-04  1:20 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ingo Molnar,
	linux-kernel, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long

On 2016/2/4 3:24, Davidlohr Bueso wrote:
> On Wed, 03 Feb 2016, Ding Tianhong wrote:
> 
>> Agreed, but this patch is meant to help the waiter in the wait list get
>> the lock; the probability of your scenario looks very low to me, so I
>> don't think it is a problem.
> 
> Sure, I was in fact implying it's not the end of the world,
> although it will be interesting to see the impact on different
> (non-pathological) workloads, even if it only affects a single
> waiter.  Also, technically this issue can also affect rwsems if
> only writers are used, but that's obviously pretty idiotic, so I
> wouldn't worry about it.
> 
> Thanks,
> Davidlohr
> 

Hi Davidlohr, Peter:

Following Davidlohr's suggestion, I used several VMs to test Peter's
patch, and sadly I found that one VM still hit the hung-task warning
for this problem, so I think we still need to think more about this
solution.

Thanks.
Ding

> .
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-01 10:08                     ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
  2016-02-02 21:19                       ` Davidlohr Bueso
@ 2016-02-04  1:35                       ` Jason Low
  2016-02-04  8:55                         ` huang ying
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Low @ 2016-02-04  1:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2

On Mon, 2016-02-01 at 11:08 +0100, Peter Zijlstra wrote:
> On Sat, Jan 30, 2016 at 09:18:44AM +0800, Ding Tianhong wrote:
> > On 2016/1/29 17:53, Peter Zijlstra wrote:
> > > On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> > > 
> > >> looks good to me, I will try this solution and report the result, thanks everyone.
> > > 
> > > Did you get a chance to run with this?
> > > 
> > > .
> > > 
> > 
> > I backported this patch to the 3.10 LTS kernel and didn't change any logic.
> > So far, the patch works fine for me and nothing needs to change,
> > so I think this patch is fine.  Could you formally release this
> > patch to the latest kernel? :)
> 
> Thanks for testing, I've queued the below patch.
> 
> ---
> Subject: locking/mutex: Avoid spinner vs waiter starvation
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri, 22 Jan 2016 12:06:53 +0100
> 
> Ding Tianhong reported that under his load the optimistic spinners
> would totally starve a task that ended up on the wait list.
> 
> Fix this by ensuring the top waiter also partakes in the optimistic
> spin queue.
> 
> There are a few subtle differences between the assumed state of
> regular optimistic spinners and those already on the wait list, which
> result in the @acquired complication of the acquire path.
> 
> Most notable are:
> 
>  - waiters are on the wait list and need to be taken off
>  - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>    even though there might be more tasks on the wait list.
> 
> Cc: Jason Low <jason.low2@hp.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Waiman Long <waiman.long@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Will Deacon <Will.Deacon@arm.com>
> Reported-by: Ding Tianhong <dingtianhong@huawei.com>
> Tested-by: Ding Tianhong <dingtianhong@huawei.com>
> Tested-by: "Huang, Ying" <ying.huang@intel.com>
> Suggested-by: Waiman Long <Waiman.Long@hp.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20160122110653.GF6375@twins.programming.kicks-ass.net

I've done some testing with this patch on some of the AIM7 workloads
and found that it reduced throughput by about 10%.  The reduction in
throughput is expected, since spinning as a waiter is less efficient.

Another observation is that under high contention the top-waiter
spinners would often need to reschedule before being able to acquire
the lock from spinning.  A waiter can go into the cycle of
spin -> reschedule -> spin -> reschedule.  So although the chance of
starvation is reduced, this patch doesn't fully address the issue of
waiter starvation.
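
The cycle follows from the conditions under which the optimistic spin has
to give up.  Roughly, as an illustrative sketch with a hypothetical helper
name (not the verbatim mutex_spin_on_owner() code; assumes CONFIG_SMP and
<linux/sched.h>):

	static bool example_keep_spinning(struct task_struct *owner)
	{
		if (need_resched())
			return false;	/* must yield: back to schedule_preempt_disabled() */
		if (owner && !owner->on_cpu)
			return false;	/* holder preempted or blocked: spinning is pointless */
		return true;		/* otherwise keep polling the lock word */
	}

Under heavy contention either condition fires often enough that the top
waiter keeps bouncing between spinning and sleeping, which is the cycle
described above.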

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-04  1:35                       ` Jason Low
@ 2016-02-04  8:55                         ` huang ying
  2016-02-04 22:49                           ` Jason Low
  0 siblings, 1 reply; 37+ messages in thread
From: huang ying @ 2016-02-04  8:55 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ding Tianhong, Waiman Long, Ingo Molnar,
	linux-kernel, Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2

Hi, Low,

On Thu, Feb 4, 2016 at 9:35 AM, Jason Low <jason.low2@hp.com> wrote:
> I've done some testing with this patch on some of the AIM7 workloads
> and found that it reduced throughput by about 10%.  The reduction in
> throughput is expected, since spinning as a waiter is less efficient.
>
> Another observation is that under high contention the top-waiter
> spinners would often need to reschedule before being able to acquire
> the lock from spinning.  A waiter can go into the cycle of
> spin -> reschedule -> spin -> reschedule.  So although the chance of
> starvation is reduced, this patch doesn't fully address the issue of
> waiter starvation.

Could you share your workload?  I want to reproduce it in the 0day/LKP+ environment.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-04  8:55                         ` huang ying
@ 2016-02-04 22:49                           ` Jason Low
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Low @ 2016-02-04 22:49 UTC (permalink / raw)
  To: huang ying
  Cc: Peter Zijlstra, Ding Tianhong, Waiman Long, Ingo Molnar,
	linux-kernel, Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2,
	scott.norton

On Thu, 2016-02-04 at 16:55 +0800, huang ying wrote:
> Hi, Low,
> 
> On Thu, Feb 4, 2016 at 9:35 AM, Jason Low <jason.low2@hp.com> wrote:
> > I've done some testing with this patch on some of the AIM7 workloads
> > and found that it reduced throughput by about 10%.  The reduction in
> > throughput is expected, since spinning as a waiter is less efficient.
> >
> > Another observation is that under high contention the top-waiter
> > spinners would often need to reschedule before being able to acquire
> > the lock from spinning.  A waiter can go into the cycle of
> > spin -> reschedule -> spin -> reschedule.  So although the chance of
> > starvation is reduced, this patch doesn't fully address the issue of
> > waiter starvation.
> 
> Could you share your workload?  I want to reproduce it in the 0day/LKP+ environment.

CC'ing Scott, who wrote the automation scripts.

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-04  1:20                             ` Ding Tianhong
@ 2016-02-12 18:33                               ` Waiman Long
  0 siblings, 0 replies; 37+ messages in thread
From: Waiman Long @ 2016-02-12 18:33 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Davidlohr Bueso, Peter Zijlstra, Jason Low, Ingo Molnar,
	linux-kernel, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

On 02/03/2016 08:20 PM, Ding Tianhong wrote:
> On 2016/2/4 3:24, Davidlohr Bueso wrote:
>> On Wed, 03 Feb 2016, Ding Tianhong wrote:
>>
>>> Agreed, but this patch is meant to help the waiter in the wait list get
>>> the lock; the probability of your scenario looks very low to me, so I
>>> don't think it is a problem.
>> Sure, I was in fact implying it's not the end of the world,
>> although it will be interesting to see the impact on different
>> (non-pathological) workloads, even if it only affects a single
>> waiter.  Also, technically this issue can also affect rwsems if
>> only writers are used, but that's obviously pretty idiotic, so I
>> wouldn't worry about it.
>>
>> Thanks,
>> Davidlohr
>>
> Hi Davidlohr, Peter:
>
> Following Davidlohr's suggestion, I used several VMs to test Peter's
> patch, and sadly I found that one VM still hit the hung-task warning
> for this problem, so I think we still need to think more about this
> solution.
>
> Thanks.
> Ding

After thinking about it, it may be the case that the hung task missed the
wakeup.  If no new task enters the wait queue again, the hung task will
never get woken up.  Would you mind trying out my patch to see whether you
are still seeing the hung task?

Thanks,
Longman

[-- Attachment #2: longman-mutex-v2-patch.tar.gz --]
[-- Type: application/x-gzip, Size: 5106 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [tip:locking/core] locking/mutex: Allow next waiter lockless wakeup
  2016-01-25  2:23         ` [PATCH] locking/mutex: Allow next waiter lockless wakeup Davidlohr Bueso
  2016-01-25 23:02           ` Waiman Long
@ 2016-02-29 11:21           ` tip-bot for Davidlohr Bueso
  1 sibling, 0 replies; 37+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2016-02-29 11:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will.Deacon, mingo, paulmck, linux-kernel, Waiman.Long, paulmck,
	akpm, dingtianhong, waiman.long, tim.c.chen, dave, dbueso,
	peterz, torvalds, jason.low2, hpa, tglx

Commit-ID:  1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6
Gitweb:     http://git.kernel.org/tip/1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Sun, 24 Jan 2016 18:23:43 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 29 Feb 2016 10:02:42 +0100

locking/mutex: Allow next waiter lockless wakeup

Make use of wake-queues and enable the wakeup to occur after releasing the
wait_lock.  This is similar to what we do with the rtmutex top waiter,
slightly shortening the critical region and allowing other waiters to
acquire the wait_lock sooner.  In low-contention cases it can also help
the recently woken waiter to find the wait_lock available (fastpath)
when it continues execution.

Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <Will.Deacon@arm.com>
Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..e364b42 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -716,6 +716,7 @@ static inline void
 __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 {
 	unsigned long flags;
+	WAKE_Q(wake_q);
 
 	/*
 	 * As a performance measurement, release the lock before doing other
@@ -743,11 +744,11 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 					   struct mutex_waiter, list);
 
 		debug_mutex_wake_waiter(lock, waiter);
-
-		wake_up_process(waiter->task);
+		wake_q_add(&wake_q, waiter->task);
 	}
 
 	spin_unlock_mutex(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
 }
 
 /*
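
For context, the generic pattern the patch applies is: queue the wakeup
while holding the lock, then issue it after the lock is dropped.  A
minimal sketch using the pre-4.10 names visible in the diff above
(WAKE_Q, wake_q_add, wake_up_q; assumes <linux/sched.h> and
<linux/spinlock.h>); example_lock and the waiter argument are
placeholders, not mutex code:

	static DEFINE_SPINLOCK(example_lock);

	static void example_release_path(struct task_struct *waiter_task)
	{
		WAKE_Q(wake_q);			/* on-stack wake queue */

		spin_lock(&example_lock);
		/* decide whom to wake while still holding the lock */
		if (waiter_task)
			wake_q_add(&wake_q, waiter_task);
		spin_unlock(&example_lock);

		wake_up_q(&wake_q);		/* the wakeup itself happens lockless */
	}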

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  7:29 ` Ingo Molnar
@ 2016-01-21  9:04   ` Ding Tianhong
  0 siblings, 0 replies; 37+ messages in thread
From: Ding Tianhong @ 2016-01-21  9:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Will Deacon, Jason Low, Tim Chen, Waiman Long

On 2016/1/21 15:29, Ingo Molnar wrote:
> 
> Cc:-ed other gents who touched the mutex code recently. Mail quoted below.
> 

Ok, thanks.

Ding

> Thanks,
> 
> 	Ingo
> 
> * Ding Tianhong <dingtianhong@huawei.com> wrote:
> 
>> I build a script to create several process for ioctl loop calling,
>> the ioctl will calling the kernel function just like:
>> xx_ioctl {
>> ...
>> rtnl_lock();
>> function();
>> rtnl_unlock();
>> ...
>> }
>> The function may sleep several ms, but will not halt, at the same time
>> another user service may calling ifconfig to change the state of the
>> ethernet, and after several hours, the hung task thread report this problem:
>>
>> ========================================================================
>> 149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
>> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
>> [149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
>> [149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
>> [149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
>> [149738.042290] Call Trace:
>> [149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
>> [149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
>> [149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
>> [149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
>> [149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
>> [149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
>> [149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
>> [149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
>> [149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
>> [149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
>>
>> ================================ cut here ================================
>>
>> I got the vmcore and found that the ifconfig is already in the wait_list of the
>> rtnl_lock for 120 second, but my process could get and release the rtnl_lock
>> normally several times in one second, so it means that my process jump the
>> queue and the ifconfig couldn't get the rtnl all the time, I check the mutex lock
>> slow path and found that the mutex may spin on owner ignore whether the  wait list
>> is empty, it will cause the task in the wait list always be cut in line, so add
>> test for wait list in the mutex_can_spin_on_owner and avoid this problem.
>>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>  kernel/locking/mutex.c | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c21..596b341 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>>  	struct task_struct *owner;
>>  	int retval = 1;
>>  
>> -	if (need_resched())
>> +	if (need_resched() || atomic_read(&lock->count) == -1)
>>  		return 0;
>>  
>>  	rcu_read_lock();
>> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>>  /*
>>   * Optimistic spinning.
>>   *
>> - * We try to spin for acquisition when we find that the lock owner
>> - * is currently running on a (different) CPU and while we don't
>> - * need to reschedule. The rationale is that if the lock owner is
>> - * running, it is likely to release the lock soon.
>> + * We try to spin for acquisition when we find that there are no
>> + * pending waiters and the lock owner is currently running on a
>> + * (different) CPU and while we don't need to reschedule. The
>> + * rationale is that if the lock owner is running, it is likely
>> + * to release the lock soon.
>>   *
>>   * Since this needs the lock owner, and this mutex implementation
>>   * doesn't track the owner atomically in the lock field, we need to
>> -- 
>> 2.5.0
>>
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  6:53 [PATCH RFC ] " Ding Tianhong
@ 2016-01-21  7:29 ` Ingo Molnar
  2016-01-21  9:04   ` Ding Tianhong
  0 siblings, 1 reply; 37+ messages in thread
From: Ingo Molnar @ 2016-01-21  7:29 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Will Deacon, Jason Low, Tim Chen, Waiman Long


Cc:-ed other gents who touched the mutex code recently. Mail quoted below.

Thanks,

	Ingo

* Ding Tianhong <dingtianhong@huawei.com> wrote:

> I build a script to create several process for ioctl loop calling,
> the ioctl will calling the kernel function just like:
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
> The function may sleep several ms, but will not halt, at the same time
> another user service may calling ifconfig to change the state of the
> ethernet, and after several hours, the hung task thread report this problem:
> 
> ========================================================================
> 149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
> [149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
> [149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
> [149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
> [149738.042290] Call Trace:
> [149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
> [149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
> [149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
> [149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
> [149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
> [149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
> 
> ================================ cut here ================================
> 
> I got the vmcore and found that the ifconfig is already in the wait_list of the
> rtnl_lock for 120 second, but my process could get and release the rtnl_lock
> normally several times in one second, so it means that my process jump the
> queue and the ifconfig couldn't get the rtnl all the time, I check the mutex lock
> slow path and found that the mutex may spin on owner ignore whether the  wait list
> is empty, it will cause the task in the wait list always be cut in line, so add
> test for wait list in the mutex_can_spin_on_owner and avoid this problem.
> 
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  kernel/locking/mutex.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>  
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>  
>  	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to
> -- 
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.
@ 2016-01-21  6:53 Ding Tianhong
  2016-01-21  7:29 ` Ingo Molnar
  0 siblings, 1 reply; 37+ messages in thread
From: Ding Tianhong @ 2016-01-21  6:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel

I build a script to create several process for ioctl loop calling,
the ioctl will calling the kernel function just like:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
The function may sleep several ms, but will not halt, at the same time
another user service may calling ifconfig to change the state of the
ethernet, and after several hours, the hung task thread report this problem:

========================================================================
149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
[149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0

================================ cut here ================================

I got the vmcore and found that the ifconfig is already in the wait_list of the
rtnl_lock for 120 second, but my process could get and release the rtnl_lock
normally several times in one second, so it means that my process jump the
queue and the ifconfig couldn't get the rtnl all the time, I check the mutex lock
slow path and found that the mutex may spin on owner ignore whether the  wait list
is empty, it will cause the task in the wait list always be cut in line, so add
test for wait list in the mutex_can_spin_on_owner and avoid this problem.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 kernel/locking/mutex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
-	if (need_resched())
+	if (need_resched() || atomic_read(&lock->count) == -1)
 		return 0;
 
 	rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2016-02-29 11:22 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-21  9:29 [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Ding Tianhong
2016-01-21 21:23 ` Tim Chen
2016-01-22  2:41   ` Paul E. McKenney
2016-01-22  2:48     ` Davidlohr Bueso
2016-01-22  3:13       ` Paul E. McKenney
2016-01-21 23:02 ` Waiman Long
2016-01-22  6:09   ` Davidlohr Bueso
2016-01-22 13:38     ` Waiman Long
2016-01-22 16:46       ` Davidlohr Bueso
2016-01-25  2:23         ` [PATCH] locking/mutex: Allow next waiter lockless wakeup Davidlohr Bueso
2016-01-25 23:02           ` Waiman Long
2016-02-29 11:21           ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-01-22  8:54   ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
2016-01-22 10:20     ` Jason Low
2016-01-22 10:53       ` Peter Zijlstra
2016-01-22 10:56         ` Peter Zijlstra
2016-01-22 11:06           ` Peter Zijlstra
2016-01-22 13:59             ` Waiman Long
2016-01-24  8:03               ` Ding Tianhong
2016-01-29  9:53                 ` Peter Zijlstra
2016-01-30  1:18                   ` Ding Tianhong
2016-02-01  3:29                     ` huang ying
2016-02-01  3:35                       ` Huang, Ying
2016-02-01 10:08                     ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
2016-02-02 21:19                       ` Davidlohr Bueso
2016-02-03  7:10                         ` Ding Tianhong
2016-02-03 19:24                           ` Davidlohr Bueso
2016-02-04  1:20                             ` Ding Tianhong
2016-02-12 18:33                               ` Waiman Long
2016-02-03 22:07                         ` Waiman Long
2016-02-04  1:35                       ` Jason Low
2016-02-04  8:55                         ` huang ying
2016-02-04 22:49                           ` Jason Low
2016-01-22 13:41     ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Waiman Long
  -- strict thread matches above, loose matches on Subject: below --
2016-01-21  6:53 [PATCH RFC ] " Ding Tianhong
2016-01-21  7:29 ` Ingo Molnar
2016-01-21  9:04   ` Ding Tianhong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).