* [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 17:32 [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Waiman Long
@ 2016-02-12 17:32 ` Waiman Long
2016-02-12 20:23 ` Peter Zijlstra
` (2 more replies)
2016-02-12 17:32 ` [PATCH v2 2/4] locking/mutex: Enable optimistic spinning of woken task in wait queue Waiman Long
` (3 subsequent siblings)
4 siblings, 3 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-12 17:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long, Waiman Long
This patch adds a new waiter parameter to the mutex_optimistic_spin()
function to prepare it to be used by a waiter-spinner that doesn't
need to go into the OSQ as there can only be one waiter-spinner which
is the head of the waiting queue.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/locking/mutex.c | 55 ++++++++++++++++++++++++++++++++---------------
1 files changed, 37 insertions(+), 18 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..3c41448 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -273,11 +273,15 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
/*
* Atomically try to take the lock when it is available
+ *
+ * For waiter-spinner, the count needs to be set to -1 first which will be
+ * cleared to 0 later on if the list becomes empty. For regular spinner,
+ * the count will be set to 0.
*/
-static inline bool mutex_try_to_acquire(struct mutex *lock)
+static inline bool mutex_try_to_acquire(struct mutex *lock, int waiter)
{
return !mutex_is_locked(lock) &&
- (atomic_cmpxchg_acquire(&lock->count, 1, 0) == 1);
+ (atomic_cmpxchg_acquire(&lock->count, 1, waiter ? -1 : 0) == 1);
}
/*
@@ -302,22 +306,33 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
*
* Returns true when the lock was taken, otherwise false, indicating
* that we need to jump to the slowpath and sleep.
+ *
+ * The waiter flag is set to true if the spinner is a waiter in the wait
+ * queue. It doesn't go into OSQ as there can be only one waiter at the
head of the queue spinning. It is possible that both the head waiter and
the head of the OSQ are spinning on the lock, so there may be a bit more
cacheline contention in this case. The waiter needs to set the lock
count to -1 instead of 0 on lock acquisition.
*/
static bool mutex_optimistic_spin(struct mutex *lock,
- struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+ struct ww_acquire_ctx *ww_ctx,
+ const bool use_ww_ctx, int waiter)
{
struct task_struct *task = current;
+ bool acquired = false;
- if (!mutex_can_spin_on_owner(lock))
- goto done;
+ if (!waiter) {
+ if (!mutex_can_spin_on_owner(lock))
+ goto done;
- /*
- * In order to avoid a stampede of mutex spinners trying to
- * acquire the mutex all at once, the spinners need to take a
- * MCS (queued) lock first before spinning on the owner field.
- */
- if (!osq_lock(&lock->osq))
- goto done;
+ /*
+ * In order to avoid a stampede of mutex spinners trying to
+ * acquire the mutex all at once, the spinners need to take a
+ * MCS (queued) lock first before spinning on the owner field.
+ */
+ if (!osq_lock(&lock->osq))
+ goto done;
+ }
while (true) {
struct task_struct *owner;
@@ -347,7 +362,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
break;
/* Try to acquire the mutex if it is unlocked. */
- if (mutex_try_to_acquire(lock)) {
+ if (mutex_try_to_acquire(lock, waiter)) {
lock_acquired(&lock->dep_map, ip);
if (use_ww_ctx) {
@@ -358,8 +373,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
}
mutex_set_owner(lock);
- osq_unlock(&lock->osq);
- return true;
+ acquired = true;
+ break;
}
/*
@@ -380,7 +395,10 @@ static bool mutex_optimistic_spin(struct mutex *lock,
cpu_relax_lowlatency();
}
- osq_unlock(&lock->osq);
+ if (!waiter)
+ osq_unlock(&lock->osq);
+ if (acquired || waiter)
+ return acquired;
done:
/*
* If we fell out of the spin path because of need_resched(),
@@ -400,7 +418,8 @@ done:
}
#else
static bool mutex_optimistic_spin(struct mutex *lock,
- struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+ struct ww_acquire_ctx *ww_ctx,
+ const bool use_ww_ctx, int waiter)
{
return false;
}
@@ -517,7 +536,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
preempt_disable();
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
- if (mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx)) {
+ if (mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
/* got the lock, yay! */
preempt_enable();
return 0;
--
1.7.1
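[Editorial note] The count transitions in the patched mutex_try_to_acquire() can be sketched outside the kernel with C11 atomics. This is an illustrative model only — sim_mutex and sim_try_acquire are made-up names, and the real code uses the kernel's atomic_t and arch-specific cmpxchg — but it shows the 1 -> 0 transition for a regular spinner versus the 1 -> -1 transition for the waiter-spinner, which keeps the "waiters pending" state visible to the unlock slowpath:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/*
 * Simplified model of the patched mutex_try_to_acquire(): count is
 * 1 when unlocked, 0 when locked with no waiters, and -1 when locked
 * with waiters pending.  A waiter-spinner must leave -1 behind so the
 * unlock slowpath still wakes the remaining waiters.
 */
struct sim_mutex {
	atomic_int count;
};

bool sim_try_acquire(struct sim_mutex *lock, int waiter)
{
	int unlocked = 1;

	/* Fail fast without an atomic RMW if the lock looks taken. */
	if (atomic_load_explicit(&lock->count, memory_order_relaxed) != 1)
		return false;

	/* 1 -> 0 for a regular spinner, 1 -> -1 for the waiter-spinner. */
	return atomic_compare_exchange_strong_explicit(&lock->count,
			&unlocked, waiter ? -1 : 0,
			memory_order_acquire, memory_order_relaxed);
}
```

The key point Davidlohr and Waiman debate later in the thread is the `waiter ? -1 : 0` selection; in this model it is a plain conditional on the desired value, not an extra atomic operation.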
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 17:32 ` [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin() Waiman Long
@ 2016-02-12 20:23 ` Peter Zijlstra
2016-02-12 22:14 ` Davidlohr Bueso
2016-02-15 22:06 ` Waiman Long
2016-02-12 20:40 ` Peter Zijlstra
2016-02-12 22:02 ` Davidlohr Bueso
2 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-12 20:23 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> This patch adds a new waiter parameter to the mutex_optimistic_spin()
> function to prepare it to be used by a waiter-spinner that doesn't
> need to go into the OSQ as there can only be one waiter-spinner which
> is the head of the waiting queue.
Does not explain why..
> static bool mutex_optimistic_spin(struct mutex *lock,
> + struct ww_acquire_ctx *ww_ctx,
> + const bool use_ww_ctx, int waiter)
> {
> struct task_struct *task = current;
> + bool acquired = false;
>
> + if (!waiter) {
> + if (!mutex_can_spin_on_owner(lock))
> + goto done;
Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
>
> + /*
> + * In order to avoid a stampede of mutex spinners trying to
> + * acquire the mutex all at once, the spinners need to take a
> + * MCS (queued) lock first before spinning on the owner field.
> + */
> + if (!osq_lock(&lock->osq))
> + goto done;
> + }
>
> while (true) {
> struct task_struct *owner;
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 20:23 ` Peter Zijlstra
@ 2016-02-12 22:14 ` Davidlohr Bueso
2016-02-13 12:10 ` Peter Zijlstra
2016-02-16 2:15 ` Jason Low
2016-02-15 22:06 ` Waiman Long
1 sibling, 2 replies; 28+ messages in thread
From: Davidlohr Bueso @ 2016-02-12 22:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Waiman Long, Ingo Molnar, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, 12 Feb 2016, Peter Zijlstra wrote:
>On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
>> static bool mutex_optimistic_spin(struct mutex *lock,
>> + struct ww_acquire_ctx *ww_ctx,
>> + const bool use_ww_ctx, int waiter)
>> {
>> struct task_struct *task = current;
>> + bool acquired = false;
>>
>> + if (!waiter) {
>> + if (!mutex_can_spin_on_owner(lock))
>> + goto done;
>
>Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
afaict because mutex_can_spin_on_owner() fails immediately when the counter
is -1, which is a nono for the waiters case.
Thanks,
Davidlohr
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 22:14 ` Davidlohr Bueso
@ 2016-02-13 12:10 ` Peter Zijlstra
2016-02-13 18:14 ` Davidlohr Bueso
2016-02-16 2:15 ` Jason Low
1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-13 12:10 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Waiman Long, Ingo Molnar, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, Feb 12, 2016 at 02:14:44PM -0800, Davidlohr Bueso wrote:
> On Fri, 12 Feb 2016, Peter Zijlstra wrote:
>
> >On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> >> static bool mutex_optimistic_spin(struct mutex *lock,
> >>+ struct ww_acquire_ctx *ww_ctx,
> >>+ const bool use_ww_ctx, int waiter)
> >> {
> >> struct task_struct *task = current;
> >>+ bool acquired = false;
> >>
> >>+ if (!waiter) {
> >>+ if (!mutex_can_spin_on_owner(lock))
> >>+ goto done;
> >
> >Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
>
> afaict because mutex_can_spin_on_owner() fails immediately when the counter
> is -1, which is a nono for the waiters case.
Can't see it do that, also, if it were to do that, we'd not be here
since having a waiter would then mean no spinners and no starvation
etc..
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-13 12:10 ` Peter Zijlstra
@ 2016-02-13 18:14 ` Davidlohr Bueso
0 siblings, 0 replies; 28+ messages in thread
From: Davidlohr Bueso @ 2016-02-13 18:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Waiman Long, Ingo Molnar, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Sat, 13 Feb 2016, Peter Zijlstra wrote:
>Can't see it do that, also, if it were to do that, we'd not be here
>since having a waiter would then mean no spinners and no starvation
>etc..
I was having a hard time understanding why on earth you didn't see that. And yes
I was also wondering why it was even there... but I was looking at -next, where we
have Ding's original patch b9ce3647b4901071b0dd35d62954a4bb0e5ba1d1.... Paul, could
you drop this patch?
But yeah, I do agree that waiters should also do the mutex_can_spin_on_owner()
check unless I'm missing something subtle from Waiman.
Thanks,
Davidlohr
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 22:14 ` Davidlohr Bueso
2016-02-13 12:10 ` Peter Zijlstra
@ 2016-02-16 2:15 ` Jason Low
2016-02-16 2:22 ` Jason Low
1 sibling, 1 reply; 28+ messages in thread
From: Jason Low @ 2016-02-16 2:15 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, linux-kernel,
Linus Torvalds, Ding Tianhong, Jason Low, Paul E. McKenney,
Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long
On Fri, 2016-02-12 at 14:14 -0800, Davidlohr Bueso wrote:
> On Fri, 12 Feb 2016, Peter Zijlstra wrote:
>
> >On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> >> static bool mutex_optimistic_spin(struct mutex *lock,
> >> + struct ww_acquire_ctx *ww_ctx,
> >> + const bool use_ww_ctx, int waiter)
> >> {
> >> struct task_struct *task = current;
> >> + bool acquired = false;
> >>
> >> + if (!waiter) {
> >> + if (!mutex_can_spin_on_owner(lock))
> >> + goto done;
> >
> >Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
>
> afaict because mutex_can_spin_on_owner() fails immediately when the counter
> is -1, which is a nono for the waiters case.
mutex_can_spin_on_owner() returns false if the task needs to reschedule
or if the lock owner is not on_cpu. In either case, the task will end up
not spinning when it enters the spin loop. So it makes sense if the
waiter also checks mutex_can_spin_on_owner() so that the optimistic spin
queue overhead can be avoided in those cases.
Jason
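[Editorial note] The two conditions Jason describes can be sketched in a small model. The names (sim_task, sim_can_spin_on_owner) and the explicit need_resched parameter are illustrative assumptions; in the kernel the checks are need_resched() and owner->on_cpu read under RCU:

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/*
 * Rough model of the checks mutex_can_spin_on_owner() performs before
 * a spinner pays the cost of joining the optimistic spin queue.
 */
struct sim_task {
	bool on_cpu;	/* stand-in for task_struct::on_cpu */
};

bool sim_can_spin_on_owner(const struct sim_task *owner, bool need_resched)
{
	/* A task that should yield the CPU must not start spinning. */
	if (need_resched)
		return false;

	/*
	 * No point spinning if the lock holder is not actively running;
	 * a NULL owner means the lock may be released imminently.
	 */
	return owner == NULL || owner->on_cpu;
}
```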
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-16 2:15 ` Jason Low
@ 2016-02-16 2:22 ` Jason Low
2016-02-16 8:53 ` Peter Zijlstra
0 siblings, 1 reply; 28+ messages in thread
From: Jason Low @ 2016-02-16 2:22 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, linux-kernel,
Linus Torvalds, Ding Tianhong, Jason Low, Paul E. McKenney,
Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2
On Mon, 2016-02-15 at 18:15 -0800, Jason Low wrote:
> On Fri, 2016-02-12 at 14:14 -0800, Davidlohr Bueso wrote:
> > On Fri, 12 Feb 2016, Peter Zijlstra wrote:
> >
> > >On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> > >> static bool mutex_optimistic_spin(struct mutex *lock,
> > >> + struct ww_acquire_ctx *ww_ctx,
> > >> + const bool use_ww_ctx, int waiter)
> > >> {
> > >> struct task_struct *task = current;
> > >> + bool acquired = false;
> > >>
> > >> + if (!waiter) {
> > >> + if (!mutex_can_spin_on_owner(lock))
> > >> + goto done;
> > >
> > >Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
> >
> > afaict because mutex_can_spin_on_owner() fails immediately when the counter
> > is -1, which is a nono for the waiters case.
>
> mutex_can_spin_on_owner() returns false if the task needs to reschedule
> or if the lock owner is not on_cpu. In either case, the task will end up
> not spinning when it enters the spin loop. So it makes sense if the
> waiter also checks mutex_can_spin_on_owner() so that the optimistic spin
> queue overhead can be avoided in those cases.
Actually, since waiters bypass the optimistic spin queue, that means
the mutex_can_spin_on_owner() check isn't really beneficial. So Waiman is
right in that it's fine to skip this in the waiter case.
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-16 2:22 ` Jason Low
@ 2016-02-16 8:53 ` Peter Zijlstra
2016-02-17 1:40 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-16 8:53 UTC (permalink / raw)
To: Jason Low
Cc: Davidlohr Bueso, Waiman Long, Ingo Molnar, linux-kernel,
Linus Torvalds, Ding Tianhong, Jason Low, Paul E. McKenney,
Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long
On Mon, Feb 15, 2016 at 06:22:14PM -0800, Jason Low wrote:
> On Mon, 2016-02-15 at 18:15 -0800, Jason Low wrote:
> > On Fri, 2016-02-12 at 14:14 -0800, Davidlohr Bueso wrote:
> > > On Fri, 12 Feb 2016, Peter Zijlstra wrote:
> > >
> > > >On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> > > >> static bool mutex_optimistic_spin(struct mutex *lock,
> > > >> + struct ww_acquire_ctx *ww_ctx,
> > > >> + const bool use_ww_ctx, int waiter)
> > > >> {
> > > >> struct task_struct *task = current;
> > > >> + bool acquired = false;
> > > >>
> > > >> + if (!waiter) {
> > > >> + if (!mutex_can_spin_on_owner(lock))
> > > >> + goto done;
> > > >
> > > >Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
> > >
> > > afaict because mutex_can_spin_on_owner() fails immediately when the counter
> > > is -1, which is a nono for the waiters case.
> >
> > mutex_can_spin_on_owner() returns false if the task needs to reschedule
> > or if the lock owner is not on_cpu. In either case, the task will end up
> > not spinning when it enters the spin loop. So it makes sense if the
> > waiter also checks mutex_can_spin_on_owner() so that the optimistic spin
> > queue overhead can be avoided in those cases.
>
> Actually, since waiters bypass the optimistic spin queue, that means
> the mutex_can_spin_on_owner() check isn't really beneficial. So Waiman is
> right in that it's fine to skip this in the waiter case.
My concern was the 'pointless' divergence between the code-paths. The
less they diverge the easier it is to understand and review.
If it doesn't hurt, please keep it the same. If it does need to diverge,
include a comment on why.
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-16 8:53 ` Peter Zijlstra
@ 2016-02-17 1:40 ` Waiman Long
0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-17 1:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jason Low, Davidlohr Bueso, Ingo Molnar, linux-kernel,
Linus Torvalds, Ding Tianhong, Jason Low, Paul E. McKenney,
Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long
On 02/16/2016 03:53 AM, Peter Zijlstra wrote:
> On Mon, Feb 15, 2016 at 06:22:14PM -0800, Jason Low wrote:
>> On Mon, 2016-02-15 at 18:15 -0800, Jason Low wrote:
>>> On Fri, 2016-02-12 at 14:14 -0800, Davidlohr Bueso wrote:
>>>> On Fri, 12 Feb 2016, Peter Zijlstra wrote:
>>>>
>>>>> On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
>>>>>> static bool mutex_optimistic_spin(struct mutex *lock,
>>>>>> + struct ww_acquire_ctx *ww_ctx,
>>>>>> + const bool use_ww_ctx, int waiter)
>>>>>> {
>>>>>> struct task_struct *task = current;
>>>>>> + bool acquired = false;
>>>>>>
>>>>>> + if (!waiter) {
>>>>>> + if (!mutex_can_spin_on_owner(lock))
>>>>>> + goto done;
>>>>> Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
>>>> afaict because mutex_can_spin_on_owner() fails immediately when the counter
>>>> is -1, which is a nono for the waiters case.
>>> mutex_can_spin_on_owner() returns false if the task needs to reschedule
>>> or if the lock owner is not on_cpu. In either case, the task will end up
>>> not spinning when it enters the spin loop. So it makes sense if the
>>> waiter also checks mutex_can_spin_on_owner() so that the optimistic spin
>>> queue overhead can be avoided in those cases.
>> Actually, since waiters bypass the optimistic spin queue, that means
>> the mutex_can_spin_on_owner() check isn't really beneficial. So Waiman is
>> right in that it's fine to skip this in the waiter case.
> My concern was the 'pointless' divergence between the code-paths. The
> less they diverge the easier it is to understand and review.
>
> If it doesn't hurt, please keep it the same. If it does need to diverge,
> include a comment on why.
I will keep the preemption check, but will still leave out the
mutex_can_spin_on_owner() check for the waiter. I will add a comment to
document that.
Cheers,
Longman
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 20:23 ` Peter Zijlstra
2016-02-12 22:14 ` Davidlohr Bueso
@ 2016-02-15 22:06 ` Waiman Long
1 sibling, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-15 22:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On 02/12/2016 03:23 PM, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
>> This patch adds a new waiter parameter to the mutex_optimistic_spin()
>> function to prepare it to be used by a waiter-spinner that doesn't
>> need to go into the OSQ as there can only be one waiter-spinner which
>> is the head of the waiting queue.
> Does not explain why..
>
>> static bool mutex_optimistic_spin(struct mutex *lock,
>> + struct ww_acquire_ctx *ww_ctx,
>> + const bool use_ww_ctx, int waiter)
>> {
>> struct task_struct *task = current;
>> + bool acquired = false;
>>
>> + if (!waiter) {
>> + if (!mutex_can_spin_on_owner(lock))
>> + goto done;
> Why doesn't the waiter have to check mutex_can_spin_on_owner() ?
The reason why a regular spinner needs to call mutex_can_spin_on_owner()
is that we want to avoid the overhead of the OSQ if we can't spin. As
mutex_can_spin_on_owner() is similar to mutex_spin_on_owner(), the
waiter-spinner will exit spinning quickly after calling mutex_spin_on_owner().
Cheers,
Longman
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 17:32 ` [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin() Waiman Long
2016-02-12 20:23 ` Peter Zijlstra
@ 2016-02-12 20:40 ` Peter Zijlstra
2016-02-15 23:55 ` Waiman Long
2016-02-12 22:02 ` Davidlohr Bueso
2 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-12 20:40 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> @@ -358,8 +373,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> }
>
> mutex_set_owner(lock);
> - osq_unlock(&lock->osq);
> - return true;
> + acquired = true;
> + break;
> }
>
> /*
> @@ -380,7 +395,10 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> cpu_relax_lowlatency();
> }
>
> - osq_unlock(&lock->osq);
> + if (!waiter)
> + osq_unlock(&lock->osq);
> + if (acquired || waiter)
> + return acquired;
> done:
> /*
> * If we fell out of the spin path because of need_resched(),
Is there a reason to not also preempt in the wait-loop? Surely the same
reason is still valid there too?
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 20:40 ` Peter Zijlstra
@ 2016-02-15 23:55 ` Waiman Long
2016-02-16 3:00 ` Jason Low
0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2016-02-15 23:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On 02/12/2016 03:40 PM, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
>> @@ -358,8 +373,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
>> }
>>
>> mutex_set_owner(lock);
>> - osq_unlock(&lock->osq);
>> - return true;
>> + acquired = true;
>> + break;
>> }
>>
>> /*
>> @@ -380,7 +395,10 @@ static bool mutex_optimistic_spin(struct mutex *lock,
>> cpu_relax_lowlatency();
>> }
>>
>> - osq_unlock(&lock->osq);
>> + if (!waiter)
>> + osq_unlock(&lock->osq);
>> + if (acquired || waiter)
>> + return acquired;
>> done:
>> /*
>> * If we fell out of the spin path because of need_resched(),
> Is there a reason to not also preempt in the wait-loop? Surely the same
> reason is still valid there too?
The waiter does check for need_resched(). So it will break out of the loop
and return false in this case. This causes the waiter to loop back and
go to sleep if the lock can't be acquired. That is why I don't think
we need to do another schedule_preempt_disabled() here.
Cheers,
Longman
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-15 23:55 ` Waiman Long
@ 2016-02-16 3:00 ` Jason Low
2016-02-16 3:30 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Jason Low @ 2016-02-16 3:00 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Davidlohr Bueso, Paul E. McKenney,
Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2
On Mon, 2016-02-15 at 18:55 -0500, Waiman Long wrote:
> On 02/12/2016 03:40 PM, Peter Zijlstra wrote:
> > On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> >> @@ -358,8 +373,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> >> }
> >>
> >> mutex_set_owner(lock);
> >> - osq_unlock(&lock->osq);
> >> - return true;
> >> + acquired = true;
> >> + break;
> >> }
> >>
> >> /*
> >> @@ -380,7 +395,10 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> >> cpu_relax_lowlatency();
> >> }
> >>
> >> - osq_unlock(&lock->osq);
> >> + if (!waiter)
> >> + osq_unlock(&lock->osq);
> >> + if (acquired || waiter)
> >> + return acquired;
> >> done:
> >> /*
> >> * If we fell out of the spin path because of need_resched(),
> > Is there a reason to not also preempt in the wait-loop? Surely the same
> > reason is still valid there too?
>
> The waiter does check for need_resched(). So it will break out of the loop
> and return false in this case. This causes the waiter to loop back and
> go to sleep if the lock can't be acquired. That is why I don't think
> we need to do another schedule_preempt_disabled() here.
The purpose of the additional reschedule point is to avoid delaying
preemption, which still applies if the spinner is a waiter. If it is a
waiter, the difference is that the delay isn't as long since it doesn't
need to be added to the wait_list. Nonetheless, preemption delays can
still occur, so I think the additional preemption point should also be
there in the waiter case.
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-16 3:00 ` Jason Low
@ 2016-02-16 3:30 ` Waiman Long
0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-16 3:30 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Davidlohr Bueso, Paul E. McKenney,
Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long
On 02/15/2016 10:00 PM, Jason Low wrote:
> On Mon, 2016-02-15 at 18:55 -0500, Waiman Long wrote:
>> On 02/12/2016 03:40 PM, Peter Zijlstra wrote:
>>> On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
>>>> @@ -358,8 +373,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
>>>> }
>>>>
>>>> mutex_set_owner(lock);
>>>> - osq_unlock(&lock->osq);
>>>> - return true;
>>>> + acquired = true;
>>>> + break;
>>>> }
>>>>
>>>> /*
>>>> @@ -380,7 +395,10 @@ static bool mutex_optimistic_spin(struct mutex *lock,
>>>> cpu_relax_lowlatency();
>>>> }
>>>>
>>>> - osq_unlock(&lock->osq);
>>>> + if (!waiter)
>>>> + osq_unlock(&lock->osq);
>>>> + if (acquired || waiter)
>>>> + return acquired;
>>>> done:
>>>> /*
>>>> * If we fell out of the spin path because of need_resched(),
>>> Is there a reason to not also preempt in the wait-loop? Surely the same
>>> reason is still valid there too?
>> The waiter does check for need_resched(). So it will break out of the loop
>> and return false in this case. This causes the waiter to loop back and
>> go to sleep if the lock can't be acquired. That is why I don't think
>> we need to do another schedule_preempt_disabled() here.
> The purpose of the additional reschedule point is to avoid delaying
> preemption, which still applies if the spinner is a waiter. If it is a
> waiter, the difference is that the delay isn't as long since it doesn't
> need to be added to the wait_list. Nonetheless, preemption delays can
> still occur, so I think the additional preemption point should also be
> there in the waiter case.
You are right. Taking the wait lock can introduce arbitrary delay. So I
will modify the patch to fall through and check for preemption.
Cheers,
Longman
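[Editorial note] The control flow being agreed on here can be sketched as a toy model: the woken waiter spins, but bails out as soon as rescheduling is needed so the caller's preemption point is reached promptly. The names, the boolean flags, and the fixed iteration budget are illustrative assumptions, not kernel API:

```c
#include <stdbool.h>
#include <assert.h>

/*
 * Toy model of the waiter-spinner path: lock_free stands in for the
 * owner releasing the mutex, need_resched for the preemption signal.
 */
struct sim_state {
	bool lock_free;		/* becomes true when the owner releases  */
	bool need_resched;	/* set when the spinner should yield CPU */
};

/* Returns true if the lock was taken, false if we must reschedule/sleep. */
bool sim_waiter_spin(const struct sim_state *s, int budget)
{
	int i;

	for (i = 0; i < budget; i++) {
		/* Checked first: never delay preemption to keep spinning. */
		if (s->need_resched)
			return false;	/* caller falls through to resched */
		if (s->lock_free)
			return true;	/* acquired while spinning */
	}
	return false;			/* budget exhausted: go to sleep */
}
```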
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 17:32 ` [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin() Waiman Long
2016-02-12 20:23 ` Peter Zijlstra
2016-02-12 20:40 ` Peter Zijlstra
@ 2016-02-12 22:02 ` Davidlohr Bueso
2016-02-12 22:09 ` Davidlohr Bueso
2016-02-16 0:03 ` Waiman Long
2 siblings, 2 replies; 28+ messages in thread
From: Davidlohr Bueso @ 2016-02-12 22:02 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, 12 Feb 2016, Waiman Long wrote:
>This patch adds a new waiter parameter to the mutex_optimistic_spin()
>function to prepare it to be used by a waiter-spinner that doesn't
>need to go into the OSQ as there can only be one waiter-spinner which
>is the head of the waiting queue.
>
>Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>---
> kernel/locking/mutex.c | 55 ++++++++++++++++++++++++++++++++---------------
> 1 files changed, 37 insertions(+), 18 deletions(-)
>
>diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>index 0551c21..3c41448 100644
>--- a/kernel/locking/mutex.c
>+++ b/kernel/locking/mutex.c
>@@ -273,11 +273,15 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>
> /*
> * Atomically try to take the lock when it is available
>+ *
>+ * For waiter-spinner, the count needs to be set to -1 first which will be
>+ * cleared to 0 later on if the list becomes empty. For regular spinner,
>+ * the count will be set to 0.
> */
>-static inline bool mutex_try_to_acquire(struct mutex *lock)
>+static inline bool mutex_try_to_acquire(struct mutex *lock, int waiter)
> {
> return !mutex_is_locked(lock) &&
>- (atomic_cmpxchg_acquire(&lock->count, 1, 0) == 1);
>+ (atomic_cmpxchg_acquire(&lock->count, 1, waiter ? -1 : 0) == 1);
> }
This can be a really hot path; could we get rid of the waiter check and just
introduce mutex_try_to_acquire_waiter() or such and set the counter to -1 there?
Thanks,
Davidlohr
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 22:02 ` Davidlohr Bueso
@ 2016-02-12 22:09 ` Davidlohr Bueso
2016-02-16 0:03 ` Waiman Long
1 sibling, 0 replies; 28+ messages in thread
From: Davidlohr Bueso @ 2016-02-12 22:09 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, 12 Feb 2016, Davidlohr Bueso wrote:
>This can be a really hot path; could we get rid of the waiter check and just
>introduce mutex_try_to_acquire_waiter() or such and set the counter to -1 there?
Ah nm, I just realized why you do this :) For a second I thought that you called into
this directly from __mutex_lock_common.
* Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()
2016-02-12 22:02 ` Davidlohr Bueso
2016-02-12 22:09 ` Davidlohr Bueso
@ 2016-02-16 0:03 ` Waiman Long
1 sibling, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-16 0:03 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
Ding Tianhong, Jason Low, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On 02/12/2016 05:02 PM, Davidlohr Bueso wrote:
> On Fri, 12 Feb 2016, Waiman Long wrote:
>
>> This patch adds a new waiter parameter to the mutex_optimistic_spin()
>> function to prepare it to be used by a waiter-spinner that doesn't
>> need to go into the OSQ as there can only be one waiter-spinner which
>> is the head of the waiting queue.
>>
>> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>> ---
>> kernel/locking/mutex.c | 55
>> ++++++++++++++++++++++++++++++++---------------
>> 1 files changed, 37 insertions(+), 18 deletions(-)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c21..3c41448 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -273,11 +273,15 @@ static inline int
>> mutex_can_spin_on_owner(struct mutex *lock)
>>
>> /*
>> * Atomically try to take the lock when it is available
>> + *
>> + * For waiter-spinner, the count needs to be set to -1 first which
>> will be
>> + * cleared to 0 later on if the list becomes empty. For regular
>> spinner,
>> + * the count will be set to 0.
>> */
>> -static inline bool mutex_try_to_acquire(struct mutex *lock)
>> +static inline bool mutex_try_to_acquire(struct mutex *lock, int waiter)
>> {
>> return !mutex_is_locked(lock) &&
>> - (atomic_cmpxchg_acquire(&lock->count, 1, 0) == 1);
>> + (atomic_cmpxchg_acquire(&lock->count, 1, waiter ? -1 : 0) ==
>> 1);
>> }
>
> This can be a really hot path, could we get rid of the waiter check
> and just
> introduce mutex_tro_to_acquire_waiter() or such and set the counter to
> -1 there?
>
> Thanks,
> Davidlohr
It is hot in the sense that the lock cacheline is highly contended. On
x86, the ?: expression will most likely be translated to a cmov
instruction before doing the cmpxchg. The cmov instruction won't affect
the amount of contention on that lock cacheline. So I don't see any
problem here.
Cheers,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 2/4] locking/mutex: Enable optimistic spinning of woken task in wait queue
2016-02-12 17:32 [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Waiman Long
2016-02-12 17:32 ` [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin() Waiman Long
@ 2016-02-12 17:32 ` Waiman Long
2016-02-12 17:32 ` [PATCH v2 3/4] locking/mutex: Avoid missed wakeup of mutex waiter Waiman Long
` (2 subsequent siblings)
4 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-12 17:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.
This patch attempts to fix this live-lock condition by enabling the
woken task in the wait queue to enter an optimistic spinning loop
itself, in parallel with the regular spinners in the OSQ. This should
prevent the live-lock condition from happening.
Running the AIM7 benchmarks on a 4-socket E7-4820 v3 system (with ext4
filesystem), the additional waiter spinning improved performance for
the following workloads at high user count:
Workload % Improvement
-------- -------------
alltests 3.9%
disk 3.4%
fserver 2.0%
long 3.8%
new_fserver 10.5%
The other workloads were about the same as before.
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
kernel/locking/mutex.c | 18 ++++++++++++++++--
1 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 3c41448..29c6d90 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -531,6 +531,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned long flags;
+ bool acquired = false; /* True if the lock is acquired */
int ret;
preempt_disable();
@@ -561,7 +562,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
lock_contended(&lock->dep_map, ip);
- for (;;) {
+ while (!acquired) {
/*
* Lets try to take the lock again - this is needed even if
* we get here for the first time (shortly after failing to
@@ -596,9 +597,18 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
/* didn't get the lock, go to sleep: */
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();
+
+ /*
+ * Optimistically spin on the mutex without holding the wait lock.
+ * The task state has to be set to running to avoid another waker
+ * spinning on the on_cpu flag while the woken waiter is
+ * spinning on the mutex.
+ */
+ __set_task_state(task, TASK_RUNNING);
+ acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
+ true);
spin_lock_mutex(&lock->wait_lock, flags);
}
- __set_task_state(task, TASK_RUNNING);
mutex_remove_waiter(lock, &waiter, current_thread_info());
/* set it to 0 if there are no waiters left: */
@@ -606,6 +616,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
atomic_set(&lock->count, 0);
debug_mutex_free_waiter(&waiter);
+ if (acquired)
+ goto unlock;
+
skip_wait:
/* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);
@@ -616,6 +629,7 @@ skip_wait:
ww_mutex_set_context_slowpath(ww, ww_ctx);
}
+unlock:
spin_unlock_mutex(&lock->wait_lock, flags);
preempt_enable();
return 0;
--
1.7.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 3/4] locking/mutex: Avoid missed wakeup of mutex waiter
2016-02-12 17:32 [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Waiman Long
2016-02-12 17:32 ` [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin() Waiman Long
2016-02-12 17:32 ` [PATCH v2 2/4] locking/mutex: Enable optimistic spinning of woken task in wait queue Waiman Long
@ 2016-02-12 17:32 ` Waiman Long
2016-02-12 17:32 ` [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state Waiman Long
2016-02-16 8:51 ` [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Peter Zijlstra
4 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-12 17:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long, Waiman Long
The current mutex code sets count to -1 and then sets the task state.
This is the same order in which the mutex unlock path checks the count
and the task state. That could lead to a missed wakeup, even though
the problem will be cleared when a new waiter enters the waiting queue.
This patch reverses the order in the locking slowpath so that the task
state is set first before setting the count. This should eliminate
the potential missed wakeup and improve latency.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/locking/mutex.c | 46 +++++++++++++++++++++++++++++++---------------
1 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 29c6d90..27123bd 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -564,20 +564,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
while (!acquired) {
/*
- * Lets try to take the lock again - this is needed even if
- * we get here for the first time (shortly after failing to
- * acquire the lock), to make sure that we get a wakeup once
- * it's unlocked. Later on, if we sleep, this is the
- * operation that gives us the lock. We xchg it to -1, so
- * that when we release the lock, we properly wake up the
- * other waiters. We only attempt the xchg if the count is
- * non-negative in order to avoid unnecessary xchg operations:
- */
- if (atomic_read(&lock->count) >= 0 &&
- (atomic_xchg_acquire(&lock->count, -1) == 1))
- break;
-
- /*
* got a signal? (This code gets eliminated in the
* TASK_UNINTERRUPTIBLE case.)
*/
@@ -592,7 +578,37 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto err;
}
- __set_task_state(task, state);
+
+ /*
+ * We need to set the state first before changing the count
+ * to -1 to avoid missed wakeup even though the problem can
+ * be cleared by a new waiter entering the queue.
+ *
+ * Sleep Wakeup
+ * ----- ------
+ * [S] p->state = state [RmW] count = 1
+ * MB MB
+ * [RmW] count = -1 [L] if ((prev_count < 0) &&
+ * if (prev_count < 1) (p->state & state))
+ * sleep wakeup
+ */
+ set_task_state(task, state);
+
+ /*
+ * Lets try to take the lock again - this is needed even if
+ * we get here for the first time (shortly after failing to
+ * acquire the lock), to make sure that we get a wakeup once
+ * it's unlocked. Later on, if we sleep, this is the
+ * operation that gives us the lock. We xchg it to -1, so
+ * that when we release the lock, we properly wake up the
+ * other waiters. We only attempt the xchg if the count is
+ * non-negative in order to avoid unnecessary xchg operations:
+ */
+ if (atomic_read(&lock->count) >= 0 &&
+ (atomic_xchg_acquire(&lock->count, -1) == 1)) {
+ __set_task_state(task, TASK_RUNNING);
+ break;
+ }
/* didn't get the lock, go to sleep: */
spin_unlock_mutex(&lock->wait_lock, flags);
--
1.7.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state
2016-02-12 17:32 [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Waiman Long
` (2 preceding siblings ...)
2016-02-12 17:32 ` [PATCH v2 3/4] locking/mutex: Avoid missed wakeup of mutex waiter Waiman Long
@ 2016-02-12 17:32 ` Waiman Long
2016-02-12 20:18 ` Peter Zijlstra
2016-02-16 8:51 ` [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Peter Zijlstra
4 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2016-02-12 17:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long, Waiman Long
When a task prepares to sleep and then somehow aborts it, there is a
small chance that a waker may be spinning on the on_cpu flag of that
task, waiting for the flag to turn off before doing the wakeup
operation. It may keep on spinning for a long time until that task
actually sleeps, leading to a spurious wakeup.
This patch adds code to detect the change in task state and abort the
wakeup operation, when appropriate, to free up the waker's cpu to do
other useful work.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/sched/core.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7e548bd..e4b6e84 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2075,8 +2075,15 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*
* This ensures that tasks getting woken will be fully ordered against
* their previous state and preserve Program Order.
+ *
+ * If the owning cpu decides not to sleep after all by changing back
+ * its task state, we can return immediately.
*/
- smp_cond_acquire(!p->on_cpu);
+ smp_cond_acquire(!p->on_cpu || !(p->state & state));
+ if (!(p->state & state)) {
+ success = 0;
+ goto out;
+ }
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
--
1.7.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state
2016-02-12 17:32 ` [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state Waiman Long
@ 2016-02-12 20:18 ` Peter Zijlstra
2016-02-12 21:22 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-12 20:18 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, Feb 12, 2016 at 12:32:15PM -0500, Waiman Long wrote:
> When a task prepares to sleep and then aborts it somehow, there is
> a small chance that a waker may be spinning on the on_cpu flag of
> that task waiting for the flag to turn off before doing the wakeup
> operation. It may keep on spinning for a long time until that task
> actually sleeps leading to spurious wakeup.
>
> This patch adds code to detect the change in task state and abort
> the wakeup operation, when appropriate, to free up the waker's cpu
> to do other useful works.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
> kernel/sched/core.c | 9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7e548bd..e4b6e84 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2075,8 +2075,15 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> *
> * This ensures that tasks getting woken will be fully ordered against
> * their previous state and preserve Program Order.
> + *
> + * If the owning cpu decides not to sleep after all by changing back
> + * its task state, we can return immediately.
> */
> - smp_cond_acquire(!p->on_cpu);
> + smp_cond_acquire(!p->on_cpu || !(p->state & state));
> + if (!(p->state & state)) {
> + success = 0;
> + goto out;
> + }
This doesn't make sense, if we managed to get here, p->on_rq must be
false, which means the other side is already in the middle of
schedule().
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state
2016-02-12 20:18 ` Peter Zijlstra
@ 2016-02-12 21:22 ` Waiman Long
2016-02-13 12:09 ` Peter Zijlstra
0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2016-02-12 21:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On 02/12/2016 03:18 PM, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:15PM -0500, Waiman Long wrote:
>> When a task prepares to sleep and then aborts it somehow, there is
>> a small chance that a waker may be spinning on the on_cpu flag of
>> that task waiting for the flag to turn off before doing the wakeup
>> operation. It may keep on spinning for a long time until that task
>> actually sleeps leading to spurious wakeup.
>>
>> This patch adds code to detect the change in task state and abort
>> the wakeup operation, when appropriate, to free up the waker's cpu
>> to do other useful works.
>>
>> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
>> ---
>> kernel/sched/core.c | 9 ++++++++-
>> 1 files changed, 8 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 7e548bd..e4b6e84 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2075,8 +2075,15 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>> *
>> * This ensures that tasks getting woken will be fully ordered against
>> * their previous state and preserve Program Order.
>> + *
>> + * If the owning cpu decides not to sleep after all by changing back
>> + * its task state, we can return immediately.
>> */
>> - smp_cond_acquire(!p->on_cpu);
>> + smp_cond_acquire(!p->on_cpu || !(p->state & state));
>> + if (!(p->state & state)) {
>> + success = 0;
>> + goto out;
>> + }
> This doesn't make sense, if we managed to get here, p->on_rq must be
> false, which means the other side is already in the middle of
> schedule().
Yes, you are right. It is my bad that I missed the on_rq check earlier.
Just scrap the last patch.
Sorry for that :-[
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state
2016-02-12 21:22 ` Waiman Long
@ 2016-02-13 12:09 ` Peter Zijlstra
0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-13 12:09 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen, Waiman Long
On Fri, Feb 12, 2016 at 04:22:29PM -0500, Waiman Long wrote:
> >>+ smp_cond_acquire(!p->on_cpu || !(p->state & state));
> >>+ if (!(p->state & state)) {
> >>+ success = 0;
> >>+ goto out;
> >>+ }
> >This doesn't make sense, if we managed to get here, p->on_rq must be
> >false, which means the other side is already in the middle of
> >schedule().
> Yes, you are right. It is my bad that I missed the on_rq check earlier.
> Just scrap the last patch.
No worries, that is tricky code :-)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter
2016-02-12 17:32 [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Waiman Long
` (3 preceding siblings ...)
2016-02-12 17:32 ` [PATCH v2 4/4] sched/fair: Abort wakeup when task is no longer in a sleeping state Waiman Long
@ 2016-02-16 8:51 ` Peter Zijlstra
2016-02-17 1:39 ` Waiman Long
2016-03-22 3:19 ` Waiman Long
4 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-02-16 8:51 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen
On Fri, Feb 12, 2016 at 12:32:11PM -0500, Waiman Long wrote:
> My own test on a 4-socket E7-4820 v3 system showed a regression of
> about 4% in the high_systime workload with Peter's patch which this
> new patch effectively eliminates.
>
> Testing on an 8-socket Westmere-EX server, however, showed a
> performance change from -9% to +140% on the fserver workload of AIM7,
> depending on how the system was set up.
Subject: [lkp] [locking/mutex] aaca135480: -72.9% fsmark.files_per_sec
My patch also generated the above email.
Please also test that benchmark against this approach.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter
2016-02-16 8:51 ` [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Peter Zijlstra
@ 2016-02-17 1:39 ` Waiman Long
2016-03-22 3:19 ` Waiman Long
1 sibling, 0 replies; 28+ messages in thread
From: Waiman Long @ 2016-02-17 1:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen
On 02/16/2016 03:51 AM, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:11PM -0500, Waiman Long wrote:
>> My own test on a 4-socket E7-4820 v3 system showed a regression of
>> about 4% in the high_systime workload with Peter's patch which this
>> new patch effectively eliminates.
>>
>> Testing on an 8-socket Westmere-EX server, however, showed a
>> performance change from -9% to +140% on the fserver workload of AIM7,
>> depending on how the system was set up.
> Subject: [lkp] [locking/mutex] aaca135480: -72.9% fsmark.files_per_sec
>
> My patch also generated the above email.
>
> Please also test that benchmark against this approach.
>
I will run that benchmark with my patch.
Cheers,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter
2016-02-16 8:51 ` [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter Peter Zijlstra
2016-02-17 1:39 ` Waiman Long
@ 2016-03-22 3:19 ` Waiman Long
2016-03-22 9:59 ` Peter Zijlstra
1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2016-03-22 3:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen
On 02/16/2016 03:51 AM, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:11PM -0500, Waiman Long wrote:
>> My own test on a 4-socket E7-4820 v3 system showed a regression of
>> about 4% in the high_systime workload with Peter's patch which this
>> new patch effectively eliminates.
>>
>> Testing on an 8-socket Westmere-EX server, however, showed a
>> performance change from -9% to +140% on the fserver workload of AIM7,
>> depending on how the system was set up.
> Subject: [lkp] [locking/mutex] aaca135480: -72.9% fsmark.files_per_sec
>
> My patch also generated the above email.
>
> Please also test that benchmark against this approach.
>
I also got an email from the "kernel test robot", but it didn't list
fsmark at all. Instead, the subject was
[lkp] [locking/mutex] 5267438002: +38.9%
fileio.time.involuntary_context_switches
   4409 ± 1%   +38.9%     6126 ± 2%  fileio.time.involuntary_context_switches
   6.00 ± 0%   +33.3%     8.00 ± 0%  fileio.time.percent_of_cpu_this_job_got
  36.06 ± 0%   +43.0%    51.55 ± 0%  fileio.time.system_time
1828660 ± 0%   -92.5%   137258 ± 0%  fileio.time.voluntary_context_switches
Given that the number of voluntary context switches dropped by 92.5%,
an increase in involuntary context switches that is orders of magnitude
smaller than that drop should be OK, I think.
Do you know how to report back that this increase is expected and is
nothing to worry about? Do I just reply to it?
Cheers,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 0/4] locking/mutex: Enable optimistic spinning of lock waiter
2016-03-22 3:19 ` Waiman Long
@ 2016-03-22 9:59 ` Peter Zijlstra
0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-03-22 9:59 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Ding Tianhong,
Jason Low, Davidlohr Bueso, Paul E. McKenney, Thomas Gleixner,
Will Deacon, Tim Chen
On Mon, Mar 21, 2016 at 11:19:02PM -0400, Waiman Long wrote:
> Instead, the subject was
>
> [lkp] [locking/mutex] 5267438002: +38.9%
> fileio.time.involuntary_context_switches
>
>    4409 ± 1%   +38.9%     6126 ± 2%  fileio.time.involuntary_context_switches
>    6.00 ± 0%   +33.3%     8.00 ± 0%  fileio.time.percent_of_cpu_this_job_got
>   36.06 ± 0%   +43.0%    51.55 ± 0%  fileio.time.system_time
> 1828660 ± 0%   -92.5%   137258 ± 0%  fileio.time.voluntary_context_switches
>
> Given that the number of voluntary context switches dropped by 92.5%,
> an increase in involuntary context switches that is orders of magnitude
> smaller than that drop should be OK, I think.
>
> Do you know how to report back that this increase is expected and is
> nothing to worry about? Do I just reply to it?
Nah, just ignore.
^ permalink raw reply [flat|nested] 28+ messages in thread