LKML Archive on lore.kernel.org
* [PATCH] rwsem: fix missed wakeup due to reordering of load
@ 2017-08-23 11:28 Prateek Sood
  2017-08-24 11:29 ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Prateek Sood @ 2017-08-23 11:28 UTC (permalink / raw)
  To: peterz, mingo; +Cc: Prateek Sood, sramana, linux-kernel

If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed.
 spinning writer                  up_write caller
 ---------------                  -----------------------
 [S] osq_unlock()                 [L] osq
  spin_lock(wait_lock)
  sem->count=0xFFFFFFFF00000001
            +0xFFFFFFFF00000000
  count=sem->count
  MB
                                   sem->count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   RMB
                                   spin_trylock(wait_lock)
                                   return
 rwsem_try_write_lock(count)
 spin_unlock(wait_lock)
 schedule()

Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause the wakeup in
the up_write() context to be missed. In the spinning writer, both
sem->count and the local variable count are then
0xFFFFFFFE00000001, so rwsem_try_write_lock() fails to acquire
the rwsem and the spinning writer goes to sleep in
rwsem_down_write_failed().

The smp_rmb() makes sure that the spinner state is consulted
only after sem->count has been updated in the up_write() context.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
---
 kernel/locking/rwsem-xadd.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 34e727f..5c687f6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -586,6 +586,51 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/*
+	 * If a spinner is present, there is a chance that the load of
+	 * rwsem_has_spinner() in rwsem_wake() can be reordered with
+	 * respect to decrement of rwsem count in __up_write() leading
+	 * to wakeup being missed.
+	 *
+	 * spinning writer                  up_write caller
+	 * ---------------                  -----------------------
+	 * [S] osq_unlock()                 [L] osq
+	 *  spin_lock(wait_lock)
+	 *  sem->count=0xFFFFFFFF00000001
+	 *            +0xFFFFFFFF00000000
+	 *  count=sem->count
+	 *  MB
+	 *                                   sem->count=0xFFFFFFFE00000001
+	 *                                             -0xFFFFFFFF00000001
+	 *                                   RMB
+	 *                                   spin_trylock(wait_lock)
+	 *                                   return
+	 * rwsem_try_write_lock(count)
+	 * spin_unlock(wait_lock)
+	 * schedule()
+	 *
+	 * Reordering of atomic_long_sub_return_release() in __up_write()
+	 * and rwsem_has_spinner() in rwsem_wake() can cause the wakeup in
+	 * the up_write() context to be missed. In the spinning writer, both
+	 * sem->count and the local variable count are then 0xFFFFFFFE00000001,
+	 * so rwsem_try_write_lock() fails to acquire the rwsem and the
+	 * spinning writer goes to sleep in rwsem_down_write_failed().
+	 *
+	 *
+	 * The RMB in the example below makes sure that the spinner state is
+	 * consulted after sem->count is updated in the up_write() context.
+	 * This would guarantee trylock on rwsem is successful
+	 * in rwsem_down_write_failed().
+	 * spinning writer                  up_write caller
+	 * ---------------                  -----------------------
+	 * [S] osq_unlock()                 atomic_update(sem->count)
+	 *                                  RMB
+	 * atomic_update(sem->count)        [L] osq
+	 * MB
+	 * rwsem_try_write_lock(count)
+	 */
+	smp_rmb();
+
+	/*
 	 * If a spinner is present, it is not necessary to do the wakeup.
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., 
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-08-23 11:28 [PATCH] rwsem: fix missed wakeup due to reordering of load Prateek Sood
@ 2017-08-24 11:29 ` Peter Zijlstra
  2017-08-24 12:33   ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-08-24 11:29 UTC (permalink / raw)
  To: Prateek Sood
  Cc: mingo, sramana, linux-kernel, Waiman Long, Davidlohr Bueso, Andrea Parri


WTH did you not Cc the people that commented on your patch last time?

On Wed, Aug 23, 2017 at 04:58:55PM +0530, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.

>  spinning writer                  up_write caller
>  ---------------                  -----------------------
>  [S] osq_unlock()                 [L] osq
>   spin_lock(wait_lock)
>   sem->count=0xFFFFFFFF00000001
>             +0xFFFFFFFF00000000
>   count=sem->count
>   MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    RMB

This doesn't make sense, it appears to order a STORE against something
else.

>                                    spin_trylock(wait_lock)
>                                    return
>  rwsem_try_write_lock(count)
>  spin_unlock(wait_lock)
>  schedule()


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-08-24 11:29 ` Peter Zijlstra
@ 2017-08-24 12:33   ` Peter Zijlstra
  2017-08-24 12:52     ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-08-24 12:33 UTC (permalink / raw)
  To: Prateek Sood
  Cc: mingo, sramana, linux-kernel, Waiman Long, Davidlohr Bueso,
	Andrea Parri, Will Deacon

On Thu, Aug 24, 2017 at 01:29:27PM +0200, Peter Zijlstra wrote:
> 
> WTH did you not Cc the people that commented on your patch last time?
> 
> On Wed, Aug 23, 2017 at 04:58:55PM +0530, Prateek Sood wrote:
> > If a spinner is present, there is a chance that the load of
> > rwsem_has_spinner() in rwsem_wake() can be reordered with
> > respect to decrement of rwsem count in __up_write() leading
> > to wakeup being missed.
> 
> >  spinning writer                  up_write caller
> >  ---------------                  -----------------------
> >  [S] osq_unlock()                 [L] osq
> >   spin_lock(wait_lock)
> >   sem->count=0xFFFFFFFF00000001
> >             +0xFFFFFFFF00000000
> >   count=sem->count
> >   MB
> >                                    sem->count=0xFFFFFFFE00000001
> >                                              -0xFFFFFFFF00000001
> >                                    RMB
> 
> This doesn't make sense, it appears to order a STORE against something
> else.
> 
> >                                    spin_trylock(wait_lock)
> >                                    return
> >  rwsem_try_write_lock(count)
> >  spin_unlock(wait_lock)
> >  schedule()

Is this what you wanted to write?

---
 kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 02f660666ab8..813b5d3654ce 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/*
+	 * __rwsem_down_write_failed_common(sem)
+	 *   rwsem_optimistic_spin(sem)
+	 *     osq_unlock(sem->osq)
+	 *   ...
+	 *   atomic_long_add_return(&sem->count)
+	 *
+	 *		- VS -
+	 *
+	 *			__up_write()
+	 *			  if (atomic_long_sub_return_release(&sem->count) < 0)
+	 *			    rwsem_wake(sem)
+	 *			      osq_is_locked(&sem->osq)
+	 *
+	 * And __up_write() must observe !osq_is_locked() when it observes the
+	 * atomic_long_add_return() in order to not miss a wakeup.
+	 *
+	 * This boils down to:
+	 *
+	 * [S.rel] X = 1		[RmW] r0 = (Y += 0)
+	 *	   MB			      RMB
+	 * [RmW]   Y += 1		[L]   r1 = X
+	 *
+	 * exists (r0=1 /\ r1=0)
+	 */
+	smp_rmb();
+
+	/*
 	 * If a spinner is present, it is not necessary to do the wakeup.
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-08-24 12:33   ` Peter Zijlstra
@ 2017-08-24 12:52     ` Peter Zijlstra
  2017-09-07 14:08       ` Prateek Sood
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2017-08-24 12:52 UTC (permalink / raw)
  To: Prateek Sood
  Cc: mingo, sramana, linux-kernel, Waiman Long, Davidlohr Bueso,
	Andrea Parri, Will Deacon

On Thu, Aug 24, 2017 at 02:33:04PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 24, 2017 at 01:29:27PM +0200, Peter Zijlstra wrote:
> > 
> > WTH did you not Cc the people that commented on your patch last time?
> > 
> > On Wed, Aug 23, 2017 at 04:58:55PM +0530, Prateek Sood wrote:
> > > If a spinner is present, there is a chance that the load of
> > > rwsem_has_spinner() in rwsem_wake() can be reordered with
> > > respect to decrement of rwsem count in __up_write() leading
> > > to wakeup being missed.
> > 
> > >  spinning writer                  up_write caller
> > >  ---------------                  -----------------------
> > >  [S] osq_unlock()                 [L] osq
> > >   spin_lock(wait_lock)
> > >   sem->count=0xFFFFFFFF00000001
> > >             +0xFFFFFFFF00000000
> > >   count=sem->count
> > >   MB
> > >                                    sem->count=0xFFFFFFFE00000001
> > >                                              -0xFFFFFFFF00000001
> > >                                    RMB
> > 
> > This doesn't make sense, it appears to order a STORE against something
> > else.
> > 
> > >                                    spin_trylock(wait_lock)
> > >                                    return
> > >  rwsem_try_write_lock(count)
> > >  spin_unlock(wait_lock)
> > >  schedule()
> 
> Is this what you wanted to write?

And ideally there should be a comment near the atomic_long_add_return()
in __rwsem_down_write_failed_common() to indicate we rely on the implied
smp_mb() before it -- just in case someone goes and makes it
atomic_long_add_return_relaxed().

And I suppose someone should look at the waiting branch of that thing
too.. because I'm not sure what happens if waiting is true but count
isn't big enough.

I bloody hate the rwsem code, that BIAS stuff forever confuses me. I
have a start at rewriting the thing to put the owner in the lock word
just like we now do for mutex, but never seem to get around to finishing
it.

> ---
>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 02f660666ab8..813b5d3654ce 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
>  	DEFINE_WAKE_Q(wake_q);
>  
>  	/*
> +	 * __rwsem_down_write_failed_common(sem)
> +	 *   rwsem_optimistic_spin(sem)
> +	 *     osq_unlock(sem->osq)
> +	 *   ...
> +	 *   atomic_long_add_return(&sem->count)
> +	 *
> +	 *		- VS -
> +	 *
> +	 *			__up_write()
> +	 *			  if (atomic_long_sub_return_release(&sem->count) < 0)
> +	 *			    rwsem_wake(sem)
> +	 *			      osq_is_locked(&sem->osq)
> +	 *
> +	 * And __up_write() must observe !osq_is_locked() when it observes the
> +	 * atomic_long_add_return() in order to not miss a wakeup.
> +	 *
> +	 * This boils down to:
> +	 *
> +	 * [S.rel] X = 1		[RmW] r0 = (Y += 0)
> +	 *	   MB			      RMB
> +	 * [RmW]   Y += 1		[L]   r1 = X
> +	 *
> +	 * exists (r0=1 /\ r1=0)
> +	 */
> +	smp_rmb();
> +
> +	/*
>  	 * If a spinner is present, it is not necessary to do the wakeup.
>  	 * Try to do wakeup only if the trylock succeeds to minimize
>  	 * spinlock contention which may introduce too much delay in the


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-08-24 12:52     ` Peter Zijlstra
@ 2017-09-07 14:08       ` Prateek Sood
  0 siblings, 0 replies; 17+ messages in thread
From: Prateek Sood @ 2017-09-07 14:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, sramana, linux-kernel, Waiman Long, Davidlohr Bueso,
	Andrea Parri, Will Deacon

On 08/24/2017 06:22 PM, Peter Zijlstra wrote:
> On Thu, Aug 24, 2017 at 02:33:04PM +0200, Peter Zijlstra wrote:
>> On Thu, Aug 24, 2017 at 01:29:27PM +0200, Peter Zijlstra wrote:
>>>
>>> WTH did you not Cc the people that commented on your patch last time?
>>>
>>> On Wed, Aug 23, 2017 at 04:58:55PM +0530, Prateek Sood wrote:
>>>> If a spinner is present, there is a chance that the load of
>>>> rwsem_has_spinner() in rwsem_wake() can be reordered with
>>>> respect to decrement of rwsem count in __up_write() leading
>>>> to wakeup being missed.
>>>
>>>>  spinning writer                  up_write caller
>>>>  ---------------                  -----------------------
>>>>  [S] osq_unlock()                 [L] osq
>>>>   spin_lock(wait_lock)
>>>>   sem->count=0xFFFFFFFF00000001
>>>>             +0xFFFFFFFF00000000
>>>>   count=sem->count
>>>>   MB
>>>>                                    sem->count=0xFFFFFFFE00000001
>>>>                                              -0xFFFFFFFF00000001
>>>>                                    RMB
>>>
>>> This doesn't make sense, it appears to order a STORE against something
>>> else.
>>>
>>>>                                    spin_trylock(wait_lock)
>>>>                                    return
>>>>  rwsem_try_write_lock(count)
>>>>  spin_unlock(wait_lock)
>>>>  schedule()
>>
>> Is this what you wanted to write?
> 
> And ideally there should be a comment near the atomic_long_add_return()
> in __rwsem_down_write_failed_common() to indicate we rely on the implied
> smp_mb() before it -- just in case someone goes and makes it
> atomic_long_add_return_relaxed().
> 
> And I suppose someone should look at the waiting branch of that thing
> too.. because I'm not sure what happens if waiting is true but count
> isn't big enough.
> 
> I bloody hate the rwsem code, that BIAS stuff forever confuses me. I
> have a start at rewriting the thing to put the owner in the lock word
> just like we now do for mutex, but never seem to get around to finishing
> it.
> 
>> ---
>>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> index 02f660666ab8..813b5d3654ce 100644
>> --- a/kernel/locking/rwsem-xadd.c
>> +++ b/kernel/locking/rwsem-xadd.c
>> @@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
>>  	DEFINE_WAKE_Q(wake_q);
>>  
>>  	/*
>> +	 * __rwsem_down_write_failed_common(sem)
>> +	 *   rwsem_optimistic_spin(sem)
>> +	 *     osq_unlock(sem->osq)
>> +	 *   ...
>> +	 *   atomic_long_add_return(&sem->count)
>> +	 *
>> +	 *		- VS -
>> +	 *
>> +	 *			__up_write()
>> +	 *			  if (atomic_long_sub_return_release(&sem->count) < 0)
>> +	 *			    rwsem_wake(sem)
>> +	 *			      osq_is_locked(&sem->osq)
>> +	 *
>> +	 * And __up_write() must observe !osq_is_locked() when it observes the
>> +	 * atomic_long_add_return() in order to not miss a wakeup.
>> +	 *
>> +	 * This boils down to:
>> +	 *
>> +	 * [S.rel] X = 1		[RmW] r0 = (Y += 0)
>> +	 *	   MB			      RMB
>> +	 * [RmW]   Y += 1		[L]   r1 = X
>> +	 *
>> +	 * exists (r0=1 /\ r1=0)
>> +	 */
>> +	smp_rmb();
>> +
>> +	/*
>>  	 * If a spinner is present, it is not necessary to do the wakeup.
>>  	 * Try to do wakeup only if the trylock succeeds to minimize
>>  	 * spinlock contention which may introduce too much delay in the

Thanks Peter for your suggestions on the comments.
I will resend the patch with updated comments.



* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-09-20 21:17   ` Andrea Parri
@ 2017-09-27 21:20     ` Davidlohr Bueso
  0 siblings, 0 replies; 17+ messages in thread
From: Davidlohr Bueso @ 2017-09-27 21:20 UTC (permalink / raw)
  To: Andrea Parri; +Cc: Prateek Sood, mingo, longman, peterz, linux-kernel, sramana

On Wed, 20 Sep 2017, Andrea Parri wrote:

>> Instead, how about just removing the release from atomic_long_sub_return_release()
>> such that the osq load is not hoisted over the atomic compound (along with Peter's
>> comment):
>
>This solution will actually enforce a stronger (full) ordering w.r.t. the
>solution described by Prateek and Peter. Also, it will "trade" two lwsync
>for two sync (powerpc), one dmb.ld for one dmb (arm64).
>
>What are the reasons you would prefer this?

It was mainly to maintain consistency about dealing with sem->count, but sure
I won't argue with the above.


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-09-07 14:30 Prateek Sood
  2017-09-19 14:05 ` Andrea Parri
  2017-09-20 14:52 ` Davidlohr Bueso
@ 2017-09-26 18:37 ` Prateek Sood
  2 siblings, 0 replies; 17+ messages in thread
From: Prateek Sood @ 2017-09-26 18:37 UTC (permalink / raw)
  To: mingo, longman, peterz, parri.andrea, dave; +Cc: linux-kernel, sramana

On 09/07/2017 08:00 PM, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.
> 
>  spinning writer                  up_write caller
>  ---------------                  -----------------------
>  [S] osq_unlock()                 [L] osq
>   spin_lock(wait_lock)
>   sem->count=0xFFFFFFFF00000001
>             +0xFFFFFFFF00000000
>   count=sem->count
>   MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    spin_trylock(wait_lock)
>                                    return
>  rwsem_try_write_lock(count)
>  spin_unlock(wait_lock)
>  schedule()
> 
> Reordering of atomic_long_sub_return_release() in __up_write()
> and rwsem_has_spinner() in rwsem_wake() can cause missing of
> wakeup in up_write() context. In spinning writer, sem->count
> and local variable count is 0XFFFFFFFE00000001. It would result
> in rwsem_try_write_lock() failing to acquire rwsem and spinning
> writer going to sleep in rwsem_down_write_failed().
> 
> The smp_rmb() will make sure that the spinner state is
> consulted after sem->count is updated in up_write context.
> 
> Signed-off-by: Prateek Sood <prsood@codeaurora.org>
> ---
>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 02f6606..1fefe6d 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
>  	DEFINE_WAKE_Q(wake_q);
>  
>  	/*
> +	* __rwsem_down_write_failed_common(sem)
> +	*   rwsem_optimistic_spin(sem)
> +	*     osq_unlock(sem->osq)
> +	*   ...
> +	*   atomic_long_add_return(&sem->count)
> +	*
> +	*      - VS -
> +	*
> +	*              __up_write()
> +	*                if (atomic_long_sub_return_release(&sem->count) < 0)
> +	*                  rwsem_wake(sem)
> +	*                    osq_is_locked(&sem->osq)
> +	*
> +	* And __up_write() must observe !osq_is_locked() when it observes the
> +	* atomic_long_add_return() in order to not miss a wakeup.
> +	*
> +	* This boils down to:
> +	*
> +	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
> +	*         MB                         RMB
> +	* [RmW]   Y += 1               [L]   r1 = X
> +	*
> +	* exists (r0=1 /\ r1=0)
> +	*/
> +	smp_rmb();
> +
> +	/*
>  	 * If a spinner is present, it is not necessary to do the wakeup.
>  	 * Try to do wakeup only if the trylock succeeds to minimize
>  	 * spinlock contention which may introduce too much delay in the
> 

Hi Folks,

Do you have any more suggestion/feedback on this patch.


Regards
Prateek



* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-09-20 14:52 ` Davidlohr Bueso
@ 2017-09-20 21:17   ` Andrea Parri
  2017-09-27 21:20     ` Davidlohr Bueso
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Parri @ 2017-09-20 21:17 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Prateek Sood, mingo, longman, peterz, linux-kernel, sramana

On Wed, Sep 20, 2017 at 07:52:54AM -0700, Davidlohr Bueso wrote:
> On Thu, 07 Sep 2017, Prateek Sood wrote:
> >	/*
> >+	* __rwsem_down_write_failed_common(sem)
> >+	*   rwsem_optimistic_spin(sem)
> >+	*     osq_unlock(sem->osq)
> >+	*   ...
> >+	*   atomic_long_add_return(&sem->count)
> >+	*
> >+	*      - VS -
> >+	*
> >+	*              __up_write()
> >+	*                if (atomic_long_sub_return_release(&sem->count) < 0)
> >+	*                  rwsem_wake(sem)
> >+	*                    osq_is_locked(&sem->osq)
> >+	*
> >+	* And __up_write() must observe !osq_is_locked() when it observes the
> >+	* atomic_long_add_return() in order to not miss a wakeup.
> >+	*
> >+	* This boils down to:
> >+	*
> >+	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
> >+	*         MB                         RMB
> >+	* [RmW]   Y += 1               [L]   r1 = X
> >+	*
> >+	* exists (r0=1 /\ r1=0)
> >+	*/
> >+	smp_rmb();
> 
> Instead, how about just removing the release from atomic_long_sub_return_release()
> such that the osq load is not hoisted over the atomic compound (along with Peter's
> comment):

This solution will actually enforce a stronger (full) ordering w.r.t. the
solution described by Prateek and Peter. Also, it will "trade" two lwsync
for two sync (powerpc), one dmb.ld for one dmb (arm64).

What are the reasons you would prefer this?

  Andrea


> 
> diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
> index 6c6a2141f271..487ce31078ff 100644
> --- a/include/asm-generic/rwsem.h
> +++ b/include/asm-generic/rwsem.h
> @@ -101,7 +101,7 @@ static inline void __up_read(struct rw_semaphore *sem)
>  */
> static inline void __up_write(struct rw_semaphore *sem)
> {
> -	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
> +	if (unlikely(atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
> 						    &sem->count) < 0))
> 		rwsem_wake(sem);
> }


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-09-07 14:30 Prateek Sood
  2017-09-19 14:05 ` Andrea Parri
@ 2017-09-20 14:52 ` Davidlohr Bueso
  2017-09-20 21:17   ` Andrea Parri
  2017-09-26 18:37 ` Prateek Sood
  2 siblings, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2017-09-20 14:52 UTC (permalink / raw)
  To: Prateek Sood; +Cc: mingo, longman, peterz, parri.andrea, linux-kernel, sramana

On Thu, 07 Sep 2017, Prateek Sood wrote:
> 	/*
>+	* __rwsem_down_write_failed_common(sem)
>+	*   rwsem_optimistic_spin(sem)
>+	*     osq_unlock(sem->osq)
>+	*   ...
>+	*   atomic_long_add_return(&sem->count)
>+	*
>+	*      - VS -
>+	*
>+	*              __up_write()
>+	*                if (atomic_long_sub_return_release(&sem->count) < 0)
>+	*                  rwsem_wake(sem)
>+	*                    osq_is_locked(&sem->osq)
>+	*
>+	* And __up_write() must observe !osq_is_locked() when it observes the
>+	* atomic_long_add_return() in order to not miss a wakeup.
>+	*
>+	* This boils down to:
>+	*
>+	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
>+	*         MB                         RMB
>+	* [RmW]   Y += 1               [L]   r1 = X
>+	*
>+	* exists (r0=1 /\ r1=0)
>+	*/
>+	smp_rmb();

Instead, how about just removing the release from atomic_long_sub_return_release()
such that the osq load is not hoisted over the atomic compound (along with Peter's
comment):

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index 6c6a2141f271..487ce31078ff 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -101,7 +101,7 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
+	if (unlikely(atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 						    &sem->count) < 0))
 		rwsem_wake(sem);
 }


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-09-07 14:30 Prateek Sood
@ 2017-09-19 14:05 ` Andrea Parri
  2017-09-20 14:52 ` Davidlohr Bueso
  2017-09-26 18:37 ` Prateek Sood
  2 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2017-09-19 14:05 UTC (permalink / raw)
  To: Prateek Sood, peterz, mingo; +Cc: longman, dave, linux-kernel, sramana

On Thu, Sep 07, 2017 at 08:00:58PM +0530, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.
> 
>  spinning writer                  up_write caller
>  ---------------                  -----------------------
>  [S] osq_unlock()                 [L] osq
>   spin_lock(wait_lock)
>   sem->count=0xFFFFFFFF00000001
>             +0xFFFFFFFF00000000
>   count=sem->count
>   MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    spin_trylock(wait_lock)
>                                    return
>  rwsem_try_write_lock(count)
>  spin_unlock(wait_lock)
>  schedule()
> 
> Reordering of atomic_long_sub_return_release() in __up_write()
> and rwsem_has_spinner() in rwsem_wake() can cause missing of
> wakeup in up_write() context. In spinning writer, sem->count
> and local variable count is 0XFFFFFFFE00000001. It would result
> in rwsem_try_write_lock() failing to acquire rwsem and spinning
> writer going to sleep in rwsem_down_write_failed().
> 
> The smp_rmb() will make sure that the spinner state is
> consulted after sem->count is updated in up_write context.
> 
> Signed-off-by: Prateek Sood <prsood@codeaurora.org>

Reviewed-by: Andrea Parri <parri.andrea@gmail.com>

I understand that the merge window and LPC made this stall for
a while... what am I missing? Are there other changes that need
to be considered for this patch?

  Andrea


> ---
>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 02f6606..1fefe6d 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
>  	DEFINE_WAKE_Q(wake_q);
>  
>  	/*
> +	* __rwsem_down_write_failed_common(sem)
> +	*   rwsem_optimistic_spin(sem)
> +	*     osq_unlock(sem->osq)
> +	*   ...
> +	*   atomic_long_add_return(&sem->count)
> +	*
> +	*      - VS -
> +	*
> +	*              __up_write()
> +	*                if (atomic_long_sub_return_release(&sem->count) < 0)
> +	*                  rwsem_wake(sem)
> +	*                    osq_is_locked(&sem->osq)
> +	*
> +	* And __up_write() must observe !osq_is_locked() when it observes the
> +	* atomic_long_add_return() in order to not miss a wakeup.
> +	*
> +	* This boils down to:
> +	*
> +	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
> +	*         MB                         RMB
> +	* [RmW]   Y += 1               [L]   r1 = X
> +	*
> +	* exists (r0=1 /\ r1=0)
> +	*/
> +	smp_rmb();
> +
> +	/*
>  	 * If a spinner is present, it is not necessary to do the wakeup.
>  	 * Try to do wakeup only if the trylock succeeds to minimize
>  	 * spinlock contention which may introduce too much delay in the
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., 
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> 


* [PATCH] rwsem: fix missed wakeup due to reordering of load
@ 2017-09-07 14:30 Prateek Sood
  2017-09-19 14:05 ` Andrea Parri
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Prateek Sood @ 2017-09-07 14:30 UTC (permalink / raw)
  To: mingo, longman, peterz, parri.andrea, dave
  Cc: Prateek Sood, linux-kernel, sramana

If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed.

 spinning writer                  up_write caller
 ---------------                  -----------------------
 [S] osq_unlock()                 [L] osq
  spin_lock(wait_lock)
  sem->count=0xFFFFFFFF00000001
            +0xFFFFFFFF00000000
  count=sem->count
  MB
                                   sem->count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   spin_trylock(wait_lock)
                                   return
 rwsem_try_write_lock(count)
 spin_unlock(wait_lock)
 schedule()

Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause the wakeup in
the up_write() context to be missed. In the spinning writer, both
sem->count and the local variable count are then
0xFFFFFFFE00000001, so rwsem_try_write_lock() fails to acquire
the rwsem and the spinning writer goes to sleep in
rwsem_down_write_failed().

The smp_rmb() makes sure that the spinner state is consulted
only after sem->count has been updated in the up_write() context.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
---
 kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 02f6606..1fefe6d 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/*
+	* __rwsem_down_write_failed_common(sem)
+	*   rwsem_optimistic_spin(sem)
+	*     osq_unlock(sem->osq)
+	*   ...
+	*   atomic_long_add_return(&sem->count)
+	*
+	*      - VS -
+	*
+	*              __up_write()
+	*                if (atomic_long_sub_return_release(&sem->count) < 0)
+	*                  rwsem_wake(sem)
+	*                    osq_is_locked(&sem->osq)
+	*
+	* And __up_write() must observe !osq_is_locked() when it observes the
+	* atomic_long_add_return() in order to not miss a wakeup.
+	*
+	* This boils down to:
+	*
+	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
+	*         MB                         RMB
+	* [RmW]   Y += 1               [L]   r1 = X
+	*
+	* exists (r0=1 /\ r1=0)
+	*/
+	smp_rmb();
+
+	/*
 	 * If a spinner is present, it is not necessary to do the wakeup.
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., 
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-07-26 20:17 Prateek Sood
  2017-07-27 15:48 ` Waiman Long
@ 2017-08-10 10:44 ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2017-08-10 10:44 UTC (permalink / raw)
  To: Prateek Sood; +Cc: mingo, sramana, linux-kernel

On Thu, Jul 27, 2017 at 01:47:52AM +0530, Prateek Sood wrote:
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 34e727f..21c111a 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -585,6 +585,40 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
>  	unsigned long flags;
>  	DEFINE_WAKE_Q(wake_q);
>  
> +        /*
> +         * If a spinner is present, there is a chance that the load of
> +         * rwsem_has_spinner() in rwsem_wake() can be reordered with
> +         * respect to decrement of rwsem count in __up_write() leading
> +         * to wakeup being missed.
> +         *
> +         * spinning writer                  up_write caller
> +         * ---------------                  -----------------------
> +         * [S] osq_unlock()                 [L] osq
> +         *  spin_lock(wait_lock)
> +         *  sem->count=0xFFFFFFFF00000001
> +         *            +0xFFFFFFFF00000000
> +         *  count=sem->count
> +         *  MB
> +         *                                   sem->count=0xFFFFFFFE00000001
> +         *                                             -0xFFFFFFFF00000001
> +         *                                   spin_trylock(wait_lock)
> +         *                                   return
> +         * rwsem_try_write_lock(count)
> +         * spin_unlock(wait_lock)
> +         * schedule()
> +         *
> +         * Reordering of atomic_long_sub_return_release() in __up_write()
> +         * and rwsem_has_spinner() in rwsem_wake() can cause missing of
> +         * wakeup in up_write() context. In spinning writer, sem->count
> +         * and local variable count is 0XFFFFFFFE00000001. It would result
> +         * in rwsem_try_write_lock() failing to acquire rwsem and spinning
> +         * writer going to sleep in rwsem_down_write_failed().
> +         *
> +         * The smp_rmb() here is to make sure that the spinner state is
> +         * consulted after sem->count is updated in up_write context.

I feel that comment can use help.. for example the RMB you add below is
not present at all.

> +         */
> +        smp_rmb();
> +
>  	/*
>  	 * If a spinner is present, it is not necessary to do the wakeup.
>  	 * Try to do wakeup only if the trylock succeeds to minimize

Your patch is whitespace damaged, all the indentation on the + lines is
with spaces. Please resend with \t.


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-08-10  8:32   ` Andrea Parri
@ 2017-08-10 10:41     ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2017-08-10 10:41 UTC (permalink / raw)
  To: Andrea Parri; +Cc: Waiman Long, Prateek Sood, mingo, sramana, linux-kernel

On Thu, Aug 10, 2017 at 10:32:56AM +0200, Andrea Parri wrote:
> On Thu, Jul 27, 2017 at 11:48:53AM -0400, Waiman Long wrote:
> > On 07/26/2017 04:17 PM, Prateek Sood wrote:
> > > If a spinner is present, there is a chance that the load of
> > > rwsem_has_spinner() in rwsem_wake() can be reordered with
> > > respect to decrement of rwsem count in __up_write() leading
> > > to wakeup being missed.
> > >
> > >  spinning writer                  up_write caller
> > >  ---------------                  -----------------------
> > >  [S] osq_unlock()                 [L] osq
> > >   spin_lock(wait_lock)
> > >   sem->count=0xFFFFFFFF00000001
> > >             +0xFFFFFFFF00000000
> > >   count=sem->count
> > >   MB
> > >                                    sem->count=0xFFFFFFFE00000001
> > >                                              -0xFFFFFFFF00000001
> > >                                    spin_trylock(wait_lock)
> > >                                    return
> > >  rwsem_try_write_lock(count)
> > >  spin_unlock(wait_lock)
> > >  schedule()
> > >
> > > Reordering of atomic_long_sub_return_release() in __up_write()
> > > and rwsem_has_spinner() in rwsem_wake() can cause missing of
> > > wakeup in up_write() context. In spinning writer, sem->count
> > > and local variable count are 0xFFFFFFFE00000001. It would result
> > > in rwsem_try_write_lock() failing to acquire rwsem and spinning
> > > writer going to sleep in rwsem_down_write_failed().
> > >
> > > The smp_rmb() will make sure that the spinner state is
> > > consulted after sem->count is updated in up_write context.
> > >
> > > Signed-off-by: Prateek Sood <prsood@codeaurora.org>
> > 
> > Did you actually observe that the reordering happens?
> > 
> > I am not sure if some architectures can actually speculatively execute
> > instruction ahead of a branch and then ahead into a function call. I
> > know it can happen if the function call is inlined, but rwsem_wake()
> > will not be inlined into __up_read() or __up_write().
> 
> Branches/control dependencies targeting a read do not necessarily preserve
> program order; this is for example the case for PowerPC and ARM.
> 
> I'd not expect more than a compiler barrier from a function call (in fact,
> not even that if the function happens to be inlined).

Indeed. Reads can be speculated by deep out-of-order CPUs no problem.
That's what branch predictors are for.

> > Even if that is the case, I am not sure if smp_rmb() alone is enough to
> > guarantee the ordering as I think it will depend on how the
> > atomic_long_sub_return_release() is implemented.
> 
> AFAICT, the pattern under discussion is MP with:
> 
>   - a store-release to osq->tail(unlock) followed by a store to sem->count,
>     separated by a MB (from atomic_long_add_return()) on CPU0;
> 
>   - a load of sem->count (for atomic_long_sub_return_release()) followed by

Which is a regular load, as 'release' only need apply to the store.

>     a load of osq->tail (rwsem_has_spinner()) on CPU1.
> 
> Thus a RMW between the two loads suffices to forbid the weak behaviour.

Agreed.


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-07-27 15:48 ` Waiman Long
  2017-07-27 16:59   ` Peter Zijlstra
@ 2017-08-10  8:32   ` Andrea Parri
  2017-08-10 10:41     ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Andrea Parri @ 2017-08-10  8:32 UTC (permalink / raw)
  To: Waiman Long; +Cc: Prateek Sood, peterz, mingo, sramana, linux-kernel

On Thu, Jul 27, 2017 at 11:48:53AM -0400, Waiman Long wrote:
> On 07/26/2017 04:17 PM, Prateek Sood wrote:
> > If a spinner is present, there is a chance that the load of
> > rwsem_has_spinner() in rwsem_wake() can be reordered with
> > respect to decrement of rwsem count in __up_write() leading
> > to wakeup being missed.
> >
> >  spinning writer                  up_write caller
> >  ---------------                  -----------------------
> >  [S] osq_unlock()                 [L] osq
> >   spin_lock(wait_lock)
> >   sem->count=0xFFFFFFFF00000001
> >             +0xFFFFFFFF00000000
> >   count=sem->count
> >   MB
> >                                    sem->count=0xFFFFFFFE00000001
> >                                              -0xFFFFFFFF00000001
> >                                    spin_trylock(wait_lock)
> >                                    return
> >  rwsem_try_write_lock(count)
> >  spin_unlock(wait_lock)
> >  schedule()
> >
> > Reordering of atomic_long_sub_return_release() in __up_write()
> > and rwsem_has_spinner() in rwsem_wake() can cause missing of
> > wakeup in up_write() context. In spinning writer, sem->count
> > and local variable count are 0xFFFFFFFE00000001. It would result
> > in rwsem_try_write_lock() failing to acquire rwsem and spinning
> > writer going to sleep in rwsem_down_write_failed().
> >
> > The smp_rmb() will make sure that the spinner state is
> > consulted after sem->count is updated in up_write context.
> >
> > Signed-off-by: Prateek Sood <prsood@codeaurora.org>
> 
> Did you actually observe that the reordering happens?
> 
> I am not sure if some architectures can actually speculatively execute
> instruction ahead of a branch and then ahead into a function call. I
> know it can happen if the function call is inlined, but rwsem_wake()
> will not be inlined into __up_read() or __up_write().

Branches/control dependencies targeting a read do not necessarily preserve
program order; this is for example the case for PowerPC and ARM.

I'd not expect more than a compiler barrier from a function call (in fact,
not even that if the function happens to be inlined).


> 
> Even if that is the case, I am not sure if smp_rmb() alone is enough to
> guarantee the ordering as I think it will depend on how the
> atomic_long_sub_return_release() is implemented.

AFAICT, the pattern under discussion is MP with:

  - a store-release to osq->tail(unlock) followed by a store to sem->count,
    separated by a MB (from atomic_long_add_return()) on CPU0;

  - a load of sem->count (for atomic_long_sub_return_release()) followed by
    a load of osq->tail (rwsem_has_spinner()) on CPU1.

Thus a RMW between the two loads suffices to forbid the weak behaviour.

  Andrea


> 
> Cheers,
> Longman
> 
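Andrea's MP pattern, with the two RMWs simplified to plain stores and
loads, is the classic message-passing litmus test. In the format used
by the kernel's tools/memory-model, herd7 should report the final
condition unreachable (a sketch; x/y follow the abstract X/Y in the
patch comment):

```
C rwsem-mp

{}

P0(int *x, int *y)
{
	smp_store_release(x, 1);	/* osq_unlock()             */
	smp_mb();			/* atomic_long_add_return() */
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);	/* atomic_long_sub_return_release() */
	smp_rmb();		/* the proposed fix                 */
	r1 = READ_ONCE(*x);	/* rwsem_has_spinner()              */
}

exists (1:r0=1 /\ 1:r1=0)
```

Dropping the smp_rmb() makes the exists clause reachable on weakly
ordered hardware, which is exactly the missed-wakeup scenario.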


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-07-27 15:48 ` Waiman Long
@ 2017-07-27 16:59   ` Peter Zijlstra
  2017-08-10  8:32   ` Andrea Parri
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2017-07-27 16:59 UTC (permalink / raw)
  To: Waiman Long; +Cc: Prateek Sood, mingo, sramana, linux-kernel

On Thu, Jul 27, 2017 at 11:48:53AM -0400, Waiman Long wrote:
> atomic_long_sub_return_release() is implemented.

I've not had time to really think about the problem at hand, but this I
can answer:

TSO (x86, s390, sparc): fully serialized
PPC: lwsync; ll/sc	(RCpc)
ARM64: ll/sc-release	(RCsc)
other: smp_mb(); atomic_sub_return_relaxed();


* Re: [PATCH] rwsem: fix missed wakeup due to reordering of load
  2017-07-26 20:17 Prateek Sood
@ 2017-07-27 15:48 ` Waiman Long
  2017-07-27 16:59   ` Peter Zijlstra
  2017-08-10  8:32   ` Andrea Parri
  2017-08-10 10:44 ` Peter Zijlstra
  1 sibling, 2 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-27 15:48 UTC (permalink / raw)
  To: Prateek Sood, peterz, mingo; +Cc: sramana, linux-kernel

On 07/26/2017 04:17 PM, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.
>
>  spinning writer                  up_write caller
>  ---------------                  -----------------------
>  [S] osq_unlock()                 [L] osq
>   spin_lock(wait_lock)
>   sem->count=0xFFFFFFFF00000001
>             +0xFFFFFFFF00000000
>   count=sem->count
>   MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    spin_trylock(wait_lock)
>                                    return
>  rwsem_try_write_lock(count)
>  spin_unlock(wait_lock)
>  schedule()
>
> Reordering of atomic_long_sub_return_release() in __up_write()
> and rwsem_has_spinner() in rwsem_wake() can cause missing of
> wakeup in up_write() context. In spinning writer, sem->count
> and local variable count are 0xFFFFFFFE00000001. It would result
> in rwsem_try_write_lock() failing to acquire rwsem and spinning
> writer going to sleep in rwsem_down_write_failed().
>
> The smp_rmb() will make sure that the spinner state is
> consulted after sem->count is updated in up_write context.
>
> Signed-off-by: Prateek Sood <prsood@codeaurora.org>

Did you actually observe that the reordering happens?

I am not sure if some architectures can actually speculatively execute
instruction ahead of a branch and then ahead into a function call. I
know it can happen if the function call is inlined, but rwsem_wake()
will not be inlined into __up_read() or __up_write().

Even if that is the case, I am not sure if smp_rmb() alone is enough to
guarantee the ordering as I think it will depend on how the
atomic_long_sub_return_release() is implemented.

Cheers,
Longman


* [PATCH] rwsem: fix missed wakeup due to reordering of load
@ 2017-07-26 20:17 Prateek Sood
  2017-07-27 15:48 ` Waiman Long
  2017-08-10 10:44 ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Prateek Sood @ 2017-07-26 20:17 UTC (permalink / raw)
  To: peterz, mingo; +Cc: Prateek Sood, sramana, linux-kernel

If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed.

 spinning writer                  up_write caller
 ---------------                  -----------------------
 [S] osq_unlock()                 [L] osq
  spin_lock(wait_lock)
  sem->count=0xFFFFFFFF00000001
            +0xFFFFFFFF00000000
  count=sem->count
  MB
                                   sem->count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   spin_trylock(wait_lock)
                                   return
 rwsem_try_write_lock(count)
 spin_unlock(wait_lock)
 schedule()

Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause the wakeup in
the up_write() context to be missed. In the spinning writer, both
sem->count and the local variable count are 0xFFFFFFFE00000001,
so rwsem_try_write_lock() fails to acquire the rwsem and the
spinning writer goes to sleep in rwsem_down_write_failed().

The smp_rmb() makes sure that the spinner state is consulted
only after sem->count has been updated in the up_write() context.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
---
 kernel/locking/rwsem-xadd.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 34e727f..21c111a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -585,6 +585,40 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
 
+        /*
+         * If a spinner is present, there is a chance that the load of
+         * rwsem_has_spinner() in rwsem_wake() can be reordered with
+         * respect to decrement of rwsem count in __up_write() leading
+         * to wakeup being missed.
+         *
+         * spinning writer                  up_write caller
+         * ---------------                  -----------------------
+         * [S] osq_unlock()                 [L] osq
+         *  spin_lock(wait_lock)
+         *  sem->count=0xFFFFFFFF00000001
+         *            +0xFFFFFFFF00000000
+         *  count=sem->count
+         *  MB
+         *                                   sem->count=0xFFFFFFFE00000001
+         *                                             -0xFFFFFFFF00000001
+         *                                   spin_trylock(wait_lock)
+         *                                   return
+         * rwsem_try_write_lock(count)
+         * spin_unlock(wait_lock)
+         * schedule()
+         *
+         * Reordering of atomic_long_sub_return_release() in __up_write()
+         * and rwsem_has_spinner() in rwsem_wake() can cause missing of
+         * wakeup in up_write() context. In spinning writer, sem->count
+         * and local variable count is 0XFFFFFFFE00000001. It would result
+         * in rwsem_try_write_lock() failing to acquire rwsem and spinning
+         * writer going to sleep in rwsem_down_write_failed().
+         *
+         * The smp_rmb() here is to make sure that the spinner state is
+         * consulted after sem->count is updated in up_write context.
+         */
+        smp_rmb();
+
 	/*
 	 * If a spinner is present, it is not necessary to do the wakeup.
 	 * Try to do wakeup only if the trylock succeeds to minimize
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., 
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.



Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-23 11:28 [PATCH] rwsem: fix missed wakeup due to reordering of load Prateek Sood
2017-08-24 11:29 ` Peter Zijlstra
2017-08-24 12:33   ` Peter Zijlstra
2017-08-24 12:52     ` Peter Zijlstra
2017-09-07 14:08       ` Prateek Sood
  -- strict thread matches above, loose matches on Subject: below --
2017-09-07 14:30 Prateek Sood
2017-09-19 14:05 ` Andrea Parri
2017-09-20 14:52 ` Davidlohr Bueso
2017-09-20 21:17   ` Andrea Parri
2017-09-27 21:20     ` Davidlohr Bueso
2017-09-26 18:37 ` Prateek Sood
2017-07-26 20:17 Prateek Sood
2017-07-27 15:48 ` Waiman Long
2017-07-27 16:59   ` Peter Zijlstra
2017-08-10  8:32   ` Andrea Parri
2017-08-10 10:41     ` Peter Zijlstra
2017-08-10 10:44 ` Peter Zijlstra
