* [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
@ 2022-09-29 18:04 Waiman Long
  2022-09-29 18:06 ` Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Waiman Long @ 2022-09-29 18:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Mukesh Ojha, Waiman Long

A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

  Non-first waiter       First waiter      Lock holder
  ----------------       ------------      -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                         Acquire wait_lock
                         Inherit waiter->handoff_set
                         Release wait_lock
					   Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
     if (first->handoff_set && (waiter != first))
     	return false;
  Release wait_lock

It is especially problematic if the non-first waiter is an RT task that
is running on the same CPU as the first waiter, as this can lead to a
livelock.
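
For context, below is a simplified sketch of the wait loop in
rwsem_down_write_slowpath() that the scenario above steps through.
It is paraphrased from kernels containing d257cc8cb8d5 (the names
rwsem_try_write_lock(), rwsem_spin_on_owner() and the trylock_again
label exist upstream), but it is not the exact code and details differ
between kernel versions:

	/* Simplified sketch of rwsem_down_write_slowpath()'s wait loop. */
	for (;;) {
		if (rwsem_try_write_lock(sem, &waiter))
			break;			/* lock acquired */

		raw_spin_unlock_irq(&sem->wait_lock);

		if (signal_pending_state(state, current))
			goto out_nolock;

		/*
		 * A waiter with handoff_set spins on the owner and, if the
		 * owner has just released the lock (OWNER_NULL), retries
		 * the trylock without sleeping.  This is the path in which
		 * a non-first waiter can keep looping as described above.
		 */
		if (waiter.handoff_set) {
			if (rwsem_spin_on_owner(sem) == OWNER_NULL)
				goto trylock_again;
		}

		schedule();
		set_current_state(state);
trylock_again:
		raw_spin_lock_irq(&sem->wait_lock);
	}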

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262f635e..ad676e99e0b3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
+			/*
+			 * A waiter (first or not) can set the handoff bit
+			 * if it is an RT task or wait in the wait queue
+			 * for too long.
+			 */
 			if (has_handoff || (!rt_task(waiter->task) &&
 					    !time_after(jiffies, waiter->timeout)))
 				return false;
@@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
 
 	/*
-	 * We have either acquired the lock with handoff bit cleared or
-	 * set the handoff bit.
+	 * We have either acquired the lock with handoff bit cleared or set
+	 * the handoff bit. Only the first waiter can have its handoff_set
+	 * set here to enable optimistic spinning in slowpath loop.
 	 */
 	if (new & RWSEM_FLAG_HANDOFF) {
-		waiter->handoff_set = true;
+		if (waiter == first)
+			waiter->handoff_set = true;
 		lockevent_inc(rwsem_wlock_handoff);
 		return false;
 	}
-- 
2.31.1



* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 18:04 [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
@ 2022-09-29 18:06 ` Waiman Long
  2022-09-30  4:47   ` Mukesh Ojha
  2022-10-10 10:24   ` Mukesh Ojha
  2022-09-29 21:15 ` John Donnelly
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Waiman Long @ 2022-09-29 18:06 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

On 9/29/22 14:04, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
>    Non-first waiter       First waiter      Lock holder
>    ----------------       ------------      -----------
>    Acquire wait_lock
>    rwsem_try_write_lock():
>      Set handoff bit if RT or
>        wait too long
>      Set waiter->handoff_set
>    Release wait_lock
>                           Acquire wait_lock
>                           Inherit waiter->handoff_set
>                           Release wait_lock
> 					   Clear owner
>                                             Release lock
>    if (waiter.handoff_set) {
>      rwsem_spin_on_owner(();
>      if (OWNER_NULL)
>        goto trylock_again;
>    }
>    trylock_again:
>    Acquire wait_lock
>    rwsem_try_write_lock():
>       if (first->handoff_set && (waiter != first))
>       	return false;
>    Release wait_lock
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/locking/rwsem.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)

Mukesh, can you test if this patch can fix the RT task lockup problem?

Thanks,
Longman



* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 18:04 [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
  2022-09-29 18:06 ` Waiman Long
@ 2022-09-29 21:15 ` John Donnelly
  2022-09-30  1:09   ` Waiman Long
  2022-09-30  4:46 ` Mukesh Ojha
  2022-09-30  5:06 ` Mukesh Ojha
  3 siblings, 1 reply; 20+ messages in thread
From: John Donnelly @ 2022-09-29 21:15 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, Hillf Danton, Mukesh Ojha, John Donnelly

On 9/29/22 13:04, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>    Non-first waiter       First waiter      Lock holder
>    ----------------       ------------      -----------
>    Acquire wait_lock
>    rwsem_try_write_lock():
>      Set handoff bit if RT or
>        wait too long
>      Set waiter->handoff_set
>    Release wait_lock
>                           Acquire wait_lock
>                           Inherit waiter->handoff_set
>                           Release wait_lock
> 					   Clear owner
>                                             Release lock
>    if (waiter.handoff_set) {
>      rwsem_spin_on_owner(();
>      if (OWNER_NULL)
>        goto trylock_again;
>    }
>    trylock_again:
>    Acquire wait_lock
>    rwsem_try_write_lock():
>       if (first->handoff_set && (waiter != first))
>       	return false;
>    Release wait_lock
> 
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
> 
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/locking/rwsem.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262f635e..ad676e99e0b3 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   		new = count;
>   
>   		if (count & RWSEM_LOCK_MASK) {
> +			/*
> +			 * A waiter (first or not) can set the handoff bit
> +			 * if it is an RT task or wait in the wait queue
> +			 * for too long.
> +			 */
>   			if (has_handoff || (!rt_task(waiter->task) &&
>   					    !time_after(jiffies, waiter->timeout)))
>   				return false;
> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>   
>   	/*
> -	 * We have either acquired the lock with handoff bit cleared or
> -	 * set the handoff bit.
> +	 * We have either acquired the lock with handoff bit cleared or set
> +	 * the handoff bit. Only the first waiter can have its handoff_set
> +	 * set here to enable optimistic spinning in slowpath loop.
>   	 */
>   	if (new & RWSEM_FLAG_HANDOFF) {
> -		waiter->handoff_set = true;
> +		if (waiter == first)
> +			waiter->handoff_set = true;
>   		lockevent_inc(rwsem_wlock_handoff);
>   		return false;
>   	}
Hi,

Are you missing

[PATCH 5.18 87/88] locking/rwsem: Allow slowpath writer to ignore
handoff bit if not set by first waiter

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent")

Or is this another regression?


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 21:15 ` John Donnelly
@ 2022-09-30  1:09   ` Waiman Long
  0 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2022-09-30  1:09 UTC (permalink / raw)
  To: John Donnelly, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, Hillf Danton, Mukesh Ojha

On 9/29/22 17:15, John Donnelly wrote:
> On 9/29/22 13:04, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>> consistent")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262f635e..ad676e99e0b3 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct 
>> rw_semaphore *sem,
>>           new = count;
>>             if (count & RWSEM_LOCK_MASK) {
>> +            /*
>> +             * A waiter (first or not) can set the handoff bit
>> +             * if it is an RT task or wait in the wait queue
>> +             * for too long.
>> +             */
>>               if (has_handoff || (!rt_task(waiter->task) &&
>>                           !time_after(jiffies, waiter->timeout)))
>>                   return false;
>> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct 
>> rw_semaphore *sem,
>>       } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, 
>> new));
>>         /*
>> -     * We have either acquired the lock with handoff bit cleared or
>> -     * set the handoff bit.
>> +     * We have either acquired the lock with handoff bit cleared or set
>> +     * the handoff bit. Only the first waiter can have its handoff_set
>> +     * set here to enable optimistic spinning in slowpath loop.
>>        */
>>       if (new & RWSEM_FLAG_HANDOFF) {
>> -        waiter->handoff_set = true;
>> +        if (waiter == first)
>> +            waiter->handoff_set = true;
>>           lockevent_inc(rwsem_wlock_handoff);
>>           return false;
>>       }
> Hi,.
>
> Are you missing
>
>
> [PATCH 5.18 87/88] locking/rwsem: Allow slowpath writer to ignore 
> handoff bit if not set by first waiter
>
>
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
> consistent")
>
> Or is this another regression ?
>
It is another regression.

Cheers,
Longman



* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 18:04 [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
  2022-09-29 18:06 ` Waiman Long
  2022-09-29 21:15 ` John Donnelly
@ 2022-09-30  4:46 ` Mukesh Ojha
  2022-09-30 13:57   ` Waiman Long
  2022-09-30  5:06 ` Mukesh Ojha
  3 siblings, 1 reply; 20+ messages in thread
From: Mukesh Ojha @ 2022-09-30  4:46 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton

Hi,

Thanks for looking into this issue.

On 9/29/2022 11:34 PM, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>    Non-first waiter       First waiter      Lock holder
>    ----------------       ------------      -----------
>    Acquire wait_lock
>    rwsem_try_write_lock():
>      Set handoff bit if RT or
>        wait too long
>      Set waiter->handoff_set
>    Release wait_lock
>                           Acquire wait_lock
>                           Inherit waiter->handoff_set
>                           Release wait_lock
> 					   Clear owner
>                                             Release lock
>    if (waiter.handoff_set) {
>      rwsem_spin_on_owner(();
>      if (OWNER_NULL)
>        goto trylock_again;
>    }
>    trylock_again:
>    Acquire wait_lock
>    rwsem_try_write_lock():
>       if (first->handoff_set && (waiter != first))
>       	return false;
>    Release wait_lock
> 
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
> 
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/locking/rwsem.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262f635e..ad676e99e0b3 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   		new = count;
>   
>   		if (count & RWSEM_LOCK_MASK) {
> +			/*
> +			 * A waiter (first or not) can set the handoff bit
> +			 * if it is an RT task or wait in the wait queue
> +			 * for too long.
> +			 */
>   			if (has_handoff || (!rt_task(waiter->task) &&
>   					    !time_after(jiffies, waiter->timeout)))
>   				return false;
> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>   
>   	/*
> -	 * We have either acquired the lock with handoff bit cleared or
> -	 * set the handoff bit.
> +	 * We have either acquired the lock with handoff bit cleared or set
> +	 * the handoff bit. Only the first waiter can have its handoff_set
> +	 * set here to enable optimistic spinning in slowpath loop.
>   	 */
>   	if (new & RWSEM_FLAG_HANDOFF) {
> -		waiter->handoff_set = true;
> +		if (waiter == first)
> +			waiter->handoff_set = true;
>   		lockevent_inc(rwsem_wlock_handoff);

nit: Should this not be incremented only when waiter->handoff_set is set to true?


>   		return false;
>   	}

-Mukesh


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 18:06 ` Waiman Long
@ 2022-09-30  4:47   ` Mukesh Ojha
  2022-10-10 10:24   ` Mukesh Ojha
  1 sibling, 0 replies; 20+ messages in thread
From: Mukesh Ojha @ 2022-09-30  4:47 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

Hi,

On 9/29/2022 11:36 PM, Waiman Long wrote:
> On 9/29/22 14:04, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>> consistent")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> Mukesh, can you test if this patch can fix the RT task lockup problem?

I will come back with the result.

Thank you.

> 
> Thanks,
> Longman
> 


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 18:04 [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
                   ` (2 preceding siblings ...)
  2022-09-30  4:46 ` Mukesh Ojha
@ 2022-09-30  5:06 ` Mukesh Ojha
  2022-09-30 14:08   ` Waiman Long
  3 siblings, 1 reply; 20+ messages in thread
From: Mukesh Ojha @ 2022-09-30  5:06 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton

Hi,

On 9/29/2022 11:34 PM, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>    Non-first waiter       First waiter      Lock holder
>    ----------------       ------------      -----------
>    Acquire wait_lock
>    rwsem_try_write_lock():
>      Set handoff bit if RT or
>        wait too long
>      Set waiter->handoff_set
>    Release wait_lock
>                           Acquire wait_lock
>                           Inherit waiter->handoff_set
>                           Release wait_lock
> 					   Clear owner
>                                             Release lock
>    if (waiter.handoff_set) {
>      rwsem_spin_on_owner(();
>      if (OWNER_NULL)
>        goto trylock_again;
>    }
>    trylock_again:
>    Acquire wait_lock
>    rwsem_try_write_lock():
>       if (first->handoff_set && (waiter != first))
>       	return false;
>    Release wait_lock
> 
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
> 
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/locking/rwsem.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262f635e..ad676e99e0b3 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   		new = count;
>   
>   		if (count & RWSEM_LOCK_MASK) {
> +			/*
> +			 * A waiter (first or not) can set the handoff bit
> +			 * if it is an RT task or wait in the wait queue
> +			 * for too long.
> +			 */
>   			if (has_handoff || (!rt_task(waiter->task) &&
>   					    !time_after(jiffies, waiter->timeout)))

Not related to this issue, but I wanted to understand the idea behind this.

An RT task sets the RWSEM_FLAG_HANDOFF bit regardless of whether it comes
first or later. So we are giving the RT task some priority right away, yet
(after this patch) it cannot get waiter->handoff_set=true since it is not
the first waiter. Is that not conflicting?

Why can't we just keep it as below, i.e. not set
new |= RWSEM_FLAG_HANDOFF; and return false from here?

--------------0<------------------------------------
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262..dbe3e16 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
                 new = count;
 
                 if (count & RWSEM_LOCK_MASK) {
-                       if (has_handoff || (!rt_task(waiter->task) &&
-                                           !time_after(jiffies, waiter->timeout)))
+                       if (has_handoff || (rt_task(waiter->task) && waiter != first) ||
+                          (!rt_task(waiter->task) && !time_after(jiffies, waiter->timeout)))
                                 return false;



-Mukesh

>   				return false;
> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>   	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>   
>   	/*
> -	 * We have either acquired the lock with handoff bit cleared or
> -	 * set the handoff bit.
> +	 * We have either acquired the lock with handoff bit cleared or set
> +	 * the handoff bit. Only the first waiter can have its handoff_set
> +	 * set here to enable optimistic spinning in slowpath loop.
>   	 */
>   	if (new & RWSEM_FLAG_HANDOFF) {
> -		waiter->handoff_set = true;
> +		if (waiter == first)
> +			waiter->handoff_set = true;
>   		lockevent_inc(rwsem_wlock_handoff);
>   		return false;
>   	}


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-30  4:46 ` Mukesh Ojha
@ 2022-09-30 13:57   ` Waiman Long
  0 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2022-09-30 13:57 UTC (permalink / raw)
  To: Mukesh Ojha, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton

On 9/30/22 00:46, Mukesh Ojha wrote:
> Hi,
>
> Thanks for looking into this issue.
>
> On 9/29/2022 11:34 PM, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>> consistent")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262f635e..ad676e99e0b3 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct 
>> rw_semaphore *sem,
>>           new = count;
>>             if (count & RWSEM_LOCK_MASK) {
>> +            /*
>> +             * A waiter (first or not) can set the handoff bit
>> +             * if it is an RT task or wait in the wait queue
>> +             * for too long.
>> +             */
>>               if (has_handoff || (!rt_task(waiter->task) &&
>>                           !time_after(jiffies, waiter->timeout)))
>>                   return false;
>> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct 
>> rw_semaphore *sem,
>>       } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, 
>> new));
>>         /*
>> -     * We have either acquired the lock with handoff bit cleared or
>> -     * set the handoff bit.
>> +     * We have either acquired the lock with handoff bit cleared or set
>> +     * the handoff bit. Only the first waiter can have its handoff_set
>> +     * set here to enable optimistic spinning in slowpath loop.
>>        */
>>       if (new & RWSEM_FLAG_HANDOFF) {
>> -        waiter->handoff_set = true;
>> +        if (waiter == first)
>> +            waiter->handoff_set = true;
>>           lockevent_inc(rwsem_wlock_handoff);
>
> nit: Should this not get incremented on waiter->handoff_set=true only ?

The lock event counter records the # of times a handoff bit is set. It is
not related to how the handoff_set in the waiter structure is being set.

cheers,
Longman



* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-30  5:06 ` Mukesh Ojha
@ 2022-09-30 14:08   ` Waiman Long
  2022-09-30 16:03     ` Mukesh Ojha
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2022-09-30 14:08 UTC (permalink / raw)
  To: Mukesh Ojha, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton

On 9/30/22 01:06, Mukesh Ojha wrote:
> Hi,
>
> On 9/29/2022 11:34 PM, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>> consistent")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262f635e..ad676e99e0b3 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct 
>> rw_semaphore *sem,
>>           new = count;
>>             if (count & RWSEM_LOCK_MASK) {
>> +            /*
>> +             * A waiter (first or not) can set the handoff bit
>> +             * if it is an RT task or wait in the wait queue
>> +             * for too long.
>> +             */
>>               if (has_handoff || (!rt_task(waiter->task) &&
>>                           !time_after(jiffies, waiter->timeout)))
>
> Not related to this issue, however wanted to understand the idea about 
> this.
>
> If RT task comes in any order either come first or later it is setting 
> the RWSEM_FLAG_HANDOFF bit.
> So, here we are giving some priority right a way to RT task however it
> can not get waiter->handoff_set=true since it is not the first 
> waiter.(after this patch), is it not conflicting ?

I have thought about moving the RT task forward in the wait queue, but
that would greatly complicate the code in trying to do what a PREEMPT_RT
kernel does with its rt_mutex variant of rwsem. The reason the HANDOFF
bit is set when an RT task is in the wait queue is to speed up the
progression of the wait queue without interference from optimistic
spinners.

>
>
> Why can't we just keep like as below and not set
> new |= RWSEM_FLAG_HANDOFF; and return false from here.
>
> --------------0<------------------------------------
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262..dbe3e16 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct
> rw_semaphore *sem,
>                  new = count;
>
>                  if (count & RWSEM_LOCK_MASK) {
> -                       if (has_handoff || (!rt_task(waiter->task) &&
> -                                           !time_after(jiffies,
> waiter->timeout)))
> +                       if (has_handoff || (rt_task(waiter->task) &&
> waiter != first) ||
> +                          (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
>                                  return false;
>
As I said above, we want to make more forward progress in the wait queue
when an RT task is waiting there, in order to reduce its latency. That is
the point of that if statement.

Cheers,
Longman




* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-30 14:08   ` Waiman Long
@ 2022-09-30 16:03     ` Mukesh Ojha
  0 siblings, 0 replies; 20+ messages in thread
From: Mukesh Ojha @ 2022-09-30 16:03 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, john.p.donnelly, Hillf Danton

Hi,

On 9/30/2022 7:38 PM, Waiman Long wrote:
> On 9/30/22 01:06, Mukesh Ojha wrote:
>> Hi,
>>
>> On 9/29/2022 11:34 PM, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>    Non-first waiter       First waiter      Lock holder
>>>    ----------------       ------------      -----------
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>      Set handoff bit if RT or
>>>        wait too long
>>>      Set waiter->handoff_set
>>>    Release wait_lock
>>>                           Acquire wait_lock
>>>                           Inherit waiter->handoff_set
>>>                           Release wait_lock
>>>                        Clear owner
>>>                                             Release lock
>>>    if (waiter.handoff_set) {
>>>      rwsem_spin_on_owner(();
>>>      if (OWNER_NULL)
>>>        goto trylock_again;
>>>    }
>>>    trylock_again:
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>       if (first->handoff_set && (waiter != first))
>>>           return false;
>>>    Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>>> consistent")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>>> index 65f0262f635e..ad676e99e0b3 100644
>>> --- a/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct 
>>> rw_semaphore *sem,
>>>           new = count;
>>>             if (count & RWSEM_LOCK_MASK) {
>>> +            /*
>>> +             * A waiter (first or not) can set the handoff bit
>>> +             * if it is an RT task or wait in the wait queue
>>> +             * for too long.
>>> +             */
>>>               if (has_handoff || (!rt_task(waiter->task) &&
>>>                           !time_after(jiffies, waiter->timeout)))
>>
>> Not related to this issue, however wanted to understand the idea about 
>> this.
>>
>> If RT task comes in any order either come first or later it is setting 
>> the RWSEM_FLAG_HANDOFF bit.
>> So, here we are giving some priority right a way to RT task however it
>> can not get waiter->handoff_set=true since it is not the first 
>> waiter.(after this patch), is it not conflicting ?
> 
> I have thought about moving the RT task forward in the wait queue, but 
> then it will greatly complicate the code to try to do what a PREEMPT_RT 
> kernel does using a rt_mutex variant of rwsem. The reason why HANDOFF 
> bit is set when an RT task is in the wait queue is speed up the 
> progression of the wait queue without the interference of optimistic 
> spinner.
> 

Thanks for taking time to explain the motivation behind it.

It will take some time to test your patch from our side; however, like the
other patches, this patch too seems to fix the issue.

Feel free to add
  Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
  Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>

-Mukesh

>>
>>
>> Why can't we just keep like as below and not set
>> new |= RWSEM_FLAG_HANDOFF; and return false from here.
>>
>> --------------0<------------------------------------
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262..dbe3e16 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>                  new = count;
>>
>>                  if (count & RWSEM_LOCK_MASK) {
>> -                       if (has_handoff || (!rt_task(waiter->task) &&
>> -                                           !time_after(jiffies,
>> waiter->timeout)))
>> +                       if (has_handoff || (rt_task(waiter->task) &&
>> waiter != first) ||
>> +                          (!rt_task(waiter->task) &&
>> !time_after(jiffies, waiter->timeout)))
>>                                  return false;
>>
> As I said above, we want to make more forward progress in the wait queue 
> if a RT task is waiting there to try to reduce its latency. That is the 
> point of that if statement.
> 
> Cheers,
> Longman
> 
> 


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-09-29 18:06 ` Waiman Long
  2022-09-30  4:47   ` Mukesh Ojha
@ 2022-10-10 10:24   ` Mukesh Ojha
  2022-10-10 16:57     ` Waiman Long
  2022-10-11 10:46     ` Hillf Danton
  1 sibling, 2 replies; 20+ messages in thread
From: Mukesh Ojha @ 2022-10-10 10:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

Hi Waiman,


On 9/29/2022 11:36 PM, Waiman Long wrote:
> On 9/29/22 14:04, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>> consistent")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> Mukesh, can you test if this patch can fix the RT task lockup problem?
>

Looks like there is still a window for a race.

There is a chance that a reader which came first adds its BIAS and goes
to the slowpath, and before it gets added to the wait list it is
preempted by an RT task which goes to the slowpath as well; being the
first waiter, the RT task gets its hand-off bit set but is not able to
get the lock due to the following condition in rwsem_try_write_lock():

 630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has set its bias
..
...

 634
 635                         new |= RWSEM_FLAG_HANDOFF;
 636                 } else {
 637                         new |= RWSEM_WRITER_LOCKED;


---------------------->----------------------->-------------------------

First reader (1)            Writer (2) - RT task        Lock holder (3)
----------------            --------------------        ---------------
Sets RWSEM_READER_BIAS.
Heads to the slowpath
(the lock is held by (3)),
but before it gets added
to the waiters list it is
preempted by (2).
                            RT task also takes the
                            slowpath and adds itself
                            to the waiting list.
                                                         Releases the
                                                         rwsem lock and
                                                         clears the owner.
                            Since it is the first
                            waiter, it is the next one
                            to get the lock, but it
                            cannot get the lock because
                            (count & RWSEM_LOCK_MASK)
                            is set: (1) has added its
                            bias but has not been able
                            to remove its adjustment.
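
To illustrate where the lingering reader bias comes from, here is a
minimal, hedged sketch: the reader fast path optimistically adds
RWSEM_READER_BIAS before it knows whether the acquisition succeeds, and
only backs it out later in the slowpath. The name reader_trylock_sketch()
is illustrative only; the structure loosely follows rwsem_read_trylock()
in kernel/locking/rwsem.c, but this is not the exact upstream code.

	static inline bool reader_trylock_sketch(struct rw_semaphore *sem, long *cntp)
	{
		/* Optimistically add the reader bias before checking the result. */
		*cntp = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count);

		/*
		 * Fail if a writer holds the lock or waiter/handoff bits are
		 * set.  The bias added above is NOT removed here; the reader
		 * keeps it while heading into the slowpath, so
		 * (count & RWSEM_LOCK_MASK) stays non-zero until the slowpath
		 * gets a chance to run -- the window described above.
		 */
		return !(*cntp & RWSEM_READ_FAILED_MASK);
	}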


-Mukesh


> Thanks,
> Longman
> 


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-10 10:24   ` Mukesh Ojha
@ 2022-10-10 16:57     ` Waiman Long
       [not found]       ` <03421f0e453143f6ae4a9a7a8a4f47f4@xiaomi.com>
  2022-10-11 10:46     ` Hillf Danton
  1 sibling, 1 reply; 20+ messages in thread
From: Waiman Long @ 2022-10-10 16:57 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: linux-kernel, john.p.donnelly, Hillf Danton, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

On 10/10/22 06:24, Mukesh Ojha wrote:
> Hi Waiman,
>
>
> On 9/29/2022 11:36 PM, Waiman Long wrote:
>> On 9/29/22 14:04, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>    Non-first waiter       First waiter      Lock holder
>>>    ----------------       ------------      -----------
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>      Set handoff bit if RT or
>>>        wait too long
>>>      Set waiter->handoff_set
>>>    Release wait_lock
>>>                           Acquire wait_lock
>>>                           Inherit waiter->handoff_set
>>>                           Release wait_lock
>>>                        Clear owner
>>>                                             Release lock
>>>    if (waiter.handoff_set) {
>>>      rwsem_spin_on_owner(();
>>>      if (OWNER_NULL)
>>>        goto trylock_again;
>>>    }
>>>    trylock_again:
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>       if (first->handoff_set && (waiter != first))
>>>           return false;
>>>    Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>>> consistent")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>
>
> Looks like, There is still a window for a race.
>
> There is a chance when a reader who came first added it's BIAS and 
> goes to slowpath and before it gets added to wait list it got 
> preempted by RT task which  goes to slowpath as well and being the 
> first waiter gets its hand-off bit set and not able to get the lock 
> due to following condition in rwsem_try_write_lock()
>
>  630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has 
> sets its bias
> ..
> ...
>
>  634
>  635                         new |= RWSEM_FLAG_HANDOFF;
>  636                 } else {
>  637                         new |= RWSEM_WRITER_LOCKED;
>
>
> ---------------------->----------------------->-------------------------
>
> First reader (1)          writer(2) RT task             Lock holder(3)
>
> It sets
> RWSEM_READER_BIAS.
> while it is going to
> slowpath(as the lock
> was held by (3)) and
> before it got added
> to the waiters list
> it got preempted
> by (2).
>                         RT task also takes
>                         the slowpath and add              release the
>                         itself into waiting list          rwsem lock
>             and since it is the first         clear the
>                         it is the next one to get         owner.
>                         the lock but it can not
>                         get the lock as (count &
>                         RWSEM_LOCK_MASK) is set
>                         as (1) has added it but
>                         not able to remove its
>             adjustment.
>
Good catch!

It is indeed a possible livelock scenario. Will update the patch to 
address that.

Thanks,
Longman



* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-10 10:24   ` Mukesh Ojha
  2022-10-10 16:57     ` Waiman Long
@ 2022-10-11 10:46     ` Hillf Danton
  2022-10-11 13:16       ` Mukesh Ojha
  2022-10-12 13:14       ` Waiman Long
  1 sibling, 2 replies; 20+ messages in thread
From: Hillf Danton @ 2022-10-11 10:46 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long,
	Peter Zijlstra, Will Deacon, Boqun Feng, Ingo Molnar

On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
> Hi Waiman,
>
> On 9/29/2022 11:36 PM, Waiman Long wrote:
>> On 9/29/22 14:04, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>    Non-first waiter       First waiter      Lock holder
>>>    ----------------       ------------      -----------
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>      Set handoff bit if RT or
>>>        wait too long
>>>      Set waiter->handoff_set
>>>    Release wait_lock
>>>                           Acquire wait_lock
>>>                           Inherit waiter->handoff_set
>>>                           Release wait_lock
>>>                        Clear owner
>>>                                             Release lock
>>>    if (waiter.handoff_set) {
>>>      rwsem_spin_on_owner(();
>>>      if (OWNER_NULL)
>>>        goto trylock_again;
>>>    }
>>>    trylock_again:
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>       if (first->handoff_set && (waiter != first))
>>>           return false;
>>>    Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more 
>>> consistent")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>
>
> Looks like, There is still a window for a race.
>
> There is a chance when a reader who came first added it's BIAS and 
> goes to slowpath and before it gets added to wait list it got 
> preempted by RT task which  goes to slowpath as well and being the 
> first waiter gets its hand-off bit set and not able to get the lock 
> due to following condition in rwsem_try_write_lock()
>
>  630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has 
> sets its bias
> ..
> ...
>
>  634
>  635                         new |= RWSEM_FLAG_HANDOFF;
>  636                 } else {
>  637                         new |= RWSEM_WRITER_LOCKED;
>
>
> ---------------------->----------------------->-------------------------
>
> First reader (1)          writer(2) RT task             Lock holder(3)
>
> It sets
> RWSEM_READER_BIAS.
> while it is going to
> slowpath(as the lock
> was held by (3)) and
> before it got added
> to the waiters list
> it got preempted
> by (2).
>                         RT task also takes
>                         the slowpath and add              release the
>                         itself into waiting list          rwsem lock
>             and since it is the first         clear the
>                         it is the next one to get         owner.
>                         the lock but it can not
>                         get the lock as (count &
>                         RWSEM_LOCK_MASK) is set
>                         as (1) has added it but
>                         not able to remove its
>             adjustment.
>
Hey Mukesh,

Can you test the diff below if it makes sense to you?

It simply prevents the first waiter from spinning any longer once it
detects that it is barely making any progress spinning without a lock
owner.

Hillf

--- mainline/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
 	long count, new;
 
 	lockdep_assert_held(&sem->wait_lock);
+	waiter->handoff_set = false;
 
 	count = atomic_long_read(&sem->count);
 	do {
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
 		if (has_handoff) {
-			/*
-			 * Honor handoff bit and yield only when the first
-			 * waiter is the one that set it. Otherwisee, we
-			 * still try to acquire the rwsem.
-			 */
-			if (first->handoff_set && (waiter != first))
+			if (waiter != first)
 				return false;
-
-			/*
-			 * First waiter can inherit a previously set handoff
-			 * bit and spin on rwsem if lock acquisition fails.
-			 */
-			if (waiter == first)
-				waiter->handoff_set = true;
 		}
 
 		new = count;



* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-11 10:46     ` Hillf Danton
@ 2022-10-11 13:16       ` Mukesh Ojha
  2022-10-12  4:04         ` Hillf Danton
  2022-10-12 13:16         ` Waiman Long
  2022-10-12 13:14       ` Waiman Long
  1 sibling, 2 replies; 20+ messages in thread
From: Mukesh Ojha @ 2022-10-11 13:16 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long,
	Peter Zijlstra, Will Deacon, Boqun Feng, Ingo Molnar

Hi @Hillf,

Thanks for looking into this issue.

On 10/11/2022 4:16 PM, Hillf Danton wrote:
> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>> Hi Waiman,
>>
>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>> On 9/29/22 14:04, Waiman Long wrote:
>>>> A non-first waiter can potentially spin in the for loop of
>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>> lock even if the rwsem is free if the following sequence happens:
>>>>
>>>>     Non-first waiter       First waiter      Lock holder
>>>>     ----------------       ------------      -----------
>>>>     Acquire wait_lock
>>>>     rwsem_try_write_lock():
>>>>       Set handoff bit if RT or
>>>>         wait too long
>>>>       Set waiter->handoff_set
>>>>     Release wait_lock
>>>>                            Acquire wait_lock
>>>>                            Inherit waiter->handoff_set
>>>>                            Release wait_lock
>>>>                         Clear owner
>>>>                                              Release lock
>>>>     if (waiter.handoff_set) {
>>>>       rwsem_spin_on_owner(();
>>>>       if (OWNER_NULL)
>>>>         goto trylock_again;
>>>>     }
>>>>     trylock_again:
>>>>     Acquire wait_lock
>>>>     rwsem_try_write_lock():
>>>>        if (first->handoff_set && (waiter != first))
>>>>            return false;
>>>>     Release wait_lock
>>>>
>>>> It is especially problematic if the non-first waiter is an RT task and
>>>> it is running on the same CPU as the first waiter as this can lead to
>>>> live lock.
>>>>
>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>> consistent")
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>    kernel/locking/rwsem.c | 13 ++++++++++---
>>>>    1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>
>>
>> Looks like, There is still a window for a race.
>>
>> There is a chance when a reader who came first added it's BIAS and
>> goes to slowpath and before it gets added to wait list it got
>> preempted by RT task which  goes to slowpath as well and being the
>> first waiter gets its hand-off bit set and not able to get the lock
>> due to following condition in rwsem_try_write_lock()

[]

>>
>>   630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has
>> sets its bias
>> ..
>> ...
>>
>>   634
>>   635                         new |= RWSEM_FLAG_HANDOFF;
>>   636                 } else {
>>   637                         new |= RWSEM_WRITER_LOCKED;
>>
>>
>> ---------------------->----------------------->-------------------------
>>
>> First reader (1)          writer(2) RT task             Lock holder(3)
>>
>> It sets
>> RWSEM_READER_BIAS.
>> while it is going to
>> slowpath(as the lock
>> was held by (3)) and
>> before it got added
>> to the waiters list
>> it got preempted
>> by (2).
>>                         RT task also takes
>>                          the slowpath and add              release the
>>                          itself into waiting list          rwsem lock
>>              and since it is the first         clear the
>>                          it is the next one to get         owner.
>>                          the lock but it can not
>>                          get the lock as (count &
>>                          RWSEM_LOCK_MASK) is set
>>                          as (1) has added it but
>>                          not able to remove its
>>              adjustment.

[]

>>
> Hey Mukesh,
> 
> Can you test the diff if it makes sense to you?
> 
> It simply prevents the first waiter from spinning any longer after detecting
> it barely makes any progress to spin without lock owner.
> 
> Hillf
> 
> --- mainline/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>   	long count, new;
>   
>   	lockdep_assert_held(&sem->wait_lock);
> +	waiter->handoff_set = false;
>   
>   	count = atomic_long_read(&sem->count);
>   	do {
>   		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>   
>   		if (has_handoff) {
> -			/*
> -			 * Honor handoff bit and yield only when the first
> -			 * waiter is the one that set it. Otherwisee, we
> -			 * still try to acquire the rwsem.
> -			 */
> -			if (first->handoff_set && (waiter != first))
> +			if (waiter != first)
>   				return false;

You mean you want to check and change waiter->handoff_set on every run of
rwsem_try_write_lock()?

But does it break optimistic spinning? @Waiman?

-Mukesh

> -
> -			/*
> -			 * First waiter can inherit a previously set handoff
> -			 * bit and spin on rwsem if lock acquisition fails.
> -			 */
> -			if (waiter == first)
> -				waiter->handoff_set = true;
>   		}
>   
>   		new = count;


* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-11 13:16       ` Mukesh Ojha
@ 2022-10-12  4:04         ` Hillf Danton
  2022-10-12 13:19           ` Waiman Long
  2022-10-12 13:42           ` Mukesh Ojha
  2022-10-12 13:16         ` Waiman Long
  1 sibling, 2 replies; 20+ messages in thread
From: Hillf Danton @ 2022-10-12  4:04 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long,
	Peter Zijlstra, Will Deacon, Boqun Feng, Ingo Molnar

On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <quic_mojha@quicinc.com>
>On 10/11/2022 4:16 PM, Hillf Danton wrote:
>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>> Hi Waiman,
>>>
>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>> A non-first waiter can potentially spin in the for loop of
>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>
>>>>>     Non-first waiter       First waiter      Lock holder
>>>>>     ----------------       ------------      -----------
>>>>>     Acquire wait_lock
>>>>>     rwsem_try_write_lock():
>>>>>       Set handoff bit if RT or
>>>>>         wait too long
>>>>>       Set waiter->handoff_set
>>>>>     Release wait_lock
>>>>>                            Acquire wait_lock
>>>>>                            Inherit waiter->handoff_set
>>>>>                            Release wait_lock
>>>>>                         Clear owner
>>>>>                                              Release lock
>>>>>     if (waiter.handoff_set) {
>>>>>       rwsem_spin_on_owner(();
>>>>>       if (OWNER_NULL)
>>>>>         goto trylock_again;
>>>>>     }
>>>>>     trylock_again:
>>>>>     Acquire wait_lock
>>>>>     rwsem_try_write_lock():
>>>>>        if (first->handoff_set && (waiter != first))
>>>>>            return false;
>>>>>     Release wait_lock
>>>>>
>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>> live lock.
>>>>>
>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>> consistent")
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>    kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>    1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>
>>>
>>> Looks like, There is still a window for a race.
>>>
>>> There is a chance when a reader who came first added it's BIAS and
>>> goes to slowpath and before it gets added to wait list it got
>>> preempted by RT task which  goes to slowpath as well and being the
>>> first waiter gets its hand-off bit set and not able to get the lock
>>> due to following condition in rwsem_try_write_lock()
>
>[]
>
>>>
>>>   630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has
>>> sets its bias
>>> ..
>>> ...
>>>
>>>   634
>>>   635                         new |= RWSEM_FLAG_HANDOFF;
>>>   636                 } else {
>>>   637                         new |= RWSEM_WRITER_LOCKED;
>>>
>>>
>>> ---------------------->----------------------->-------------------------
>>>
>>> First reader (1)          writer(2) RT task             Lock holder(3)
>>>
>>> It sets
>>> RWSEM_READER_BIAS.
>>> while it is going to
>>> slowpath(as the lock
>>> was held by (3)) and
>>> before it got added
>>> to the waiters list
>>> it got preempted
>>> by (2).
>>>                         RT task also takes
>>>                          the slowpath and add              release the
>>>                          itself into waiting list          rwsem lock
>>>              and since it is the first         clear the
>>>                          it is the next one to get         owner.
>>>                          the lock but it can not
>>>                          get the lock as (count &
>>>                          RWSEM_LOCK_MASK) is set
>>>                          as (1) has added it but
>>>                          not able to remove its
>>>              adjustment.
>
>[]
>
>>>
>> Hey Mukesh,
>> 
>> Can you test the diff if it makes sense to you?
>> 
>> It simply prevents the first waiter from spinning any longer after detecting
>> it barely makes any progress to spin without lock owner.
>> 
>> Hillf
>> 
>> --- mainline/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>   	long count, new;
>>   
>>   	lockdep_assert_held(&sem->wait_lock);
>> +	waiter->handoff_set = false;
>>   
>>   	count = atomic_long_read(&sem->count);
>>   	do {
>>   		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>   
>>   		if (has_handoff) {
>> -			/*
>> -			 * Honor handoff bit and yield only when the first
>> -			 * waiter is the one that set it. Otherwisee, we
>> -			 * still try to acquire the rwsem.
>> -			 */
>> -			if (first->handoff_set && (waiter != first))
>> +			if (waiter != first)
>>   				return false;
>
> you mean, you want to check and change waiter->handoff_set on every run 
> rwsem_try_write_lock().
>
Yes. With RWSEM_FLAG_HANDOFF set, it is too late for non-first waiters to
spin; and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
counted in RWSEM_LOCK_MASK have the upper hand over the first waiter with
respect to acquiring the lock, so it is not a bad option for the first waiter
to take a step back and stop spinning.

		if (count & RWSEM_LOCK_MASK) {
			if (has_handoff || (!rt_task(waiter->task) &&
					    !time_after(jiffies, waiter->timeout)))
				return false;

			new |= RWSEM_FLAG_HANDOFF;
		} else {

> But does it break optimistic spinning ? @waiman ?

Waiters spin in order to acquire the lock, not to lock up, and your report
shows that spinning too much causes trouble. The key is to stop spinning
neither too late nor too early. My proposal is a simple one, with as few
heuristics added as possible.
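
For completeness, the head of rwsem_try_write_lock() with the diff above
applied would look roughly like this. It is pieced together from the hunks
quoted in this thread, so take it as an approximation; the tail of the
function, where handoff_set is set to true after the handoff bit has been
installed, is left untouched by the diff and not shown here:

	lockdep_assert_held(&sem->wait_lock);
	waiter->handoff_set = false;	/* re-evaluated on every call */

	count = atomic_long_read(&sem->count);
	do {
		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);

		if (has_handoff) {
			if (waiter != first)
				return false;	/* non-first waiters always yield */
		}

		new = count;

		if (count & RWSEM_LOCK_MASK) {
			/*
			 * Lock (or a stale reader bias) still held: once
			 * the handoff bit is set even the first waiter
			 * bails out here, and since handoff_set was just
			 * cleared it goes to sleep instead of spinning
			 * on the owner again.
			 */
			if (has_handoff || (!rt_task(waiter->task) &&
					    !time_after(jiffies, waiter->timeout)))
				return false;

			new |= RWSEM_FLAG_HANDOFF;
		} else {
			new |= RWSEM_WRITER_LOCKED;
		}
	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));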

Hillf


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-11 10:46     ` Hillf Danton
  2022-10-11 13:16       ` Mukesh Ojha
@ 2022-10-12 13:14       ` Waiman Long
  1 sibling, 0 replies; 20+ messages in thread
From: Waiman Long @ 2022-10-12 13:14 UTC (permalink / raw)
  To: Hillf Danton, Mukesh Ojha
  Cc: linux-kernel, john.p.donnelly, linux-mm, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

On 10/11/22 06:46, Hillf Danton wrote:
> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>> Hi Waiman,
>>
>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>> On 9/29/22 14:04, Waiman Long wrote:
>>>> A non-first waiter can potentially spin in the for loop of
>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>> lock even if the rwsem is free if the following sequence happens:
>>>>
>>>>     Non-first waiter       First waiter      Lock holder
>>>>     ----------------       ------------      -----------
>>>>     Acquire wait_lock
>>>>     rwsem_try_write_lock():
>>>>       Set handoff bit if RT or
>>>>         wait too long
>>>>       Set waiter->handoff_set
>>>>     Release wait_lock
>>>>                            Acquire wait_lock
>>>>                            Inherit waiter->handoff_set
>>>>                            Release wait_lock
>>>>                         Clear owner
>>>>                                              Release lock
>>>>     if (waiter.handoff_set) {
>>>>       rwsem_spin_on_owner(();
>>>>       if (OWNER_NULL)
>>>>         goto trylock_again;
>>>>     }
>>>>     trylock_again:
>>>>     Acquire wait_lock
>>>>     rwsem_try_write_lock():
>>>>        if (first->handoff_set && (waiter != first))
>>>>            return false;
>>>>     Release wait_lock
>>>>
>>>> It is especially problematic if the non-first waiter is an RT task and
>>>> it is running on the same CPU as the first waiter as this can lead to
>>>> live lock.
>>>>
>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>> consistent")
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>    kernel/locking/rwsem.c | 13 ++++++++++---
>>>>    1 file changed, 10 insertions(+), 3 deletions(-)
>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>
>> Looks like, There is still a window for a race.
>>
>> There is a chance when a reader who came first added it's BIAS and
>> goes to slowpath and before it gets added to wait list it got
>> preempted by RT task which  goes to slowpath as well and being the
>> first waiter gets its hand-off bit set and not able to get the lock
>> due to following condition in rwsem_try_write_lock()
>>
>>   630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has
>> sets its bias
>> ..
>> ...
>>
>>   634
>>   635                         new |= RWSEM_FLAG_HANDOFF;
>>   636                 } else {
>>   637                         new |= RWSEM_WRITER_LOCKED;
>>
>>
>> ---------------------->----------------------->-------------------------
>>
>> First reader (1)          writer(2) RT task             Lock holder(3)
>>
>> It sets
>> RWSEM_READER_BIAS.
>> while it is going to
>> slowpath(as the lock
>> was held by (3)) and
>> before it got added
>> to the waiters list
>> it got preempted
>> by (2).
>>                          RT task also takes
>>                          the slowpath and add              release the
>>                          itself into waiting list          rwsem lock
>>              and since it is the first         clear the
>>                          it is the next one to get         owner.
>>                          the lock but it can not
>>                          get the lock as (count &
>>                          RWSEM_LOCK_MASK) is set
>>                          as (1) has added it but
>>                          not able to remove its
>>              adjustment.
>>
> Hey Mukesh,
>
> Can you test the diff if it makes sense to you?
>
> It simply prevents the first waiter from spinning any longer after detecting
> it barely makes any progress to spin without lock owner.
>
> Hillf
>
> --- mainline/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>   	long count, new;
>   
>   	lockdep_assert_held(&sem->wait_lock);
> +	waiter->handoff_set = false;
>   
>   	count = atomic_long_read(&sem->count);
>   	do {
>   		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>   
>   		if (has_handoff) {
> -			/*
> -			 * Honor handoff bit and yield only when the first
> -			 * waiter is the one that set it. Otherwisee, we
> -			 * still try to acquire the rwsem.
> -			 */
> -			if (first->handoff_set && (waiter != first))
> +			if (waiter != first)
>   				return false;
> -
> -			/*
> -			 * First waiter can inherit a previously set handoff
> -			 * bit and spin on rwsem if lock acquisition fails.
> -			 */
> -			if (waiter == first)
> -				waiter->handoff_set = true;
>   		}
>   
>   		new = count;

That is somewhat equivalent to allowing the first waiter to spin only once
after setting the handoff bit. It also removes the code that handles some of
the corner cases.
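
Put differently, assuming the slowpath structure shown in the changelog at
the top of this thread, the sequence for the first waiter becomes:

	/*
	 * With waiter->handoff_set cleared at the top of every
	 * rwsem_try_write_lock() call:
	 *
	 *   trylock #1: lock busy, installs RWSEM_FLAG_HANDOFF and
	 *               sets handoff_set              -> spin on owner
	 *   trylock #2: handoff_set cleared at entry; if the lock is
	 *               still busy, return false      -> sleep
	 *
	 * i.e. at most one rwsem_spin_on_owner() pass per handoff
	 * grant, rather than repeated spin/trylock cycles.
	 */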

Cheers,
Longman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-11 13:16       ` Mukesh Ojha
  2022-10-12  4:04         ` Hillf Danton
@ 2022-10-12 13:16         ` Waiman Long
  1 sibling, 0 replies; 20+ messages in thread
From: Waiman Long @ 2022-10-12 13:16 UTC (permalink / raw)
  To: Mukesh Ojha, Hillf Danton
  Cc: linux-kernel, john.p.donnelly, linux-mm, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

On 10/11/22 09:16, Mukesh Ojha wrote:
> Hi @Hilf,
>
> Thanks for looking into this issue.
>
> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>> Hi Waiman,
>>>
>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>> A non-first waiter can potentially spin in the for loop of
>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>
>>>>>     Non-first waiter       First waiter      Lock holder
>>>>>     ----------------       ------------      -----------
>>>>>     Acquire wait_lock
>>>>>     rwsem_try_write_lock():
>>>>>       Set handoff bit if RT or
>>>>>         wait too long
>>>>>       Set waiter->handoff_set
>>>>>     Release wait_lock
>>>>>                            Acquire wait_lock
>>>>>                            Inherit waiter->handoff_set
>>>>>                            Release wait_lock
>>>>>                         Clear owner
>>>>>                                              Release lock
>>>>>     if (waiter.handoff_set) {
>>>>>       rwsem_spin_on_owner(();
>>>>>       if (OWNER_NULL)
>>>>>         goto trylock_again;
>>>>>     }
>>>>>     trylock_again:
>>>>>     Acquire wait_lock
>>>>>     rwsem_try_write_lock():
>>>>>        if (first->handoff_set && (waiter != first))
>>>>>            return false;
>>>>>     Release wait_lock
>>>>>
>>>>> It is especially problematic if the non-first waiter is an RT task 
>>>>> and
>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>> live lock.
>>>>>
>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>> consistent")
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>    kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>    1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>
>>>
>>> Looks like, There is still a window for a race.
>>>
>>> There is a chance when a reader who came first added it's BIAS and
>>> goes to slowpath and before it gets added to wait list it got
>>> preempted by RT task which  goes to slowpath as well and being the
>>> first waiter gets its hand-off bit set and not able to get the lock
>>> due to following condition in rwsem_try_write_lock()
>
> []
>
>>>
>>>   630                 if (count & RWSEM_LOCK_MASK) { ==> reader has
>>> sets its bias
>>> ..
>>> ...
>>>
>>>   634
>>>   635                         new |= RWSEM_FLAG_HANDOFF;
>>>   636                 } else {
>>>   637                         new |= RWSEM_WRITER_LOCKED;
>>>
>>>
>>> ---------------------->----------------------->------------------------- 
>>>
>>>
>>> First reader (1)          writer(2) RT task             Lock holder(3)
>>>
>>> It sets
>>> RWSEM_READER_BIAS.
>>> while it is going to
>>> slowpath(as the lock
>>> was held by (3)) and
>>> before it got added
>>> to the waiters list
>>> it got preempted
>>> by (2).
>>>                         RT task also takes
>>>                          the slowpath and add release the
>>>                          itself into waiting list rwsem lock
>>>              and since it is the first         clear the
>>>                          it is the next one to get owner.
>>>                          the lock but it can not
>>>                          get the lock as (count &
>>>                          RWSEM_LOCK_MASK) is set
>>>                          as (1) has added it but
>>>                          not able to remove its
>>>              adjustment.
>
> []
>
>>>
>> Hey Mukesh,
>>
>> Can you test the diff if it makes sense to you?
>>
>> It simply prevents the first waiter from spinning any longer after 
>> detecting
>> it barely makes any progress to spin without lock owner.
>>
>> Hillf
>>
>> --- mainline/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>       long count, new;
>>         lockdep_assert_held(&sem->wait_lock);
>> +    waiter->handoff_set = false;
>>         count = atomic_long_read(&sem->count);
>>       do {
>>           bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>             if (has_handoff) {
>> -            /*
>> -             * Honor handoff bit and yield only when the first
>> -             * waiter is the one that set it. Otherwisee, we
>> -             * still try to acquire the rwsem.
>> -             */
>> -            if (first->handoff_set && (waiter != first))
>> +            if (waiter != first)
>>                   return false;
>
> you mean, you want to check and change waiter->handoff_set on every 
> run rwsem_try_write_lock().
>
> But does it break optimistic spinning ? @waiman ?
>
As I said in my previous mail, this is equivalent to allowing only one 
optimistic spinning attempt after setting the handoff bit. That will 
likely reduce the performance benefit provided by this feature.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-12  4:04         ` Hillf Danton
@ 2022-10-12 13:19           ` Waiman Long
  2022-10-12 13:42           ` Mukesh Ojha
  1 sibling, 0 replies; 20+ messages in thread
From: Waiman Long @ 2022-10-12 13:19 UTC (permalink / raw)
  To: Hillf Danton, Mukesh Ojha
  Cc: linux-kernel, john.p.donnelly, linux-mm, Peter Zijlstra,
	Will Deacon, Boqun Feng, Ingo Molnar

On 10/12/22 00:04, Hillf Danton wrote:
>> you mean, you want to check and change waiter->handoff_set on every run
>> rwsem_try_write_lock().
>>
> Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> in the RWSEM_LOCK_MASK have an uphand over the first waiter wrt acquiring
> the lock, and it is not a bad option for the first waiter to take a step
> back off.
>
> 		if (count & RWSEM_LOCK_MASK) {
> 			if (has_handoff || (!rt_task(waiter->task) &&
> 					    !time_after(jiffies, waiter->timeout)))
> 				return false;
>
> 			new |= RWSEM_FLAG_HANDOFF;
> 		} else {
>
>> But does it break optimistic spinning ? @waiman ?
> Waiters spin for acquiring lock instead of lockup and your report shows
> spinning too much makes trouble. The key is stop spinning neither too
> late nor too early. My proposal is a simple one with as few heuristics
> added as possible.

Yes, too much spinning is bad if we have RT tasks in the mix; otherwise 
it should be fine.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
       [not found]         ` <446288c6c7db4ab2b23f7041baa62335@xiaomi.com>
@ 2022-10-12 13:21           ` Waiman Long
  0 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2022-10-12 13:21 UTC (permalink / raw)
  To: Ting11 Wang 王婷
  Cc: Mukesh Ojha, linux-kernel, john.p.donnelly, Hillf Danton,
	Peter Zijlstra, Will Deacon, Boqun Feng, Ingo Molnar

On 10/11/22 03:20, Ting11 Wang 王婷 wrote:
> Hi Waiman,
>
> Thanks for your quick reply.
> Regarding this issue, we have done a lot of testing on our devices and found that the first patch is the most effective. Is it possible to use this patch?
>
> https://lore.kernel.org/lkml/e1cb435a-4471-ac3a-344f-04448f167c9f@quicinc.com/
>
Yes, I am going to resurrect this patch.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
  2022-10-12  4:04         ` Hillf Danton
  2022-10-12 13:19           ` Waiman Long
@ 2022-10-12 13:42           ` Mukesh Ojha
  1 sibling, 0 replies; 20+ messages in thread
From: Mukesh Ojha @ 2022-10-12 13:42 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long,
	Peter Zijlstra, Will Deacon, Boqun Feng, Ingo Molnar

Hi,

On 10/12/2022 9:34 AM, Hillf Danton wrote:
> On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <quic_mojha@quicinc.com>
>> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>>> Hi Waiman,
>>>>
>>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>>> A non-first waiter can potentially spin in the for loop of
>>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>>
>>>>>>      Non-first waiter       First waiter      Lock holder
>>>>>>      ----------------       ------------      -----------
>>>>>>      Acquire wait_lock
>>>>>>      rwsem_try_write_lock():
>>>>>>        Set handoff bit if RT or
>>>>>>          wait too long
>>>>>>        Set waiter->handoff_set
>>>>>>      Release wait_lock
>>>>>>                             Acquire wait_lock
>>>>>>                             Inherit waiter->handoff_set
>>>>>>                             Release wait_lock
>>>>>>                          Clear owner
>>>>>>                                               Release lock
>>>>>>      if (waiter.handoff_set) {
>>>>>>        rwsem_spin_on_owner(();
>>>>>>        if (OWNER_NULL)
>>>>>>          goto trylock_again;
>>>>>>      }
>>>>>>      trylock_again:
>>>>>>      Acquire wait_lock
>>>>>>      rwsem_try_write_lock():
>>>>>>         if (first->handoff_set && (waiter != first))
>>>>>>             return false;
>>>>>>      Release wait_lock
>>>>>>
>>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>>> live lock.
>>>>>>
>>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>>> consistent")
>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>> ---
>>>>>>     kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>>     1 file changed, 10 insertions(+), 3 deletions(-)
>>>>>
>>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>>
>>>>
>>>> Looks like, There is still a window for a race.
>>>>
>>>> There is a chance when a reader who came first added it's BIAS and
>>>> goes to slowpath and before it gets added to wait list it got
>>>> preempted by RT task which  goes to slowpath as well and being the
>>>> first waiter gets its hand-off bit set and not able to get the lock
>>>> due to following condition in rwsem_try_write_lock()
>>
>> []
>>
>>>>
>>>>    630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has
>>>> sets its bias
>>>> ..
>>>> ...
>>>>
>>>>    634
>>>>    635                         new |= RWSEM_FLAG_HANDOFF;
>>>>    636                 } else {
>>>>    637                         new |= RWSEM_WRITER_LOCKED;
>>>>
>>>>
>>>> ---------------------->----------------------->-------------------------
>>>>
>>>> First reader (1)          writer(2) RT task             Lock holder(3)
>>>>
>>>> It sets
>>>> RWSEM_READER_BIAS.
>>>> while it is going to
>>>> slowpath(as the lock
>>>> was held by (3)) and
>>>> before it got added
>>>> to the waiters list
>>>> it got preempted
>>>> by (2).
>>>>                          RT task also takes
>>>>                           the slowpath and add              release the
>>>>                           itself into waiting list          rwsem lock
>>>>               and since it is the first         clear the
>>>>                           it is the next one to get         owner.
>>>>                           the lock but it can not
>>>>                           get the lock as (count &
>>>>                           RWSEM_LOCK_MASK) is set
>>>>                           as (1) has added it but
>>>>                           not able to remove its
>>>>               adjustment.
>>
>> []
>>
>>>>
>>> Hey Mukesh,
>>>
>>> Can you test the diff if it makes sense to you?
>>>
>>> It simply prevents the first waiter from spinning any longer after detecting
>>> it barely makes any progress to spin without lock owner.
>>>
>>> Hillf
>>>
>>> --- mainline/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>>    	long count, new;
>>>    
>>>    	lockdep_assert_held(&sem->wait_lock);
>>> +	waiter->handoff_set = false;
>>>    
>>>    	count = atomic_long_read(&sem->count);
>>>    	do {
>>>    		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>>    
>>>    		if (has_handoff) {
>>> -			/*
>>> -			 * Honor handoff bit and yield only when the first
>>> -			 * waiter is the one that set it. Otherwisee, we
>>> -			 * still try to acquire the rwsem.
>>> -			 */
>>> -			if (first->handoff_set && (waiter != first))
>>> +			if (waiter != first)
>>>    				return false;
>>
>> you mean, you want to check and change waiter->handoff_set on every run
>> rwsem_try_write_lock().
>>
> Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> in the RWSEM_LOCK_MASK have an uphand over the first waiter wrt acquiring
> the lock, and it is not a bad option for the first waiter to take a step
> back off.
> 
> 		if (count & RWSEM_LOCK_MASK) {
> 			if (has_handoff || (!rt_task(waiter->task) &&
> 					    !time_after(jiffies, waiter->timeout)))
> 				return false;
> 
> 			new |= RWSEM_FLAG_HANDOFF;
> 		} else {
> 
>> But does it break optimistic spinning ? @waiman ?
> 
> Waiters spin for acquiring lock instead of lockup and your report shows
> spinning too much makes trouble. The key is stop spinning neither too
> late nor too early. My proposal is a simple one with as few heuristics
> added as possible.

From a high level, it looks like it will work.
Let me check and get back to you on this.

-Mukesh
> 
> Hillf

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-10-12 13:43 UTC | newest]

Thread overview: 20+ messages
2022-09-29 18:04 [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-09-29 18:06 ` Waiman Long
2022-09-30  4:47   ` Mukesh Ojha
2022-10-10 10:24   ` Mukesh Ojha
2022-10-10 16:57     ` Waiman Long
     [not found]       ` <03421f0e453143f6ae4a9a7a8a4f47f4@xiaomi.com>
     [not found]         ` <446288c6c7db4ab2b23f7041baa62335@xiaomi.com>
2022-10-12 13:21           ` Waiman Long
2022-10-11 10:46     ` Hillf Danton
2022-10-11 13:16       ` Mukesh Ojha
2022-10-12  4:04         ` Hillf Danton
2022-10-12 13:19           ` Waiman Long
2022-10-12 13:42           ` Mukesh Ojha
2022-10-12 13:16         ` Waiman Long
2022-10-12 13:14       ` Waiman Long
2022-09-29 21:15 ` John Donnelly
2022-09-30  1:09   ` Waiman Long
2022-09-30  4:46 ` Mukesh Ojha
2022-09-30 13:57   ` Waiman Long
2022-09-30  5:06 ` Mukesh Ojha
2022-09-30 14:08   ` Waiman Long
2022-09-30 16:03     ` Mukesh Ojha
