linux-kernel.vger.kernel.org archive mirror
* [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Yanfei Xu @ 2021-06-28 15:51 UTC
  To: peterz, mingo, longman, boqun.feng; +Cc: linux-kernel

When the mutex unlock path is executed with the WAITERS bit set and
without the HANDOFF bit set, it will wake up the first task on the
wait_list. If some tasks not on the wait_list are trying to steal the
lock at that moment, they will very likely succeed because the task
field of lock->owner is NULL while the flags field is non-NULL. The
HANDOFF bit is then cleared. But if the HANDOFF bit was just set by the
woken task from the wait_list, this clearing is unexcepted.

__mutex_lock_common                   __mutex_lock_common
  __mutex_trylock                       schedule_preempt_disabled
    /*steal lock successfully*/         __mutex_set_flag(lock,MUTEX_FLAG_HANDOFF)
    __mutex_trylock_or_owner
      if (task==NULL)
      flags &= ~MUTEX_FLAG_HANDOFF
      atomic_long_cmpxchg_acquire
                                        __mutex_trylock  //failed
                                        mutex_optimistic_spin  //failed
                                        schedule_preempt_disabled  //sleep without HANDOFF bit

So the HANDOFF bit should be set as late as possible; here we defer it
until the task is about to be scheduled out.

Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
---

Hi maintainers,

I am not very sure whether I missed or misunderstood something; please
help review. Many thanks!

 kernel/locking/mutex.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 013e1b08a1bf..e57d920e96bf 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -1033,17 +1033,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		}
 
 		spin_unlock(&lock->wait_lock);
+
+		if (first)
+			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
 		schedule_preempt_disabled();
 
 		/*
 		 * ww_mutex needs to always recheck its position since its waiter
 		 * list is not FIFO ordered.
 		 */
-		if (ww_ctx || !first) {
+		if (ww_ctx || !first)
 			first = __mutex_waiter_is_first(lock, &waiter);
-			if (first)
-				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
-		}
 
 		set_current_state(state);
 		/*
-- 
2.27.0



* Re: [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Waiman Long @ 2021-06-28 17:57 UTC
  To: Yanfei Xu, peterz, mingo, boqun.feng; +Cc: linux-kernel

On 6/28/21 11:51 AM, Yanfei Xu wrote:
> When the mutex unlock path is executed with the WAITERS bit set and
> without the HANDOFF bit set, it will wake up the first task on the
> wait_list. If some tasks not on the wait_list are trying to steal the
> lock at that moment, they will very likely succeed because the task
> field of lock->owner is NULL while the flags field is non-NULL. The
> HANDOFF bit is then cleared. But if the HANDOFF bit was just set by the
> woken task from the wait_list, this clearing is unexcepted.

I think you mean "unexpected". Right? Anyway, all the setting and 
clearing of the HANDOFF bit are atomic. There shouldn't be any 
unexpected clearing.

> __mutex_lock_common                   __mutex_lock_common
>    __mutex_trylock                       schedule_preempt_disabled
>      /*steal lock successfully*/         __mutex_set_flag(lock,MUTEX_FLAG_HANDOFF)
>      __mutex_trylock_or_owner
>        if (task==NULL)
>        flags &= ~MUTEX_FLAG_HANDOFF
>        atomic_long_cmpxchg_acquire
>                                          __mutex_trylock  //failed
>                                          mutex_optimistic_spin  //failed
>                                          schedule_preempt_disabled  //sleep without HANDOFF bit
>
> So the HANDOFF bit should be set as late as possible; here we defer it
> until the task is about to be scheduled out.
> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
> ---
>
> Hi maintainers,
>
> I am not very sure whether I missed or misunderstood something; please
> help review. Many thanks!
>
>   kernel/locking/mutex.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 013e1b08a1bf..e57d920e96bf 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -1033,17 +1033,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		}
>   
>   		spin_unlock(&lock->wait_lock);
> +
> +		if (first)
> +			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>   		schedule_preempt_disabled();
>   
>   		/*
>   		 * ww_mutex needs to always recheck its position since its waiter
>   		 * list is not FIFO ordered.
>   		 */
> -		if (ww_ctx || !first) {
> +		if (ww_ctx || !first)
>   			first = __mutex_waiter_is_first(lock, &waiter);
> -			if (first)
> -				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> -		}
>   
>   		set_current_state(state);
>   		/*

In general, I don't mind setting the HANDOFF bit later, but 
mutex_optimistic_spin() at the end of the loop should only be called 
after the HANDOFF bit is set. So the logic isn't quite right yet.

Cheers,
Longman



* Re: [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Xu, Yanfei @ 2021-06-29  9:52 UTC
  To: Waiman Long, peterz, mingo, boqun.feng; +Cc: linux-kernel



On 6/29/21 1:57 AM, Waiman Long wrote:
> On 6/28/21 11:51 AM, Yanfei Xu wrote:
>> When the mutex unlock path is executed with the WAITERS bit set and
>> without the HANDOFF bit set, it will wake up the first task on the
>> wait_list. If some tasks not on the wait_list are trying to steal the
>> lock at that moment, they will very likely succeed because the task
>> field of lock->owner is NULL while the flags field is non-NULL. The
>> HANDOFF bit is then cleared. But if the HANDOFF bit was just set by the
>> woken task from the wait_list, this clearing is unexcepted.
> 
> I think you mean "unexpected". Right? Anyway, all the setting and

Right. It's my fault.

> clearing of the HANDOFF bit are atomic. There shouldn't be any
> unexpected clearing.
> 
>> __mutex_lock_common                   __mutex_lock_common
>>    __mutex_trylock                       schedule_preempt_disabled
>>      /*steal lock successfully*/         __mutex_set_flag(lock,MUTEX_FLAG_HANDOFF)
>>      __mutex_trylock_or_owner
>>        if (task==NULL)
>>        flags &= ~MUTEX_FLAG_HANDOFF
>>        atomic_long_cmpxchg_acquire
>>                                          __mutex_trylock  //failed
>>                                          mutex_optimistic_spin  //failed
>>                                          schedule_preempt_disabled  //sleep without HANDOFF bit
>>
>> So the HANDOFF bit should be set as late as possible; here we defer it
>> until the task is about to be scheduled out.
>> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
>> ---
>>
>> Hi maintainers,
>>
>> I am not very sure whether I missed or misunderstood something; please
>> help review. Many thanks!
>>
>>   kernel/locking/mutex.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 013e1b08a1bf..e57d920e96bf 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -1033,17 +1033,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>               }
>>
>>               spin_unlock(&lock->wait_lock);
>> +
>> +             if (first)
>> +                     __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>               schedule_preempt_disabled();
>>
>>               /*
>>                * ww_mutex needs to always recheck its position since its waiter
>>                * list is not FIFO ordered.
>>                */
>> -             if (ww_ctx || !first) {
>> +             if (ww_ctx || !first)
>>                       first = __mutex_waiter_is_first(lock, &waiter);
>> -                     if (first)
>> -                             __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>> -             }
>>
>>               set_current_state(state);
>>               /*
> 
> In general, I don't mind setting the HANDOFF bit later, but
> mutex_optimistic_spin() at the end of the loop should only be called
> after the HANDOFF bit is set. So the logic isn't quite right yet.

Thanks for your reply.

Why should mutex_optimistic_spin() be called only after the HANDOFF bit
is set? I think the HANDOFF bit is only related to the unlock path, and
mutex_optimistic_spin() and __mutex_trylock() don't actually use it. (Or
did I miss something?)

Maybe I did not describe the issue clearly. Let me try again.

The HANDOFF bit aims to avoid lock starvation. Lock starvation is
possible because mutex_lock() allows lock stealing, where a running (or
optimistically spinning) task beats the woken waiter to the acquire. So
the maintainers added the HANDOFF bit: once stealing has happened, the
top waiter is guaranteed to get the lock on its second wakeup thanks to
the HANDOFF bit.

However, the fact is that if stealing happens, the HANDOFF bit will be
cleared by the thief task. Hence the top waiter might still starve at
its second wakeup.

__mutex_trylock
   ->__mutex_trylock_or_owner
     ->flags &= ~MUTEX_FLAG_HANDOFF
     ->atomic_long_cmpxchg_acquire(&lock->owner, owner, curr | flags)

The next unlock path will not see the HANDOFF bit, and will not hand the
lock to the top waiter. The task starves.
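
For reference, a trimmed sketch of the unlock-path logic this relies on
(paraphrased from __mutex_unlock_slowpath(); debug checks and the
wake-queue handling are elided):

	owner = atomic_long_read(&lock->owner);
	for (;;) {
		unsigned long old;

		/* Leave the handoff to the wait_lock section below. */
		if (owner & MUTEX_FLAG_HANDOFF)
			break;

		old = atomic_long_cmpxchg_release(&lock->owner, owner,
						  __owner_flags(owner));
		if (old == owner) {
			if (owner & MUTEX_FLAG_WAITERS)
				break;	/* wake the first waiter */
			return;		/* no waiters, fast return */
		}
		owner = old;
	}
	...
	if (owner & MUTEX_FLAG_HANDOFF)
		__mutex_handoff(lock, next);

If the thief's cmpxchg has already wiped MUTEX_FLAG_HANDOFF out of
lock->owner, neither check above fires and no handoff takes place.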

Maybe we could add a parameter to __mutex_trylock() to indicate whether
the caller is the top waiter. If yes, pick up the lock and clear
MUTEX_FLAG_HANDOFF; if not, do not clear MUTEX_FLAG_HANDOFF. Like this:

__mutex_trylock
   ->__mutex_trylock_or_owner
     ->if(first) flags &= ~MUTEX_FLAG_HANDOFF
     ->atomic_long_cmpxchg_acquire(&lock->owner, owner, curr | flags)

But I think the cost of this is higher than my patch.
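
To make the idea concrete, here is a hypothetical sketch of that
variant, modeled on the shape of __mutex_trylock_or_owner(); the "first"
parameter is the proposed addition, not existing code, and the
PICKUP/owner-recursion handling is elided:

	static inline struct task_struct *
	__mutex_trylock_or_owner(struct mutex *lock, bool first)
	{
		unsigned long owner, curr = (unsigned long)current;

		owner = atomic_long_read(&lock->owner);
		for (;;) {	/* must loop, can race against the flags */
			unsigned long old, flags = __owner_flags(owner);
			unsigned long task = owner & ~MUTEX_FLAGS;

			if (task)
				break;		/* held, cannot take it */

			/* Only the top waiter consumes the HANDOFF bit;
			 * a stealer acquires the lock but preserves it,
			 * so the next unlock still performs the handoff. */
			if (first)
				flags &= ~MUTEX_FLAG_HANDOFF;

			old = atomic_long_cmpxchg_acquire(&lock->owner,
							  owner, curr | flags);
			if (old == owner)
				return NULL;	/* acquired */

			owner = old;
		}

		return __owner_task(owner);
	}

The cost referred to above is that every __mutex_trylock() call site
would then have to know and pass whether the caller is the first waiter.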


The other problem my patch solves is moving __mutex_set_flag(lock,
MUTEX_FLAG_HANDOFF) out of the "if (ww_ctx || !first)" statement.
Otherwise, once the HANDOFF bit is cleared by a thief task, it will
never be set again.

if (ww_ctx || !first) {
     first = __mutex_waiter_is_first(lock, &waiter);
     if (first)
         __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
}

change to

if (ww_ctx || !first)
     first = __mutex_waiter_is_first(lock, &waiter);
if (first)
     __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);



Thanks,
Yanfei

> 
> Cheers,
> Longman
> 


* Re: [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Waiman Long @ 2021-06-29 14:40 UTC
  To: Xu, Yanfei, Waiman Long, peterz, mingo, boqun.feng; +Cc: linux-kernel

On 6/29/21 5:52 AM, Xu, Yanfei wrote:
>
>
> On 6/29/21 1:57 AM, Waiman Long wrote:
>> On 6/28/21 11:51 AM, Yanfei Xu wrote:
>>> When the mutex unlock path is executed with the WAITERS bit set and
>>> without the HANDOFF bit set, it will wake up the first task on the
>>> wait_list. If some tasks not on the wait_list are trying to steal the
>>> lock at that moment, they will very likely succeed because the task
>>> field of lock->owner is NULL while the flags field is non-NULL. The
>>> HANDOFF bit is then cleared. But if the HANDOFF bit was just set by the
>>> woken task from the wait_list, this clearing is unexcepted.
>>
>> I think you mean "unexpected". Right? Anyway, all the setting and
>
> Right. It's my fault.
>
>> clearing of the HANDOFF bit are atomic. There shouldn't be any
>> unexpected clearing.
>>
>>> __mutex_lock_common                   __mutex_lock_common
>>>   __mutex_trylock                       schedule_preempt_disabled
>>>     /*steal lock successfully*/         __mutex_set_flag(lock,MUTEX_FLAG_HANDOFF)
>>>     __mutex_trylock_or_owner
>>>       if (task==NULL)
>>>       flags &= ~MUTEX_FLAG_HANDOFF
>>>       atomic_long_cmpxchg_acquire
>>>                                         __mutex_trylock  //failed
>>>                                         mutex_optimistic_spin  //failed
>>>                                         schedule_preempt_disabled  //sleep without HANDOFF bit
>>>
>>> So the HANDOFF bit should be set as late as possible; here we defer it
>>> until the task is about to be scheduled out.
>>> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
>>> ---
>>>
>>> Hi maintainers,
>>>
>>> I am not very sure whether I missed or misunderstood something;
>>> please help review. Many thanks!
>>>
>>>   kernel/locking/mutex.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>>> index 013e1b08a1bf..e57d920e96bf 100644
>>> --- a/kernel/locking/mutex.c
>>> +++ b/kernel/locking/mutex.c
>>> @@ -1033,17 +1033,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>               }
>>>
>>>               spin_unlock(&lock->wait_lock);
>>> +
>>> +             if (first)
>>> +                     __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>               schedule_preempt_disabled();
>>>
>>>               /*
>>>                * ww_mutex needs to always recheck its position since its waiter
>>>                * list is not FIFO ordered.
>>>                */
>>> -             if (ww_ctx || !first) {
>>> +             if (ww_ctx || !first)
>>>                       first = __mutex_waiter_is_first(lock, &waiter);
>>> -                     if (first)
>>> -                             __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>> -             }
>>>
>>>               set_current_state(state);
>>>               /*
>>
>> In general, I don't mind setting the HANDOFF bit later, but
>> mutex_optimistic_spin() at the end of the loop should only be called
>> after the HANDOFF bit is set. So the logic isn't quite right yet.
>
> Thanks for your reply.
>
> Why should mutex_optimistic_spin() be called only after the HANDOFF bit
> is set? I think the HANDOFF bit is only related to the unlock path, and
> mutex_optimistic_spin() and __mutex_trylock() don't actually use it. (Or
> did I miss something?)

The purpose of doing spinning after the HANDOFF bit is set is to 
eliminate the waiter wakeup latency, if possible. Before the HANDOFF bit 
is set, the lock can be easily stolen and there is no point in adding 
pressure to the lock cacheline.
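
For reference, the tail of the wait loop in __mutex_lock_common()
(lightly trimmed from the mutex.c this patch is against, so details may
differ) shows that the end-of-loop spin is gated on "first":

	set_current_state(state);
	/*
	 * Here we order against unlock; we must either see it change
	 * state back to RUNNING and fall through the next schedule(),
	 * or we must see its unlock and acquire.
	 */
	if (__mutex_trylock(lock) ||
	    (first && mutex_optimistic_spin(lock, ww_ctx, &waiter)))
		break;

	spin_lock(&lock->wait_lock);

With the proposed patch, a waiter that has just become "first" reaches
this spin before it has set the HANDOFF bit, which is the gap being
pointed out.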


>
> Maybe I did not describe the issue clearly. Let me try again.
>
> The HANDOFF bit aims to avoid lock starvation. Lock starvation is
> possible because mutex_lock() allows lock stealing, where a running (or
> optimistically spinning) task beats the woken waiter to the acquire. So
> the maintainers added the HANDOFF bit: once stealing has happened, the
> top waiter is guaranteed to get the lock on its second wakeup thanks to
> the HANDOFF bit.
>
> However, the fact is that if stealing happens, the HANDOFF bit will be
> cleared by the thief task. Hence the top waiter might still starve at
> its second wakeup.
>
I think you have some confusion here. If the HANDOFF bit is set before
the lock is stolen by an optimistic spinner, lock stealing can't happen,
which is the point of having a HANDOFF bit. Also, the clearing of the
HANDOFF bit isn't done by the task that steals the lock; it is done by
the current lock holder (i.e. the task that held the lock when the
HANDOFF bit was set) in the unlock path. It can be a lock stealer that
stole the lock before the HANDOFF bit was set. Of course, it will be a
bug if the current code doesn't actually do that.

Cheers,
Longman



* Re: [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Waiman Long @ 2021-06-29 15:18 UTC
  To: Xu, Yanfei, Waiman Long, peterz, mingo, boqun.feng; +Cc: linux-kernel

On 6/29/21 10:40 AM, Waiman Long wrote:
> On 6/29/21 5:52 AM, Xu, Yanfei wrote:
>>
>>
>> On 6/29/21 1:57 AM, Waiman Long wrote:
>>> On 6/28/21 11:51 AM, Yanfei Xu wrote:
>>>> When the mutex unlock path is executed with the WAITERS bit set and
>>>> without the HANDOFF bit set, it will wake up the first task on the
>>>> wait_list. If some tasks not on the wait_list are trying to steal the
>>>> lock at that moment, they will very likely succeed because the task
>>>> field of lock->owner is NULL while the flags field is non-NULL. The
>>>> HANDOFF bit is then cleared. But if the HANDOFF bit was just set by the
>>>> woken task from the wait_list, this clearing is unexcepted.
>>>
>>> I think you mean "unexpected". Right? Anyway, all the setting and
>>
>> Right. It's my fault.
>>
>>> clearing of the HANDOFF bit are atomic. There shouldn't be any
>>> unexpected clearing.
>>>
>>>> __mutex_lock_common                   __mutex_lock_common
>>>>   __mutex_trylock                       schedule_preempt_disabled
>>>>     /*steal lock successfully*/         __mutex_set_flag(lock,MUTEX_FLAG_HANDOFF)
>>>>     __mutex_trylock_or_owner
>>>>       if (task==NULL)
>>>>       flags &= ~MUTEX_FLAG_HANDOFF
>>>>       atomic_long_cmpxchg_acquire
>>>>                                         __mutex_trylock  //failed
>>>>                                         mutex_optimistic_spin  //failed
>>>>                                         schedule_preempt_disabled  //sleep without HANDOFF bit
>>>>
>>>> So the HANDOFF bit should be set as late as possible; here we defer it
>>>> until the task is about to be scheduled out.
>>>> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
>>>> ---
>>>>
>>>> Hi maintainers,
>>>>
>>>> I am not very sure whether I missed or misunderstood something;
>>>> please help review. Many thanks!
>>>>
>>>>   kernel/locking/mutex.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>>>> index 013e1b08a1bf..e57d920e96bf 100644
>>>> --- a/kernel/locking/mutex.c
>>>> +++ b/kernel/locking/mutex.c
>>>> @@ -1033,17 +1033,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>>               }
>>>>
>>>>               spin_unlock(&lock->wait_lock);
>>>> +
>>>> +             if (first)
>>>> +                     __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>>               schedule_preempt_disabled();
>>>>
>>>>               /*
>>>>                * ww_mutex needs to always recheck its position since its waiter
>>>>                * list is not FIFO ordered.
>>>>                */
>>>> -             if (ww_ctx || !first) {
>>>> +             if (ww_ctx || !first)
>>>>                       first = __mutex_waiter_is_first(lock, &waiter);
>>>> -                     if (first)
>>>> -                             __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>> -             }
>>>>
>>>>               set_current_state(state);
>>>>               /*
>>>
>>> In general, I don't mind setting the HANDOFF bit later, but
>>> mutex_optimistic_spin() at the end of the loop should only be called
>>> after the HANDOFF bit is set. So the logic isn't quite right yet.
>>
>> Thanks for your reply.
>>
>> Why should mutex_optimistic_spin() be called only after the HANDOFF bit
>> is set? I think the HANDOFF bit is only related to the unlock path, and
>> mutex_optimistic_spin() and __mutex_trylock() don't actually use it. (Or
>> did I miss something?)
>
> The purpose of doing spinning after the HANDOFF bit is set is to 
> eliminate the waiter wakeup latency, if possible. Before the HANDOFF 
> bit is set, the lock can be easily stolen and there is no point in 
> adding pressure to the lock cacheline.
>
>
>>
>> Maybe I did not describe the issue clearly. Let me try again.
>>
>> The HANDOFF bit aims to avoid lock starvation. Lock starvation is
>> possible because mutex_lock() allows lock stealing, where a running (or
>> optimistically spinning) task beats the woken waiter to the acquire. So
>> the maintainers added the HANDOFF bit: once stealing has happened, the
>> top waiter is guaranteed to get the lock on its second wakeup thanks to
>> the HANDOFF bit.
>>
>> However, the fact is that if stealing happens, the HANDOFF bit will be
>> cleared by the thief task. Hence the top waiter might still starve at
>> its second wakeup.
>>
> I think you have some confusion here. If the HANDOFF bit is set before
> the lock is stolen by an optimistic spinner, lock stealing can't happen,
> which is the point of having a HANDOFF bit. Also, the clearing of the
> HANDOFF bit isn't done by the task that steals the lock; it is done by
> the current lock holder (i.e. the task that held the lock when the
> HANDOFF bit was set) in the unlock path. It can be a lock stealer that
> stole the lock before the HANDOFF bit was set. Of course, it will be a
> bug if the current code doesn't actually do that.

Oh, you are right. The current code doesn't actually prevent a lock
stealer from stealing the lock in the special case that the
lock is in the unlocked state when the HANDOFF bit is set. In this case,
it is a free-for-all and whoever gets the lock will also clear the
HANDOFF bit. The comment in __mutex_trylock_or_owner() about "We set the
HANDOFF bit" isn't quite right.

One way to address this issue is to enforce the rule that a non-first
waiter can't steal the lock when the HANDOFF bit is set. That probably
eliminates the need for a separate PICKUP bit.
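
A hypothetical sketch of that rule, as a tweak inside the trylock loop
("first" is assumed plumbing that does not exist today):

	unsigned long task = owner & ~MUTEX_FLAGS;

	if (!task) {
		/* Unlocked, but reserved for the top waiter? */
		if ((flags & MUTEX_FLAG_HANDOFF) && !first)
			break;			/* non-first waiters must not steal */

		flags &= ~MUTEX_FLAG_HANDOFF;	/* the top waiter consumes it */
	}

	old = atomic_long_cmpxchg_acquire(&lock->owner, owner, curr | flags);

Since only the first waiter could then take an unlocked-but-HANDOFF
mutex, the owner field would not need a separate PICKUP marker to tell
a legitimate pickup from a steal.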

Alternatively, we can let this state happen, similar to what your patch
proposes. However, we should clearly document this special race
condition in the code.

Cheers,
Longman




* Re: [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Xu, Yanfei @ 2021-06-30  6:20 UTC
  To: Waiman Long, peterz, mingo, boqun.feng; +Cc: linux-kernel



On 6/29/21 11:18 PM, Waiman Long wrote:
> On 6/29/21 10:40 AM, Waiman Long wrote:
>> On 6/29/21 5:52 AM, Xu, Yanfei wrote:
>>>
>>>
>>> On 6/29/21 1:57 AM, Waiman Long wrote:
>>>> On 6/28/21 11:51 AM, Yanfei Xu wrote:
>>>>> When the mutex unlock path is executed with the WAITERS bit set and
>>>>> without the HANDOFF bit set, it will wake up the first task on the
>>>>> wait_list. If some tasks not on the wait_list are trying to steal the
>>>>> lock at that moment, they will very likely succeed because the task
>>>>> field of lock->owner is NULL while the flags field is non-NULL. The
>>>>> HANDOFF bit is then cleared. But if the HANDOFF bit was just set by the
>>>>> woken task from the wait_list, this clearing is unexcepted.
>>>>
>>>> I think you mean "unexpected". Right? Anyway, all the setting and
>>>
>>> Right. It's my fault.
>>>
>>>> clearing of the HANDOFF bit are atomic. There shouldn't be any
>>>> unexpected clearing.
>>>>
>>>>> __mutex_lock_common                   __mutex_lock_common
>>>>>   __mutex_trylock                       schedule_preempt_disabled
>>>>>     /*steal lock successfully*/         __mutex_set_flag(lock,MUTEX_FLAG_HANDOFF)
>>>>>     __mutex_trylock_or_owner
>>>>>       if (task==NULL)
>>>>>       flags &= ~MUTEX_FLAG_HANDOFF
>>>>>       atomic_long_cmpxchg_acquire
>>>>>                                         __mutex_trylock  //failed
>>>>>                                         mutex_optimistic_spin  //failed
>>>>>                                         schedule_preempt_disabled  //sleep without HANDOFF bit
>>>>>
>>>>> So the HANDOFF bit should be set as late as possible; here we defer it
>>>>> until the task is about to be scheduled out.
>>>>> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
>>>>> ---
>>>>>
>>>>> Hi maintainers,
>>>>>
>>>>> I am not very sure whether I missed or misunderstood something;
>>>>> please help review. Many thanks!
>>>>>
>>>>>   kernel/locking/mutex.c | 8 ++++----
>>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>>>>> index 013e1b08a1bf..e57d920e96bf 100644
>>>>> --- a/kernel/locking/mutex.c
>>>>> +++ b/kernel/locking/mutex.c
>>>>> @@ -1033,17 +1033,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>>>               }
>>>>>
>>>>>               spin_unlock(&lock->wait_lock);
>>>>> +
>>>>> +             if (first)
>>>>> +                     __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>>>               schedule_preempt_disabled();
>>>>>
>>>>>               /*
>>>>>                * ww_mutex needs to always recheck its position since its waiter
>>>>>                * list is not FIFO ordered.
>>>>>                */
>>>>> -             if (ww_ctx || !first) {
>>>>> +             if (ww_ctx || !first)
>>>>>                       first = __mutex_waiter_is_first(lock, &waiter);
>>>>> -                     if (first)
>>>>> -                             __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>>> -             }
>>>>>
>>>>>               set_current_state(state);
>>>>>               /*
>>>>
>>>> In general, I don't mind setting the HANDOFF bit later, but
>>>> mutex_optimistic_spin() at the end of the loop should only be called
>>>> after the HANDOFF bit is set. So the logic isn't quite right yet.
>>>
>>> Thanks for your reply.
>>>
>>> Why should mutex_optimistic_spin() be called only after the HANDOFF bit
>>> is set? I think the HANDOFF bit is only related to the unlock path, and
>>> mutex_optimistic_spin() and __mutex_trylock() don't actually use it. (Or
>>> did I miss something?)
>>
>> The purpose of doing spinning after the HANDOFF bit is set is to
>> eliminate the waiter wakeup latency, if possible. Before the HANDOFF
>> bit is set, the lock can be easily stolen and there is no point in
>> adding pressure to the lock cacheline.
>>
>>
>>>
>>> Maybe I did not describe the issue clearly. Let me try again.
>>>
>>> The HANDOFF bit aims to avoid lock starvation. Lock starvation is
>>> possible because mutex_lock() allows lock stealing, where a running (or
>>> optimistically spinning) task beats the woken waiter to the acquire. So
>>> the maintainers added the HANDOFF bit: once stealing has happened, the
>>> top waiter is guaranteed to get the lock on its second wakeup thanks to
>>> the HANDOFF bit.
>>>
>>> However, the fact is that if stealing happens, the HANDOFF bit will be
>>> cleared by the thief task. Hence the top waiter might still starve at
>>> its second wakeup.
>>>
>> I think you have some confusion here. If the HANDOFF bit is set before
>> the lock is stolen by an optimistic spinner, lock stealing can't happen,
>> which is the point of having a HANDOFF bit. Also, the clearing of the
>> HANDOFF bit isn't done by the task that steals the lock; it is done by
>> the current lock holder (i.e. the task that held the lock when the
>> HANDOFF bit was set) in the unlock path. It can be a lock stealer that
>> stole the lock before the HANDOFF bit was set. Of course, it will be a
>> bug if the current code doesn't actually do that.
> 
> Oh, you are right. The current code doesn't actually prevent a lock
> stealer from stealing the lock in the special case that the
> lock is in the unlocked state when the HANDOFF bit is set. In this case,

How about setting the HANDOFF bit before the top waiter first gives up
the CPU and falls asleep? Then it is guaranteed to get the lock after
being woken up, and there is no chance for the lock to be stolen. I have
sent a v2 doing this.

> it is a free-for-all and whoever gets the lock will also clear the
> HANDOFF bit. The comment in __mutex_trylock_or_owner() about "We set the
> HANDOFF bit" isn't quite right.
> 
> One way to address this issue is to enforce the rule that a non-first
> waiter can't steal the lock when the HANDOFF bit is set. That probably
> eliminates the need for a separate PICKUP bit.
> 
> Alternatively, we can let this state happen, similar to what your patch
> proposes. However, we should clearly document this special race
> condition in the code.

Yes, the documentation there is obsolete after commit e274795ea7b7
("locking/mutex: Fix mutex handoff").

Thanks,
Yanfei

> 
> Cheers,
> Longman
> 
> 


* Re: [PATCH] locking/mutex: fix the MUTEX_FLAG_HANDOFF bit is cleared unexpected
From: Xu, Yanfei @ 2021-06-30 10:06 UTC
  To: Waiman Long, peterz, mingo, boqun.feng; +Cc: linux-kernel



On 6/30/21 2:20 PM, Xu, Yanfei wrote:
>> Oh, you are right. The current code doesn't actually prevent a lock
>> stealer from stealing the lock in the special case that the
>> lock is in the unlocked state when the HANDOFF bit is set. In this case,
> 
> How about setting the HANDOFF bit before the top waiter first gives up
> the CPU and falls asleep? Then it is guaranteed to get the lock after
> being woken up, and there is no chance for the lock to be stolen. I have
> sent a v2 doing this.

Please ignore this; it is incorrect. Lock stealing is a performance
optimization, and setting the bit that early would defeat it.

Yanfei

> 
>> it is a free-for-all and whoever gets the lock will also clear the
>> HANDOFF bit. The comment in __mutex_trylock_or_owner() about "We set the
>> HANDOFF bit" isn't quite right.

