linux-kernel.vger.kernel.org archive mirror
* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
       [not found] <4fafad133b074f279dbab1aa3642e23f@xiaomi.com>
@ 2021-11-07  3:25 ` Waiman Long
  2021-11-07  3:28   ` Waiman Long
       [not found] ` <20211107090131.1535-1-hdanton@sina.com>
  1 sibling, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-07  3:25 UTC (permalink / raw)
  To: 马振华, peterz, mingo, will, boqun.feng, linux-kernel

On 11/6/21 08:39, 马振华 wrote:
> Dear longman,
>
> recently, I found an issue where the rwsem count becomes a negative
> value; it always happens when a task tries to get the lock
> with __down_write_killable and is then killed
>
> the issue happened like this
>
>             CPU2         CPU4
>     task A[reader]     task B[writer]
>     down_read_killable[locked]
>     sem->count=0x100
>             down_write_killable
>             sem->count=0x102[wlist not empty]
>     up_read
>     count=0x2
>             sig kill received
>     down_read_killable
>     sem->count=0x102[wlist not empty]
>             goto branch out_nolock:
> list_del(&waiter.list);
> wait list is empty
> sem->count-RWSEM_FLAG_HANDOFF
> sem->count=0xFE
>     list_empty(&sem->wait_list) is TRUE
>      sem->count andnot RWSEM_FLAG_WAITERS
>       sem->count=0xFC
>     up_read
>     sem->count -= 0x100
>     sem->count=0xFFFFFFFFFFFFFFFC
>     DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
>
> so sem->count will be negative after the writer is killed
> I think that if the RWSEM_FLAG_HANDOFF flag is not set, we shouldn't clear it

Thanks for reporting this possible race condition.

However, I am still trying to figure out how it is possible to set
wstate to WRITER_HANDOFF without actually setting the handoff bit as
well. The statement sequence should be as follows:

wstate = WRITER_HANDOFF;
raw_spin_lock_irq(&sem->wait_lock);
if (rwsem_try_write_lock(sem, wstate))
raw_spin_unlock_irq(&sem->wait_lock);
   :
if (signal_pending_state(state, current))
     goto out_nolock

The rwsem_try_write_lock() function will make sure that we either 
acquire the lock and clear handoff or set the handoff bit. This should 
be done before we actually check for signal. I do think that it is 
probably safer to use atomic_long_andnot to clear the handoff bit 
instead of using atomic_long_add().
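
To make the arithmetic concrete, here is a minimal sketch using the bit
values implied by the trace above (RWSEM_FLAG_WAITERS == 0x2,
RWSEM_FLAG_HANDOFF == 0x4, one reader == 0x100):

	long count = 0x102;	/* one reader held + waiters, handoff NOT set */

	count += -0x4;		/* atomic_long_add(-RWSEM_FLAG_HANDOFF):
				 * count becomes 0xFE, corrupting the reader field */

	count = 0x102;
	count &= ~0x4UL;	/* atomic_long_andnot(RWSEM_FLAG_HANDOFF):
				 * count stays 0x102, clearing an unset bit is a no-op */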

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-07  3:25 ` [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set Waiman Long
@ 2021-11-07  3:28   ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-07  3:28 UTC (permalink / raw)
  To: 马振华, peterz, mingo, will, boqun.feng, linux-kernel


On 11/6/21 23:25, Waiman Long wrote:
>
>> so sem->count will be negative after writer is killed
>> i think if flag RWSEM_FLAG_HANDOFF is not set, we shouldn't clean it
>
> Thanks for reporting this possible race condition.
>
> However, I am still trying to figure how it is possible to set the 
> wstate to WRITER_HANDOFF without actually setting the handoff bit as 
> well. The statement sequence should be as follows:
>
> wstate = WRITER_HANDOFF;
> raw_spin_lock_irq(&sem->wait_lock);
> if (rwsem_try_write_lock(sem, wstate))
> raw_spin_unlock_irq(&sem->wait_lock);
>   :
> if (signal_pending_state(state, current))
>     goto out_nolock
>
> The rwsem_try_write_lock() function will make sure that we either 
> acquire the lock and clear handoff or set the handoff bit. This should 
> be done before we actually check for signal. I do think that it is 
> probably safer to use atomic_long_andnot to clear the handoff bit 
> instead of using atomic_long_add().
BTW, do you have a reproducer that can trigger this race condition?

Thanks,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
       [not found] ` <20211107090131.1535-1-hdanton@sina.com>
@ 2021-11-07 15:24   ` Waiman Long
  2021-11-07 19:52     ` Waiman Long
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-07 15:24 UTC (permalink / raw)
  To: Hillf Danton
  Cc: 马振华, peterz, mingo, will, boqun.feng, linux-kernel

On 11/7/21 04:01, Hillf Danton wrote:
> On Sat, 6 Nov 2021 23:25:38 -0400 Waiman Long wrote:
>> On 11/6/21 08:39, 马振华 wrote:
>>> Dear longman,
>>>
>>> recently , i find a issue which rwsem count is negative value, it
>>> happened always when a task try to get the lock
>>> with __down_write_killable , then it is killed
>>>
>>> this issue happened like this
>>>
>>>              CPU2         CPU4
>>>      task A[reader]     task B[writer]
>>>      down_read_killable[locked]
>>>      sem->count=0x100
>>>              down_write_killable
>>>              sem->count=0x102[wlist not empty]
>>>      up_read
>>>      count=0x2
>>>              sig kill received
>>>      down_read_killable
>>>      sem->count=0x102[wlist not empty]
>>>              goto branch out_nolock:
>>> list_del(&waiter.list);
>>> wait list is empty
>>> sem->count-RWSEM_FLAG_HANDOFF
>>> sem->count=0xFE
>>>      list_empty(&sem->wait_list) is TRUE
>>>       sem->count andnot RWSEM_FLAG_WAITERS
>>>        sem->count=0xFC
>>>      up_read
>>>      sem->count -= 0x100
>>>      sem->count=0xFFFFFFFFFFFFFFFC
>>>      DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
>>>
>>> so sem->count will be negative after writer is killed
>>> i think if flag RWSEM_FLAG_HANDOFF is not set, we shouldn't clean it
>> Thanks for reporting this possible race condition.
>>
>> However, I am still trying to figure how it is possible to set the
>> wstate to WRITER_HANDOFF without actually setting the handoff bit as
>> well. The statement sequence should be as follows:
>>
>> wstate = WRITER_HANDOFF;
>> raw_spin_lock_irq(&sem->wait_lock);
>> if (rwsem_try_write_lock(sem, wstate))
>> raw_spin_unlock_irq(&sem->wait_lock);
>>    :
>> if (signal_pending_state(state, current))
>>      goto out_nolock
>>
>> The rwsem_try_write_lock() function will make sure that we either
>> acquire the lock and clear handoff or set the handoff bit. This should
>> be done before we actually check for signal. I do think that it is
> Given that WRITER_HANDOFF does not guarantee that RWSEM_FLAG_HANDOFF is set in
> rwsem_try_write_lock(), the flag should be cleared only by the setter to
> avoid count underflow.
>
> Hillf
>
>> probably safer to use atomic_long_andnot to clear the handoff bit
>> instead of using atomic_long_add().

I did have a tentative patch to address this issue which is somewhat 
similar to your approach. However, I would like to further investigate 
the exact mechanics of the race condition to make sure that I won't miss 
a latent bug somewhere else in the rwsem code.

-Longman

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c51387a43265..20be967620c0 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -347,7 +347,8 @@ enum rwsem_wake_type {
  enum writer_wait_state {
      WRITER_NOT_FIRST,    /* Writer is not first in wait list */
      WRITER_FIRST,        /* Writer is first in wait list     */
-    WRITER_HANDOFF        /* Writer is first & handoff needed */
+    WRITER_NEED_HANDOFF,    /* Writer is first & handoff needed */
+    WRITER_HANDOFF        /* Writer is first & handoff set */
  };

  /*
@@ -532,11 +533,11 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
   * race conditions between checking the rwsem wait list and setting the
   * sem->count accordingly.
   *
- * If wstate is WRITER_HANDOFF, it will make sure that either the handoff
+ * If wstate is WRITER_NEED_HANDOFF, it will make sure that either the handoff
   * bit is set or the lock is acquired with handoff bit cleared.
   */
  static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-                    enum writer_wait_state wstate)
+                    enum writer_wait_state *wstate)
  {
      long count, new;

@@ -546,13 +547,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
      do {
          bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);

-        if (has_handoff && wstate == WRITER_NOT_FIRST)
+        if (has_handoff && *wstate == WRITER_NOT_FIRST)
              return false;

          new = count;

          if (count & RWSEM_LOCK_MASK) {
-            if (has_handoff || (wstate != WRITER_HANDOFF))
+            if (has_handoff || (*wstate != WRITER_NEED_HANDOFF))
                  return false;

              new |= RWSEM_FLAG_HANDOFF;
@@ -569,8 +570,10 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
       * We have either acquired the lock with handoff bit cleared or
       * set the handoff bit.
       */
-    if (new & RWSEM_FLAG_HANDOFF)
+    if (new & RWSEM_FLAG_HANDOFF) {
+        *wstate = WRITER_HANDOFF;
          return false;
+    }

      rwsem_set_owner(sem);
      return true;
@@ -1083,7 +1086,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
      /* wait until we successfully acquire the lock */
      set_current_state(state);
      for (;;) {
-        if (rwsem_try_write_lock(sem, wstate)) {
+        if (rwsem_try_write_lock(sem, &wstate)) {
              /* rwsem_try_write_lock() implies ACQUIRE on success */
              break;
          }
@@ -1138,7 +1141,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
               */
              if ((wstate == WRITER_FIRST) && (rt_task(current) ||
                  time_after(jiffies, waiter.timeout))) {
-                wstate = WRITER_HANDOFF;
+                wstate = WRITER_NEED_HANDOFF;
                  lockevent_inc(rwsem_wlock_handoff);
                  break;
              }
@@ -1159,7 +1162,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
      list_del(&waiter.list);

      if (unlikely(wstate == WRITER_HANDOFF))
-        atomic_long_add(-RWSEM_FLAG_HANDOFF, &sem->count);
+        atomic_long_andnot(RWSEM_FLAG_HANDOFF, &sem->count);

      if (list_empty(&sem->wait_list))
          atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
-- 



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-07 15:24   ` Waiman Long
@ 2021-11-07 19:52     ` Waiman Long
  2021-11-10 21:38       ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-07 19:52 UTC (permalink / raw)
  To: Hillf Danton
  Cc: 马振华, peterz, mingo, will, boqun.feng, linux-kernel

On 11/7/21 10:24, Waiman Long wrote:
> On 11/7/21 04:01, Hillf Danton wrote:
>> On Sat, 6 Nov 2021 23:25:38 -0400 Waiman Long wrote:
>>> On 11/6/21 08:39, 马振华 wrote:
>>>> Dear longman,
>>>>
>>>> recently , i find a issue which rwsem count is negative value, it
>>>> happened always when a task try to get the lock
>>>> with __down_write_killable , then it is killed
>>>>
>>>> this issue happened like this
>>>>
>>>>              CPU2         CPU4
>>>>      task A[reader]     task B[writer]
>>>>      down_read_killable[locked]
>>>>      sem->count=0x100
>>>>              down_write_killable
>>>>              sem->count=0x102[wlist not empty]
>>>>      up_read
>>>>      count=0x2
>>>>              sig kill received
>>>>      down_read_killable
>>>>      sem->count=0x102[wlist not empty]
>>>>              goto branch out_nolock:
>>>> list_del(&waiter.list);
>>>> wait list is empty
>>>> sem->count-RWSEM_FLAG_HANDOFF
>>>> sem->count=0xFE
>>>>      list_empty(&sem->wait_list) is TRUE
>>>>       sem->count andnot RWSEM_FLAG_WAITERS
>>>>        sem->count=0xFC
>>>>      up_read
>>>>      sem->count -= 0x100
>>>>      sem->count=0xFFFFFFFFFFFFFFFC
>>>>      DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
>>>>
>>>> so sem->count will be negative after writer is killed
>>>> i think if flag RWSEM_FLAG_HANDOFF is not set, we shouldn't clean it
>>> Thanks for reporting this possible race condition.
>>>
>>> However, I am still trying to figure how it is possible to set the
>>> wstate to WRITER_HANDOFF without actually setting the handoff bit as
>>> well. The statement sequence should be as follows:
>>>
>>> wstate = WRITER_HANDOFF;
>>> raw_spin_lock_irq(&sem->wait_lock);
>>> if (rwsem_try_write_lock(sem, wstate))
>>> raw_spin_unlock_irq(&sem->wait_lock);
>>>    :
>>> if (signal_pending_state(state, current))
>>>      goto out_nolock
>>>
>>> The rwsem_try_write_lock() function will make sure that we either
>>> acquire the lock and clear handoff or set the handoff bit. This should
>>> be done before we actually check for signal. I do think that it is
>> Given that WRITER_HANDOFF makes no sure that RWSEM_FLAG_HANDOFF is 
>> set in
>> wsem_try_write_lock(), the flag should be cleared only by the setter to
>> avoid count underflow.
>>
>> Hillf
>>
>>> probably safer to use atomic_long_andnot to clear the handoff bit
>>> instead of using atomic_long_add().
>
> I did have a tentative patch to address this issue which is somewhat 
> similar to your approach. However, I would like to further investigate 
> the exact mechanics of the race condition to make sure that I won't 
> miss a latent bug somewhere else in the rwsem code.

I still couldn't figure out how this race condition can happen. However, I
did discover that it is possible to leave the rwsem with no waiter but the
handoff bit set if we kill or interrupt all the waiters in the wait
queue. I have just sent out a patch to address that concern, but it
should be able to handle this race condition as well if it really happens.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-07 19:52     ` Waiman Long
@ 2021-11-10 21:38       ` Peter Zijlstra
  2021-11-11  2:42         ` Maria Yu
  2021-11-11 15:08         ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-10 21:38 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Sun, Nov 07, 2021 at 02:52:36PM -0500, Waiman Long wrote:
> > 
> > I did have a tentative patch to address this issue which is somewhat
> > similar to your approach. However, I would like to further investigate
> > the exact mechanics of the race condition to make sure that I won't miss
> > a latent bug somewhere else in the rwsem code.
> 
> I still couldn't figure how this race condition can happen. However, I do
> discover that it is possible to leave rwsem with no waiter but handoff bit
> set if we kill or interrupt all the waiters in the wait queue. I have just
> sent out a patch to address that concern, but it should be able to handle
> this race condition as well if it really happens.

The comment above RWSEM_WRITER_LOCKED seems wrong/out-dated in that
there's a 4th place that modifies the HANDOFF bit namely
rwsem_down_read_slowpath() in the out_nolock: case.
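
For reference, that reader-side out_nolock path (as it appears in the
patch context further down this thread) is:

out_nolock:
	list_del(&waiter.list);
	if (list_empty(&sem->wait_list)) {
		atomic_long_andnot(RWSEM_FLAG_WAITERS|RWSEM_FLAG_HANDOFF,
				   &sem->count);
	}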

Now the thing I'm most worried about is that rwsem_down_write_slowpath()
modifies the HANDOFF bit depending on wstate, and wstate itself is not
determined under the same ->wait_lock section, so there could be a race
there.

Another thing is that once wstate==HANDOFF, we rely on spin_on_owner()
to return OWNER_NULL such that it goes to trylock_again, however if it
returns anything else then we're at signal_pending_state() and the
observed race can happen.

Now, spin_on_owner() *can* in fact return something else, consider
need_resched() being set for instance.

Combined I think the observed race is valid.

Now before we go make things more complicated, I think we should see if
we can make things simpler. Also I think perhaps the HANDOFF name here
is a misnomer.

I agree that using _andnot() will fix this issue; I also agree with
folding it with the existing _andnot() already there. But let me stare a
little more at this code, something isn't making sense...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-10 21:38       ` Peter Zijlstra
@ 2021-11-11  2:42         ` Maria Yu
  2021-11-11 15:08         ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Maria Yu @ 2021-11-11  2:42 UTC (permalink / raw)
  To: peterz
  Cc: boqun.feng, hdanton, linux-kernel, longman, mazhenhua, mingo,
	will, quic_aiquny

From: quic_aiquny@quicinc.com

An important piece of information: when this issue was reported, the first
waiter could be changed. That comes from other hook changes which do not
always add waiters to the tail of the wait list, which is not the same as
the current upstream code.

In the original Xiaomi issue, checking the log shows the following scenario:
a write waiter set the RWSEM_FLAG_HANDOFF bit in sem->count, and then another
reader jumped in and became the first waiter.
So when that reader goes through rwsem_down_read_slowpath() and sees the
RWSEM_FLAG_HANDOFF bit set, it clears the RWSEM_FLAG_HANDOFF bit.

So the writer's local wstate variable is left as WRITER_HANDOFF, but the
reader treated the bit as a reader handoff and cleared it.
The writer, however, only checks its local wstate variable for WRITER_HANDOFF
and then does the add(-RWSEM_FLAG_HANDOFF) action. If it did an
andnot(RWSEM_FLAG_HANDOFF) instead, the writer would not make sem->count
underflow.

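In code terms, the one-line difference under discussion in the writer's
out_nolock path is (sketch):

	/* current: underflows sem->count if the bit was already cleared */
	atomic_long_add(-RWSEM_FLAG_HANDOFF, &sem->count);

	/* proposed: clearing an already-clear bit leaves sem->count unchanged */
	atomic_long_andnot(RWSEM_FLAG_HANDOFF, &sem->count);
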
The current scenario looked like this:
--------------------------------
	wstate = WRITER_HANDOFF;
	raw_spin_lock_irq(&sem->wait_lock);
	if (rwsem_try_write_lock(sem, wstate))
	raw_spin_unlock_irq(&sem->wait_lock);
	   :
	   if (signal_pending_state(state, current))       // meanwhile another reader in rwsem_down_read_slowpath() became the first waiter and cleared RWSEM_FLAG_HANDOFF
	     goto out_nolock
	out_nolock:
	     add(-RWSEM_FLAG_HANDOFF, sem->count)                     // makes the count negative
--------------------------------

So doing an andnot would be much safer, and it works in this scenario:
--------------------------------
	wstate = WRITER_HANDOFF;
	raw_spin_lock_irq(&sem->wait_lock);
	if (rwsem_try_write_lock(sem, wstate))
	raw_spin_unlock_irq(&sem->wait_lock);
	   :
	   if (signal_pending_state(state, current))       // meanwhile another reader in rwsem_down_read_slowpath() became the first waiter and cleared RWSEM_FLAG_HANDOFF
	     goto out_nolock
	out_nolock:
	     andnot(RWSEM_FLAG_HANDOFF, sem->count)                // leaves the count unchanged
--------------------------------


Doing an andnot is also safe in the normal scenario where the writer is not killed:
--------------------------------
	wstate = WRITER_HANDOFF;
	raw_spin_lock_irq(&sem->wait_lock);
	if (rwsem_try_write_lock(sem, wstate))
	raw_spin_unlock_irq(&sem->wait_lock);
	   :
	   if (signal_pending_state(state, current))       // meanwhile another reader in rwsem_down_read_slowpath() became the first waiter and cleared RWSEM_FLAG_HANDOFF
		if (wstate == WRITER_HANDOFF)
			trylock_again:
			raw_spin_lock_irq(&sem->wait_lock);
			if (rwsem_try_write_lock(sem, wstate))		// the writer will set the RWSEM_FLAG_HANDOFF flag again
	raw_spin_unlock_irq(&sem->wait_lock);
--------------------------------



Thx and BRs,
Aiqun Yu (Maria)



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-10 21:38       ` Peter Zijlstra
  2021-11-11  2:42         ` Maria Yu
@ 2021-11-11 15:08         ` Peter Zijlstra
  2021-11-11 19:14           ` Waiman Long
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 15:08 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Wed, Nov 10, 2021 at 10:38:55PM +0100, Peter Zijlstra wrote:
> On Sun, Nov 07, 2021 at 02:52:36PM -0500, Waiman Long wrote:
> > > 
> > > I did have a tentative patch to address this issue which is somewhat
> > > similar to your approach. However, I would like to further investigate
> > > the exact mechanics of the race condition to make sure that I won't miss
> > > a latent bug somewhere else in the rwsem code.
> > 
> > I still couldn't figure how this race condition can happen. However, I do
> > discover that it is possible to leave rwsem with no waiter but handoff bit
> > set if we kill or interrupt all the waiters in the wait queue. I have just
> > sent out a patch to address that concern, but it should be able to handle
> > this race condition as well if it really happens.
> 
> The comment above RWSEM_WRITER_LOCKED seems wrong/out-dated in that
> there's a 4th place that modifies the HANDOFF bit namely
> rwsem_down_read_slowpath() in the out_nolock: case.
> 
> Now the thing I'm most worried about is that rwsem_down_write_slowpath()
> modifies the HANDOFF bit depending on wstate, and wstate itself it not
> determined under the same ->wait_lock section, so there could be a race
> there.
> 
> Another thing is that once wstate==HANDOFF, we rely on spin_on_owner()
> to return OWNER_NULL such that it goes to trylock_again, however if it
> returns anything else then we're at signal_pending_state() and the
> observed race can happen.
> 
> Now, spin_on_owner() *can* in fact return something else, consider
> need_resched() being set for instance.
> 
> Combined I think the observed race is valid.
> 
> Now before we go make things more complicated, I think we should see if
> we can make things simpler. Also I think perhaps the HANDOFF name here
> is a misnomer.
> 
> I agree that using _andnot() will fix this issue; I also agree with
> folding it with the existing _andnot() already there. But let me stare a
> little more at this code, something isn't making sense...

I think I want to see WRITER_HANDOFF go away. And preferably all of
wstate.

Something like the *completely* untested below, might set fire to your
pet, eat your granny, etc..

Also, perhaps s/HANDOFF/PHASE_CHANGE/ ?

Waiman, did I overlook something fundamental here?

---
 kernel/locking/rwsem.c | 85 +++++++++++++++++++-------------------------------
 1 file changed, 32 insertions(+), 53 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c51387a43265..bc5da05346e2 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -104,14 +104,17 @@
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
  *
- * There are three places where the lock handoff bit may be set or cleared.
- * 1) rwsem_mark_wake() for readers.
- * 2) rwsem_try_write_lock() for writers.
- * 3) Error path of rwsem_down_write_slowpath().
+ * There are four places where the lock handoff bit may be set or cleared.
+ * 1) rwsem_mark_wake() for readers		-- set, clear
+ * 2) rwsem_try_write_lock() for writers	-- set, clear
+ * 3) Error path of rwsem_down_write_slowpath() -- clear
+ * 4) Error path of rwsem_down_read_slowpath()  -- clear
  *
  * For all the above cases, wait_lock will be held. A writer must also
  * be the first one in the wait_list to be eligible for setting the handoff
  * bit. So concurrent setting/clearing of handoff bit is not possible.
+ *
+ * XXX handoff is a misnomer here, all it does is force a phase change
  */
 #define RWSEM_WRITER_LOCKED	(1UL << 0)
 #define RWSEM_FLAG_WAITERS	(1UL << 1)
@@ -344,12 +347,6 @@ enum rwsem_wake_type {
 	RWSEM_WAKE_READ_OWNED	/* Waker thread holds the read lock */
 };
 
-enum writer_wait_state {
-	WRITER_NOT_FIRST,	/* Writer is not first in wait list */
-	WRITER_FIRST,		/* Writer is first in wait list     */
-	WRITER_HANDOFF		/* Writer is first & handoff needed */
-};
-
 /*
  * The typical HZ value is either 250 or 1000. So set the minimum waiting
  * time to at least 4ms or 1 jiffy (if it is higher than 4ms) in the wait
@@ -531,13 +528,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
  * This function must be called with the sem->wait_lock held to prevent
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
- *
- * If wstate is WRITER_HANDOFF, it will make sure that either the handoff
- * bit is set or the lock is acquired with handoff bit cleared.
  */
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-					enum writer_wait_state wstate)
+					struct rwsem_waiter *waiter)
 {
+	bool first = rwsem_first_waiter(sem) == waiter;
+	bool timo = time_after(jiffies, waiter->timeout);
 	long count, new;
 
 	lockdep_assert_held(&sem->wait_lock);
@@ -546,13 +542,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	do {
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
-		if (has_handoff && wstate == WRITER_NOT_FIRST)
+		if (has_handoff && !first)
 			return false;
 
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
-			if (has_handoff || (wstate != WRITER_HANDOFF))
+			if (has_handoff || !(first && timo))
 				return false;
 
 			new |= RWSEM_FLAG_HANDOFF;
@@ -707,6 +703,8 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 			break;
 		}
 
+		// XXX also terminate on signal_pending_state(current->__state, current) ?
+
 		cpu_relax();
 	}
 
@@ -1019,11 +1017,10 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 static struct rw_semaphore *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
-	long count;
-	enum writer_wait_state wstate;
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
+	long count, flags = 0;
 
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem)) {
@@ -1041,13 +1038,10 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 
 	raw_spin_lock_irq(&sem->wait_lock);
 
-	/* account for this before adding a new element to the list */
-	wstate = list_empty(&sem->wait_list) ? WRITER_FIRST : WRITER_NOT_FIRST;
-
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock */
-	if (wstate == WRITER_NOT_FIRST) {
+	if (rwsem_first_waiter(sem) != &waiter) {
 		count = atomic_long_read(&sem->count);
 
 		/*
@@ -1083,22 +1077,14 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	for (;;) {
-		if (rwsem_try_write_lock(sem, wstate)) {
+		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
 		}
 
 		raw_spin_unlock_irq(&sem->wait_lock);
 
-		/*
-		 * After setting the handoff bit and failing to acquire
-		 * the lock, attempt to spin on owner to accelerate lock
-		 * transfer. If the previous owner is a on-cpu writer and it
-		 * has just released the lock, OWNER_NULL will be returned.
-		 * In this case, we attempt to acquire the lock again
-		 * without sleeping.
-		 */
-		if (wstate == WRITER_HANDOFF) {
+		if (rwsem_first_waiter(sem) == &waiter) {
 			enum owner_state owner_state;
 
 			preempt_disable();
@@ -1117,28 +1103,15 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			schedule();
 			lockevent_inc(rwsem_sleep_writer);
 			set_current_state(state);
-			/*
-			 * If HANDOFF bit is set, unconditionally do
-			 * a trylock.
-			 */
-			if (wstate == WRITER_HANDOFF)
-				break;
 
-			if ((wstate == WRITER_NOT_FIRST) &&
-			    (rwsem_first_waiter(sem) == &waiter))
-				wstate = WRITER_FIRST;
+			if (rwsem_first_waiter(sem) != &waiter)
+				continue;
 
 			count = atomic_long_read(&sem->count);
 			if (!(count & RWSEM_LOCK_MASK))
 				break;
 
-			/*
-			 * The setting of the handoff bit is deferred
-			 * until rwsem_try_write_lock() is called.
-			 */
-			if ((wstate == WRITER_FIRST) && (rt_task(current) ||
-			    time_after(jiffies, waiter.timeout))) {
-				wstate = WRITER_HANDOFF;
+			if (rt_task(current) || time_after(jiffies, waiter.timeout)) {
 				lockevent_inc(rwsem_wlock_handoff);
 				break;
 			}
@@ -1156,15 +1129,21 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 out_nolock:
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
-	list_del(&waiter.list);
 
-	if (unlikely(wstate == WRITER_HANDOFF))
-		atomic_long_add(-RWSEM_FLAG_HANDOFF,  &sem->count);
+	if (rwsem_first_waiter(sem) == &waiter)
+		flags |= RWSEM_FLAG_HANDOFF;
+
+	list_del(&waiter.list);
 
 	if (list_empty(&sem->wait_list))
-		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
-	else
+		flags |= RWSEM_FLAG_WAITERS | RWSEM_FLAG_HANDOFF;
+
+	if (flags)
+		atomic_long_andnot(flags, &sem->count);
+
+	if (!list_empty(&sem->wait_list))
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
 	lockevent_inc(rwsem_wlock_fail);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 15:08         ` Peter Zijlstra
@ 2021-11-11 19:14           ` Waiman Long
  2021-11-11 19:20             ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-11 19:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]


On 11/11/21 10:08, Peter Zijlstra wrote:
> On Wed, Nov 10, 2021 at 10:38:55PM +0100, Peter Zijlstra wrote:
>>
>> The comment above RWSEM_WRITER_LOCKED seems wrong/out-dated in that
>> there's a 4th place that modifies the HANDOFF bit namely
>> rwsem_down_read_slowpath() in the out_nolock: case.
>>
>> Now the thing I'm most worried about is that rwsem_down_write_slowpath()
>> modifies the HANDOFF bit depending on wstate, and wstate itself it not
>> determined under the same ->wait_lock section, so there could be a race
>> there.
>>
>> Another thing is that once wstate==HANDOFF, we rely on spin_on_owner()
>> to return OWNER_NULL such that it goes to trylock_again, however if it
>> returns anything else then we're at signal_pending_state() and the
>> observed race can happen.
>>
>> Now, spin_on_owner() *can* in fact return something else, consider
>> need_resched() being set for instance.
>>
>> Combined I think the observed race is valid.
>>
>> Now before we go make things more complicated, I think we should see if
>> we can make things simpler. Also I think perhaps the HANDOFF name here
>> is a misnomer.
>>
>> I agree that using _andnot() will fix this issue; I also agree with
>> folding it with the existing _andnot() already there. But let me stare a
>> little more at this code, something isn't making sense...
> I think I want to see WRITER_HANDOFF go away. And preferably all of
> wstate.
>
> Something like the *completely* untested below, might set fire to your
> pet, eat your granny, etc..
>
> Also, perhaps s/HANDOFF/PHASE_CHANGE/ ?
>
> Waiman, did I overlook something fundamental here?

The handoff bit is also set when the current writer is an RT task. You
missed that in your patch. The attached patch is my version of your
change. What do you think about that?

As for the PHASE_CHANGE name, we have to be consistent in both rwsem and 
mutex. Maybe a follow up patch if you think we should change the 
terminology.

Cheers,
Longman

[-- Attachment #2: 0001-locking-rwsem-Make-handoff-bit-handling-more-consist.patch --]
[-- Type: text/x-patch, Size: 10168 bytes --]

From 1c76a9c1b9d16d0ceb07f643803035177b4042a5 Mon Sep 17 00:00:00 2001
From: Waiman Long <longman@redhat.com>
Date: Thu, 11 Nov 2021 13:49:35 -0500
Subject: [PATCH] locking/rwsem: Make handoff bit handling more consistent

There are some inconsistencies in the way that the handoff bit is being
handled in readers and writers.

Firstly, when a queue head writer sets the handoff bit, it will clear it
when the writer is killed or interrupted on its way out without
acquiring the lock. That is not the case for a queue head reader. The
handoff bit will simply be inherited by the next waiter.

Secondly, in the out_nolock path of rwsem_down_read_slowpath(), both
the waiter and handoff bits are cleared if the wait queue becomes empty.
For rwsem_down_write_slowpath(), however, the handoff bit is not checked
and cleared if the wait queue is empty. This can potentially leave the
handoff bit set with an empty wait queue.

To make the handoff bit handling more consistent and robust, extract
the rwsem flags handling code into a common rwsem_out_nolock_clear_flags()
function and call it from both the reader's and writer's out_nolock paths.
The common function only uses atomic_long_andnot() to clear bits,
to avoid a possible race condition.

This eliminates the case of the handoff bit being set with an empty wait
queue, as well as the possible race condition that may screw up the count
value.

More state is stored in the rwsem_waiter structure, and the setting of the
writer handoff bit is pushed into rwsem_try_write_lock(). This simplifies the
trylock loop in rwsem_down_write_slowpath().

Fixes: 4f23dbc1e657 ("locking/rwsem: Implement lock handoff to prevent lock starvation")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 109 ++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 60 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c51387a43265..b5fe21d5916d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -104,10 +104,11 @@
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
  *
- * There are three places where the lock handoff bit may be set or cleared.
- * 1) rwsem_mark_wake() for readers.
- * 2) rwsem_try_write_lock() for writers.
- * 3) Error path of rwsem_down_write_slowpath().
+ * There are four places where the lock handoff bit may be set or cleared.
+ * 1) rwsem_mark_wake() for readers            -- set, clear
+ * 2) rwsem_try_write_lock() for writers       -- set, clear
+ * 3) Error path of rwsem_down_write_slowpath() -- clear
+ * 4) Error path of rwsem_down_read_slowpath()  -- clear
  *
  * For all the above cases, wait_lock will be held. A writer must also
  * be the first one in the wait_list to be eligible for setting the handoff
@@ -334,6 +335,7 @@ struct rwsem_waiter {
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
 	unsigned long timeout;
+	bool handoff_set, rt_task;
 };
 #define rwsem_first_waiter(sem) \
 	list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
@@ -344,12 +346,6 @@ enum rwsem_wake_type {
 	RWSEM_WAKE_READ_OWNED	/* Waker thread holds the read lock */
 };
 
-enum writer_wait_state {
-	WRITER_NOT_FIRST,	/* Writer is not first in wait list */
-	WRITER_FIRST,		/* Writer is first in wait list     */
-	WRITER_HANDOFF		/* Writer is first & handoff needed */
-};
-
 /*
  * The typical HZ value is either 250 or 1000. So set the minimum waiting
  * time to at least 4ms or 1 jiffy (if it is higher than 4ms) in the wait
@@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
 			    time_after(jiffies, waiter->timeout)) {
 				adjustment -= RWSEM_FLAG_HANDOFF;
+				waiter->handoff_set = true;
 				lockevent_inc(rwsem_rlock_handoff);
 			}
 
@@ -531,14 +528,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
  * This function must be called with the sem->wait_lock held to prevent
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
- *
- * If wstate is WRITER_HANDOFF, it will make sure that either the handoff
- * bit is set or the lock is acquired with handoff bit cleared.
  */
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-					enum writer_wait_state wstate)
+					struct rwsem_waiter *waiter)
 {
 	long count, new;
+	bool first = rwsem_first_waiter(sem) == waiter;
 
 	lockdep_assert_held(&sem->wait_lock);
 
@@ -546,13 +541,14 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	do {
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
-		if (has_handoff && wstate == WRITER_NOT_FIRST)
+		if (has_handoff && !first)
 			return false;
 
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
-			if (has_handoff || (wstate != WRITER_HANDOFF))
+			if (has_handoff || (!waiter->rt_task &&
+					    !time_after(jiffies, waiter->timeout)))
 				return false;
 
 			new |= RWSEM_FLAG_HANDOFF;
@@ -569,8 +565,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	 * We have either acquired the lock with handoff bit cleared or
 	 * set the handoff bit.
 	 */
-	if (new & RWSEM_FLAG_HANDOFF)
+	if (new & RWSEM_FLAG_HANDOFF) {
+		waiter->handoff_set = true;
+		lockevent_inc(rwsem_wlock_handoff);
 		return false;
+	}
 
 	rwsem_set_owner(sem);
 	return true;
@@ -889,6 +888,24 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 }
 #endif
 
+/*
+ * Common code to handle rwsem flags in out_nolock path with wait_lock held.
+ */
+static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
+						struct rwsem_waiter *waiter)
+{
+	long flags = 0;
+
+	list_del(&waiter->list);
+	if (list_empty(&sem->wait_list))
+		flags = RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS;
+	else if (waiter->handoff_set)
+		flags = RWSEM_FLAG_HANDOFF;
+
+	if (flags)
+		atomic_long_andnot(flags,  &sem->count);
+}
+
 /*
  * Wait for the read lock to be granted
  */
@@ -936,6 +953,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+	waiter.handoff_set = false;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
@@ -1002,11 +1020,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	return sem;
 
 out_nolock:
-	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list)) {
-		atomic_long_andnot(RWSEM_FLAG_WAITERS|RWSEM_FLAG_HANDOFF,
-				   &sem->count);
-	}
+	rwsem_out_nolock_clear_flags(sem, &waiter);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_rlock_fail);
@@ -1020,7 +1034,6 @@ static struct rw_semaphore *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	long count;
-	enum writer_wait_state wstate;
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
@@ -1038,16 +1051,13 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+	waiter.rt_task = rt_task(current);
 
 	raw_spin_lock_irq(&sem->wait_lock);
-
-	/* account for this before adding a new element to the list */
-	wstate = list_empty(&sem->wait_list) ? WRITER_FIRST : WRITER_NOT_FIRST;
-
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock */
-	if (wstate == WRITER_NOT_FIRST) {
+	if (rwsem_first_waiter(sem) != &waiter) {
 		count = atomic_long_read(&sem->count);
 
 		/*
@@ -1083,7 +1093,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	for (;;) {
-		if (rwsem_try_write_lock(sem, wstate)) {
+		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
 		}
@@ -1098,9 +1108,12 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * In this case, we attempt to acquire the lock again
 		 * without sleeping.
 		 */
-		if (wstate == WRITER_HANDOFF) {
+		if (waiter.handoff_set) {
 			enum owner_state owner_state;
 
+			if (signal_pending_state(state, current))
+				goto out_nolock;
+
 			preempt_disable();
 			owner_state = rwsem_spin_on_owner(sem);
 			preempt_enable();
@@ -1117,31 +1130,14 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 			schedule();
 			lockevent_inc(rwsem_sleep_writer);
 			set_current_state(state);
-			/*
-			 * If HANDOFF bit is set, unconditionally do
-			 * a trylock.
-			 */
-			if (wstate == WRITER_HANDOFF)
-				break;
-
-			if ((wstate == WRITER_NOT_FIRST) &&
-			    (rwsem_first_waiter(sem) == &waiter))
-				wstate = WRITER_FIRST;
-
-			count = atomic_long_read(&sem->count);
-			if (!(count & RWSEM_LOCK_MASK))
-				break;
 
 			/*
-			 * The setting of the handoff bit is deferred
-			 * until rwsem_try_write_lock() is called.
+			 * Unconditionally do a trylock and spinning if
+			 * HANDOFF bit is set.
 			 */
-			if ((wstate == WRITER_FIRST) && (rt_task(current) ||
-			    time_after(jiffies, waiter.timeout))) {
-				wstate = WRITER_HANDOFF;
-				lockevent_inc(rwsem_wlock_handoff);
+			if (waiter.handoff_set ||
+			   !(atomic_long_read(&sem->count) & RWSEM_LOCK_MASK))
 				break;
-			}
 		}
 trylock_again:
 		raw_spin_lock_irq(&sem->wait_lock);
@@ -1156,19 +1152,12 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 out_nolock:
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
-	list_del(&waiter.list);
-
-	if (unlikely(wstate == WRITER_HANDOFF))
-		atomic_long_add(-RWSEM_FLAG_HANDOFF,  &sem->count);
-
-	if (list_empty(&sem->wait_list))
-		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
-	else
+	rwsem_out_nolock_clear_flags(sem, &waiter);
+	if (!list_empty(&sem->wait_list))
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
 	lockevent_inc(rwsem_wlock_fail);
-
 	return ERR_PTR(-EINTR);
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 19:14           ` Waiman Long
@ 2021-11-11 19:20             ` Peter Zijlstra
  2021-11-11 19:36               ` Waiman Long
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 19:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
> As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
> mutex. Maybe a follow up patch if you think we should change the
> terminology.

Well, that's exactly the point, they do radically different things.
Having the same name for two different things is confusing.

Anyway, let me go read that patch you sent.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 19:20             ` Peter Zijlstra
@ 2021-11-11 19:36               ` Waiman Long
  2021-11-11 19:52                 ` Waiman Long
                                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 19:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 896 bytes --]


On 11/11/21 14:20, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
>> As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
>> mutex. Maybe a follow up patch if you think we should change the
>> terminology.
> Well, that's exactly the point, they do radically different things.
> Having the same name for two different things is confusing.
>
> Anyway, let me go read that patch you sent.

My understanding of handoff is to disable optimistic spinning to let
waiters in the wait queue have an opportunity to acquire the lock. There
are differences in the details of how to do that in mutex and rwsem, though.

BTW, I have decided that we should further simplify the trylock for loop
in rwsem_down_write_slowpath() to make it easier to read. That is the
only difference in the attached v2 patch compared with the previous one.

Cheers,
LongmaN

[-- Attachment #2: v2-0001-locking-rwsem-Make-handoff-bit-handling-more-cons.patch --]
[-- Type: text/x-patch, Size: 10361 bytes --]

From 956fb4bf54e61735c0df1fedf97e59a1972efbca Mon Sep 17 00:00:00 2001
From: Waiman Long <longman@redhat.com>
Date: Thu, 11 Nov 2021 14:27:35 -0500
Subject: [PATCH v2] locking/rwsem: Make handoff bit handling more consistent

There are some inconsistencies in the way that the handoff bit is being
handled in readers and writers.

Firstly, when a queue head writer sets the handoff bit, it will clear it
when the writer is killed or interrupted on its way out without
acquiring the lock. That is not the case for a queue head reader. The
handoff bit will simply be inherited by the next waiter.

Secondly, in the out_nolock path of rwsem_down_read_slowpath(), both
the waiter and handoff bits are cleared if the wait queue becomes empty.
For rwsem_down_write_slowpath(), however, the handoff bit is not checked
and cleared if the wait queue is empty. This can potentially leave the
handoff bit set with an empty wait queue.

To make the handoff bit handling more consistent and robust, extract
the rwsem flags handling code into a common rwsem_out_nolock_clear_flags()
function and call it from both the reader's and writer's out_nolock paths.
The common function only uses atomic_long_andnot() to clear bits,
to avoid a possible race condition.

This eliminates the case of the handoff bit being set with an empty wait
queue, as well as the possible race condition that may screw up the count
value.

More state is stored in the rwsem_waiter structure, and the setting of the
writer handoff bit is pushed into rwsem_try_write_lock(). This simplifies the
trylock for-loop in rwsem_down_write_slowpath().

Fixes: 4f23dbc1e657 ("locking/rwsem: Implement lock handoff to prevent lock starvation")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 121 ++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 73 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c51387a43265..a876a3acc383 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -104,10 +104,11 @@
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
  *
- * There are three places where the lock handoff bit may be set or cleared.
- * 1) rwsem_mark_wake() for readers.
- * 2) rwsem_try_write_lock() for writers.
- * 3) Error path of rwsem_down_write_slowpath().
+ * There are four places where the lock handoff bit may be set or cleared.
+ * 1) rwsem_mark_wake() for readers            -- set, clear
+ * 2) rwsem_try_write_lock() for writers       -- set, clear
+ * 3) Error path of rwsem_down_write_slowpath() -- clear
+ * 4) Error path of rwsem_down_read_slowpath()  -- clear
  *
  * For all the above cases, wait_lock will be held. A writer must also
  * be the first one in the wait_list to be eligible for setting the handoff
@@ -334,6 +335,7 @@ struct rwsem_waiter {
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
 	unsigned long timeout;
+	bool handoff_set, rt_task;
 };
 #define rwsem_first_waiter(sem) \
 	list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
@@ -344,12 +346,6 @@ enum rwsem_wake_type {
 	RWSEM_WAKE_READ_OWNED	/* Waker thread holds the read lock */
 };
 
-enum writer_wait_state {
-	WRITER_NOT_FIRST,	/* Writer is not first in wait list */
-	WRITER_FIRST,		/* Writer is first in wait list     */
-	WRITER_HANDOFF		/* Writer is first & handoff needed */
-};
-
 /*
  * The typical HZ value is either 250 or 1000. So set the minimum waiting
  * time to at least 4ms or 1 jiffy (if it is higher than 4ms) in the wait
@@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
 			    time_after(jiffies, waiter->timeout)) {
 				adjustment -= RWSEM_FLAG_HANDOFF;
+				waiter->handoff_set = true;
 				lockevent_inc(rwsem_rlock_handoff);
 			}
 
@@ -531,14 +528,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
  * This function must be called with the sem->wait_lock held to prevent
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
- *
- * If wstate is WRITER_HANDOFF, it will make sure that either the handoff
- * bit is set or the lock is acquired with handoff bit cleared.
  */
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
-					enum writer_wait_state wstate)
+					struct rwsem_waiter *waiter)
 {
 	long count, new;
+	bool first = rwsem_first_waiter(sem) == waiter;
 
 	lockdep_assert_held(&sem->wait_lock);
 
@@ -546,13 +541,14 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	do {
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
-		if (has_handoff && wstate == WRITER_NOT_FIRST)
+		if (has_handoff && !first)
 			return false;
 
 		new = count;
 
 		if (count & RWSEM_LOCK_MASK) {
-			if (has_handoff || (wstate != WRITER_HANDOFF))
+			if (has_handoff || (!waiter->rt_task &&
+					    !time_after(jiffies, waiter->timeout)))
 				return false;
 
 			new |= RWSEM_FLAG_HANDOFF;
@@ -569,8 +565,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 	 * We have either acquired the lock with handoff bit cleared or
 	 * set the handoff bit.
 	 */
-	if (new & RWSEM_FLAG_HANDOFF)
+	if (new & RWSEM_FLAG_HANDOFF) {
+		waiter->handoff_set = true;
+		lockevent_inc(rwsem_wlock_handoff);
 		return false;
+	}
 
 	rwsem_set_owner(sem);
 	return true;
@@ -889,6 +888,24 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 }
 #endif
 
+/*
+ * Common code to handle rwsem flags in out_nolock path with wait_lock held.
+ */
+static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
+						struct rwsem_waiter *waiter)
+{
+	long flags = 0;
+
+	list_del(&waiter->list);
+	if (list_empty(&sem->wait_list))
+		flags = RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS;
+	else if (waiter->handoff_set)
+		flags = RWSEM_FLAG_HANDOFF;
+
+	if (flags)
+		atomic_long_andnot(flags,  &sem->count);
+}
+
 /*
  * Wait for the read lock to be granted
  */
@@ -936,6 +953,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+	waiter.handoff_set = false;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
@@ -1002,11 +1020,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	return sem;
 
 out_nolock:
-	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list)) {
-		atomic_long_andnot(RWSEM_FLAG_WAITERS|RWSEM_FLAG_HANDOFF,
-				   &sem->count);
-	}
+	rwsem_out_nolock_clear_flags(sem, &waiter);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_rlock_fail);
@@ -1020,7 +1034,6 @@ static struct rw_semaphore *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	long count;
-	enum writer_wait_state wstate;
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
@@ -1038,16 +1051,13 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+	waiter.rt_task = rt_task(current);
 
 	raw_spin_lock_irq(&sem->wait_lock);
-
-	/* account for this before adding a new element to the list */
-	wstate = list_empty(&sem->wait_list) ? WRITER_FIRST : WRITER_NOT_FIRST;
-
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock */
-	if (wstate == WRITER_NOT_FIRST) {
+	if (rwsem_first_waiter(sem) != &waiter) {
 		count = atomic_long_read(&sem->count);
 
 		/*
@@ -1083,13 +1093,16 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	for (;;) {
-		if (rwsem_try_write_lock(sem, wstate)) {
+		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
 			break;
 		}
 
 		raw_spin_unlock_irq(&sem->wait_lock);
 
+		if (signal_pending_state(state, current))
+			goto out_nolock;
+
 		/*
 		 * After setting the handoff bit and failing to acquire
 		 * the lock, attempt to spin on owner to accelerate lock
@@ -1098,7 +1111,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * In this case, we attempt to acquire the lock again
 		 * without sleeping.
 		 */
-		if (wstate == WRITER_HANDOFF) {
+		if (waiter.handoff_set) {
 			enum owner_state owner_state;
 
 			preempt_disable();
@@ -1109,40 +1122,9 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 				goto trylock_again;
 		}
 
-		/* Block until there are no active lockers. */
-		for (;;) {
-			if (signal_pending_state(state, current))
-				goto out_nolock;
-
-			schedule();
-			lockevent_inc(rwsem_sleep_writer);
-			set_current_state(state);
-			/*
-			 * If HANDOFF bit is set, unconditionally do
-			 * a trylock.
-			 */
-			if (wstate == WRITER_HANDOFF)
-				break;
-
-			if ((wstate == WRITER_NOT_FIRST) &&
-			    (rwsem_first_waiter(sem) == &waiter))
-				wstate = WRITER_FIRST;
-
-			count = atomic_long_read(&sem->count);
-			if (!(count & RWSEM_LOCK_MASK))
-				break;
-
-			/*
-			 * The setting of the handoff bit is deferred
-			 * until rwsem_try_write_lock() is called.
-			 */
-			if ((wstate == WRITER_FIRST) && (rt_task(current) ||
-			    time_after(jiffies, waiter.timeout))) {
-				wstate = WRITER_HANDOFF;
-				lockevent_inc(rwsem_wlock_handoff);
-				break;
-			}
-		}
+		schedule();
+		lockevent_inc(rwsem_sleep_writer);
+		set_current_state(state);
 trylock_again:
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -1156,19 +1138,12 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 out_nolock:
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
-	list_del(&waiter.list);
-
-	if (unlikely(wstate == WRITER_HANDOFF))
-		atomic_long_add(-RWSEM_FLAG_HANDOFF,  &sem->count);
-
-	if (list_empty(&sem->wait_list))
-		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
-	else
+	rwsem_out_nolock_clear_flags(sem, &waiter);
+	if (!list_empty(&sem->wait_list))
 		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
 	lockevent_inc(rwsem_wlock_fail);
-
 	return ERR_PTR(-EINTR);
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 19:36               ` Waiman Long
@ 2021-11-11 19:52                 ` Waiman Long
  2021-11-11 20:26                 ` Peter Zijlstra
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 19:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 14:36, Waiman Long wrote:
>
> On 11/11/21 14:20, Peter Zijlstra wrote:
>> On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
>>> As for the PHASE_CHANGE name, we have to be consistent in both rwsem 
>>> and
>>> mutex. Maybe a follow up patch if you think we should change the
>>> terminology.
>> Well, that's exactly the point, they do radically different things.
>> Having the same name for two different things is confusing.
>>
>> Anyway, let me go read that patch you sent.
>
> My understanding of handoff is to disable optimistic spinning to let 
> waiters in the wait queue have an opportunity to acquire the lock. 
> There are difference in details on how to do that in mutex and rwsem, 
> though.
>
> BTW, I have decided that we should further simply the trylock for loop 
> in rwsem_down_write_slowpath() to make it easier to read. That is the 
> only difference in the attached v2 patch compared with the previous one.

My bad, I forgot to initialize waiter.handoff_set in 
rwsem_down_write_slowpath(). I will send out an updated version once you 
have finished your review.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 19:36               ` Waiman Long
  2021-11-11 19:52                 ` Waiman Long
@ 2021-11-11 20:26                 ` Peter Zijlstra
  2021-11-11 21:01                   ` Waiman Long
  2021-11-11 20:35                 ` Peter Zijlstra
  2021-11-11 20:50                 ` Peter Zijlstra
  3 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 20:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:

> @@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
>  			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>  			    time_after(jiffies, waiter->timeout)) {
>  				adjustment -= RWSEM_FLAG_HANDOFF;
> +				waiter->handoff_set = true;
>  				lockevent_inc(rwsem_rlock_handoff);
>  			}
>  

Do we really need this flag? Wouldn't it be the same as waiter-is-first
AND sem-has-handoff ?

>  static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> +					struct rwsem_waiter *waiter)
>  {
>  	long count, new;
> +	bool first = rwsem_first_waiter(sem) == waiter;

flip those lines for reverse xmas tree order, please

>  
>  	lockdep_assert_held(&sem->wait_lock);
>  
> @@ -546,13 +541,14 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>  	do {
>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>  
> -		if (has_handoff && wstate == WRITER_NOT_FIRST)
> +		if (has_handoff && !first)
>  			return false;
>  
>  		new = count;
>  
>  		if (count & RWSEM_LOCK_MASK) {
> -			if (has_handoff || (wstate != WRITER_HANDOFF))
> +			if (has_handoff || (!waiter->rt_task &&
> +					    !time_after(jiffies, waiter->timeout)))


Does ->rt_task really help over rt_task(current) ? I suppose there's an
argument for locality, but that should be pretty much it, no?

>  				return false;
>  
>  			new |= RWSEM_FLAG_HANDOFF;
> @@ -889,6 +888,24 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
>  }
>  #endif
>  
> +/*
> + * Common code to handle rwsem flags in out_nolock path with wait_lock held.
> + */
> +static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
> +						struct rwsem_waiter *waiter)
> +{
> +	long flags = 0;
> +
> +	list_del(&waiter->list);
> +	if (list_empty(&sem->wait_list))
> +		flags = RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS;
> +	else if (waiter->handoff_set)
> +		flags = RWSEM_FLAG_HANDOFF;
> +
> +	if (flags)
> +		atomic_long_andnot(flags,  &sem->count);
> +}

Right, so I like sharing this between the two _slowpath functions, that
makes sense.

The difference between this and my approach is that I unconditionally
clear HANDOFF when @waiter was the first. Because if it was set, it
must've been ours, and if it wasn't set, clearing it doesn't really hurt
much. This is an unlikely path; I don't think the potential extra
atomic is an issue here.
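
A rough sketch of that variant, reusing the helper name from the quoted
patch (illustration only, not the posted code; assumes wait_lock is held
as it is in the out_nolock path):

static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
						struct rwsem_waiter *waiter)
{
	bool first = rwsem_first_waiter(sem) == waiter;

	list_del(&waiter->list);
	if (list_empty(&sem->wait_list)) {
		atomic_long_andnot(RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS,
				   &sem->count);
	} else if (first) {
		/* If HANDOFF is set it can only have been ours; clearing an
		 * already-clear bit in this unlikely path is harmless. */
		atomic_long_andnot(RWSEM_FLAG_HANDOFF, &sem->count);
	}
}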

> +
>  /*
>   * Wait for the read lock to be granted
>   */
> @@ -936,6 +953,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>  	waiter.task = current;
>  	waiter.type = RWSEM_WAITING_FOR_READ;
>  	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
> +	waiter.handoff_set = false;

Forgot to set rt_task

>  
>  	raw_spin_lock_irq(&sem->wait_lock);
>  	if (list_empty(&sem->wait_list)) {

> @@ -1038,16 +1051,13 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>  	waiter.task = current;
>  	waiter.type = RWSEM_WAITING_FOR_WRITE;
>  	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;

Forgot to set handoff_set

> +	waiter.rt_task = rt_task(current);
>  
>  	raw_spin_lock_irq(&sem->wait_lock);

Again, I'm not convinced we need these variables.

> @@ -1083,13 +1093,16 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>  	/* wait until we successfully acquire the lock */
>  	set_current_state(state);
>  	for (;;) {
> -		if (rwsem_try_write_lock(sem, wstate)) {
> +		if (rwsem_try_write_lock(sem, &waiter)) {
>  			/* rwsem_try_write_lock() implies ACQUIRE on success */
>  			break;
>  		}
>  
>  		raw_spin_unlock_irq(&sem->wait_lock);
>  
> +		if (signal_pending_state(state, current))
> +			goto out_nolock;
> +
>  		/*
>  		 * After setting the handoff bit and failing to acquire
>  		 * the lock, attempt to spin on owner to accelerate lock
> @@ -1098,7 +1111,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>  		 * In this case, we attempt to acquire the lock again
>  		 * without sleeping.
>  		 */
> -		if (wstate == WRITER_HANDOFF) {
> +		if (waiter.handoff_set) {
>  			enum owner_state owner_state;
>  
>  			preempt_disable();

Does it matter much if we spin-wait for every first waiter or only for handoff?

Either way around, I think the spin-wait ought to terminate on sigpending
(same for mutex, I suppose).
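
As a rough illustration of where such a check could sit in the writer
slow path quoted above (sketch only; making the spin loop itself bail on
sigpending would need a change inside rwsem_spin_on_owner(), which is
exactly the follow-up being discussed here):

		if (waiter.handoff_set) {
			enum owner_state owner_state;

			preempt_disable();
			owner_state = rwsem_spin_on_owner(sem);
			preempt_enable();

			/* stop the opportunistic spin once a signal is pending */
			if (signal_pending_state(state, current))
				goto out_nolock;

			if (owner_state == OWNER_NULL)
				goto trylock_again;
		}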

> @@ -1109,40 +1122,9 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>  				goto trylock_again;
>  		}
>  
> -		/* Block until there are no active lockers. */
> -		for (;;) {
> -			if (signal_pending_state(state, current))
> -				goto out_nolock;
> -
> -			schedule();
> -			lockevent_inc(rwsem_sleep_writer);
> -			set_current_state(state);
> -			/*
> -			 * If HANDOFF bit is set, unconditionally do
> -			 * a trylock.
> -			 */
> -			if (wstate == WRITER_HANDOFF)
> -				break;
> -
> -			if ((wstate == WRITER_NOT_FIRST) &&
> -			    (rwsem_first_waiter(sem) == &waiter))
> -				wstate = WRITER_FIRST;
> -
> -			count = atomic_long_read(&sem->count);
> -			if (!(count & RWSEM_LOCK_MASK))
> -				break;
> -
> -			/*
> -			 * The setting of the handoff bit is deferred
> -			 * until rwsem_try_write_lock() is called.
> -			 */
> -			if ((wstate == WRITER_FIRST) && (rt_task(current) ||
> -			    time_after(jiffies, waiter.timeout))) {
> -				wstate = WRITER_HANDOFF;
> -				lockevent_inc(rwsem_wlock_handoff);
> -				break;
> -			}
> -		}
> +		schedule();
> +		lockevent_inc(rwsem_sleep_writer);
> +		set_current_state(state);
>  trylock_again:
>  		raw_spin_lock_irq(&sem->wait_lock);
>  	}

Nice.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 19:36               ` Waiman Long
  2021-11-11 19:52                 ` Waiman Long
  2021-11-11 20:26                 ` Peter Zijlstra
@ 2021-11-11 20:35                 ` Peter Zijlstra
  2021-11-11 20:39                   ` Peter Zijlstra
  2021-11-11 20:45                   ` Waiman Long
  2021-11-11 20:50                 ` Peter Zijlstra
  3 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 20:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
> 
> On 11/11/21 14:20, Peter Zijlstra wrote:
> > On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
> > > As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
> > > mutex. Maybe a follow up patch if you think we should change the
> > > terminology.
> > Well, that's exactly the point, they do radically different things.
> > Having the same name for two different things is confusing.
> > 
> > Anyway, let me go read that patch you sent.
> 
> My understanding of handoff is to disable optimistic spinning to let waiters
> in the wait queue have an opportunity to acquire the lock. There are
> difference in details on how to do that in mutex and rwsem, though.

Ah, but the mutex does an actual hand-off: it hands the lock to a
specific waiting task. That is, unlock() sets owner, as opposed to
trylock().

The rwsem code doesn't; it just forces a phase change. Once a waiter has
been blocked too long, the handoff bit is set, causing new readers to be
blocked. Then we wait for existing readers to complete. At that point,
any next waiter (most likely a writer) should really get the lock (and
in that regard the rwsem code is a bit funny).

So while both ensure fairness, the means of doing so is quite different.
One hands the lock ownership to a specific waiter, the other arranges
for a quiescent state such that the next waiter can proceed.
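
The "new readers are blocked" part falls out of the read fast path's
failure mask; roughly (simplified sketch with a hypothetical wrapper
name, not the exact kernel code):

static inline void down_read_sketch(struct rw_semaphore *sem)
{
	long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS,
						  &sem->count);

	/* RWSEM_READ_FAILED_MASK includes RWSEM_FLAG_HANDOFF, so once the
	 * bit is set every new reader drops into the slow path and queues
	 * behind the existing waiters. */
	if (cnt & RWSEM_READ_FAILED_MASK)
		rwsem_down_read_slowpath(sem, cnt, TASK_UNINTERRUPTIBLE);
}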


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 20:35                 ` Peter Zijlstra
@ 2021-11-11 20:39                   ` Peter Zijlstra
  2021-11-11 20:45                   ` Waiman Long
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 20:39 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 09:35:00PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
> > 
> > On 11/11/21 14:20, Peter Zijlstra wrote:
> > > On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
> > > > As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
> > > > mutex. Maybe a follow up patch if you think we should change the
> > > > terminology.
> > > Well, that's exactly the point, they do radically different things.
> > > Having the same name for two different things is confusing.
> > > 
> > > Anyway, let me go read that patch you sent.
> > 
> > My understanding of handoff is to disable optimistic spinning to let waiters
> > in the wait queue have an opportunity to acquire the lock. There are
> > difference in details on how to do that in mutex and rwsem, though.
> 
> Ah, but the mutex does an actual hand-off, it hands the lock to a
> specific waiting task. That is, unlock() sets owner, as opposed to
> trylock().
> 
> The rwsem code doesn't, it just forces a phase change. Once a waiter has
> been blocked too long, the handoff bit is set, causing new readers to be
> blocked. Then we wait for existing readers to complete. At that point,
> any next waiter (most likely a writer) should really get the lock (and
> in that regards the rwsem code is a bit funny).

And this is, I think, the thing you tried in your earlier inherit patch.
Keep the quiescent state and simply let whatever next waiter is in line
have a go.

I suspect that change is easier now. But I've not tried.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 20:35                 ` Peter Zijlstra
  2021-11-11 20:39                   ` Peter Zijlstra
@ 2021-11-11 20:45                   ` Waiman Long
  2021-11-11 21:27                     ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-11 20:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 15:35, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>> On 11/11/21 14:20, Peter Zijlstra wrote:
>>> On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
>>>> As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
>>>> mutex. Maybe a follow up patch if you think we should change the
>>>> terminology.
>>> Well, that's exactly the point, they do radically different things.
>>> Having the same name for two different things is confusing.
>>>
>>> Anyway, let me go read that patch you sent.
>> My understanding of handoff is to disable optimistic spinning to let waiters
>> in the wait queue have an opportunity to acquire the lock. There are
>> difference in details on how to do that in mutex and rwsem, though.
> Ah, but the mutex does an actual hand-off, it hands the lock to a
> specific waiting task. That is, unlock() sets owner, as opposed to
> trylock().
>
> The rwsem code doesn't, it just forces a phase change. Once a waiter has
> been blocked too long, the handoff bit is set, causing new readers to be
> blocked. Then we wait for existing readers to complete. At that point,
> any next waiter (most likely a writer) should really get the lock (and
> in that regards the rwsem code is a bit funny).
>
> So while both ensure fairness, the means of doing so is quite different.
> One hands the lock ownership to a specific waiter, the other arranges
> for a quiescent state such that the next waiter can proceed.

That is a valid argument. However, the name PHASE_CHANGE sounds weird to 
me. I am not objecting to changing the term, but perhaps a better name 
like NO_OPTSPIN or NO_LOCKSTEALING would emphasize the fact that 
optimistic spinning or lock stealing should not be allowed.

Anyway, it will be a follow-up patch.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 19:36               ` Waiman Long
                                   ` (2 preceding siblings ...)
  2021-11-11 20:35                 ` Peter Zijlstra
@ 2021-11-11 20:50                 ` Peter Zijlstra
  2021-11-11 21:09                   ` Waiman Long
  3 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 20:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


So I suspect that if..

On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>  static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> -					enum writer_wait_state wstate)
> +					struct rwsem_waiter *waiter)
>  {
>  	long count, new;
> +	bool first = rwsem_first_waiter(sem) == waiter;
>  
>  	lockdep_assert_held(&sem->wait_lock);
>  
> @@ -546,13 +541,14 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>  	do {
>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>  
> -		if (has_handoff && wstate == WRITER_NOT_FIRST)
> +		if (has_handoff && !first)
>  			return false;
>  
>  		new = count;
>  
>  		if (count & RWSEM_LOCK_MASK) {
> -			if (has_handoff || (wstate != WRITER_HANDOFF))
> +			if (has_handoff || (!waiter->rt_task &&
> +					    !time_after(jiffies, waiter->timeout)))
>  				return false;

we delete this whole condition, and..

>  
>  			new |= RWSEM_FLAG_HANDOFF;

> @@ -889,6 +888,24 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
>  }
>  #endif
>  
> +/*
> + * Common code to handle rwsem flags in out_nolock path with wait_lock held.
> + */
> +static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
> +						struct rwsem_waiter *waiter)
> +{
> +	long flags = 0;
> +
> +	list_del(&waiter->list);
> +	if (list_empty(&sem->wait_list))
> +		flags = RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS;
> +	else if (waiter->handoff_set)
> +		flags = RWSEM_FLAG_HANDOFF;

take out this else,

> +
> +	if (flags)
> +		atomic_long_andnot(flags,  &sem->count);
> +}

We get the inherit thing for free, no?

Once HANDOFF is set, new readers are blocked. And then any first waiter
is allowed to acquire the lock; who cares whether it was the one
responsible for having set the HANDOFF bit.
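
In other words, a rough sketch of that suggestion (not the posted patch;
helper name taken from the quoted hunk) would only drop HANDOFF together
with WAITERS once the queue drains:

static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
						struct rwsem_waiter *waiter)
{
	list_del(&waiter->list);
	/* Leave HANDOFF set while anybody is still queued; the next first
	 * waiter simply inherits the quiescent phase. */
	if (list_empty(&sem->wait_list))
		atomic_long_andnot(RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS,
				   &sem->count);
}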

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 20:26                 ` Peter Zijlstra
@ 2021-11-11 21:01                   ` Waiman Long
  2021-11-11 21:25                     ` Waiman Long
  2021-11-11 21:38                     ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 21:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 15:26, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>
>> @@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
>>   			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>>   			    time_after(jiffies, waiter->timeout)) {
>>   				adjustment -= RWSEM_FLAG_HANDOFF;
>> +				waiter->handoff_set = true;
>>   				lockevent_inc(rwsem_rlock_handoff);
>>   			}
>>   
> Do we really need this flag? Wouldn't it be the same as waiter-is-first
> AND sem-has-handoff ?
That is true. The only downside is that we have to read the count first 
in rwsem_out_nolock_clear_flags(). Since this is not a fast path, it 
should be OK to do that.
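
A minimal sketch of that equivalent test (illustration only; assumes
wait_lock is held so the wait list is stable):

	bool handoff_set = (rwsem_first_waiter(sem) == waiter) &&
			   (atomic_long_read(&sem->count) & RWSEM_FLAG_HANDOFF);
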
>>   static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>> +					struct rwsem_waiter *waiter)
>>   {
>>   	long count, new;
>> +	bool first = rwsem_first_waiter(sem) == waiter;
> flip those lines for reverse xmas please
Sure, will do.
>
>>   
>>   	lockdep_assert_held(&sem->wait_lock);
>>   
>> @@ -546,13 +541,14 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>>   	do {
>>   		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>   
>> -		if (has_handoff && wstate == WRITER_NOT_FIRST)
>> +		if (has_handoff && !first)
>>   			return false;
>>   
>>   		new = count;
>>   
>>   		if (count & RWSEM_LOCK_MASK) {
>> -			if (has_handoff || (wstate != WRITER_HANDOFF))
>> +			if (has_handoff || (!waiter->rt_task &&
>> +					    !time_after(jiffies, waiter->timeout)))
>
> Does ->rt_task really help over rt_task(current) ? I suppose there's an
> argument for locality, but that should be pretty much it, no?
Waiting for the timeout may introduce too much latency for an RT task. That 
is the only reason I am doing it. I can take it out if you think it is 
not necessary.
>
>>   				return false;
>>   
>>   			new |= RWSEM_FLAG_HANDOFF;
>> @@ -889,6 +888,24 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
>>   }
>>   #endif
>>   
>> +/*
>> + * Common code to handle rwsem flags in out_nolock path with wait_lock held.
>> + */
>> +static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
>> +						struct rwsem_waiter *waiter)
>> +{
>> +	long flags = 0;
>> +
>> +	list_del(&waiter->list);
>> +	if (list_empty(&sem->wait_list))
>> +		flags = RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS;
>> +	else if (waiter->handoff_set)
>> +		flags = RWSEM_FLAG_HANDOFF;
>> +
>> +	if (flags)
>> +		atomic_long_andnot(flags,  &sem->count);
>> +}
> Right, so I like sharing this between the two _slowpath functions, that
> makes sense.
>
> The difference between this and my approach is that I unconditionally
> clear HANDOFF when @waiter was the first. Because if it was set, it
> must've been ours, and if it wasn't set, clearing it doesn't really hurt
> much. This is an unlikely path, I don't think the potentially extra
> atomic is an issue here.
That is true; we shouldn't worry too much about performance in this 
unlikely path. Will make the change.
>
>> +
>>   /*
>>    * Wait for the read lock to be granted
>>    */
>> @@ -936,6 +953,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>>   	waiter.task = current;
>>   	waiter.type = RWSEM_WAITING_FOR_READ;
>>   	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
>> +	waiter.handoff_set = false;
> Forgot to set rt_task

We don't use rt_task for readers; it is writer-only. I will document that.

>
>>   
>>   	raw_spin_lock_irq(&sem->wait_lock);
>>   	if (list_empty(&sem->wait_list)) {
>> @@ -1038,16 +1051,13 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>   	waiter.task = current;
>>   	waiter.type = RWSEM_WAITING_FOR_WRITE;
>>   	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
> Forget to set handoff_set
Yes, I was aware of that.
>
>> +	waiter.rt_task = rt_task(current);
>>   
>>   	raw_spin_lock_irq(&sem->wait_lock);
> Again, I'm not convinced we need these variables.
I will take out handoff_set as suggested. I can also take out 
rt_task if you don't think we need to test it.
>
>> @@ -1083,13 +1093,16 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>   	/* wait until we successfully acquire the lock */
>>   	set_current_state(state);
>>   	for (;;) {
>> -		if (rwsem_try_write_lock(sem, wstate)) {
>> +		if (rwsem_try_write_lock(sem, &waiter)) {
>>   			/* rwsem_try_write_lock() implies ACQUIRE on success */
>>   			break;
>>   		}
>>   
>>   		raw_spin_unlock_irq(&sem->wait_lock);
>>   
>> +		if (signal_pending_state(state, current))
>> +			goto out_nolock;
>> +
>>   		/*
>>   		 * After setting the handoff bit and failing to acquire
>>   		 * the lock, attempt to spin on owner to accelerate lock
>> @@ -1098,7 +1111,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>   		 * In this case, we attempt to acquire the lock again
>>   		 * without sleeping.
>>   		 */
>> -		if (wstate == WRITER_HANDOFF) {
>> +		if (waiter.handoff_set) {
>>   			enum owner_state owner_state;
>>   
>>   			preempt_disable();
> Does it matter much if we spin-wait for every first or only for handoff?
Only for handoff, as no other task will be spinning for the lock.
>
> Either way around, I think spin-wait ought to terminate on sigpending
> (same for mutex I suppose).

I am thinking about that too. Time for another followup patch, I think.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 20:50                 ` Peter Zijlstra
@ 2021-11-11 21:09                   ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 21:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On 11/11/21 15:50, Peter Zijlstra wrote:
> So I suspect that if..
>
> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>>   static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>> -					enum writer_wait_state wstate)
>> +					struct rwsem_waiter *waiter)
>>   {
>>   	long count, new;
>> +	bool first = rwsem_first_waiter(sem) == waiter;
>>   
>>   	lockdep_assert_held(&sem->wait_lock);
>>   
>> @@ -546,13 +541,14 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
>>   	do {
>>   		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>   
>> -		if (has_handoff && wstate == WRITER_NOT_FIRST)
>> +		if (has_handoff && !first)
>>   			return false;
>>   
>>   		new = count;
>>   
>>   		if (count & RWSEM_LOCK_MASK) {
>> -			if (has_handoff || (wstate != WRITER_HANDOFF))
>> +			if (has_handoff || (!waiter->rt_task &&
>> +					    !time_after(jiffies, waiter->timeout)))
>>   				return false;
> we delete this whole condition, and..
I don't think we can take out this if test.
>
>>   
>>   			new |= RWSEM_FLAG_HANDOFF;
>> @@ -889,6 +888,24 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
>>   }
>>   #endif
>>   
>> +/*
>> + * Common code to handle rwsem flags in out_nolock path with wait_lock held.
>> + */
>> +static inline void rwsem_out_nolock_clear_flags(struct rw_semaphore *sem,
>> +						struct rwsem_waiter *waiter)
>> +{
>> +	long flags = 0;
>> +
>> +	list_del(&waiter->list);
>> +	if (list_empty(&sem->wait_list))
>> +		flags = RWSEM_FLAG_HANDOFF | RWSEM_FLAG_WAITERS;
>> +	else if (waiter->handoff_set)
>> +		flags = RWSEM_FLAG_HANDOFF;
> take out this else,
>
>> +
>> +	if (flags)
>> +		atomic_long_andnot(flags,  &sem->count);
>> +}
> We get the inherit thing for free, no?
>
> Once HANDOFF is set, new readers are blocked. And then allow any first
> waiter to acquire the lock, who cares if it was the one responsible for
> having set the HANDOFF bit.

Yes, we can have the policy of inheriting the HANDOFF bit as long as it 
is consistent, which will be the case here with a common out_nolock 
function. I can go with that. I just have to document this fact in the 
comment.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:01                   ` Waiman Long
@ 2021-11-11 21:25                     ` Waiman Long
  2021-11-11 21:53                       ` Peter Zijlstra
  2021-11-11 21:38                     ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-11 21:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 16:01, Waiman Long wrote:
>
> On 11/11/21 15:26, Peter Zijlstra wrote:
>> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>>
>>> @@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct rw_semaphore 
>>> *sem,
>>>               if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>>>                   time_after(jiffies, waiter->timeout)) {
>>>                   adjustment -= RWSEM_FLAG_HANDOFF;
>>> +                waiter->handoff_set = true;
>>>                   lockevent_inc(rwsem_rlock_handoff);
>>>               }
>> Do we really need this flag? Wouldn't it be the same as waiter-is-first
>> AND sem-has-handoff ?
> That is true. The only downside is that we have to read the count 
> first in rwsem_out_nolock_clear_flags(). Since this is not a fast 
> path, it should be OK to do that.

I just realized that I may still need this flag for the writer to determine 
if it should spin after failing to acquire the lock. Otherwise I will have to 
do an extra read of the count value in the loop. I don't need to use it for 
the writer now.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 20:45                   ` Waiman Long
@ 2021-11-11 21:27                     ` Peter Zijlstra
  2021-11-11 21:54                       ` Waiman Long
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 21:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 03:45:30PM -0500, Waiman Long wrote:
> 
> On 11/11/21 15:35, Peter Zijlstra wrote:
> > On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
> > > On 11/11/21 14:20, Peter Zijlstra wrote:
> > > > On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
> > > > > As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
> > > > > mutex. Maybe a follow up patch if you think we should change the
> > > > > terminology.
> > > > Well, that's exactly the point, they do radically different things.
> > > > Having the same name for two different things is confusing.
> > > > 
> > > > Anyway, let me go read that patch you sent.
> > > My understanding of handoff is to disable optimistic spinning to let waiters
> > > in the wait queue have an opportunity to acquire the lock. There are
> > > difference in details on how to do that in mutex and rwsem, though.
> > Ah, but the mutex does an actual hand-off, it hands the lock to a
> > specific waiting task. That is, unlock() sets owner, as opposed to
> > trylock().
> > 
> > The rwsem code doesn't, it just forces a phase change. Once a waiter has
> > been blocked too long, the handoff bit is set, causing new readers to be
> > blocked. Then we wait for existing readers to complete. At that point,
> > any next waiter (most likely a writer) should really get the lock (and
> > in that regards the rwsem code is a bit funny).
> > 
> > So while both ensure fairness, the means of doing so is quite different.
> > One hands the lock ownership to a specific waiter, the other arranges
> > for a quiescent state such that the next waiter can proceed.
> 
> That is a valid argument. However, the name PHASE_CHANGE sounds weird to me.
> I am not objecting to changing the term, but probably with a better name
> NO_OPTSPIN, NO_LOCKSTEALING or something like that to emphasize that fact
> that optimistic spinning or lock stealing should not be allowed.

RWSEM_FLAG_QUIESCE ?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:01                   ` Waiman Long
  2021-11-11 21:25                     ` Waiman Long
@ 2021-11-11 21:38                     ` Peter Zijlstra
  2021-11-11 21:46                       ` Waiman Long
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 21:38 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 04:01:16PM -0500, Waiman Long wrote:

> > > +			if (has_handoff || (!waiter->rt_task &&
> > > +					    !time_after(jiffies, waiter->timeout)))
> > 
> > Does ->rt_task really help over rt_task(current) ? I suppose there's an
> > argument for locality, but that should be pretty much it, no?
> Waiting for the timeout may introduce too much latency for RT task. That is
> the only reason I am doing it. I can take it out if you think it is not
> necessary.

I meant simply calling rt_task(waiter->task) here, instead of mucking about
with the extra variable.
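
I.e. the quoted test would simply read (sketch):

			if (has_handoff || (!rt_task(waiter->task) &&
					    !time_after(jiffies, waiter->timeout)))
				return false;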

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:38                     ` Peter Zijlstra
@ 2021-11-11 21:46                       ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 21:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 16:38, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 04:01:16PM -0500, Waiman Long wrote:
>
>>>> +			if (has_handoff || (!waiter->rt_task &&
>>>> +					    !time_after(jiffies, waiter->timeout)))
>>> Does ->rt_task really help over rt_task(current) ? I suppose there's an
>>> argument for locality, but that should be pretty much it, no?
>> Waiting for the timeout may introduce too much latency for RT task. That is
>> the only reason I am doing it. I can take it out if you think it is not
>> necessary.
> I meant simply calling rt_task(waiter->task) here, instead of mucking about
> with the extra variable.

OK, that makes sense.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:25                     ` Waiman Long
@ 2021-11-11 21:53                       ` Peter Zijlstra
  2021-11-11 21:55                         ` Waiman Long
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-11-11 21:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel

On Thu, Nov 11, 2021 at 04:25:56PM -0500, Waiman Long wrote:
> 
> On 11/11/21 16:01, Waiman Long wrote:
> > 
> > On 11/11/21 15:26, Peter Zijlstra wrote:
> > > On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
> > > 
> > > > @@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct
> > > > rw_semaphore *sem,
> > > >               if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
> > > >                   time_after(jiffies, waiter->timeout)) {
> > > >                   adjustment -= RWSEM_FLAG_HANDOFF;
> > > > +                waiter->handoff_set = true;
> > > >                   lockevent_inc(rwsem_rlock_handoff);
> > > >               }
> > > Do we really need this flag? Wouldn't it be the same as waiter-is-first
> > > AND sem-has-handoff ?
> > That is true. The only downside is that we have to read the count first
> > in rwsem_out_nolock_clear_flags(). Since this is not a fast path, it
> > should be OK to do that.
> 
> I just realize that I may still need this flag for writer to determine if it
> should spin after failing to acquire the lock. Or I will have to do extra
> read of count value in the loop. I don't need to use it for writer now.

Maybe it's too late here, but afaict this is right after failing
try_write_lock(), which will have done at least that load you're
interested in, no?

Simply have try_write_lock() update &count or something.
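
A rough sketch of that shape (the extra argument is hypothetical, purely
illustrative; the real cmpxchg loop is elided):

static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
					struct rwsem_waiter *waiter,
					long *countp)
{
	long count = atomic_long_read(&sem->count);
	bool acquired = false;

	/* ... existing trylock/cmpxchg loop, refreshing 'count' on every
	 * failed atomic_long_try_cmpxchg_acquire() ... */

	*countp = count;	/* last observed value, for the caller */
	return acquired;
}

The caller could then test count & RWSEM_FLAG_HANDOFF after a failed
trylock without another read of sem->count.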

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:27                     ` Peter Zijlstra
@ 2021-11-11 21:54                       ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 21:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 16:27, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 03:45:30PM -0500, Waiman Long wrote:
>> On 11/11/21 15:35, Peter Zijlstra wrote:
>>> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>>>> On 11/11/21 14:20, Peter Zijlstra wrote:
>>>>> On Thu, Nov 11, 2021 at 02:14:48PM -0500, Waiman Long wrote:
>>>>>> As for the PHASE_CHANGE name, we have to be consistent in both rwsem and
>>>>>> mutex. Maybe a follow up patch if you think we should change the
>>>>>> terminology.
>>>>> Well, that's exactly the point, they do radically different things.
>>>>> Having the same name for two different things is confusing.
>>>>>
>>>>> Anyway, let me go read that patch you sent.
>>>> My understanding of handoff is to disable optimistic spinning to let waiters
>>>> in the wait queue have an opportunity to acquire the lock. There are
>>>> difference in details on how to do that in mutex and rwsem, though.
>>> Ah, but the mutex does an actual hand-off, it hands the lock to a
>>> specific waiting task. That is, unlock() sets owner, as opposed to
>>> trylock().
>>>
>>> The rwsem code doesn't, it just forces a phase change. Once a waiter has
>>> been blocked too long, the handoff bit is set, causing new readers to be
>>> blocked. Then we wait for existing readers to complete. At that point,
>>> any next waiter (most likely a writer) should really get the lock (and
>>> in that regards the rwsem code is a bit funny).
>>>
>>> So while both ensure fairness, the means of doing so is quite different.
>>> One hands the lock ownership to a specific waiter, the other arranges
>>> for a quiescent state such that the next waiter can proceed.
>> That is a valid argument. However, the name PHASE_CHANGE sounds weird to me.
>> I am not objecting to changing the term, but probably with a better name
>> NO_OPTSPIN, NO_LOCKSTEALING or something like that to emphasize that fact
>> that optimistic spinning or lock stealing should not be allowed.
> RWSEM_FLAG_QUIESCE ?

I think that is a more acceptable term than PHASE_CHANGE. Will have a 
follow-up patch later on. This one is more urgent and I want to get it 
done first.

Cheers,
Longman.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:53                       ` Peter Zijlstra
@ 2021-11-11 21:55                         ` Waiman Long
  2021-11-11 22:00                           ` Waiman Long
  0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2021-11-11 21:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 16:53, Peter Zijlstra wrote:
> On Thu, Nov 11, 2021 at 04:25:56PM -0500, Waiman Long wrote:
>> On 11/11/21 16:01, Waiman Long wrote:
>>> On 11/11/21 15:26, Peter Zijlstra wrote:
>>>> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>>>>
>>>>> @@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct
>>>>> rw_semaphore *sem,
>>>>>                if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>>>>>                    time_after(jiffies, waiter->timeout)) {
>>>>>                    adjustment -= RWSEM_FLAG_HANDOFF;
>>>>> +                waiter->handoff_set = true;
>>>>>                    lockevent_inc(rwsem_rlock_handoff);
>>>>>                }
>>>> Do we really need this flag? Wouldn't it be the same as waiter-is-first
>>>> AND sem-has-handoff ?
>>> That is true. The only downside is that we have to read the count first
>>> in rwsem_out_nolock_clear_flags(). Since this is not a fast path, it
>>> should be OK to do that.
>> I just realize that I may still need this flag for writer to determine if it
>> should spin after failing to acquire the lock. Or I will have to do extra
>> read of count value in the loop. I don't need to use it for writer now.
> Maybe it's too late here, but afaict this is right after failing
> try_write_lock(), which will have done at least that load you're
> interested in, no?
>
> Simply have try_write_lock() update &count or something.

You are right. On second thought, I have actually decided to do an extra 
read.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set
  2021-11-11 21:55                         ` Waiman Long
@ 2021-11-11 22:00                           ` Waiman Long
  0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-11-11 22:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hillf Danton, 马振华,
	mingo, will, boqun.feng, linux-kernel


On 11/11/21 16:55, Waiman Long wrote:
>
> On 11/11/21 16:53, Peter Zijlstra wrote:
>> On Thu, Nov 11, 2021 at 04:25:56PM -0500, Waiman Long wrote:
>>> On 11/11/21 16:01, Waiman Long wrote:
>>>> On 11/11/21 15:26, Peter Zijlstra wrote:
>>>>> On Thu, Nov 11, 2021 at 02:36:52PM -0500, Waiman Long wrote:
>>>>>
>>>>>> @@ -434,6 +430,7 @@ static void rwsem_mark_wake(struct
>>>>>> rw_semaphore *sem,
>>>>>>                if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>>>>>>                    time_after(jiffies, waiter->timeout)) {
>>>>>>                    adjustment -= RWSEM_FLAG_HANDOFF;
>>>>>> +                waiter->handoff_set = true;
>>>>>>                    lockevent_inc(rwsem_rlock_handoff);
>>>>>>                }
>>>>> Do we really need this flag? Wouldn't it be the same as 
>>>>> waiter-is-first
>>>>> AND sem-has-handoff ?
>>>> That is true. The only downside is that we have to read the count 
>>>> first
>>>> in rwsem_out_nolock_clear_flags(). Since this is not a fast path, it
>>>> should be OK to do that.
>>> I just realize that I may still need this flag for writer to 
>>> determine if it
>>> should spin after failing to acquire the lock. Or I will have to do 
>>> extra
>>> read of count value in the loop. I don't need to use it for writer now.
>> Maybe it's too late here, but afaict this is right after failing
>> try_write_lock(), which will have done at least that load you're
>> interested in, no?
>>
>> Simply have try_write_lock() update &count or something.
>
> You are right. I have actually decided to do an extra read after 
> second thought.

Oh, I would like to take it back. The condition to do spinning is when 
the handoff bit is set and the waiter is the first one in the queue. It 
would be easier to do that with an extra internal state variable.
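
Spelled out, the internal state variable stands for this test (sketch
only; wait_lock is held where the slow path evaluates it):

	waiter.handoff_set = (atomic_long_read(&sem->count) & RWSEM_FLAG_HANDOFF) &&
			     (rwsem_first_waiter(sem) == &waiter);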

Cheers,
Longman


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
     [not found] <4fafad133b074f279dbab1aa3642e23f@xiaomi.com>
2021-11-07  3:25 ` [BUG]locking/rwsem: only clean RWSEM_FLAG_HANDOFF when already set Waiman Long
2021-11-07  3:28   ` Waiman Long
     [not found] ` <20211107090131.1535-1-hdanton@sina.com>
2021-11-07 15:24   ` Waiman Long
2021-11-07 19:52     ` Waiman Long
2021-11-10 21:38       ` Peter Zijlstra
2021-11-11  2:42         ` Maria Yu
2021-11-11 15:08         ` Peter Zijlstra
2021-11-11 19:14           ` Waiman Long
2021-11-11 19:20             ` Peter Zijlstra
2021-11-11 19:36               ` Waiman Long
2021-11-11 19:52                 ` Waiman Long
2021-11-11 20:26                 ` Peter Zijlstra
2021-11-11 21:01                   ` Waiman Long
2021-11-11 21:25                     ` Waiman Long
2021-11-11 21:53                       ` Peter Zijlstra
2021-11-11 21:55                         ` Waiman Long
2021-11-11 22:00                           ` Waiman Long
2021-11-11 21:38                     ` Peter Zijlstra
2021-11-11 21:46                       ` Waiman Long
2021-11-11 20:35                 ` Peter Zijlstra
2021-11-11 20:39                   ` Peter Zijlstra
2021-11-11 20:45                   ` Waiman Long
2021-11-11 21:27                     ` Peter Zijlstra
2021-11-11 21:54                       ` Waiman Long
2021-11-11 20:50                 ` Peter Zijlstra
2021-11-11 21:09                   ` Waiman Long
