linux-kernel.vger.kernel.org archive mirror
* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
@ 2015-09-15  1:26 Zhu Jefferry
  2015-09-16  0:01 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-15  1:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, bigeasy

Hi 

On the list, I see the patch "[PATCH v2] futex: lower the lock contention on
the HB lock during wake up" at
http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.

But I see another patch with the same name and different content here,
    23b7776290b10297fe2cae0fb5f166a4f2c68121 (http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121) 23-Jun-2015 Linus Torvalds
    futex: Lower the lock contention on the HB lock during wake up

    wake_futex_pi() wakes the task before releasing the hash bucket lock (HB).
    The first thing the woken up task usually does is to acquire the lock which
    requires the HB lock. On SMP Systems this leads to blocking on the HB lock
    which is released by the owner shortly after. This patch rearranges the
    unlock path by first releasing the HB lock and then waking up the task.

Could you please give a little bit more explanation on this: why do they have
the same name but different modifications to futex.c? I'm a newbie in the
community.

Actually, I have encountered a customer issue related to the glibc code
"pthread_mutex_lock", which uses the futex service in the kernel, without the
patches above.

After a lot of discussion with the customer (I could not reproduce the failure
in my office), I seriously suspect there might be some particular corner case
in the futex code.

In the unlock flow, the user space code (pthread_mutex_unlock) will check the
FUTEX_WAITERS flag first, then wake up the waiters on the kernel list. In the
lock flow, the kernel code (futex) will also set FUTEX_WAITERS first, then add
the waiter to the list. Both follow the same sequence: flag first, list entry
second. But there might be a timing problem on an SMP system if the query
(unlock flow) executes just before the list-add (lock flow).

It might cause the mutex to never be really released, and other threads to
wait forever. Could you please help to take a look at it?

===========================================================================================================================
CPU 0 (thread 0)                                CPU 1 (thread 1)

 mutex_lock                                               
 val = *futex;                                  
 sys_futex(LOCK_PI, futex, val);                
                                                
 return to user space                           
 after acquire the lock                           mutex_lock
                                                  val = *futex;
                                                  sys_futex(LOCK_PI, futex, val);
                                                    lock(hash_bucket(futex));
                                                    set FUTEX_WAITERS flag
                                                    unlock(hash_bucket(futex)) and retry due to page fault
                                                
 mutex_unlock in user space                     
 check FUTEX_WAITERS flag                                               
 sys_futex(UNLOCK_PI, futex, val);              
   lock(hash_bucket(futex));        <--.            
                                        .---------   waiting for the lock of hash_bucket(futex) to add itself to the list
                                                
   try to get the waiter in waiting  <--.
   list, but it's empty                 |       
                                        |       
   set new_owner to itself              |       
   instead of expecting waiter          |       
                                        |       
                                        |       
   unlock(hash_bucket(futex));          |       
                                        |           lock(hash_bucket(futex));
                                        .--------   add itself to the waiting list
                                                    unlock(hash_bucket(futex));
                                                    waiting forever since there is nobody who will release the PI
   the futex is owned by itself                 
   forever in userspace. Because                
   the __owner in user space has                
   been cleared and mutex_unlock                
   will fail forever before it 
   calls the kernel.
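
For reference, a rough sketch of the user space fast paths described above
(a glibc-like PI mutex, heavily simplified; the helper names here are
illustrative, not the real glibc internals):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_futex(atomic_uint *uaddr, int op, unsigned int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Lock fast path: the uncontended case never enters the kernel. */
static void pi_lock(atomic_uint *futex_word, unsigned int tid)
{
	unsigned int expected = 0;

	if (atomic_compare_exchange_strong(futex_word, &expected, tid))
		return;
	/* Contended: the kernel sets FUTEX_WAITERS and queues the waiter. */
	sys_futex(futex_word, FUTEX_LOCK_PI, 0);
}

/* Unlock fast path: only enter the kernel if FUTEX_WAITERS is set. */
static void pi_unlock(atomic_uint *futex_word, unsigned int tid)
{
	unsigned int expected = tid;

	if (atomic_compare_exchange_strong(futex_word, &expected, 0))
		return;
	sys_futex(futex_word, FUTEX_UNLOCK_PI, 0);
}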


Thanks,
Jeff



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-15  1:26 [PATCH v2] futex: lower the lock contention on the HB lock during wake up Zhu Jefferry
@ 2015-09-16  0:01 ` Thomas Gleixner
  2015-09-16  0:17   ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16  0:01 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Tue, 15 Sep 2015, Zhu Jefferry wrote:

Please configure your e-mail client properly and follow the basic rules:

- Choose a meaningful subject for your questions

  You just copied a random subject line from some other mail thread,
  which makes your mail look like a patch. But it's not a patch. You
  have a question about futexes and patches related to them.

- Make sure your lines are no longer than 72 to 76 characters in length.

  You can't assume that all e-mail and news clients behave like yours, and while yours might wrap lines automatically when the text reaches the right of the window containing it, not all do.

  For the sentence above I need a 190 character wide display ....

- Do not use colors or other gimmicks. They just make the mail
  unreadable in simple text based readers.

> On the list, I see the patch "[PATCH v2] futex: lower the lock
> contention on the HB lock during wake up" at
> http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.
 
> But I see another patch with the same name and different content here,
>     23b7776290b10297fe2cae0fb5f166a4f2c68121(http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121)

I have no idea what that metager thing tells you and I really don't
want to know. Plain git tells me:

# git show 23b7776290b10297fe2cae0fb5f166a4f2c68121
Merge: 6bc4c3ad3619 6fab54101923
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Jun 22 15:52:04 2015 -0700

    Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

So that's a merge commit where Linus pulled a pile of changes into the
mainline kernel. And that merge does not contain the patch above, but
it contains a different change to the futex code.

> Could you please give a little bit more explanation on this: why do
> they have the same name but different modifications to futex.c? I'm a
> newbie in the community.

Use the proper tools and not some random web interface. The commit you
are looking for is a completely different one.

# git log kernel/futex.c
....
commit 802ab58da74bb49ab348d2872190ef26ddc1a3e0
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Wed Jun 17 10:33:50 2015 +0200

    futex: Lower the lock contention on the HB lock during wake up
....

And that's the same as the one in the LKML thread plus a fixup.

> Actually, I have encountered a customer issue related to the glibc
> code "pthread_mutex_lock", which uses the futex service in the
> kernel, without the patches above.

The patches above are merely an optimization and completely unrelated
to your problem.

You fail to provide the real interesting information here:

 - Which architecture/SoC
 - Which kernel version and which extra patches
 - Which glibc version and which extra patches

> After a lot of discussion with the customer (I could not reproduce
> the failure in my office), I seriously suspect there might be some
> particular corner case in the futex code.

The futex code is more or less a conglomerate of corner cases.

But again you fail to provide the real interesting information:

 - What is the actual failure ?

The information that you discussed that with your customer is
completely irrelevant and your suspicion does not clarify the issue
either.

> In the unlock flow, the user space code (pthread_mutex_unlock) will
> check the FUTEX_WAITERS flag first, then wake up the waiters on the
> kernel list. In the lock flow, the kernel code (futex) will also set
> FUTEX_WAITERS first, then add the waiter to the list. Both follow the
> same sequence: flag first, list entry second. But there might be a
> timing problem on an SMP system if the query (unlock flow) executes
> just before the list-add (lock flow).

There might be some timing problem, if the code would look like the
scheme you painted below, but it does not.

> It might cause the mutex to never be really released, and other
> threads to wait forever. Could you please help to take a look at it?
>
> CPU 0 (thread 0)                                CPU 1 (thread 1)
> 
>  mutex_lock                                               
>  val = *futex;                                  
>  sys_futex(LOCK_PI, futex, val);                
>
>  return to user space

If the futex is uncontended then you don't enter the kernel for
acquiring the futex.

>  after acquire the lock                           mutex_lock
>                                                   val = *futex;
>                                                   sys_futex(LOCK_PI, futex, val);

The futex FUTEX_LOCK_PI operation does not take the user space value. That's
what FUTEX_WAIT does.
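
For illustration (per futex(2); error handling omitted):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void wait_vs_lock_pi(unsigned int *uaddr, unsigned int val)
{
	/* FUTEX_WAIT: the kernel re-checks that *uaddr still equals val
	   before it puts the caller to sleep. */
	syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);

	/* FUTEX_LOCK_PI: no val comparison; the kernel works on the
	   TID/FUTEX_WAITERS protocol in *uaddr directly, val is ignored. */
	syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}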

>                                                     lock(hash_bucket(futex));
>                                                     set FUTEX_WAITERS flag
>                                                     unlock(hash_bucket(futex)) and retry due to page fault

So here you are completely off the track. If the 'set FUTEX_WAITERS
bit' operation fails due to a page fault, then the FUTEX_WAITERS bit
is not set. So it cannot be observed on the other core.

The flow is:

sys_futex(LOCK_PI, futex, ...)

 retry:
  lock(hb(futex));
  ret = set_waiter_bit(futex);
  if (ret == -EFAULT) {
    unlock(hb(futex));
    handle_fault();
    goto retry;
  }

  list_add();
  unlock(hb(futex));
  schedule();

So when set_waiter_bit() succeeds, then the hash bucket lock is held
and blocks the waker. So it's guaranteed that the waker will see the
waiter on the list.

If set_waiter_bit() faults, then the waiter bit is not set and
therefore there is nothing to wake. So the waker will not enter the
kernel because the futex is uncontended.
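
The same ordering can be modelled with a small user space toy (a pthread
mutex standing in for the hash bucket lock, a counter for the futex_q list;
illustrative only, not the kernel code):

#include <pthread.h>
#include <stdbool.h>

#define FUTEX_WAITERS 0x80000000u

static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int futex_word;		/* stands in for the user space value */
static int nr_waiters;			/* stands in for the futex_q list */

/* Waiter side: waiter bit and list addition under the same lock. */
static void toy_lock_pi_slowpath(void)
{
	pthread_mutex_lock(&hb_lock);
	futex_word |= FUTEX_WAITERS;	/* set_waiter_bit() succeeded */
	nr_waiters++;			/* list_add() */
	pthread_mutex_unlock(&hb_lock);
	/* schedule() ... */
}

/* Waker side: only reached when FUTEX_WAITERS was seen in user space. */
static bool toy_unlock_pi_slowpath(void)
{
	bool woken = false;

	pthread_mutex_lock(&hb_lock);
	if (nr_waiters) {		/* futex_top_waiter() */
		nr_waiters--;
		woken = true;		/* wake_futex_pi() */
	}
	pthread_mutex_unlock(&hb_lock);
	return woken;
}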

So now, let's assume that the waiter failed to set the waiter bit and
the waker unlocked the futex. When the waiter retries then it actually
checks whether the futex still has an owner. So it observes the owner
has been cleared, it acquires the futex and returns.

It's a bit more complex than that due to handling of the gazillion of
corner cases, but that's the basic synchronization mechanism and there
is no hidden timing issue on SMP.

Random speculation is not helping here.

Thanks,

	tglx


* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  0:01 ` Thomas Gleixner
@ 2015-09-16  0:17   ` Zhu Jefferry
  2015-09-16  8:06     ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16  0:17 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

Thanks for your detailed guidance and explanations. Please see my questions in-line.

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, September 16, 2015 8:01 AM
> To: Zhu Shuangjun-R65879
> Cc: linux-kernel@vger.kernel.org; bigeasy@linutronix.de
> Subject: RE: [PATCH v2] futex: lower the lock contention on the HB lock
> during wake up
> 
> The flow is:
> 
> sys_futex(LOCK_PI, futex, ...)
> 
>  retry:
>   lock(hb(futex));
>   ret = set_waiter_bit(futex);
>   if (ret == -EFAULT) {
>     unlock(hb(futex));
>     handle_fault();
>     goto retry;
>   }
> 
>   list_add();
>   unlock(hb(futex));
>   schedule();
> 
> So when set_waiter_bit() succeeds, then the hash bucket lock is held and
> blocks the waker. So it's guaranteed that the waker will see the waiter
> on the list.
> 
> If set_waiter_bit() faults, then the waiter bit is not set and therefore
> there is nothing to wake. So the waker will not enter the kernel because
> the futex is uncontended.
> 

I assume your pseudo code set_waiter_bit maps to the real code
"futex_lock_pi_atomic". It's possible for futex_lock_pi_atomic to successfully
set the FUTEX_WAITERS bit but still return with a page fault, for example by
failing in lookup_pi_state().




* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  0:17   ` Zhu Jefferry
@ 2015-09-16  8:06     ` Thomas Gleixner
  2015-09-16  9:52       ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16  8:06 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:

> Thanks for your detailed guidance and explanations. Please see my questions in-line.

Please trim the reply to the relevant sections. It's annoying if I
have to search your replies inside of useless quoted text.
 
> > -----Original Message-----
> > From: Thomas Gleixner [mailto:tglx@linutronix.de]
> > The flow is:
> > 
> > sys_futex(LOCK_PI, futex, ...)
> > 
> >  retry:
> >   lock(hb(futex));
> >   ret = set_waiter_bit(futex);
> >   if (ret == -EFAULT) {
> >     unlock(hb(futex));
> >     handle_fault();
> >     goto retry;
> >   }
> > 
> >   list_add();
> >   unlock(hb(futex));
> >   schedule();
> > 
> > So when set_waiter_bit() succeeds, then the hash bucket lock is held and
> > blocks the waker. So it's guaranteed that the waker will see the waiter
> > on the list.
> > 
> > If set_waiter_bit() faults, then the waiter bit is not set and therefore
> > there is nothing to wake. So the waker will not enter the kernel because
> > the futex is uncontended.
> > 

> I assume your pseudo code set_waiter_bit maps to the real code
> "futex_lock_pi_atomic". It's possible for futex_lock_pi_atomic to
> successfully set the FUTEX_WAITERS bit but still return with a page
> fault, for example by failing in lookup_pi_state().

No. It's not. lookup_pi_state() cannot return EFAULT. The only
function which can fault inside of lock_pi_update_atomic() is the
actual cmpxchg. Though lock_pi_update_atomic() can successfully set
the waiter bit and then return with some other failure code (ESRCH,
EAGAIN, ...). But that does not matter at all.

Any failure return will end up in a retry. And if the waker managed to
release the futex before the retry takes place then the waiter will
see that and take the futex.

As I said before:

> > Random speculation is not helping here.

You still fail to provide the relevant information I asked for. If you
cannot provide that information, we can't help.

Thanks,

	tglx


* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  8:06     ` Thomas Gleixner
@ 2015-09-16  9:52       ` Zhu Jefferry
  2015-09-16 10:22         ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16  9:52 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

> > I assume your pseudo code set_waiter_bit maps to the real code
> > "futex_lock_pi_atomic". It's possible for futex_lock_pi_atomic to
> > successfully set the FUTEX_WAITERS bit but still return with a page
> > fault, for example by failing in lookup_pi_state().
> 
> No. It's not. lookup_pi_state() cannot return EFAULT. The only function
> which can fault inside of lock_pi_update_atomic() is the actual cmpxchg.
> Though lock_pi_update_atomic() can successfully set the waiter bit and
> then return with some other failure code (ESRCH, EAGAIN, ...). But that
> does not matter at all.
> 
> Any failure return will end up in a retry. And if the waker managed to
> release the futex before the retry takes place then the waiter will see
> that and take the futex.
> 
Let me try to describe the application failure here.

The application is a multi-threaded program that uses pairs of mutex_lock and
mutex_unlock to protect a shared data structure. The type of this mutex
is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running for a long time, say several
days, the mutex data structure in user space looks corrupted.

   thread 0 can do mutex_lock/unlock     
   __lock = this thread | FUTEX_WAITERS
   __owner = 0, should be this thread
   __counter keep increasing, although there is no recursive mutex_lock call.

   thread 1 will be stuck 

The primary debugging shows that the content of __lock goes wrong first. After
a call of mutex_unlock, the value of __lock should no longer be this thread
itself, but we observed that the value of __lock is still the thread itself
after the unlock. So other threads will be stuck. This thread can still lock
due to the recursive type, and __counter keeps increasing although mutex_unlock
returns failure due to the wrong value of __owner; the application did not
check the return value. So thread 0 looks fine, but thread 1 will be stuck
forever.
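
For concreteness, state like the above can be inspected from the application
with a small helper along these lines (the __data field names are glibc
internals, not a stable API, and the read is racy):

#include <pthread.h>
#include <stdio.h>

static void dump_mutex_state(const char *tag, pthread_mutex_t *m)
{
	printf("%s: __lock=%#x __count=%u __owner=%d\n", tag,
	       (unsigned int)m->__data.__lock,
	       m->__data.__count,
	       m->__data.__owner);
}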



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  9:52       ` Zhu Jefferry
@ 2015-09-16 10:22         ` Thomas Gleixner
  2015-09-16 11:13           ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16 10:22 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> The application is a multi-threaded program that uses pairs of mutex_lock and
> mutex_unlock to protect a shared data structure. The type of this mutex
> is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running for a long time, say several
> days, the mutex data structure in user space looks corrupted.
> 
>    thread 0 can do mutex_lock/unlock     
>    __lock = this thread | FUTEX_WAITERS
>    __owner = 0, should be this thread

The kernel does not know about __owner.

>    __counter keep increasing, although there is no recursive mutex_lock call.
>
>    thread 1 will be stuck 
> 
> The primary debugging shows that the content of __lock goes wrong first.
> After a call of mutex_unlock, the value of __lock should no longer be this
> thread itself, but we observed that the value of __lock is still the thread
> itself after the unlock. So other threads will be stuck.

How did you observe that?

> This thread can still lock due to the recursive type, and __counter keeps
> increasing although mutex_unlock returns failure due to the wrong value of
> __owner; the application did not check the return value. So thread 0 looks
> fine, but thread 1 will be stuck forever.

Oh well. So thread 0 looks all fine, despite not checking return
values.

Thanks,

	tglx



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 10:22         ` Thomas Gleixner
@ 2015-09-16 11:13           ` Zhu Jefferry
  2015-09-16 13:39             ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16 11:13 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

> On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> > The application is a multi-threaded program that uses pairs of
> > mutex_lock and mutex_unlock to protect a shared data structure. The
> > type of this mutex is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running
> > for a long time, say several days, the mutex data structure in user
> > space looks corrupted.
> >
> >    thread 0 can do mutex_lock/unlock
> >    __lock = this thread | FUTEX_WAITERS
> >    __owner = 0, should be this thread
> 
> The kernel does not know about __owner.

Correct; it shows the last failure is in mutex_unlock,
which clears the __owner in user space.

> 
> >    __counter keep increasing, although there is no recursive mutex_lock
> call.
> >
> >    thread 1 will be stuck
> > 
> > The primary debugging shows that the content of __lock goes wrong first.
> > After a call of mutex_unlock, the value of __lock should no longer be
> > this thread itself, but we observed that the value of __lock is still
> > the thread itself after the unlock. So other threads will be stuck.
> 
> How did you observe that?

I added one assert in mutex_unlock, after it finishes modifying __lock either
in user space or kernel space, before it returns.
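
For illustration, a similar check can also be done by wrapping the unlock
call in the application instead of patching glibc (this assumes the mutex is
not held recursively at that point; __data is glibc-internal and the read
after the unlock is racy):

#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static int checked_mutex_unlock(pthread_mutex_t *m)
{
	pid_t tid = (pid_t)syscall(SYS_gettid);
	int ret = pthread_mutex_unlock(m);

	/* After a successful unlock, __lock must no longer carry our TID. */
	if (ret == 0)
		assert(((unsigned int)m->__data.__lock & FUTEX_TID_MASK) !=
		       (unsigned int)tid);
	return ret;
}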

> 
> > This thread can still lock due to the recursive type, and __counter
> > keeps increasing although mutex_unlock returns failure due to the wrong
> > value of __owner; the application did not check the return value. So
> > thread 0 looks fine, but thread 1 will be stuck forever.
> 
> Oh well. So thread 0 looks all fine, despite not checking return values.
> 

Correct.

Actually, I'm not clear about how the state of the futex changes in the
kernel. I searched the Internet and saw a similar failure from another user.
He is using kernel 2.6.38. Our customer is using kernel 2.6.34 (WindRiver
Linux 4.1).

    ====
    http://www.programdoc.com/1272_157986_1.htm

    Maybe, there is a bug about pi-futex, it would let the program in 
    user-space going to hang.
    We have a board: CPU is powerpc 8572, two core. after ran one month, 
    the state of pi-futex in user-space got bad: 
    mutex->__data.__lock is 0x8000023e, 
    mutex->__data.__count is 0, 
    mutex->__data.__owner is 0.

But I cannot understand the sample failure case which he mentioned. I think
it might be helpful for you to analyze the corner case.



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 11:13           ` Zhu Jefferry
@ 2015-09-16 13:39             ` Thomas Gleixner
  2015-09-16 23:57               ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16 13:39 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> > > The primary debugging shows that the content of __lock goes wrong
> > > first. After a call of mutex_unlock, the value of __lock should no
> > > longer be this thread itself, but we observed that the value of __lock
> > > is still the thread itself after the unlock. So other threads will be
> > > stuck.
> > 
> > How did you observe that?
> 
> I added one assert in mutex_unlock, after it finishes modifying __lock
> either in user space or kernel space, before it returns.

And that assert tells you that the kernel screwed up the futex value?
No, it does not. It merely tells you that the value is not what you
expect, but it does not tell you what caused that.

Hint: There are proper instrumentation tools, e.g. tracing, which tell
you the exact flow of events and not just the observation after the
fact.

> > > This thread can still lock due to the recursive type, and __counter
> > > keeps increasing although mutex_unlock returns failure due to the
> > > wrong value of __owner; the application did not check the return
> > > value. So thread 0 looks fine, but thread 1 will be stuck forever.
> > 
> > Oh well. So thread 0 looks all fine, despite not checking return values.
> > 
> 
> Correct.

No. That's absolutely NOT correct. Not checking return values can
cause all kinds of corruption. Return values are there for a reason.
 
> Actually, I'm not clear about how the state of the futex changes in the
> kernel. I searched the Internet and saw a similar failure from another
> user. He is using kernel 2.6.38. Our customer is using kernel 2.6.34
> (WindRiver Linux 4.1).

So your customer should talk to WindRiver about this. I have no idea
what kind of patches WindRiver has in their kernel and I really don't
want to know it.

If you can reproduce that issue against a recent mainline kernel, then
I'm happy to analyze that.

>     ====
>     http://www.programdoc.com/1272_157986_1.htm

Your supply of weird web pages seems to be infinite.
 
> But I cannot understand the sample failure case which he mentioned. I think
> it might be helpful for you to analyze the corner case.

No, it's absolutely NOT helpful because it's just random guesswork as
the flow he is describing is just not possible. That guy never showed
his test case, so I have no idea how he can 'prove' his theory.

Thanks,

	tglx



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 13:39             ` Thomas Gleixner
@ 2015-09-16 23:57               ` Zhu Jefferry
  2015-09-17  7:08                 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16 23:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

> On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> > > > The primary debugging shows that the content of __lock goes wrong
> > > > first. After a call of mutex_unlock, the value of __lock should no
> > > > longer be this thread itself, but we observed that the value of
> > > > __lock is still the thread itself after the unlock. So other threads
> > > > will be stuck.
> > >
> > > How did you observe that?
> >
> > I added one assert in mutex_unlock, after it finishes modifying __lock
> > either in user space or kernel space, before it returns.
> 
> And that assert tells you that the kernel screwed up the futex value?
> No, it does not. It merely tells you that the value is not what you
> expect, but it does not tell you what caused that.
> 
> Hint: There are proper instrumentation tools, e.g. tracing, which tell
> you the exact flow of events and not just the observation after the fact.

I'm trying to get more details about the failure flow, but I'm told that even
a little timing change in the code might make the failure take much longer to
appear, or even make it disappear.

> 
> > > > This thread can still lock due to the recursive type, and __counter
> > > > keeps increasing although mutex_unlock returns failure due to the
> > > > wrong value of __owner; the application did not check the return
> > > > value. So thread 0 looks fine, but thread 1 will be stuck forever.
> > >
> > > Oh well. So thread 0 looks all fine, despite not checking return
> values.
> > >
> >
> > Correct.
> 
> No. That's absolutely NOT correct. Not checking return values can cause
> all kinds of corruption. Return values are there for a reason.
> 

Besides the application not checking the return value, the mutex_unlock in
libc did not check the return value from the kernel either.



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 23:57               ` Zhu Jefferry
@ 2015-09-17  7:08                 ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-17  7:08 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> Besides the application not checking the return value, the mutex_unlock in
> libc did not check the return value from the kernel either.

That's even worse.

Thanks,

	tglx


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17 14:28           ` Sebastian Andrzej Siewior
  2015-06-17 14:31             ` Mike Galbraith
@ 2015-06-21  4:35             ` Mike Galbraith
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-21  4:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On Wed, 2015-06-17 at 16:28 +0200, Sebastian Andrzej Siewior wrote:
> On 06/17/2015 04:17 PM, Mike Galbraith wrote:
> > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
> >> wake_futex_pi() wakes the task before releasing the hash bucket lock
> >> (HB). The first thing the woken up task usually does is to acquire the
> >> lock which requires the HB lock. On SMP Systems this leads to blocking
> >> on the HB lock which is released by the owner shortly after.
> >> This patch rearranges the unlock path by first releasing the HB lock and
> >> then waking up the task.
> >>
> >> [bigeasy: redo ontop of lockless wake-queues]
> >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
> > DL980.  I ran a few iterations of futextests and stockfish, then mixed
> > two loops of futextest at different rt prios, with stockfish also rt,
> > and ltplight tossed in as... crack filler.  Box is still doing that,
> > is way too busy, but not griping about it.  
> 
> There are two patches mostly doing the same thing. The patch posted
> here is a redo ontop of "lockless wake-queues". It does hb-unlock,
> wakeup, de-boost. The patch merged into -RT is the original approach
> not using "lockless wake-queues" and performing wakeup, hb-unlock,
> de-boost.
> 
> I plan to get into -RT the final solution once it hits upstream.

I plugged patch1 and tip version into rt and beat it, seems solid.

Converting the rest of rtmutex.c to use wake queues with ->save_state to
select wake function went less well.  Kernel does a good impersonation
of a working kernel until I beat it up, then it loses wakeups.  Hohum,
so much for yet another early morning tinker session.

	-Mike



* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-19 18:54           ` Thomas Gleixner
@ 2015-06-19 19:32             ` Kevin Hilman
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin Hilman @ 2015-06-19 19:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney,
	lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap,
	Santosh Shilimkar, Felipe Balbi, Nishanth Menon

Thomas Gleixner <tglx@linutronix.de> writes:

> On Fri, 19 Jun 2015, Kevin Hilman wrote:
>> On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
>> A handful of boot test failures on ARM/OMAP were found by kernelci.org
>> in next-20150619[1] and were bisected down to this patch, which hit
>> next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
>> lock contention on the HB lock during wake up).  I confirmed that
>> reverting that patch on top of next-20150619 gets things booting again
>> for the affected platforms.
>> 
>> I haven't debugged this any further, but full boot logs are available
>> for the boot failures[2][3] and the linux-omap list and maintainer are
>> Cc'd here to help investigate further if needed.
>
> Found it. Dunno, how I missed that one. Fix below.
>

Yup, that fix on top of next-20150619 gets the two OMAP platforms
booting again.

Tested-by: Kevin Hilman <khilman@linaro.org>

Kevin


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-19 17:51         ` Kevin Hilman
@ 2015-06-19 18:54           ` Thomas Gleixner
  2015-06-19 19:32             ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-19 18:54 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Sebastian Andrzej Siewior, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney,
	lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap,
	Santosh Shilimkar, Felipe Balbi, Nishanth Menon

On Fri, 19 Jun 2015, Kevin Hilman wrote:
> On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
> A handful of boot test failures on ARM/OMAP were found by kernelci.org
> in next-20150619[1] and were bisected down to this patch, which hit
> next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
> lock contention on the HB lock during wake up).  I confirmed that
> reverting that patch on top of next-20150619 gets things booting again
> for the affected platforms.
> 
> I haven't debugged this any further, but full boot logs are available
> for the boot failures[2][3] and the linux-omap list and maintainer are
> Cc'd here to help investigate further if needed.

Found it. Dunno, how I missed that one. Fix below.

Thanks,

	tglx
---

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 10dbeb6fe96f..5674b073473c 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1365,9 +1365,14 @@ rt_mutex_fastunlock(struct rt_mutex *lock,
 	if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
 		rt_mutex_deadlock_account_unlock(current);
 
-	} else if (slowfn(lock, &wake_q)) {
+	} else {
+		bool deboost = slowfn(lock, &wake_q);
+
+		wake_up_q(&wake_q);
+
 		/* Undo pi boosting if necessary: */
-		rt_mutex_adjust_prio(current);
+		if (deboost)
+			rt_mutex_adjust_prio(current);
 	}
 }
 




* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17  8:33       ` [PATCH v2] " Sebastian Andrzej Siewior
  2015-06-17 14:17         ` Mike Galbraith
@ 2015-06-19 17:51         ` Kevin Hilman
  2015-06-19 18:54           ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Kevin Hilman @ 2015-06-19 17:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Mike Galbraith, Paul E. McKenney, lkml,
	Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap,
	Santosh Shilimkar, Felipe Balbi, Nishanth Menon

On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> wake_futex_pi() wakes the task before releasing the hash bucket lock
> (HB). The first thing the woken up task usually does is to acquire the
> lock which requires the HB lock. On SMP Systems this leads to blocking
> on the HB lock which is released by the owner shortly after.
> This patch rearranges the unlock path by first releasing the HB lock and
> then waking up the task.
>
> [bigeasy: redo ontop of lockless wake-queues]
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> * Davidlohr Bueso | 2015-06-16 12:50:26 [-0700]:
>
>>I prefer having two separate patches, thus keeping their own changelog
>>for the change justification.
>
> okay, here it is on top of #1.

A handful of boot test failures on ARM/OMAP were found by kernelci.org
in next-20150619[1] and were bisected down to this patch, which hit
next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
lock contention on the HB lock during wake up).  I confirmed that
reverting that patch on top of next-20150619 gets things booting again
for the affected platforms.

I haven't debugged this any further, but full boot logs are available
for the boot failures[2][3] and the linux-omap list and maintainer are
Cc'd here to help investigate further if needed.

Kevin

[1] http://kernelci.org/boot/all/job/next/kernel/next-20150619/
[2] http://storage.kernelci.org/next/next-20150619/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
[3] http://storage.kernelci.org/next/next-20150619/arm-omap2plus_defconfig/lab-tbaker/boot-omap3-beagle-xm.html


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17 14:28           ` Sebastian Andrzej Siewior
@ 2015-06-17 14:31             ` Mike Galbraith
  2015-06-21  4:35             ` Mike Galbraith
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-17 14:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On Wed, 2015-06-17 at 16:28 +0200, Sebastian Andrzej Siewior wrote:
> On 06/17/2015 04:17 PM, Mike Galbraith wrote:
> > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
> >> wake_futex_pi() wakes the task before releasing the hash bucket lock
> >> (HB). The first thing the woken up task usually does is to acquire the
> >> lock which requires the HB lock. On SMP Systems this leads to blocking
> >> on the HB lock which is released by the owner shortly after.
> >> This patch rearranges the unlock path by first releasing the HB lock and
> >> then waking up the task.
> >>
> >> [bigeasy: redo ontop of lockless wake-queues]
> >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
> > DL980.  I ran a few iterations of futextests and stockfish, then mixed
> > two loops of futextest at different rt prios, with stockfish also rt,
> > and ltplight as tossed in as... crack filler.  Box is still doing that,
> > is way too busy, but not griping about it.  
> 
> There are two patches mostly doing the same thing. The patch posted
> here is a redo ontop of "lockless wake-queues". It does hb-unlock,
> wakeup, de-boost. The patch merged into -RT is the original approach
> not using "lockless wake-queues" and performing wakeup, hb-unlock,
> de-boost.
> 
> I plan to get into -RT the final solution once it hits upstream.

OK, a glance wasn't enough.  Guess I can let tired ole box rest.

	-Mike



* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17 14:17         ` Mike Galbraith
@ 2015-06-17 14:28           ` Sebastian Andrzej Siewior
  2015-06-17 14:31             ` Mike Galbraith
  2015-06-21  4:35             ` Mike Galbraith
  0 siblings, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-06-17 14:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On 06/17/2015 04:17 PM, Mike Galbraith wrote:
> On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
>> wake_futex_pi() wakes the task before releasing the hash bucket lock
>> (HB). The first thing the woken up task usually does is to acquire the
>> lock which requires the HB lock. On SMP Systems this leads to blocking
>> on the HB lock which is released by the owner shortly after.
>> This patch rearranges the unlock path by first releasing the HB lock and
>> then waking up the task.
>>
>> [bigeasy: redo ontop of lockless wake-queues]
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
> DL980.  I ran a few iterations of futextests and stockfish, then mixed
> two loops of futextest at different rt prios, with stockfish also rt,
> and ltplight tossed in as... crack filler.  Box is still doing that,
> is way too busy, but not griping about it.  

There are two patches mostly doing the same thing. The patch posted
here is a redo ontop of "lockless wake-queues". It does hb-unlock,
wakeup, de-boost. The patch merged into -RT is the original approach
not using "lockless wake-queues" and performing wakeup, hb-unlock,
de-boost.

I plan to get into -RT the final solution once it hits upstream.
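
To make the ordering difference concrete, here is a user space analogy
(pthreads, illustrative only; not the kernel code, and it leaves out the
de-boost step):

#include <pthread.h>

static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;	/* "HB lock" */
static pthread_cond_t waiter_cv = PTHREAD_COND_INITIALIZER;
static int wake_pending;

/* Merged -RT variant: wake first, release the lock afterwards. */
static void unlock_path_wake_then_unlock(void)
{
	pthread_mutex_lock(&hb_lock);
	wake_pending = 1;
	pthread_cond_signal(&waiter_cv);
	/* On SMP the woken waiter may already be blocking on hb_lock
	   here, which is still held for a little while. */
	pthread_mutex_unlock(&hb_lock);
}

/* This patch: release the lock first, then wake. */
static void unlock_path_unlock_then_wake(void)
{
	pthread_mutex_lock(&hb_lock);
	wake_pending = 1;
	pthread_mutex_unlock(&hb_lock);
	pthread_cond_signal(&waiter_cv);	/* waiter finds hb_lock free */
}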

> 
> 	-Mike
> 

Sebastian


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17  8:33       ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2015-06-17 14:17         ` Mike Galbraith
  2015-06-17 14:28           ` Sebastian Andrzej Siewior
  2015-06-19 17:51         ` Kevin Hilman
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-17 14:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
> wake_futex_pi() wakes the task before releasing the hash bucket lock
> (HB). The first thing the woken up task usually does is to acquire the
> lock which requires the HB lock. On SMP Systems this leads to blocking
> on the HB lock which is released by the owner shortly after.
> This patch rearranges the unlock path by first releasing the HB lock and
> then waking up the task.
> 
> [bigeasy: redo ontop of lockless wake-queues]
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
DL980.  I ran a few iterations of futextests and stockfish, then mixed
two loops of futextest at different rt prios, with stockfish also rt,
and ltplight tossed in as... crack filler.  Box is still doing that,
is way too busy, but not griping about it.  

	-Mike



* [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-16 19:50     ` Davidlohr Bueso
@ 2015-06-17  8:33       ` Sebastian Andrzej Siewior
  2015-06-17 14:17         ` Mike Galbraith
  2015-06-19 17:51         ` Kevin Hilman
  0 siblings, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-06-17  8:33 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Mike Galbraith, Paul E. McKenney, linux-kernel

wake_futex_pi() wakes the task before releasing the hash bucket lock
(HB). The first thing the woken up task usually does is to acquire the
lock which requires the HB lock. On SMP Systems this leads to blocking
on the HB lock which is released by the owner shortly after.
This patch rearranges the unlock path by first releasing the HB lock and
then waking up the task.

[bigeasy: redo ontop of lockless wake-queues]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
* Davidlohr Bueso | 2015-06-16 12:50:26 [-0700]:

>I prefer having two separate patches, thus keeping their own changelog
>for the change justification.

okay, here it is on top of #1.

 kernel/futex.c                  | 32 +++++++++++++++++++++++---
 kernel/locking/rtmutex.c        | 51 +++++++++++++++++++++++++++++------------
 kernel/locking/rtmutex_common.h |  3 +++
 3 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ea6ca0bca525..026594f02bd2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1117,12 +1117,15 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q)
 	q->lock_ptr = NULL;
 }
 
-static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
+static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
+			 struct futex_hash_bucket *hb)
 {
 	struct task_struct *new_owner;
 	struct futex_pi_state *pi_state = this->pi_state;
 	u32 uninitialized_var(curval), newval;
 	int ret = 0;
+	WAKE_Q(wake_q);
+	bool deboost;
 
 	if (!pi_state)
 		return -EINVAL;
@@ -1173,7 +1176,19 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
 	raw_spin_unlock_irq(&new_owner->pi_lock);
 
 	raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
-	rt_mutex_unlock(&pi_state->pi_mutex);
+
+	deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
+
+	/*
+	 * First unlock HB so the waiter does not spin on it once he got woken
+	 * up. Second wake up the waiter before the priority is adjusted. If we
+	 * deboost first (and lose our higher priority), then the task might get
+	 * scheduled away before the wake up can take place.
+	 */
+	spin_unlock(&hb->lock);
+	wake_up_q(&wake_q);
+	if (deboost)
+		rt_mutex_adjust_prio(current);
 
 	return 0;
 }
@@ -2410,13 +2425,23 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	 */
 	match = futex_top_waiter(hb, &key);
 	if (match) {
-		ret = wake_futex_pi(uaddr, uval, match);
+		ret = wake_futex_pi(uaddr, uval, match, hb);
+		/*
+		 * In case of success wake_futex_pi dropped the hash
+		 * bucket lock.
+		 */
+		if (!ret)
+			goto out_putkey;
 		/*
 		 * The atomic access to the futex value generated a
 		 * pagefault, so retry the user-access and the wakeup:
 		 */
 		if (ret == -EFAULT)
 			goto pi_faulted;
+		/*
+		 * wake_futex_pi has detected invalid state. Tell user
+		 * space.
+		 */
 		goto out_unlock;
 	}
 
@@ -2437,6 +2462,7 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 
 out_unlock:
 	spin_unlock(&hb->lock);
+out_putkey:
 	put_futex_key(&key);
 	return ret;
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 74188d83b065..53fab686a1c2 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task)
  * of task. We do not use the spin_xx_mutex() variants here as we are
  * outside of the debug path.)
  */
-static void rt_mutex_adjust_prio(struct task_struct *task)
+void rt_mutex_adjust_prio(struct task_struct *task)
 {
 	unsigned long flags;
 
@@ -1244,13 +1244,12 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
 }
 
 /*
- * Slow path to release a rt-mutex:
+ * Slow path to release a rt-mutex.
+ * Return whether the current task needs to undo a potential priority boosting.
  */
-static void __sched
-rt_mutex_slowunlock(struct rt_mutex *lock)
+static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock,
+					struct wake_q_head *wake_q)
 {
-	WAKE_Q(wake_q);
-
 	raw_spin_lock(&lock->wait_lock);
 
 	debug_rt_mutex_unlock(lock);
@@ -1291,7 +1290,7 @@ rt_mutex_slowunlock(struct rt_mutex *lock)
 	while (!rt_mutex_has_waiters(lock)) {
 		/* Drops lock->wait_lock ! */
 		if (unlock_rt_mutex_safe(lock) == true)
-			return;
+			return false;
 		/* Relock the rtmutex and try again */
 		raw_spin_lock(&lock->wait_lock);
 	}
@@ -1302,13 +1301,12 @@ rt_mutex_slowunlock(struct rt_mutex *lock)
 	 *
 	 * Queue the next waiter for wakeup once we release the wait_lock.
 	 */
-	mark_wakeup_next_waiter(&wake_q, lock);
+	mark_wakeup_next_waiter(wake_q, lock);
 
 	raw_spin_unlock(&lock->wait_lock);
-	wake_up_q(&wake_q);
 
-	/* Undo pi boosting if necessary: */
-	rt_mutex_adjust_prio(current);
+	/* check PI boosting */
+	return true;
 }
 
 /*
@@ -1359,12 +1357,18 @@ rt_mutex_fasttrylock(struct rt_mutex *lock,
 
 static inline void
 rt_mutex_fastunlock(struct rt_mutex *lock,
-		    void (*slowfn)(struct rt_mutex *lock))
+		    bool (*slowfn)(struct rt_mutex *lock,
+				   struct wake_q_head *wqh))
 {
-	if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
+	WAKE_Q(wake_q);
+
+	if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
 		rt_mutex_deadlock_account_unlock(current);
-	else
-		slowfn(lock);
+
+	} else if (slowfn(lock, &wake_q)) {
+		/* Undo pi boosting if necessary: */
+		rt_mutex_adjust_prio(current);
+	}
 }
 
 /**
@@ -1466,6 +1470,23 @@ void __sched rt_mutex_unlock(struct rt_mutex *lock)
 EXPORT_SYMBOL_GPL(rt_mutex_unlock);
 
 /**
+ * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock
+ * @lock: the rt_mutex to be unlocked
+ *
+ * Returns: true/false indicating whether priority adjustment is
+ * required or not.
+ */
+bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock,
+				   struct wake_q_head *wqh)
+{
+	if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
+		rt_mutex_deadlock_account_unlock(current);
+		return false;
+	}
+	return rt_mutex_slowunlock(lock, wqh);
+}
+
+/**
  * rt_mutex_destroy - mark a mutex unusable
  * @lock: the mutex to be destroyed
  *
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 855212501407..7844f8f0e639 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -131,6 +131,9 @@ extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
 				      struct hrtimer_sleeper *to,
 				      struct rt_mutex_waiter *waiter);
 extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
+extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
+				  struct wake_q_head *wqh);
+extern void rt_mutex_adjust_prio(struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 # include "rtmutex-debug.h"
-- 
2.1.4



