linux-kernel.vger.kernel.org archive mirror
* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
@ 2015-09-15  1:26 Zhu Jefferry
  2015-09-16  0:01 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-15  1:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, bigeasy

Hi 

On the list, I see the patch "[PATCH v2] futex: lower the lock contention on
the HB lock during wake up" at
http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.

But I see another patch with the same name and different content here,
    23b7776290b10297fe2cae0fb5f166a4f2c68121 (http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121) 23-Jun-2015 Linus Torvalds
    futex: Lower the lock contention on the HB lock during wake up

    wake_futex_pi() wakes the task before releasing the hash bucket lock (HB).
    The first thing the woken up task usually does is to acquire the lock which
    requires the HB lock. On SMP Systems this leads to blocking on the HB lock
    which is released by the owner shortly after. This patch rearranges the
    unlock path by first releasing the HB lock and then waking up the task.

Could you please give a little bit more explanation on this: why do they have
the same name but different modifications to futex.c? I'm a newbie in the
community.

Actually, I have encountered a customer issue related to the glibc code
"pthread_mutex_lock", which uses the futex service in the kernel, without the
patches above.

After a lot of discussion with the customer (I could not reproduce the failure
in my office), I seriously suspect there might be some particular corner case
in the futex code.

In the unlock flow, the user space code (pthread_mutex_unlock) will check the
FUTEX_WAITERS flag first, then wake up the waiters on the kernel list. In the
lock flow, the kernel code (futex) will also set FUTEX_WAITERS first, then add
the waiter to the list. Both follow the same sequence: flag first, list entry
second. But there might be a timing problem on an SMP system if the query
(unlock flow) executes just before the list-add (lock flow).

It might cause the mutex to never be really released, and other threads to
wait forever. Could you please help to take a look at it?

===========================================================================================================================
CPU 0 (thread 0)                                CPU 1 (thread 1)

 mutex_lock                                               
 val = *futex;                                  
 sys_futex(LOCK_PI, futex, val);                
                                                
 return to user space                           
 after acquire the lock                           mutex_lock
                                                  val = *futex;
                                                  sys_futex(LOCK_PI, futex, val);
                                                    lock(hash_bucket(futex));
                                                    set FUTEX_WAITERS flag
                                                    unlock(hash_bucket(futex)) and retry due to page fault
                                                
 mutex_unlock in user space                     
 check FUTEX_WAITERS flag                                               
 sys_futex(UNLOCK_PI, futex, val);              
   lock(hash_bucket(futex));        <--.            
                                        .---------   waiting for the lock of hash_bucket(futex) to add itself to the list
                                                
   try to get the waiter in waiting  <--.
   list, but it's empty                 |       
                                        |       
   set new_owner to itself              |       
   instead of expecting waiter          |       
                                        |       
                                        |       
   unlock(hash_bucket(futex));          |       
                                        |           lock(hash_bucket(futex));
                                        .--------   add itself to the waiting list
                                                    unlock(hash_bucket(futex));
                                                    waiting forever since there is nobody who will release the PI
   the futex is owned by itself                 
   forever in userspace. Because                
   the __owner in user space has                
   been cleared and mutex_unlock                
   will fail forever before it 
   calls the kernel.
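
For reference, a rough sketch of the user space fast paths described above
(a glibc-like PI mutex, heavily simplified; the helper names here are
illustrative, not the real glibc internals):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_futex(atomic_uint *uaddr, int op, unsigned int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Lock fast path: the uncontended case never enters the kernel. */
static void pi_lock(atomic_uint *futex_word, unsigned int tid)
{
	unsigned int expected = 0;

	if (atomic_compare_exchange_strong(futex_word, &expected, tid))
		return;
	/* Contended: the kernel sets FUTEX_WAITERS and queues the waiter. */
	sys_futex(futex_word, FUTEX_LOCK_PI, 0);
}

/* Unlock fast path: only enter the kernel if FUTEX_WAITERS is set. */
static void pi_unlock(atomic_uint *futex_word, unsigned int tid)
{
	unsigned int expected = tid;

	if (atomic_compare_exchange_strong(futex_word, &expected, 0))
		return;
	sys_futex(futex_word, FUTEX_UNLOCK_PI, 0);
}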


Thanks,
Jeff



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-15  1:26 [PATCH v2] futex: lower the lock contention on the HB lock during wake up Zhu Jefferry
@ 2015-09-16  0:01 ` Thomas Gleixner
  2015-09-16  0:17   ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16  0:01 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Tue, 15 Sep 2015, Zhu Jefferry wrote:

Please configure your e-mail client properly and follow the basic rules:

- Choose a meaningful subject for your questions

  You just copied a random subject line from some other mail thread,
  which makes your mail look like a patch. But it's not a patch. You
  have a question about futexes and patches related to them.

- Make sure your lines are no longer than 72 to 76 characters in length.

  You can't assume that all e-mail and news clients behave like yours, and while yours might wrap lines automatically when the text reaches the right of the window containing it, not all do.

  For the sentence above I need a 190 character wide display ....

- Do not use colors or other gimmicks. They just make the mail
  unreadable in simple text based readers.

> On the list, I see the patch "[PATCH v2] futex: lower the lock
> contention on the HB lock during wake up" at
> http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.
 
> But I see another patch with the same name and different content here,
>     23b7776290b10297fe2cae0fb5f166a4f2c68121(http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121)

I have no idea what that metager thing tells you and I really don't
want to know. Plain git tells me:

# git show 23b7776290b10297fe2cae0fb5f166a4f2c68121
Merge: 6bc4c3ad3619 6fab54101923
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Jun 22 15:52:04 2015 -0700

    Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

So that's a merge commit where Linus pulled a pile of changes into the
mainline kernel. And that merge does not contain the patch above, but
it contains a different change to the futex code.

> Could you please give a little bit more explanation on this: why do
> they have the same name but different modifications to futex.c? I'm a
> newbie in the community.

Use the proper tools and not some random web interface. The commit you
are looking for is a completely different one.

# git log kernel/futex.c
....
commit 802ab58da74bb49ab348d2872190ef26ddc1a3e0
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Wed Jun 17 10:33:50 2015 +0200

    futex: Lower the lock contention on the HB lock during wake up
....

And that's the same as the one in the LKML thread plus a fixup.

> Actually, I have encountered a customer issue related to the glibc
> code "pthread_mutex_lock", which uses the futex service in the
> kernel, without the patches above.

The patches above are merely an optimization and completely unrelated
to your problem.

You fail to provide the real interesting information here:

 - Which architecture/SoC
 - Which kernel version and which extra patches
 - Which glibc version and which extra patches

> After a lot of discussion with the customer (I could not reproduce
> the failure in my office), I seriously suspect there might be some
> particular corner case in the futex code.

The futex code is more or less a conglomerate of corner cases.

But again you fail to provide the real interesting information:

 - What is the actual failure ?

The information that you discussed that with your customer is
completely irrelevant and your suspicion does not clarify the issue
either.

> In the unlock flow, the user space code (pthread_mutex_unlock) will
> check the FUTEX_WAITERS flag first, then wake up the waiters on the
> kernel list. In the lock flow, the kernel code (futex) will also set
> FUTEX_WAITERS first, then add the waiter to the list. Both follow the
> same sequence: flag first, list entry second. But there might be a
> timing problem on an SMP system if the query (unlock flow) executes
> just before the list-add (lock flow).

There might be some timing problem, if the code would look like the
scheme you painted below, but it does not.

> It might cause the mutex to never be really released, and other
> threads to wait forever. Could you please help to take a look at it?
>
> CPU 0 (thread 0)                                CPU 1 (thread 1)
> 
>  mutex_lock                                               
>  val = *futex;                                  
>  sys_futex(LOCK_PI, futex, val);                
>
>  return to user space

If the futex is uncontended then you don't enter the kernel for
acquiring the futex.

>  after acquire the lock                           mutex_lock
>                                                   val = *futex;
>                                                   sys_futex(LOCK_PI, futex, val);

The futex FUTEX_LOCK_PI operation does not take the user space value. That's
what FUTEX_WAIT does.
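
For illustration (per futex(2); error handling omitted):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void wait_vs_lock_pi(unsigned int *uaddr, unsigned int val)
{
	/* FUTEX_WAIT: the kernel re-checks that *uaddr still equals val
	   before it puts the caller to sleep. */
	syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);

	/* FUTEX_LOCK_PI: no val comparison; the kernel works on the
	   TID/FUTEX_WAITERS protocol in *uaddr directly, val is ignored. */
	syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}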

>                                                     lock(hash_bucket(futex));
>                                                     set FUTEX_WAITERS flag
>                                                     unlock(hash_bucket(futex)) and retry due to page fault

So here you are completely off the track. If the 'set FUTEX_WAITERS
bit' operation fails due to a page fault, then the FUTEX_WAITERS bit
is not set. So it cannot be observed on the other core.

The flow is:

sys_futex(LOCK_PI, futex, ...)

 retry:
  lock(hb(futex));
  ret = set_waiter_bit(futex);
  if (ret == -EFAULT) {
    unlock(hb(futex));
    handle_fault();
    goto retry;
  }

  list_add();
  unlock(hb(futex));
  schedule();

So when set_waiter_bit() succeeds, then the hash bucket lock is held
and blocks the waker. So it's guaranteed that the waker will see the
waiter on the list.

If set_waiter_bit() faults, then the waiter bit is not set and
therefore there is nothing to wake. So the waker will not enter the
kernel because the futex is uncontended.
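
The same ordering can be modelled with a small user space toy (a pthread
mutex standing in for the hash bucket lock, a counter for the futex_q list;
illustrative only, not the kernel code):

#include <pthread.h>
#include <stdbool.h>

#define FUTEX_WAITERS 0x80000000u

static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int futex_word;		/* stands in for the user space value */
static int nr_waiters;			/* stands in for the futex_q list */

/* Waiter side: waiter bit and list addition under the same lock. */
static void toy_lock_pi_slowpath(void)
{
	pthread_mutex_lock(&hb_lock);
	futex_word |= FUTEX_WAITERS;	/* set_waiter_bit() succeeded */
	nr_waiters++;			/* list_add() */
	pthread_mutex_unlock(&hb_lock);
	/* schedule() ... */
}

/* Waker side: only reached when FUTEX_WAITERS was seen in user space. */
static bool toy_unlock_pi_slowpath(void)
{
	bool woken = false;

	pthread_mutex_lock(&hb_lock);
	if (nr_waiters) {		/* futex_top_waiter() */
		nr_waiters--;
		woken = true;		/* wake_futex_pi() */
	}
	pthread_mutex_unlock(&hb_lock);
	return woken;
}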

So now, let's assume that the waiter failed to set the waiter bit and
the waker unlocked the futex. When the waiter retries then it actually
checks whether the futex still has an owner. So it observes the owner
has been cleared, it acquires the futex and returns.

It's a bit more complex than that due to handling of the gazillion of
corner cases, but that's the basic synchronization mechanism and there
is no hidden timing issue on SMP.

Random speculation is not helping here.

Thanks,

	tglx


* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  0:01 ` Thomas Gleixner
@ 2015-09-16  0:17   ` Zhu Jefferry
  2015-09-16  8:06     ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16  0:17 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

Thanks for your detailed guidance and explanations. Please see my questions in-line.

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, September 16, 2015 8:01 AM
> To: Zhu Shuangjun-R65879
> Cc: linux-kernel@vger.kernel.org; bigeasy@linutronix.de
> Subject: RE: [PATCH v2] futex: lower the lock contention on the HB lock
> during wake up
> 
> The flow is:
> 
> sys_futex(LOCK_PI, futex, ...)
> 
>  retry:
>   lock(hb(futex));
>   ret = set_waiter_bit(futex);
>   if (ret == -EFAULT) {
>     unlock(hb(futex));
>     handle_fault();
>     goto retry;
>   }
> 
>   list_add();
>   unlock(hb(futex));
>   schedule();
> 
> So when set_waiter_bit() succeeds, then the hash bucket lock is held and
> blocks the waker. So it's guaranteed that the waker will see the waiter
> on the list.
> 
> If set_waiter_bit() faults, then the waiter bit is not set and therefore
> there is nothing to wake. So the waker will not enter the kernel because
> the futex is uncontended.
> 

I assume your pseudo code set_waiter_bit maps to the real code
"futex_lock_pi_atomic". It's possible for futex_lock_pi_atomic to successfully
set the FUTEX_WAITERS bit but still return with a page fault, for example by
failing in lookup_pi_state().




* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  0:17   ` Zhu Jefferry
@ 2015-09-16  8:06     ` Thomas Gleixner
  2015-09-16  9:52       ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16  8:06 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:

> Thanks for your detailed guidance and explanations. Please see my questions in-line.

Please trim the reply to the relevant sections. It's annoying if I
have to search your replies inside of useless quoted text.
 
> > -----Original Message-----
> > From: Thomas Gleixner [mailto:tglx@linutronix.de]
> > The flow is:
> > 
> > sys_futex(LOCK_PI, futex, ...)
> > 
> >  retry:
> >   lock(hb(futex));
> >   ret = set_waiter_bit(futex);
> >   if (ret == -EFAULT) {
> >     unlock(hb(futex));
> >     handle_fault();
> >     goto retry;
> >   }
> > 
> >   list_add();
> >   unlock(hb(futex));
> >   schedule();
> > 
> > So when set_waiter_bit() succeeds, then the hash bucket lock is held and
> > blocks the waker. So it's guaranteed that the waker will see the waiter
> > on the list.
> > 
> > If set_waiter_bit() faults, then the waiter bit is not set and therefore
> > there is nothing to wake. So the waker will not enter the kernel because
> > the futex is uncontended.
> > 

> I assume your pseudo code set_waiter_bit maps to the real code
> "futex_lock_pi_atomic". It's possible for futex_lock_pi_atomic to
> successfully set the FUTEX_WAITERS bit but still return with a page
> fault, for example by failing in lookup_pi_state().

No. It's not. lookup_pi_state() cannot return EFAULT. The only
function which can fault inside of lock_pi_update_atomic() is the
actual cmpxchg. Though lock_pi_update_atomic() can successfully set
the waiter bit and then return with some other failure code (ESRCH,
EAGAIN, ...). But that does not matter at all.

Any failure return will end up in a retry. And if the waker managed to
release the futex before the retry takes place then the waiter will
see that and take the futex.

As I said before:

> > Random speculation is not helping here.

You still fail to provide the relevant information I asked for. If you
cannot provide that information, we can't help.

Thanks,

	tglx


* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  8:06     ` Thomas Gleixner
@ 2015-09-16  9:52       ` Zhu Jefferry
  2015-09-16 10:22         ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16  9:52 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

> > I assume your pseudo code set_waiter_bit maps to the real code
> > "futex_lock_pi_atomic". It's possible for futex_lock_pi_atomic to
> > successfully set the FUTEX_WAITERS bit but still return with a page
> > fault, for example by failing in lookup_pi_state().
> 
> No. It's not. lookup_pi_state() cannot return EFAULT. The only function
> which can fault inside of lock_pi_update_atomic() is the actual cmpxchg.
> Though lock_pi_update_atomic() can successfully set the waiter bit and
> then return with some other failure code (ESRCH, EAGAIN, ...). But that
> does not matter at all.
> 
> Any failure return will end up in a retry. And if the waker managed to
> release the futex before the retry takes place then the waiter will see
> that and take the futex.
> 
Let me try to describe the application failure here.

The application is a multi-threaded program that uses pairs of mutex_lock and
mutex_unlock to protect a shared data structure. The type of this mutex
is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running for a long time, say several
days, the mutex data structure in user space looks corrupted.

   thread 0 can do mutex_lock/unlock     
   __lock = this thread | FUTEX_WAITERS
   __owner = 0, should be this thread
   __counter keep increasing, although there is no recursive mutex_lock call.

   thread 1 will be stuck 

The primary debugging shows that the content of __lock goes wrong first. After
a call of mutex_unlock, the value of __lock should no longer be this thread
itself, but we observed that the value of __lock is still the thread itself
after the unlock. So other threads will be stuck. This thread can still lock
due to the recursive type, and __counter keeps increasing although mutex_unlock
returns failure due to the wrong value of __owner; the application did not
check the return value. So thread 0 looks fine, but thread 1 will be stuck
forever.
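
For concreteness, state like the above can be inspected from the application
with a small helper along these lines (the __data field names are glibc
internals, not a stable API, and the read is racy):

#include <pthread.h>
#include <stdio.h>

static void dump_mutex_state(const char *tag, pthread_mutex_t *m)
{
	printf("%s: __lock=%#x __count=%u __owner=%d\n", tag,
	       (unsigned int)m->__data.__lock,
	       m->__data.__count,
	       m->__data.__owner);
}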



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16  9:52       ` Zhu Jefferry
@ 2015-09-16 10:22         ` Thomas Gleixner
  2015-09-16 11:13           ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16 10:22 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> The application is a multi-threaded program that uses pairs of mutex_lock and
> mutex_unlock to protect a shared data structure. The type of this mutex
> is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running for a long time, say several
> days, the mutex data structure in user space looks corrupted.
> 
>    thread 0 can do mutex_lock/unlock     
>    __lock = this thread | FUTEX_WAITERS
>    __owner = 0, should be this thread

The kernel does not know about __owner.

>    __counter keep increasing, although there is no recursive mutex_lock call.
>
>    thread 1 will be stuck 
> 
> The primary debugging shows that the content of __lock goes wrong first.
> After a call of mutex_unlock, the value of __lock should no longer be this
> thread itself, but we observed that the value of __lock is still the thread
> itself after the unlock. So other threads will be stuck.

How did you observe that?

> This thread can still lock due to the recursive type, and __counter keeps
> increasing although mutex_unlock returns failure due to the wrong value of
> __owner; the application did not check the return value. So thread 0 looks
> fine, but thread 1 will be stuck forever.

Oh well. So thread 0 looks all fine, despite not checking return
values.

Thanks,

	tglx



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 10:22         ` Thomas Gleixner
@ 2015-09-16 11:13           ` Zhu Jefferry
  2015-09-16 13:39             ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16 11:13 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

> On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> > The application is a multi-threaded program that uses pairs of
> > mutex_lock and mutex_unlock to protect a shared data structure. The
> > type of this mutex is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running
> > for a long time, say several days, the mutex data structure in user
> > space looks corrupted.
> >
> >    thread 0 can do mutex_lock/unlock
> >    __lock = this thread | FUTEX_WAITERS
> >    __owner = 0, should be this thread
> 
> The kernel does not know about __owner.

Correct; it shows the last failure is in mutex_unlock,
which clears the __owner in user space.

> 
> >    __counter keep increasing, although there is no recursive mutex_lock
> call.
> >
> >    thread 1 will be stuck
> > 
> > The primary debugging shows that the content of __lock goes wrong first.
> > After a call of mutex_unlock, the value of __lock should no longer be
> > this thread itself, but we observed that the value of __lock is still
> > the thread itself after the unlock. So other threads will be stuck.
> 
> How did you observe that?

I added one assert in mutex_unlock, after it finishes modifying __lock either
in user space or kernel space, before it returns.
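
For illustration, a similar check can also be done by wrapping the unlock
call in the application instead of patching glibc (this assumes the mutex is
not held recursively at that point; __data is glibc-internal and the read
after the unlock is racy):

#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static int checked_mutex_unlock(pthread_mutex_t *m)
{
	pid_t tid = (pid_t)syscall(SYS_gettid);
	int ret = pthread_mutex_unlock(m);

	/* After a successful unlock, __lock must no longer carry our TID. */
	if (ret == 0)
		assert(((unsigned int)m->__data.__lock & FUTEX_TID_MASK) !=
		       (unsigned int)tid);
	return ret;
}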

> 
> > This thread can still lock due to the recursive type, and __counter
> > keeps increasing although mutex_unlock returns failure due to the wrong
> > value of __owner; the application did not check the return value. So
> > thread 0 looks fine, but thread 1 will be stuck forever.
> 
> Oh well. So thread 0 looks all fine, despite not checking return values.
> 

Correct.

Actually, I'm not clear about how the state of the futex changes in the
kernel. I searched the Internet and saw a similar failure from another user.
He is using kernel 2.6.38. Our customer is using kernel 2.6.34 (WindRiver
Linux 4.1).

    ====
    http://www.programdoc.com/1272_157986_1.htm

    Maybe, there is a bug about pi-futex, it would let the program in 
    user-space going to hang.
    We have a board: CPU is powerpc 8572, two core. after ran one month, 
    the state of pi-futex in user-space got bad: 
    mutex->__data.__lock is 0x8000023e, 
    mutex->__data.__count is 0, 
    mutex->__data.__owner is 0.

But I cannot understand the sample failure case which he mentioned. I think
it might be helpful for you to analyze the corner case.



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 11:13           ` Zhu Jefferry
@ 2015-09-16 13:39             ` Thomas Gleixner
  2015-09-16 23:57               ` Zhu Jefferry
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-16 13:39 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> > > The primary debugging shows that the content of __lock goes wrong
> > > first. After a call of mutex_unlock, the value of __lock should no
> > > longer be this thread itself, but we observed that the value of __lock
> > > is still the thread itself after the unlock. So other threads will be
> > > stuck.
> > 
> > How did you observe that?
> 
> I added one assert in mutex_unlock, after it finishes modifying __lock
> either in user space or kernel space, before it returns.

And that assert tells you that the kernel screwed up the futex value?
No, it does not. It merely tells you that the value is not what you
expect, but it does not tell you what caused that.

Hint: There are proper instrumentation tools, e.g. tracing, which tell
you the exact flow of events and not just the observation after the
fact.

> > > This thread can still lock due to the recursive type, and __counter
> > > keeps increasing although mutex_unlock returns failure due to the
> > > wrong value of __owner; the application did not check the return
> > > value. So thread 0 looks fine, but thread 1 will be stuck forever.
> > 
> > Oh well. So thread 0 looks all fine, despite not checking return values.
> > 
> 
> Correct.

No. That's absolutely NOT correct. Not checking return values can
cause all kinds of corruption. Return values are there for a reason.
 
> Actually, I'm not clear about how the state of the futex changes in the
> kernel. I searched the Internet and saw a similar failure from another
> user. He is using kernel 2.6.38. Our customer is using kernel 2.6.34
> (WindRiver Linux 4.1).

So your customer should talk to WindRiver about this. I have no idea
what kind of patches WindRiver has in their kernel and I really don't
want to know it.

If you can reproduce that issue against a recent mainline kernel, then
I'm happy to analyze that.

>     ====
>     http://www.programdoc.com/1272_157986_1.htm

Your supply of weird web pages seems to be infinite.
 
> But I cannot understand the sample failure case which he mentioned. I think
> it might be helpful for you to analyze the corner case.

No, it's absolutely NOT helpful because it's just random guesswork as
the flow he is describing is just not possible. That guy never showed
his test case, so I have no idea how he can 'prove' his theory.

Thanks,

	tglx



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 13:39             ` Thomas Gleixner
@ 2015-09-16 23:57               ` Zhu Jefferry
  2015-09-17  7:08                 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Zhu Jefferry @ 2015-09-16 23:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, bigeasy

> On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> > > > The primary debugging shows that the content of __lock goes wrong
> > > > first. After a call of mutex_unlock, the value of __lock should no
> > > > longer be this thread itself, but we observed that the value of
> > > > __lock is still the thread itself after the unlock. So other threads
> > > > will be stuck.
> > >
> > > How did you observe that?
> >
> > I added one assert in mutex_unlock, after it finishes modifying __lock
> > either in user space or kernel space, before it returns.
> 
> And that assert tells you that the kernel screwed up the futex value?
> No, it does not. It merely tells you that the value is not what you
> expect, but it does not tell you what caused that.
> 
> Hint: There are proper instrumentation tools, e.g. tracing, which tell
> you the exact flow of events and not just the observation after the fact.

I'm trying to get more details about the failure flow, but I'm told that even
a little timing change in the code might make the failure take much longer to
appear, or even make it disappear.

> 
> > > > This thread can still lock due to the recursive type, and __counter
> > > > keeps increasing although mutex_unlock returns failure due to the
> > > > wrong value of __owner; the application did not check the return
> > > > value. So thread 0 looks fine, but thread 1 will be stuck forever.
> > >
> > > Oh well. So thread 0 looks all fine, despite not checking return
> values.
> > >
> >
> > Correct.
> 
> No. That's absolutely NOT correct. Not checking return values can cause
> all kinds of corruption. Return values are there for a reason.
> 

Besides the application not checking the return value, the mutex_unlock in
libc did not check the return value from the kernel either.



* RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-09-16 23:57               ` Zhu Jefferry
@ 2015-09-17  7:08                 ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-09-17  7:08 UTC (permalink / raw)
  To: Zhu Jefferry; +Cc: linux-kernel, bigeasy

On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> Besides the application not checking the return value, the mutex_unlock in
> libc did not check the return value from the kernel either.

That's even worse.

Thanks,

	tglx


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17 14:28           ` Sebastian Andrzej Siewior
  2015-06-17 14:31             ` Mike Galbraith
@ 2015-06-21  4:35             ` Mike Galbraith
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-21  4:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On Wed, 2015-06-17 at 16:28 +0200, Sebastian Andrzej Siewior wrote:
> On 06/17/2015 04:17 PM, Mike Galbraith wrote:
> > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
> >> wake_futex_pi() wakes the task before releasing the hash bucket lock
> >> (HB). The first thing the woken up task usually does is to acquire the
> >> lock which requires the HB lock. On SMP Systems this leads to blocking
> >> on the HB lock which is released by the owner shortly after.
> >> This patch rearranges the unlock path by first releasing the HB lock and
> >> then waking up the task.
> >>
> >> [bigeasy: redo ontop of lockless wake-queues]
> >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
> > DL980.  I ran a few iterations of futextests and stockfish, then mixed
> > two loops of futextest at different rt prios, with stockfish also rt,
> > and ltplight tossed in as... crack filler.  Box is still doing that,
> > is way too busy, but not griping about it.  
> 
> There are two patches mostly doing the same thing. The patch posted
> here is a redo ontop of "lockless wake-queues". It does hb-unlock,
> wakeup, de-boost. The patch merged into -RT is the original approach
> not using "lockless wake-queues" and performing wakeup, hb-unlock,
> de-boost.
> 
> I plan to get into -RT the final solution once it hits upstream.

I plugged patch1 and tip version into rt and beat it, seems solid.

Converting the rest of rtmutex.c to use wake queues with ->save_state to
select wake function went less well.  Kernel does a good impersonation
of a working kernel until I beat it up, then it loses wakeups.  Hohum,
so much for yet another early morning tinker session.

	-Mike



* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-19 18:54           ` Thomas Gleixner
@ 2015-06-19 19:32             ` Kevin Hilman
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin Hilman @ 2015-06-19 19:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney,
	lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap,
	Santosh Shilimkar, Felipe Balbi, Nishanth Menon

Thomas Gleixner <tglx@linutronix.de> writes:

> On Fri, 19 Jun 2015, Kevin Hilman wrote:
>> On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
>> A handful of boot test failures on ARM/OMAP were found by kernelci.org
>> in next-20150619[1] and were bisected down to this patch, which hit
>> next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
>> lock contention on the HB lock during wake up).  I confirmed that
>> reverting that patch on top of next-20150619 gets things booting again
>> for the affected platforms.
>> 
>> I haven't debugged this any further, but full boot logs are available
>> for the boot failures[2][3] and the linux-omap list and maintainer are
>> Cc'd here to help investigate further if needed.
>
> Found it. Dunno, how I missed that one. Fix below.
>

Yup, that fix on top of next-20150619 gets the two OMAP platforms
booting again.

Tested-by: Kevin Hilman <khilman@linaro.org>

Kevin


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-19 17:51         ` Kevin Hilman
@ 2015-06-19 18:54           ` Thomas Gleixner
  2015-06-19 19:32             ` Kevin Hilman
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-19 18:54 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Sebastian Andrzej Siewior, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney,
	lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap,
	Santosh Shilimkar, Felipe Balbi, Nishanth Menon

On Fri, 19 Jun 2015, Kevin Hilman wrote:
> On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
> A handful of boot test failures on ARM/OMAP were found by kernelci.org
> in next-20150619[1] and were bisected down to this patch, which hit
> next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
> lock contention on the HB lock during wake up).  I confirmed that
> reverting that patch on top of next-20150619 gets things booting again
> for the affected platforms.
> 
> I haven't debugged this any further, but full boot logs are available
> for the boot failures[2][3] and the linux-omap list and maintainer are
> Cc'd here to help investigate further if needed.

Found it. Dunno, how I missed that one. Fix below.

Thanks,

	tglx
---

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 10dbeb6fe96f..5674b073473c 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1365,9 +1365,14 @@ rt_mutex_fastunlock(struct rt_mutex *lock,
 	if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
 		rt_mutex_deadlock_account_unlock(current);
 
-	} else if (slowfn(lock, &wake_q)) {
+	} else {
+		bool deboost = slowfn(lock, &wake_q);
+
+		wake_up_q(&wake_q);
+
 		/* Undo pi boosting if necessary: */
-		rt_mutex_adjust_prio(current);
+		if (deboost)
+			rt_mutex_adjust_prio(current);
 	}
 }
 




* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17  8:33       ` [PATCH v2] " Sebastian Andrzej Siewior
  2015-06-17 14:17         ` Mike Galbraith
@ 2015-06-19 17:51         ` Kevin Hilman
  2015-06-19 18:54           ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Kevin Hilman @ 2015-06-19 17:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Mike Galbraith, Paul E. McKenney, lkml,
	Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap,
	Santosh Shilimkar, Felipe Balbi, Nishanth Menon

On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> wake_futex_pi() wakes the task before releasing the hash bucket lock
> (HB). The first thing the woken up task usually does is to acquire the
> lock which requires the HB lock. On SMP Systems this leads to blocking
> on the HB lock which is released by the owner shortly after.
> This patch rearranges the unlock path by first releasing the HB lock and
> then waking up the task.
>
> [bigeasy: redo ontop of lockless wake-queues]
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> * Davidlohr Bueso | 2015-06-16 12:50:26 [-0700]:
>
>>I prefer having two separate patches, thus keeping their own changelog
>>for the change justification.
>
> okay, here it is on top of #1.

A handful of boot test failures on ARM/OMAP were found by kernelci.org
in next-20150619[1] and were bisected down to this patch, which hit
next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
lock contention on the HB lock during wake up).  I confirmed that
reverting that patch on top of next-20150619 gets things booting again
for the affected platforms.

I haven't debugged this any further, but full boot logs are available
for the boot failures[2][3] and the linux-omap list and maintainer are
Cc'd here to help investigate further if needed.

Kevin

[1] http://kernelci.org/boot/all/job/next/kernel/next-20150619/
[2] http://storage.kernelci.org/next/next-20150619/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
[3] http://storage.kernelci.org/next/next-20150619/arm-omap2plus_defconfig/lab-tbaker/boot-omap3-beagle-xm.html


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17 14:28           ` Sebastian Andrzej Siewior
@ 2015-06-17 14:31             ` Mike Galbraith
  2015-06-21  4:35             ` Mike Galbraith
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-17 14:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On Wed, 2015-06-17 at 16:28 +0200, Sebastian Andrzej Siewior wrote:
> On 06/17/2015 04:17 PM, Mike Galbraith wrote:
> > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
> >> wake_futex_pi() wakes the task before releasing the hash bucket lock
> >> (HB). The first thing the woken up task usually does is to acquire the
> >> lock which requires the HB lock. On SMP Systems this leads to blocking
> >> on the HB lock which is released by the owner shortly after.
> >> This patch rearranges the unlock path by first releasing the HB lock and
> >> then waking up the task.
> >>
> >> [bigeasy: redo ontop of lockless wake-queues]
> >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
> > DL980.  I ran a few iterations of futextests and stockfish, then mixed
> > two loops of futextest at different rt prios, with stockfish also rt,
> > and ltplight as tossed in as... crack filler.  Box is still doing that,
> > is way too busy, but not griping about it.  
> 
> There are two patches mostly doing the same thing. The patch posted
> here is a redo ontop of "lockless wake-queues". It does hb-unlock,
> wakeup, de-boost. The patch merged into -RT is the original approach
> not using "lockless wake-queues" and performing wakeup, hb-unlock,
> de-boost.
> 
> I plan to get into -RT the final solution once it hits upstream.

OK, a glance wasn't enough.  Guess I can let tired ole box rest.

	-Mike



* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17 14:17         ` Mike Galbraith
@ 2015-06-17 14:28           ` Sebastian Andrzej Siewior
  2015-06-17 14:31             ` Mike Galbraith
  2015-06-21  4:35             ` Mike Galbraith
  0 siblings, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-06-17 14:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On 06/17/2015 04:17 PM, Mike Galbraith wrote:
> On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
>> wake_futex_pi() wakes the task before releasing the hash bucket lock
>> (HB). The first thing the woken up task usually does is to acquire the
>> lock which requires the HB lock. On SMP Systems this leads to blocking
>> on the HB lock which is released by the owner shortly after.
>> This patch rearranges the unlock path by first releasing the HB lock and
>> then waking up the task.
>>
>> [bigeasy: redo ontop of lockless wake-queues]
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
> DL980.  I ran a few iterations of futextests and stockfish, then mixed
> two loops of futextest at different rt prios, with stockfish also rt,
> and ltplight tossed in as... crack filler.  Box is still doing that,
> is way too busy, but not griping about it.  

There are two patches mostly doing the same thing. The patch posted
here is a redo ontop of "lockless wake-queues". It does hb-unlock,
wakeup, de-boost. The patch merged into -RT is the original approach
not using "lockless wake-queues" and performing wakeup, hb-unlock,
de-boost.

I plan to get into -RT the final solution once it hits upstream.
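
To make the ordering difference concrete, here is a user space analogy
(pthreads, illustrative only; not the kernel code, and it leaves out the
de-boost step):

#include <pthread.h>

static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;	/* "HB lock" */
static pthread_cond_t waiter_cv = PTHREAD_COND_INITIALIZER;
static int wake_pending;

/* Merged -RT variant: wake first, release the lock afterwards. */
static void unlock_path_wake_then_unlock(void)
{
	pthread_mutex_lock(&hb_lock);
	wake_pending = 1;
	pthread_cond_signal(&waiter_cv);
	/* On SMP the woken waiter may already be blocking on hb_lock
	   here, which is still held for a little while. */
	pthread_mutex_unlock(&hb_lock);
}

/* This patch: release the lock first, then wake. */
static void unlock_path_unlock_then_wake(void)
{
	pthread_mutex_lock(&hb_lock);
	wake_pending = 1;
	pthread_mutex_unlock(&hb_lock);
	pthread_cond_signal(&waiter_cv);	/* waiter finds hb_lock free */
}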

> 
> 	-Mike
> 

Sebastian


* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-17  8:33       ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2015-06-17 14:17         ` Mike Galbraith
  2015-06-17 14:28           ` Sebastian Andrzej Siewior
  2015-06-19 17:51         ` Kevin Hilman
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-17 14:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Steven Rostedt, Paul E. McKenney, linux-kernel

On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote:
> wake_futex_pi() wakes the task before releasing the hash bucket lock
> (HB). The first thing the woken up task usually does is to acquire the
> lock which requires the HB lock. On SMP Systems this leads to blocking
> on the HB lock which is released by the owner shortly after.
> This patch rearranges the unlock path by first releasing the HB lock and
> then waking up the task.
> 
> [bigeasy: redo ontop of lockless wake-queues]
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core
DL980.  I ran a few iterations of futextests and stockfish, then mixed
two loops of futextest at different rt prios, with stockfish also rt,
and ltplight tossed in as... crack filler.  Box is still doing that,
is way too busy, but not griping about it.  

	-Mike



* [PATCH v2] futex: lower the lock contention on the HB lock during wake up
  2015-06-16 19:50     ` Davidlohr Bueso
@ 2015-06-17  8:33       ` Sebastian Andrzej Siewior
  2015-06-17 14:17         ` Mike Galbraith
  2015-06-19 17:51         ` Kevin Hilman
  0 siblings, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-06-17  8:33 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Mike Galbraith, Paul E. McKenney, linux-kernel

wake_futex_pi() wakes the task before releasing the hash bucket lock
(HB). The first thing the woken up task usually does is to acquire the
lock which requires the HB lock. On SMP Systems this leads to blocking
on the HB lock which is released by the owner shortly after.
This patch rearranges the unlock path by first releasing the HB lock and
then waking up the task.

[bigeasy: redo ontop of lockless wake-queues]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
* Davidlohr Bueso | 2015-06-16 12:50:26 [-0700]:

>I prefer having two separate patches, thus keeping their own changelog
>for the change justification.

okay, here it is on top of #1.

 kernel/futex.c                  | 32 +++++++++++++++++++++++---
 kernel/locking/rtmutex.c        | 51 +++++++++++++++++++++++++++++------------
 kernel/locking/rtmutex_common.h |  3 +++
 3 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ea6ca0bca525..026594f02bd2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1117,12 +1117,15 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q)
 	q->lock_ptr = NULL;
 }
 
-static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
+static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
+			 struct futex_hash_bucket *hb)
 {
 	struct task_struct *new_owner;
 	struct futex_pi_state *pi_state = this->pi_state;
 	u32 uninitialized_var(curval), newval;
 	int ret = 0;
+	WAKE_Q(wake_q);
+	bool deboost;
 
 	if (!pi_state)
 		return -EINVAL;
@@ -1173,7 +1176,19 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
 	raw_spin_unlock_irq(&new_owner->pi_lock);
 
 	raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
-	rt_mutex_unlock(&pi_state->pi_mutex);
+
+	deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
+
+	/*
+	 * First unlock HB so the waiter does not spin on it once he got woken
+	 * up. Second wake up the waiter before the priority is adjusted. If we
+	 * deboost first (and lose our higher priority), then the task might get
+	 * scheduled away before the wake up can take place.
+	 */
+	spin_unlock(&hb->lock);
+	wake_up_q(&wake_q);
+	if (deboost)
+		rt_mutex_adjust_prio(current);
 
 	return 0;
 }
@@ -2410,13 +2425,23 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	 */
 	match = futex_top_waiter(hb, &key);
 	if (match) {
-		ret = wake_futex_pi(uaddr, uval, match);
+		ret = wake_futex_pi(uaddr, uval, match, hb);
+		/*
+		 * In case of success wake_futex_pi dropped the hash
+		 * bucket lock.
+		 */
+		if (!ret)
+			goto out_putkey;
 		/*
 		 * The atomic access to the futex value generated a
 		 * pagefault, so retry the user-access and the wakeup:
 		 */
 		if (ret == -EFAULT)
 			goto pi_faulted;
+		/*
+		 * wake_futex_pi has detected invalid state. Tell user
+		 * space.
+		 */
 		goto out_unlock;
 	}
 
@@ -2437,6 +2462,7 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 
 out_unlock:
 	spin_unlock(&hb->lock);
+out_putkey:
 	put_futex_key(&key);
 	return ret;
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 74188d83b065..53fab686a1c2 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task)
  * of task. We do not use the spin_xx_mutex() variants here as we are
  * outside of the debug path.)
  */
-static void rt_mutex_adjust_prio(struct task_struct *task)
+void rt_mutex_adjust_prio(struct task_struct *task)
 {
 	unsigned long flags;
 
@@ -1244,13 +1244,12 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
 }
 
 /*
- * Slow path to release a rt-mutex:
+ * Slow path to release a rt-mutex.
+ * Return whether the current task needs to undo a potential priority boosting.
  */
-static void __sched
-rt_mutex_slowunlock(struct rt_mutex *lock)
+static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock,
+					struct wake_q_head *wake_q)
 {
-	WAKE_Q(wake_q);
-
 	raw_spin_lock(&lock->wait_lock);
 
 	debug_rt_mutex_unlock(lock);
@@ -1291,7 +1290,7 @@ rt_mutex_slowunlock(struct rt_mutex *lock)
 	while (!rt_mutex_has_waiters(lock)) {
 		/* Drops lock->wait_lock ! */
 		if (unlock_rt_mutex_safe(lock) == true)
-			return;
+			return false;
 		/* Relock the rtmutex and try again */
 		raw_spin_lock(&lock->wait_lock);
 	}
@@ -1302,13 +1301,12 @@ rt_mutex_slowunlock(struct rt_mutex *lock)
 	 *
 	 * Queue the next waiter for wakeup once we release the wait_lock.
 	 */
-	mark_wakeup_next_waiter(&wake_q, lock);
+	mark_wakeup_next_waiter(wake_q, lock);
 
 	raw_spin_unlock(&lock->wait_lock);
-	wake_up_q(&wake_q);
 
-	/* Undo pi boosting if necessary: */
-	rt_mutex_adjust_prio(current);
+	/* check PI boosting */
+	return true;
 }
 
 /*
@@ -1359,12 +1357,18 @@ rt_mutex_fasttrylock(struct rt_mutex *lock,
 
 static inline void
 rt_mutex_fastunlock(struct rt_mutex *lock,
-		    void (*slowfn)(struct rt_mutex *lock))
+		    bool (*slowfn)(struct rt_mutex *lock,
+				   struct wake_q_head *wqh))
 {
-	if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
+	WAKE_Q(wake_q);
+
+	if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
 		rt_mutex_deadlock_account_unlock(current);
-	else
-		slowfn(lock);
+
+	} else if (slowfn(lock, &wake_q)) {
+		/* Undo pi boosting if necessary: */
+		rt_mutex_adjust_prio(current);
+	}
 }
 
 /**
@@ -1466,6 +1470,23 @@ void __sched rt_mutex_unlock(struct rt_mutex *lock)
 EXPORT_SYMBOL_GPL(rt_mutex_unlock);
 
 /**
+ * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock
+ * @lock: the rt_mutex to be unlocked
+ *
+ * Returns: true/false indicating whether priority adjustment is
+ * required or not.
+ */
+bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock,
+				   struct wake_q_head *wqh)
+{
+	if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
+		rt_mutex_deadlock_account_unlock(current);
+		return false;
+	}
+	return rt_mutex_slowunlock(lock, wqh);
+}
+
+/**
  * rt_mutex_destroy - mark a mutex unusable
  * @lock: the mutex to be destroyed
  *
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 855212501407..7844f8f0e639 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -131,6 +131,9 @@ extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
 				      struct hrtimer_sleeper *to,
 				      struct rt_mutex_waiter *waiter);
 extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
+extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
+				  struct wake_q_head *wqh);
+extern void rt_mutex_adjust_prio(struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 # include "rtmutex-debug.h"
-- 
2.1.4



