All of lore.kernel.org
 help / color / mirror / Atom feed
* commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
@ 2017-05-17 17:36 Markus Trippelsdorf
  2017-05-17 17:50 ` Florian Weimer
  2017-05-18  7:40 ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Markus Trippelsdorf @ 2017-05-17 17:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: fweimer, Peter Zijlstra, Thomas Gleixner

Since:
commit cfafcd117da0216520568c195cb2f6cd1980c4bb
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Mar 22 11:35:58 2017 +0100

    futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()

glibc's nptl/tst-robustpi8 testcase fails:

glibc-build % ./nptl/tst-robustpi8
tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.

pthread_mutex_lock.c:
415             if (INTERNAL_SYSCALL_ERROR_P (e, __err)
416                 && (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
417                     || INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK))
418               {
419                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
420                         || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
421                             && kind != PTHREAD_MUTEX_RECURSIVE_NP));
422                 /* ESRCH can happen only for non-robust PI mutexes where
423                    the owner of the lock died.  */
424                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

During bisection the commit above hangs the machine when I run the
testcase.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=21487

--
Markus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-17 17:36 commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure Markus Trippelsdorf
@ 2017-05-17 17:50 ` Florian Weimer
  2017-05-18  6:46   ` Peter Zijlstra
  2017-05-18  7:40 ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2017-05-17 17:50 UTC (permalink / raw)
  To: Markus Trippelsdorf, linux-kernel
  Cc: Peter Zijlstra, Thomas Gleixner, Torvald Riegel

On 05/17/2017 07:36 PM, Markus Trippelsdorf wrote:
> Since:
> commit cfafcd117da0216520568c195cb2f6cd1980c4bb
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Wed Mar 22 11:35:58 2017 +0100
> 
>     futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
> 
> glibc's nptl/tst-robustpi8 testcase fails:
> 
> glibc-build % ./nptl/tst-robustpi8
> tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
> 
> pthread_mutex_lock.c:
> 415             if (INTERNAL_SYSCALL_ERROR_P (e, __err)
> 416                 && (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
> 417                     || INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK))
> 418               {
> 419                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
> 420                         || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
> 421                             && kind != PTHREAD_MUTEX_RECURSIVE_NP));
> 422                 /* ESRCH can happen only for non-robust PI mutexes where
> 423                    the owner of the lock died.  */
> 424                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
> 
> During bisection the commit above hangs the machine when I run the
> testcase.
> 
> See: https://sourceware.org/bugzilla/show_bug.cgi?id=21487

Markus, could you confirm that it is chocking on the EAGAIN failure?  Or
is it something else?

What is userspace supposed to do with the error code?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-17 17:50 ` Florian Weimer
@ 2017-05-18  6:46   ` Peter Zijlstra
  2017-05-18  6:57     ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18  6:46 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner, Torvald Riegel

On Wed, May 17, 2017 at 07:50:31PM +0200, Florian Weimer wrote:
> On 05/17/2017 07:36 PM, Markus Trippelsdorf wrote:
> > Since:
> > commit cfafcd117da0216520568c195cb2f6cd1980c4bb
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Wed Mar 22 11:35:58 2017 +0100
> > 
> >     futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
> > 
> > glibc's nptl/tst-robustpi8 testcase fails:
> > 
> > glibc-build % ./nptl/tst-robustpi8
> > tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
> > 
> > pthread_mutex_lock.c:
> > 415             if (INTERNAL_SYSCALL_ERROR_P (e, __err)
> > 416                 && (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
> > 417                     || INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK))
> > 418               {
> > 419                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
> > 420                         || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
> > 421                             && kind != PTHREAD_MUTEX_RECURSIVE_NP));
> > 422                 /* ESRCH can happen only for non-robust PI mutexes where
> > 423                    the owner of the lock died.  */
> > 424                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
> > 
> > During bisection the commit above hangs the machine when I run the
> > testcase.
> > 
> > See: https://sourceware.org/bugzilla/show_bug.cgi?id=21487
> 
> Markus, could you confirm that it is chocking on the EAGAIN failure?  Or
> is it something else?
> 
> What is userspace supposed to do with the error code?

IIRC that -EAGAIN should not get to userspace. The kernel should retry
the lock operation. I'll go stare at it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  6:46   ` Peter Zijlstra
@ 2017-05-18  6:57     ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18  6:57 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner, Torvald Riegel

On Thu, May 18, 2017 at 08:46:17AM +0200, Peter Zijlstra wrote:
> On Wed, May 17, 2017 at 07:50:31PM +0200, Florian Weimer wrote:
> > On 05/17/2017 07:36 PM, Markus Trippelsdorf wrote:
> > > Since:
> > > commit cfafcd117da0216520568c195cb2f6cd1980c4bb
> > > Author: Peter Zijlstra <peterz@infradead.org>
> > > Date:   Wed Mar 22 11:35:58 2017 +0100
> > > 
> > >     futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
> > > 
> > > glibc's nptl/tst-robustpi8 testcase fails:
> > > 
> > > glibc-build % ./nptl/tst-robustpi8
> > > tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
> > > 
> > > pthread_mutex_lock.c:
> > > 415             if (INTERNAL_SYSCALL_ERROR_P (e, __err)
> > > 416                 && (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
> > > 417                     || INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK))
> > > 418               {
> > > 419                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
> > > 420                         || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
> > > 421                             && kind != PTHREAD_MUTEX_RECURSIVE_NP));
> > > 422                 /* ESRCH can happen only for non-robust PI mutexes where
> > > 423                    the owner of the lock died.  */
> > > 424                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
> > > 
> > > During bisection the commit above hangs the machine when I run the
> > > testcase.
> > > 
> > > See: https://sourceware.org/bugzilla/show_bug.cgi?id=21487
> > 
> > Markus, could you confirm that it is chocking on the EAGAIN failure?  Or
> > is it something else?
> > 
> > What is userspace supposed to do with the error code?
> 
> IIRC that -EAGAIN should not get to userspace. The kernel should retry
> the lock operation. I'll go stare at it.

So commit:

  bebe5b514345 ("futex: Futex_unlock_pi() determinism")

put a WARN_ON_ONCE() on that -EAGAIN condition, and since that doesn't
appear to be triggering, I suspect something else is buggered.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-17 17:36 commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure Markus Trippelsdorf
  2017-05-17 17:50 ` Florian Weimer
@ 2017-05-18  7:40 ` Peter Zijlstra
  2017-05-18  8:04   ` Markus Trippelsdorf
  2017-05-18  8:12   ` Florian Weimer
  1 sibling, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18  7:40 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: linux-kernel, fweimer, Thomas Gleixner

On Wed, May 17, 2017 at 07:36:46PM +0200, Markus Trippelsdorf wrote:
> Since:
> commit cfafcd117da0216520568c195cb2f6cd1980c4bb
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Wed Mar 22 11:35:58 2017 +0100
> 
>     futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
> 
> glibc's nptl/tst-robustpi8 testcase fails:
> 
> glibc-build % ./nptl/tst-robustpi8
> tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.

OK, so how do I get those tests build?

I did a checkout of glibc.git

glibc$ mkdir build; cd build
build$ ../configure --disable-sanity-checks ; make -j40
build$ ./nptl/tst-robustpi8
-bash: ./nptl/tst-robustpi8: No such file or directory

"make tests" doesn't seem to work either even though its a build target
listed in the Makefiles.

What magic incantation do I need?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  7:40 ` Peter Zijlstra
@ 2017-05-18  8:04   ` Markus Trippelsdorf
  2017-05-18  8:14     ` Peter Zijlstra
  2017-05-18  8:12   ` Florian Weimer
  1 sibling, 1 reply; 18+ messages in thread
From: Markus Trippelsdorf @ 2017-05-18  8:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, fweimer, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]

On 2017.05.18 at 09:40 +0200, Peter Zijlstra wrote:
> On Wed, May 17, 2017 at 07:36:46PM +0200, Markus Trippelsdorf wrote:
> > Since:
> > commit cfafcd117da0216520568c195cb2f6cd1980c4bb
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Wed Mar 22 11:35:58 2017 +0100
> > 
> >     futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
> > 
> > glibc's nptl/tst-robustpi8 testcase fails:
> > 
> > glibc-build % ./nptl/tst-robustpi8
> > tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
> 
> OK, so how do I get those tests build?
> 
> I did a checkout of glibc.git
> 
> glibc$ mkdir build; cd build
> build$ ../configure --disable-sanity-checks ; make -j40
> build$ ./nptl/tst-robustpi8
> -bash: ./nptl/tst-robustpi8: No such file or directory
> 
> "make tests" doesn't seem to work either even though its a build target
> listed in the Makefiles.
> 
> What magic incantation do I need?

Something like:

 % ~/glibc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --with-headers=/usr/include --enable-add-ons --enable-kernel=4.10 --with-tls --with-__thread --enable-bind-now --without-gd --without-cvs --disable-profile --disable-multi-arch --disable-werror 
 % make -j128
 % make -j128 check

But I've attached the testcase, so building should be unnecessary.

-- 
Markus

[-- Attachment #2: tst-robustpi8.xz --]
[-- Type: application/octet-stream, Size: 23544 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  7:40 ` Peter Zijlstra
  2017-05-18  8:04   ` Markus Trippelsdorf
@ 2017-05-18  8:12   ` Florian Weimer
  2017-05-18  8:31     ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2017-05-18  8:12 UTC (permalink / raw)
  To: Peter Zijlstra, Markus Trippelsdorf; +Cc: linux-kernel, Thomas Gleixner

On 05/18/2017 09:40 AM, Peter Zijlstra wrote:
> OK, so how do I get those tests build?
> 
> I did a checkout of glibc.git
> 
> glibc$ mkdir build; cd build
> build$ ../configure --disable-sanity-checks ; make -j40
> build$ ./nptl/tst-robustpi8
> -bash: ./nptl/tst-robustpi8: No such file or directory
> 
> "make tests" doesn't seem to work either even though its a build target
> listed in the Makefiles.

This builds the nptl test:

make subdirs=nptl run-built-tests=no

To run a single test, use this:

bash -x testrun.sh nptl/tst-robustpi8

This is required because the test might not be compatible with the
installed glibc (some nptl tests are whitebox tests).  You can feed the
command line printed due to “bash -x” into strace or a debugger.  The
environment variable settings aren't necessary for most tests.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:04   ` Markus Trippelsdorf
@ 2017-05-18  8:14     ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18  8:14 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: linux-kernel, fweimer, Thomas Gleixner

On Thu, May 18, 2017 at 10:04:39AM +0200, Markus Trippelsdorf wrote:
> On 2017.05.18 at 09:40 +0200, Peter Zijlstra wrote:
> > On Wed, May 17, 2017 at 07:36:46PM +0200, Markus Trippelsdorf wrote:
> > > Since:
> > > commit cfafcd117da0216520568c195cb2f6cd1980c4bb
> > > Author: Peter Zijlstra <peterz@infradead.org>
> > > Date:   Wed Mar 22 11:35:58 2017 +0100
> > > 
> > >     futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
> > > 
> > > glibc's nptl/tst-robustpi8 testcase fails:
> > > 
> > > glibc-build % ./nptl/tst-robustpi8
> > > tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
> > 
> > OK, so how do I get those tests build?
> > 
> > I did a checkout of glibc.git
> > 
> > glibc$ mkdir build; cd build
> > build$ ../configure --disable-sanity-checks ; make -j40
> > build$ ./nptl/tst-robustpi8
> > -bash: ./nptl/tst-robustpi8: No such file or directory
> > 
> > "make tests" doesn't seem to work either even though its a build target
> > listed in the Makefiles.
> > 
> > What magic incantation do I need?
> 
> Something like:
> 
>  % ~/glibc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --with-headers=/usr/include --enable-add-ons --enable-kernel=4.10 --with-tls --with-__thread --enable-bind-now --without-gd --without-cvs --disable-profile --disable-multi-arch --disable-werror 

I specifically didn't want to set --prefix=/usr etc.. because I didn't
want to run any risk of the thing actually installing and wrecking my
system, I'll try the other bits through.

>  % make -j128
>  % make -j128 check

make check is failing here:

/usr/local/src/glibc/build/math/test-ldouble-carg.o -MD -MP -MF /usr/local/src/glibc/build/math/test-ldouble-carg.o.dt -MT /usr/local/src/glibc/build/math/test-ldouble-carg.o
/usr/bin/ld: /usr/local/src/glibc/build/math/test-math-iszero.o: relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

> But I've attached the testcase, so building should be unnecessary.

Thanks..

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:12   ` Florian Weimer
@ 2017-05-18  8:31     ` Peter Zijlstra
  2017-05-18  8:34       ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18  8:31 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner

On Thu, May 18, 2017 at 10:12:04AM +0200, Florian Weimer wrote:
> On 05/18/2017 09:40 AM, Peter Zijlstra wrote:
> > OK, so how do I get those tests build?
> > 
> > I did a checkout of glibc.git
> > 
> > glibc$ mkdir build; cd build
> > build$ ../configure --disable-sanity-checks ; make -j40
> > build$ ./nptl/tst-robustpi8
> > -bash: ./nptl/tst-robustpi8: No such file or directory
> > 
> > "make tests" doesn't seem to work either even though its a build target
> > listed in the Makefiles.
> 
> This builds the nptl test:
> 
> make subdirs=nptl run-built-tests=no

That more or less worked, it still failed:

/usr/bin/ld: /usr/local/src/glibc/build/nptl/tst-once5.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

But it does that after building the tst-robustpi8 thing, so I seem to
have all I need here.

> To run a single test, use this:
> 
> bash -x testrun.sh nptl/tst-robustpi8
> 
> This is required because the test might not be compatible with the
> installed glibc (some nptl tests are whitebox tests).  You can feed the
> command line printed due to “bash -x” into strace or a debugger.  The
> environment variable settings aren't necessary for most tests.

Indeed, that seems to work:

  $ ./elf/ld-linux-x86-64.so.2 --library-path .:./math:./elf:./dlfcn:./nss:./nis:./rt:./resolv:./crypt:./mathvec:./support:./nptl ./nptl/tst-robustpi8
  tst-robustpi8: ../nptl/pthread_mutex_lock.c:424: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:31     ` Peter Zijlstra
@ 2017-05-18  8:34       ` Florian Weimer
  2017-05-18  8:41         ` Peter Zijlstra
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Florian Weimer @ 2017-05-18  8:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner

On 05/18/2017 10:31 AM, Peter Zijlstra wrote:
> That more or less worked, it still failed:
> 
> /usr/bin/ld: /usr/local/src/glibc/build/nptl/tst-once5.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC
> /usr/bin/ld: final link failed: Nonrepresentable section on output
> collect2: error: ld returned 1 exit status

I think this is a bug in the binutils version shipping with Debian jessie.

> But it does that after building the tst-robustpi8 thing, so I seem to
> have all I need here.

Great, have fun figuring out what's going on. :-/

Thanks,
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:34       ` Florian Weimer
@ 2017-05-18  8:41         ` Peter Zijlstra
  2017-05-18  8:42           ` Florian Weimer
  2017-05-18 11:43         ` Peter Zijlstra
  2017-05-19 15:48         ` Peter Zijlstra
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18  8:41 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner

On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
> On 05/18/2017 10:31 AM, Peter Zijlstra wrote:
> > That more or less worked, it still failed:
> > 
> > /usr/bin/ld: /usr/local/src/glibc/build/nptl/tst-once5.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC
> > /usr/bin/ld: final link failed: Nonrepresentable section on output
> > collect2: error: ld returned 1 exit status
> 
> I think this is a bug in the binutils version shipping with Debian jessie.

I am indeed on debian/testing (which is jessie just now I think), I'll
apt-get upgrade to see if its fixed, otherwise, I'll just have to live
with it for now.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:41         ` Peter Zijlstra
@ 2017-05-18  8:42           ` Florian Weimer
  0 siblings, 0 replies; 18+ messages in thread
From: Florian Weimer @ 2017-05-18  8:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner

On 05/18/2017 10:41 AM, Peter Zijlstra wrote:
> On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
>> On 05/18/2017 10:31 AM, Peter Zijlstra wrote:
>>> That more or less worked, it still failed:
>>>
>>> /usr/bin/ld: /usr/local/src/glibc/build/nptl/tst-once5.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC
>>> /usr/bin/ld: final link failed: Nonrepresentable section on output
>>> collect2: error: ld returned 1 exit status
>>
>> I think this is a bug in the binutils version shipping with Debian jessie.
> 
> I am indeed on debian/testing (which is jessie just now I think), I'll
> apt-get upgrade to see if its fixed, otherwise, I'll just have to live
> with it for now.

Careful.  It's fixed in *current* debian/testing (stretch), so a very
large upgrade might be waiting for you.

Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:34       ` Florian Weimer
  2017-05-18  8:41         ` Peter Zijlstra
@ 2017-05-18 11:43         ` Peter Zijlstra
  2017-05-18 12:30           ` Peter Zijlstra
  2017-05-19 15:48         ` Peter Zijlstra
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18 11:43 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner

On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
> On 05/18/2017 10:31 AM, Peter Zijlstra wrote:

> > But it does that after building the tst-robustpi8 thing, so I seem to
> > have all I need here.
> 
> Great, have fun figuring out what's going on. :-/



 ld-linux-x86-64-2165  [018] ....   290.235869: sched_process_fork: comm=ld-linux-x86-64 pid=2165 child_comm=ld-linux-x86-64 child_pid=2166
 ld-linux-x86-64-2166  [019] ....   290.436398: handle_futex_death: 00007f066634e870: 876 -> 40000000
 ld-linux-x86-64-2166  [019] ....   290.436399: handle_futex_death: 00007f066634e0c8: 876 -> 40000000
 ld-linux-x86-64-2166  [019] ....   290.436400: handle_futex_death: 00007f066634ee38: 80000876 -> c0000000
 ld-linux-x86-64-2166  [019] ....   290.436401: sched_process_exit: comm=ld-linux-x86-64 pid=2166 prio=120
 ld-linux-x86-64-2164  [019] ...1   290.436546: attach_to_pi_owner: 2: 00007f066634e078 = 80000876




 ld-linux-x86-64-2183  [026] ....   827.987914: sched_process_fork: comm=ld-linux-x86-64 pid=2183 child_comm=ld-linux-x86-64 child_pid=2187
 ld-linux-x86-64-2187  [029] ....   828.188218: handle_futex_death: 00007f76dd361690: 88b -> 40000000
 ld-linux-x86-64-2187  [029] ....   828.188219: handle_futex_death: 00007f76dd361898: 8000088b -> c0000000
 ld-linux-x86-64-2187  [029] ....   828.188220: handle_futex_death: 00007f76dd3615c8: 8000088b -> c0000000
 ld-linux-x86-64-2187  [029] ....   828.188220: handle_futex_death: 00007f76dd3612d0: 8000088b -> c0000000
 ld-linux-x86-64-2187  [029] ....   828.188220: handle_futex_death: 00007f76dd361af0: 8000088b -> c0000000
 ld-linux-x86-64-2187  [029] ....   828.188221: handle_futex_death: 00007f76dd361168: 8000088b -> c0000000
 ld-linux-x86-64-2187  [029] ....   828.188222: sched_process_exit: comm=ld-linux-x86-64 pid=2187 prio=120
 ld-linux-x86-64-2182  [019] ...1   828.188373: attach_to_pi_owner: 2: 00007f76dd361000 = 8000088b



In both cases we fail in FUTEX_LOCK_PI trying to acquire a futex owned
by a dead task, resulting in the -ESRCH.

Now, pthread_mutex_lock() isn't expecting -ESRCH for robust futexes,
because for robust we'd expect handle_futex_death() to clear out the
futex value and set OWNER_DIED, as can be seen above.

The problem is however that the futex address we fail on, doesn't appear
to have been fixed up, so its either not on the robust list, or the
robust list got broken.

I'll see if I can narrow that down a little more.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18 11:43         ` Peter Zijlstra
@ 2017-05-18 12:30           ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-18 12:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner

On Thu, May 18, 2017 at 01:43:23PM +0200, Peter Zijlstra wrote:
> On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
> > On 05/18/2017 10:31 AM, Peter Zijlstra wrote:
> 
> > > But it does that after building the tst-robustpi8 thing, so I seem to
> > > have all I need here.
> > 
> > Great, have fun figuring out what's going on. :-/
> 
> 
> 
>  ld-linux-x86-64-2165  [018] ....   290.235869: sched_process_fork: comm=ld-linux-x86-64 pid=2165 child_comm=ld-linux-x86-64 child_pid=2166
>  ld-linux-x86-64-2166  [019] ....   290.436398: handle_futex_death: 00007f066634e870: 876 -> 40000000
>  ld-linux-x86-64-2166  [019] ....   290.436399: handle_futex_death: 00007f066634e0c8: 876 -> 40000000
>  ld-linux-x86-64-2166  [019] ....   290.436400: handle_futex_death: 00007f066634ee38: 80000876 -> c0000000
>  ld-linux-x86-64-2166  [019] ....   290.436401: sched_process_exit: comm=ld-linux-x86-64 pid=2166 prio=120
>  ld-linux-x86-64-2164  [019] ...1   290.436546: attach_to_pi_owner: 2: 00007f066634e078 = 80000876
> 
> 
> 
> 
>  ld-linux-x86-64-2183  [026] ....   827.987914: sched_process_fork: comm=ld-linux-x86-64 pid=2183 child_comm=ld-linux-x86-64 child_pid=2187
>  ld-linux-x86-64-2187  [029] ....   828.188218: handle_futex_death: 00007f76dd361690: 88b -> 40000000
>  ld-linux-x86-64-2187  [029] ....   828.188219: handle_futex_death: 00007f76dd361898: 8000088b -> c0000000
>  ld-linux-x86-64-2187  [029] ....   828.188220: handle_futex_death: 00007f76dd3615c8: 8000088b -> c0000000
>  ld-linux-x86-64-2187  [029] ....   828.188220: handle_futex_death: 00007f76dd3612d0: 8000088b -> c0000000
>  ld-linux-x86-64-2187  [029] ....   828.188220: handle_futex_death: 00007f76dd361af0: 8000088b -> c0000000
>  ld-linux-x86-64-2187  [029] ....   828.188221: handle_futex_death: 00007f76dd361168: 8000088b -> c0000000
>  ld-linux-x86-64-2187  [029] ....   828.188222: sched_process_exit: comm=ld-linux-x86-64 pid=2187 prio=120
>  ld-linux-x86-64-2182  [019] ...1   828.188373: attach_to_pi_owner: 2: 00007f76dd361000 = 8000088b
> 
> 
> 
> In both cases we fail in FUTEX_LOCK_PI trying to acquire a futex owned
> by a dead task, resulting in the -ESRCH.
> 
> Now, pthread_mutex_lock() isn't expecting -ESRCH for robust futexes,
> because for robust we'd expect handle_futex_death() to clear out the
> futex value and set OWNER_DIED, as can be seen above.
> 
> The problem is however that the futex address we fail on, doesn't appear
> to have been fixed up, so its either not on the robust list, or the
> robust list got broken.

The robust list walk finishes without issue. So no premature
terminations. The address really isn't on it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-18  8:34       ` Florian Weimer
  2017-05-18  8:41         ` Peter Zijlstra
  2017-05-18 11:43         ` Peter Zijlstra
@ 2017-05-19 15:48         ` Peter Zijlstra
  2017-05-19 16:07           ` Peter Zijlstra
                             ` (2 more replies)
  2 siblings, 3 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-19 15:48 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner,
	Sebastian Andrzej Siewior, Darren Hart

On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
> On 05/18/2017 10:31 AM, Peter Zijlstra wrote:

> > But it does that after building the tst-robustpi8 thing, so I seem to
> > have all I need here.
> 
> Great, have fun figuring out what's going on. :-/

Please test the below, it cures things for me.


---
Subject: futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()

Markus reported that the glibc/nptl/tst-robustpi8 test was failing after
commit:

  cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")

Much tracing later I managed to catch the culprit:

 ld-linux-x86-64-2161  [019] ....   410.760971: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_LOCK_PI
 ld-linux-x86-64-2161  [019] ...1   410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0
 ld-linux-x86-64-2165  [011] ....   410.760978: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_UNLOCK_PI
 ld-linux-x86-64-2165  [011] d..1   410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0
 ld-linux-x86-64-2165  [011] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000
 ld-linux-x86-64-2161  [019] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT

What we see here is 2165 doing an UNLOCK_PI, assigning the lock to the
pending 2161 which then returns with -ETIMEDOUT.

That wrecks the lock state, because now the owner isn't aware it
acquired the lock and we remove the pending robust list entry.

If we then kill 2161, the robust list will not clear out this futex and
the subsequent acquire on this futex will then (correctly) result in
-ESRCH which is unexpected by glibc which then triggers an internal
assertion and dies.

Thomas spotted the bug in rt_mutex_cleanup_proxy_lock(). I had
overlooked that mark_wakeup_next_waiter() results in !rt_mutex_owner()
instead of it pointing to the (next) task. This is to allow lock
stealing.

This means that we need to do a try_to_take_rt_mutex() call in order to
set rt_mutex_owner() before we test to see if its ours.

While there, fix what looks like a merge error which resulted in
rt_mutex_cleanup_proxy_lock() having two calls to
fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any.
Both should have one, since both potentially touch the waiter list.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fixes: 38d589f2fd08 ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rtmutex.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b95509416909..28cd09e635ed 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1785,12 +1785,14 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 	int ret;
 
 	raw_spin_lock_irq(&lock->wait_lock);
-
-	set_current_state(TASK_INTERRUPTIBLE);
-
 	/* sleep on the mutex */
+	set_current_state(TASK_INTERRUPTIBLE);
 	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
-
+	/*
+	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
+	 * have to fix that up.
+	 */
+	fixup_rt_mutex_waiters(lock);
 	raw_spin_unlock_irq(&lock->wait_lock);
 
 	return ret;
@@ -1822,15 +1824,25 @@ bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 
 	raw_spin_lock_irq(&lock->wait_lock);
 	/*
+	 * Do an unconditional try-lock, this deals with the lock stealing
+	 * state where __rt_mutex_futex_unlock() -> mark_wakeup_next_waiter()
+	 * sets a NULL owner.
+	 *
+	 * We're not interested in the return value, because the subsequent
+	 * test on rt_mutex_owner() will infer that. If the trylock succeeded,
+	 * we will own the lock and it will have removed the waiter. If we
+	 * failed the trylock, we're still not owner and we need to remove
+	 * ourselves.
+	 */
+	try_to_take_rt_mutex(lock, current, waiter);
+	/*
 	 * Unless we're the owner; we're still enqueued on the wait_list.
 	 * So check if we became owner, if not, take us off the wait_list.
 	 */
 	if (rt_mutex_owner(lock) != current) {
 		remove_waiter(lock, waiter);
-		fixup_rt_mutex_waiters(lock);
 		cleanup = true;
 	}
-
 	/*
 	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
 	 * have to fix that up.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-19 15:48         ` Peter Zijlstra
@ 2017-05-19 16:07           ` Peter Zijlstra
  2017-05-19 17:35           ` Markus Trippelsdorf
  2017-05-22 20:04           ` [tip:locking/urgent] futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-05-19 16:07 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Markus Trippelsdorf, linux-kernel, Thomas Gleixner,
	Sebastian Andrzej Siewior, Darren Hart

On Fri, May 19, 2017 at 05:48:50PM +0200, Peter Zijlstra wrote:

> Markus reported that the glibc/nptl/tst-robustpi8 test was failing after
> commit:
> 
>   cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")
> 
> Much tracing later I managed to catch the culprit:
> 
>  ld-linux-x86-64-2161  [019] ....   410.760971: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_LOCK_PI
>  ld-linux-x86-64-2161  [019] ...1   410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0
>  ld-linux-x86-64-2165  [011] ....   410.760978: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_UNLOCK_PI
>  ld-linux-x86-64-2165  [011] d..1   410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0
>  ld-linux-x86-64-2165  [011] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000
>  ld-linux-x86-64-2161  [019] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT

The above trace continues like:

 ld-linux-x86-64-2164  [006] ....   410.762336: SyS_futex: 00007ffbeb76b028: 80000871  op=FUTEX_LOCK_PI
 ld-linux-x86-64-2164  [006] ...1   410.762337: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000871 uval=80000871 newval=80000871 ret=0
 ld-linux-x86-64-2164  [006] ....   410.762347: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT
 ld-linux-x86-64-2161  [019] ....   410.762521: SyS_futex: 00007ffbeb76b028: 80000871  op=FUTEX_LOCK_PI
 ld-linux-x86-64-2161  [019] ....   410.762522: SyS_futex: 00007ffbeb76b028: 80000871 ret=EDEADLK

And every subsequent attempt by 2161 will (obviously) return EDEADLK.

Now since the test explicitly tracks the lock state[] and
pthread_mutex_*lock() return values this _should_ have triggered one of
the printf()'s, but I never saw any of those.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure
  2017-05-19 15:48         ` Peter Zijlstra
  2017-05-19 16:07           ` Peter Zijlstra
@ 2017-05-19 17:35           ` Markus Trippelsdorf
  2017-05-22 20:04           ` [tip:locking/urgent] futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 18+ messages in thread
From: Markus Trippelsdorf @ 2017-05-19 17:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Weimer, linux-kernel, Thomas Gleixner,
	Sebastian Andrzej Siewior, Darren Hart

On 2017.05.19 at 17:48 +0200, Peter Zijlstra wrote:
> On Thu, May 18, 2017 at 10:34:34AM +0200, Florian Weimer wrote:
> > On 05/18/2017 10:31 AM, Peter Zijlstra wrote:
> 
> > > But it does that after building the tst-robustpi8 thing, so I seem to
> > > have all I need here.
> > 
> > Great, have fun figuring out what's going on. :-/
> 
> Please test the below, it cures things for me.

Thanks. The glibc testsuite now finishes without any failure.

-- 
Markus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:locking/urgent] futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
  2017-05-19 15:48         ` Peter Zijlstra
  2017-05-19 16:07           ` Peter Zijlstra
  2017-05-19 17:35           ` Markus Trippelsdorf
@ 2017-05-22 20:04           ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-05-22 20:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, bigeasy, dvhart, peterz, markus, fweimer, linux-kernel, mingo, tglx

Commit-ID:  04dc1b2fff4e96cb4142227fbdc63c8871ad4ed9
Gitweb:     http://git.kernel.org/tip/04dc1b2fff4e96cb4142227fbdc63c8871ad4ed9
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 19 May 2017 17:48:50 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 22 May 2017 21:57:18 +0200

futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()

Markus reported that the glibc/nptl/tst-robustpi8 test was failing after
commit:

  cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()")

The following trace shows the problem:

 ld-linux-x86-64-2161  [019] ....   410.760971: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_LOCK_PI
 ld-linux-x86-64-2161  [019] ...1   410.760972: lock_pi_update_atomic: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000875 ret=0
 ld-linux-x86-64-2165  [011] ....   410.760978: SyS_futex: 00007ffbeb76b028: 80000875  op=FUTEX_UNLOCK_PI
 ld-linux-x86-64-2165  [011] d..1   410.760979: do_futex: 00007ffbeb76b028: curval=80000875 uval=80000875 newval=80000871 ret=0
 ld-linux-x86-64-2165  [011] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=0000
 ld-linux-x86-64-2161  [019] ....   410.760980: SyS_futex: 00007ffbeb76b028: 80000871 ret=ETIMEDOUT

Task 2165 does an UNLOCK_PI, assigning the lock to the waiter task 2161
which then returns with -ETIMEDOUT. That wrecks the lock state, because now
the owner isn't aware it acquired the lock and removes the pending robust
list entry.

If 2161 is killed, the robust list will not clear out this futex and the
subsequent acquire on this futex will then (correctly) result in -ESRCH
which is unexpected by glibc, triggers an internal assertion and dies.

Task 2161			Task 2165

rt_mutex_wait_proxy_lock()
   timeout();
   /* T2161 is still queued in  the waiter list */
   return -ETIMEDOUT;

				futex_unlock_pi()
				spin_lock(hb->lock);
				rtmutex_unlock()
				  remove_rtmutex_waiter(T2161);
				   mark_lock_available();
				/* Make the next waiter owner of the user space side */
				futex_uval = 2161;
				spin_unlock(hb->lock);
spin_lock(hb->lock);
rt_mutex_cleanup_proxy_lock()
  if (rtmutex_owner() !== current)
     ...
     return FAIL;
....
return -ETIMEOUT;

This means that rt_mutex_cleanup_proxy_lock() needs to call
try_to_take_rt_mutex() so it can take over the rtmutex correctly which was
assigned by the waker. If the rtmutex is owned by some other task then this
call is harmless and just confirmes that the waiter is not able to acquire
it.

While there, fix what looks like a merge error which resulted in
rt_mutex_cleanup_proxy_lock() having two calls to
fixup_rt_mutex_waiters() and rt_mutex_wait_proxy_lock() not having any.
Both should have one, since both potentially touch the waiter list.

Fixes: 38d589f2fd08 ("futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Bug-Spotted-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Link: http://lkml.kernel.org/r/20170519154850.mlomgdsd26drq5j6@hirez.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/locking/rtmutex.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b955094..28cd09e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1785,12 +1785,14 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 	int ret;
 
 	raw_spin_lock_irq(&lock->wait_lock);
-
-	set_current_state(TASK_INTERRUPTIBLE);
-
 	/* sleep on the mutex */
+	set_current_state(TASK_INTERRUPTIBLE);
 	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
-
+	/*
+	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
+	 * have to fix that up.
+	 */
+	fixup_rt_mutex_waiters(lock);
 	raw_spin_unlock_irq(&lock->wait_lock);
 
 	return ret;
@@ -1822,15 +1824,25 @@ bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 
 	raw_spin_lock_irq(&lock->wait_lock);
 	/*
+	 * Do an unconditional try-lock, this deals with the lock stealing
+	 * state where __rt_mutex_futex_unlock() -> mark_wakeup_next_waiter()
+	 * sets a NULL owner.
+	 *
+	 * We're not interested in the return value, because the subsequent
+	 * test on rt_mutex_owner() will infer that. If the trylock succeeded,
+	 * we will own the lock and it will have removed the waiter. If we
+	 * failed the trylock, we're still not owner and we need to remove
+	 * ourselves.
+	 */
+	try_to_take_rt_mutex(lock, current, waiter);
+	/*
 	 * Unless we're the owner; we're still enqueued on the wait_list.
 	 * So check if we became owner, if not, take us off the wait_list.
 	 */
 	if (rt_mutex_owner(lock) != current) {
 		remove_waiter(lock, waiter);
-		fixup_rt_mutex_waiters(lock);
 		cleanup = true;
 	}
-
 	/*
 	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
 	 * have to fix that up.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-05-22 20:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-17 17:36 commit cfafcd117 "futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()" causes glibc nptl/tst-robustpi8 failure Markus Trippelsdorf
2017-05-17 17:50 ` Florian Weimer
2017-05-18  6:46   ` Peter Zijlstra
2017-05-18  6:57     ` Peter Zijlstra
2017-05-18  7:40 ` Peter Zijlstra
2017-05-18  8:04   ` Markus Trippelsdorf
2017-05-18  8:14     ` Peter Zijlstra
2017-05-18  8:12   ` Florian Weimer
2017-05-18  8:31     ` Peter Zijlstra
2017-05-18  8:34       ` Florian Weimer
2017-05-18  8:41         ` Peter Zijlstra
2017-05-18  8:42           ` Florian Weimer
2017-05-18 11:43         ` Peter Zijlstra
2017-05-18 12:30           ` Peter Zijlstra
2017-05-19 15:48         ` Peter Zijlstra
2017-05-19 16:07           ` Peter Zijlstra
2017-05-19 17:35           ` Markus Trippelsdorf
2017-05-22 20:04           ` [tip:locking/urgent] futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock() tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.