All of lore.kernel.org
 help / color / mirror / Atom feed
* futex_cmpxchg_enabled breakage
@ 2018-08-29 22:22 Rich Felker
  2018-08-30  9:19 ` Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Rich Felker @ 2018-08-29 22:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

I just spent a number of hours helping someone track down a bug that
looks like it's some kind of futex_cmpxchg_enabled detection error on
powerpc64 (still not sure of the root cause; set_robust_list producing
-ENOSYS), and a while back I hit the same problem on sh2 due to lack
of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
(introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
and page 0 is mapped (a bad idea, but maybe someone does it...). And
of course other nommu archs are possibly still broken.

Ultimately from an API perspective, the functionality that depends on
futex_cmpxchg_enabled is non-optional, and the current approach of
treating it as something that can be disabled via detection at runtime
is fragile and wrong.

If there are no archs that support SMP but don't provide their own
asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
SMP), the detection code should just be removed, and the SMP case in
asm-generic/futex.h should be made into #error.

If there are archs that support SMP but don't provide their own
working asm/futex.h, then asm-generic/futex.h's SMP case should be
enhanced to perform a stop-the-world IPI and then do the same thing as
the non-SMP case (disable preemption[/interrupts?], perform the
cmpxchg non-atomically).

Thoughts? Would a patch to do this be acceptable?

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-08-29 22:22 futex_cmpxchg_enabled breakage Rich Felker
@ 2018-08-30  9:19 ` Thomas Gleixner
  2018-08-30 13:55   ` Rich Felker
  2018-08-30 10:39 ` Peter Zijlstra
  2018-09-16 12:16   ` Florian Weimer
  2 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2018-08-30  9:19 UTC (permalink / raw)
  To: Rich Felker; +Cc: LKML, Peter Zijlstra, Ingo Molnar

On Wed, 29 Aug 2018, Rich Felker wrote:

> I just spent a number of hours helping someone track down a bug that
> looks like it's some kind of futex_cmpxchg_enabled detection error on
> powerpc64 (still not sure of the root cause; set_robust_list producing
> -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.

Errm? This does a futex_cmpxchg() on NULL which has to return EFAULT if
it's available. There is nothing fundamentally buggy about it at all.

> Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> and page 0 is mapped (a bad idea, but maybe someone does it...). And
> of course other nommu archs are possibly still broken.

If NULL is mapped in the kernel then a lot of other things are broken. The
futex thing is then the least of your worries.

> Ultimately from an API perspective, the functionality that depends on
> futex_cmpxchg_enabled is non-optional, and the current approach of
> treating it as something that can be disabled via detection at runtime
> is fragile and wrong.

The availibility of the interfaces which depend on futex_cmpxchg_enabled
has been runtime detectable forever and it's documented that way. I have no
idea why you think it's non-optional. If you made it unconditional in your
lib, then it's hardly the kernels problem.

> If there are no archs that support SMP but don't provide their own
> asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
> SMP), the detection code should just be removed, and the SMP case in
> asm-generic/futex.h should be made into #error.

And why so? Just because?

> If there are archs that support SMP but don't provide their own
> working asm/futex.h, then asm-generic/futex.h's SMP case should be
> enhanced to perform a stop-the-world IPI and then do the same thing as
> the non-SMP case (disable preemption[/interrupts?], perform the
> cmpxchg non-atomically).
> 
> Thoughts? Would a patch to do this be acceptable?

No. There is nothing at all in the world which requires that PI futexes and
robust list are provided and even if you implement that hack in the kernel
then the user space side still does not work because the user space part of
those interfaces has a hard dependency on working cmpxchg as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-08-29 22:22 futex_cmpxchg_enabled breakage Rich Felker
  2018-08-30  9:19 ` Thomas Gleixner
@ 2018-08-30 10:39 ` Peter Zijlstra
  2018-09-16 12:16   ` Florian Weimer
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2018-08-30 10:39 UTC (permalink / raw)
  To: Rich Felker; +Cc: Thomas Gleixner, linux-kernel

On Wed, Aug 29, 2018 at 06:22:21PM -0400, Rich Felker wrote:
> If there are archs that support SMP but don't provide their own

Then throw it out the window and drive over it with a tank.

Seriously, an SMP architecture that doesn't have native cmpxchg (or
equivalent) is not worth anybodies time.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-08-30  9:19 ` Thomas Gleixner
@ 2018-08-30 13:55   ` Rich Felker
  2018-09-15 15:34     ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2018-08-30 13:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar

On Thu, Aug 30, 2018 at 11:19:58AM +0200, Thomas Gleixner wrote:
> On Wed, 29 Aug 2018, Rich Felker wrote:
> 
> > I just spent a number of hours helping someone track down a bug that
> > looks like it's some kind of futex_cmpxchg_enabled detection error on
> > powerpc64 (still not sure of the root cause; set_robust_list producing
> > -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> > of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> > (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> > if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> 
> Errm? This does a futex_cmpxchg() on NULL which has to return EFAULT if
> it's available. There is nothing fundamentally buggy about it at all.

I'll let you know when if/when we finish figuring out how this
happened on powerpc64, but it's an arch that most certainly has a
working cmpxchg where these syscalls ended up returning -ENOSYS. This
is an error condition that should not be able to happen. At the very
very least all the archs that actually have a working cmpxchg
unconditionally should be updated with:

	select HAVE_FUTEX_CMPXCHG if FUTEX

so that whatever caused this for powerpc64 doesn't happen again.

> > Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> > and page 0 is mapped (a bad idea, but maybe someone does it...). And
> > of course other nommu archs are possibly still broken.
> 
> If NULL is mapped in the kernel then a lot of other things are broken. The
> futex thing is then the least of your worries.

For nommu NULL is always "mapped".

> > Ultimately from an API perspective, the functionality that depends on
> > futex_cmpxchg_enabled is non-optional, and the current approach of
> > treating it as something that can be disabled via detection at runtime
> > is fragile and wrong.
> 
> The availibility of the interfaces which depend on futex_cmpxchg_enabled
> has been runtime detectable forever and it's documented that way. I have no
> idea why you think it's non-optional. If you made it unconditional in your
> lib, then it's hardly the kernels problem.

Modern glibc also defines:

#define __ASSUME_SET_ROBUST_LIST        1

It's then #undef'd for some archs (mips, m68k, sparc, arm), but not
all archs that lack HAVE_FUTEX_CMPXCHG, so glibc seems to be assuming
set_robust_list actually works on some where the kernel is treating
failure as a possibility.

> > If there are no archs that support SMP but don't provide their own
> > asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
> > SMP), the detection code should just be removed, and the SMP case in
> > asm-generic/futex.h should be made into #error.
> 
> And why so? Just because?

To prevent inadvertent introduction of this issue on new archs (if the
porters don't realize asm-generic/futex.h doesn't actually work for
SMP).

> > If there are archs that support SMP but don't provide their own
> > working asm/futex.h, then asm-generic/futex.h's SMP case should be
> > enhanced to perform a stop-the-world IPI and then do the same thing as
> > the non-SMP case (disable preemption[/interrupts?], perform the
> > cmpxchg non-atomically).
> > 
> > Thoughts? Would a patch to do this be acceptable?
> 
> No. There is nothing at all in the world which requires that PI futexes and
> robust list are provided and even if you implement that hack in the kernel
> then the user space side still does not work because the user space part of
> those interfaces has a hard dependency on working cmpxchg as well.

Of course there's a working cmpxchg; this is a hard requirement for
userspace. You cannot implement pthread primitives without one. On
archs/ISA-levels that lack an insn it's already provided as a syscall,
a vdso/khelper, or trap-and-emulate. The problem I'm complaining about
here is that, despite already needing and having working ways to
achieve this, the kernel is not using them, and breaking functionality
that should work.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-08-30 13:55   ` Rich Felker
@ 2018-09-15 15:34     ` Thomas Gleixner
  2018-09-15 16:15       ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2018-09-15 15:34 UTC (permalink / raw)
  To: Rich Felker; +Cc: LKML, Peter Zijlstra, Ingo Molnar

On Thu, 30 Aug 2018, Rich Felker wrote:
> On Thu, Aug 30, 2018 at 11:19:58AM +0200, Thomas Gleixner wrote:
> > On Wed, 29 Aug 2018, Rich Felker wrote:
> > 
> > > I just spent a number of hours helping someone track down a bug that
> > > looks like it's some kind of futex_cmpxchg_enabled detection error on
> > > powerpc64 (still not sure of the root cause; set_robust_list producing
> > > -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> > > of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> > > (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> > > if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> > 
> > Errm? This does a futex_cmpxchg() on NULL which has to return EFAULT if
> > it's available. There is nothing fundamentally buggy about it at all.
> 
> I'll let you know when if/when we finish figuring out how this
> happened on powerpc64, but it's an arch that most certainly has a
> working cmpxchg where these syscalls ended up returning -ENOSYS. This
> is an error condition that should not be able to happen. At the very
> very least all the archs that actually have a working cmpxchg
> unconditionally should be updated with:
> 
> 	select HAVE_FUTEX_CMPXCHG if FUTEX
> 
> so that whatever caused this for powerpc64 doesn't happen again.

By doing that you paper over a non functional fixup which could cause other
really hard to decode runtime failures. If that fails there is a bug
somewhere else and that runtime check is not to blame at all for it.

> > > Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> > > and page 0 is mapped (a bad idea, but maybe someone does it...). And
> > > of course other nommu archs are possibly still broken.
> > 
> > If NULL is mapped in the kernel then a lot of other things are broken. The
> > futex thing is then the least of your worries.
> 
> For nommu NULL is always "mapped".

Cool, so you can have a NULL pointer dereference without noticing it.

> > The availibility of the interfaces which depend on futex_cmpxchg_enabled
> > has been runtime detectable forever and it's documented that way. I have no
> > idea why you think it's non-optional. If you made it unconditional in your
> > lib, then it's hardly the kernels problem.
> 
> Modern glibc also defines:
> 
> #define __ASSUME_SET_ROBUST_LIST        1
> 
> It's then #undef'd for some archs (mips, m68k, sparc, arm), but not
> all archs that lack HAVE_FUTEX_CMPXCHG, so glibc seems to be assuming
> set_robust_list actually works on some where the kernel is treating
> failure as a possibility.

That's hardly a kernel issue, right?

> > > If there are no archs that support SMP but don't provide their own
> > > asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
> > > SMP), the detection code should just be removed, and the SMP case in
> > > asm-generic/futex.h should be made into #error.
> > 
> > And why so? Just because?
> 
> To prevent inadvertent introduction of this issue on new archs (if the
> porters don't realize asm-generic/futex.h doesn't actually work for
> SMP).
> 
> > > If there are archs that support SMP but don't provide their own
> > > working asm/futex.h, then asm-generic/futex.h's SMP case should be
> > > enhanced to perform a stop-the-world IPI and then do the same thing as
> > > the non-SMP case (disable preemption[/interrupts?], perform the
> > > cmpxchg non-atomically).
> > > 
> > > Thoughts? Would a patch to do this be acceptable?
> > 
> > No. There is nothing at all in the world which requires that PI futexes and
> > robust list are provided and even if you implement that hack in the kernel
> > then the user space side still does not work because the user space part of
> > those interfaces has a hard dependency on working cmpxchg as well.
> 
> Of course there's a working cmpxchg; this is a hard requirement for
> userspace. You cannot implement pthread primitives without one. On
> archs/ISA-levels that lack an insn it's already provided as a syscall,
> a vdso/khelper, or trap-and-emulate. The problem I'm complaining about
> here is that, despite already needing and having working ways to
> achieve this, the kernel is not using them, and breaking functionality
> that should work.

I kinda agree for the nommu case, but for power64 you are barking up the
wrong tree. If that check fails something very fundamentaly is broken and
that breakage is _NOT_ in the futex code. This has to work independent of
the futex code, really.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-09-15 15:34     ` Thomas Gleixner
@ 2018-09-15 16:15       ` Rich Felker
  2018-09-15 16:58         ` Rich Felker
  2018-09-15 18:39         ` Thomas Gleixner
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2018-09-15 16:15 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar

On Sat, Sep 15, 2018 at 05:34:24PM +0200, Thomas Gleixner wrote:
> On Thu, 30 Aug 2018, Rich Felker wrote:
> > On Thu, Aug 30, 2018 at 11:19:58AM +0200, Thomas Gleixner wrote:
> > > On Wed, 29 Aug 2018, Rich Felker wrote:
> > > 
> > > > I just spent a number of hours helping someone track down a bug that
> > > > looks like it's some kind of futex_cmpxchg_enabled detection error on
> > > > powerpc64 (still not sure of the root cause; set_robust_list producing
> > > > -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> > > > of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> > > > (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> > > > if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> > > 
> > > Errm? This does a futex_cmpxchg() on NULL which has to return EFAULT if
> > > it's available. There is nothing fundamentally buggy about it at all.
> > 
> > I'll let you know when if/when we finish figuring out how this
> > happened on powerpc64, but it's an arch that most certainly has a
> > working cmpxchg where these syscalls ended up returning -ENOSYS. This
> > is an error condition that should not be able to happen. At the very
> > very least all the archs that actually have a working cmpxchg
> > unconditionally should be updated with:
> > 
> > 	select HAVE_FUTEX_CMPXCHG if FUTEX
> > 
> > so that whatever caused this for powerpc64 doesn't happen again.
> 
> By doing that you paper over a non functional fixup which could cause other
> really hard to decode runtime failures. If that fails there is a bug
> somewhere else and that runtime check is not to blame at all for it.

The bug, if it's a bug (lack of memory protection on a system that
explicitly does not have memory protection is not a bug), is unrelated
to futexes, so having futex functionality break is not a very
intuitive way of leading people to find the bug. If kernel developers
really want to consider 0-address-is-mapped a bug (conditional on
CONFIG_MMU) there should be some test separate from futex that does
BUG_ON(attempt to access *0 != EFAULT).

> > > > Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> > > > and page 0 is mapped (a bad idea, but maybe someone does it...). And
> > > > of course other nommu archs are possibly still broken.
> > > 
> > > If NULL is mapped in the kernel then a lot of other things are broken. The
> > > futex thing is then the least of your worries.
> > 
> > For nommu NULL is always "mapped".
> 
> Cool, so you can have a NULL pointer dereference without noticing it.

Yes, you can also clobber arbitrary kernel memory from userspace. Fun, eh?

> > > The availibility of the interfaces which depend on futex_cmpxchg_enabled
> > > has been runtime detectable forever and it's documented that way. I have no
> > > idea why you think it's non-optional. If you made it unconditional in your
> > > lib, then it's hardly the kernels problem.
> > 
> > Modern glibc also defines:
> > 
> > #define __ASSUME_SET_ROBUST_LIST        1
> > 
> > It's then #undef'd for some archs (mips, m68k, sparc, arm), but not
> > all archs that lack HAVE_FUTEX_CMPXCHG, so glibc seems to be assuming
> > set_robust_list actually works on some where the kernel is treating
> > failure as a possibility.
> 
> That's hardly a kernel issue, right?

It's an issue of mismatched assumptions about the contract of these
interfaces.

> > > > If there are no archs that support SMP but don't provide their own
> > > > asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
> > > > SMP), the detection code should just be removed, and the SMP case in
> > > > asm-generic/futex.h should be made into #error.
> > > 
> > > And why so? Just because?
> > 
> > To prevent inadvertent introduction of this issue on new archs (if the
> > porters don't realize asm-generic/futex.h doesn't actually work for
> > SMP).
> > 
> > > > If there are archs that support SMP but don't provide their own
> > > > working asm/futex.h, then asm-generic/futex.h's SMP case should be
> > > > enhanced to perform a stop-the-world IPI and then do the same thing as
> > > > the non-SMP case (disable preemption[/interrupts?], perform the
> > > > cmpxchg non-atomically).
> > > > 
> > > > Thoughts? Would a patch to do this be acceptable?
> > > 
> > > No. There is nothing at all in the world which requires that PI futexes and
> > > robust list are provided and even if you implement that hack in the kernel
> > > then the user space side still does not work because the user space part of
> > > those interfaces has a hard dependency on working cmpxchg as well.
> > 
> > Of course there's a working cmpxchg; this is a hard requirement for
> > userspace. You cannot implement pthread primitives without one. On
> > archs/ISA-levels that lack an insn it's already provided as a syscall,
> > a vdso/khelper, or trap-and-emulate. The problem I'm complaining about
> > here is that, despite already needing and having working ways to
> > achieve this, the kernel is not using them, and breaking functionality
> > that should work.
> 
> I kinda agree for the nommu case, but for power64 you are barking up the
> wrong tree. If that check fails something very fundamentaly is broken and
> that breakage is _NOT_ in the futex code. This has to work independent of
> the futex code, really.

I agree, but it would have been nice for that to be caught by BUG_ON,
not subtle failures in robust mutex functionality that would go
completely unnoticed in production until something broke horribly if
not for the affected user having been running conformance tests.

For musl I will probably add code to pthread_mutexattr_setrobust that
probes whether SYS_get_robust_list fails, and errors out on attempts
to produce an attribute object with the robust flag set if the kernel
cannot support it, since there is no way to safely make forward
progress if the SYS_set_robust_list call fails later. But I was not
aware of the possibility of failure here, and apparently the glibc
folks were not either.

In any case, morally it "should not" be able to fail -- some sort of
working cmpxchg is needed all over the place, at least in userspace
and likely also in kernelspace, and if the cpu lacks a direct way of
implementing it, whatever form of emulation is used elsewhere should
be used here.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-09-15 16:15       ` Rich Felker
@ 2018-09-15 16:58         ` Rich Felker
  2018-09-15 18:39         ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Rich Felker @ 2018-09-15 16:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, linux-man,
	Michael Kerrisk (man-pages)

On Sat, Sep 15, 2018 at 12:15:54PM -0400, Rich Felker wrote:
> On Sat, Sep 15, 2018 at 05:34:24PM +0200, Thomas Gleixner wrote:
> > On Thu, 30 Aug 2018, Rich Felker wrote:
> > > On Thu, Aug 30, 2018 at 11:19:58AM +0200, Thomas Gleixner wrote:
> > > > On Wed, 29 Aug 2018, Rich Felker wrote:
> > > > 
> > > > > I just spent a number of hours helping someone track down a bug that
> > > > > looks like it's some kind of futex_cmpxchg_enabled detection error on
> > > > > powerpc64 (still not sure of the root cause; set_robust_list producing
> > > > > -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> > > > > of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> > > > > (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> > > > > if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> > > > 
> > > > Errm? This does a futex_cmpxchg() on NULL which has to return EFAULT if
> > > > it's available. There is nothing fundamentally buggy about it at all.
> > > 
> > > I'll let you know when if/when we finish figuring out how this
> > > happened on powerpc64, but it's an arch that most certainly has a
> > > working cmpxchg where these syscalls ended up returning -ENOSYS. This
> > > is an error condition that should not be able to happen. At the very
> > > very least all the archs that actually have a working cmpxchg
> > > unconditionally should be updated with:
> > > 
> > > 	select HAVE_FUTEX_CMPXCHG if FUTEX
> > > 
> > > so that whatever caused this for powerpc64 doesn't happen again.
> > 
> > By doing that you paper over a non functional fixup which could cause other
> > really hard to decode runtime failures. If that fails there is a bug
> > somewhere else and that runtime check is not to blame at all for it.
> 
> The bug, if it's a bug (lack of memory protection on a system that
> explicitly does not have memory protection is not a bug), is unrelated
> to futexes, so having futex functionality break is not a very
> intuitive way of leading people to find the bug. If kernel developers
> really want to consider 0-address-is-mapped a bug (conditional on
> CONFIG_MMU) there should be some test separate from futex that does
> BUG_ON(attempt to access *0 != EFAULT).
> 
> > > > > Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> > > > > and page 0 is mapped (a bad idea, but maybe someone does it...). And
> > > > > of course other nommu archs are possibly still broken.
> > > > 
> > > > If NULL is mapped in the kernel then a lot of other things are broken. The
> > > > futex thing is then the least of your worries.
> > > 
> > > For nommu NULL is always "mapped".
> > 
> > Cool, so you can have a NULL pointer dereference without noticing it.
> 
> Yes, you can also clobber arbitrary kernel memory from userspace. Fun, eh?
> 
> > > > The availibility of the interfaces which depend on futex_cmpxchg_enabled
> > > > has been runtime detectable forever and it's documented that way. I have no
> > > > idea why you think it's non-optional. If you made it unconditional in your
> > > > lib, then it's hardly the kernels problem.
> > > 
> > > Modern glibc also defines:
> > > 
> > > #define __ASSUME_SET_ROBUST_LIST        1
> > > 
> > > It's then #undef'd for some archs (mips, m68k, sparc, arm), but not
> > > all archs that lack HAVE_FUTEX_CMPXCHG, so glibc seems to be assuming
> > > set_robust_list actually works on some where the kernel is treating
> > > failure as a possibility.
> > 
> > That's hardly a kernel issue, right?
> 
> It's an issue of mismatched assumptions about the contract of these
> interfaces.

Also, the man pages do not document ENOSYS as a possible error (of
course it's implicitly possible for old kernels that don't support the
syscall, but this is different IMO). See:

http://man7.org/linux/man-pages/man2/get_robust_list.2.html

Cc'ing linux-man and Michael Kerrisk since this should probably be
fixed in the documentation if the kernel behavior is not going to be
changed (and even if it does change, the behavior on old kernels
should probably be documented).

Rich



> > > > > If there are no archs that support SMP but don't provide their own
> > > > > asm/futex.h (as opposed to the asm-generic one that does -ENOSYS on
> > > > > SMP), the detection code should just be removed, and the SMP case in
> > > > > asm-generic/futex.h should be made into #error.
> > > > 
> > > > And why so? Just because?
> > > 
> > > To prevent inadvertent introduction of this issue on new archs (if the
> > > porters don't realize asm-generic/futex.h doesn't actually work for
> > > SMP).
> > > 
> > > > > If there are archs that support SMP but don't provide their own
> > > > > working asm/futex.h, then asm-generic/futex.h's SMP case should be
> > > > > enhanced to perform a stop-the-world IPI and then do the same thing as
> > > > > the non-SMP case (disable preemption[/interrupts?], perform the
> > > > > cmpxchg non-atomically).
> > > > > 
> > > > > Thoughts? Would a patch to do this be acceptable?
> > > > 
> > > > No. There is nothing at all in the world which requires that PI futexes and
> > > > robust list are provided and even if you implement that hack in the kernel
> > > > then the user space side still does not work because the user space part of
> > > > those interfaces has a hard dependency on working cmpxchg as well.
> > > 
> > > Of course there's a working cmpxchg; this is a hard requirement for
> > > userspace. You cannot implement pthread primitives without one. On
> > > archs/ISA-levels that lack an insn it's already provided as a syscall,
> > > a vdso/khelper, or trap-and-emulate. The problem I'm complaining about
> > > here is that, despite already needing and having working ways to
> > > achieve this, the kernel is not using them, and breaking functionality
> > > that should work.
> > 
> > I kinda agree for the nommu case, but for power64 you are barking up the
> > wrong tree. If that check fails something very fundamentaly is broken and
> > that breakage is _NOT_ in the futex code. This has to work independent of
> > the futex code, really.
> 
> I agree, but it would have been nice for that to be caught by BUG_ON,
> not subtle failures in robust mutex functionality that would go
> completely unnoticed in production until something broke horribly if
> not for the affected user having been running conformance tests.
> 
> For musl I will probably add code to pthread_mutexattr_setrobust that
> probes whether SYS_get_robust_list fails, and errors out on attempts
> to produce an attribute object with the robust flag set if the kernel
> cannot support it, since there is no way to safely make forward
> progress if the SYS_set_robust_list call fails later. But I was not
> aware of the possibility of failure here, and apparently the glibc
> folks were not either.
> 
> In any case, morally it "should not" be able to fail -- some sort of
> working cmpxchg is needed all over the place, at least in userspace
> and likely also in kernelspace, and if the cpu lacks a direct way of
> implementing it, whatever form of emulation is used elsewhere should
> be used here.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-09-15 16:15       ` Rich Felker
  2018-09-15 16:58         ` Rich Felker
@ 2018-09-15 18:39         ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2018-09-15 18:39 UTC (permalink / raw)
  To: Rich Felker; +Cc: LKML, Peter Zijlstra, Ingo Molnar

On Sat, 15 Sep 2018, Rich Felker wrote:
> On Sat, Sep 15, 2018 at 05:34:24PM +0200, Thomas Gleixner wrote:
> > By doing that you paper over a non functional fixup which could cause other
> > really hard to decode runtime failures. If that fails there is a bug
> > somewhere else and that runtime check is not to blame at all for it.
> 
> The bug, if it's a bug (lack of memory protection on a system that
> explicitly does not have memory protection is not a bug), is unrelated
> to futexes, so having futex functionality break is not a very
> intuitive way of leading people to find the bug. If kernel developers
> really want to consider 0-address-is-mapped a bug (conditional on
> CONFIG_MMU) there should be some test separate from futex that does
> BUG_ON(attempt to access *0 != EFAULT).

Fair enough.

> > Cool, so you can have a NULL pointer dereference without noticing it.
> 
> Yes, you can also clobber arbitrary kernel memory from userspace. Fun, eh?

I was assuming that the mmu-less CPUs Linux runs on at least have a MPU
nowadays.

> > That's hardly a kernel issue, right?
> 
> It's an issue of mismatched assumptions about the contract of these
> interfaces.

ENOSYS is a general contract.

> > I kinda agree for the nommu case, but for power64 you are barking up the
> > wrong tree. If that check fails something very fundamentaly is broken and
> > that breakage is _NOT_ in the futex code. This has to work independent of
> > the futex code, really.
> 
> I agree, but it would have been nice for that to be caught by BUG_ON,
> not subtle failures in robust mutex functionality that would go
> completely unnoticed in production until something broke horribly if
> not for the affected user having been running conformance tests.
> 
> For musl I will probably add code to pthread_mutexattr_setrobust that
> probes whether SYS_get_robust_list fails, and errors out on attempts
> to produce an attribute object with the robust flag set if the kernel
> cannot support it, since there is no way to safely make forward
> progress if the SYS_set_robust_list call fails later. But I was not
> aware of the possibility of failure here, and apparently the glibc
> folks were not either.

They surely were when robust list and PI futex were introduced. That check
got removed somewhere on the way for whatever reason.

> In any case, morally it "should not" be able to fail -- some sort of
> working cmpxchg is needed all over the place, at least in userspace
> and likely also in kernelspace, and if the cpu lacks a direct way of
> implementing it, whatever form of emulation is used elsewhere should
> be used here.

There is some history behind this test. In the early days when robust
futexes and PI futexes were introduced we still had 386 around which did
not have cmpxchg and there was no real requirement to emulate it. Other
archictures like MIPS followed that and aside of the NOMMU/NOMPU case it
simply should just work.

We surely can revisit this on technical grounds, but I really fail to see
any moral issues with a syscall returning -ENOSYS. Futexes can be compiled
out completely and futex_pi is conditional as well.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-08-29 22:22 futex_cmpxchg_enabled breakage Rich Felker
@ 2018-09-16 12:16   ` Florian Weimer
  2018-08-30 10:39 ` Peter Zijlstra
  2018-09-16 12:16   ` Florian Weimer
  2 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2018-09-16 12:16 UTC (permalink / raw)
  To: Rich Felker
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	linux-man, Michael Kerrisk (man-pages)

* Rich Felker:

> I just spent a number of hours helping someone track down a bug that
> looks like it's some kind of futex_cmpxchg_enabled detection error on
> powerpc64 (still not sure of the root cause; set_robust_list producing
> -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> and page 0 is mapped (a bad idea, but maybe someone does it...). And
> of course other nommu archs are possibly still broken.

Maybe it was related to this (“Kernel 4.15 lost set_robust_list
support on POWER 9”):

<https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-February/168570.html>

The Kconfig change you suggest was explicitly rejected as the fix.

I believe the expected userspace interface is that you probe support
with set_robust_list first, and then start using the relevant futex
interfaces only if that call succeeded.  If you do that, most parts of
a typical system will work as expected even if the kernel support is
not there, which is a bit surprising.  It definitely makes the root
cause harder to spot.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
@ 2018-09-16 12:16   ` Florian Weimer
  0 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2018-09-16 12:16 UTC (permalink / raw)
  To: Rich Felker
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	linux-man, Michael Kerrisk (man-pages)

* Rich Felker:

> I just spent a number of hours helping someone track down a bug that
> looks like it's some kind of futex_cmpxchg_enabled detection error on
> powerpc64 (still not sure of the root cause; set_robust_list producing
> -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> and page 0 is mapped (a bad idea, but maybe someone does it...). And
> of course other nommu archs are possibly still broken.

Maybe it was related to this (“Kernel 4.15 lost set_robust_list
support on POWER 9”):

<https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-February/168570.html>

The Kconfig change you suggest was explicitly rejected as the fix.

I believe the expected userspace interface is that you probe support
with set_robust_list first, and then start using the relevant futex
interfaces only if that call succeeded.  If you do that, most parts of
a typical system will work as expected even if the kernel support is
not there, which is a bit surprising.  It definitely makes the root
cause harder to spot.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-09-16 12:16   ` Florian Weimer
  (?)
@ 2018-09-16 13:16   ` Rich Felker
  2018-09-16 13:38       ` Florian Weimer
  -1 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2018-09-16 13:16 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	linux-man, Michael Kerrisk (man-pages)

On Sun, Sep 16, 2018 at 02:16:25PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > I just spent a number of hours helping someone track down a bug that
> > looks like it's some kind of futex_cmpxchg_enabled detection error on
> > powerpc64 (still not sure of the root cause; set_robust_list producing
> > -ENOSYS), and a while back I hit the same problem on sh2 due to lack
> > of EFAULT on nommu, leading to commit 72cc564f16ca. I think the test
> > (introduced way back in commit a0c1e9073ef7) is fundamentally buggy;
> > if anything, it should be checking for !=-ENOSYS, not ==-EFAULT.
> > Presumably it could also fail to produce -EFAULT if mmap_min_addr is 0
> > and page 0 is mapped (a bad idea, but maybe someone does it...). And
> > of course other nommu archs are possibly still broken.
> 
> Maybe it was related to this (“Kernel 4.15 lost set_robust_list
> support on POWER 9”):
> 
> <https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-February/168570.html>

Thanks! This is really helpful; I'll pass it along.

> The Kconfig change you suggest was explicitly rejected as the fix.

Even if it was rejected as a fix for this specific bug, I think it
would be a good change for other reasons. If nothing else it reduces
kernel code size and eliminates a few instructions in some (albeit
long) hot paths.

> I believe the expected userspace interface is that you probe support
> with set_robust_list first, and then start using the relevant futex
> interfaces only if that call succeeded.

In order for it to work, set_robust_list needs to succeed for all
threads, present and future, so there's an implicit contract needed
here that, if it succeeds once, it needs to always succeed. This is
satisfied by the kernel implementation. FWIW I adopted a fix using
get_robust_list to probe just to avoid coupling the internals of
setting up the list with the bland function
pthread_mutexattr_setrobust which is unaware of any implementation
details except the location of the attribute bit.

Presumably a similar probing should happen in
pthread_mutexattr_setprotocol for PI mutex support. Does glibc do
this? musl still lacks PI mutex support so I'll save this as a note
for when it's added.

> If you do that, most parts of
> a typical system will work as expected even if the kernel support is
> not there, which is a bit surprising.  It definitely makes the root
> cause harder to spot.

I don't follow here. "most parts of a typical system will work as
expected" seems to be the case whether you do or don't correctly
probe. The only difference is whether a program that carefully checks
for errors will see and report that pthread_mutexattr_setrobust
failed.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-09-16 13:16   ` Rich Felker
@ 2018-09-16 13:38       ` Florian Weimer
  0 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2018-09-16 13:38 UTC (permalink / raw)
  To: Rich Felker
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	linux-man, Michael Kerrisk (man-pages)

* Rich Felker:

>> I believe the expected userspace interface is that you probe support
>> with set_robust_list first, and then start using the relevant futex
>> interfaces only if that call succeeded.
>
> In order for it to work, set_robust_list needs to succeed for all
> threads, present and future, so there's an implicit contract needed
> here that, if it succeeds once, it needs to always succeed. This is
> satisfied by the kernel implementation.

It certainly makes simpler if set_robust_list cannot fail due to
resource allocation issues.

> Presumably a similar probing should happen in
> pthread_mutexattr_setprotocol for PI mutex support. Does glibc do
> this? musl still lacks PI mutex support so I'll save this as a note
> for when it's added.

glibc currently implements checking for support in pthread_mutex_init,
presumably due to the fact that some invalid attribute/flag
combinations can only reasonably detected at that point.  It makes
probing for support slightly more difficult, of course.

>> If you do that, most parts of
>> a typical system will work as expected even if the kernel support is
>> not there, which is a bit surprising.  It definitely makes the root
>> cause harder to spot.
>
> I don't follow here. "most parts of a typical system will work as
> expected" seems to be the case whether you do or don't correctly
> probe. The only difference is whether a program that carefully checks
> for errors will see and report that pthread_mutexattr_setrobust
> failed.

This may be the case.  We only ever had the glibc test failures as
evidence that something was quite wrong, despite ongoing validation of
the system.  But this could have been accident due to an invalid test
environment.  (The product in question is only supposed to support the
radix MMU, but when running under KVM, the kernel switches to the hash
MMU instead, which masks the presence of the bug—set_robust_list is
magically available again.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
@ 2018-09-16 13:38       ` Florian Weimer
  0 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2018-09-16 13:38 UTC (permalink / raw)
  To: Rich Felker
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	linux-man, Michael Kerrisk (man-pages)

* Rich Felker:

>> I believe the expected userspace interface is that you probe support
>> with set_robust_list first, and then start using the relevant futex
>> interfaces only if that call succeeded.
>
> In order for it to work, set_robust_list needs to succeed for all
> threads, present and future, so there's an implicit contract needed
> here that, if it succeeds once, it needs to always succeed. This is
> satisfied by the kernel implementation.

It certainly makes simpler if set_robust_list cannot fail due to
resource allocation issues.

> Presumably a similar probing should happen in
> pthread_mutexattr_setprotocol for PI mutex support. Does glibc do
> this? musl still lacks PI mutex support so I'll save this as a note
> for when it's added.

glibc currently implements checking for support in pthread_mutex_init,
presumably due to the fact that some invalid attribute/flag
combinations can only reasonably detected at that point.  It makes
probing for support slightly more difficult, of course.

>> If you do that, most parts of
>> a typical system will work as expected even if the kernel support is
>> not there, which is a bit surprising.  It definitely makes the root
>> cause harder to spot.
>
> I don't follow here. "most parts of a typical system will work as
> expected" seems to be the case whether you do or don't correctly
> probe. The only difference is whether a program that carefully checks
> for errors will see and report that pthread_mutexattr_setrobust
> failed.

This may be the case.  We only ever had the glibc test failures as
evidence that something was quite wrong, despite ongoing validation of
the system.  But this could have been accident due to an invalid test
environment.  (The product in question is only supposed to support the
radix MMU, but when running under KVM, the kernel switches to the hash
MMU instead, which masks the presence of the bug—set_robust_list is
magically available again.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: futex_cmpxchg_enabled breakage
  2018-09-16 13:38       ` Florian Weimer
  (?)
@ 2018-09-17 16:51       ` Rich Felker
  -1 siblings, 0 replies; 14+ messages in thread
From: Rich Felker @ 2018-09-17 16:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	linux-man, Michael Kerrisk (man-pages)

On Sun, Sep 16, 2018 at 03:38:44PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> I believe the expected userspace interface is that you probe support
> >> with set_robust_list first, and then start using the relevant futex
> >> interfaces only if that call succeeded.
> >
> > In order for it to work, set_robust_list needs to succeed for all
> > threads, present and future, so there's an implicit contract needed
> > here that, if it succeeds once, it needs to always succeed. This is
> > satisfied by the kernel implementation.
> 
> It certainly makes simpler if set_robust_list cannot fail due to
> resource allocation issues.
> 
> > Presumably a similar probing should happen in
> > pthread_mutexattr_setprotocol for PI mutex support. Does glibc do
> > this? musl still lacks PI mutex support so I'll save this as a note
> > for when it's added.
> 
> glibc currently implements checking for support in pthread_mutex_init,
> presumably due to the fact that some invalid attribute/flag
> combinations can only reasonably detected at that point.  It makes
> probing for support slightly more difficult, of course.
> 
> >> If you do that, most parts of
> >> a typical system will work as expected even if the kernel support is
> >> not there, which is a bit surprising.  It definitely makes the root
> >> cause harder to spot.
> >
> > I don't follow here. "most parts of a typical system will work as
> > expected" seems to be the case whether you do or don't correctly
> > probe. The only difference is whether a program that carefully checks
> > for errors will see and report that pthread_mutexattr_setrobust
> > failed.
> 
> This may be the case.  We only ever had the glibc test failures as
> evidence that something was quite wrong, despite ongoing validation of
> the system.  But this could have been accident due to an invalid test
> environment.  (The product in question is only supposed to support the
> radix MMU, but when running under KVM, the kernel switches to the hash
> MMU instead, which masks the presence of the bug—set_robust_list is
> magically available again.)

BTW here's a horrible thought: can the availability of set_robust_list
change across checkpoint/restore? If so that's fundamental breakage in
the checkpoint/restore functionality, and a good reason to make it so
this functionality is not runtime-variable for a given kernel.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-09-17 16:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-29 22:22 futex_cmpxchg_enabled breakage Rich Felker
2018-08-30  9:19 ` Thomas Gleixner
2018-08-30 13:55   ` Rich Felker
2018-09-15 15:34     ` Thomas Gleixner
2018-09-15 16:15       ` Rich Felker
2018-09-15 16:58         ` Rich Felker
2018-09-15 18:39         ` Thomas Gleixner
2018-08-30 10:39 ` Peter Zijlstra
2018-09-16 12:16 ` Florian Weimer
2018-09-16 12:16   ` Florian Weimer
2018-09-16 13:16   ` Rich Felker
2018-09-16 13:38     ` Florian Weimer
2018-09-16 13:38       ` Florian Weimer
2018-09-17 16:51       ` Rich Felker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.