linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pthread_kill.3: Update to match POSIX.
@ 2019-11-12 20:36 enh
  2019-11-12 21:38 ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2019-11-12 20:36 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: linux-man

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

POSIX removed ESRCH years ago.

In resolving http://austingroupbugs.net/view.php?id=1214 it was made
clear that callers can't rely on using signal 0 to test for the
continued existence of a thread. Update the man page to make it clearer
that this doesn't generally work (even if it sometimes seems to).

See also the long explanation of why this is the case (and how to fix
your code) here:

https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
---
 man3/pthread_kill.3 | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/man3/pthread_kill.3 b/man3/pthread_kill.3
index e70e2669e..fb27afd24 100644
--- a/man3/pthread_kill.3
+++ b/man3/pthread_kill.3
@@ -56,10 +56,6 @@ to
 a thread in the same process as the caller.
 The signal is asynchronously directed to
 .IR thread .
-.PP
-If
-.I sig
-is 0, then no signal is sent, but error checking is still performed.
 .SH RETURN VALUE
 On success,
 .BR pthread_kill ()
@@ -93,26 +89,23 @@ this action will affect the whole process.
 .PP
 The glibc implementation of
 .BR pthread_kill ()
-gives an error
-.RB ( EINVAL )
+gives the error
+.B EINVAL
 on attempts to send either of the real-time signals
 used internally by the NPTL threading implementation.
 See
 .BR nptl (7)
 for details.
 .PP
-POSIX.1-2008 recommends that if an implementation detects the use
-of a thread ID after the end of its lifetime,
+The glibc implementation of
 .BR pthread_kill ()
-should return the error
-.BR ESRCH .
-The glibc implementation returns this error in the cases where
-an invalid thread ID can be detected.
-But note also that POSIX says that an attempt to use a thread ID whose
-lifetime has ended produces undefined behavior,
-and an attempt to use an invalid thread ID in a call to
+tries to give the error
+.B ESRCH
+on attempts to use an invalid thread ID, but this isn't always possible.
+An attempt to use an invalid thread ID in a call to
 .BR pthread_kill ()
-can, for example, cause a segmentation fault.
+can, for example, cause a segmentation fault. Android is stricter, and will
+always abort when a pthread function is given an invalid thread ID.
 .SH SEE ALSO
 .BR kill (2),
 .BR sigaction (2),
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog

[-- Attachment #2: 0001-pthread_kill.3-Update-to-match-POSIX.patch --]
[-- Type: text/x-patch, Size: 2484 bytes --]

From 656249792d1782e2d8ca581663ee88b19181b084 Mon Sep 17 00:00:00 2001
From: Elliott Hughes <enh@google.com>
Date: Tue, 12 Nov 2019 12:19:52 -0800
Subject: [PATCH] pthread_kill.3: Update to match POSIX.

POSIX removed ESRCH years ago.

In resolving http://austingroupbugs.net/view.php?id=1214 it was made
clear that callers can't rely on using signal 0 to test for the
continued existence of a thread. Update the man page to make it clearer
that this doesn't generally work (even if it sometimes seems to).

See also the long explanation of why this is the case (and how to fix
your code) here:

https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
---
 man3/pthread_kill.3 | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/man3/pthread_kill.3 b/man3/pthread_kill.3
index e70e2669e..fb27afd24 100644
--- a/man3/pthread_kill.3
+++ b/man3/pthread_kill.3
@@ -56,10 +56,6 @@ to
 a thread in the same process as the caller.
 The signal is asynchronously directed to
 .IR thread .
-.PP
-If
-.I sig
-is 0, then no signal is sent, but error checking is still performed.
 .SH RETURN VALUE
 On success,
 .BR pthread_kill ()
@@ -93,26 +89,23 @@ this action will affect the whole process.
 .PP
 The glibc implementation of
 .BR pthread_kill ()
-gives an error
-.RB ( EINVAL )
+gives the error
+.B EINVAL
 on attempts to send either of the real-time signals
 used internally by the NPTL threading implementation.
 See
 .BR nptl (7)
 for details.
 .PP
-POSIX.1-2008 recommends that if an implementation detects the use
-of a thread ID after the end of its lifetime,
+The glibc implementation of
 .BR pthread_kill ()
-should return the error
-.BR ESRCH .
-The glibc implementation returns this error in the cases where
-an invalid thread ID can be detected.
-But note also that POSIX says that an attempt to use a thread ID whose
-lifetime has ended produces undefined behavior,
-and an attempt to use an invalid thread ID in a call to
+tries to give the error
+.B ESRCH
+on attempts to use an invalid thread ID, but this isn't always possible.
+An attempt to use an invalid thread ID in a call to
 .BR pthread_kill ()
-can, for example, cause a segmentation fault.
+can, for example, cause a segmentation fault. Android is stricter, and will
+always abort when a pthread function is given an invalid thread ID.
 .SH SEE ALSO
 .BR kill (2),
 .BR sigaction (2),
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 20:36 [PATCH] pthread_kill.3: Update to match POSIX enh
@ 2019-11-12 21:38 ` Florian Weimer
  2019-11-12 21:40   ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-11-12 21:38 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

* enh:

> POSIX removed ESRCH years ago.
>
> In resolving http://austingroupbugs.net/view.php?id=1214 it was made
> clear that callers can't rely on using signal 0 to test for the
> continued existence of a thread. Update the man page to make it clearer
> that this doesn't generally work (even if it sometimes seems to).
>
> See also the long explanation of why this is the case (and how to fix
> your code) here:
>
> https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o

Well, if you fix the thread exit race (like musl did, and glibc should
as well, see bug 12889), you could get a reliable ESRCH as a side
effect.  Pity that POSIX doesn't allow that.

I think this might be a case where common (but not unavoidable)
implementation problems get in the way of a better standard.  Usually,
it's the other way round.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 21:38 ` Florian Weimer
@ 2019-11-12 21:40   ` enh
  2019-11-12 21:52     ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2019-11-12 21:40 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

On Tue, Nov 12, 2019 at 1:38 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * enh:
>
> > POSIX removed ESRCH years ago.
> >
> > In resolving http://austingroupbugs.net/view.php?id=1214 it was made
> > clear that callers can't rely on using signal 0 to test for the
> > continued existence of a thread. Update the man page to make it clearer
> > that this doesn't generally work (even if it sometimes seems to).
> >
> > See also the long explanation of why this is the case (and how to fix
> > your code) here:
> >
> > https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
>
> Well, if you fix the thread exit race (like musl did, and glibc should
> as well, see bug 12889), you could get a reliable ESRCH as a side
> effect.  Pity that POSIX doesn't allow that.

this isn't about the tid stored *in* the object that the pthread_t points to.

like i (briefly) said in the commit message, this is because a
pthread_t is a pointer, so if you have an old pthread_t that's been
recycled... boom!

> I think this might be a case where common (but not unavoidable)
> implementation problems get in the way of a better standard.  Usually,
> it's the other way round.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 21:40   ` enh
@ 2019-11-12 21:52     ` Florian Weimer
  2019-11-12 22:06       ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-11-12 21:52 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

* enh:

> On Tue, Nov 12, 2019 at 1:38 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> * enh:
>>
>> > POSIX removed ESRCH years ago.
>> >
>> > In resolving http://austingroupbugs.net/view.php?id=1214 it was made
>> > clear that callers can't rely on using signal 0 to test for the
>> > continued existence of a thread. Update the man page to make it clearer
>> > that this doesn't generally work (even if it sometimes seems to).
>> >
>> > See also the long explanation of why this is the case (and how to fix
>> > your code) here:
>> >
>> > https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
>>
>> Well, if you fix the thread exit race (like musl did, and glibc should
>> as well, see bug 12889), you could get a reliable ESRCH as a side
>> effect.  Pity that POSIX doesn't allow that.
>
> this isn't about the tid stored *in* the object that the pthread_t points to.
>
> like i (briefly) said in the commit message, this is because a
> pthread_t is a pointer, so if you have an old pthread_t that's been
> recycled... boom!

Backing storage for a pthread_t object denoting a joinable thread
cannot be recycled, so that's not the case here.  POSIX mandates
returning success even if the implementation has detected that it must
not send the signal because the thread has already terminated.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 21:52     ` Florian Weimer
@ 2019-11-12 22:06       ` enh
  2019-11-12 22:11         ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2019-11-12 22:06 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

On Tue, Nov 12, 2019 at 1:52 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * enh:
>
> > On Tue, Nov 12, 2019 at 1:38 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> >>
> >> * enh:
> >>
> >> > POSIX removed ESRCH years ago.
> >> >
> >> > In resolving http://austingroupbugs.net/view.php?id=1214 it was made
> >> > clear that callers can't rely on using signal 0 to test for the
> >> > continued existence of a thread. Update the man page to make it clearer
> >> > that this doesn't generally work (even if it sometimes seems to).
> >> >
> >> > See also the long explanation of why this is the case (and how to fix
> >> > your code) here:
> >> >
> >> > https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
> >>
> >> Well, if you fix the thread exit race (like musl did, and glibc should
> >> as well, see bug 12889), you could get a reliable ESRCH as a side
> >> effect.  Pity that POSIX doesn't allow that.
> >
> > this isn't about the tid stored *in* the object that the pthread_t points to.
> >
> > like i (briefly) said in the commit message, this is because a
> > pthread_t is a pointer, so if you have an old pthread_t that's been
> > recycled... boom!
>
> Backing storage for a pthread_t object denoting a joinable thread
> cannot be recycled, so that's not the case here.  POSIX mandates
> returning success even if the implementation has detected that it must
> not send the signal because the thread has already terminated.

who said anything about joinable?

the cases we've seen in practice are that folks incorrectly believe
that pthread_kill(3) with a signal of 0 is a reliable way to test
whether a thread is still running.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 22:06       ` enh
@ 2019-11-12 22:11         ` Florian Weimer
  2019-11-12 22:22           ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-11-12 22:11 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

* enh:

> On Tue, Nov 12, 2019 at 1:52 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> * enh:
>>
>> > On Tue, Nov 12, 2019 at 1:38 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>> >>
>> >> * enh:
>> >>
>> >> > POSIX removed ESRCH years ago.
>> >> >
>> >> > In resolving http://austingroupbugs.net/view.php?id=1214 it was made
>> >> > clear that callers can't rely on using signal 0 to test for the
>> >> > continued existence of a thread. Update the man page to make it clearer
>> >> > that this doesn't generally work (even if it sometimes seems to).
>> >> >
>> >> > See also the long explanation of why this is the case (and how to fix
>> >> > your code) here:
>> >> >
>> >> > https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
>> >>
>> >> Well, if you fix the thread exit race (like musl did, and glibc should
>> >> as well, see bug 12889), you could get a reliable ESRCH as a side
>> >> effect.  Pity that POSIX doesn't allow that.
>> >
>> > this isn't about the tid stored *in* the object that the pthread_t
>> > points to.
>> >
>> > like i (briefly) said in the commit message, this is because a
>> > pthread_t is a pointer, so if you have an old pthread_t that's been
>> > recycled... boom!
>>
>> Backing storage for a pthread_t object denoting a joinable thread
>> cannot be recycled, so that's not the case here.  POSIX mandates
>> returning success even if the implementation has detected that it must
>> not send the signal because the thread has already terminated.
>
> who said anything about joinable?

That determines whether the pthread_t object is still valid.

> the cases we've seen in practice are that folks incorrectly believe
> that pthread_kill(3) with a signal of 0 is a reliable way to test
> whether a thread is still running.

Right, that's not working according to (future) POSIX.  Which I
dislike because a correct implementation of pthread_kill has to do all
the work to support this usage (or something like it; after all, only
testing for termination gives stable results), and then is forced by
POSIX to discard the data.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 22:11         ` Florian Weimer
@ 2019-11-12 22:22           ` enh
  2019-11-12 22:28             ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2019-11-12 22:22 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

On Tue, Nov 12, 2019 at 2:11 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * enh:
>
> > On Tue, Nov 12, 2019 at 1:52 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> >>
> >> * enh:
> >>
> >> > On Tue, Nov 12, 2019 at 1:38 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> >> >>
> >> >> * enh:
> >> >>
> >> >> > POSIX removed ESRCH years ago.
> >> >> >
> >> >> > In resolving http://austingroupbugs.net/view.php?id=1214 it was made
> >> >> > clear that callers can't rely on using signal 0 to test for the
> >> >> > continued existence of a thread. Update the man page to make it clearer
> >> >> > that this doesn't generally work (even if it sometimes seems to).
> >> >> >
> >> >> > See also the long explanation of why this is the case (and how to fix
> >> >> > your code) here:
> >> >> >
> >> >> > https://android.googlesource.com/platform/bionic/+/master/docs/status.md#invalid-handling-targetsdkversion-o
> >> >>
> >> >> Well, if you fix the thread exit race (like musl did, and glibc should
> >> >> as well, see bug 12889), you could get a reliable ESRCH as a side
> >> >> effect.  Pity that POSIX doesn't allow that.
> >> >
> >> > this isn't about the tid stored *in* the object that the pthread_t
> >> > points to.
> >> >
> >> > like i (briefly) said in the commit message, this is because a
> >> > pthread_t is a pointer, so if you have an old pthread_t that's been
> >> > recycled... boom!
> >>
> >> Backing storage for a pthread_t object denoting a joinable thread
> >> cannot be recycled, so that's not the case here.  POSIX mandates
> >> returning success even if the implementation has detected that it must
> >> not send the signal because the thread has already terminated.
> >
> > who said anything about joinable?
>
> That determines whether the pthread_t object is still valid.

but this is all about *invalid* threads, which obviously can't be
joinable. i'm really not sure what you're trying to say.

> > the cases we've seen in practice are that folks incorrectly believe
> > that pthread_kill(3) with a signal of 0 is a reliable way to test
> > whether a thread is still running.
>
> Right, that's not working according to (future) POSIX.  Which I
> dislike because a correct implementation of pthread_kill has to do all
> the work to support this usage (or something like it; after all, only
> testing for termination gives stable results), and then is forced by
> POSIX to discard the data.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 22:22           ` enh
@ 2019-11-12 22:28             ` Florian Weimer
  2019-11-13  5:27               ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-11-12 22:28 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

* enh:

> but this is all about *invalid* threads, which obviously can't be
> joinable. i'm really not sure what you're trying to say.

Uhm, people try use pthread_kill to probe for thread termination.
Termintation of a non-detached thread doesn't make a thread
non-joinable, so from a temporal memory safety perspective, that's
totally fine.  Except that POSIX requires implementations to hide this
information from callers.

Maybe we are talking past each other, though.

Let's look at what musl does:

int pthread_kill(pthread_t t, int sig)
{
        int r;
        LOCK(t->killlock);
        r = t->tid ? -__syscall(SYS_tkill, t->tid, sig)
                : (sig+0U >= _NSIG ? EINVAL : 0);
        UNLOCK(t->killlock);
        return r;
}

The 0 could be ESRCH to support probing for termination.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-12 22:28             ` Florian Weimer
@ 2019-11-13  5:27               ` enh
  2019-11-13  5:51                 ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2019-11-13  5:27 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

On Tue, Nov 12, 2019 at 2:28 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * enh:
>
> > but this is all about *invalid* threads, which obviously can't be
> > joinable. i'm really not sure what you're trying to say.
>
> Uhm, people try use pthread_kill to probe for thread termination.

yes, that's why i'm keen that we make it clearer that this doesn't work.

> Termintation of a non-detached thread doesn't make a thread
> non-joinable, so from a temporal memory safety perspective, that's
> totally fine.  Except that POSIX requires implementations to hide this
> information from callers.
>
> Maybe we are talking past each other, though.
>
> Let's look at what musl does:
>
> int pthread_kill(pthread_t t, int sig)
> {
>         int r;
>         LOCK(t->killlock);
>         r = t->tid ? -__syscall(SYS_tkill, t->tid, sig)
>                 : (sig+0U >= _NSIG ? EINVAL : 0);
>         UNLOCK(t->killlock);
>         return r;
> }
>
> The 0 could be ESRCH to support probing for termination.

no, because the C library has two choices when a thread exits:

1. unmap the thread.

2. keep the thread around for recycling.

if you choose 1 (optimizing for space, like Android), your dereference
is illegal.

if you choose 2 (optimizing for time, as i believe glibc does), your
dereference is fine and you read the zero that the kernel put there
... until the thread is reused. now you're actually looking at a
different thread than the one you think you're looking at. and as a
caller who by definition doesn't know the current state of the thread,
you've no idea whether it's been reused or not. (this is also strictly
the case on Android if ASLR has put a new thread's stack where the old
one used to be.)

there's more detail about this -- and some less unreliable options --
in the Android documentation i linked to in the commit message.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-13  5:27               ` enh
@ 2019-11-13  5:51                 ` Florian Weimer
  2019-11-13  5:59                   ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-11-13  5:51 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

* enh:

> no, because the C library has two choices when a thread exits:
>
> 1. unmap the thread.
>
> 2. keep the thread around for recycling.
>
> if you choose 1 (optimizing for space, like Android), your dereference
> is illegal.

This choice is only available for threads in a detached state.  For
joinable threads, a conforming implementation cannot immediately
deallocate all data structures on thread termination.  Among other
things, it has to store the future return value of pthread_join
somewhere.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-13  5:51                 ` Florian Weimer
@ 2019-11-13  5:59                   ` enh
  2019-11-13  6:10                     ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2019-11-13  5:59 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * enh:
>
> > no, because the C library has two choices when a thread exits:
> >
> > 1. unmap the thread.
> >
> > 2. keep the thread around for recycling.
> >
> > if you choose 1 (optimizing for space, like Android), your dereference
> > is illegal.
>
> This choice is only available for threads in a detached state.  For
> joinable threads, a conforming implementation cannot immediately
> deallocate all data structures on thread termination.  Among other
> things, it has to store the future return value of pthread_join
> somewhere.

ah, you're trying to say "signal 0 is potentially usable for a
joinable thread that's waiting to be joined"? that's true, but i'm not
sure how that's relevant to this patch. that wouldn't be an "invalid
thread ID" until it's joined.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-13  5:59                   ` enh
@ 2019-11-13  6:10                     ` Florian Weimer
  2021-11-09 23:00                       ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2019-11-13  6:10 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

* enh:

> On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> * enh:
>>
>> > no, because the C library has two choices when a thread exits:
>> >
>> > 1. unmap the thread.
>> >
>> > 2. keep the thread around for recycling.
>> >
>> > if you choose 1 (optimizing for space, like Android), your dereference
>> > is illegal.
>>
>> This choice is only available for threads in a detached state.  For
>> joinable threads, a conforming implementation cannot immediately
>> deallocate all data structures on thread termination.  Among other
>> things, it has to store the future return value of pthread_join
>> somewhere.
>
> ah, you're trying to say "signal 0 is potentially usable for a
> joinable thread that's waiting to be joined"? that's true, but i'm not
> sure how that's relevant to this patch. that wouldn't be an "invalid
> thread ID" until it's joined.

Correct.  That's POSIX's argument why ESRCH wouldn't be valid to
return here.  It's still a forceful loss of information, and
particularly annoying since POSIX doesn't specify pthread_tryjoin.

But I'm glad we've brought our discussion to a conclusion. 8-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2019-11-13  6:10                     ` Florian Weimer
@ 2021-11-09 23:00                       ` enh
  2021-11-10  7:14                         ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2021-11-09 23:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

any comment from either of the maintainers?

i think what we currently have on this page is factually incorrect,
and this patch better matches reality.

On Tue, Nov 12, 2019 at 10:10 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * enh:
>
> > On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> >>
> >> * enh:
> >>
> >> > no, because the C library has two choices when a thread exits:
> >> >
> >> > 1. unmap the thread.
> >> >
> >> > 2. keep the thread around for recycling.
> >> >
> >> > if you choose 1 (optimizing for space, like Android), your dereference
> >> > is illegal.
> >>
> >> This choice is only available for threads in a detached state.  For
> >> joinable threads, a conforming implementation cannot immediately
> >> deallocate all data structures on thread termination.  Among other
> >> things, it has to store the future return value of pthread_join
> >> somewhere.
> >
> > ah, you're trying to say "signal 0 is potentially usable for a
> > joinable thread that's waiting to be joined"? that's true, but i'm not
> > sure how that's relevant to this patch. that wouldn't be an "invalid
> > thread ID" until it's joined.
>
> Correct.  That's POSIX's argument why ESRCH wouldn't be valid to
> return here.  It's still a forceful loss of information, and
> particularly annoying since POSIX doesn't specify pthread_tryjoin.
>
> But I'm glad we've brought our discussion to a conclusion. 8-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2021-11-09 23:00                       ` enh
@ 2021-11-10  7:14                         ` Florian Weimer
  2021-11-12  0:01                           ` enh
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2021-11-10  7:14 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

> any comment from either of the maintainers?
>
> i think what we currently have on this page is factually incorrect,
> and this patch better matches reality.

One more data point:

As of glibc 2.34, pthread_kill in glibc cannot fail with ESRCH anymore
(unless the kernel thread is terminated by a direct system call).  And
the race that the signal could be sent to the wrong thread is gone.

> On Tue, Nov 12, 2019 at 10:10 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> * enh:
>>
>> > On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>> >>
>> >> * enh:
>> >>
>> >> > no, because the C library has two choices when a thread exits:
>> >> >
>> >> > 1. unmap the thread.
>> >> >
>> >> > 2. keep the thread around for recycling.
>> >> >
>> >> > if you choose 1 (optimizing for space, like Android), your dereference
>> >> > is illegal.
>> >>
>> >> This choice is only available for threads in a detached state.  For
>> >> joinable threads, a conforming implementation cannot immediately
>> >> deallocate all data structures on thread termination.  Among other
>> >> things, it has to store the future return value of pthread_join
>> >> somewhere.
>> >
>> > ah, you're trying to say "signal 0 is potentially usable for a
>> > joinable thread that's waiting to be joined"? that's true, but i'm not
>> > sure how that's relevant to this patch. that wouldn't be an "invalid
>> > thread ID" until it's joined.
>>
>> Correct.  That's POSIX's argument why ESRCH wouldn't be valid to
>> return here.  It's still a forceful loss of information, and
>> particularly annoying since POSIX doesn't specify pthread_tryjoin.
>>
>> But I'm glad we've brought our discussion to a conclusion. 8-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2021-11-10  7:14                         ` Florian Weimer
@ 2021-11-12  0:01                           ` enh
  2021-11-12 13:02                             ` Florian Weimer
  0 siblings, 1 reply; 16+ messages in thread
From: enh @ 2021-11-12  0:01 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Michael Kerrisk (man-pages), linux-man

that was never the issue though ... well, okay, that's *an* issue, but
not the one i'm most concerned about :-)

the issue i'm trying to fix (and so maybe need to find even clearer
wording for) is basically this:

  * lots of people don't realize that pthread_t != pid_t
  * they think that "the worst that can happen" when passing a
no-longer valid pthread_t to these functions is ESRCH
  * they don't realize that using pthread_kill(3) like this is just a
use-after-free bug

i think one reason this persists is glibc's thread cache makes it
harder to hit there. i don't actually know whether glibc's thread
cache has an eviction policy at all? if it doesn't, that would indeed
turn this use-after-free into "just" a question of whether you have
the right pid_t or not. but assuming glibc's thread cache _does_ have
an eviction policy, glibc's in the same boat as more svelte libcs
(such as bionic and musl, plus the BSDs, and also Apple's anonymous
libc) --- it just needs more threads.

this confusion causes bugs (and crashes) today, and it's only going to
get worse as we get better tools for detecting UAF, such as Arm MTE,
and it's really hard to get people to understand the problem when the
man page is worded as it currently is (with a weak "can, for example"
hidden in the NOTES section).

if it really is the case that glibc has no eviction policy, then my
suggestion will probably be that we come up with wording along the
lines of "libc implementers face a choice here between the memory cost
of never freeing [whatever man7 calls the TCB] or not being able to
detect this temporal error; of all the libcs, only glibc chose the
former".

if nothing else that should at least answer the question "why can't
you just be like glibc?" :-)

the current text kind of sounds like "glibc has a great
implementation, but POSIX doesn't require that, and everyone else
sucks", but that's pretty misleading. (even if the glibc
implementation is safe, which i'm not sure it is. it also seems like
the "that's not a cache, that's a memory leak" implementation would
preclude memory tagging for thread stacks, which would be another
infelicity?)

-*-

this page is a bit weird in general... ESRCH isn't mentioned in
ERRORS, but the sig == 0 case is called out in DESCRIPTION, but you
need to read NOTES to find out that that's basically broken. and
no-where on the page do we try to describe alternatives that _do_
work. (happy to volunteer text along the lines of "you need to stash
your thread's tid at a time when you *know* the pthread_t is valid,
such as when the thread starts, and then you can use that with kill(2)
and sig == 0 to do what you _thought_ pthread_kill(3) with sig == 0
did, which still isn't 100% safe in light of pid wrapping, but is the
best you can get if you refuse to actually keep track of your threads'
lifetimes properly :-P ".)

actually, even this would be quite a good improvement:

        If sig is 0, then no signal is sent, but error checking is still
-       performed.
+       performed. See NOTES for why this can't be used to detect
whether another thread is still running.


On Tue, Nov 9, 2021 at 11:16 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> > any comment from either of the maintainers?
> >
> > i think what we currently have on this page is factually incorrect,
> > and this patch better matches reality.
>
> One more data point:
>
> As of glibc 2.34, pthread_kill in glibc cannot fail with ESRCH anymore
> (unless the kernel thread is terminated by a direct system call).  And
> the race that the signal could be sent to the wrong thread is gone.
>
> > On Tue, Nov 12, 2019 at 10:10 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> >>
> >> * enh:
> >>
> >> > On Tue, Nov 12, 2019 at 9:51 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> >> >>
> >> >> * enh:
> >> >>
> >> >> > no, because the C library has two choices when a thread exits:
> >> >> >
> >> >> > 1. unmap the thread.
> >> >> >
> >> >> > 2. keep the thread around for recycling.
> >> >> >
> >> >> > if you choose 1 (optimizing for space, like Android), your dereference
> >> >> > is illegal.
> >> >>
> >> >> This choice is only available for threads in a detached state.  For
> >> >> joinable threads, a conforming implementation cannot immediately
> >> >> deallocate all data structures on thread termination.  Among other
> >> >> things, it has to store the future return value of pthread_join
> >> >> somewhere.
> >> >
> >> > ah, you're trying to say "signal 0 is potentially usable for a
> >> > joinable thread that's waiting to be joined"? that's true, but i'm not
> >> > sure how that's relevant to this patch. that wouldn't be an "invalid
> >> > thread ID" until it's joined.
> >>
> >> Correct.  That's POSIX's argument why ESRCH wouldn't be valid to
> >> return here.  It's still a forceful loss of information, and
> >> particularly annoying since POSIX doesn't specify pthread_tryjoin.
> >>
> >> But I'm glad we've brought our discussion to a conclusion. 8-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] pthread_kill.3: Update to match POSIX.
  2021-11-12  0:01                           ` enh
@ 2021-11-12 13:02                             ` Florian Weimer
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Weimer @ 2021-11-12 13:02 UTC (permalink / raw)
  To: enh; +Cc: Michael Kerrisk (man-pages), linux-man

> the issue i'm trying to fix (and so maybe need to find even clearer
> wording for) is basically this:
>
>   * lots of people don't realize that pthread_t != pid_t
>   * they think that "the worst that can happen" when passing a
> no-longer valid pthread_t to these functions is ESRCH
>   * they don't realize that using pthread_kill(3) like this is just a
> use-after-free bug

Okay, this is just not about pthread_kill, though.  So man-pages-wise,
pthread_kill may not be the right place to document it.

> i think one reason this persists is glibc's thread cache makes it
> harder to hit there. i don't actually know whether glibc's thread
> cache has an eviction policy at all?

It's typically at least five entries deep, so it's pretty good at
obscuring these issues.

In glibc 2.34, the stack size is tunable, and it can be disabled (more
or less) using

  GLIBC_TUNABLES=glibc.pthread.stack_cache_size=0

for typical distribution builds that do not disable tunables.  If you do
that, you get a segmentation fault for such invalid pthread_kill calls.

(The ‘more or less’ part refers to detached threads, where the TCB
lingers around after exit because it is tied to the thread stack in our
current implementation.)

> if it doesn't, that would indeed
> turn this use-after-free into "just" a question of whether you have
> the right pid_t or not. but assuming glibc's thread cache _does_ have
> an eviction policy, glibc's in the same boat as more svelte libcs
> (such as bionic and musl, plus the BSDs, and also Apple's anonymous
> libc) --- it just needs more threads.

Right.

> this confusion causes bugs (and crashes) today, and it's only going to
> get worse as we get better tools for detecting UAF, such as Arm MTE,
> and it's really hard to get people to understand the problem when the
> man page is worded as it currently is (with a weak "can, for example"
> hidden in the NOTES section).

I helped to fixed an incorrect LTP test around precisely this, and the
GLIBC_TUNABLES setting was helpful to show that there was indeed a
use-after-free.  Maybe that can help you with your “just like glibc”
problems, too.

glibc 2.35 (and glibc 2.34 post-release) also break pthread-kill-based
probing loops to detect kernel thread exit.  An unjoined pthread_t can
“receive” signals even if the TID is no longer in use on the kernel
side.

> this page is a bit weird in general... ESRCH isn't mentioned in
> ERRORS, but the sig == 0 case is called out in DESCRIPTION, but you
> need to read NOTES to find out that that's basically broken. and
> no-where on the page do we try to describe alternatives that _do_
> work. (happy to volunteer text along the lines of "you need to stash
> your thread's tid at a time when you *know* the pthread_t is valid,
> such as when the thread starts, and then you can use that with kill(2)
> and sig == 0 to do what you _thought_ pthread_kill(3) with sig == 0
> did, which still isn't 100% safe in light of pid wrapping, but is the
> best you can get if you refuse to actually keep track of your threads'
> lifetimes properly :-P ".)

I don't think that's good advice.  Any such use has TID race issues
(even if you use tgkill).

> actually, even this would be quite a good improvement:
>
>         If sig is 0, then no signal is sent, but error checking is still
> -       performed.
> +       performed. See NOTES for why this can't be used to detect
> whether another thread is still running.

That makes sense.  (And I need to fix the bug that we don't have enough
error checking, now that we no longer try to send the signal.)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-11-12 13:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 20:36 [PATCH] pthread_kill.3: Update to match POSIX enh
2019-11-12 21:38 ` Florian Weimer
2019-11-12 21:40   ` enh
2019-11-12 21:52     ` Florian Weimer
2019-11-12 22:06       ` enh
2019-11-12 22:11         ` Florian Weimer
2019-11-12 22:22           ` enh
2019-11-12 22:28             ` Florian Weimer
2019-11-13  5:27               ` enh
2019-11-13  5:51                 ` Florian Weimer
2019-11-13  5:59                   ` enh
2019-11-13  6:10                     ` Florian Weimer
2021-11-09 23:00                       ` enh
2021-11-10  7:14                         ` Florian Weimer
2021-11-12  0:01                           ` enh
2021-11-12 13:02                             ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).