linux-api.vger.kernel.org archive mirror
* Syscall kill() can send signal to thread ID
@ 2022-09-22  9:11 cambda
  2022-09-22 15:09 ` Eric W. Biederman
  2022-09-22 15:33 ` Eric W. Biederman
  0 siblings, 2 replies; 12+ messages in thread
From: cambda @ 2022-09-22  9:11 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: linux-api, Xuan Zhuo, Dust Li, Tony Lu, Cambda Zhu

I found that the kill() syscall can send a signal to a thread ID that is
not the TGID. But the Linux manual page kill(2) says:

"The kill() system call can be used to send any signal to any
process group or process."

And the Linux manual page tkill(2) says:

"tgkill() sends the signal sig to the thread with the thread ID
tid in the thread group tgid.  (By contrast, kill(2) can be used
to send a signal only to a process (i.e., thread group) as a
whole, and the signal will be delivered to an arbitrary thread
within that process.)"

I am not sure whether 'process' here is meant to be the TGID. I found
that kill(tid, 0) returns ESRCH on FreeBSD, while Linux sends the signal
to the thread group that the thread belongs to.

If this is as expected, should we add a note to the Linux manual
page? kill() is a syscall, and pids that are not equal to a tgid are not
listed under /proc. This may be a little confusing, I guess.
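
The observation above can be reproduced with a minimal sketch, assuming
Linux and Python 3.8 or later (for threading.get_native_id()). The function
name probe_thread_id_with_kill is hypothetical and not taken from the thread.
It starts a helper thread, reads that thread's kernel TID, and probes it
with kill(tid, 0).

```python
import os
import threading


def probe_thread_id_with_kill():
    """Probe the kernel TID of a non-leader thread with kill(tid, 0)."""
    ready = threading.Event()
    done = threading.Event()
    info = {}

    def worker():
        # Kernel thread ID of the helper thread (differs from the PID).
        info["tid"] = threading.get_native_id()
        ready.set()
        done.wait()  # stay alive while the main thread probes us

    helper = threading.Thread(target=worker)
    helper.start()
    ready.wait()
    try:
        # Signal 0 performs only the existence and permission checks.
        os.kill(info["tid"], 0)
        accepted = True
    except ProcessLookupError:
        # ESRCH: the result the thread reports for FreeBSD.
        accepted = False
    finally:
        done.set()
        helper.join()
    return os.getpid(), info["tid"], accepted


if __name__ == "__main__":
    pid, tid, accepted = probe_thread_id_with_kill()
    print(f"pid={pid} tid={tid} kill(tid, 0) accepted: {accepted}")
```

On Linux the probe is expected to be accepted even though the TID is not
the TGID; on a system that behaves as described for FreeBSD it would
report ESRCH instead.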

Regards,
Cambda


* Re: Syscall kill() can send signal to thread ID
  2022-09-22  9:11 Syscall kill() can send signal to thread ID cambda
@ 2022-09-22 15:09 ` Eric W. Biederman
  2022-09-23  5:31   ` Florian Weimer
  2022-09-22 15:33 ` Eric W. Biederman
  1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2022-09-22 15:09 UTC (permalink / raw)
  To: cambda; +Cc: linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

cambda@linux.alibaba.com writes:

> I found syscall kill() can send signal to a thread id, which is
> not the TGID. But the Linux manual page kill(2) said:
>
> "The kill() system call can be used to send any signal to any
> process group or process."
>
> And the Linux manual page tkill(2) said:
>
> "tgkill() sends the signal sig to the thread with the thread ID
> tid in the thread group tgid.  (By contrast, kill(2) can be used
> to send a signal only to a process (i.e., thread group) as a
> whole, and the signal will be delivered to an arbitrary thread
> within that process.)"
>
> I don't know whether the meaning of this 'process' should be
> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
> while Linux sends signal to the thread group that the thread belongs
> to.
>
> If this is as expected, should we add a notice to the Linux manual
> page? Because it's a syscall and the pids not equal to tgid are not
> listed under /proc. This may be a little confusing, I guess.

This is as expected.

The bit about /proc is interesting.  On Linux try
"cd /proc; cd tid" and see what happens.

When a thread ID is passed to kill(2), it is used only to select the
process, and the delivery happens just the same as if the TGID had been used.
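
The /proc experiment suggested above can be sketched as follows, assuming
Linux with procfs mounted at /proc and Python 3.8 or later. The function
name proc_visibility_of_thread is hypothetical. It checks whether a helper
thread's TID shows up in a listing of /proc and whether /proc/<tid> can
still be reached by name.

```python
import os
import threading


def proc_visibility_of_thread():
    """Report whether a non-leader TID is listed in /proc and reachable by path."""
    ready = threading.Event()
    done = threading.Event()
    info = {}

    def worker():
        info["tid"] = threading.get_native_id()
        ready.set()
        done.wait()  # keep the thread alive during the checks

    helper = threading.Thread(target=worker)
    helper.start()
    ready.wait()
    try:
        tid = info["tid"]
        # readdir on /proc enumerates thread group leaders only.
        listed = str(tid) in os.listdir("/proc")
        # A direct lookup by name still resolves the thread's directory.
        reachable = os.path.isdir(f"/proc/{tid}")
    finally:
        done.set()
        helper.join()
    return tid, listed, reachable


if __name__ == "__main__":
    tid, listed, reachable = proc_visibility_of_thread()
    print(f"tid={tid} listed={listed} reachable={reachable}")
```

The expected outcome on Linux is that the TID is reachable but not listed,
which matches both the original report and the "cd tid" hint.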

It is one of those odd behaviors that we could potentially remove.  It
would require hunting through all of the userspace applications to see
if something happens to depend upon that behavior.  Unless it becomes
expensive to maintain I don't expect we will ever do that.

For the same reason we probably don't want to document it, as we don't
want to encourage anyone to use that strange corner case.  As it is, if
we break it by accident and no one notices for a couple of years, we can
remove the behavior, as that will have proved that no one uses it ;)

Eric


* Re: Syscall kill() can send signal to thread ID
  2022-09-22  9:11 Syscall kill() can send signal to thread ID cambda
  2022-09-22 15:09 ` Eric W. Biederman
@ 2022-09-22 15:33 ` Eric W. Biederman
  2022-09-23  3:56   ` cambda
  1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2022-09-22 15:33 UTC (permalink / raw)
  To: cambda; +Cc: linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

cambda@linux.alibaba.com writes:

> I found syscall kill() can send signal to a thread id, which is
> not the TGID. But the Linux manual page kill(2) said:
>
> "The kill() system call can be used to send any signal to any
> process group or process."
>
> And the Linux manual page tkill(2) said:
>
> "tgkill() sends the signal sig to the thread with the thread ID
> tid in the thread group tgid.  (By contrast, kill(2) can be used
> to send a signal only to a process (i.e., thread group) as a
> whole, and the signal will be delivered to an arbitrary thread
> within that process.)"
>
> I don't know whether the meaning of this 'process' should be
> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
> while Linux sends signal to the thread group that the thread belongs
> to.
>
> If this is as expected, should we add a notice to the Linux manual
> page? Because it's a syscall and the pids not equal to tgid are not
> listed under /proc. This may be a little confusing, I guess.

How did you come across this?  Were you just experimenting?

I am wondering if you were tracking a bug, or a portability problem,
or something else.  If the current behavior is causing problems in
some way, instead of just being a detail that no one really cares about
either way, it would be worth considering whether we want to maintain the
current behavior.

Eric



* Re: Syscall kill() can send signal to thread ID
  2022-09-22 15:33 ` Eric W. Biederman
@ 2022-09-23  3:56   ` cambda
  2022-09-23 11:24     ` David Laight
  2022-09-23 21:21     ` Eric W. Biederman
  0 siblings, 2 replies; 12+ messages in thread
From: cambda @ 2022-09-23  3:56 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu



> On Sep 22, 2022, at 23:33, Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> cambda@linux.alibaba.com writes:
> 
>> I found syscall kill() can send signal to a thread id, which is
>> not the TGID. But the Linux manual page kill(2) said:
>> 
>> "The kill() system call can be used to send any signal to any
>> process group or process."
>> 
>> And the Linux manual page tkill(2) said:
>> 
>> "tgkill() sends the signal sig to the thread with the thread ID
>> tid in the thread group tgid.  (By contrast, kill(2) can be used
>> to send a signal only to a process (i.e., thread group) as a
>> whole, and the signal will be delivered to an arbitrary thread
>> within that process.)"
>> 
>> I don't know whether the meaning of this 'process' should be
>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>> while Linux sends signal to the thread group that the thread belongs
>> to.
>> 
>> If this is as expected, should we add a notice to the Linux manual
>> page? Because it's a syscall and the pids not equal to tgid are not
>> listed under /proc. This may be a little confusing, I guess.
> 
> How did you come across this?  Were you just experimenting?
> 
> I am wondering if you were tracking a bug, or a portability problem
> or something else.  If the current behavior is causing problems in
> some way instead of just being a detail that no one really cares about
> either way it would be worth considering if we want to maintain the
> current behavior.
> 
> Eric

I have found that I can cd into /proc/tid, and that proc_pid_readdir()
uses next_tgid() to filter out tids. Also, the 'ps' command reads the
/proc dir to list processes. That's why I was confused by kill().

And yes, I'm tracking a bug. A service monitor, like systemd or
some watchdog, uses kill() to check if a pid is valid or not:
  1. Store service pid into cache.
  2. Check if pid in cache is valid by kill(pid, 0).
  3. Check if pid in cache is the service to watch.

So if kill(pid, 0) succeeds but the 'ps' command shows no such process,
the service monitor could be confused. The monitor could check whether
the pid is a tid, but that would mean relying on the odd behavior
intentionally. And might this workaround be unsafe on other OSes?
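
The "check if pid is tid" workaround mentioned above could look roughly
like this on Linux. This is a sketch under assumptions: it relies on the
Linux-specific Tgid field of /proc/<pid>/status, the names
is_thread_group_leader and demo are hypothetical, and the check is still
subject to pid reuse between the check and any later action.

```python
import os
import threading


def is_thread_group_leader(pid):
    """Return True if pid names a process (TGID), False for a bare TID or no task."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("Tgid:"):
                    # A process ID is a task ID that equals its own TGID.
                    return int(line.split()[1]) == pid
    except (FileNotFoundError, ProcessLookupError):
        return False  # no such task, or it exited while being read
    return False


def demo():
    """Compare the result for our own PID with that for a helper thread's TID."""
    ready = threading.Event()
    done = threading.Event()
    info = {}

    def worker():
        info["tid"] = threading.get_native_id()
        ready.set()
        done.wait()

    helper = threading.Thread(target=worker)
    helper.start()
    ready.wait()
    try:
        return is_thread_group_leader(os.getpid()), is_thread_group_leader(info["tid"])
    finally:
        done.set()
        helper.join()


if __name__ == "__main__":
    print(demo())
```

A monitor that wants the answer 'ps' would give could combine kill(pid, 0)
with a check like this one, at the cost of depending on procfs.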

I agree with you that this behavior shouldn't be removed, in case
some userspace applications use it now.

Regards,
Cambda




* Re: Syscall kill() can send signal to thread ID
  2022-09-22 15:09 ` Eric W. Biederman
@ 2022-09-23  5:31   ` Florian Weimer
  2022-09-23  6:25     ` cambda
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2022-09-23  5:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: cambda, linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

* Eric W. Biederman:

> cambda@linux.alibaba.com writes:
>
>> I found syscall kill() can send signal to a thread id, which is
>> not the TGID. But the Linux manual page kill(2) said:
>>
>> "The kill() system call can be used to send any signal to any
>> process group or process."
>>
>> And the Linux manual page tkill(2) said:
>>
>> "tgkill() sends the signal sig to the thread with the thread ID
>> tid in the thread group tgid.  (By contrast, kill(2) can be used
>> to send a signal only to a process (i.e., thread group) as a
>> whole, and the signal will be delivered to an arbitrary thread
>> within that process.)"
>>
>> I don't know whether the meaning of this 'process' should be
>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>> while Linux sends signal to the thread group that the thread belongs
>> to.
>>
>> If this is as expected, should we add a notice to the Linux manual
>> page? Because it's a syscall and the pids not equal to tgid are not
>> listed under /proc. This may be a little confusing, I guess.
>
> This is as expected.
>
> The bit about is /proc is interesting.  On linux try
> "cd /proc; cd tid" and see what happens.
>
> Using the thread id in kill(2) is used to select the process, and the
> delivery happens just the same as if the TGID had been used.
>
> It is one of those odd behaviors that we could potentially remove.  It
> would require hunting through all of the userspace applications to see
> if something happens to depend upon that behavior.  Unless it becomes
> expensive to maintain I don't expect we will ever do that.

It would just replace one odd behavior with another because kill for the
TID of the main thread will still send the signal to the entire process
(because the TID is equal to the PID), but for the other threads, it
would just send it to the thread.  So it would still be inconsistent.

Thanks,
Florian



* Re: Syscall kill() can send signal to thread ID
  2022-09-23  5:31   ` Florian Weimer
@ 2022-09-23  6:25     ` cambda
  2022-09-23  7:53       ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: cambda @ 2022-09-23  6:25 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Eric W. Biederman, linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu


> On Sep 23, 2022, at 13:31, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Eric W. Biederman:
> 
>> cambda@linux.alibaba.com writes:
>> 
>>> I found syscall kill() can send signal to a thread id, which is
>>> not the TGID. But the Linux manual page kill(2) said:
>>> 
>>> "The kill() system call can be used to send any signal to any
>>> process group or process."
>>> 
>>> And the Linux manual page tkill(2) said:
>>> 
>>> "tgkill() sends the signal sig to the thread with the thread ID
>>> tid in the thread group tgid.  (By contrast, kill(2) can be used
>>> to send a signal only to a process (i.e., thread group) as a
>>> whole, and the signal will be delivered to an arbitrary thread
>>> within that process.)"
>>> 
>>> I don't know whether the meaning of this 'process' should be
>>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>>> while Linux sends signal to the thread group that the thread belongs
>>> to.
>>> 
>>> If this is as expected, should we add a notice to the Linux manual
>>> page? Because it's a syscall and the pids not equal to tgid are not
>>> listed under /proc. This may be a little confusing, I guess.
>> 
>> This is as expected.
>> 
>> The bit about is /proc is interesting.  On linux try
>> "cd /proc; cd tid" and see what happens.
>> 
>> Using the thread id in kill(2) is used to select the process, and the
>> delivery happens just the same as if the TGID had been used.
>> 
>> It is one of those odd behaviors that we could potentially remove.  It
>> would require hunting through all of the userspace applications to see
>> if something happens to depend upon that behavior.  Unless it becomes
>> expensive to maintain I don't expect we will ever do that.
> 
> It would just replace one odd behavior by another because kill for the
> TID of the main thread will still send the signal to the entire process
> (because the TID is equal to the PID), but for the other threads, it
> would just send it to the thread.  So it would still be inconsistent.
> 
> Thanks,
> Florian

I don't quite understand what you mean, sorry. But if kill() returned -ESRCH for
a tid that is not equal to the tgid, kill() could only send a signal to a thread
group via the main thread id, which is what BSD does and what the manual says.
That does not seem odd?

Regards,
Cambda


* Re: Syscall kill() can send signal to thread ID
  2022-09-23  6:25     ` cambda
@ 2022-09-23  7:53       ` Florian Weimer
  2022-09-23  8:40         ` Cambda Zhu
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2022-09-23  7:53 UTC (permalink / raw)
  To: cambda
  Cc: Eric W. Biederman, linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

> I don't quite understand what you mean, sorry. But if kill() returns
> -ESRCH for tid which is not equal to tgid, kill() can only send signal
> to thread group via main thread id, that is what BSD did and manual
> said. It seems not odd?

It's still odd because there's one TID per process that's valid for
kill by accident.  That's all.

Thanks,
Florian



* Re: Syscall kill() can send signal to thread ID
  2022-09-23  7:53       ` Florian Weimer
@ 2022-09-23  8:40         ` Cambda Zhu
  2022-09-23 21:15           ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Cambda Zhu @ 2022-09-23  8:40 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Eric W. Biederman, linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu


> On Sep 23, 2022, at 15:53, Florian Weimer <fweimer@redhat.com> wrote:
> 
>> I don't quite understand what you mean, sorry. But if kill() returns
>> -ESRCH for tid which is not equal to tgid, kill() can only send signal
>> to thread group via main thread id, that is what BSD did and manual
>> said. It seems not odd?
> 
> It's still odd because there's one TID per process that's valid for
> kill by accident.  That's all.
> 
> Thanks,
> Florian

As far as I know, there is no rule forbidding the 'process ID' (TGID on Linux)
from being equal to the main thread ID, is that right? If one wants to send a
signal to a specific thread, tgkill() can do that. As far as I understand, the
difference between kill() and tgkill() is whether the signal is queued on
shared_pending, regardless of whether the ID is a process ID or a thread ID.
On Linux, the main thread ID is simply equal to the process ID. So
kill(main_tid, sig) means sending a signal to the process whose PID equals the
first argument. It's not odd, I think.

Thanks,
Cambda


* RE: Syscall kill() can send signal to thread ID
  2022-09-23  3:56   ` cambda
@ 2022-09-23 11:24     ` David Laight
  2022-09-23 21:21     ` Eric W. Biederman
  1 sibling, 0 replies; 12+ messages in thread
From: David Laight @ 2022-09-23 11:24 UTC (permalink / raw)
  To: 'cambda@linux.alibaba.com', Eric W. Biederman
  Cc: linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

...
> And yes, I'm tracking a bug. A service monitor, like systemd or
> some watchdog, uses kill() to check if a pid is valid or not:
>   1. Store service pid into cache.
>   2. Check if pid in cache is valid by kill(pid, 0).
>   3. Check if pid in cache is the service to watch.
> 
> So if kill(pid, 0) returns success but no process info shows on 'ps'
> command, the service monitor could be confused. The monitor could
> check if pid is tid, but this means the odd behavior would be used
> intentionally. And this workaround may be unsafe on other OS?

That looks pretty broken to me.
On Linux a pid can be reused immediately after a process exits.
So there is really no guarantee that the pid is the one you want.
IIRC there are some recent changes that mean opening /proc/<pid>
will stop the pid being reused - allowing checks before sending a signal.
(NetBSD won't reuse a pid for a reasonable number of forks
and then uses a semi-random pid allocator.
I don't know whether any other BSD picked up that change.)
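
The /proc/<pid> idea can be sketched as below, assuming Linux with procfs
and a Python interpreter available as sys.executable. Note the narrower
guarantee given later in this thread: an open /proc/<pid> directory never
refers to a different process, even if the pid number itself is reused.
The function name proc_dir_handle_after_exit is hypothetical.

```python
import os
import subprocess
import sys


def proc_dir_handle_after_exit():
    """Hold /proc/<pid> open and test it before and after the child is reaped."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    dir_fd = os.open(f"/proc/{child.pid}", os.O_RDONLY | os.O_DIRECTORY)

    def status_readable():
        # Look up "status" relative to the held directory, not by pid number.
        try:
            fd = os.open("status", os.O_RDONLY, dir_fd=dir_fd)
        except OSError:
            return False
        try:
            return len(os.read(fd, 64)) > 0
        except OSError:
            return False
        finally:
            os.close(fd)

    try:
        before = status_readable()
        child.kill()
        child.wait()  # reap the child so its pid becomes free for reuse
        after = status_readable()
    finally:
        os.close(dir_fd)
    return before, after


if __name__ == "__main__":
    print(proc_dir_handle_after_exit())
```

Once the child has been reaped, lookups through the held directory are
expected to fail rather than resolve to some newer process with the same
number.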

Also using signals in multi-threaded programs is pretty much
non-portable.

	David




* Re: Syscall kill() can send signal to thread ID
  2022-09-23  8:40         ` Cambda Zhu
@ 2022-09-23 21:15           ` Eric W. Biederman
  0 siblings, 0 replies; 12+ messages in thread
From: Eric W. Biederman @ 2022-09-23 21:15 UTC (permalink / raw)
  To: Cambda Zhu
  Cc: Florian Weimer, linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

Cambda Zhu <cambda@linux.alibaba.com> writes:

>> On Sep 23, 2022, at 15:53, Florian Weimer <fweimer@redhat.com> wrote:
>> 
>>> I don't quite understand what you mean, sorry. But if kill() returns
>>> -ESRCH for tid which is not equal to tgid, kill() can only send signal
>>> to thread group via main thread id, that is what BSD did and manual
>>> said. It seems not odd?
>> 
>> It's still odd because there's one TID per process that's valid for
>> kill by accident.  That's all.

> As far as I know, there is no rule forbidding 'process ID'(TGID on Linux)
> equals to main thread ID, is it right?

There is an unfortunate guarantee that glibc depends upon that after
exec TGID == TID for the initial thread in a process.  I say unfortunate
because maintaining that guarantee when another thread in the process
calls exec is a bit painful.

> If one wants to send signal to a specific thread, tgkill() can do
> that. As far as I understand, the difference between kill() and
> tgkill() is whether the signal is set on shared_pending, whatever the
> ID is a process ID or a thread ID. For Linux, the main thread ID just
> equals to the process ID.

Correct.  kill and tgkill use different signal queues.  The kill queue is
shared by the whole destination process, and the tgkill queue is always
thread local.
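
The two queues can be observed from userspace through the SigPnd
(per-thread) and ShdPnd (process-wide) fields of /proc/thread-self/status.
The sketch below assumes Linux 3.17 or later, a single-threaded Python
process, and that signal.pthread_kill() ends up as a thread-directed
tgkill() in the C library. The function names are hypothetical.

```python
import os
import signal
import threading


def pending_masks():
    """Return (per-thread, process-wide) pending signal masks of the caller."""
    fields = {}
    with open("/proc/thread-self/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return int(fields["SigPnd"], 16), int(fields["ShdPnd"], 16)


def bit(signum):
    # Signal N is represented by bit N-1 in the kernel's masks.
    return 1 << (signum - 1)


def compare_queues():
    """Queue one signal via kill() and one via pthread_kill(), then inspect."""
    sigs = {signal.SIGUSR1, signal.SIGUSR2}
    # Block both signals so they stay pending instead of being delivered.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, sigs)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)  # process-directed
        signal.pthread_kill(threading.get_ident(), signal.SIGUSR2)  # thread-directed
        private, shared = pending_masks()
        # Consume both pending signals before the old mask is restored.
        signal.sigwait(sigs)
        signal.sigwait(sigs)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return {
        "kill_in_shared": bool(shared & bit(signal.SIGUSR1)),
        "kill_in_private": bool(private & bit(signal.SIGUSR1)),
        "pthread_kill_in_private": bool(private & bit(signal.SIGUSR2)),
        "pthread_kill_in_shared": bool(shared & bit(signal.SIGUSR2)),
    }


if __name__ == "__main__":
    print(compare_queues())
```

The signal sent with kill() is expected to appear only in the shared mask
and the one sent with pthread_kill() only in the per-thread mask.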

> So the meaning of kill(main_tid, sig) is sending signal to a process,
> of which the PID equals to the first argument. It's not odd, I think.

Yes, the oddity is the TGID and TID share the same value, nothing else.

Eric


* Re: Syscall kill() can send signal to thread ID
  2022-09-23  3:56   ` cambda
  2022-09-23 11:24     ` David Laight
@ 2022-09-23 21:21     ` Eric W. Biederman
  2022-09-24  3:16       ` Cambda Zhu
  1 sibling, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2022-09-23 21:21 UTC (permalink / raw)
  To: cambda; +Cc: linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu

"cambda@linux.alibaba.com" <cambda@linux.alibaba.com> writes:

>> On Sep 22, 2022, at 23:33, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> 
>> cambda@linux.alibaba.com writes:
>> 
>>> I found syscall kill() can send signal to a thread id, which is
>>> not the TGID. But the Linux manual page kill(2) said:
>>> 
>>> "The kill() system call can be used to send any signal to any
>>> process group or process."
>>> 
>>> And the Linux manual page tkill(2) said:
>>> 
>>> "tgkill() sends the signal sig to the thread with the thread ID
>>> tid in the thread group tgid.  (By contrast, kill(2) can be used
>>> to send a signal only to a process (i.e., thread group) as a
>>> whole, and the signal will be delivered to an arbitrary thread
>>> within that process.)"
>>> 
>>> I don't know whether the meaning of this 'process' should be
>>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>>> while Linux sends signal to the thread group that the thread belongs
>>> to.
>>> 
>>> If this is as expected, should we add a notice to the Linux manual
>>> page? Because it's a syscall and the pids not equal to tgid are not
>>> listed under /proc. This may be a little confusing, I guess.
>> 
>> How did you come across this?  Were you just experimenting?
>> 
>> I am wondering if you were tracking a bug, or a portability problem
>> or something else.  If the current behavior is causing problems in
>> some way instead of just being a detail that no one really cares about
>> either way it would be worth considering if we want to maintain the
>> current behavior.
>> 
>> Eric
>
> I have found I can cd into /proc/tid, and the proc_pid_readdir()
> uses next_tgid() to filter tid. Also the 'ps' command reads the
> /proc dir to show processes. That's why I was confused with kill().
>
> And yes, I'm tracking a bug. A service monitor, like systemd or
> some watchdog, uses kill() to check if a pid is valid or not:
>   1. Store service pid into cache.
>   2. Check if pid in cache is valid by kill(pid, 0).
>   3. Check if pid in cache is the service to watch.
>
> So if kill(pid, 0) returns success but no process info shows on 'ps'
> command, the service monitor could be confused. The monitor could
> check if pid is tid, but this means the odd behavior would be used
> intentionally. And this workaround may be unsafe on other OS?
>
> I'm agreed with you that this behavior shouldn't be removed, in case
> some userspace applications use it now.

As has already been mentioned, using pids and APIs like kill is
fundamentally racy.  We try to keep from reusing pids too quickly.
Unfortunately what we have is that on average there will be some time
between pid reuse, not any kind of worst case guarantee.

We have slowly been introducing techniques into Linux that allow combating
that.  A process's directory in proc that you have open will
never point to another process even after the pid is reused.  Similarly
we have pidfd that will associate with a specific process and will not
associate with any other process even if the process's pid is reused.

That is we have userspace pid value reuse, but we don't reuse struct pid
in the kernel.

Unfortunately I don't think there is anything that allows these races to
be addressed in a portable manner.

Eric



* Re: Syscall kill() can send signal to thread ID
  2022-09-23 21:21     ` Eric W. Biederman
@ 2022-09-24  3:16       ` Cambda Zhu
  0 siblings, 0 replies; 12+ messages in thread
From: Cambda Zhu @ 2022-09-24  3:16 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-api, Xuan Zhuo, Dust Li, Tony Lu


> On Sep 24, 2022, at 05:21, Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> "cambda@linux.alibaba.com" <cambda@linux.alibaba.com> writes:
> 
>>> On Sep 22, 2022, at 23:33, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> 
>>> cambda@linux.alibaba.com writes:
>>> 
>>>> I found syscall kill() can send signal to a thread id, which is
>>>> not the TGID. But the Linux manual page kill(2) said:
>>>> 
>>>> "The kill() system call can be used to send any signal to any
>>>> process group or process."
>>>> 
>>>> And the Linux manual page tkill(2) said:
>>>> 
>>>> "tgkill() sends the signal sig to the thread with the thread ID
>>>> tid in the thread group tgid.  (By contrast, kill(2) can be used
>>>> to send a signal only to a process (i.e., thread group) as a
>>>> whole, and the signal will be delivered to an arbitrary thread
>>>> within that process.)"
>>>> 
>>>> I don't know whether the meaning of this 'process' should be
>>>> the TGID? Because I found kill(tid, 0) will return ESRCH on FreeBSD,
>>>> while Linux sends signal to the thread group that the thread belongs
>>>> to.
>>>> 
>>>> If this is as expected, should we add a notice to the Linux manual
>>>> page? Because it's a syscall and the pids not equal to tgid are not
>>>> listed under /proc. This may be a little confusing, I guess.
>>> 
>>> How did you come across this?  Were you just experimenting?
>>> 
>>> I am wondering if you were tracking a bug, or a portability problem
>>> or something else.  If the current behavior is causing problems in
>>> some way instead of just being a detail that no one really cares about
>>> either way it would be worth considering if we want to maintain the
>>> current behavior.
>>> 
>>> Eric
>> 
>> I have found I can cd into /proc/tid, and the proc_pid_readdir()
>> uses next_tgid() to filter tid. Also the 'ps' command reads the
>> /proc dir to show processes. That's why I was confused with kill().
>> 
>> And yes, I'm tracking a bug. A service monitor, like systemd or
>> some watchdog, uses kill() to check if a pid is valid or not:
>>  1. Store service pid into cache.
>>  2. Check if pid in cache is valid by kill(pid, 0).
>>  3. Check if pid in cache is the service to watch.
>> 
>> So if kill(pid, 0) returns success but no process info shows on 'ps'
>> command, the service monitor could be confused. The monitor could
>> check if pid is tid, but this means the odd behavior would be used
>> intentionally. And this workaround may be unsafe on other OS?
>> 
>> I'm agreed with you that this behavior shouldn't be removed, in case
>> some userspace applications use it now.
> 
> As has already been mentioned using pids and api's like kill is
> fundamentally racy.  We try and to keep from reusing pids too quickly.
> Unfortunately what we have is that on average there will be some time
> between pid reuse not an kind of worst case guarantee.
> 
> We have slowly been introducing techniques into linux allow combatting
> that.  A directory processes directory in proc that you have open will
> never point to another process even after the pid is reused.  Similarly
> we have pidfd that will associate with a specific process and will not
> associate with any other process even if the processes pid is reused.
> 
> That is we have userspace pid value reuse, but we don't reuse struct pid
> in the kernel.
> 
> Unfortunately I don't think there is anything that allows these races to
> be addressed in a portable manner.
> 
> Eric

I got it. Thank you!

Regards,
Cambda
