Linux-man Archive on lore.kernel.org
 help / color / Atom feed
* For review: pidfd_send_signal(2) manual page
@ 2019-09-23  9:12 Michael Kerrisk (man-pages)
  2019-09-23 11:26 ` Florian Weimer
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23  9:12 UTC (permalink / raw)
  To: Oleg Nesterov, Christian Brauner, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes
  Cc: mtk.manpages, linux-man, Linux API, lkml

Hello Christian and all,

Below, I have the rendered version of the current draft of
the pidfd_send_signal(2) manual page that I have written.
The page source can be found in a Git branch at:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd

I would be pleased to receive corrections and notes on any
details that should be added. (For example, are there error
cases that I have missed?)

Would you be able to review please?

Thanks,

Michael


NAME
       pidfd_send_signal - send a signal to a process specified by a file
       descriptor

SYNOPSIS
       int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
                             unsigned int flags);

DESCRIPTION
       The pidfd_send_signal() system call sends the signal  sig  to  the
       target  process  referred  to by pidfd, a PID file descriptor that
       refers to a process.

       If the info argument points to a  siginfo_t  buffer,  that  buffer
       should be populated as described in rt_sigqueueinfo(2).

       If  the  info  argument  is  a NULL pointer, this is equivalent to
       specifying a pointer to a siginfo_t buffer whose fields match  the
       values  that  are  implicitly supplied when a signal is sent using
       kill(2):

       *  si_signo is set to the signal number;
       *  si_errno is set to 0;
       *  si_code is set to SI_USER;
       *  si_pid is set to the caller's PID; and
       *  si_uid is set to the caller's real user ID.

       The calling process must either be in the same  PID  namespace  as
       the  process  referred  to  by pidfd, or be in an ancestor of that
       namespace.

       The flags argument is reserved for  future  use;  currently,  this
       argument must be specified as 0.

RETURN VALUE
       On  success,  pidfd_send_signal()  returns  0.   On success, -1 is
       returned and errno is set to indicate the cause of the error.

ERRORS
       EBADF  pidfd is not a valid PID file descriptor.

       EINVAL sig is not a valid signal.

       EINVAL The calling process is not in a PID namespace from which it
              can send a signal to the target process.

       EINVAL flags is not 0.

       EPERM  The  calling  process  does not have permission to send the
              signal to the target process.

       EPERM  pidfd  doesn't  refer   to   the   calling   process,   and
              info.si_code is invalid (see rt_sigqueueinfo(2)).

       ESRCH  The target process does not exist.

VERSIONS
       pidfd_send_signal() first appeared in Linux 5.1.

CONFORMING TO
       pidfd_send_signal() is Linux specific.

NOTES
       Currently, there is no glibc wrapper for this system call; call it
       using syscall(2).

   PID file descriptors
       The pidfd argument is a PID file  descriptor,  a  file  descriptor
       that  refers  to  process.  Such a file descriptor can be obtained
       in any of the following ways:

       *  by opening a /proc/[pid] directory;

       *  using pidfd_open(2); or

       *  via the PID file descriptor that  is  returned  by  a  call  to
          clone(2) or clone3(2) that specifies the CLONE_PIDFD flag.

       The  pidfd_send_signal()  system call allows the avoidance of race
       conditions that occur when using traditional interfaces  (such  as
       kill(2)) to signal a process.  The problem is that the traditional
       interfaces specify the target process via a process ID (PID), with
       the  result  that the sender may accidentally send a signal to the
       wrong process if the originally intended target process has termi‐
       nated  and its PID has been recycled for another process.  By con‐
       trast, a PID file descriptor is a stable reference to  a  specific
       process;  if  that  process  terminates,  then the file descriptor
       ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
       informed of this fact via an ESRCH error.

EXAMPLE
       #define _GNU_SOURCE
       #include <limits.h>
       #include <signal.h>
       #include <fcntl.h>
       #include <stdio.h>
       #include <string.h>
       #include <stdlib.h>
       #include <unistd.h>
       #include <sys/syscall.h>

       #ifndef __NR_pidfd_send_signal
       #define __NR_pidfd_send_signal 424
       #endif

       static
       int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
               unsigned int flags)
       {
           return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
       }

       int
       main(int argc, char *argv[])
       {
           siginfo_t info;
           char path[PATH_MAX];
           int pidfd, sig;

           if (argc != 3) {
               fprintf(stderr, "Usage: %s <pid> <signal>\n", argv[0]);
               exit(EXIT_FAILURE);
           }

           sig = atoi(argv[2]);

           /* Obtain a PID file descriptor by opening the /proc/PID directory
              of the target process */

           snprintf(path, sizeof(path), "/proc/%s", argv[1]);

           pidfd = open(path, O_RDONLY);
           if (pidfd == -1) {
               perror("open");
               exit(EXIT_FAILURE);
           }

           /* Populate a 'siginfo_t' structure for use with
              pidfd_send_signal() */

           memset(&info, 0, sizeof(info));
           info.si_code = SI_QUEUE;
           info.si_signo = sig;
           info.si_errno = 0;
           info.si_uid = getuid();
           info.si_pid = getpid();
           info.si_value.sival_int = 1234;

           /* Send the signal */

           if (pidfd_send_signal(pidfd, sig, &info, 0) == -1) {
               perror("pidfd_send_signal");
               exit(EXIT_FAILURE);
           }

           exit(EXIT_SUCCESS);
       }

SEE ALSO
       clone(2),   kill(2),   pidfd_open(2),  rt_sigqueueinfo(2),  sigac‐
       tion(2), pid_namespaces(7), signal(7)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23  9:12 For review: pidfd_send_signal(2) manual page Michael Kerrisk (man-pages)
@ 2019-09-23 11:26 ` Florian Weimer
  2019-09-23 14:23   ` Christian Brauner
                     ` (2 more replies)
  2019-09-23 11:31 ` Daniel Colascione
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 23+ messages in thread
From: Florian Weimer @ 2019-09-23 11:26 UTC (permalink / raw)
  To: Michael Kerrisk \(man-pages\)
  Cc: Oleg Nesterov, Christian Brauner, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

* Michael Kerrisk:

> SYNOPSIS
>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>                              unsigned int flags);

This probably should reference a header for siginfo_t.

>        ESRCH  The target process does not exist.

If the descriptor is valid, does this mean the process has been waited
for?  Maybe this can be made more explicit.

>        The  pidfd_send_signal()  system call allows the avoidance of race
>        conditions that occur when using traditional interfaces  (such  as
>        kill(2)) to signal a process.  The problem is that the traditional
>        interfaces specify the target process via a process ID (PID), with
>        the  result  that the sender may accidentally send a signal to the
>        wrong process if the originally intended target process has termi‐
>        nated  and its PID has been recycled for another process.  By con‐
>        trast, a PID file descriptor is a stable reference to  a  specific
>        process;  if  that  process  terminates,  then the file descriptor
>        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
>        informed of this fact via an ESRCH error.

It would be nice to explain somewhere how you can avoid the race using
a PID descriptor.  Is there anything else besides CLONE_PIDFD?

>        static
>        int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>                unsigned int flags)
>        {
>            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
>        }

Please use a different function name.  Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23  9:12 For review: pidfd_send_signal(2) manual page Michael Kerrisk (man-pages)
  2019-09-23 11:26 ` Florian Weimer
@ 2019-09-23 11:31 ` Daniel Colascione
  2019-09-24 19:42   ` Michael Kerrisk (man-pages)
  2019-09-23 14:29 ` Christian Brauner
  2019-09-23 21:27 ` ebiederm
  3 siblings, 1 reply; 23+ messages in thread
From: Daniel Colascione @ 2019-09-23 11:31 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Oleg Nesterov, Christian Brauner, Jann Horn, Eric W. Biederman,
	Joel Fernandes, linux-man, Linux API, lkml

On Mon, Sep 23, 2019 at 2:12 AM Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
>        The  pidfd_send_signal()  system call allows the avoidance of race
>        conditions that occur when using traditional interfaces  (such  as
>        kill(2)) to signal a process.  The problem is that the traditional
>        interfaces specify the target process via a process ID (PID), with
>        the  result  that the sender may accidentally send a signal to the
>        wrong process if the originally intended target process has termi‐
>        nated  and its PID has been recycled for another process.  By con‐
>        trast, a PID file descriptor is a stable reference to  a  specific
>        process;  if  that  process  terminates,  then the file descriptor
>        ceases to be  valid

The file *descriptor* remains valid even after the process to which it
refers exits. You can close(2) the file descriptor without getting
EBADF. I'd say, instead, that "a PID file descriptor is a stable
reference to a specific process; process-related operations on a PID
file descriptor fail after that process exits".

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 11:26 ` Florian Weimer
@ 2019-09-23 14:23   ` Christian Brauner
  2019-09-24 19:44     ` Michael Kerrisk (man-pages)
  2019-09-24 19:43   ` Michael Kerrisk (man-pages)
  2019-09-25  1:48   ` Jann Horn
  2 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2019-09-23 14:23 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Kerrisk (man-pages),
	Oleg Nesterov, Christian Brauner, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
> * Michael Kerrisk:
> 
> > SYNOPSIS
> >        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> >                              unsigned int flags);
> 
> This probably should reference a header for siginfo_t.

Agreed.

> 
> >        ESRCH  The target process does not exist.
> 
> If the descriptor is valid, does this mean the process has been waited
> for?  Maybe this can be made more explicit.

If by valid you mean "refers to a process/thread-group leader" aka is a
pidfd then yes: Getting ESRCH means that the process has exited and has
already been waited upon.
If it had only exited but not waited upon aka is a zombie, then sending
a signal will just work because that's currently how sending signals to
zombies works, i.e. if you only send a signal and don't do any
additional checks you won't notice a difference between a process being
alive and a process being a zombie. The userspace visible behavior in
terms of signaling them is identical.

> 
> >        The  pidfd_send_signal()  system call allows the avoidance of race
> >        conditions that occur when using traditional interfaces  (such  as
> >        kill(2)) to signal a process.  The problem is that the traditional
> >        interfaces specify the target process via a process ID (PID), with
> >        the  result  that the sender may accidentally send a signal to the
> >        wrong process if the originally intended target process has termi‐
> >        nated  and its PID has been recycled for another process.  By con‐
> >        trast, a PID file descriptor is a stable reference to  a  specific
> >        process;  if  that  process  terminates,  then the file descriptor
> >        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
> >        informed of this fact via an ESRCH error.
> 
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor.  Is there anything else besides CLONE_PIDFD?

If you're the parent of the process you can do this without CLONE_PIDFD:
pid = fork();
pidfd = pidfd_open();
ret = pidfd_send_signal(pidfd, 0, NULL, 0);
if (ret < 0 && errno == ESRCH)
	/* pidfd refers to another, recycled process */

> 
> >        static
> >        int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> >                unsigned int flags)
> >        {
> >            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
> >        }
> 
> Please use a different function name.  Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23  9:12 For review: pidfd_send_signal(2) manual page Michael Kerrisk (man-pages)
  2019-09-23 11:26 ` Florian Weimer
  2019-09-23 11:31 ` Daniel Colascione
@ 2019-09-23 14:29 ` Christian Brauner
  2019-09-23 20:27   ` Michael Kerrisk (man-pages)
  2019-09-23 21:27 ` ebiederm
  3 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2019-09-23 14:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Oleg Nesterov, Christian Brauner, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Mon, Sep 23, 2019 at 11:12:00AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian and all,
> 
> Below, I have the rendered version of the current draft of
> the pidfd_send_signal(2) manual page that I have written.
> The page source can be found in a Git branch at:
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
> 
> I would be pleased to receive corrections and notes on any
> details that should be added. (For example, are there error
> cases that I have missed?)
> 
> Would you be able to review please?

Michael,

A big big thank you for doing this! Really appreciated.
I'm happy to review this!

> 
> Thanks,
> 
> Michael
> 
> 
> NAME
>        pidfd_send_signal - send a signal to a process specified by a file
>        descriptor
> 
> SYNOPSIS
>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>                              unsigned int flags);
> 
> DESCRIPTION
>        The pidfd_send_signal() system call sends the signal  sig  to  the
>        target  process  referred  to by pidfd, a PID file descriptor that
>        refers to a process.
> 
>        If the info argument points to a  siginfo_t  buffer,  that  buffer
>        should be populated as described in rt_sigqueueinfo(2).
> 
>        If  the  info  argument  is  a NULL pointer, this is equivalent to
>        specifying a pointer to a siginfo_t buffer whose fields match  the
>        values  that  are  implicitly supplied when a signal is sent using
>        kill(2):
> 
>        *  si_signo is set to the signal number;
>        *  si_errno is set to 0;
>        *  si_code is set to SI_USER;
>        *  si_pid is set to the caller's PID; and
>        *  si_uid is set to the caller's real user ID.
> 
>        The calling process must either be in the same  PID  namespace  as
>        the  process  referred  to  by pidfd, or be in an ancestor of that
>        namespace.
> 
>        The flags argument is reserved for  future  use;  currently,  this
>        argument must be specified as 0.
> 
> RETURN VALUE
>        On  success,  pidfd_send_signal()  returns  0.   On success, -1 is

This should probably be "On error, -1 is [...]".

>        returned and errno is set to indicate the cause of the error.
> 
> ERRORS
>        EBADF  pidfd is not a valid PID file descriptor.
> 
>        EINVAL sig is not a valid signal.
> 
>        EINVAL The calling process is not in a PID namespace from which it
>               can send a signal to the target process.
> 
>        EINVAL flags is not 0.
> 
>        EPERM  The  calling  process  does not have permission to send the
>               signal to the target process.
> 
>        EPERM  pidfd  doesn't  refer   to   the   calling   process,   and
>               info.si_code is invalid (see rt_sigqueueinfo(2)).
> 
>        ESRCH  The target process does not exist.
> 
> VERSIONS
>        pidfd_send_signal() first appeared in Linux 5.1.
> 
> CONFORMING TO
>        pidfd_send_signal() is Linux specific.
> 
> NOTES
>        Currently, there is no glibc wrapper for this system call; call it
>        using syscall(2).
> 
>    PID file descriptors
>        The pidfd argument is a PID file  descriptor,  a  file  descriptor
>        that  refers  to  process.  Such a file descriptor can be obtained
>        in any of the following ways:
> 
>        *  by opening a /proc/[pid] directory;
> 
>        *  using pidfd_open(2); or
> 
>        *  via the PID file descriptor that  is  returned  by  a  call  to
>           clone(2) or clone3(2) that specifies the CLONE_PIDFD flag.
> 
>        The  pidfd_send_signal()  system call allows the avoidance of race
>        conditions that occur when using traditional interfaces  (such  as
>        kill(2)) to signal a process.  The problem is that the traditional
>        interfaces specify the target process via a process ID (PID), with
>        the  result  that the sender may accidentally send a signal to the
>        wrong process if the originally intended target process has termi‐
>        nated  and its PID has been recycled for another process.  By con‐
>        trast, a PID file descriptor is a stable reference to  a  specific
>        process;  if  that  process  terminates,  then the file descriptor
>        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
>        informed of this fact via an ESRCH error.
> 
> EXAMPLE
>        #define _GNU_SOURCE
>        #include <limits.h>
>        #include <signal.h>
>        #include <fcntl.h>
>        #include <stdio.h>
>        #include <string.h>
>        #include <stdlib.h>
>        #include <unistd.h>
>        #include <sys/syscall.h>
> 
>        #ifndef __NR_pidfd_send_signal
>        #define __NR_pidfd_send_signal 424
>        #endif
> 
>        static
>        int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>                unsigned int flags)
>        {
>            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
>        }
> 
>        int
>        main(int argc, char *argv[])
>        {
>            siginfo_t info;
>            char path[PATH_MAX];
>            int pidfd, sig;
> 
>            if (argc != 3) {
>                fprintf(stderr, "Usage: %s <pid> <signal>\n", argv[0]);
>                exit(EXIT_FAILURE);
>            }
> 
>            sig = atoi(argv[2]);
> 
>            /* Obtain a PID file descriptor by opening the /proc/PID directory
>               of the target process */
> 
>            snprintf(path, sizeof(path), "/proc/%s", argv[1]);
> 
>            pidfd = open(path, O_RDONLY);
>            if (pidfd == -1) {
>                perror("open");
>                exit(EXIT_FAILURE);
>            }
> 
>            /* Populate a 'siginfo_t' structure for use with
>               pidfd_send_signal() */
> 
>            memset(&info, 0, sizeof(info));
>            info.si_code = SI_QUEUE;
>            info.si_signo = sig;
>            info.si_errno = 0;
>            info.si_uid = getuid();
>            info.si_pid = getpid();
>            info.si_value.sival_int = 1234;
> 
>            /* Send the signal */
> 
>            if (pidfd_send_signal(pidfd, sig, &info, 0) == -1) {
>                perror("pidfd_send_signal");
>                exit(EXIT_FAILURE);
>            }
> 
>            exit(EXIT_SUCCESS);
>        }
> 
> SEE ALSO
>        clone(2),   kill(2),   pidfd_open(2),  rt_sigqueueinfo(2),  sigac‐
>        tion(2), pid_namespaces(7), signal(7)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 14:29 ` Christian Brauner
@ 2019-09-23 20:27   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23 20:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: mtk.manpages, Oleg Nesterov, Christian Brauner, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

Hello Christian,

On 9/23/19 4:29 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 11:12:00AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Christian and all,
>>
>> Below, I have the rendered version of the current draft of
>> the pidfd_send_signal(2) manual page that I have written.
>> The page source can be found in a Git branch at:
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
>>
>> I would be pleased to receive corrections and notes on any
>> details that should be added. (For example, are there error
>> cases that I have missed?)
>>
>> Would you be able to review please?
> 
> Michael,
> 
> A big big thank you for doing this! Really appreciated.
> I'm happy to review this!
> 
>>
>> Thanks,
>>
>> Michael
>>
>>
>> NAME
>>        pidfd_send_signal - send a signal to a process specified by a file
>>        descriptor
>>
>> SYNOPSIS
>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>>                              unsigned int flags);
>>
>> DESCRIPTION
>>        The pidfd_send_signal() system call sends the signal  sig  to  the
>>        target  process  referred  to by pidfd, a PID file descriptor that
>>        refers to a process.
>>
>>        If the info argument points to a  siginfo_t  buffer,  that  buffer
>>        should be populated as described in rt_sigqueueinfo(2).
>>
>>        If  the  info  argument  is  a NULL pointer, this is equivalent to
>>        specifying a pointer to a siginfo_t buffer whose fields match  the
>>        values  that  are  implicitly supplied when a signal is sent using
>>        kill(2):
>>
>>        *  si_signo is set to the signal number;
>>        *  si_errno is set to 0;
>>        *  si_code is set to SI_USER;
>>        *  si_pid is set to the caller's PID; and
>>        *  si_uid is set to the caller's real user ID.
>>
>>        The calling process must either be in the same  PID  namespace  as
>>        the  process  referred  to  by pidfd, or be in an ancestor of that
>>        namespace.
>>
>>        The flags argument is reserved for  future  use;  currently,  this
>>        argument must be specified as 0.
>>
>> RETURN VALUE
>>        On  success,  pidfd_send_signal()  returns  0.   On success, -1 is
> 
> This should probably be "On error, -1 is [...]".

Thanks. Fixed.


Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23  9:12 For review: pidfd_send_signal(2) manual page Michael Kerrisk (man-pages)
                   ` (2 preceding siblings ...)
  2019-09-23 14:29 ` Christian Brauner
@ 2019-09-23 21:27 ` ebiederm
  2019-09-24 19:10   ` Michael Kerrisk (man-pages)
  3 siblings, 1 reply; 23+ messages in thread
From: ebiederm @ 2019-09-23 21:27 UTC (permalink / raw)
  To: Michael Kerrisk \(man-pages\)
  Cc: Oleg Nesterov, Christian Brauner, Jann Horn, Daniel Colascione,
	Joel Fernandes, linux-man, Linux API, lkml

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hello Christian and all,
>
> Below, I have the rendered version of the current draft of
> the pidfd_send_signal(2) manual page that I have written.
> The page source can be found in a Git branch at:
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
>
> I would be pleased to receive corrections and notes on any
> details that should be added. (For example, are there error
> cases that I have missed?)
>
> Would you be able to review please?
>
> Thanks,
>
> Michael
>
>
> NAME
>        pidfd_send_signal - send a signal to a process specified by a file
>        descriptor
>
> SYNOPSIS
>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,

 This needs to be "siginfo_t *info," -----------------------^

>                              unsigned int flags);
>

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 21:27 ` ebiederm
@ 2019-09-24 19:10   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-24 19:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: mtk.manpages, Oleg Nesterov, Christian Brauner, Jann Horn,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On 9/23/19 11:27 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> 
>> Hello Christian and all,
>>
>> Below, I have the rendered version of the current draft of
>> the pidfd_send_signal(2) manual page that I have written.
>> The page source can be found in a Git branch at:
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
>>
>> I would be pleased to receive corrections and notes on any
>> details that should be added. (For example, are there error
>> cases that I have missed?)
>>
>> Would you be able to review please?
>>
>> Thanks,
>>
>> Michael
>>
>>
>> NAME
>>        pidfd_send_signal - send a signal to a process specified by a file
>>        descriptor
>>
>> SYNOPSIS
>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> 
>  This needs to be "siginfo_t *info," -----------------------^

Thanks, Eric. Fixed.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 11:31 ` Daniel Colascione
@ 2019-09-24 19:42   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-24 19:42 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: mtk.manpages, Oleg Nesterov, Christian Brauner, Jann Horn,
	Eric W. Biederman, Joel Fernandes, linux-man, Linux API, lkml

On 9/23/19 1:31 PM, Daniel Colascione wrote:
> On Mon, Sep 23, 2019 at 2:12 AM Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>>        The  pidfd_send_signal()  system call allows the avoidance of race
>>        conditions that occur when using traditional interfaces  (such  as
>>        kill(2)) to signal a process.  The problem is that the traditional
>>        interfaces specify the target process via a process ID (PID), with
>>        the  result  that the sender may accidentally send a signal to the
>>        wrong process if the originally intended target process has termi‐
>>        nated  and its PID has been recycled for another process.  By con‐
>>        trast, a PID file descriptor is a stable reference to  a  specific
>>        process;  if  that  process  terminates,  then the file descriptor
>>        ceases to be  valid
> 
> The file *descriptor* remains valid even after the process to which it
> refers exits. You can close(2) the file descriptor without getting
> EBADF. I'd say, instead, that "a PID file descriptor is a stable
> reference to a specific process; process-related operations on a PID
> file descriptor fail after that process exits".

Thanks, Daniel. I like that rephrasing, but, since pidfd_send_signal()
is (so far as I know) currently the only relevant process-related
operation (and because this is the manual page describing that
syscall), I made it:

[[
By contrast, a PID file descriptor is a stable reference to a
specific process; if that process terminates, pidfd_send_signal()
fails with the error ESRCH.
]]

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 11:26 ` Florian Weimer
  2019-09-23 14:23   ` Christian Brauner
@ 2019-09-24 19:43   ` Michael Kerrisk (man-pages)
  2019-09-25  1:48   ` Jann Horn
  2 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-24 19:43 UTC (permalink / raw)
  To: Florian Weimer
  Cc: mtk.manpages, Oleg Nesterov, Christian Brauner, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

Hello Florian,

On 9/23/19 1:26 PM, Florian Weimer wrote:
> * Michael Kerrisk:
> 
>> SYNOPSIS
>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>>                              unsigned int flags);
> 
> This probably should reference a header for siginfo_t.

Thanks. I added: #include <signal.h>

>>        ESRCH  The target process does not exist.
> 
> If the descriptor is valid, does this mean the process has been waited
> for?  Maybe this can be made more explicit.

Yes. I added "(i.e., it has terminated and been waited on)".

>>        The  pidfd_send_signal()  system call allows the avoidance of race
>>        conditions that occur when using traditional interfaces  (such  as
>>        kill(2)) to signal a process.  The problem is that the traditional
>>        interfaces specify the target process via a process ID (PID), with
>>        the  result  that the sender may accidentally send a signal to the
>>        wrong process if the originally intended target process has termi‐
>>        nated  and its PID has been recycled for another process.  By con‐
>>        trast, a PID file descriptor is a stable reference to  a  specific
>>        process;  if  that  process  terminates,  then the file descriptor
>>        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
>>        informed of this fact via an ESRCH error.
> 
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor.  Is there anything else besides CLONE_PIDFD?

Please see my comment in reply to Christian (which will be sent just
after this).

>>        static
>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>>                unsigned int flags)
>>        {
>>            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
>>        }
> 
> Please use a different function name.  Thanks.

Please see my open question in the thread on pidfd_open().

Thanks for the review, Florian.

Cheers,

Michael

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 14:23   ` Christian Brauner
@ 2019-09-24 19:44     ` Michael Kerrisk (man-pages)
  2019-09-24 19:57       ` Christian Brauner
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-24 19:44 UTC (permalink / raw)
  To: Christian Brauner, Florian Weimer
  Cc: mtk.manpages, Oleg Nesterov, Christian Brauner, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

Hello Christian,

On 9/23/19 4:23 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
>> * Michael Kerrisk:
>>
>>> SYNOPSIS
>>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>>>                              unsigned int flags);
>>
>> This probably should reference a header for siginfo_t.
> 
> Agreed.
> 
>>
>>>        ESRCH  The target process does not exist.
>>
>> If the descriptor is valid, does this mean the process has been waited
>> for?  Maybe this can be made more explicit.
> 
> If by valid you mean "refers to a process/thread-group leader" aka is a
> pidfd then yes: Getting ESRCH means that the process has exited and has
> already been waited upon.
> If it had only exited but not waited upon aka is a zombie, then sending
> a signal will just work because that's currently how sending signals to
> zombies works, i.e. if you only send a signal and don't do any
> additional checks you won't notice a difference between a process being
> alive and a process being a zombie. The userspace visible behavior in
> terms of signaling them is identical.

(Thanks for the clarification. I added the text "(i.e., it has 
terminated and been waited on)" to the ESRCH error.)

>>>        The  pidfd_send_signal()  system call allows the avoidance of race
>>>        conditions that occur when using traditional interfaces  (such  as
>>>        kill(2)) to signal a process.  The problem is that the traditional
>>>        interfaces specify the target process via a process ID (PID), with
>>>        the  result  that the sender may accidentally send a signal to the
>>>        wrong process if the originally intended target process has termi‐
>>>        nated  and its PID has been recycled for another process.  By con‐
>>>        trast, a PID file descriptor is a stable reference to  a  specific
>>>        process;  if  that  process  terminates,  then the file descriptor
>>>        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
>>>        informed of this fact via an ESRCH error.
>>
>> It would be nice to explain somewhere how you can avoid the race using
>> a PID descriptor.  Is there anything else besides CLONE_PIDFD?
> 
> If you're the parent of the process you can do this without CLONE_PIDFD:
> pid = fork();
> pidfd = pidfd_open();
> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> if (ret < 0 && errno == ESRCH)
> 	/* pidfd refers to another, recycled process */

Although there is still the race between the fork() and the
pidfd_open(), right?

>>>        static
>>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>>>                unsigned int flags)
>>>        {
>>>            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
>>>        }
>>
>> Please use a different function name.  Thanks.

Covered in another thread. I await some further feedback from Florian.

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 19:44     ` Michael Kerrisk (man-pages)
@ 2019-09-24 19:57       ` Christian Brauner
  2019-09-24 20:07         ` Christian Brauner
  2019-09-24 21:00         ` Michael Kerrisk (man-pages)
  0 siblings, 2 replies; 23+ messages in thread
From: Christian Brauner @ 2019-09-24 19:57 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Florian Weimer, Oleg Nesterov, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Tue, Sep 24, 2019 at 09:44:49PM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
> 
> On 9/23/19 4:23 PM, Christian Brauner wrote:
> > On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
> >> * Michael Kerrisk:
> >>
> >>> SYNOPSIS
> >>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> >>>                              unsigned int flags);
> >>
> >> This probably should reference a header for siginfo_t.
> > 
> > Agreed.
> > 
> >>
> >>>        ESRCH  The target process does not exist.
> >>
> >> If the descriptor is valid, does this mean the process has been waited
> >> for?  Maybe this can be made more explicit.
> > 
> > If by valid you mean "refers to a process/thread-group leader" aka is a
> > pidfd then yes: Getting ESRCH means that the process has exited and has
> > already been waited upon.
> > If it had only exited but not waited upon aka is a zombie, then sending
> > a signal will just work because that's currently how sending signals to
> > zombies works, i.e. if you only send a signal and don't do any
> > additional checks you won't notice a difference between a process being
> > alive and a process being a zombie. The userspace visible behavior in
> > terms of signaling them is identical.
> 
> (Thanks for the clarification. I added the text "(i.e., it has 
> terminated and been waited on)" to the ESRCH error.)
> 
> >>>        The  pidfd_send_signal()  system call allows the avoidance of race
> >>>        conditions that occur when using traditional interfaces  (such  as
> >>>        kill(2)) to signal a process.  The problem is that the traditional
> >>>        interfaces specify the target process via a process ID (PID), with
> >>>        the  result  that the sender may accidentally send a signal to the
> >>>        wrong process if the originally intended target process has termi‐
> >>>        nated  and its PID has been recycled for another process.  By con‐
> >>>        trast, a PID file descriptor is a stable reference to  a  specific
> >>>        process;  if  that  process  terminates,  then the file descriptor
> >>>        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
> >>>        informed of this fact via an ESRCH error.
> >>
> >> It would be nice to explain somewhere how you can avoid the race using
> >> a PID descriptor.  Is there anything else besides CLONE_PIDFD?
> > 
> > If you're the parent of the process you can do this without CLONE_PIDFD:
> > pid = fork();
> > pidfd = pidfd_open();
> > ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> > if (ret < 0 && errno == ESRCH)
> > 	/* pidfd refers to another, recycled process */
> 
> Although there is still the race between the fork() and the
> pidfd_open(), right?

Actually no and my code is even too complex.
If you are the parent, and this is really a sequence that obeys the
ordering pidfd_open() before waiting:

pid = fork();
if (pid == 0)
	exit(EXIT_SUCCESS);
pidfd = pidfd_open(pid, 0);
waitid(pid, ...);

Then you are guaranteed that pidfd will refer to pid. No recycling can
happen since the process has not been waited upon yet (That is,
excluding special cases such as where you have a mainloop where a
callback reacts to a SIGCHLD event and waits on the child behind your
back and your next callback in the mainloop calls pidfd_open() while the
pid has been recycled etc.).
A race could only appear in sequences where waiting happens before
pidfd_open():

pid = fork();
if (pid == 0)
	exit(EXIT_SUCCESS);
waitid(pid, ...);
pidfd = pidfd_open(pid, 0);

which honestly simply doesn't make any sense. So if you're the parent
and you combine fork() + pidfd_open() correctly things should be fine
without even having to verify via pidfd_send_signal() (I missed that in
my first mail.).
(Now, it gets more hairy when one considers clone(CLONE_PARENT) but that
would be wildly esoteric because at that point you're using clone()
already and then you should simply pass clone(CLONE_PARENT | CLONE_PIDFD).)

If you're _not_ the parent then CLONE_PIDFD and sending around the pidfd
are your only option to avoiding the race imho.

> 
> >>>        static
> >>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> >>>                unsigned int flags)
> >>>        {
> >>>            return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
> >>>        }
> >>
> >> Please use a different function name.  Thanks.
> 
> Covered in another thread. I await some further feedback from Florian.

Right, that wasn't my suggestion anyway. :)

Thanks!
Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 19:57       ` Christian Brauner
@ 2019-09-24 20:07         ` Christian Brauner
  2019-09-24 21:00         ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2019-09-24 20:07 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Florian Weimer, Oleg Nesterov, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Tue, Sep 24, 2019 at 09:57:04PM +0200, Christian Brauner wrote:
> On Tue, Sep 24, 2019 at 09:44:49PM +0200, Michael Kerrisk (man-pages) wrote:
> > Hello Christian,
> > 
> > On 9/23/19 4:23 PM, Christian Brauner wrote:
> > > On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
> > >> * Michael Kerrisk:
> > >>
> > >>> SYNOPSIS
> > >>>        int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> > >>>                              unsigned int flags);
> > >>
> > >> This probably should reference a header for siginfo_t.
> > > 
> > > Agreed.
> > > 
> > >>
> > >>>        ESRCH  The target process does not exist.
> > >>
> > >> If the descriptor is valid, does this mean the process has been waited
> > >> for?  Maybe this can be made more explicit.
> > > 
> > > If by valid you mean "refers to a process/thread-group leader" aka is a
> > > pidfd then yes: Getting ESRCH means that the process has exited and has
> > > already been waited upon.
> > > If it had only exited but not waited upon aka is a zombie, then sending
> > > a signal will just work because that's currently how sending signals to
> > > zombies works, i.e. if you only send a signal and don't do any
> > > additional checks you won't notice a difference between a process being
> > > alive and a process being a zombie. The userspace visible behavior in
> > > terms of signaling them is identical.
> > 
> > (Thanks for the clarification. I added the text "(i.e., it has 
> > terminated and been waited on)" to the ESRCH error.)
> > 
> > >>>        The  pidfd_send_signal()  system call allows the avoidance of race
> > >>>        conditions that occur when using traditional interfaces  (such  as
> > >>>        kill(2)) to signal a process.  The problem is that the traditional
> > >>>        interfaces specify the target process via a process ID (PID), with
> > >>>        the  result  that the sender may accidentally send a signal to the
> > >>>        wrong process if the originally intended target process has termi‐
> > >>>        nated  and its PID has been recycled for another process.  By con‐
> > >>>        trast, a PID file descriptor is a stable reference to  a  specific
> > >>>        process;  if  that  process  terminates,  then the file descriptor
> > >>>        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
> > >>>        informed of this fact via an ESRCH error.
> > >>
> > >> It would be nice to explain somewhere how you can avoid the race using
> > >> a PID descriptor.  Is there anything else besides CLONE_PIDFD?
> > > 
> > > If you're the parent of the process you can do this without CLONE_PIDFD:
> > > pid = fork();
> > > pidfd = pidfd_open();
> > > ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> > > if (ret < 0 && errno == ESRCH)
> > > 	/* pidfd refers to another, recycled process */
> > 
> > Although there is still the race between the fork() and the
> > pidfd_open(), right?
> 
> Actually no and my code is even too complex.
> If you are the parent, and this is really a sequence that obeys the
> ordering pidfd_open() before waiting:
> 
> pid = fork();
> if (pid == 0)
> 	exit(EXIT_SUCCESS);
> pidfd = pidfd_open(pid, 0);
> waitid(pid, ...);
> 
> Then you are guaranteed that pidfd will refer to pid. No recycling can
> happen since the process has not been waited upon yet (That is,
> excluding special cases such as where you have a mainloop where a
> callback reacts to a SIGCHLD event and waits on the child behind your
> back and your next callback in the mainloop calls pidfd_open() while the
> pid has been recycled etc.).

If we wanted to be super nitpicky one could also get in that situation
where you do:

signal(SIGCHLD,SIG_IGN);

// or

struct sigaction sa;
sa.sa_handler = SIG_IGN;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGCHLD, &sa, 0)

pid = fork();
if (pid == 0)
	exit(EXIT_SUCCESS);
pidfd = pidfd_open();

because then the process gets autoreaped and can be recycled. But again,
that's just bad form and in that scenario one should again use
clone(CLONE_PIDFD) instead of fork().

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 19:57       ` Christian Brauner
  2019-09-24 20:07         ` Christian Brauner
@ 2019-09-24 21:00         ` Michael Kerrisk (man-pages)
  2019-09-24 21:08           ` Daniel Colascione
  2019-09-24 21:53           ` Christian Brauner
  1 sibling, 2 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-24 21:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: mtk.manpages, Florian Weimer, Oleg Nesterov, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

Hello Christian,

>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>> pid = fork();
>>> pidfd = pidfd_open();
>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>> if (ret < 0 && errno == ESRCH)
>>> 	/* pidfd refers to another, recycled process */
>>
>> Although there is still the race between the fork() and the
>> pidfd_open(), right?
> 
> Actually no and my code is even too complex.
> If you are the parent, and this is really a sequence that obeys the
> ordering pidfd_open() before waiting:
> 
> pid = fork();
> if (pid == 0)
> 	exit(EXIT_SUCCESS);
> pidfd = pidfd_open(pid, 0);
> waitid(pid, ...);
> 
> Then you are guaranteed that pidfd will refer to pid. No recycling can
> happen since the process has not been waited upon yet (That is,

D'oh! Yes, of course. 

> excluding special cases such as where you have a mainloop where a
> callback reacts to a SIGCHLD event and waits on the child behind your
> back and your next callback in the mainloop calls pidfd_open() while the
> pid has been recycled etc.).
> A race could only appear in sequences where waiting happens before
> pidfd_open():
> 
> pid = fork();
> if (pid == 0)
> 	exit(EXIT_SUCCESS);
> waitid(pid, ...);
> pidfd = pidfd_open(pid, 0);
> 
> which honestly simply doesn't make any sense. So if you're the parent
> and you combine fork() + pidfd_open() correctly things should be fine
> without even having to verify via pidfd_send_signal() (I missed that in
> my first mail.).

Thanks for the additional detail.

I added the following to the pidfd_open() page, to
prevent people making the same thinko as me:

       The following code sequence can be used to obtain a file  descrip‐
       tor for the child of fork(2):

           pid = fork();
           if (pid > 0) {     /* If parent */
               pidfd = pidfd_open(pid, 0);
               ...
           }

       Even  if  the  child process has already terminated by the time of
       the pidfd_open() call, the returned file descriptor is  guaranteed
       to refer to the child because the parent has not yet waited on the
       child (and therefore, the child's ID has not been recycled).

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 21:00         ` Michael Kerrisk (man-pages)
@ 2019-09-24 21:08           ` Daniel Colascione
  2019-09-25 13:46             ` Michael Kerrisk (man-pages)
  2019-09-24 21:53           ` Christian Brauner
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Colascione @ 2019-09-24 21:08 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christian Brauner, Florian Weimer, Oleg Nesterov, Jann Horn,
	Eric W. Biederman, Joel Fernandes, linux-man, Linux API, lkml

On Tue, Sep 24, 2019 at 2:00 PM Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
>
> Hello Christian,
>
> >>> If you're the parent of the process you can do this without CLONE_PIDFD:
> >>> pid = fork();
> >>> pidfd = pidfd_open();
> >>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> >>> if (ret < 0 && errno == ESRCH)
> >>>     /* pidfd refers to another, recycled process */
> >>
> >> Although there is still the race between the fork() and the
> >> pidfd_open(), right?
> >
> > Actually no and my code is even too complex.
> > If you are the parent, and this is really a sequence that obeys the
> > ordering pidfd_open() before waiting:
> >
> > pid = fork();
> > if (pid == 0)
> >       exit(EXIT_SUCCESS);
> > pidfd = pidfd_open(pid, 0);
> > waitid(pid, ...);
> >
> > Then you are guaranteed that pidfd will refer to pid. No recycling can
> > happen since the process has not been waited upon yet (That is,
>
> D'oh! Yes, of course.

You still have a race if you're the parent and you have SIGCHLD set to
SIG_IGN though.

> > excluding special cases such as where you have a mainloop where a
> > callback reacts to a SIGCHLD event and waits on the child behind your
> > back and your next callback in the mainloop calls pidfd_open() while the
> > pid has been recycled etc.).

That's a pretty common case though, especially if you're a library.

> > A race could only appear in sequences where waiting happens before
> > pidfd_open():
> >
> > pid = fork();
> > if (pid == 0)
> >       exit(EXIT_SUCCESS);
> > waitid(pid, ...);
> > pidfd = pidfd_open(pid, 0);
> >
> > which honestly simply doesn't make any sense. So if you're the parent
> > and you combine fork() + pidfd_open() correctly things should be fine
> > without even having to verify via pidfd_send_signal() (I missed that in
> > my first mail.).
>
> Thanks for the additional detail.
>
> I added the following to the pidfd_open() page, to
> prevent people making the same thinko as me:
>
>        The following code sequence can be used to obtain a file  descrip‐
>        tor for the child of fork(2):
>
>            pid = fork();
>            if (pid > 0) {     /* If parent */
>                pidfd = pidfd_open(pid, 0);
>                ...
>            }
>
>        Even  if  the  child process has already terminated by the time of
>        the pidfd_open() call, the returned file descriptor is  guaranteed
>        to refer to the child because the parent has not yet waited on the
>        child (and therefore, the child's ID has not been recycled).

I'd prefer that sample code be robust in all cases.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 21:00         ` Michael Kerrisk (man-pages)
  2019-09-24 21:08           ` Daniel Colascione
@ 2019-09-24 21:53           ` Christian Brauner
  2019-09-25 13:46             ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2019-09-24 21:53 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Florian Weimer, Oleg Nesterov, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Tue, Sep 24, 2019 at 11:00:03PM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
> 
> >>> If you're the parent of the process you can do this without CLONE_PIDFD:
> >>> pid = fork();
> >>> pidfd = pidfd_open();
> >>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> >>> if (ret < 0 && errno == ESRCH)
> >>> 	/* pidfd refers to another, recycled process */
> >>
> >> Although there is still the race between the fork() and the
> >> pidfd_open(), right?
> > 
> > Actually no and my code is even too complex.
> > If you are the parent, and this is really a sequence that obeys the
> > ordering pidfd_open() before waiting:
> > 
> > pid = fork();
> > if (pid == 0)
> > 	exit(EXIT_SUCCESS);
> > pidfd = pidfd_open(pid, 0);
> > waitid(pid, ...);
> > 
> > Then you are guaranteed that pidfd will refer to pid. No recycling can
> > happen since the process has not been waited upon yet (That is,
> 
> D'oh! Yes, of course. 
> 
> > excluding special cases such as where you have a mainloop where a
> > callback reacts to a SIGCHLD event and waits on the child behind your
> > back and your next callback in the mainloop calls pidfd_open() while the
> > pid has been recycled etc.).
> > A race could only appear in sequences where waiting happens before
> > pidfd_open():
> > 
> > pid = fork();
> > if (pid == 0)
> > 	exit(EXIT_SUCCESS);
> > waitid(pid, ...);
> > pidfd = pidfd_open(pid, 0);
> > 
> > which honestly simply doesn't make any sense. So if you're the parent
> > and you combine fork() + pidfd_open() correctly things should be fine
> > without even having to verify via pidfd_send_signal() (I missed that in
> > my first mail.).
> 
> Thanks for the additional detail.

You're very welcome.

> 
> I added the following to the pidfd_open() page, to
> prevent people making the same thinko as me:
> 
>        The following code sequence can be used to obtain a file  descrip‐
>        tor for the child of fork(2):
> 
>            pid = fork();
>            if (pid > 0) {     /* If parent */
>                pidfd = pidfd_open(pid, 0);
>                ...
>            }
> 
>        Even  if  the  child process has already terminated by the time of
>        the pidfd_open() call, the returned file descriptor is  guaranteed
>        to refer to the child because the parent has not yet waited on the
>        child (and therefore, the child's ID has not been recycled).

Thanks! I'm fine with the example. The code illustrates the basics. If
you want to go overboard, you can mention my callback example and put my
SIG_IGN code snippet from my earlier mails (cf. [1] and [2]) in there.
But imho, that'll complicate the manpage and I'm not sure it's worth it.

Thanks!
Christian

/* References */
[1]: https://lore.kernel.org/r/20190924195701.7pw2olbviieqsg5q@wittgenstein
[2]: https://lore.kernel.org/r/20190924200735.2dvqhan7ynnmfc7s@wittgenstein

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-23 11:26 ` Florian Weimer
  2019-09-23 14:23   ` Christian Brauner
  2019-09-24 19:43   ` Michael Kerrisk (man-pages)
@ 2019-09-25  1:48   ` Jann Horn
  2 siblings, 0 replies; 23+ messages in thread
From: Jann Horn @ 2019-09-25  1:48 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Kerrisk (man-pages),
	Oleg Nesterov, Christian Brauner, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Mon, Sep 23, 2019 at 1:26 PM Florian Weimer <fw@deneb.enyo.de> wrote:
> * Michael Kerrisk:
> >        The  pidfd_send_signal()  system call allows the avoidance of race
> >        conditions that occur when using traditional interfaces  (such  as
> >        kill(2)) to signal a process.  The problem is that the traditional
> >        interfaces specify the target process via a process ID (PID), with
> >        the  result  that the sender may accidentally send a signal to the
> >        wrong process if the originally intended target process has termi‐
> >        nated  and its PID has been recycled for another process.  By con‐
> >        trast, a PID file descriptor is a stable reference to  a  specific
> >        process;  if  that  process  terminates,  then the file descriptor
> >        ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
> >        informed of this fact via an ESRCH error.
>
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor.  Is there anything else besides CLONE_PIDFD?

My favorite example here is that you could implement "killall" without
PID reuse races. With /proc/$pid file descriptors, you could do it
like this (rough pseudocode with missing error handling and resource
leaks and such):

for each pid {
  procfs_pid_fd = open("/proc/"+pid);
  if (procfs_pid_fd == -1) continue;
  comm_fd = openat(procfs_pid_fd, "comm");
  if (comm_fd == -1) continue;
  char buf[1000];
  int n = read(comm_fd, buf, sizeof(buf)-1);
  buf[n] = 0;
  if (strcmp(buf, expected_comm) == 0) {
    pidfd_send_signal(procfs_pid_fd, SIGKILL, NULL, 0);
  }
}

If you want to avoid using a procfs fd for this, I think you can still
do it, the dance just gets more complicated:

for each pid {
  procfs_pid_fd = open("/proc/"+pid);
  if (procfs_pid_fd == -1) continue;
  pid_fd = pidfd_open(pid, 0);
  if (pid_fd == -1) continue;
  /* at this point procfs_pid_fd and pid_fd may refer to different processes */
  comm_fd = openat(procfs_pid_fd, "comm");
  if (comm_fd == -1) continue;
  /* at this point we know that procfs_pid_fd and pid_fd refer to the
same struct pid, because otherwise the procfs_pid_fd must point to a
directory that throws -ESRCH for everything */
  char buf[1000];
  int n = read(comm_fd, buf, sizeof(buf)-1);
  buf[n] = 0;
  if (strcmp(buf, expected_comm) == 0) {
    pidfd_send_signal(pid_fd, SIGKILL, NULL, 0);
  }
}

But I don't think anyone is actually interested in using pidfds for
this kind of usecase right now.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 21:08           ` Daniel Colascione
@ 2019-09-25 13:46             ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-25 13:46 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: mtk.manpages, Christian Brauner, Florian Weimer, Oleg Nesterov,
	Jann Horn, Eric W. Biederman, Joel Fernandes, linux-man,
	Linux API, lkml

Hello Daniel,

On 9/24/19 11:08 PM, Daniel Colascione wrote:
> On Tue, Sep 24, 2019 at 2:00 PM Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>>
>> Hello Christian,
>>
>>>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>>>> pid = fork();
>>>>> pidfd = pidfd_open();
>>>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>>>> if (ret < 0 && errno == ESRCH)
>>>>>     /* pidfd refers to another, recycled process */
>>>>
>>>> Although there is still the race between the fork() and the
>>>> pidfd_open(), right?
>>>
>>> Actually no and my code is even too complex.
>>> If you are the parent, and this is really a sequence that obeys the
>>> ordering pidfd_open() before waiting:
>>>
>>> pid = fork();
>>> if (pid == 0)
>>>       exit(EXIT_SUCCESS);
>>> pidfd = pidfd_open(pid, 0);
>>> waitid(pid, ...);
>>>
>>> Then you are guaranteed that pidfd will refer to pid. No recycling can
>>> happen since the process has not been waited upon yet (That is,
>>
>> D'oh! Yes, of course.
> 
> You still have a race if you're the parent and you have SIGCHLD set to
> SIG_IGN though.

Yes, thanks for reminding me of that (as did Christian also).

>>> excluding special cases such as where you have a mainloop where a
>>> callback reacts to a SIGCHLD event and waits on the child behind your
>>> back and your next callback in the mainloop calls pidfd_open() while the
>>> pid has been recycled etc.).
> 
> That's a pretty common case though, especially if you're a library.

I presume that conventionally the only real resolution of this kind of
problem is that the mainloop SIGCHLD call back has a waitpid() loop
that (in a nonblocking fashion) loops checking each of the PIDs created
by the mainloop, right?

>>> A race could only appear in sequences where waiting happens before
>>> pidfd_open():
>>>
>>> pid = fork();
>>> if (pid == 0)
>>>       exit(EXIT_SUCCESS);
>>> waitid(pid, ...);
>>> pidfd = pidfd_open(pid, 0);
>>>
>>> which honestly simply doesn't make any sense. So if you're the parent
>>> and you combine fork() + pidfd_open() correctly things should be fine
>>> without even having to verify via pidfd_send_signal() (I missed that in
>>> my first mail.).
>>
>> Thanks for the additional detail.
>>
>> I added the following to the pidfd_open() page, to
>> prevent people making the same thinko as me:
>>
>>        The following code sequence can be used to obtain a file  descrip‐
>>        tor for the child of fork(2):
>>
>>            pid = fork();
>>            if (pid > 0) {     /* If parent */
>>                pidfd = pidfd_open(pid, 0);
>>                ...
>>            }
>>
>>        Even  if  the  child process has already terminated by the time of
>>        the pidfd_open() call, the returned file descriptor is  guaranteed
>>        to refer to the child because the parent has not yet waited on the
>>        child (and therefore, the child's ID has not been recycled).
> 
> I'd prefer that sample code be robust in all cases.

I'm not clear what you think is missing. Or do you mean that the code
can't be robust in the face of (1) waitpid(-1) in another thread or an
asynchronous SIGCHLD handler and (2) SIGCHLD disposition set to SIG_IGN?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-24 21:53           ` Christian Brauner
@ 2019-09-25 13:46             ` Michael Kerrisk (man-pages)
  2019-09-25 13:51               ` Florian Weimer
  2019-09-25 13:53               ` Christian Brauner
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-25 13:46 UTC (permalink / raw)
  To: Christian Brauner
  Cc: mtk.manpages, Florian Weimer, Oleg Nesterov, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

On 9/24/19 11:53 PM, Christian Brauner wrote:
> On Tue, Sep 24, 2019 at 11:00:03PM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Christian,
>>
>>>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>>>> pid = fork();
>>>>> pidfd = pidfd_open();
>>>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>>>> if (ret < 0 && errno == ESRCH)
>>>>> 	/* pidfd refers to another, recycled process */
>>>>
>>>> Although there is still the race between the fork() and the
>>>> pidfd_open(), right?
>>>
>>> Actually no and my code is even too complex.
>>> If you are the parent, and this is really a sequence that obeys the
>>> ordering pidfd_open() before waiting:
>>>
>>> pid = fork();
>>> if (pid == 0)
>>> 	exit(EXIT_SUCCESS);
>>> pidfd = pidfd_open(pid, 0);
>>> waitid(pid, ...);
>>>
>>> Then you are guaranteed that pidfd will refer to pid. No recycling can
>>> happen since the process has not been waited upon yet (That is,
>>
>> D'oh! Yes, of course. 
>>
>>> excluding special cases such as where you have a mainloop where a
>>> callback reacts to a SIGCHLD event and waits on the child behind your
>>> back and your next callback in the mainloop calls pidfd_open() while the
>>> pid has been recycled etc.).
>>> A race could only appear in sequences where waiting happens before
>>> pidfd_open():
>>>
>>> pid = fork();
>>> if (pid == 0)
>>> 	exit(EXIT_SUCCESS);
>>> waitid(pid, ...);
>>> pidfd = pidfd_open(pid, 0);
>>>
>>> which honestly simply doesn't make any sense. So if you're the parent
>>> and you combine fork() + pidfd_open() correctly things should be fine
>>> without even having to verify via pidfd_send_signal() (I missed that in
>>> my first mail.).
>>
>> Thanks for the additional detail.
> 
> You're very welcome.
> 
>>
>> I added the following to the pidfd_open() page, to
>> prevent people making the same thinko as me:
>>
>>        The following code sequence can be used to obtain a file  descrip‐
>>        tor for the child of fork(2):
>>
>>            pid = fork();
>>            if (pid > 0) {     /* If parent */
>>                pidfd = pidfd_open(pid, 0);
>>                ...
>>            }
>>
>>        Even  if  the  child process has already terminated by the time of
>>        the pidfd_open() call, the returned file descriptor is  guaranteed
>>        to refer to the child because the parent has not yet waited on the
>>        child (and therefore, the child's ID has not been recycled).
> 
> Thanks! I'm fine with the example. The code illustrates the basics. If
> you want to go overboard, you can mention my callback example and put my
> SIG_IGN code snippet from my earlier mails (cf. [1] and [2]) in there.
> But imho, that'll complicate the manpage and I'm not sure it's worth it.

I agree that we should not complicate this discussion with more code,
but how about we refine the text as follows:

       The following code sequence can be used to obtain a file  descrip‐
       tor for the child of fork(2):

           pid = fork();
           if (pid > 0) {     /* If parent */
               pidfd = pidfd_open(pid, 0);
               ...
           }

       Even  if  the  child  has  already  terminated  by the time of the
       pidfd_open() call, its PID will not have  been  recycled  and  the
       returned  file  descriptor  will  refer  to  the  resulting zombie
       process.  Note, however, that this is guaranteed only if the  fol‐
       lowing conditions hold true:

       *  the  disposition  of  SIGCHLD  has  not  been explicitly set to
          SIG_IGN (see sigaction(2)); and

       *  the zombie process was not  reaped  elsewhere  in  the  program
          (e.g.,  either  by an asynchronously executed signal handler or
          by wait(2) or similar in another thread).

       If these conditions don't hold true, then the child process should
       instead be created using clone(2) with the CLONE_PID flag.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-25 13:46             ` Michael Kerrisk (man-pages)
@ 2019-09-25 13:51               ` Florian Weimer
  2019-09-25 14:02                 ` Michael Kerrisk (man-pages)
  2019-09-25 13:53               ` Christian Brauner
  1 sibling, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2019-09-25 13:51 UTC (permalink / raw)
  To: Michael Kerrisk \(man-pages\)
  Cc: Christian Brauner, Oleg Nesterov, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

* Michael Kerrisk:

>        If these conditions don't hold true, then the child process should
>        instead be created using clone(2) with the CLONE_PID flag.

I think this should be CLONE_PIDFD.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-25 13:46             ` Michael Kerrisk (man-pages)
  2019-09-25 13:51               ` Florian Weimer
@ 2019-09-25 13:53               ` Christian Brauner
  2019-09-25 14:29                 ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2019-09-25 13:53 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Florian Weimer, Oleg Nesterov, Jann Horn, Eric W. Biederman,
	Daniel Colascione, Joel Fernandes, linux-man, Linux API, lkml

On Wed, Sep 25, 2019 at 03:46:26PM +0200, Michael Kerrisk (man-pages) wrote:
> On 9/24/19 11:53 PM, Christian Brauner wrote:
> > On Tue, Sep 24, 2019 at 11:00:03PM +0200, Michael Kerrisk (man-pages) wrote:
> >> Hello Christian,
> >>
> >>>>> If you're the parent of the process you can do this without CLONE_PIDFD:
> >>>>> pid = fork();
> >>>>> pidfd = pidfd_open();
> >>>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> >>>>> if (ret < 0 && errno == ESRCH)
> >>>>> 	/* pidfd refers to another, recycled process */
> >>>>
> >>>> Although there is still the race between the fork() and the
> >>>> pidfd_open(), right?
> >>>
> >>> Actually no and my code is even too complex.
> >>> If you are the parent, and this is really a sequence that obeys the
> >>> ordering pidfd_open() before waiting:
> >>>
> >>> pid = fork();
> >>> if (pid == 0)
> >>> 	exit(EXIT_SUCCESS);
> >>> pidfd = pidfd_open(pid, 0);
> >>> waitid(pid, ...);
> >>>
> >>> Then you are guaranteed that pidfd will refer to pid. No recycling can
> >>> happen since the process has not been waited upon yet (That is,
> >>
> >> D'oh! Yes, of course. 
> >>
> >>> excluding special cases such as where you have a mainloop where a
> >>> callback reacts to a SIGCHLD event and waits on the child behind your
> >>> back and your next callback in the mainloop calls pidfd_open() while the
> >>> pid has been recycled etc.).
> >>> A race could only appear in sequences where waiting happens before
> >>> pidfd_open():
> >>>
> >>> pid = fork();
> >>> if (pid == 0)
> >>> 	exit(EXIT_SUCCESS);
> >>> waitid(pid, ...);
> >>> pidfd = pidfd_open(pid, 0);
> >>>
> >>> which honestly simply doesn't make any sense. So if you're the parent
> >>> and you combine fork() + pidfd_open() correctly things should be fine
> >>> without even having to verify via pidfd_send_signal() (I missed that in
> >>> my first mail.).
> >>
> >> Thanks for the additional detail.
> > 
> > You're very welcome.
> > 
> >>
> >> I added the following to the pidfd_open() page, to
> >> prevent people making the same thinko as me:
> >>
> >>        The following code sequence can be used to obtain a file  descrip‐
> >>        tor for the child of fork(2):
> >>
> >>            pid = fork();
> >>            if (pid > 0) {     /* If parent */
> >>                pidfd = pidfd_open(pid, 0);
> >>                ...
> >>            }
> >>
> >>        Even  if  the  child process has already terminated by the time of
> >>        the pidfd_open() call, the returned file descriptor is  guaranteed
> >>        to refer to the child because the parent has not yet waited on the
> >>        child (and therefore, the child's ID has not been recycled).
> > 
> > Thanks! I'm fine with the example. The code illustrates the basics. If
> > you want to go overboard, you can mention my callback example and put my
> > SIG_IGN code snippet from my earlier mails (cf. [1] and [2]) in there.
> > But imho, that'll complicate the manpage and I'm not sure it's worth it.
> 
> I agree that we should not complicate this discussion with more code,
> but how about we refine the text as follows:
> 
>        The following code sequence can be used to obtain a file  descrip‐
>        tor for the child of fork(2):
> 
>            pid = fork();
>            if (pid > 0) {     /* If parent */
>                pidfd = pidfd_open(pid, 0);
>                ...
>            }
> 
>        Even  if  the  child  has  already  terminated  by the time of the
>        pidfd_open() call, its PID will not have  been  recycled  and  the
>        returned  file  descriptor  will  refer  to  the  resulting zombie
>        process.  Note, however, that this is guaranteed only if the  fol‐
>        lowing conditions hold true:
> 
>        *  the  disposition  of  SIGCHLD  has  not  been explicitly set to
>           SIG_IGN (see sigaction(2)); and

Ugh, I forgot a third one. There's also SA_NOCLDWAIT. When set and
the SIGCHLD handler is set to SIG_DFL then no zombie processes are
created and no SIGCHLD signal is sent. When an explicit handler for
SIGCHLD is set then a SIGCHLD signal is generated but the process will
still not be turned into a zombie...

> 
>        *  the zombie process was not  reaped  elsewhere  in  the  program
>           (e.g.,  either  by an asynchronously executed signal handler or
>           by wait(2) or similar in another thread).
> 
>        If these conditions don't hold true, then the child process should

"If any of these conditions does not hold, the child process..."

That might be clearer. But I leave the call on that to you. :)

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-25 13:51               ` Florian Weimer
@ 2019-09-25 14:02                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-25 14:02 UTC (permalink / raw)
  To: Florian Weimer
  Cc: mtk.manpages, Christian Brauner, Oleg Nesterov, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

On 9/25/19 3:51 PM, Florian Weimer wrote:
> * Michael Kerrisk:
> 
>>        If these conditions don't hold true, then the child process should
>>        instead be created using clone(2) with the CLONE_PID flag.
> 
> I think this should be CLONE_PIDFD.

Thanks Florian. Fixed.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: For review: pidfd_send_signal(2) manual page
  2019-09-25 13:53               ` Christian Brauner
@ 2019-09-25 14:29                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-25 14:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: mtk.manpages, Florian Weimer, Oleg Nesterov, Jann Horn,
	Eric W. Biederman, Daniel Colascione, Joel Fernandes, linux-man,
	Linux API, lkml

Hello Christian,

On 9/25/19 3:53 PM, Christian Brauner wrote:
> On Wed, Sep 25, 2019 at 03:46:26PM +0200, Michael Kerrisk (man-pages) wrote:
>> On 9/24/19 11:53 PM, Christian Brauner wrote:
>>> On Tue, Sep 24, 2019 at 11:00:03PM +0200, Michael Kerrisk (man-pages) wrote:
>>>> Hello Christian,
>>>>
>>>>>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>>>>>> pid = fork();
>>>>>>> pidfd = pidfd_open();
>>>>>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>>>>>> if (ret < 0 && errno == ESRCH)
>>>>>>> 	/* pidfd refers to another, recycled process */
>>>>>>
>>>>>> Although there is still the race between the fork() and the
>>>>>> pidfd_open(), right?
>>>>>
>>>>> Actually no and my code is even too complex.
>>>>> If you are the parent, and this is really a sequence that obeys the
>>>>> ordering pidfd_open() before waiting:
>>>>>
>>>>> pid = fork();
>>>>> if (pid == 0)
>>>>> 	exit(EXIT_SUCCESS);
>>>>> pidfd = pidfd_open(pid, 0);
>>>>> waitid(pid, ...);
>>>>>
>>>>> Then you are guaranteed that pidfd will refer to pid. No recycling can
>>>>> happen since the process has not been waited upon yet (That is,
>>>>
>>>> D'oh! Yes, of course. 
>>>>
>>>>> excluding special cases such as where you have a mainloop where a
>>>>> callback reacts to a SIGCHLD event and waits on the child behind your
>>>>> back and your next callback in the mainloop calls pidfd_open() while the
>>>>> pid has been recycled etc.).
>>>>> A race could only appear in sequences where waiting happens before
>>>>> pidfd_open():
>>>>>
>>>>> pid = fork();
>>>>> if (pid == 0)
>>>>> 	exit(EXIT_SUCCESS);
>>>>> waitid(pid, ...);
>>>>> pidfd = pidfd_open(pid, 0);
>>>>>
>>>>> which honestly simply doesn't make any sense. So if you're the parent
>>>>> and you combine fork() + pidfd_open() correctly things should be fine
>>>>> without even having to verify via pidfd_send_signal() (I missed that in
>>>>> my first mail.).
>>>>
>>>> Thanks for the additional detail.
>>>
>>> You're very welcome.
>>>
>>>>
>>>> I added the following to the pidfd_open() page, to
>>>> prevent people making the same thinko as me:
>>>>
>>>>        The following code sequence can be used to obtain a file  descrip‐
>>>>        tor for the child of fork(2):
>>>>
>>>>            pid = fork();
>>>>            if (pid > 0) {     /* If parent */
>>>>                pidfd = pidfd_open(pid, 0);
>>>>                ...
>>>>            }
>>>>
>>>>        Even  if  the  child process has already terminated by the time of
>>>>        the pidfd_open() call, the returned file descriptor is  guaranteed
>>>>        to refer to the child because the parent has not yet waited on the
>>>>        child (and therefore, the child's ID has not been recycled).
>>>
>>> Thanks! I'm fine with the example. The code illustrates the basics. If
>>> you want to go overboard, you can mention my callback example and put my
>>> SIG_IGN code snippet from my earlier mails (cf. [1] and [2]) in there.
>>> But imho, that'll complicate the manpage and I'm not sure it's worth it.
>>
>> I agree that we should not complicate this discussion with more code,
>> but how about we refine the text as follows:
>>
>>        The following code sequence can be used to obtain a file  descrip‐
>>        tor for the child of fork(2):
>>
>>            pid = fork();
>>            if (pid > 0) {     /* If parent */
>>                pidfd = pidfd_open(pid, 0);
>>                ...
>>            }
>>
>>        Even  if  the  child  has  already  terminated  by the time of the
>>        pidfd_open() call, its PID will not have  been  recycled  and  the
>>        returned  file  descriptor  will  refer  to  the  resulting zombie
>>        process.  Note, however, that this is guaranteed only if the  fol‐
>>        lowing conditions hold true:
>>
>>        *  the  disposition  of  SIGCHLD  has  not  been explicitly set to
>>           SIG_IGN (see sigaction(2)); and
> 
> Ugh, I forgot a third one. There's also SA_NOCLDWAIT. When set and
> the SIGCHLD handler is set to SIG_DFL then no zombie processes are
> created and no SIGCHLD signal is sent. When an explicit handler for
> SIGCHLD is set then a SIGCHLD signal is generated but the process will
> still not be turned into a zombie...

Oh, yes. I added:

       *  the SA_NOCLDSTOP flag was not specified  while  establishing  a
          handler  for  SIGCHLD  or while setting the disposition of that
          signal to SIG_DFL (see sigaction(2));

>>        *  the zombie process was not  reaped  elsewhere  in  the  program
>>           (e.g.,  either  by an asynchronously executed signal handler or
>>           by wait(2) or similar in another thread).
>>
>>        If these conditions don't hold true, then the child process should
> 
> "If any of these conditions does not hold, the child process..."
> 
> That might be clearer. But I leave the call on that to you. :)

Yep, your wording is better. Fixed.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-23  9:12 For review: pidfd_send_signal(2) manual page Michael Kerrisk (man-pages)
2019-09-23 11:26 ` Florian Weimer
2019-09-23 14:23   ` Christian Brauner
2019-09-24 19:44     ` Michael Kerrisk (man-pages)
2019-09-24 19:57       ` Christian Brauner
2019-09-24 20:07         ` Christian Brauner
2019-09-24 21:00         ` Michael Kerrisk (man-pages)
2019-09-24 21:08           ` Daniel Colascione
2019-09-25 13:46             ` Michael Kerrisk (man-pages)
2019-09-24 21:53           ` Christian Brauner
2019-09-25 13:46             ` Michael Kerrisk (man-pages)
2019-09-25 13:51               ` Florian Weimer
2019-09-25 14:02                 ` Michael Kerrisk (man-pages)
2019-09-25 13:53               ` Christian Brauner
2019-09-25 14:29                 ` Michael Kerrisk (man-pages)
2019-09-24 19:43   ` Michael Kerrisk (man-pages)
2019-09-25  1:48   ` Jann Horn
2019-09-23 11:31 ` Daniel Colascione
2019-09-24 19:42   ` Michael Kerrisk (man-pages)
2019-09-23 14:29 ` Christian Brauner
2019-09-23 20:27   ` Michael Kerrisk (man-pages)
2019-09-23 21:27 ` ebiederm
2019-09-24 19:10   ` Michael Kerrisk (man-pages)

Linux-man Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-man/0 linux-man/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-man linux-man/ https://lore.kernel.org/linux-man \
		linux-man@vger.kernel.org linux-man@archiver.kernel.org
	public-inbox-index linux-man

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-man


AGPL code for this site: git clone https://public-inbox.org/ public-inbox