linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* For review: pidfd_open(2) manual page
@ 2019-09-23  9:11 Michael Kerrisk (man-pages)
  2019-09-23 10:53 ` Florian Weimer
  2019-09-23 14:38 ` Christian Brauner
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23  9:11 UTC (permalink / raw)
  To: Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes
  Cc: mtk.manpages, Linux API, lkml, linux-man, Oleg Nesterov

Hello Christian and all,

Below, I have the rendered version of the current draft of
the pidfd_open(2) manual page that I have written.
The page source can be found in a Git branch at:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd

I would be pleased to receive corrections and notes on any
details that should be added. (For example, are there error
cases that I have missed?)

Would you be able to review please?

Thanks,

Michael


NAME
       pidfd_open - obtain a file descriptor that refers to a process

SYNOPSIS
       int pidfd_open(pid_t pid, unsigned int flags);

DESCRIPTION
       The  pidfd_open()  system creates a file descriptor that refers to
       the process whose PID is specified in pid.  The file descriptor is
       returned  as the function result; the close-on-exec flag is set on
       the file descriptor.

       The flags argument is reserved for  future  use;  currently,  this
       argument must be specified as 0.

RETURN VALUE
       On  success,  pidfd_open()  returns a nonnegative file descriptor.
       On success, -1 is returned and errno is set to indicate the  cause
       of the error.

ERRORS
       EINVAL flags is not 0.

       EINVAL pid is not valid.

       ESRCH  The process specified by pid does not exist.

VERSIONS
       pidfd_open() first appeared in Linux 5.3.

CONFORMING TO
       pidfd_open() is Linux specific.

NOTES
       Currently, there is no glibc wrapper for this system call; call it
       using syscall(2).

       The pidfd_send_signal(2) system call can be used to send a  signal
       to the process referred to by a PID file descriptor.

       A  PID  file descriptor can be monitored using poll(2), select(2),
       and epoll(7).  When the process that it refers to terminates,  the
       file descriptor indicates as readable.  Note, however, that in the
       current implementation, nothing can be read from the file descrip‐
       tor.

       The  pidfd_open()  system call is the preferred way of obtaining a
       PID file descriptor.  The alternative is to obtain a file descrip‐
       tor by opening a /proc/[pid] directory.  However, the latter tech‐
       nique is possible only if the proc(5) file system is mounted; fur‐
       thermore,  the  file  descriptor  obtained in this way is not pol‐
       lable.

       See also the discussion of the CLONE_PIDFD flag in clone(2).

EXAMPLE
       The program below opens a PID  file  descriptor  for  the  process
       whose PID is specified as its command-line argument.  It then mon‐
       itors the file descriptor for readability (POLLIN) using  poll(2).
       When  the  process  with  the specified by PID terminates, poll(2)
       returns, and indicates that the file descriptor is readable.

   Program source

       #define _GNU_SOURCE
       #include <sys/syscall.h>
       #include <unistd.h>
       #include <poll.h>
       #include <stdlib.h>
       #include <stdio.h>

       #ifndef __NR_pidfd_open
       #define __NR_pidfd_open 434
       #endif

       static
       int pidfd_open(pid_t pid, unsigned int flags)
       {
           return syscall(__NR_pidfd_open, pid, flags);
       }

       int
       main(int argc, char *argv[])
       {
           struct pollfd pollfd;
           int pidfd, ready;

           if (argc != 2) {
               fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
               exit(EXIT_SUCCESS);
           }

           pidfd = pidfd_open(atoi(argv[1]), 0);
           if (pidfd == -1) {
               perror("pidfd_open");
               exit(EXIT_FAILURE);
           }

           pollfd.fd = pidfd;
           pollfd.events = POLLIN;

           ready = poll(&pollfd, 1, -1);
           if (ready == -1) {
               perror("poll");
               exit(EXIT_FAILURE);
           }

           printf("Events (0x%x): POLLIN is %sset\n", pollfd.revents,
                   (pollfd.revents & POLLIN) ? "" : "not ");

           exit(EXIT_SUCCESS);
       }

SEE ALSO
       clone(2),  kill(2),  pidfd_send_signal(2),   poll(2),   select(2),
       epoll(7)


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23  9:11 For review: pidfd_open(2) manual page Michael Kerrisk (man-pages)
@ 2019-09-23 10:53 ` Florian Weimer
  2019-09-23 11:26   ` Daniel Colascione
                     ` (2 more replies)
  2019-09-23 14:38 ` Christian Brauner
  1 sibling, 3 replies; 12+ messages in thread
From: Florian Weimer @ 2019-09-23 10:53 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

* Michael Kerrisk:

> SYNOPSIS
>        int pidfd_open(pid_t pid, unsigned int flags);

Should this mention <sys/types.h> for pid_t?

> ERRORS
>        EINVAL flags is not 0.
>
>        EINVAL pid is not valid.
>
>        ESRCH  The process specified by pid does not exist.

Presumably, EMFILE and ENFILE are also possible errors, and so is
ENOMEM.

>        A  PID  file descriptor can be monitored using poll(2), select(2),
>        and epoll(7).  When the process that it refers to terminates,  the
>        file descriptor indicates as readable.  Note, however, that in the
>        current implementation, nothing can be read from the file descrip‐
>        tor.

“is indicated as readable” or “becomes readable”?  Will reading block?

>        The  pidfd_open()  system call is the preferred way of obtaining a
>        PID file descriptor.  The alternative is to obtain a file descrip‐
>        tor by opening a /proc/[pid] directory.  However, the latter tech‐
>        nique is possible only if the proc(5) file system is mounted; fur‐
>        thermore,  the  file  descriptor  obtained in this way is not pol‐
>        lable.

One question is whether the glibc wrapper should fall back back to the
/proc subdirectory if it is not available.  Probably not.

>        static
>        int pidfd_open(pid_t pid, unsigned int flags)
>        {
>            return syscall(__NR_pidfd_open, pid, flags);
>        }

Please call this function something else (not pidfd_open), so that the
example continues to work if glibc provides the system call wrapper.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 10:53 ` Florian Weimer
@ 2019-09-23 11:26   ` Daniel Colascione
  2019-09-23 20:22     ` Michael Kerrisk (man-pages)
  2019-09-23 14:47   ` Christian Brauner
  2019-09-23 20:20   ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 12+ messages in thread
From: Daniel Colascione @ 2019-09-23 11:26 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Kerrisk (man-pages),
	Christian Brauner, Jann Horn, Eric W. Biederman, Joel Fernandes,
	Linux API, lkml, linux-man, Oleg Nesterov

On Mon, Sep 23, 2019 at 3:53 AM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * Michael Kerrisk:
>
> > SYNOPSIS
> >        int pidfd_open(pid_t pid, unsigned int flags);
>
> Should this mention <sys/types.h> for pid_t?
>
> > ERRORS
> >        EINVAL flags is not 0.
> >
> >        EINVAL pid is not valid.
> >
> >        ESRCH  The process specified by pid does not exist.
>
> Presumably, EMFILE and ENFILE are also possible errors, and so is
> ENOMEM.
>
> >        A  PID  file descriptor can be monitored using poll(2), select(2),
> >        and epoll(7).  When the process that it refers to terminates,  the
> >        file descriptor indicates as readable.

The phrase "becomes readable" is simpler than "indicates as readable"
and conveys the same meaning. I agree with Florian's comment on this
point below.

> > Note, however, that in the
> >        current implementation, nothing can be read from the file descrip‐
> >        tor.
>
> “is indicated as readable” or “becomes readable”?  Will reading block?
>
> >        The  pidfd_open()  system call is the preferred way of obtaining a
> >        PID file descriptor.  The alternative is to obtain a file descrip‐
> >        tor by opening a /proc/[pid] directory.  However, the latter tech‐
> >        nique is possible only if the proc(5) file system is mounted; fur‐
> >        thermore,  the  file  descriptor  obtained in this way is not pol‐
> >        lable.

Referring to procfs directory FDs as pidfds will probably confuse
people. I'd just omit this paragraph.

> One question is whether the glibc wrapper should fall back back to the
> /proc subdirectory if it is not available.  Probably not.

I'd prefer that glibc not provide this kind of fallback.
posix_fallocate-style emulation is, IMHO, too surprising.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23  9:11 For review: pidfd_open(2) manual page Michael Kerrisk (man-pages)
  2019-09-23 10:53 ` Florian Weimer
@ 2019-09-23 14:38 ` Christian Brauner
  2019-09-23 20:21   ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Brauner @ 2019-09-23 14:38 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

On Mon, Sep 23, 2019 at 11:11:53AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian and all,
> 
> Below, I have the rendered version of the current draft of
> the pidfd_open(2) manual page that I have written.
> The page source can be found in a Git branch at:
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
> 
> I would be pleased to receive corrections and notes on any
> details that should be added. (For example, are there error
> cases that I have missed?)
> 
> Would you be able to review please?

Again, thank you Michael for doing this!

> 
> Thanks,
> 
> Michael
> 
> 
> NAME
>        pidfd_open - obtain a file descriptor that refers to a process
> 
> SYNOPSIS
>        int pidfd_open(pid_t pid, unsigned int flags);
> 
> DESCRIPTION
>        The  pidfd_open()  system creates a file descriptor that refers to

s/system/system call/

>        the process whose PID is specified in pid.  The file descriptor is
>        returned  as the function result; the close-on-exec flag is set on
>        the file descriptor.
> 
>        The flags argument is reserved for  future  use;  currently,  this
>        argument must be specified as 0.
> 
> RETURN VALUE
>        On  success,  pidfd_open()  returns a nonnegative file descriptor.
>        On success, -1 is returned and errno is set to indicate the  cause

s/On success/On error/g

>        of the error.
> 
> ERRORS
>        EINVAL flags is not 0.
> 
>        EINVAL pid is not valid.
> 
>        ESRCH  The process specified by pid does not exist.
> 
> VERSIONS
>        pidfd_open() first appeared in Linux 5.3.
> 
> CONFORMING TO
>        pidfd_open() is Linux specific.
> 
> NOTES
>        Currently, there is no glibc wrapper for this system call; call it
>        using syscall(2).
> 
>        The pidfd_send_signal(2) system call can be used to send a  signal
>        to the process referred to by a PID file descriptor.
> 
>        A  PID  file descriptor can be monitored using poll(2), select(2),
>        and epoll(7).  When the process that it refers to terminates,  the
>        file descriptor indicates as readable.  Note, however, that in the

Not a native English speaker but should this be "indicates it is
readable"?

>        current implementation, nothing can be read from the file descrip‐
>        tor.
> 
>        The  pidfd_open()  system call is the preferred way of obtaining a
>        PID file descriptor.  The alternative is to obtain a file descrip‐
>        tor by opening a /proc/[pid] directory.  However, the latter tech‐
>        nique is possible only if the proc(5) file system is mounted; fur‐
>        thermore,  the  file  descriptor  obtained in this way is not pol‐
>        lable.

I mentioned this already in the CLONE_PIDFD manpage, we should probably
not make a big deal out of this and not mention /proc/<pid> here at all.
(Crazy idea, but we could also have a config option that allows you to
turn of proc-pid-dirfds as pidfds if we start to feel really strongly
about this or a sysctl whatever...)

> 
>        See also the discussion of the CLONE_PIDFD flag in clone(2).
> 
> EXAMPLE
>        The program below opens a PID  file  descriptor  for  the  process
>        whose PID is specified as its command-line argument.  It then mon‐
>        itors the file descriptor for readability (POLLIN) using  poll(2).

Yeah, maybe say "monitors the file descriptor for process exit indicated
by an EPOLLIN event" or something. Readability might be confusing.

>        When  the  process  with  the specified by PID terminates, poll(2)
>        returns, and indicates that the file descriptor is readable.

See comment above "readable". (I'm on my phone and I think someone
pointed this out already.)

> 
>    Program source
> 
>        #define _GNU_SOURCE
>        #include <sys/syscall.h>
>        #include <unistd.h>
>        #include <poll.h>
>        #include <stdlib.h>
>        #include <stdio.h>
> 
>        #ifndef __NR_pidfd_open
>        #define __NR_pidfd_open 434
>        #endif

Alpha is special... (and not in a good way).
So you would need to special case Alpha since that's the only arch where
we haven't been able to unify syscall numbering. :D
But it's not super important.

I like the program example.

> 
>        static
>        int pidfd_open(pid_t pid, unsigned int flags)
>        {
>            return syscall(__NR_pidfd_open, pid, flags);
>        }
> 
>        int
>        main(int argc, char *argv[])
>        {
>            struct pollfd pollfd;
>            int pidfd, ready;
> 
>            if (argc != 2) {
>                fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
>                exit(EXIT_SUCCESS);
>            }
> 
>            pidfd = pidfd_open(atoi(argv[1]), 0);
>            if (pidfd == -1) {
>                perror("pidfd_open");
>                exit(EXIT_FAILURE);
>            }
> 
>            pollfd.fd = pidfd;
>            pollfd.events = POLLIN;
> 
>            ready = poll(&pollfd, 1, -1);
>            if (ready == -1) {
>                perror("poll");
>                exit(EXIT_FAILURE);
>            }
> 
>            printf("Events (0x%x): POLLIN is %sset\n", pollfd.revents,
>                    (pollfd.revents & POLLIN) ? "" : "not ");
> 
>            exit(EXIT_SUCCESS);
>        }
> 
> SEE ALSO
>        clone(2),  kill(2),  pidfd_send_signal(2),   poll(2),   select(2),
>        epoll(7)
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 10:53 ` Florian Weimer
  2019-09-23 11:26   ` Daniel Colascione
@ 2019-09-23 14:47   ` Christian Brauner
  2019-09-23 20:22     ` Michael Kerrisk (man-pages)
  2019-09-23 20:20   ` Michael Kerrisk (man-pages)
  2 siblings, 1 reply; 12+ messages in thread
From: Christian Brauner @ 2019-09-23 14:47 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Kerrisk (man-pages),
	Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

On Mon, Sep 23, 2019 at 12:53:09PM +0200, Florian Weimer wrote:
> * Michael Kerrisk:
> 
> > SYNOPSIS
> >        int pidfd_open(pid_t pid, unsigned int flags);
> 
> Should this mention <sys/types.h> for pid_t?
> 
> > ERRORS
> >        EINVAL flags is not 0.
> >
> >        EINVAL pid is not valid.
> >
> >        ESRCH  The process specified by pid does not exist.
> 
> Presumably, EMFILE and ENFILE are also possible errors, and so is
> ENOMEM.

So, error codes that could surface are:
EMFILE: too many open files
ENODEV: the anon inode filesystem is not available in this kernel (unlikely)
ENOMEM: not enough memory (to allocate the backing struct file)
ENFILE: you're over the max_files limit which can be set through proc

I think that should be it.

> 
> >        A  PID  file descriptor can be monitored using poll(2), select(2),
> >        and epoll(7).  When the process that it refers to terminates,  the
> >        file descriptor indicates as readable.  Note, however, that in the
> >        current implementation, nothing can be read from the file descrip‐
> >        tor.
> 
> “is indicated as readable” or “becomes readable”?  Will reading block?
> 
> >        The  pidfd_open()  system call is the preferred way of obtaining a
> >        PID file descriptor.  The alternative is to obtain a file descrip‐
> >        tor by opening a /proc/[pid] directory.  However, the latter tech‐
> >        nique is possible only if the proc(5) file system is mounted; fur‐
> >        thermore,  the  file  descriptor  obtained in this way is not pol‐
> >        lable.
> 
> One question is whether the glibc wrapper should fall back back to the
> /proc subdirectory if it is not available.  Probably not.

No, that would not be transparent to userspace. Especially because both
fds differ in what can be done with them.

> 
> >        static
> >        int pidfd_open(pid_t pid, unsigned int flags)
> >        {
> >            return syscall(__NR_pidfd_open, pid, flags);
> >        }
> 
> Please call this function something else (not pidfd_open), so that the
> example continues to work if glibc provides the system call wrapper.

Agreed!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 10:53 ` Florian Weimer
  2019-09-23 11:26   ` Daniel Colascione
  2019-09-23 14:47   ` Christian Brauner
@ 2019-09-23 20:20   ` Michael Kerrisk (man-pages)
  2019-09-23 20:41     ` Florian Weimer
  2 siblings, 1 reply; 12+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23 20:20 UTC (permalink / raw)
  To: Florian Weimer
  Cc: mtk.manpages, Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

Hello Florian,

Thanks for taking a look at this page.

On 9/23/19 12:53 PM, Florian Weimer wrote:
> * Michael Kerrisk:
> 
>> SYNOPSIS
>>        int pidfd_open(pid_t pid, unsigned int flags);
> 
> Should this mention <sys/types.h> for pid_t?

Seems reasonable. I added this.

>> ERRORS
>>        EINVAL flags is not 0.
>>
>>        EINVAL pid is not valid.
>>
>>        ESRCH  The process specified by pid does not exist.
> 
> Presumably, EMFILE and ENFILE are also possible errors, and so is
> ENOMEM.

Thanks. I've added those.

>>        A  PID  file descriptor can be monitored using poll(2), select(2),
>>        and epoll(7).  When the process that it refers to terminates,  the
>>        file descriptor indicates as readable.  Note, however, that in the
>>        current implementation, nothing can be read from the file descrip‐
>>        tor.
> 
> “is indicated as readable” or “becomes readable”?  Will reading block?

It won't block. Reads from a pidfd always fail with the error EINVAL
(regardless of whether the target process has terminated).

I specifically wanted to avoid "becomes readable" to avoid any
suggestion that read() does something for a pidfd. I thought 
"indicates as readable" was fine, but you, Christian and Joel 
all called this wording out, so I changed this to:

"When the process that it refers to terminates,
these interfaces indicate the file descriptor as readable."

>>        The  pidfd_open()  system call is the preferred way of obtaining a
>>        PID file descriptor.  The alternative is to obtain a file descrip‐
>>        tor by opening a /proc/[pid] directory.  However, the latter tech‐
>>        nique is possible only if the proc(5) file system is mounted; fur‐
>>        thermore,  the  file  descriptor  obtained in this way is not pol‐
>>        lable.
> 
> One question is whether the glibc wrapper should fall back back to the
> /proc subdirectory if it is not available.  Probably not.

No, since the FD returned by opening /proc/PID is less functional
(it is not pollable) than the one returned by pidfd_open().

>>        static
>>        int pidfd_open(pid_t pid, unsigned int flags)
>>        {
>>            return syscall(__NR_pidfd_open, pid, flags);
>>        }
> 
> Please call this function something else (not pidfd_open), so that the
> example continues to work if glibc provides the system call wrapper.

I figured that if the syscall does get added to glibc, then I would
modify the example. In the meantime, this does seem the most natural
way of doing things, since the example then uses the real syscall
name as it would be used if there were a wrapper function.
 
But, this leads to the question: what do you think the likelihood
is that this system call will land in glibc?

Thanks for your feedback, Florian. I've pushed various changes
to the Git branch at 
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 14:38 ` Christian Brauner
@ 2019-09-23 20:21   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23 20:21 UTC (permalink / raw)
  To: Christian Brauner
  Cc: mtk.manpages, Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

Hello Christian,

On 9/23/19 4:38 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 11:11:53AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Christian and all,
>>
>> Below, I have the rendered version of the current draft of
>> the pidfd_open(2) manual page that I have written.
>> The page source can be found in a Git branch at:
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
>>
>> I would be pleased to receive corrections and notes on any
>> details that should be added. (For example, are there error
>> cases that I have missed?)
>>
>> Would you be able to review please?
> 
> Again, thank you Michael for doing this!
> 
>>
>> Thanks,
>>
>> Michael
>>
>>
>> NAME
>>        pidfd_open - obtain a file descriptor that refers to a process
>>
>> SYNOPSIS
>>        int pidfd_open(pid_t pid, unsigned int flags);
>>
>> DESCRIPTION
>>        The  pidfd_open()  system creates a file descriptor that refers to
> 
> s/system/system call/

Fixed.

>>        the process whose PID is specified in pid.  The file descriptor is
>>        returned  as the function result; the close-on-exec flag is set on
>>        the file descriptor.
>>
>>        The flags argument is reserved for  future  use;  currently,  this
>>        argument must be specified as 0.
>>
>> RETURN VALUE
>>        On  success,  pidfd_open()  returns a nonnegative file descriptor.
>>        On success, -1 is returned and errno is set to indicate the  cause
> 
> s/On success/On error/g

Fixed.

>>        of the error.
>>
>> ERRORS
>>        EINVAL flags is not 0.
>>
>>        EINVAL pid is not valid.
>>
>>        ESRCH  The process specified by pid does not exist.
>>
>> VERSIONS
>>        pidfd_open() first appeared in Linux 5.3.
>>
>> CONFORMING TO
>>        pidfd_open() is Linux specific.
>>
>> NOTES
>>        Currently, there is no glibc wrapper for this system call; call it
>>        using syscall(2).
>>
>>        The pidfd_send_signal(2) system call can be used to send a  signal
>>        to the process referred to by a PID file descriptor.
>>
>>        A  PID  file descriptor can be monitored using poll(2), select(2),
>>        and epoll(7).  When the process that it refers to terminates,  the
>>        file descriptor indicates as readable.  Note, however, that in the
> 
> Not a native English speaker but should this be "indicates it is
> readable"?

See my reply to Florian.

>>        current implementation, nothing can be read from the file descrip‐
>>        tor.
>>
>>        The  pidfd_open()  system call is the preferred way of obtaining a
>>        PID file descriptor.  The alternative is to obtain a file descrip‐
>>        tor by opening a /proc/[pid] directory.  However, the latter tech‐
>>        nique is possible only if the proc(5) file system is mounted; fur‐
>>        thermore,  the  file  descriptor  obtained in this way is not pol‐
>>        lable.
> 
> I mentioned this already in the CLONE_PIDFD manpage, we should probably
> not make a big deal out of this and not mention /proc/<pid> here at all.

The thing is, people *will* learn about these two different types
of FDs, whether we document them or not. So, I think it's better to
be up front about what's available, and make a suitably strong
recommendation about the preferred technique.

Reading between the lines, it sounds like just a couple of releases
after it was implemented, you're saying that implementing
open(/proc/PID) was a mistake?

> (Crazy idea, but we could also have a config option that allows you to
> turn of proc-pid-dirfds as pidfds if we start to feel really strongly
> about this or a sysctl whatever...)
> 
>>
>>        See also the discussion of the CLONE_PIDFD flag in clone(2).
>>
>> EXAMPLE
>>        The program below opens a PID  file  descriptor  for  the  process
>>        whose PID is specified as its command-line argument.  It then mon‐
>>        itors the file descriptor for readability (POLLIN) using  poll(2).
> 
> Yeah, maybe say "monitors the file descriptor for process exit indicated
> by an EPOLLIN event" or something. Readability might be confusing.

I like that suggestion! I reworded to something close to what you suggest.

>>        When  the  process  with  the specified by PID terminates, poll(2)
>>        returns, and indicates that the file descriptor is readable.
> 
> See comment above "readable". (I'm on my phone and I think someone
> pointed this out already.)

Actually, I think I can just remove that sentence. It doesn't really
add much.

>>    Program source
>>
>>        #define _GNU_SOURCE
>>        #include <sys/syscall.h>
>>        #include <unistd.h>
>>        #include <poll.h>
>>        #include <stdlib.h>
>>        #include <stdio.h>
>>
>>        #ifndef __NR_pidfd_open
>>        #define __NR_pidfd_open 434
>>        #endif
> 
> Alpha is special... (and not in a good way).
> So you would need to special case Alpha since that's the only arch where
> we haven't been able to unify syscall numbering. :D
> But it's not super important.

Okay.

> I like the program example.

Good.

Thanks for reviewing! I've pushed various changes
to the Git branch at 
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd


Cheers,

Michael

>>
>>        static
>>        int pidfd_open(pid_t pid, unsigned int flags)
>>        {
>>            return syscall(__NR_pidfd_open, pid, flags);
>>        }
>>
>>        int
>>        main(int argc, char *argv[])
>>        {
>>            struct pollfd pollfd;
>>            int pidfd, ready;
>>
>>            if (argc != 2) {
>>                fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
>>                exit(EXIT_SUCCESS);
>>            }
>>
>>            pidfd = pidfd_open(atoi(argv[1]), 0);
>>            if (pidfd == -1) {
>>                perror("pidfd_open");
>>                exit(EXIT_FAILURE);
>>            }
>>
>>            pollfd.fd = pidfd;
>>            pollfd.events = POLLIN;
>>
>>            ready = poll(&pollfd, 1, -1);
>>            if (ready == -1) {
>>                perror("poll");
>>                exit(EXIT_FAILURE);
>>            }
>>
>>            printf("Events (0x%x): POLLIN is %sset\n", pollfd.revents,
>>                    (pollfd.revents & POLLIN) ? "" : "not ");
>>
>>            exit(EXIT_SUCCESS);
>>        }
>>
>> SEE ALSO
>>        clone(2),  kill(2),  pidfd_send_signal(2),   poll(2),   select(2),
>>        epoll(7)
>>
>>
>> -- 
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 14:47   ` Christian Brauner
@ 2019-09-23 20:22     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23 20:22 UTC (permalink / raw)
  To: Christian Brauner, Florian Weimer
  Cc: mtk.manpages, Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

Hello Christian,

On 9/23/19 4:47 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 12:53:09PM +0200, Florian Weimer wrote:
>> * Michael Kerrisk:
>>
>>> SYNOPSIS
>>>        int pidfd_open(pid_t pid, unsigned int flags);
>>
>> Should this mention <sys/types.h> for pid_t?
>>
>>> ERRORS
>>>        EINVAL flags is not 0.
>>>
>>>        EINVAL pid is not valid.
>>>
>>>        ESRCH  The process specified by pid does not exist.
>>
>> Presumably, EMFILE and ENFILE are also possible errors, and so is
>> ENOMEM.
> 
> So, error codes that could surface are:
> EMFILE: too many open files
> ENODEV: the anon inode filesystem is not available in this kernel (unlikely)
> ENOMEM: not enough memory (to allocate the backing struct file)
> ENFILE: you're over the max_files limit which can be set through proc
> 
> I think that should be it.

Thanks. I've added those.
>>>        A  PID  file descriptor can be monitored using poll(2), select(2),
>>>        and epoll(7).  When the process that it refers to terminates,  the
>>>        file descriptor indicates as readable.  Note, however, that in the
>>>        current implementation, nothing can be read from the file descrip‐
>>>        tor.
>>
>> “is indicated as readable” or “becomes readable”?  Will reading block?
>>
>>>        The  pidfd_open()  system call is the preferred way of obtaining a
>>>        PID file descriptor.  The alternative is to obtain a file descrip‐
>>>        tor by opening a /proc/[pid] directory.  However, the latter tech‐
>>>        nique is possible only if the proc(5) file system is mounted; fur‐
>>>        thermore,  the  file  descriptor  obtained in this way is not pol‐
>>>        lable.
>>
>> One question is whether the glibc wrapper should fall back back to the
>> /proc subdirectory if it is not available.  Probably not.
> 
> No, that would not be transparent to userspace. Especially because both
> fds differ in what can be done with them.
> 
>>
>>>        static
>>>        int pidfd_open(pid_t pid, unsigned int flags)
>>>        {
>>>            return syscall(__NR_pidfd_open, pid, flags);
>>>        }
>>
>> Please call this function something else (not pidfd_open), so that the
>> example continues to work if glibc provides the system call wrapper.
> 
> Agreed!

See my reply to Florian. (So far, I didn't change anything here.)

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 11:26   ` Daniel Colascione
@ 2019-09-23 20:22     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23 20:22 UTC (permalink / raw)
  To: Daniel Colascione, Florian Weimer
  Cc: mtk.manpages, Christian Brauner, Jann Horn, Eric W. Biederman,
	Joel Fernandes, Linux API, lkml, linux-man, Oleg Nesterov

Hello Daniel,

Than you for reviewing the page!

On 9/23/19 1:26 PM, Daniel Colascione wrote:
> On Mon, Sep 23, 2019 at 3:53 AM Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> * Michael Kerrisk:
>>
>>> SYNOPSIS
>>>        int pidfd_open(pid_t pid, unsigned int flags);
>>
>> Should this mention <sys/types.h> for pid_t?
>>
>>> ERRORS
>>>        EINVAL flags is not 0.
>>>
>>>        EINVAL pid is not valid.
>>>
>>>        ESRCH  The process specified by pid does not exist.
>>
>> Presumably, EMFILE and ENFILE are also possible errors, and so is
>> ENOMEM.
>>
>>>        A  PID  file descriptor can be monitored using poll(2), select(2),
>>>        and epoll(7).  When the process that it refers to terminates,  the
>>>        file descriptor indicates as readable.
> 
> The phrase "becomes readable" is simpler than "indicates as readable"
> and conveys the same meaning. I agree with Florian's comment on this
> point below.

See my reply to Florian. (I did change the text here.)

>>> Note, however, that in the
>>>        current implementation, nothing can be read from the file descrip‐
>>>        tor.
>>
>> “is indicated as readable” or “becomes readable”?  Will reading block?
>>
>>>        The  pidfd_open()  system call is the preferred way of obtaining a
>>>        PID file descriptor.  The alternative is to obtain a file descrip‐
>>>        tor by opening a /proc/[pid] directory.  However, the latter tech‐
>>>        nique is possible only if the proc(5) file system is mounted; fur‐
>>>        thermore,  the  file  descriptor  obtained in this way is not pol‐
>>>        lable.
> 
> Referring to procfs directory FDs as pidfds will probably confuse
> people. I'd just omit this paragraph.

See my reply to Christian (and feel free to argue the point, please).
So far, I have made no change here.

>> One question is whether the glibc wrapper should fall back back to the
>> /proc subdirectory if it is not available.  Probably not.
> 
> I'd prefer that glibc not provide this kind of fallback.
> posix_fallocate-style emulation is, IMHO, too surprising.

Agreed.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 20:20   ` Michael Kerrisk (man-pages)
@ 2019-09-23 20:41     ` Florian Weimer
  2019-09-23 20:57       ` Michael Kerrisk (man-pages)
  2019-09-24  7:38       ` Christian Brauner
  0 siblings, 2 replies; 12+ messages in thread
From: Florian Weimer @ 2019-09-23 20:41 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

* Michael Kerrisk:

>>>        static
>>>        int pidfd_open(pid_t pid, unsigned int flags)
>>>        {
>>>            return syscall(__NR_pidfd_open, pid, flags);
>>>        }
>> 
>> Please call this function something else (not pidfd_open), so that the
>> example continues to work if glibc provides the system call wrapper.
>
> I figured that if the syscall does get added to glibc, then I would
> modify the example. In the meantime, this does seem the most natural
> way of doing things, since the example then uses the real syscall
> name as it would be used if there were a wrapper function.

The problem is that programs do this as well, so they fail to build
once they are built on a newer glibc version.

> But, this leads to the question: what do you think the likelihood
> is that this system call will land in glibc?

Quite likely.  It's easy enough to document, there are no P&C issues,
and it doesn't need any new types.

pidfd_send_signal is slightly more difficult because we probably need
to add rt_sigqueueinfo first, for consistency.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 20:41     ` Florian Weimer
@ 2019-09-23 20:57       ` Michael Kerrisk (man-pages)
  2019-09-24  7:38       ` Christian Brauner
  1 sibling, 0 replies; 12+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-09-23 20:57 UTC (permalink / raw)
  To: Florian Weimer
  Cc: mtk.manpages, Christian Brauner, Jann Horn, Daniel Colascione,
	Eric W. Biederman, Joel Fernandes, Linux API, lkml, linux-man,
	Oleg Nesterov

Hello Florian,

On 9/23/19 10:41 PM, Florian Weimer wrote:
> * Michael Kerrisk:
> 
>>>>        static
>>>>        int pidfd_open(pid_t pid, unsigned int flags)
>>>>        {
>>>>            return syscall(__NR_pidfd_open, pid, flags);
>>>>        }
>>>
>>> Please call this function something else (not pidfd_open), so that the
>>> example continues to work if glibc provides the system call wrapper.
>>
>> I figured that if the syscall does get added to glibc, then I would
>> modify the example. In the meantime, this does seem the most natural
>> way of doing things, since the example then uses the real syscall
>> name as it would be used if there were a wrapper function.
> 
> The problem is that programs do this as well, so they fail to build
> once they are built on a newer glibc version.

But isn't such a failure a good thing? I mean: it encourages
people to rid their programs of uses of syscall(2).

>> But, this leads to the question: what do you think the likelihood
>> is that this system call will land in glibc?
> 
> Quite likely.  It's easy enough to document, there are no P&C issues,
> and it doesn't need any new types.

Okay.

> pidfd_send_signal is slightly more difficult because we probably need
> to add rt_sigqueueinfo first, for consistency.

Okay. I see that's a little more problematic.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: For review: pidfd_open(2) manual page
  2019-09-23 20:41     ` Florian Weimer
  2019-09-23 20:57       ` Michael Kerrisk (man-pages)
@ 2019-09-24  7:38       ` Christian Brauner
  1 sibling, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2019-09-24  7:38 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Kerrisk (man-pages),
	Jann Horn, Daniel Colascione, Eric W. Biederman, Joel Fernandes,
	Linux API, lkml, linux-man, Oleg Nesterov

On Mon, Sep 23, 2019 at 10:41:19PM +0200, Florian Weimer wrote:
> * Michael Kerrisk:
> 
> >>>        static
> >>>        int pidfd_open(pid_t pid, unsigned int flags)
> >>>        {
> >>>            return syscall(__NR_pidfd_open, pid, flags);
> >>>        }
> >> 
> >> Please call this function something else (not pidfd_open), so that the
> >> example continues to work if glibc provides the system call wrapper.
> >
> > I figured that if the syscall does get added to glibc, then I would
> > modify the example. In the meantime, this does seem the most natural
> > way of doing things, since the example then uses the real syscall
> > name as it would be used if there were a wrapper function.
> 
> The problem is that programs do this as well, so they fail to build
> once they are built on a newer glibc version.
> 
> > But, this leads to the question: what do you think the likelihood
> > is that this system call will land in glibc?
> 
> Quite likely.  It's easy enough to document, there are no P&C issues,
> and it doesn't need any new types.

My previous mail probably didn't make it so here it is again: I think
especially with the recently established glibc consensus to provide
wrappers for all new system calls (with some sensible exceptions) I'd
expect this to be the case.

> 
> pidfd_send_signal is slightly more difficult because we probably need
> to add rt_sigqueueinfo first, for consistency.

Oh, huh. Somehow I thought we already provide that.

Christian

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-09-24  7:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-23  9:11 For review: pidfd_open(2) manual page Michael Kerrisk (man-pages)
2019-09-23 10:53 ` Florian Weimer
2019-09-23 11:26   ` Daniel Colascione
2019-09-23 20:22     ` Michael Kerrisk (man-pages)
2019-09-23 14:47   ` Christian Brauner
2019-09-23 20:22     ` Michael Kerrisk (man-pages)
2019-09-23 20:20   ` Michael Kerrisk (man-pages)
2019-09-23 20:41     ` Florian Weimer
2019-09-23 20:57       ` Michael Kerrisk (man-pages)
2019-09-24  7:38       ` Christian Brauner
2019-09-23 14:38 ` Christian Brauner
2019-09-23 20:21   ` Michael Kerrisk (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).