Linux-man Archive on
From: Christian Brauner <>
To: "Michael Kerrisk (man-pages)" <>
Subject: Re: [PATCH] clone.2: add CLONE_PIDFD entry
Date: Thu, 19 Sep 2019 08:47:51 +0200
Message-ID: <20190919064750.tyxc7lut3mc2lcrx@wittgenstein> (raw)
In-Reply-To: <>

On Thu, Sep 19, 2019 at 06:04:55AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
> On 9/18/19 9:14 AM, Christian Brauner wrote:
> > On Wed, Sep 18, 2019 at 08:49:59AM +0200, Michael Kerrisk (man-pages) wrote:
> >>>> One other piece seems to be missing: the returned file descriptor can
> >>>> be fed to poll()/select()/epoll and the FD will test as readable when
> >>>> the child terminates. Right? Did that functionality also land in
> >>>> kernel 5.2? And did it get implemented as a separate commit, or did
> >>>> the behavior just fall naturally out of the implementation of pidfd's?
> >>>> Let me know the details, and I will craft a patch.
> >>>
> >>> It landed in 5.3. The relevant commit is:
> >>>
> >>> and belongs to the following merge:
> >>>
> >>
> >> Thanks for that info. One other question springs to mind.
> >> I haven't looked at the source or tried testing this,
> >> but can anything actually be read() from a PIDFD? Presumably,
> > 
> > We had discussed this but decided to not implement this right away.
> > Mainly, because we did not have a clear picture what the semantics
> > should be. But it is something that we will probably want in the
> > future...
> That makes sense.
> A further question... We now have three ways of getting a
> process file descriptor [*]:
> open() of /proc/PID
> pidfd_open()
> clone()/clone3() with CLONE_PIDFD
> I thought the FD was supposed to be equivalent in all three cases.
> However, if I try (on kernel 5.3) to poll() an FD returned by opening
> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
> intentional? (I am guessing it is not.)

It's intentional.
The short answer is that /proc/<pid> is a convenience for sending
signals.
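Sending a signal through a /proc/<pid> dirfd can be sketched roughly as
below. This is only an illustration, not code from the kernel tree or
the man pages: the helper name signal_via_procfd() is made up, and the
fallback syscall number 424 is an assumption that holds on most
architectures (not alpha) when the libc headers do not define
SYS_pidfd_send_signal.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fallback number used by most architectures (not alpha). */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/*
 * Hypothetical helper: send `sig` to `pid` through a /proc/<pid>
 * directory fd. Returns 0 on success, -1 with errno set on failure.
 */
static int signal_via_procfd(pid_t pid, int sig)
{
    char path[64];
    int fd, ret, saved_errno;

    snprintf(path, sizeof(path), "/proc/%d", (int)pid);
    fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    /* NULL siginfo: behaves like kill(2) on the process the fd refers to. */
    ret = (int)syscall(SYS_pidfd_send_signal, fd, sig, NULL, 0);

    /* Preserve the syscall's errno across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

For example, signal_via_procfd(getpid(), 0) only performs the existence
and permission check, just as kill(pid, 0) would.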
The longer answer is that this stems from a heavy debate about what a
process file descriptor was supposed to be. Some people pushed for at
least being able to use /proc/<pid> dirfds, while ignoring the security
problems that arise as soon as you're talking about returning those fds
from clone(), not to mention the additional problems discovered when
trying to implement this.
A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
such as exit notification, read, and other future extensions will only
be implemented on top of them.
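Exit notification on a "real" pidfd can be sketched roughly as below.
Again this is only an illustration under stated assumptions: the helper
name wait_exit_via_pidfd() is made up, the raw syscall is used because
older glibc versions have no pidfd_open() wrapper, and the fallback
syscall number 434 is assumed for architectures other than alpha.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fallback number used by most architectures (not alpha). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/*
 * Hypothetical helper: wait up to timeout_ms for `pid` to terminate.
 * Returns 1 if the pidfd became readable (process exited), 0 on
 * timeout, -1 with errno set on error (e.g. ENOSYS before 5.3).
 */
static int wait_exit_via_pidfd(pid_t pid, int timeout_ms)
{
    struct pollfd pfd = { .events = POLLIN };
    int ready, saved_errno;

    pfd.fd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pfd.fd < 0)
        return -1;

    /* Retry on EINTR; note that this restarts the full timeout. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    /* Preserve poll()'s errno across close(). */
    saved_errno = errno;
    close(pfd.fd);
    errno = saved_errno;

    if (ready < 0)
        return -1;
    return (ready > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

The readable state only reports termination. A parent still has to
reap its child with waitpid() or a similar call afterwards.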
As much as we'd have liked to get rid of two different file descriptor
types, it doesn't hurt us much and is not that different from what we
will see e.g. with fsinfo() in the new mount API, which needs to work
on regular fds obtained via open()/openat() and mountfds obtained from
fsopen() and fspick(). The mountfds will also allow for advanced
operations that the other ones will not. There's even an argument to be
made that the fds you get from open()/openat() and from openat2() are
different types since they have very different behavior, openat2()
returning fds that are not arbitrarily upgradable etc.

> Thanks,
> Michael
> [*] By the way, going forward, can we call these things
> "process FDs", rather than "PID FDs"? The API names are what
> they are, and that's okay, but just as we have socket
> FDs that refer to sockets, directory FDs that refer to
> directories, and timer FDs that refer to timers, and so on,
> these are FDs that refer to *processes*, not "process IDs".
> It's a little thing, but I think the naming is better, and
> it's what I propose to use in the manual pages.

The naming was another debate and we ended up with this compromise.
I would just clarify that a pidfd is a process file descriptor. I
wouldn't make too much of a deal of hiding the shorthand "pidfd". People
are already using it out there in the wild and it's never proven a good
idea to go against accepted practice.


