From: Joel Fernandes <joel@joelfernandes.org>
To: Christian Brauner <christian@brauner.io>
Cc: Oleg Nesterov <oleg@redhat.com>,
torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
jannh@google.com, dhowells@redhat.com, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org, serge@hallyn.com, luto@kernel.org,
arnd@arndb.de, ebiederm@xmission.com, keescook@chromium.org,
tglx@linutronix.de, mtk.manpages@gmail.com,
akpm@linux-foundation.org, cyphar@cyphar.com, dancol@google.com
Subject: Re: [PATCH 2/4] clone: add CLONE_PIDFD
Date: Mon, 15 Apr 2019 12:25:42 -0400
Message-ID: <20190415162542.GA246478@google.com>
In-Reply-To: <20190415135246.d6pvyf3pkt3sbh6t@brauner.io>
On Mon, Apr 15, 2019 at 03:52:48PM +0200, Christian Brauner wrote:
> On Mon, Apr 15, 2019 at 03:24:16PM +0200, Oleg Nesterov wrote:
> > On 04/15, Christian Brauner wrote:
> > >
> > > > CLONE_PARENT_SETTID doesn't look very useful, so what if we add
> > > >
> > > >     if ((clone_flags & (CLONE_PIDFD|CLONE_PARENT_SETTID)) ==
> > > >         (CLONE_PIDFD|CLONE_PARENT_SETTID))
> > > >             return ERR_PTR(-EINVAL);
> > > >
> > > > at the start of copy_process() ?
> > > >
> > > > Then it can do
> > > >
> > > >     if (clone_flags & CLONE_PIDFD) {
> > > >             retval = pidfd_create(pid, &pidfdf);
> > > >             if (retval < 0)
> > > >                     goto bad_fork_free_pid;
> > > >             retval = put_user(retval, parent_tidptr);
> > > >             if (retval < 0)
> > > >                     goto bad_fork_free_pid;
> > > >     }
> > >
> > > Uhhh Oleg, that is nifty. I have to say I like that a lot. This would
> > > let us return the pid and the pidfd in one go and we can also start
> > > pidfd numbering at 0.
> >
> > Christian, sorry if it was already discussed, but I can't force myself to
> > read all the previous discussions ;)
> >
> > If we forget about CONFIG_PROC_FS, why do we really want to create a file?
> >
> >
> > Suppose we add a global u64 counter incremented by copy_process and reported
> > in /proc/$pid/status. Suppose that clone(CLONE_PIDFD) writes this counter to
> > *parent_tidptr. Let's denote this counter as UNIQ_PID.
> >
> > Now, if you want to (say) safely kill a task and you have its UNIQ_PID, you
> > can do
> >
> >     kill_by_pid_uniq(int pid, u64 uniq_pid)
> >     {
> >             pidfd = open("/proc/$pid", O_DIRECTORY);
> >
> >             status = openat(pidfd, "status");
> >             u64 this_uniq_pid = ... read UNIQ_PID from status ...;
> >
> >             if (uniq_pid != this_uniq_pid)
> >                     return;
> >
> >             pidfd_send_signal(pidfd);
> >     }
> >
> > Why else do we want pidfd?
>
> I think this was thrown around at one point but this is rather
> inelegant imho. It basically makes a process unique by using a
> combination of two identifiers. You end up with a similar concept but
> you make it way less flexible and extensible imho. With pidfds you can
> not care about pids at all if you don't want to. The UNIQ_PID thing
> would require you to always juggle two identifiers.
>
> Your example would also only work if CONFIG_PROC_FS is set (Not sure if
> that's what you meant by "forget about CONFIG_PROC_FS")? Say, you get
> a pid from clone() and your UNIQ_PID thing. Then you still can't
> reliably kill a process because pidfd_send_signal() is not useable since
> you can't get pidfds. And if you go the kill way you end up with the same
> problem. Yes, you could solve this by probably extending syscalls to
> take a UNIQ_PID argument but that seems very inelegant.
>
> The UNIQ_PID implementation would also require being tracked in the
> kernel either in task_struct or struct pid potentially and thus would
> probably add more infrastructure in the kernel. We don't need any of
> that if we simply rely on pidfds.
>
> Most of all, the pidfd concept allows one way more flexibility in
> extending it. For example, Joel is working on a patchset to make pidfds
> pollable so you can get information about a process death by polling
> them. We also want to be able to potentially wait on a process with
> waitid(W_PIDFD) or similar as suggested by Linus in earlier threads. At
> that point you end up in a similar situation as tgkill() where you pass
> a tgid and a pid already to make sure that the pid you pass has the tgid
> as thread-group leader. That is all way simpler with pidfds.
I agree the pidfd approach is simpler than dealing with 2 pids, and it is
needed for the poll notification support I posted.
Also, in the future it allows a pidfd to be sent over IPC to another
process using binder or unix domain sockets.
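
To make the unix domain socket case concrete, here is a minimal sketch of
passing a file descriptor between processes with SCM_RIGHTS. Nothing in it is
pidfd-specific: it assumes a pidfd behaves like any other file descriptor for
this purpose. The helper names send_fd() and recv_fd() are made up for
illustration and are not part of this patchset.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* one dummy data byte to carry the ancillary data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	/* The kernel has installed a new descriptor in the receiver. */
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the scenario discussed here, the descriptor handed to send_fd() would be a
pidfd, and the receiver could presumably pass the result of recv_fd() to
pidfd_send_signal() without ever looking at a numeric pid.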
thanks,
- Joel