All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Christian Brauner <brauner@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>,
	Oleg Nesterov <oleg@redhat.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	Tycho Andersen <tandersen@netflix.com>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders
Date: Fri, 8 Dec 2023 18:47:44 +0100	[thread overview]
Message-ID: <20231208174744.2vsubexeolns7nb5@quack3> (raw)
In-Reply-To: <20231207-netzhaut-wachen-81c34f8ee154@brauner>

On Thu 07-12-23 18:21:18, Christian Brauner wrote:
> [Cc fsdevel & Jan because we had some discussions about fanotify
> returning non-thread-group pidfds. That's just for awareness or in case
> this might need special handling.]

Thanks!

> On Thu, Nov 30, 2023 at 09:39:44AM -0700, Tycho Andersen wrote:
> > From: Tycho Andersen <tandersen@netflix.com>
> > 
> > We are using the pidfd family of syscalls with the seccomp userspace
> > notifier. When some thread triggers a seccomp notification, we want to do
> > some things to its context (munge fd tables via pidfd_getfd(), maybe write
> > to its memory, etc.). However, threads created with ~CLONE_FILES or
> > ~CLONE_VM mean that we can't use the pidfd family of syscalls for this
> > purpose, since their fd table or mm are distinct from the thread group
> > leader's. In this patch, we relax this restriction for pidfd_open().
> > 
> > In order to avoid dangling poll() users we need to notify pidfd waiters
> > when individual threads die, but once we do that all the other machinery
> > seems to work ok viz. the tests. But I suppose there are more cases than
> > just this one.
> > 
> > Another weirdness is the open-coding of this vs. exporting using
> > do_notify_pidfd(). This particular location is after __exit_signal() is
> > called, which does __unhash_process() which kills ->thread_pid, so we need
> > to use the copy we have locally, vs do_notify_pid() which accesses it via
> > task_pid(). Maybe this suggests that the notification should live somewhere
> > in __exit_signals()? I just put it here because I saw we were already
> > testing if this task was the leader.
> > 
> > Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> > ---
> 
> So we've always said that if there's a use-case for this then we're
> willing to support it. And I think that stance hasn't changed. I know
> that others have expressed interest in this as well.
> 
> So currently the series only enables pidfds for threads to be created
> and allows notifications for threads. But all places that currently make
> use of pidfds refuse non-thread-group leaders. We can certainly proceed
> with a patch series that only enables creation and exit notification but
> we should also consider unlocking additional functionality:
 
...

> * pidfd_prepare() is used to create pidfds for:
> 
>   (1) CLONE_PIDFD via clone() and clone3()
>   (2) SCM_PIDFD and SO_PEERPIDFD
>   (3) fanotify

So for fanotify there's no problem I can think of. All we do is return the
pidfd we get to userspace with the event to identify the task generating
the event. So in practice this would mean userspace will get proper pidfd
instead of error value (FAN_EPIDFD) for events generated by
non-thread-group leader. IMO a win.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

      parent reply	other threads:[~2023-12-08 17:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30 16:39 [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders Tycho Andersen
2023-11-30 16:39 ` [RFC 2/3] selftests/pidfd: add non-thread-group leader tests Tycho Andersen
2023-11-30 16:39 ` [RFC 3/3] clone: allow CLONE_THREAD | CLONE_PIDFD together Tycho Andersen
2023-11-30 17:39 ` [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders Oleg Nesterov
2023-11-30 17:56   ` Tycho Andersen
2023-12-01 16:31     ` Tycho Andersen
2023-12-07 17:57       ` Christian Brauner
2023-12-07 21:25         ` Christian Brauner
2023-12-08 20:04           ` Tycho Andersen
2023-11-30 18:37 ` Florian Weimer
2023-11-30 18:54   ` Tycho Andersen
2023-11-30 19:00     ` Mathieu Desnoyers
2023-11-30 19:17       ` Tycho Andersen
2023-11-30 19:43       ` Florian Weimer
2023-12-06 15:27         ` Tycho Andersen
2023-12-07 22:58         ` Christian Brauner
2023-12-08  3:16           ` Jens Axboe
2023-12-08 13:15           ` Florian Weimer
2023-12-08 13:48             ` Christian Brauner
2023-12-08 13:58               ` Florian Weimer
2023-12-07 17:21 ` Christian Brauner
2023-12-07 17:52   ` Tycho Andersen
2023-12-08 17:47   ` Jan Kara [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231208174744.2vsubexeolns7nb5@quack3 \
    --to=jack@suse.cz \
    --cc=brauner@kernel.org \
    --cc=ebiederm@xmission.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=tandersen@netflix.com \
    --cc=tycho@tycho.pizza \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.