linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
@ 2024-01-31 13:25 Oleg Nesterov
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Oleg Nesterov @ 2024-01-31 13:25 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

Please see the interdiff below.

Also, I updated the changelog to document that the behaviour of
pidfd_poll(PIDFD_THREAD, pid-of-group-leader) is not well defined
if a sub-thread execs.

Do you agree with this semantics?

Oleg.
---

diff --git a/fs/exec.c b/fs/exec.c
index 73e4045df271..0fd7e668c477 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1143,7 +1143,11 @@ static int de_thread(struct task_struct *tsk)
 
 		BUG_ON(leader->exit_state != EXIT_ZOMBIE);
 		leader->exit_state = EXIT_DEAD;
-
+		/*
+		 * leader and tsk exhanged their pids, the old pid dies,
+		 * wake up the PIDFD_THREAD waiters.
+		 */
+		do_notify_pidfd(leader);
 		/*
 		 * We are going to release_task()->ptrace_unlink() silently,
 		 * the tracer can sleep in do_wait(). EXIT_DEAD guarantees
diff --git a/include/linux/pid.h b/include/linux/pid.h
index e6a041cb8bac..8124d57752b9 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -70,10 +70,11 @@ extern const struct file_operations pidfd_fops;
 
 struct file;
 
-extern struct pid *pidfd_pid(const struct file *file);
+struct pid *pidfd_pid(const struct file *file);
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
+void do_notify_pidfd(struct task_struct *task);
 
 static inline struct pid *get_pid(struct pid *pid)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index 5f48d2c4b409..9b40109f0c56 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2019,7 +2019,7 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 	return ret;
 }
 
-static void do_notify_pidfd(struct task_struct *task)
+void do_notify_pidfd(struct task_struct *task)
 {
 	struct pid *pid;
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:25 [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open() Oleg Nesterov
@ 2024-01-31 13:26 ` Oleg Nesterov
  2024-01-31 14:12   ` Christian Brauner
                     ` (4 more replies)
  2024-01-31 13:58 ` [PATCH v3 0/1] " Christian Brauner
  2024-01-31 14:12 ` Oleg Nesterov
  2 siblings, 5 replies; 19+ messages in thread
From: Oleg Nesterov @ 2024-01-31 13:26 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

With this flag:

	- pidfd_open() doesn't require that the target task must be
	  a thread-group leader

	- pidfd_poll() succeeds when the task exits and becomes a
	  zombie (iow, passes exit_notify()), even if it is a leader
	  and thread-group is not empty.

	  This means that the behaviour of pidfd_poll(PIDFD_THREAD,
	  pid-of-group-leader) is not well defined if it races with
	  exec() from its sub-thread; pidfd_poll() can succeed or not
	  depending on whether pidfd_task_exited() is called before
	  or after exchange_tids().

	  Perhaps we can improve this behaviour later, pidfd_poll()
	  can probably take sig->group_exec_task into account. But
	  this doesn't really differ from the case when the leader
	  exits before other threads (so pidfd_poll() succeeds) and
	  then another thread execs and pidfd_poll() will block again.

thread_group_exited() is no longer used, perhaps it can die.

Co-developed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/exec.c                  |  6 +++++-
 include/linux/pid.h        |  3 ++-
 include/uapi/linux/pidfd.h |  3 ++-
 kernel/exit.c              |  7 +++++++
 kernel/fork.c              | 38 +++++++++++++++++++++++++++++++-------
 kernel/pid.c               | 14 +++-----------
 kernel/signal.c            |  6 ++++--
 7 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 73e4045df271..0fd7e668c477 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1143,7 +1143,11 @@ static int de_thread(struct task_struct *tsk)
 
 		BUG_ON(leader->exit_state != EXIT_ZOMBIE);
 		leader->exit_state = EXIT_DEAD;
-
+		/*
+		 * leader and tsk exhanged their pids, the old pid dies,
+		 * wake up the PIDFD_THREAD waiters.
+		 */
+		do_notify_pidfd(leader);
 		/*
 		 * We are going to release_task()->ptrace_unlink() silently,
 		 * the tracer can sleep in do_wait(). EXIT_DEAD guarantees
diff --git a/include/linux/pid.h b/include/linux/pid.h
index e6a041cb8bac..8124d57752b9 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -70,10 +70,11 @@ extern const struct file_operations pidfd_fops;
 
 struct file;
 
-extern struct pid *pidfd_pid(const struct file *file);
+struct pid *pidfd_pid(const struct file *file);
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
+void do_notify_pidfd(struct task_struct *task);
 
 static inline struct pid *get_pid(struct pid *pid)
 {
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
index 5406fbc13074..2e6461459877 100644
--- a/include/uapi/linux/pidfd.h
+++ b/include/uapi/linux/pidfd.h
@@ -7,6 +7,7 @@
 #include <linux/fcntl.h>
 
 /* Flags for pidfd_open().  */
-#define PIDFD_NONBLOCK O_NONBLOCK
+#define PIDFD_NONBLOCK	O_NONBLOCK
+#define PIDFD_THREAD	O_EXCL
 
 #endif /* _UAPI_LINUX_PIDFD_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index dfb963d2f862..493647fd7c07 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -739,6 +739,13 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 		kill_orphaned_pgrp(tsk->group_leader, NULL);
 
 	tsk->exit_state = EXIT_ZOMBIE;
+	/*
+	 * sub-thread or delay_group_leader(), wake up the
+	 * PIDFD_THREAD waiters.
+	 */
+	if (!thread_group_empty(tsk))
+		do_notify_pidfd(tsk);
+
 	if (unlikely(tsk->ptrace)) {
 		int sig = thread_group_leader(tsk) &&
 				thread_group_empty(tsk) &&
diff --git a/kernel/fork.c b/kernel/fork.c
index 347641398f9d..bea32071fff2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -101,6 +101,7 @@
 #include <linux/user_events.h>
 #include <linux/iommu.h>
 #include <linux/rseq.h>
+#include <uapi/linux/pidfd.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -2050,6 +2051,8 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
 
 	seq_put_decimal_ll(m, "Pid:\t", nr);
 
+	/* TODO: report PIDFD_THREAD */
+
 #ifdef CONFIG_PID_NS
 	seq_put_decimal_ll(m, "\nNSpid:\t", nr);
 	if (nr > 0) {
@@ -2068,22 +2071,35 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
 }
 #endif
 
+static bool pidfd_task_exited(struct pid *pid, bool thread)
+{
+	struct task_struct *task;
+	bool exited;
+
+	rcu_read_lock();
+	task = pid_task(pid, PIDTYPE_PID);
+	exited = !task ||
+		(READ_ONCE(task->exit_state) && (thread || thread_group_empty(task)));
+	rcu_read_unlock();
+
+	return exited;
+}
+
 /*
  * Poll support for process exit notification.
  */
 static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
 {
 	struct pid *pid = file->private_data;
+	bool thread = file->f_flags & PIDFD_THREAD;
 	__poll_t poll_flags = 0;
 
 	poll_wait(file, &pid->wait_pidfd, pts);
-
 	/*
-	 * Inform pollers only when the whole thread group exits.
-	 * If the thread group leader exits before all other threads in the
-	 * group, then poll(2) should block, similar to the wait(2) family.
+	 * Depending on PIDFD_THREAD, inform pollers when the thread
+	 * or the whole thread-group exits.
 	 */
-	if (thread_group_exited(pid))
+	if (pidfd_task_exited(pid, thread))
 		poll_flags = EPOLLIN | EPOLLRDNORM;
 
 	return poll_flags;
@@ -2141,6 +2157,11 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
 		return PTR_ERR(pidfd_file);
 	}
 	get_pid(pid); /* held by pidfd_file now */
+	/*
+	 * anon_inode_getfile() ignores everything outside of the
+	 * O_ACCMODE | O_NONBLOCK mask, set PIDFD_THREAD manually.
+	 */
+	pidfd_file->f_flags |= (flags & PIDFD_THREAD);
 	*ret = pidfd_file;
 	return pidfd;
 }
@@ -2154,7 +2175,8 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  * Allocate a new file that stashes @pid and reserve a new pidfd number in the
  * caller's file descriptor table. The pidfd is reserved but not installed yet.
  *
- * The helper verifies that @pid is used as a thread group leader.
+ * The helper verifies that @pid is still in use, without PIDFD_THREAD the
+ * task identified by @pid must be a thread-group leader.
  *
  * If this function returns successfully the caller is responsible to either
  * call fd_install() passing the returned pidfd and pidfd file as arguments in
@@ -2173,7 +2195,9 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+	bool thread = flags & PIDFD_THREAD;
+
+	if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID));
 		return -EINVAL;
 
 	return __pidfd_prepare(pid, flags, ret);
diff --git a/kernel/pid.c b/kernel/pid.c
index c7a3e359f8f5..e11144466828 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -552,11 +552,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
  * Return the task associated with @pidfd. The function takes a reference on
  * the returned task. The caller is responsible for releasing that reference.
  *
- * Currently, the process identified by @pidfd is always a thread-group leader.
- * This restriction currently exists for all aspects of pidfds including pidfd
- * creation (CLONE_PIDFD cannot be used with CLONE_THREAD) and pidfd polling
- * (only supports thread group leaders).
- *
  * Return: On success, the task_struct associated with the pidfd.
  *	   On error, a negative errno number will be returned.
  */
@@ -615,11 +610,8 @@ static int pidfd_create(struct pid *pid, unsigned int flags)
  * @flags: flags to pass
  *
  * This creates a new pid file descriptor with the O_CLOEXEC flag set for
- * the process identified by @pid. Currently, the process identified by
- * @pid must be a thread-group leader. This restriction currently exists
- * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
- * be used with CLONE_THREAD) and pidfd polling (only supports thread group
- * leaders).
+ * the task identified by @pid. Without PIDFD_THREAD flag the target task
+ * must be a thread-group leader.
  *
  * Return: On success, a cloexec pidfd is returned.
  *         On error, a negative errno number will be returned.
@@ -629,7 +621,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
 	int fd;
 	struct pid *p;
 
-	if (flags & ~PIDFD_NONBLOCK)
+	if (flags & ~(PIDFD_NONBLOCK | PIDFD_THREAD))
 		return -EINVAL;
 
 	if (pid <= 0)
diff --git a/kernel/signal.c b/kernel/signal.c
index 9561a3962ca6..9b40109f0c56 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2019,7 +2019,7 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 	return ret;
 }
 
-static void do_notify_pidfd(struct task_struct *task)
+void do_notify_pidfd(struct task_struct *task)
 {
 	struct pid *pid;
 
@@ -2051,7 +2051,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	WARN_ON_ONCE(!tsk->ptrace &&
 	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
 	/*
-	 * tsk is a group leader and has no threads, wake up the pidfd waiters.
+	 * tsk is a group leader and has no threads, wake up the
+	 * non-PIDFD_THREAD waiters.
 	 */
 	if (thread_group_empty(tsk))
 		do_notify_pidfd(tsk);
@@ -3926,6 +3927,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 		prepare_kill_siginfo(sig, &kinfo);
 	}
 
+	/* TODO: respect PIDFD_THREAD */
 	ret = kill_pid_info(sig, &kinfo, pid);
 
 err:
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:25 [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open() Oleg Nesterov
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
@ 2024-01-31 13:58 ` Christian Brauner
  2024-01-31 14:12 ` Oleg Nesterov
  2 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2024-01-31 13:58 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On Wed, Jan 31, 2024 at 02:25:41PM +0100, Oleg Nesterov wrote:
> Please see the interdiff below.
> 
> Also, I updated the changelog to document that the behaviour of
> pidfd_poll(PIDFD_THREAD, pid-of-group-leader) is not well defined
> if a sub-thread execs.
> 
> Do you agree with this semantics?

Yeah, that seems fine to me.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:25 [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open() Oleg Nesterov
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
  2024-01-31 13:58 ` [PATCH v3 0/1] " Christian Brauner
@ 2024-01-31 14:12 ` Oleg Nesterov
  2024-01-31 14:39   ` Christian Brauner
  2 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2024-01-31 14:12 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

Before I forget this...

After this patch we can easily add another feature, pidfd_poll()
can add, say, POLLHUP to poll_flags if the pid is "dead".

So the user can do

	poll(pidfd, { .revents = POLLHUP });

and it will block until release_task() is called and this pid is
no longer in use (pid_task() == NULL).

Do you think this can be useful?

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
@ 2024-01-31 14:12   ` Christian Brauner
  2024-01-31 14:19     ` Oleg Nesterov
  2024-01-31 17:23   ` Tycho Andersen
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-01-31 14:12 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On Wed, Jan 31, 2024 at 02:26:02PM +0100, Oleg Nesterov wrote:
> With this flag:
> 
> 	- pidfd_open() doesn't require that the target task must be
> 	  a thread-group leader
> 
> 	- pidfd_poll() succeeds when the task exits and becomes a
> 	  zombie (iow, passes exit_notify()), even if it is a leader
> 	  and thread-group is not empty.
> 
> 	  This means that the behaviour of pidfd_poll(PIDFD_THREAD,
> 	  pid-of-group-leader) is not well defined if it races with
> 	  exec() from its sub-thread; pidfd_poll() can succeed or not
> 	  depending on whether pidfd_task_exited() is called before
> 	  or after exchange_tids().

I think that we can live with that for now. If it doesn't hinder Tycho's
use-case it's fine.

> 
> 	  Perhaps we can improve this behaviour later, pidfd_poll()
> 	  can probably take sig->group_exec_task into account. But
> 	  this doesn't really differ from the case when the leader
> 	  exits before other threads (so pidfd_poll() succeeds) and
> 	  then another thread execs and pidfd_poll() will block again.

... because it assumes the old thread-group leader's struct pid, right.
One would hope that implementations don't poll on the same fd again
after they saw an exit event.

(I often wish that we could report custom data from poll similar to
kqueue on bsd then we could attach file specific information to the
notification. That would enable userspace to handle this the way they
want.)

> 
> thread_group_exited() is no longer used, perhaps it can die.

Feel free to add a patch for that on top of it.

> 
> Co-developed-by: Tycho Andersen <tycho@tycho.pizza>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/exec.c                  |  6 +++++-
>  include/linux/pid.h        |  3 ++-
>  include/uapi/linux/pidfd.h |  3 ++-
>  kernel/exit.c              |  7 +++++++
>  kernel/fork.c              | 38 +++++++++++++++++++++++++++++++-------
>  kernel/pid.c               | 14 +++-----------
>  kernel/signal.c            |  6 ++++--
>  7 files changed, 54 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 73e4045df271..0fd7e668c477 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1143,7 +1143,11 @@ static int de_thread(struct task_struct *tsk)
>  
>  		BUG_ON(leader->exit_state != EXIT_ZOMBIE);
>  		leader->exit_state = EXIT_DEAD;
> -
> +		/*
> +		 * leader and tsk exhanged their pids, the old pid dies,
> +		 * wake up the PIDFD_THREAD waiters.
> +		 */
> +		do_notify_pidfd(leader);
>  		/*
>  		 * We are going to release_task()->ptrace_unlink() silently,
>  		 * the tracer can sleep in do_wait(). EXIT_DEAD guarantees
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index e6a041cb8bac..8124d57752b9 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -70,10 +70,11 @@ extern const struct file_operations pidfd_fops;
>  
>  struct file;
>  
> -extern struct pid *pidfd_pid(const struct file *file);
> +struct pid *pidfd_pid(const struct file *file);
>  struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
>  struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
> +void do_notify_pidfd(struct task_struct *task);
>  
>  static inline struct pid *get_pid(struct pid *pid)
>  {
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 5406fbc13074..2e6461459877 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -7,6 +7,7 @@
>  #include <linux/fcntl.h>
>  
>  /* Flags for pidfd_open().  */
> -#define PIDFD_NONBLOCK O_NONBLOCK
> +#define PIDFD_NONBLOCK	O_NONBLOCK
> +#define PIDFD_THREAD	O_EXCL
>  
>  #endif /* _UAPI_LINUX_PIDFD_H */
> diff --git a/kernel/exit.c b/kernel/exit.c
> index dfb963d2f862..493647fd7c07 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -739,6 +739,13 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>  		kill_orphaned_pgrp(tsk->group_leader, NULL);
>  
>  	tsk->exit_state = EXIT_ZOMBIE;
> +	/*
> +	 * sub-thread or delay_group_leader(), wake up the
> +	 * PIDFD_THREAD waiters.
> +	 */
> +	if (!thread_group_empty(tsk))
> +		do_notify_pidfd(tsk);
> +
>  	if (unlikely(tsk->ptrace)) {
>  		int sig = thread_group_leader(tsk) &&
>  				thread_group_empty(tsk) &&
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 347641398f9d..bea32071fff2 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -101,6 +101,7 @@
>  #include <linux/user_events.h>
>  #include <linux/iommu.h>
>  #include <linux/rseq.h>
> +#include <uapi/linux/pidfd.h>
>  
>  #include <asm/pgalloc.h>
>  #include <linux/uaccess.h>
> @@ -2050,6 +2051,8 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	seq_put_decimal_ll(m, "Pid:\t", nr);
>  
> +	/* TODO: report PIDFD_THREAD */
> +
>  #ifdef CONFIG_PID_NS
>  	seq_put_decimal_ll(m, "\nNSpid:\t", nr);
>  	if (nr > 0) {
> @@ -2068,22 +2071,35 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
>  }
>  #endif
>  
> +static bool pidfd_task_exited(struct pid *pid, bool thread)
> +{
> +	struct task_struct *task;
> +	bool exited;
> +
> +	rcu_read_lock();
> +	task = pid_task(pid, PIDTYPE_PID);
> +	exited = !task ||
> +		(READ_ONCE(task->exit_state) && (thread || thread_group_empty(task)));
> +	rcu_read_unlock();
> +
> +	return exited;
> +}
> +
>  /*
>   * Poll support for process exit notification.
>   */
>  static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
>  {
>  	struct pid *pid = file->private_data;
> +	bool thread = file->f_flags & PIDFD_THREAD;
>  	__poll_t poll_flags = 0;
>  
>  	poll_wait(file, &pid->wait_pidfd, pts);
> -
>  	/*
> -	 * Inform pollers only when the whole thread group exits.
> -	 * If the thread group leader exits before all other threads in the
> -	 * group, then poll(2) should block, similar to the wait(2) family.
> +	 * Depending on PIDFD_THREAD, inform pollers when the thread
> +	 * or the whole thread-group exits.
>  	 */
> -	if (thread_group_exited(pid))
> +	if (pidfd_task_exited(pid, thread))
>  		poll_flags = EPOLLIN | EPOLLRDNORM;
>  
>  	return poll_flags;
> @@ -2141,6 +2157,11 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>  		return PTR_ERR(pidfd_file);
>  	}
>  	get_pid(pid); /* held by pidfd_file now */
> +	/*
> +	 * anon_inode_getfile() ignores everything outside of the
> +	 * O_ACCMODE | O_NONBLOCK mask, set PIDFD_THREAD manually.
> +	 */
> +	pidfd_file->f_flags |= (flags & PIDFD_THREAD);
>  	*ret = pidfd_file;
>  	return pidfd;
>  }
> @@ -2154,7 +2175,8 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   * Allocate a new file that stashes @pid and reserve a new pidfd number in the
>   * caller's file descriptor table. The pidfd is reserved but not installed yet.
>   *
> - * The helper verifies that @pid is used as a thread group leader.
> + * The helper verifies that @pid is still in use, without PIDFD_THREAD the
> + * task identified by @pid must be a thread-group leader.
>   *
>   * If this function returns successfully the caller is responsible to either
>   * call fd_install() passing the returned pidfd and pidfd file as arguments in
> @@ -2173,7 +2195,9 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   */
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>  {
> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> +	bool thread = flags & PIDFD_THREAD;
> +
> +	if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID));
>  		return -EINVAL;
>  
>  	return __pidfd_prepare(pid, flags, ret);
> diff --git a/kernel/pid.c b/kernel/pid.c
> index c7a3e359f8f5..e11144466828 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -552,11 +552,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
>   * Return the task associated with @pidfd. The function takes a reference on
>   * the returned task. The caller is responsible for releasing that reference.
>   *
> - * Currently, the process identified by @pidfd is always a thread-group leader.
> - * This restriction currently exists for all aspects of pidfds including pidfd
> - * creation (CLONE_PIDFD cannot be used with CLONE_THREAD) and pidfd polling
> - * (only supports thread group leaders).
> - *
>   * Return: On success, the task_struct associated with the pidfd.
>   *	   On error, a negative errno number will be returned.
>   */
> @@ -615,11 +610,8 @@ static int pidfd_create(struct pid *pid, unsigned int flags)
>   * @flags: flags to pass
>   *
>   * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> - * the process identified by @pid. Currently, the process identified by
> - * @pid must be a thread-group leader. This restriction currently exists
> - * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> - * be used with CLONE_THREAD) and pidfd polling (only supports thread group
> - * leaders).
> + * the task identified by @pid. Without PIDFD_THREAD flag the target task
> + * must be a thread-group leader.
>   *
>   * Return: On success, a cloexec pidfd is returned.
>   *         On error, a negative errno number will be returned.
> @@ -629,7 +621,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
>  	int fd;
>  	struct pid *p;
>  
> -	if (flags & ~PIDFD_NONBLOCK)
> +	if (flags & ~(PIDFD_NONBLOCK | PIDFD_THREAD))
>  		return -EINVAL;
>  
>  	if (pid <= 0)
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9561a3962ca6..9b40109f0c56 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2019,7 +2019,7 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>  	return ret;
>  }
>  
> -static void do_notify_pidfd(struct task_struct *task)
> +void do_notify_pidfd(struct task_struct *task)
>  {
>  	struct pid *pid;
>  
> @@ -2051,7 +2051,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	WARN_ON_ONCE(!tsk->ptrace &&
>  	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
>  	/*
> -	 * tsk is a group leader and has no threads, wake up the pidfd waiters.
> +	 * tsk is a group leader and has no threads, wake up the
> +	 * non-PIDFD_THREAD waiters.
>  	 */
>  	if (thread_group_empty(tsk))
>  		do_notify_pidfd(tsk);
> @@ -3926,6 +3927,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
>  		prepare_kill_siginfo(sig, &kinfo);
>  	}
>  
> +	/* TODO: respect PIDFD_THREAD */
>  	ret = kill_pid_info(sig, &kinfo, pid);
>  
>  err:
> -- 
> 2.25.1.362.g51ebf55
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 14:12   ` Christian Brauner
@ 2024-01-31 14:19     ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2024-01-31 14:19 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On 01/31, Christian Brauner wrote:
>
> On Wed, Jan 31, 2024 at 02:26:02PM +0100, Oleg Nesterov wrote:
> > With this flag:
> >
> > 	- pidfd_open() doesn't require that the target task must be
> > 	  a thread-group leader
> >
> > 	- pidfd_poll() succeeds when the task exits and becomes a
> > 	  zombie (iow, passes exit_notify()), even if it is a leader
> > 	  and thread-group is not empty.
> >
> > 	  This means that the behaviour of pidfd_poll(PIDFD_THREAD,
> > 	  pid-of-group-leader) is not well defined if it races with
> > 	  exec() from its sub-thread; pidfd_poll() can succeed or not
> > 	  depending on whether pidfd_task_exited() is called before
> > 	  or after exchange_tids().
>
> I think that we can live with that for now. If it doesn't hinder Tycho's
> use-case it's fine.

OK, good.

> (I often wish that we could report custom data from poll similar to
> kqueue on bsd then we could attach file specific information to the
> notification. That would enable userspace to handle this the way they
> want.)

Or at least perhaps we can change do_notify_pidfd() to use the keyed
wakeups... I'll try to take a look later.

> > thread_group_exited() is no longer used, perhaps it can die.
>
> Feel free to add a patch for that on top of it.

Yes, will do.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 14:12 ` Oleg Nesterov
@ 2024-01-31 14:39   ` Christian Brauner
  2024-01-31 16:10     ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-01-31 14:39 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On Wed, Jan 31, 2024 at 03:12:04PM +0100, Oleg Nesterov wrote:
> Before I forget this...
> 
> After this patch we can easily add another feature, pidfd_poll()
> can add, say, POLLHUP to poll_flags if the pid is "dead".
> 
> So the user can do
> 
> 	poll(pidfd, { .revents = POLLHUP });
> 
> and it will block until release_task() is called and this pid is
> no longer in use (pid_task() == NULL).
> 
> Do you think this can be useful?

Yeah, I think this is something that people would find useful. IIUC, it
would essentially allow them to do things like wait until a task has
been waited upon which for service managers is very useful information.

---

Btw, bigger picture for a second. You probably haven't kept track of
this but the fact that we got pidfds - thanks a lot to your review and
help - has made quite a huge difference in userspace. Since we last did
any meaningful work we got:

* systemd completely relying on pidfds to manage services to guard
  against any pid races.
* Extended dbus to allow authentication via pidfds.
* Extended policy kit to enable secure authentication of processes via pidfds.
* Language support for pidfds: Go, Rust etc.
* An endless number of tools that added support for them.
* glibc support for pidfd apis.

There's a bunch more. That literally obliterated whole bug classes.

I think with PIDFD_THREAD support it might make it interesting to use it
for some pthread management as well.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 14:39   ` Christian Brauner
@ 2024-01-31 16:10     ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2024-01-31 16:10 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On 01/31, Christian Brauner wrote:
>
> On Wed, Jan 31, 2024 at 03:12:04PM +0100, Oleg Nesterov wrote:
> >
> > After this patch we can easily add another feature, pidfd_poll()
> > can add, say, POLLHUP to poll_flags if the pid is "dead".
> >
> > So the user can do
> >
> > 	poll(pidfd, { .revents = POLLHUP });
> >
> > and it will block until release_task() is called and this pid is
> > no longer in use (pid_task() == NULL).
> >
> > Do you think this can be useful?
>
> Yeah, I think this is something that people would find useful. IIUC, it
> would essentially allow them to do things like wait until a task has
> been waited upon

Exactly.

OK. I'll try to make the (hopefully simple) patch on top of this one
on Friday, if Tycho agrees with V3. Will be busy tomorrow.

> * systemd completely relying on pidfds to manage services to guard
>   against any pid races.
> * Extended dbus to allow authentication via pidfds.
> * Extended policy kit to enable secure authentication of processes via pidfds.
> * Language support for pidfds: Go, Rust etc.
> * An endless number of tools that added support for them.
> * glibc support for pidfd apis.
>
> There's a bunch more. That literally obliterated whole bug classes.

Thanks for this info!

Not that I ever thouhgt that pidfd is "useless", not at all, but as I said
(and as a Perl progammer ;) I simply do not know what people actually do with
pidfds ;)

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
  2024-01-31 14:12   ` Christian Brauner
@ 2024-01-31 17:23   ` Tycho Andersen
  2024-01-31 18:01     ` Oleg Nesterov
  2024-02-01 17:25   ` Christian Brauner
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Tycho Andersen @ 2024-01-31 17:23 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Christian Brauner, Eric W. Biederman, linux-kernel

Hi Oleg,

On Wed, Jan 31, 2024 at 02:26:02PM +0100, Oleg Nesterov wrote:
> With this flag:
> 
> 	- pidfd_open() doesn't require that the target task must be
> 	  a thread-group leader
> 
> 	- pidfd_poll() succeeds when the task exits and becomes a
> 	  zombie (iow, passes exit_notify()), even if it is a leader
> 	  and thread-group is not empty.
> 
> 	  This means that the behaviour of pidfd_poll(PIDFD_THREAD,
> 	  pid-of-group-leader) is not well defined if it races with
> 	  exec() from its sub-thread; pidfd_poll() can succeed or not
> 	  depending on whether pidfd_task_exited() is called before
> 	  or after exchange_tids().
> 
> 	  Perhaps we can improve this behaviour later, pidfd_poll()
> 	  can probably take sig->group_exec_task into account. But
> 	  this doesn't really differ from the case when the leader
> 	  exits before other threads (so pidfd_poll() succeeds) and
> 	  then another thread execs and pidfd_poll() will block again.

I don't have a strong opinion here; leaving it "undefined" for now is
fine with me.

> @@ -2173,7 +2195,9 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   */
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>  {
> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> +	bool thread = flags & PIDFD_THREAD;
> +
> +	if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID));

Small typo here, trailing ;. When I fix that, tests pass for me.

Assuming that's fixed up:

Reviewed-by: Tycho Andersen <tandersen@netflix.com>
Tested-by: Tycho Andersen <tandersen@netflix.com>

Thanks for your help!

Tycho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 17:23   ` Tycho Andersen
@ 2024-01-31 18:01     ` Oleg Nesterov
  2024-01-31 18:15       ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2024-01-31 18:01 UTC (permalink / raw)
  To: Tycho Andersen; +Cc: Christian Brauner, Eric W. Biederman, linux-kernel

On 01/31, Tycho Andersen wrote:
>
> >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> >  {
> > -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > +	bool thread = flags & PIDFD_THREAD;
> > +
> > +	if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID));
>
> Small typo here, trailing ;. When I fix that, tests pass for me.

OOPS.

Christian, should I spam lkml with v4, or will you add this fix yourself?

> Assuming that's fixed up:
>
> Reviewed-by: Tycho Andersen <tandersen@netflix.com>
> Tested-by: Tycho Andersen <tandersen@netflix.com>

Thanks Tycho!!!

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 18:01     ` Oleg Nesterov
@ 2024-01-31 18:15       ` Christian Brauner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2024-01-31 18:15 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tycho Andersen, Eric W. Biederman, linux-kernel

On Wed, Jan 31, 2024 at 07:01:47PM +0100, Oleg Nesterov wrote:
> On 01/31, Tycho Andersen wrote:
> >
> > >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > >  {
> > > -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > > +	bool thread = flags & PIDFD_THREAD;
> > > +
> > > +	if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID));
> >
> > Small typo here, trailing ;. When I fix that, tests pass for me.
> 
> OOPS.
> 
> Christian, should I spam lkml with v4, or will you add this fix yourself?

I fixed that up. No need to resend!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
  2024-01-31 14:12   ` Christian Brauner
  2024-01-31 17:23   ` Tycho Andersen
@ 2024-02-01 17:25   ` Christian Brauner
  2024-02-01 18:10     ` Oleg Nesterov
  2024-02-02 12:50   ` Christian Brauner
  2024-02-08  7:54   ` kernel test robot
  4 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-02-01 17:25 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On Wed, Jan 31, 2024 at 02:26:02PM +0100, Oleg Nesterov wrote:
> With this flag:
> 
> 	- pidfd_open() doesn't require that the target task must be
> 	  a thread-group leader
> 
> 	- pidfd_poll() succeeds when the task exits and becomes a
> 	  zombie (iow, passes exit_notify()), even if it is a leader
> 	  and thread-group is not empty.
> 
> 	  This means that the behaviour of pidfd_poll(PIDFD_THREAD,
> 	  pid-of-group-leader) is not well defined if it races with
> 	  exec() from its sub-thread; pidfd_poll() can succeed or not
> 	  depending on whether pidfd_task_exited() is called before
> 	  or after exchange_tids().
> 
> 	  Perhaps we can improve this behaviour later, pidfd_poll()
> 	  can probably take sig->group_exec_task into account. But
> 	  this doesn't really differ from the case when the leader
> 	  exits before other threads (so pidfd_poll() succeeds) and
> 	  then another thread execs and pidfd_poll() will block again.
> 
> thread_group_exited() is no longer used, perhaps it can die.
> 
> Co-developed-by: Tycho Andersen <tycho@tycho.pizza>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/exec.c                  |  6 +++++-
>  include/linux/pid.h        |  3 ++-
>  include/uapi/linux/pidfd.h |  3 ++-
>  kernel/exit.c              |  7 +++++++
>  kernel/fork.c              | 38 +++++++++++++++++++++++++++++++-------
>  kernel/pid.c               | 14 +++-----------
>  kernel/signal.c            |  6 ++++--
>  7 files changed, 54 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 73e4045df271..0fd7e668c477 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1143,7 +1143,11 @@ static int de_thread(struct task_struct *tsk)
>  
>  		BUG_ON(leader->exit_state != EXIT_ZOMBIE);
>  		leader->exit_state = EXIT_DEAD;
> -
> +		/*
> +		 * leader and tsk exhanged their pids, the old pid dies,
> +		 * wake up the PIDFD_THREAD waiters.
> +		 */
> +		do_notify_pidfd(leader);
>  		/*
>  		 * We are going to release_task()->ptrace_unlink() silently,
>  		 * the tracer can sleep in do_wait(). EXIT_DEAD guarantees
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index e6a041cb8bac..8124d57752b9 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -70,10 +70,11 @@ extern const struct file_operations pidfd_fops;
>  
>  struct file;
>  
> -extern struct pid *pidfd_pid(const struct file *file);
> +struct pid *pidfd_pid(const struct file *file);
>  struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
>  struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
> +void do_notify_pidfd(struct task_struct *task);
>  
>  static inline struct pid *get_pid(struct pid *pid)
>  {
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 5406fbc13074..2e6461459877 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -7,6 +7,7 @@
>  #include <linux/fcntl.h>
>  
>  /* Flags for pidfd_open().  */
> -#define PIDFD_NONBLOCK O_NONBLOCK
> +#define PIDFD_NONBLOCK	O_NONBLOCK
> +#define PIDFD_THREAD	O_EXCL
>  
>  #endif /* _UAPI_LINUX_PIDFD_H */
> diff --git a/kernel/exit.c b/kernel/exit.c
> index dfb963d2f862..493647fd7c07 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -739,6 +739,13 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>  		kill_orphaned_pgrp(tsk->group_leader, NULL);
>  
>  	tsk->exit_state = EXIT_ZOMBIE;
> +	/*
> +	 * sub-thread or delay_group_leader(), wake up the
> +	 * PIDFD_THREAD waiters.
> +	 */
> +	if (!thread_group_empty(tsk))
> +		do_notify_pidfd(tsk);
> +
>  	if (unlikely(tsk->ptrace)) {
>  		int sig = thread_group_leader(tsk) &&
>  				thread_group_empty(tsk) &&
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 347641398f9d..bea32071fff2 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -101,6 +101,7 @@
>  #include <linux/user_events.h>
>  #include <linux/iommu.h>
>  #include <linux/rseq.h>
> +#include <uapi/linux/pidfd.h>
>  
>  #include <asm/pgalloc.h>
>  #include <linux/uaccess.h>
> @@ -2050,6 +2051,8 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	seq_put_decimal_ll(m, "Pid:\t", nr);
>  
> +	/* TODO: report PIDFD_THREAD */

One more thing that came to my mind. We also support waitid(P_PIDFD,
pidfd). And I just looked through the code and I think it does the right
thing when we pass it a PIDFD_THREAD pidfd because wait_consider_task()
rejects task->exit_signal != SIGCHLD in eligible_child() and
CLONE_THREAD implies task->exit_signal == -1.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-02-01 17:25   ` Christian Brauner
@ 2024-02-01 18:10     ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2024-02-01 18:10 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On 02/01, Christian Brauner wrote:
>
> One more thing that came to my mind. We also support waitid(P_PIDFD,
> pidfd). And I just looked through the code and I think it does the right
> thing when we pass it a PIDFD_THREAD pidfd

Yes, I too looked into kernel_waitid_prepare(P_PIDFD) and didn't notice
anything wrong,

> because wait_consider_task()

Even simpler, I think. waitid(P_PIDFD, pidfd_with_PIDFD_THREAD) doesn't
really differ from waitid(P_PID, pidfd_pid(pidfd_with_PIDFD_THREAD)), so
it should work "as expected".

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
                     ` (2 preceding siblings ...)
  2024-02-01 17:25   ` Christian Brauner
@ 2024-02-02 12:50   ` Christian Brauner
  2024-02-05 15:22     ` Oleg Nesterov
  2024-02-08  7:54   ` kernel test robot
  4 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-02-02 12:50 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

> @@ -2050,6 +2051,8 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	seq_put_decimal_ll(m, "Pid:\t", nr);
>  
> +	/* TODO: report PIDFD_THREAD */

So I think we don't need to do anything here. Since PIDFD_THREAD sets
O_EXCL in file->f_flags and in contrast to do_dentry_open() it isn't
dropped. So userspace can already detect PIDFD_NONBLOCK as O_NONBLOCK
and PIDFD_THREAD as O_EXCL.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-02-02 12:50   ` Christian Brauner
@ 2024-02-05 15:22     ` Oleg Nesterov
  2024-02-06 13:38       ` Christian Brauner
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2024-02-05 15:22 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On 02/02, Christian Brauner wrote:
>
> > @@ -2050,6 +2051,8 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
> >
> >  	seq_put_decimal_ll(m, "Pid:\t", nr);
> >
> > +	/* TODO: report PIDFD_THREAD */
>
> So I think we don't need to do anything here. Since PIDFD_THREAD sets
> O_EXCL in file->f_flags and in contrast to do_dentry_open() it isn't
> dropped. So userspace can already detect PIDFD_NONBLOCK as O_NONBLOCK
> and PIDFD_THREAD as O_EXCL.

Ah, indeed, I didn't know that fs/proc/fd.c:seq_show() reports ->f_flags.
Thanks.



OK, what about another TODO in sys_pidfd_send_signal() ?

I mean, should I send a simple patch which changes pidfd_send_signal()
to use do_send_specific() if PIDFD_THREAD ? Or do you think this should
be controlled by pidfd_send_signal's "flags" argument?

I honestly do not know what makes more sense.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-02-05 15:22     ` Oleg Nesterov
@ 2024-02-06 13:38       ` Christian Brauner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Brauner @ 2024-02-06 13:38 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Eric W. Biederman, Tycho Andersen, linux-kernel

On Mon, Feb 05, 2024 at 04:22:41PM +0100, Oleg Nesterov wrote:
> On 02/02, Christian Brauner wrote:
> >
> > > @@ -2050,6 +2051,8 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
> > >
> > >  	seq_put_decimal_ll(m, "Pid:\t", nr);
> > >
> > > +	/* TODO: report PIDFD_THREAD */
> >
> > So I think we don't need to do anything here. Since PIDFD_THREAD sets
> > O_EXCL in file->f_flags and in contrast to do_dentry_open() it isn't
> > dropped. So userspace can already detect PIDFD_NONBLOCK as O_NONBLOCK
> > and PIDFD_THREAD as O_EXCL.
> 
> Ah, indeed, I didn't know that fs/proc/fd.c:seq_show() reports ->f_flags.
> Thanks.
> 
> 
> 
> OK, what about another TODO in sys_pidfd_send_signal() ?
> 
> I mean, should I send a simple patch which changes pidfd_send_signal()
> to use do_send_specific() if PIDFD_THREAD ? Or do you think this should

Yes, by default the type of pidfd should determine the scope of the
signal as discussed elsewhere.

(And later we can add flags that allow the caller to change the scope of
the signal. So that Andy's use-cases are covered as well.)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
                     ` (3 preceding siblings ...)
  2024-02-02 12:50   ` Christian Brauner
@ 2024-02-08  7:54   ` kernel test robot
  2024-02-08  8:58     ` Christian Brauner
  4 siblings, 1 reply; 19+ messages in thread
From: kernel test robot @ 2024-02-08  7:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: oe-lkp, lkp, Tycho Andersen, linux-mm, linux-kernel,
	Christian Brauner, Eric W. Biederman, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]



Hello,

kernel test robot noticed "last_state.load_disk_fail" on:

commit: aa9c201b150f15cf12e8df8af531b2c74ae1a8fc ("[PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()")
url: https://github.com/intel-lab-lkp/linux/commits/Oleg-Nesterov/pidfd-implement-PIDFD_THREAD-flag-for-pidfd_open/20240131-213408
base: https://git.kernel.org/cgit/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/all/20240131132602.GA23641@redhat.com/
patch subject: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()

in testcase: boot

compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202402081434.9e1ded3-oliver.sang@intel.com



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240208/202402081434.9e1ded3-oliver.sang@intel.com


as in dmesg in above link:

LKP: ttyS0: 1408: can't load the disk /dev/disk/by-id/ata-SanDisk_SDSSDH3250G_182971800454-part1, skip testing...

we know this should be related with some early issues but we failed to figure
out. so here also attached parent dmesg as dmesg-e0ee7b583f.xz FYI.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


[-- Attachment #2: dmesg-e0ee7b583f.xz --]
[-- Type: application/x-xz, Size: 25960 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-02-08  7:54   ` kernel test robot
@ 2024-02-08  8:58     ` Christian Brauner
  2024-02-08  9:18       ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Brauner @ 2024-02-08  8:58 UTC (permalink / raw)
  To: kernel test robot
  Cc: Oleg Nesterov, oe-lkp, lkp, Tycho Andersen, linux-mm,
	linux-kernel, Eric W. Biederman

On Thu, Feb 08, 2024 at 03:54:44PM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "last_state.load_disk_fail" on:
> 
> commit: aa9c201b150f15cf12e8df8af531b2c74ae1a8fc ("[PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()")
> url: https://github.com/intel-lab-lkp/linux/commits/Oleg-Nesterov/pidfd-implement-PIDFD_THREAD-flag-for-pidfd_open/20240131-213408
> base: https://git.kernel.org/cgit/linux/kernel/git/vfs/vfs.git vfs.all
> patch link: https://lore.kernel.org/all/20240131132602.GA23641@redhat.com/
> patch subject: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
> 
> in testcase: boot
> 
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202402081434.9e1ded3-oliver.sang@intel.com
> 
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240208/202402081434.9e1ded3-oliver.sang@intel.com
> 
> 
> as in dmesg in above link:
> 
> LKP: ttyS0: 1408: can't load the disk /dev/disk/by-id/ata-SanDisk_SDSSDH3250G_182971800454-part1, skip testing...
> 
> we know this should be related with some early issues but we failed to figure
> out. so here also attached parent dmesg as dmesg-e0ee7b583f.xz FYI.

I have a hard time seeing how this would be caused by any of Oleg's
changes. Plus, the merges from the vfs tree linked under "url:" above
are all pretty out of date?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
  2024-02-08  8:58     ` Christian Brauner
@ 2024-02-08  9:18       ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2024-02-08  9:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kernel test robot, oe-lkp, lkp, Tycho Andersen, linux-mm,
	linux-kernel, Eric W. Biederman

On 02/08, Christian Brauner wrote:
>
> On Thu, Feb 08, 2024 at 03:54:44PM +0800, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed "last_state.load_disk_fail" on:
> > 
> > commit: aa9c201b150f15cf12e8df8af531b2c74ae1a8fc ("[PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()")
> > url: https://github.com/intel-lab-lkp/linux/commits/Oleg-Nesterov/pidfd-implement-PIDFD_THREAD-flag-for-pidfd_open/20240131-213408
> > base: https://git.kernel.org/cgit/linux/kernel/git/vfs/vfs.git vfs.all
> > patch link: https://lore.kernel.org/all/20240131132602.GA23641@redhat.com/
> > patch subject: [PATCH v3 1/1] pidfd: implement PIDFD_THREAD flag for pidfd_open()
> > 
> > in testcase: boot
> > 
> > compiler: gcc-12
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> > 
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202402081434.9e1ded3-oliver.sang@intel.com
> > 
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240208/202402081434.9e1ded3-oliver.sang@intel.com
> > 
> > 
> > as in dmesg in above link:
> > 
> > LKP: ttyS0: 1408: can't load the disk /dev/disk/by-id/ata-SanDisk_SDSSDH3250G_182971800454-part1, skip testing...
> > 
> > we know this should be related with some early issues but we failed to figure
> > out. so here also attached parent dmesg as dmesg-e0ee7b583f.xz FYI.
> 
> I have a hard time seeing how this would be caused by any of Oleg's
> changes. Plus, the merges from the vfs tree linked under "url:" above
> are all pretty out of date?

Yes, the patch under url still has the extra trailing ";" after "if"
so pidfd_prepare() will always fail. Fixed by Tycho.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-02-08  9:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-31 13:25 [PATCH v3 0/1] pidfd: implement PIDFD_THREAD flag for pidfd_open() Oleg Nesterov
2024-01-31 13:26 ` [PATCH v3 1/1] " Oleg Nesterov
2024-01-31 14:12   ` Christian Brauner
2024-01-31 14:19     ` Oleg Nesterov
2024-01-31 17:23   ` Tycho Andersen
2024-01-31 18:01     ` Oleg Nesterov
2024-01-31 18:15       ` Christian Brauner
2024-02-01 17:25   ` Christian Brauner
2024-02-01 18:10     ` Oleg Nesterov
2024-02-02 12:50   ` Christian Brauner
2024-02-05 15:22     ` Oleg Nesterov
2024-02-06 13:38       ` Christian Brauner
2024-02-08  7:54   ` kernel test robot
2024-02-08  8:58     ` Christian Brauner
2024-02-08  9:18       ` Oleg Nesterov
2024-01-31 13:58 ` [PATCH v3 0/1] " Christian Brauner
2024-01-31 14:12 ` Oleg Nesterov
2024-01-31 14:39   ` Christian Brauner
2024-01-31 16:10     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).