All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <christian@brauner.io>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: Jann Horn <jannh@google.com>, Oleg Nesterov <oleg@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Steven Rostedt <rostedt@goodmis.org>,
	Daniel Colascione <dancol@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrei Vagin <avagin@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org, Michal Hocko <mhocko@suse.com>,
	Nadav Amit <namit@vmware.com>, Serge Hallyn <serge@hallyn.com>,
	Shuah Khan <shuah@kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Taehee Yoo <ap420073@gmail.com>, Tejun Heo <tj@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	kernel-team <kernel-team@android.com>,
	Tycho Andersen <tycho@tycho.ws>
Subject: Re: [PATCH RFC 1/2] Add polling support to pidfd
Date: Fri, 19 Apr 2019 22:01:00 +0200	[thread overview]
Message-ID: <20190419200059.r2f7i7jtlsza4eun@brauner.io> (raw)
In-Reply-To: <20190419194902.GE251571@google.com>

On Fri, Apr 19, 2019 at 03:49:02PM -0400, Joel Fernandes wrote:
> On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn <jannh@google.com> wrote:
> > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > > > >> On 04/16, Joel Fernandes wrote:
> > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov wrote:
> > > > >> > >
> > > > >> > > Could you explain when it should return POLLIN? When the whole
> > > > >process exits?
> > > > >> >
> > > > >> > It returns POLLIN when the task is dead or doesn't exist anymore,
> > > > >or when it
> > > > >> > is in a zombie state and there's no other thread in the thread
> > > > >group.
> > > > >>
> > > > >> IOW, when the whole thread group exits, so it can't be used to
> > > > >monitor sub-threads.
> > > > >>
> > > > >> just in case... speaking of this patch it doesn't modify
> > > > >proc_tid_base_operations,
> > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you are
> > > > >going to use
> > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > >
> > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > >proc_tgid_base_operations despite not being a thread group leader.
> > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this code can
> > > > >be hit trivially, and then the code will misbehave.
> > > > >
> > > > >@Joel: I think you'll have to either rewrite this to explicitly bail
> > > > >out if you're dealing with a thread group leader, or make the code
> > > > >work for threads, too.
> > > > 
> > > > The latter case probably being preferred if this API is supposed to be
> > > > useable for thread management in userspace.
> > > 
> > > At the moment, we are not planning to use this for sub-thread management. I
> > > am reworking this patch to only work on clone(2) pidfds which makes the above
> > 
> > Indeed and agreed.
> > 
> > > discussion about /proc a bit unnecessary I think. Per the latest CLONE_PIDFD
> > > patches, CLONE_THREAD with pidfd is not supported.
> > 
> > Yes. We have no one asking for it right now and we can easily add this
> > later.
> > 
> > Admittedly I haven't gotten around to reviewing the patches here yet
> > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP
> > on process exit which I think is nice as well. How about returning
> > POLLIN | POLLHUP on process exit?
> > We already do things like this. For example, when you proxy between
> > ttys. If the process that you're reading data from has exited and closed
> > it's end you still can't usually simply exit because it might have still
> > buffered data that you want to read.  The way one can deal with this
> > from  userspace is that you can observe a (POLLHUP | POLLIN) event and
> > you keep on reading until you only observe a POLLHUP without a POLLIN
> > event at which point you know you have read
> > all data.
> > I like the semantics for pidfds as well as it would indicate:
> > - POLLHUP -> process has exited
> > - POLLIN  -> information can be read
> 
> Actually I think a bit different about this, in my opinion the pidfd should
> always be readable (we would store the exit status somewhere in the future
> which would be readable, even after task_struct is dead). So I was thinking

So your idea is that you always get EPOLLIN when the process is alive,
i.e. epoll_wait() immediately returns for a pidfd that referes to a live
process if you specify EPOLLIN? E.g. if I specify EPOLLIN | EPOLLHUP
then epoll_wait() would constantly return. I would then need to check
for EPOLLHUP, see that it is not present and then go back into the
epoll_wait() loop and play the same game again?
What do you need this for?
And if you have a valid reason to do this would it make sense to set
POLLPRI if the actual exit status can be read? This way one could at
least specify POLLPRI | POLLHUP without being constantly woken.

> we always return EPOLLIN.  If process has not exited, then it blocks.
> 
> However, we also are returning EPOLLERR in previous patch if the task_struct
> has been reaped (task == NULL). I could change that to EPOLLHUP.

That would be here, right?:

> +	if (!task)
> +		poll_flags = POLLIN | POLLRDNORM | POLLHUP;

That sounds better to me that EPOLLERR.

> 
> So the update patch looks like below. Thoughts?
> 
> ---8<-----------------------
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 6a803a0b75df..eb279b5f4115 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3069,8 +3069,45 @@ static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
>  				   tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff));
>  }
>  
> +static unsigned int proc_tgid_base_poll(struct file *file, struct poll_table_struct *pts)
> +{
> +	int poll_flags = 0;
> +	struct task_struct *task;
> +	struct pid *pid;
> +
> +	task = get_proc_task(file->f_path.dentry->d_inode);
> +
> +	WARN_ON_ONCE(task && !thread_group_leader(task));
> +
> +	/*
> +	 * tasklist_lock must be held because to avoid racing with
> +	 * changes in exit_state and wake up. Basically to avoid:
> +	 *
> +	 * P0: read exit_state = 0
> +	 * P1: write exit_state = EXIT_DEAD
> +	 * P1: Do a wake up - wq is empty, so do nothing
> +	 * P0: Queue for polling - wait forever.
> +	 */
> +	read_lock(&tasklist_lock);
> +	if (!task)
> +		poll_flags = POLLIN | POLLRDNORM | POLLHUP;
> +	else if (task->exit_state && thread_group_empty(task))
> +		poll_flags = POLLIN | POLLRDNORM;
> +
> +	if (!poll_flags) {
> +		pid = proc_pid(file->f_path.dentry->d_inode);
> +		poll_wait(file, &pid->wait_pidfd, pts);
> +	}
> +	read_unlock(&tasklist_lock);
> +
> +	if (task)
> +		put_task_struct(task);
> +	return poll_flags;
> +}
> +
>  static const struct file_operations proc_tgid_base_operations = {
>  	.read		= generic_read_dir,
> +	.poll		= proc_tgid_base_poll,
>  	.iterate_shared	= proc_tgid_base_readdir,
>  	.llseek		= generic_file_llseek,
>  };
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index b6f4ba16065a..2e0dcbc6d14e 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_PID_H
>  
>  #include <linux/rculist.h>
> +#include <linux/wait.h>
>  
>  enum pid_type
>  {
> @@ -60,6 +61,8 @@ struct pid
>  	unsigned int level;
>  	/* lists of tasks that use this pid */
>  	struct hlist_head tasks[PIDTYPE_MAX];
> +	/* wait queue for pidfd pollers */
> +	wait_queue_head_t wait_pidfd;
>  	struct rcu_head rcu;
>  	struct upid numbers[1];
>  };
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 20881598bdfa..5c90c239242f 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -214,6 +214,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  	for (type = 0; type < PIDTYPE_MAX; ++type)
>  		INIT_HLIST_HEAD(&pid->tasks[type]);
>  
> +	init_waitqueue_head(&pid->wait_pidfd);
> +
>  	upid = pid->numbers + ns->level;
>  	spin_lock_irq(&pidmap_lock);
>  	if (!(ns->pid_allocated & PIDNS_ADDING))
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f98448cf2def..e3781703ef7e 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1800,6 +1800,17 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>  	return ret;
>  }
>  
> +static void do_wakeup_pidfd_pollers(struct task_struct *task)
> +{
> +	struct pid *pid;
> +
> +	lockdep_assert_held(&tasklist_lock);
> +
> +	pid = get_task_pid(task, PIDTYPE_PID);
> +	wake_up_all(&pid->wait_pidfd);
> +	put_pid(pid);
> +}
> +
>  /*
>   * Let a parent know about the death of a child.
>   * For a stopped/continued status change, use do_notify_parent_cldstop instead.
> @@ -1823,6 +1834,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	BUG_ON(!tsk->ptrace &&
>  	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
>  
> +	/* Wake up all pidfd waiters */
> +	do_wakeup_pidfd_pollers(tsk);
> +
>  	if (sig != SIGCHLD) {
>  		/*
>  		 * This is only possible if parent == real_parent.
> diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> index deaf8073bc06..4b31c14f273c 100644
> --- a/tools/testing/selftests/pidfd/Makefile
> +++ b/tools/testing/selftests/pidfd/Makefile
> @@ -1,4 +1,4 @@
> -CFLAGS += -g -I../../../../usr/include/
> +CFLAGS += -g -I../../../../usr/include/ -lpthread
>  
>  TEST_GEN_PROGS := pidfd_test
>  
> diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
> index d59378a93782..57ae217339e9 100644
> --- a/tools/testing/selftests/pidfd/pidfd_test.c
> +++ b/tools/testing/selftests/pidfd/pidfd_test.c
> @@ -4,18 +4,26 @@
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <linux/types.h>
> +#include <pthread.h>
>  #include <sched.h>
>  #include <signal.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <syscall.h>
> +#include <sys/epoll.h>
> +#include <sys/mman.h>
>  #include <sys/mount.h>
>  #include <sys/wait.h>
> +#include <time.h>
>  #include <unistd.h>
>  
>  #include "../kselftest.h"
>  
> +#define CHILD_THREAD_MIN_WAIT 3 /* seconds */
> +#define MAX_EVENTS 5
> +#define __NR_pidfd_send_signal 424
> +
>  static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>  					unsigned int flags)
>  {
> @@ -30,6 +38,22 @@ static void set_signal_received_on_sigusr1(int sig)
>  		signal_received = 1;
>  }
>  
> +static int open_pidfd(const char *test_name, pid_t pid)
> +{
> +	char buf[256];
> +	int pidfd;
> +
> +	snprintf(buf, sizeof(buf), "/proc/%d", pid);
> +	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
> +
> +	if (pidfd < 0)
> +		ksft_exit_fail_msg(
> +			"%s test: Failed to open process file descriptor\n",
> +			test_name);
> +
> +	return pidfd;
> +}
> +
>  /*
>   * Straightforward test to see whether pidfd_send_signal() works is to send
>   * a signal to ourself.
> @@ -87,7 +111,6 @@ static int wait_for_pid(pid_t pid)
>  static int test_pidfd_send_signal_exited_fail(void)
>  {
>  	int pidfd, ret, saved_errno;
> -	char buf[256];
>  	pid_t pid;
>  	const char *test_name = "pidfd_send_signal signal exited process";
>  
> @@ -99,17 +122,10 @@ static int test_pidfd_send_signal_exited_fail(void)
>  	if (pid == 0)
>  		_exit(EXIT_SUCCESS);
>  
> -	snprintf(buf, sizeof(buf), "/proc/%d", pid);
> -
> -	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
> +	pidfd = open_pidfd(test_name, pid);
>  
>  	(void)wait_for_pid(pid);
>  
> -	if (pidfd < 0)
> -		ksft_exit_fail_msg(
> -			"%s test: Failed to open process file descriptor\n",
> -			test_name);
> -
>  	ret = sys_pidfd_send_signal(pidfd, 0, NULL, 0);
>  	saved_errno = errno;
>  	close(pidfd);
> @@ -368,10 +384,179 @@ static int test_pidfd_send_signal_syscall_support(void)
>  	return 0;
>  }
>  
> +void *test_pidfd_poll_exec_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	ksft_print_msg("Child Thread: doing exec of sleep\n");
> +
> +	sprintf(waittime, "%d", CHILD_THREAD_MIN_WAIT);
> +	execl("/bin/sleep", "sleep", waittime, (char *)NULL);
> +
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n",
> +			getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static int poll_pidfd(const char *test_name, int pidfd)
> +{
> +	int c;
> +	int epoll_fd = epoll_create1(0);
> +	struct epoll_event event, events[MAX_EVENTS];
> +
> +	if (epoll_fd == -1)
> +		ksft_exit_fail_msg("%s test: Failed to create epoll file descriptor\n",
> +				   test_name);
> +
> +	event.events = EPOLLIN;
> +	event.data.fd = pidfd;
> +
> +	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pidfd, &event)) {
> +		ksft_print_msg("%s test: Failed to add epoll file descriptor: Skipping\n",
> +			       test_name);
> +		_exit(PIDFD_SKIP);
> +	}
> +
> +	c = epoll_wait(epoll_fd, events, MAX_EVENTS, 5000);
> +	if (c != 1 || !(events[0].events & EPOLLIN))
> +		ksft_exit_fail_msg("%s test: Unexpected epoll_wait result (c=%d, events=%x)\n",
> +				   test_name, c, events[0].events);
> +
> +	close(epoll_fd);
> +	return events[0].events;
> +
> +}
> +
> +int test_pidfd_poll_exec(int use_waitpid)
> +{
> +	int pid, pidfd;
> +	int status, ret;
> +	pthread_t t1;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(),
> +				syscall(SYS_gettid));
> +		pthread_create(&t1, NULL, test_pidfd_poll_exec_thread, NULL);
> +		/*
> +		 * Exec in the non-leader thread will destroy the leader immediately.
> +		 * If the wait in the parent returns too soon, the test fails.
> +		 */
> +		while (1)
> +			;
> +	}
> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +
> +		if (ret == pid)
> +			ksft_print_msg("Parent: Child process waited for.\n");
> +	} else {
> +		pidfd = open_pidfd(test_name, pid);
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	time_t prog_time = time(NULL) - prog_start;
> +
> +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> +
> +	close(pidfd);
> +
> +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
> +void *test_pidfd_poll_leader_exit_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	sleep(CHILD_THREAD_MIN_WAIT);
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static time_t *child_exit_secs;
> +int test_pidfd_poll_leader_exit(int use_waitpid)
> +{
> +	int pid, pidfd;
> +	int status, ret;
> +	pthread_t t1, t2;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> +				"group leader exit";
> +
> +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +		pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +		pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +
> +		/*
> +		 * glibc exit calls exit_group syscall, so explicity call exit only
> +		 * so that only the group leader exits, leaving the threads alone.
> +		 */
> +		*child_exit_secs = time(NULL);
> +		syscall(SYS_exit, 0);
> +	}
> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +	} else {
> +		/*
> +		 * This sleep tests for the case where if the child exits, and is in
> +		 * EXIT_ZOMBIE, but the thread group leader is non-empty, then the poll
> +		 * doesn't prematurely return even though there are active threads
> +		 */
> +		sleep(1);
> +		pidfd = open_pidfd(test_name, pid);
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	if (ret == pid)
> +		ksft_print_msg("Parent: Child process waited for.\n");
> +
> +	time_t since_child_exit = time(NULL) - *child_exit_secs;
> +
> +	ksft_print_msg("Time since child exit: %lu\n", since_child_exit);
> +
> +	close(pidfd);
> +
> +	if (since_child_exit < CHILD_THREAD_MIN_WAIT ||
> +			since_child_exit > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	ksft_print_header();
>  
> +	test_pidfd_poll_exec(0);
> +	test_pidfd_poll_exec(1);
> +	test_pidfd_poll_leader_exit(0);
> +	test_pidfd_poll_leader_exit(1);
>  	test_pidfd_send_signal_syscall_support();
>  	test_pidfd_send_signal_simple_success();
>  	test_pidfd_send_signal_exited_fail();

WARNING: multiple messages have this Message-ID (diff)
From: christian at brauner.io (Christian Brauner)
Subject: [PATCH RFC 1/2] Add polling support to pidfd
Date: Fri, 19 Apr 2019 22:01:00 +0200	[thread overview]
Message-ID: <20190419200059.r2f7i7jtlsza4eun@brauner.io> (raw)
In-Reply-To: <20190419194902.GE251571@google.com>

On Fri, Apr 19, 2019 at 03:49:02PM -0400, Joel Fernandes wrote:
> On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn <jannh at google.com> wrote:
> > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov <oleg at redhat.com> wrote:
> > > > >> On 04/16, Joel Fernandes wrote:
> > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov wrote:
> > > > >> > >
> > > > >> > > Could you explain when it should return POLLIN? When the whole
> > > > >process exits?
> > > > >> >
> > > > >> > It returns POLLIN when the task is dead or doesn't exist anymore,
> > > > >or when it
> > > > >> > is in a zombie state and there's no other thread in the thread
> > > > >group.
> > > > >>
> > > > >> IOW, when the whole thread group exits, so it can't be used to
> > > > >monitor sub-threads.
> > > > >>
> > > > >> just in case... speaking of this patch it doesn't modify
> > > > >proc_tid_base_operations,
> > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you are
> > > > >going to use
> > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > >
> > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > >proc_tgid_base_operations despite not being a thread group leader.
> > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this code can
> > > > >be hit trivially, and then the code will misbehave.
> > > > >
> > > > >@Joel: I think you'll have to either rewrite this to explicitly bail
> > > > >out if you're dealing with a thread group leader, or make the code
> > > > >work for threads, too.
> > > > 
> > > > The latter case probably being preferred if this API is supposed to be
> > > > useable for thread management in userspace.
> > > 
> > > At the moment, we are not planning to use this for sub-thread management. I
> > > am reworking this patch to only work on clone(2) pidfds which makes the above
> > 
> > Indeed and agreed.
> > 
> > > discussion about /proc a bit unnecessary I think. Per the latest CLONE_PIDFD
> > > patches, CLONE_THREAD with pidfd is not supported.
> > 
> > Yes. We have no one asking for it right now and we can easily add this
> > later.
> > 
> > Admittedly I haven't gotten around to reviewing the patches here yet
> > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP
> > on process exit which I think is nice as well. How about returning
> > POLLIN | POLLHUP on process exit?
> > We already do things like this. For example, when you proxy between
> > ttys. If the process that you're reading data from has exited and closed
> > it's end you still can't usually simply exit because it might have still
> > buffered data that you want to read.  The way one can deal with this
> > from  userspace is that you can observe a (POLLHUP | POLLIN) event and
> > you keep on reading until you only observe a POLLHUP without a POLLIN
> > event at which point you know you have read
> > all data.
> > I like the semantics for pidfds as well as it would indicate:
> > - POLLHUP -> process has exited
> > - POLLIN  -> information can be read
> 
> Actually I think a bit different about this, in my opinion the pidfd should
> always be readable (we would store the exit status somewhere in the future
> which would be readable, even after task_struct is dead). So I was thinking

So your idea is that you always get EPOLLIN when the process is alive,
i.e. epoll_wait() immediately returns for a pidfd that referes to a live
process if you specify EPOLLIN? E.g. if I specify EPOLLIN | EPOLLHUP
then epoll_wait() would constantly return. I would then need to check
for EPOLLHUP, see that it is not present and then go back into the
epoll_wait() loop and play the same game again?
What do you need this for?
And if you have a valid reason to do this would it make sense to set
POLLPRI if the actual exit status can be read? This way one could at
least specify POLLPRI | POLLHUP without being constantly woken.

> we always return EPOLLIN.  If process has not exited, then it blocks.
> 
> However, we also are returning EPOLLERR in previous patch if the task_struct
> has been reaped (task == NULL). I could change that to EPOLLHUP.

That would be here, right?:

> +	if (!task)
> +		poll_flags = POLLIN | POLLRDNORM | POLLHUP;

That sounds better to me that EPOLLERR.

> 
> So the update patch looks like below. Thoughts?
> 
> ---8<-----------------------
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 6a803a0b75df..eb279b5f4115 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3069,8 +3069,45 @@ static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
>  				   tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff));
>  }
>  
> +static unsigned int proc_tgid_base_poll(struct file *file, struct poll_table_struct *pts)
> +{
> +	int poll_flags = 0;
> +	struct task_struct *task;
> +	struct pid *pid;
> +
> +	task = get_proc_task(file->f_path.dentry->d_inode);
> +
> +	WARN_ON_ONCE(task && !thread_group_leader(task));
> +
> +	/*
> +	 * tasklist_lock must be held because to avoid racing with
> +	 * changes in exit_state and wake up. Basically to avoid:
> +	 *
> +	 * P0: read exit_state = 0
> +	 * P1: write exit_state = EXIT_DEAD
> +	 * P1: Do a wake up - wq is empty, so do nothing
> +	 * P0: Queue for polling - wait forever.
> +	 */
> +	read_lock(&tasklist_lock);
> +	if (!task)
> +		poll_flags = POLLIN | POLLRDNORM | POLLHUP;
> +	else if (task->exit_state && thread_group_empty(task))
> +		poll_flags = POLLIN | POLLRDNORM;
> +
> +	if (!poll_flags) {
> +		pid = proc_pid(file->f_path.dentry->d_inode);
> +		poll_wait(file, &pid->wait_pidfd, pts);
> +	}
> +	read_unlock(&tasklist_lock);
> +
> +	if (task)
> +		put_task_struct(task);
> +	return poll_flags;
> +}
> +
>  static const struct file_operations proc_tgid_base_operations = {
>  	.read		= generic_read_dir,
> +	.poll		= proc_tgid_base_poll,
>  	.iterate_shared	= proc_tgid_base_readdir,
>  	.llseek		= generic_file_llseek,
>  };
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index b6f4ba16065a..2e0dcbc6d14e 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_PID_H
>  
>  #include <linux/rculist.h>
> +#include <linux/wait.h>
>  
>  enum pid_type
>  {
> @@ -60,6 +61,8 @@ struct pid
>  	unsigned int level;
>  	/* lists of tasks that use this pid */
>  	struct hlist_head tasks[PIDTYPE_MAX];
> +	/* wait queue for pidfd pollers */
> +	wait_queue_head_t wait_pidfd;
>  	struct rcu_head rcu;
>  	struct upid numbers[1];
>  };
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 20881598bdfa..5c90c239242f 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -214,6 +214,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  	for (type = 0; type < PIDTYPE_MAX; ++type)
>  		INIT_HLIST_HEAD(&pid->tasks[type]);
>  
> +	init_waitqueue_head(&pid->wait_pidfd);
> +
>  	upid = pid->numbers + ns->level;
>  	spin_lock_irq(&pidmap_lock);
>  	if (!(ns->pid_allocated & PIDNS_ADDING))
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f98448cf2def..e3781703ef7e 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1800,6 +1800,17 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>  	return ret;
>  }
>  
> +static void do_wakeup_pidfd_pollers(struct task_struct *task)
> +{
> +	struct pid *pid;
> +
> +	lockdep_assert_held(&tasklist_lock);
> +
> +	pid = get_task_pid(task, PIDTYPE_PID);
> +	wake_up_all(&pid->wait_pidfd);
> +	put_pid(pid);
> +}
> +
>  /*
>   * Let a parent know about the death of a child.
>   * For a stopped/continued status change, use do_notify_parent_cldstop instead.
> @@ -1823,6 +1834,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	BUG_ON(!tsk->ptrace &&
>  	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
>  
> +	/* Wake up all pidfd waiters */
> +	do_wakeup_pidfd_pollers(tsk);
> +
>  	if (sig != SIGCHLD) {
>  		/*
>  		 * This is only possible if parent == real_parent.
> diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> index deaf8073bc06..4b31c14f273c 100644
> --- a/tools/testing/selftests/pidfd/Makefile
> +++ b/tools/testing/selftests/pidfd/Makefile
> @@ -1,4 +1,4 @@
> -CFLAGS += -g -I../../../../usr/include/
> +CFLAGS += -g -I../../../../usr/include/ -lpthread
>  
>  TEST_GEN_PROGS := pidfd_test
>  
> diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
> index d59378a93782..57ae217339e9 100644
> --- a/tools/testing/selftests/pidfd/pidfd_test.c
> +++ b/tools/testing/selftests/pidfd/pidfd_test.c
> @@ -4,18 +4,26 @@
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <linux/types.h>
> +#include <pthread.h>
>  #include <sched.h>
>  #include <signal.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <syscall.h>
> +#include <sys/epoll.h>
> +#include <sys/mman.h>
>  #include <sys/mount.h>
>  #include <sys/wait.h>
> +#include <time.h>
>  #include <unistd.h>
>  
>  #include "../kselftest.h"
>  
> +#define CHILD_THREAD_MIN_WAIT 3 /* seconds */
> +#define MAX_EVENTS 5
> +#define __NR_pidfd_send_signal 424
> +
>  static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>  					unsigned int flags)
>  {
> @@ -30,6 +38,22 @@ static void set_signal_received_on_sigusr1(int sig)
>  		signal_received = 1;
>  }
>  
> +static int open_pidfd(const char *test_name, pid_t pid)
> +{
> +	char buf[256];
> +	int pidfd;
> +
> +	snprintf(buf, sizeof(buf), "/proc/%d", pid);
> +	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
> +
> +	if (pidfd < 0)
> +		ksft_exit_fail_msg(
> +			"%s test: Failed to open process file descriptor\n",
> +			test_name);
> +
> +	return pidfd;
> +}
> +
>  /*
>   * Straightforward test to see whether pidfd_send_signal() works is to send
>   * a signal to ourself.
> @@ -87,7 +111,6 @@ static int wait_for_pid(pid_t pid)
>  static int test_pidfd_send_signal_exited_fail(void)
>  {
>  	int pidfd, ret, saved_errno;
> -	char buf[256];
>  	pid_t pid;
>  	const char *test_name = "pidfd_send_signal signal exited process";
>  
> @@ -99,17 +122,10 @@ static int test_pidfd_send_signal_exited_fail(void)
>  	if (pid == 0)
>  		_exit(EXIT_SUCCESS);
>  
> -	snprintf(buf, sizeof(buf), "/proc/%d", pid);
> -
> -	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
> +	pidfd = open_pidfd(test_name, pid);
>  
>  	(void)wait_for_pid(pid);
>  
> -	if (pidfd < 0)
> -		ksft_exit_fail_msg(
> -			"%s test: Failed to open process file descriptor\n",
> -			test_name);
> -
>  	ret = sys_pidfd_send_signal(pidfd, 0, NULL, 0);
>  	saved_errno = errno;
>  	close(pidfd);
> @@ -368,10 +384,179 @@ static int test_pidfd_send_signal_syscall_support(void)
>  	return 0;
>  }
>  
> +void *test_pidfd_poll_exec_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	ksft_print_msg("Child Thread: doing exec of sleep\n");
> +
> +	sprintf(waittime, "%d", CHILD_THREAD_MIN_WAIT);
> +	execl("/bin/sleep", "sleep", waittime, (char *)NULL);
> +
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n",
> +			getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static int poll_pidfd(const char *test_name, int pidfd)
> +{
> +	int c;
> +	int epoll_fd = epoll_create1(0);
> +	struct epoll_event event, events[MAX_EVENTS];
> +
> +	if (epoll_fd == -1)
> +		ksft_exit_fail_msg("%s test: Failed to create epoll file descriptor\n",
> +				   test_name);
> +
> +	event.events = EPOLLIN;
> +	event.data.fd = pidfd;
> +
> +	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pidfd, &event)) {
> +		ksft_print_msg("%s test: Failed to add epoll file descriptor: Skipping\n",
> +			       test_name);
> +		_exit(PIDFD_SKIP);
> +	}
> +
> +	c = epoll_wait(epoll_fd, events, MAX_EVENTS, 5000);
> +	if (c != 1 || !(events[0].events & EPOLLIN))
> +		ksft_exit_fail_msg("%s test: Unexpected epoll_wait result (c=%d, events=%x)\n",
> +				   test_name, c, events[0].events);
> +
> +	close(epoll_fd);
> +	return events[0].events;
> +
> +}
> +
> +int test_pidfd_poll_exec(int use_waitpid)
> +{
> +	int pid, pidfd;
> +	int status, ret;
> +	pthread_t t1;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(),
> +				syscall(SYS_gettid));
> +		pthread_create(&t1, NULL, test_pidfd_poll_exec_thread, NULL);
> +		/*
> +		 * Exec in the non-leader thread will destroy the leader immediately.
> +		 * If the wait in the parent returns too soon, the test fails.
> +		 */
> +		while (1)
> +			;
> +	}
> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +
> +		if (ret == pid)
> +			ksft_print_msg("Parent: Child process waited for.\n");
> +	} else {
> +		pidfd = open_pidfd(test_name, pid);
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	time_t prog_time = time(NULL) - prog_start;
> +
> +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> +
> +	close(pidfd);
> +
> +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
> +void *test_pidfd_poll_leader_exit_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	sleep(CHILD_THREAD_MIN_WAIT);
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static time_t *child_exit_secs;
> +int test_pidfd_poll_leader_exit(int use_waitpid)
> +{
> +	int pid, pidfd;
> +	int status, ret;
> +	pthread_t t1, t2;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> +				"group leader exit";
> +
> +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +		pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +		pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +
> +		/*
> +		 * glibc exit calls exit_group syscall, so explicity call exit only
> +		 * so that only the group leader exits, leaving the threads alone.
> +		 */
> +		*child_exit_secs = time(NULL);
> +		syscall(SYS_exit, 0);
> +	}
> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +	} else {
> +		/*
> +		 * This sleep tests for the case where if the child exits, and is in
> +		 * EXIT_ZOMBIE, but the thread group leader is non-empty, then the poll
> +		 * doesn't prematurely return even though there are active threads
> +		 */
> +		sleep(1);
> +		pidfd = open_pidfd(test_name, pid);
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	if (ret == pid)
> +		ksft_print_msg("Parent: Child process waited for.\n");
> +
> +	time_t since_child_exit = time(NULL) - *child_exit_secs;
> +
> +	ksft_print_msg("Time since child exit: %lu\n", since_child_exit);
> +
> +	close(pidfd);
> +
> +	if (since_child_exit < CHILD_THREAD_MIN_WAIT ||
> +			since_child_exit > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	ksft_print_header();
>  
> +	test_pidfd_poll_exec(0);
> +	test_pidfd_poll_exec(1);
> +	test_pidfd_poll_leader_exit(0);
> +	test_pidfd_poll_leader_exit(1);
>  	test_pidfd_send_signal_syscall_support();
>  	test_pidfd_send_signal_simple_success();
>  	test_pidfd_send_signal_exited_fail();

WARNING: multiple messages have this Message-ID (diff)
From: christian@brauner.io (Christian Brauner)
Subject: [PATCH RFC 1/2] Add polling support to pidfd
Date: Fri, 19 Apr 2019 22:01:00 +0200	[thread overview]
Message-ID: <20190419200059.r2f7i7jtlsza4eun@brauner.io> (raw)
Message-ID: <20190419200100.x491IdqgK4NL80JHE2Qg_Drvy5dVtOUa-K38dK-B2u8@z> (raw)
In-Reply-To: <20190419194902.GE251571@google.com>

On Fri, Apr 19, 2019@03:49:02PM -0400, Joel Fernandes wrote:
> On Fri, Apr 19, 2019@09:18:59PM +0200, Christian Brauner wrote:
> > On Fri, Apr 19, 2019@03:02:47PM -0400, Joel Fernandes wrote:
> > > On Thu, Apr 18, 2019@07:26:44PM +0200, Christian Brauner wrote:
> > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn <jannh@google.com> wrote:
> > > > >On Wed, Apr 17, 2019@3:09 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > > > >> On 04/16, Joel Fernandes wrote:
> > > > >> > On Tue, Apr 16, 2019@02:04:31PM +0200, Oleg Nesterov wrote:
> > > > >> > >
> > > > >> > > Could you explain when it should return POLLIN? When the whole
> > > > >process exits?
> > > > >> >
> > > > >> > It returns POLLIN when the task is dead or doesn't exist anymore,
> > > > >or when it
> > > > >> > is in a zombie state and there's no other thread in the thread
> > > > >group.
> > > > >>
> > > > >> IOW, when the whole thread group exits, so it can't be used to
> > > > >monitor sub-threads.
> > > > >>
> > > > >> just in case... speaking of this patch it doesn't modify
> > > > >proc_tid_base_operations,
> > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you are
> > > > >going to use
> > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > >
> > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > >proc_tgid_base_operations despite not being a thread group leader.
> > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this code can
> > > > >be hit trivially, and then the code will misbehave.
> > > > >
> > > > >@Joel: I think you'll have to either rewrite this to explicitly bail
> > > > >out if you're dealing with a thread group leader, or make the code
> > > > >work for threads, too.
> > > > 
> > > > The latter case probably being preferred if this API is supposed to be
> > > > useable for thread management in userspace.
> > > 
> > > At the moment, we are not planning to use this for sub-thread management. I
> > > am reworking this patch to only work on clone(2) pidfds which makes the above
> > 
> > Indeed and agreed.
> > 
> > > discussion about /proc a bit unnecessary I think. Per the latest CLONE_PIDFD
> > > patches, CLONE_THREAD with pidfd is not supported.
> > 
> > Yes. We have no one asking for it right now and we can easily add this
> > later.
> > 
> > Admittedly I haven't gotten around to reviewing the patches here yet
> > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP
> > on process exit which I think is nice as well. How about returning
> > POLLIN | POLLHUP on process exit?
> > We already do things like this. For example, when you proxy between
> > ttys. If the process that you're reading data from has exited and closed
> > it's end you still can't usually simply exit because it might have still
> > buffered data that you want to read.  The way one can deal with this
> > from  userspace is that you can observe a (POLLHUP | POLLIN) event and
> > you keep on reading until you only observe a POLLHUP without a POLLIN
> > event at which point you know you have read
> > all data.
> > I like the semantics for pidfds as well as it would indicate:
> > - POLLHUP -> process has exited
> > - POLLIN  -> information can be read
> 
> Actually I think a bit different about this, in my opinion the pidfd should
> always be readable (we would store the exit status somewhere in the future
> which would be readable, even after task_struct is dead). So I was thinking

So your idea is that you always get EPOLLIN when the process is alive,
i.e. epoll_wait() immediately returns for a pidfd that referes to a live
process if you specify EPOLLIN? E.g. if I specify EPOLLIN | EPOLLHUP
then epoll_wait() would constantly return. I would then need to check
for EPOLLHUP, see that it is not present and then go back into the
epoll_wait() loop and play the same game again?
What do you need this for?
And if you have a valid reason to do this would it make sense to set
POLLPRI if the actual exit status can be read? This way one could at
least specify POLLPRI | POLLHUP without being constantly woken.

> we always return EPOLLIN.  If process has not exited, then it blocks.
> 
> However, we also are returning EPOLLERR in previous patch if the task_struct
> has been reaped (task == NULL). I could change that to EPOLLHUP.

That would be here, right?:

> +	if (!task)
> +		poll_flags = POLLIN | POLLRDNORM | POLLHUP;

That sounds better to me that EPOLLERR.

> 
> So the update patch looks like below. Thoughts?
> 
> ---8<-----------------------
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 6a803a0b75df..eb279b5f4115 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3069,8 +3069,45 @@ static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
>  				   tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff));
>  }
>  
> +static unsigned int proc_tgid_base_poll(struct file *file, struct poll_table_struct *pts)
> +{
> +	int poll_flags = 0;
> +	struct task_struct *task;
> +	struct pid *pid;
> +
> +	task = get_proc_task(file->f_path.dentry->d_inode);
> +
> +	WARN_ON_ONCE(task && !thread_group_leader(task));
> +
> +	/*
> +	 * tasklist_lock must be held because to avoid racing with
> +	 * changes in exit_state and wake up. Basically to avoid:
> +	 *
> +	 * P0: read exit_state = 0
> +	 * P1: write exit_state = EXIT_DEAD
> +	 * P1: Do a wake up - wq is empty, so do nothing
> +	 * P0: Queue for polling - wait forever.
> +	 */
> +	read_lock(&tasklist_lock);
> +	if (!task)
> +		poll_flags = POLLIN | POLLRDNORM | POLLHUP;
> +	else if (task->exit_state && thread_group_empty(task))
> +		poll_flags = POLLIN | POLLRDNORM;
> +
> +	if (!poll_flags) {
> +		pid = proc_pid(file->f_path.dentry->d_inode);
> +		poll_wait(file, &pid->wait_pidfd, pts);
> +	}
> +	read_unlock(&tasklist_lock);
> +
> +	if (task)
> +		put_task_struct(task);
> +	return poll_flags;
> +}
> +
>  static const struct file_operations proc_tgid_base_operations = {
>  	.read		= generic_read_dir,
> +	.poll		= proc_tgid_base_poll,
>  	.iterate_shared	= proc_tgid_base_readdir,
>  	.llseek		= generic_file_llseek,
>  };
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index b6f4ba16065a..2e0dcbc6d14e 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_PID_H
>  
>  #include <linux/rculist.h>
> +#include <linux/wait.h>
>  
>  enum pid_type
>  {
> @@ -60,6 +61,8 @@ struct pid
>  	unsigned int level;
>  	/* lists of tasks that use this pid */
>  	struct hlist_head tasks[PIDTYPE_MAX];
> +	/* wait queue for pidfd pollers */
> +	wait_queue_head_t wait_pidfd;
>  	struct rcu_head rcu;
>  	struct upid numbers[1];
>  };
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 20881598bdfa..5c90c239242f 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -214,6 +214,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  	for (type = 0; type < PIDTYPE_MAX; ++type)
>  		INIT_HLIST_HEAD(&pid->tasks[type]);
>  
> +	init_waitqueue_head(&pid->wait_pidfd);
> +
>  	upid = pid->numbers + ns->level;
>  	spin_lock_irq(&pidmap_lock);
>  	if (!(ns->pid_allocated & PIDNS_ADDING))
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f98448cf2def..e3781703ef7e 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1800,6 +1800,17 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>  	return ret;
>  }
>  
> +static void do_wakeup_pidfd_pollers(struct task_struct *task)
> +{
> +	struct pid *pid;
> +
> +	lockdep_assert_held(&tasklist_lock);
> +
> +	pid = get_task_pid(task, PIDTYPE_PID);
> +	wake_up_all(&pid->wait_pidfd);
> +	put_pid(pid);
> +}
> +
>  /*
>   * Let a parent know about the death of a child.
>   * For a stopped/continued status change, use do_notify_parent_cldstop instead.
> @@ -1823,6 +1834,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	BUG_ON(!tsk->ptrace &&
>  	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
>  
> +	/* Wake up all pidfd waiters */
> +	do_wakeup_pidfd_pollers(tsk);
> +
>  	if (sig != SIGCHLD) {
>  		/*
>  		 * This is only possible if parent == real_parent.
> diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> index deaf8073bc06..4b31c14f273c 100644
> --- a/tools/testing/selftests/pidfd/Makefile
> +++ b/tools/testing/selftests/pidfd/Makefile
> @@ -1,4 +1,4 @@
> -CFLAGS += -g -I../../../../usr/include/
> +CFLAGS += -g -I../../../../usr/include/ -lpthread
>  
>  TEST_GEN_PROGS := pidfd_test
>  
> diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
> index d59378a93782..57ae217339e9 100644
> --- a/tools/testing/selftests/pidfd/pidfd_test.c
> +++ b/tools/testing/selftests/pidfd/pidfd_test.c
> @@ -4,18 +4,26 @@
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <linux/types.h>
> +#include <pthread.h>
>  #include <sched.h>
>  #include <signal.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <syscall.h>
> +#include <sys/epoll.h>
> +#include <sys/mman.h>
>  #include <sys/mount.h>
>  #include <sys/wait.h>
> +#include <time.h>
>  #include <unistd.h>
>  
>  #include "../kselftest.h"
>  
> +#define CHILD_THREAD_MIN_WAIT 3 /* seconds */
> +#define MAX_EVENTS 5
> +#define __NR_pidfd_send_signal 424
> +
>  static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>  					unsigned int flags)
>  {
> @@ -30,6 +38,22 @@ static void set_signal_received_on_sigusr1(int sig)
>  		signal_received = 1;
>  }
>  
> +static int open_pidfd(const char *test_name, pid_t pid)
> +{
> +	char buf[256];
> +	int pidfd;
> +
> +	snprintf(buf, sizeof(buf), "/proc/%d", pid);
> +	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
> +
> +	if (pidfd < 0)
> +		ksft_exit_fail_msg(
> +			"%s test: Failed to open process file descriptor\n",
> +			test_name);
> +
> +	return pidfd;
> +}
> +
>  /*
>   * Straightforward test to see whether pidfd_send_signal() works is to send
>   * a signal to ourself.
> @@ -87,7 +111,6 @@ static int wait_for_pid(pid_t pid)
>  static int test_pidfd_send_signal_exited_fail(void)
>  {
>  	int pidfd, ret, saved_errno;
> -	char buf[256];
>  	pid_t pid;
>  	const char *test_name = "pidfd_send_signal signal exited process";
>  
> @@ -99,17 +122,10 @@ static int test_pidfd_send_signal_exited_fail(void)
>  	if (pid == 0)
>  		_exit(EXIT_SUCCESS);
>  
> -	snprintf(buf, sizeof(buf), "/proc/%d", pid);
> -
> -	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
> +	pidfd = open_pidfd(test_name, pid);
>  
>  	(void)wait_for_pid(pid);
>  
> -	if (pidfd < 0)
> -		ksft_exit_fail_msg(
> -			"%s test: Failed to open process file descriptor\n",
> -			test_name);
> -
>  	ret = sys_pidfd_send_signal(pidfd, 0, NULL, 0);
>  	saved_errno = errno;
>  	close(pidfd);
> @@ -368,10 +384,179 @@ static int test_pidfd_send_signal_syscall_support(void)
>  	return 0;
>  }
>  
> +void *test_pidfd_poll_exec_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	ksft_print_msg("Child Thread: doing exec of sleep\n");
> +
> +	sprintf(waittime, "%d", CHILD_THREAD_MIN_WAIT);
> +	execl("/bin/sleep", "sleep", waittime, (char *)NULL);
> +
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n",
> +			getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static int poll_pidfd(const char *test_name, int pidfd)
> +{
> +	int c;
> +	int epoll_fd = epoll_create1(0);
> +	struct epoll_event event, events[MAX_EVENTS];
> +
> +	if (epoll_fd == -1)
> +		ksft_exit_fail_msg("%s test: Failed to create epoll file descriptor\n",
> +				   test_name);
> +
> +	event.events = EPOLLIN;
> +	event.data.fd = pidfd;
> +
> +	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pidfd, &event)) {
> +		ksft_print_msg("%s test: Failed to add epoll file descriptor: Skipping\n",
> +			       test_name);
> +		_exit(PIDFD_SKIP);
> +	}
> +
> +	c = epoll_wait(epoll_fd, events, MAX_EVENTS, 5000);
> +	if (c != 1 || !(events[0].events & EPOLLIN))
> +		ksft_exit_fail_msg("%s test: Unexpected epoll_wait result (c=%d, events=%x)\n",
> +				   test_name, c, events[0].events);
> +
> +	close(epoll_fd);
> +	return events[0].events;
> +
> +}
> +
> +int test_pidfd_poll_exec(int use_waitpid)
> +{
> +	int pid, pidfd;
> +	int status, ret;
> +	pthread_t t1;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(),
> +				syscall(SYS_gettid));
> +		pthread_create(&t1, NULL, test_pidfd_poll_exec_thread, NULL);
> +		/*
> +		 * Exec in the non-leader thread will destroy the leader immediately.
> +		 * If the wait in the parent returns too soon, the test fails.
> +		 */
> +		while (1)
> +			;
> +	}
> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +
> +		if (ret == pid)
> +			ksft_print_msg("Parent: Child process waited for.\n");
> +	} else {
> +		pidfd = open_pidfd(test_name, pid);
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	time_t prog_time = time(NULL) - prog_start;
> +
> +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> +
> +	close(pidfd);
> +
> +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
> +void *test_pidfd_poll_leader_exit_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	sleep(CHILD_THREAD_MIN_WAIT);
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static time_t *child_exit_secs;
> +int test_pidfd_poll_leader_exit(int use_waitpid)
> +{
> +	int pid, pidfd;
> +	int status, ret;
> +	pthread_t t1, t2;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> +				"group leader exit";
> +
> +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +		pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +		pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +
> +		/*
> +		 * glibc exit calls exit_group syscall, so explicity call exit only
> +		 * so that only the group leader exits, leaving the threads alone.
> +		 */
> +		*child_exit_secs = time(NULL);
> +		syscall(SYS_exit, 0);
> +	}
> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +	} else {
> +		/*
> +		 * This sleep tests for the case where if the child exits, and is in
> +		 * EXIT_ZOMBIE, but the thread group leader is non-empty, then the poll
> +		 * doesn't prematurely return even though there are active threads
> +		 */
> +		sleep(1);
> +		pidfd = open_pidfd(test_name, pid);
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	if (ret == pid)
> +		ksft_print_msg("Parent: Child process waited for.\n");
> +
> +	time_t since_child_exit = time(NULL) - *child_exit_secs;
> +
> +	ksft_print_msg("Time since child exit: %lu\n", since_child_exit);
> +
> +	close(pidfd);
> +
> +	if (since_child_exit < CHILD_THREAD_MIN_WAIT ||
> +			since_child_exit > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	ksft_print_header();
>  
> +	test_pidfd_poll_exec(0);
> +	test_pidfd_poll_exec(1);
> +	test_pidfd_poll_leader_exit(0);
> +	test_pidfd_poll_leader_exit(1);
>  	test_pidfd_send_signal_syscall_support();
>  	test_pidfd_send_signal_simple_success();
>  	test_pidfd_send_signal_exited_fail();

  reply	other threads:[~2019-04-19 20:01 UTC|newest]

Thread overview: 198+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-11 17:50 [PATCH RFC 1/2] Add polling support to pidfd Joel Fernandes (Google)
2019-04-11 17:50 ` Joel Fernandes (Google)
2019-04-11 17:50 ` joel
2019-04-11 17:50 ` [PATCH RFC 2/2] Add selftests for pidfd polling Joel Fernandes (Google)
2019-04-11 17:50   ` Joel Fernandes (Google)
2019-04-11 17:50   ` joel
2019-04-12 14:51   ` Tycho Andersen
2019-04-12 14:51     ` Tycho Andersen
2019-04-12 14:51     ` tycho
2019-04-11 20:00 ` [PATCH RFC 1/2] Add polling support to pidfd Joel Fernandes
2019-04-11 20:00   ` Joel Fernandes
2019-04-11 20:00   ` joel
2019-04-11 20:02   ` Christian Brauner
2019-04-11 20:02     ` Christian Brauner
2019-04-11 20:02     ` christian
2019-04-11 20:20     ` Joel Fernandes
2019-04-11 20:20       ` Joel Fernandes
2019-04-11 20:20       ` joel
2019-04-12 21:32 ` Andy Lutomirski
2019-04-12 21:32   ` Andy Lutomirski
2019-04-12 21:32   ` luto
2019-04-13  0:09   ` Joel Fernandes
2019-04-13  0:09     ` Joel Fernandes
2019-04-13  0:09     ` joel
     [not found]     ` <CAKOZuetX4jMPDtDqAvGgSNo4BHf9BOnu79ufEiULfM5X5nDyyQ@mail.gmail.com>
2019-04-13  0:56       ` Daniel Colascione
2019-04-13  0:56         ` Daniel Colascione
2019-04-13  0:56         ` dancol
2019-04-14 18:19   ` Linus Torvalds
2019-04-14 18:19     ` Linus Torvalds
2019-04-14 18:19     ` torvalds
2019-04-16 12:04 ` Oleg Nesterov
2019-04-16 12:04   ` Oleg Nesterov
2019-04-16 12:04   ` oleg
2019-04-16 12:43   ` Oleg Nesterov
2019-04-16 12:43     ` Oleg Nesterov
2019-04-16 12:43     ` oleg
2019-04-16 19:20   ` Joel Fernandes
2019-04-16 19:20     ` Joel Fernandes
2019-04-16 19:20     ` joel
2019-04-16 19:32     ` Joel Fernandes
2019-04-16 19:32       ` Joel Fernandes
2019-04-16 19:32       ` joel
2019-04-17 13:09     ` Oleg Nesterov
2019-04-17 13:09       ` Oleg Nesterov
2019-04-17 13:09       ` oleg
2019-04-18 17:23       ` Jann Horn
2019-04-18 17:23         ` Jann Horn
2019-04-18 17:23         ` jannh
2019-04-18 17:26         ` Christian Brauner
2019-04-18 17:26           ` Christian Brauner
2019-04-18 17:26           ` christian
2019-04-18 17:53           ` Daniel Colascione
2019-04-18 17:53             ` Daniel Colascione
2019-04-18 17:53             ` dancol
2019-04-19 19:02           ` Joel Fernandes
2019-04-19 19:02             ` Joel Fernandes
2019-04-19 19:02             ` joel
2019-04-19 19:18             ` Christian Brauner
2019-04-19 19:18               ` Christian Brauner
2019-04-19 19:18               ` christian
2019-04-19 19:22               ` Christian Brauner
2019-04-19 19:22                 ` Christian Brauner
2019-04-19 19:22                 ` christian
2019-04-19 19:42                 ` Christian Brauner
2019-04-19 19:42                   ` Christian Brauner
2019-04-19 19:42                   ` christian
2019-04-19 19:49               ` Joel Fernandes
2019-04-19 19:49                 ` Joel Fernandes
2019-04-19 19:49                 ` joel
2019-04-19 20:01                 ` Christian Brauner [this message]
2019-04-19 20:01                   ` Christian Brauner
2019-04-19 20:01                   ` christian
2019-04-19 21:13                   ` Joel Fernandes
2019-04-19 21:13                     ` Joel Fernandes
2019-04-19 21:13                     ` joel
2019-04-19 20:34                 ` Daniel Colascione
2019-04-19 20:34                   ` Daniel Colascione
2019-04-19 20:34                   ` dancol
2019-04-19 20:57                   ` Christian Brauner
2019-04-19 20:57                     ` Christian Brauner
2019-04-19 20:57                     ` christian
2019-04-19 21:20                     ` Joel Fernandes
2019-04-19 21:20                       ` Joel Fernandes
2019-04-19 21:20                       ` joel
2019-04-19 21:24                       ` Daniel Colascione
2019-04-19 21:24                         ` Daniel Colascione
2019-04-19 21:24                         ` dancol
2019-04-19 21:45                         ` Joel Fernandes
2019-04-19 21:45                           ` Joel Fernandes
2019-04-19 21:45                           ` joel
2019-04-19 22:08                           ` Daniel Colascione
2019-04-19 22:08                             ` Daniel Colascione
2019-04-19 22:08                             ` dancol
2019-04-19 22:17                             ` Christian Brauner
2019-04-19 22:17                               ` Christian Brauner
2019-04-19 22:17                               ` christian
2019-04-19 22:37                               ` Daniel Colascione
2019-04-19 22:37                                 ` Daniel Colascione
2019-04-19 22:37                                 ` dancol
2019-04-24  8:04                         ` Enrico Weigelt, metux IT consult
2019-04-24  8:04                           ` Enrico Weigelt, metux IT consult
2019-04-24  8:04                           ` lkml
2019-04-19 21:59                       ` Christian Brauner
2019-04-19 21:59                         ` Christian Brauner
2019-04-19 21:59                         ` christian
2019-04-20 11:51                         ` Oleg Nesterov
2019-04-20 11:51                           ` Oleg Nesterov
2019-04-20 11:51                           ` oleg
2019-04-20 12:26                           ` Oleg Nesterov
2019-04-20 12:26                             ` Oleg Nesterov
2019-04-20 12:26                             ` oleg
2019-04-20 12:35                             ` Christian Brauner
2019-04-20 12:35                               ` Christian Brauner
2019-04-20 12:35                               ` christian
2019-04-19 23:11                       ` Linus Torvalds
2019-04-19 23:11                         ` Linus Torvalds
2019-04-19 23:11                         ` torvalds
2019-04-19 23:20                         ` Christian Brauner
2019-04-19 23:20                           ` Christian Brauner
2019-04-19 23:20                           ` christian
2019-04-19 23:32                           ` Linus Torvalds
2019-04-19 23:32                             ` Linus Torvalds
2019-04-19 23:32                             ` torvalds
2019-04-19 23:36                             ` Daniel Colascione
2019-04-19 23:36                               ` Daniel Colascione
2019-04-19 23:36                               ` dancol
2019-04-20  0:46                         ` Joel Fernandes
2019-04-20  0:46                           ` Joel Fernandes
2019-04-20  0:46                           ` joel
2019-04-19 21:21                     ` Daniel Colascione
2019-04-19 21:21                       ` Daniel Colascione
2019-04-19 21:21                       ` dancol
2019-04-19 21:48                       ` Christian Brauner
2019-04-19 21:48                         ` Christian Brauner
2019-04-19 21:48                         ` christian
2019-04-19 22:02                         ` Christian Brauner
2019-04-19 22:02                           ` Christian Brauner
2019-04-19 22:02                           ` christian
2019-04-19 22:46                           ` Daniel Colascione
2019-04-19 22:46                             ` Daniel Colascione
2019-04-19 22:46                             ` dancol
2019-04-19 23:12                             ` Christian Brauner
2019-04-19 23:12                               ` Christian Brauner
2019-04-19 23:12                               ` christian
2019-04-19 23:46                               ` Daniel Colascione
2019-04-19 23:46                                 ` Daniel Colascione
2019-04-19 23:46                                 ` dancol
2019-04-20  0:17                                 ` Christian Brauner
2019-04-20  0:17                                   ` Christian Brauner
2019-04-20  0:17                                   ` christian
2019-04-24  9:05                                   ` Enrico Weigelt, metux IT consult
2019-04-24  9:05                                     ` Enrico Weigelt, metux IT consult
2019-04-24  9:05                                     ` lkml
2019-04-24  9:03                                 ` Enrico Weigelt, metux IT consult
2019-04-24  9:03                                   ` Enrico Weigelt, metux IT consult
2019-04-24  9:03                                   ` lkml
2019-04-19 22:35                         ` Daniel Colascione
2019-04-19 22:35                           ` Daniel Colascione
2019-04-19 22:35                           ` dancol
2019-04-19 23:02                           ` Christian Brauner
2019-04-19 23:02                             ` Christian Brauner
2019-04-19 23:02                             ` christian
2019-04-19 23:29                             ` Daniel Colascione
2019-04-19 23:29                               ` Daniel Colascione
2019-04-19 23:29                               ` dancol
2019-04-20  0:02                               ` Christian Brauner
2019-04-20  0:02                                 ` Christian Brauner
2019-04-20  0:02                                 ` christian
2019-04-24  9:17                               ` Enrico Weigelt, metux IT consult
2019-04-24  9:17                                 ` Enrico Weigelt, metux IT consult
2019-04-24  9:17                                 ` lkml
2019-04-24  9:11                             ` Enrico Weigelt, metux IT consult
2019-04-24  9:11                               ` Enrico Weigelt, metux IT consult
2019-04-24  9:11                               ` lkml
2019-04-24  8:56                         ` Enrico Weigelt, metux IT consult
2019-04-24  8:56                           ` Enrico Weigelt, metux IT consult
2019-04-24  8:56                           ` lkml
2019-04-24  8:20                       ` Enrico Weigelt, metux IT consult
2019-04-24  8:20                         ` Enrico Weigelt, metux IT consult
2019-04-24  8:20                         ` lkml
2019-04-19 15:43         ` Oleg Nesterov
2019-04-19 15:43           ` Oleg Nesterov
2019-04-19 15:43           ` oleg
2019-04-19 18:12       ` Joel Fernandes
2019-04-19 18:12         ` Joel Fernandes
2019-04-19 18:12         ` joel
2019-04-18 18:44     ` Jonathan Kowalski
2019-04-18 18:44       ` Jonathan Kowalski
2019-04-18 18:44       ` bl0pbl33p
2019-04-18 18:57       ` Daniel Colascione
2019-04-18 18:57         ` Daniel Colascione
2019-04-18 18:57         ` dancol
2019-04-18 19:14         ` Linus Torvalds
2019-04-18 19:14           ` Linus Torvalds
2019-04-18 19:14           ` torvalds
2019-04-19 19:05           ` Joel Fernandes
2019-04-19 19:05             ` Joel Fernandes
2019-04-19 19:05             ` joel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190419200059.r2f7i7jtlsza4eun@brauner.io \
    --to=christian@brauner.io \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=avagin@gmail.com \
    --cc=dancol@google.com \
    --cc=ebiederm@xmission.com \
    --cc=fweimer@redhat.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=kernel-team@android.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mhocko@suse.com \
    --cc=namit@vmware.com \
    --cc=oleg@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=serge@hallyn.com \
    --cc=sfr@canb.auug.org.au \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.