From: Josh Triplett <josh@joshtriplett.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Thiago Macieira <thiago.macieira@intel.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Ingo Molnar <mingo@redhat.com>, Kees Cook <keescook@chromium.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Rik van Riel <riel@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 6/6] clone4: Introduce new CLONE_FD flag to get task exit notification via fd
Date: Sat, 14 Mar 2015 13:14:03 -0700
Message-ID: <20150314201402.GH22130@thin>
In-Reply-To: <20150314194721.GA9654@redhat.com>

On Sat, Mar 14, 2015 at 08:47:21PM +0100, Oleg Nesterov wrote:
> On 03/14, Oleg Nesterov wrote:
> >
> > On 03/14, Josh Triplett wrote:
> > >
> > > On Sat, Mar 14, 2015 at 11:38:29AM -0700, Thiago Macieira wrote:
> > > > On Saturday 14 March 2015 15:32:35 Oleg Nesterov wrote:
> > > > > It is not clear to me what do_wait() should do with ->autoreap child, even
> > > > > ignoring ptrace.
> > > > >
> > > > > Just suppose that real_parent has a single "autoreap" child. Should
> > > > > > wait(NULL) hang then?
> > > >
> > > > > It should ignore the child that is set to autoreap. wait(NULL) should return
> > > > > -ECHILD, indicating there are no children waiting to be reaped.
> > >
> > > Right.  And I don't think the current code does this.  I think we need
> > > to change wait_consider_task to early-return for ->autoreap just as it
> > > does for task_state == EXIT_DEAD.
> >
> > No. This EXIT_DEAD is absolutely different. And this is another indication
> > that you might use it wrongly ;)
> >
> > What we actually want is BUG_ON(task_state == EXIT_DEAD) here. We do not
> > want the EXIT_DEAD tasks in ->children/ptraced lists. These EXIT_DEAD tasks
> > complicate the exit/wait/reparent paths.
> >
> > However, currently this is TODO. The main problem is the locking in
> > wait_task_zombie(), we can set EXIT_DEAD and remove the task from list
> > under read_lock().
> 
> Let me clarify in case I confused you.
> 
> The EXIT_DEAD check in do_wait() paths doesn't mean "autoreap". It means
> that this thread/process (depending on ptrace) was already reaped. It was
> reaped by our sub-thread, or it was reaped because we ignore SIGCHLD, or
> other reasons. This doesn't matter.
> 
> In short, EXIT_DEAD means: we have to keep this thread on lists until the
> task which set this state calls release_task().

That much I already understood from reading through the code, since
exit_notify doesn't set task_state to EXIT_DEAD until the task is
actually completely dead.  When wait_consider_task sees p->task_state ==
EXIT_DEAD, that task isn't eligible for waiting at all.

What I was proposing was that a task that isn't yet dead, but that is
going to be autoreaped, is not eligible for waiting either.  The whole
wait* family of system calls should pretend it doesn't exist at
all, because returning an autoreaped task from a wait* call introduces a
race condition if the parent tries to *do* anything with the returned
PID.  If you launch a process with CLONE_FD, you need to manage it
exclusively with that fd, not with the wait* family of system calls.
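
To make the proposed rule concrete, here is a minimal userspace model of
it.  This is only a sketch, not kernel code: struct task, the autoreap
field, wait_eligible() and model_wait() are all hypothetical names made
up for illustration, and the real wait_consider_task() has much more to
deal with (ptrace, locking, thread groups).  It only shows that a parent
whose sole child is autoreaped would get -ECHILD.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names echo the thread, not real kernel code. */
enum exit_state { TASK_ALIVE, TASK_ZOMBIE, TASK_DEAD };

struct task {
    int pid;
    enum exit_state state;
    bool autoreap;              /* hypothetical: child created with CLONE_FD */
};

/* Returns true if a wait* call may consider this child at all. */
static bool wait_eligible(const struct task *p)
{
    if (p->state == TASK_DEAD)  /* already reaped, only lingering on the list */
        return false;
    if (p->autoreap)            /* proposed: invisible to the wait* family */
        return false;
    return true;
}

/*
 * Non-blocking model of wait(): returns the pid of a reapable zombie,
 * 0 if eligible children exist but none has exited yet (a real wait()
 * would block here), or -ECHILD if there is no eligible child.
 */
static int model_wait(const struct task *children, size_t n)
{
    bool have_eligible = false;

    assert(children != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!wait_eligible(&children[i]))
            continue;
        have_eligible = true;
        if (children[i].state == TASK_ZOMBIE)
            return children[i].pid;
    }
    return have_eligible ? 0 : -ECHILD;
}
```

In this model an autoreaped zombie is skipped even when it sits next to
an ordinary zombie, so the parent never sees a PID that it could race
on.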

That also implies that the child-stop and child-continued mechanisms
(do_notify_parent_cldstop, WSTOPPED, WCONTINUED) should ignore the task
too.  In the future there could be a flag to clone4 that lets you get
stop and continue notifications through the file descriptor.
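
As a rough illustration of that routing, again as a userspace sketch
under stated assumptions rather than anything from the patch: struct
child, fd_wants_stop (standing in for the possible future clone4 flag)
and route_event() are hypothetical names.  The sketch only encodes the
rule that an autoreaped child never reports stop/continue events to the
parent's wait* machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum child_event { CHILD_STOPPED, CHILD_CONTINUED, CHILD_EXITED };

/* Where a state change of a child gets reported. */
enum route { ROUTE_NONE, ROUTE_PARENT_WAIT, ROUTE_FD };

struct child {
    bool autoreap;       /* hypothetical: child created with CLONE_FD */
    bool fd_wants_stop;  /* hypothetical future flag: stop/continue via fd */
};

static enum route route_event(const struct child *c, enum child_event ev)
{
    assert(c != NULL);

    if (!c->autoreap)
        return ROUTE_PARENT_WAIT;   /* ordinary child: SIGCHLD and wait* as usual */
    if (ev == CHILD_EXITED)
        return ROUTE_FD;            /* exit is always reported through the fd */
    /* Stop/continue: ignored unless the hypothetical future flag asks for it. */
    return c->fd_wants_stop ? ROUTE_FD : ROUTE_NONE;
}
```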

- Josh Triplett
