All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roland McGrath <roland@redhat.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>
Subject: Re: uninterruptible CLONE_VFORK (Was: oom: Make coredump interruptible)
Date: Mon, 14 Jun 2010 12:17:10 -0700 (PDT)	[thread overview]
Message-ID: <20100614191710.18C0E403B2@magilla.sf.frob.com> (raw)
In-Reply-To: Oleg Nesterov's message of  Monday, 14 June 2010 18:33:05 +0200 <20100614163304.GA21313@redhat.com>

> But note that oom_kill_process() doesn't kill the children with the
> same ->mm. I never understood this code.

Yes, odd.  This is the first time I've really looked at oom_kill.

> Anyway I agree. Even if I am right, this is not very serious problem
> from oom-kill pov. To me, the uninterruptible CLONE_VFORK is bad by
> itself.

Agreed.

> Yes sure. That is why wait_for_completion_killable(), not _interrutpible.

Right, your code was fine.  I was just being pedantic for the record since
you said "interruptible" in the text.

> But I assume you didn't mean that only SIGKILL should interrupt the
> parent, any sig_fatal() signal should.

Yes.

> Agreed. This needs auditing. And CLONE_VFORK can be used with/without all
> other CLONE_ flags... Probably we should mostly worry about vfork ==
> CLONE_VM | CLONE_VFORK case.

Yes.  I hope it is fine to make clone refuse CLONE_VFORK set without
CLONE_VM in the future as a sanity check.  I don't think any use of
CLONE_VFORK other than the actual vfork use is something we ever intended
to support.

> Anyway. ->vfork_done is per-thread. This means that without any changes
> do_fork(CLONE_VFORK) can return (to user-mode) before the child's thread
> group exits/execs. Perhaps this means we shouldn't worry too much.

You mean some other thread in the parent's group can run in user mode.
Yes.  The real reason for the vfork wait is just that the parent/child will
share the user stack memory, so in practice it's fine if other threads with
other stacks are touching other memory (i.e. it's just the user's problem).

> Hmm. Even without debugger, the parent doesn't react to SIGSTOP. 

Yes.  It's been a long time since I thought about the vfork stuff much.
But I now recall thinking about the SIGSTOP/SIGTSTP issue too.  It does
seem bad.  OTOH, it has lurked there for many years now without complaints.

Note that supporting stop/fatal signals in the normal way means that the
call has to return and pass the syscall-exit tracing point first.  This
means a change in the order of events seen by a debugger.  It also
complicates the subject of PTRACE_EVENT_VFORK_DONE reports, which today
happen before syscall-exit or signal stuff is possible.  For proper
stopping in the normal way, the vfork-wait would be restarted via
sys_restart_syscall or something.  But the way that happens ordinarily is
to get all the way back to user mode and reenter with a normal syscall.
That doesn't touch the user stack itself, but it sure makes one nervous.
It's hard to see how we could ever do that and then prevent normal signals
from being handled before the restart.  (Instead, we'd have the actual
blocking done inside get_signal_to_deliver so we just never get to user
mode until the vfork hold is released, and not actually need to restart.)
So there are multiple cans of worms cascading from a change, even though
the actual work to do the block in a new way might not be very complex.

It all seems kind of doable, at least if we accept a change in the userland
debugger experience of which ptrace reports a vfork parent might make in
what order.  But plenty of hair to worry about.


Thanks,
Roland

WARNING: multiple messages have this Message-ID (diff)
From: Roland McGrath <roland@redhat.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>
Subject: Re: uninterruptible CLONE_VFORK (Was: oom: Make coredump interruptible)
Date: Mon, 14 Jun 2010 12:17:10 -0700 (PDT)	[thread overview]
Message-ID: <20100614191710.18C0E403B2@magilla.sf.frob.com> (raw)
In-Reply-To: Oleg Nesterov's message of  Monday, 14 June 2010 18:33:05 +0200 <20100614163304.GA21313@redhat.com>

> But note that oom_kill_process() doesn't kill the children with the
> same ->mm. I never understood this code.

Yes, odd.  This is the first time I've really looked at oom_kill.

> Anyway I agree. Even if I am right, this is not very serious problem
> from oom-kill pov. To me, the uninterruptible CLONE_VFORK is bad by
> itself.

Agreed.

> Yes sure. That is why wait_for_completion_killable(), not _interrutpible.

Right, your code was fine.  I was just being pedantic for the record since
you said "interruptible" in the text.

> But I assume you didn't mean that only SIGKILL should interrupt the
> parent, any sig_fatal() signal should.

Yes.

> Agreed. This needs auditing. And CLONE_VFORK can be used with/without all
> other CLONE_ flags... Probably we should mostly worry about vfork ==
> CLONE_VM | CLONE_VFORK case.

Yes.  I hope it is fine to make clone refuse CLONE_VFORK set without
CLONE_VM in the future as a sanity check.  I don't think any use of
CLONE_VFORK other than the actual vfork use is something we ever intended
to support.

> Anyway. ->vfork_done is per-thread. This means that without any changes
> do_fork(CLONE_VFORK) can return (to user-mode) before the child's thread
> group exits/execs. Perhaps this means we shouldn't worry too much.

You mean some other thread in the parent's group can run in user mode.
Yes.  The real reason for the vfork wait is just that the parent/child will
share the user stack memory, so in practice it's fine if other threads with
other stacks are touching other memory (i.e. it's just the user's problem).

> Hmm. Even without debugger, the parent doesn't react to SIGSTOP. 

Yes.  It's been a long time since I thought about the vfork stuff much.
But I now recall thinking about the SIGSTOP/SIGTSTP issue too.  It does
seem bad.  OTOH, it has lurked there for many years now without complaints.

Note that supporting stop/fatal signals in the normal way means that the
call has to return and pass the syscall-exit tracing point first.  This
means a change in the order of events seen by a debugger.  It also
complicates the subject of PTRACE_EVENT_VFORK_DONE reports, which today
happen before syscall-exit or signal stuff is possible.  For proper
stopping in the normal way, the vfork-wait would be restarted via
sys_restart_syscall or something.  But the way that happens ordinarily is
to get all the way back to user mode and reenter with a normal syscall.
That doesn't touch the user stack itself, but it sure makes one nervous.
It's hard to see how we could ever do that and then prevent normal signals
from being handled before the restart.  (Instead, we'd have the actual
blocking done inside get_signal_to_deliver so we just never get to user
mode until the vfork hold is released, and not actually need to restart.)
So there are multiple cans of worms cascading from a change, even though
the actual work to do the block in a new way might not be very complex.

It all seems kind of doable, at least if we accept a change in the userland
debugger experience of which ptrace reports a vfork parent might make in
what order.  But plenty of hair to worry about.


Thanks,
Roland

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-06-14 19:17 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-31  9:33 [PATCH 1/5] oom: select_bad_process: check PF_KTHREAD instead of !mm to skip kthreads KOSAKI Motohiro
2010-05-31  9:33 ` KOSAKI Motohiro
2010-05-31  9:35 ` [PATCH 2/5] oom: select_bad_process: PF_EXITING check should take ->mm into account KOSAKI Motohiro
2010-05-31  9:35   ` KOSAKI Motohiro
2010-05-31 16:43   ` Oleg Nesterov
2010-05-31 16:43     ` Oleg Nesterov
2010-06-01  1:10     ` KOSAKI Motohiro
2010-06-01  1:10       ` KOSAKI Motohiro
2010-06-01 20:18       ` Oleg Nesterov
2010-06-01 20:18         ` Oleg Nesterov
2010-06-02 13:54         ` [PATCH] oom: remove PF_EXITING check completely KOSAKI Motohiro
2010-06-02 13:54           ` KOSAKI Motohiro
2010-06-02 15:54           ` Oleg Nesterov
2010-06-02 15:54             ` Oleg Nesterov
2010-06-02 21:02             ` David Rientjes
2010-06-02 21:02               ` David Rientjes
2010-06-03  4:48               ` KOSAKI Motohiro
2010-06-03  4:48                 ` KOSAKI Motohiro
2010-06-03  6:29                 ` David Rientjes
2010-06-03  6:29                   ` David Rientjes
2010-06-02 13:54         ` [PATCH] oom: Make coredump interruptible KOSAKI Motohiro
2010-06-02 13:54           ` KOSAKI Motohiro
2010-06-02 15:42           ` Oleg Nesterov
2010-06-02 15:42             ` Oleg Nesterov
2010-06-02 17:29             ` Roland McGrath
2010-06-02 17:29               ` Roland McGrath
2010-06-02 17:53               ` Oleg Nesterov
2010-06-02 17:53                 ` Oleg Nesterov
2010-06-02 18:58                 ` Roland McGrath
2010-06-02 18:58                   ` Roland McGrath
2010-06-02 20:38                   ` Oleg Nesterov
2010-06-02 20:38                     ` Oleg Nesterov
2010-06-03 14:03                     ` Oleg Nesterov
2010-06-03 14:03                       ` Oleg Nesterov
2010-06-04 10:54                     ` KOSAKI Motohiro
2010-06-04 10:54                       ` KOSAKI Motohiro
2010-06-04 11:27                       ` Oleg Nesterov
2010-06-04 11:27                         ` Oleg Nesterov
2010-06-04 11:34                         ` Oleg Nesterov
2010-06-04 11:34                           ` Oleg Nesterov
2010-06-09 19:53                         ` Oleg Nesterov
2010-06-09 19:53                           ` Oleg Nesterov
2010-06-09 20:41                           ` David Rientjes
2010-06-09 20:41                             ` David Rientjes
2010-06-09 21:03                             ` Oleg Nesterov
2010-06-09 21:03                               ` Oleg Nesterov
2010-06-13 11:24                           ` KOSAKI Motohiro
2010-06-13 11:24                             ` KOSAKI Motohiro
2010-06-13 15:53                             ` Oleg Nesterov
2010-06-13 15:53                               ` Oleg Nesterov
2010-06-13 17:13                               ` uninterruptible CLONE_VFORK (Was: oom: Make coredump interruptible) Oleg Nesterov
2010-06-13 17:13                                 ` Oleg Nesterov
2010-06-14  0:56                                 ` Roland McGrath
2010-06-14  0:56                                   ` Roland McGrath
2010-06-14 16:33                                   ` Oleg Nesterov
2010-06-14 16:33                                     ` Oleg Nesterov
2010-06-14 19:17                                     ` Roland McGrath [this message]
2010-06-14 19:17                                       ` Roland McGrath
2010-06-28 17:33                                       ` Oleg Nesterov
2010-06-28 17:33                                         ` Oleg Nesterov
2010-06-28 18:04                                         ` Roland McGrath
2010-06-28 18:04                                           ` Roland McGrath
2010-06-14  0:36                               ` [PATCH] oom: Make coredump interruptible Roland McGrath
2010-06-14  0:36                                 ` Roland McGrath
2010-06-14  0:26                     ` Roland McGrath
2010-06-14  0:26                       ` Roland McGrath
2010-06-01 20:39   ` [PATCH 2/5] oom: select_bad_process: PF_EXITING check should take ->mm into account David Rientjes
2010-06-01 20:39     ` David Rientjes
2010-05-31  9:36 ` [PATCH 3/5] oom: introduce find_lock_task_mm() to fix !mm false positives KOSAKI Motohiro
2010-05-31  9:36   ` KOSAKI Motohiro
2010-06-01  0:57   ` KAMEZAWA Hiroyuki
2010-06-01  0:57     ` KAMEZAWA Hiroyuki
2010-06-01 20:42   ` David Rientjes
2010-06-01 20:42     ` David Rientjes
2010-06-02 16:05   ` Minchan Kim
2010-06-02 16:05     ` Minchan Kim
2010-05-31  9:37 ` [PATCH 4/5] oom: the points calculation of child processes must use find_lock_task_mm() too KOSAKI Motohiro
2010-05-31  9:37   ` KOSAKI Motohiro
2010-05-31 16:56   ` Oleg Nesterov
2010-05-31 16:56     ` Oleg Nesterov
2010-05-31 23:48     ` KOSAKI Motohiro
2010-05-31 23:48       ` KOSAKI Motohiro
2010-05-31  9:38 ` [PATCH 5/5] oom: __oom_kill_task() " KOSAKI Motohiro
2010-05-31  9:38   ` KOSAKI Motohiro
2010-06-01  1:02   ` KAMEZAWA Hiroyuki
2010-06-01  1:02     ` KAMEZAWA Hiroyuki
2010-06-01 20:44   ` David Rientjes
2010-06-01 20:44     ` David Rientjes
2010-06-01  0:54 ` [PATCH 1/5] oom: select_bad_process: check PF_KTHREAD instead of !mm to skip kthreads KAMEZAWA Hiroyuki
2010-06-01  0:54   ` KAMEZAWA Hiroyuki
2010-06-01 20:36 ` David Rientjes
2010-06-01 20:36   ` David Rientjes
2010-06-01 21:20   ` Oleg Nesterov
2010-06-01 21:20     ` Oleg Nesterov
2010-06-01 21:26     ` David Rientjes
2010-06-01 21:26       ` David Rientjes
2010-06-02 13:54       ` KOSAKI Motohiro
2010-06-02 13:54         ` KOSAKI Motohiro
2010-06-02 21:09         ` David Rientjes
2010-06-02 21:09           ` David Rientjes
2010-06-02 21:33           ` Oleg Nesterov
2010-06-02 21:33             ` Oleg Nesterov
2010-06-02 21:46             ` David Rientjes
2010-06-02 21:46               ` David Rientjes
2010-06-03 14:27               ` Oleg Nesterov
2010-06-03 14:27                 ` Oleg Nesterov
2010-06-03 20:11                 ` David Rientjes
2010-06-03 20:11                   ` David Rientjes
2010-06-02 15:32 ` Minchan Kim
2010-06-02 15:32   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100614191710.18C0E403B2@magilla.sf.frob.com \
    --to=roland@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=oleg@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.