From: Roland McGrath <roland@redhat.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>
Subject: Re: uninterruptible CLONE_VFORK (Was: oom: Make coredump interruptible)
Date: Sun, 13 Jun 2010 17:56:07 -0700 (PDT)
Message-ID: <20100614005608.0D006408C1@magilla.sf.frob.com>
In-Reply-To: Oleg Nesterov's message of  Sunday, 13 June 2010 19:13:37 +0200 <20100613171337.GA12159@redhat.com>

> Oh. And another problem, vfork() is not interruptible either. This means
> that the user can hide the memory hog from oom-killer.

I'm not sure there is really any danger like that, because of the
oom_kill_process "Try to kill a child first" logic.  Eventually the vfork
child will be chosen and killed, and when it finally exits that will
release the vfork wait.  So if the vfork parent is really the culprit,
it will then be subject to oom_kill_process sooner or later.
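
As a userspace illustration of the vfork wait being released when the child
exits, here is a minimal sketch.  The helper name vfork_child_exit_status is
hypothetical, and it only exercises the ordinary exit path, not an oom kill;
the point is just that the parent cannot run again until the child has exited
(or exec'd).

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel or from this thread):
 * vfork a child that exits immediately with `code`, then report the
 * exit status the parent observes.  Returns -1 on any error.
 */
int vfork_child_exit_status(int code)
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: shares the parent's mm, so do nothing but _exit(). */
		_exit(code);
	}

	/*
	 * Parent: execution only reaches this point after the child has
	 * exited, because the kernel blocks the vfork parent until then.
	 */
	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling vfork_child_exit_status(7) should return 7 once the child is gone;
if the child never exited or exec'd, the parent would stay blocked in vfork().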

> But let's forget about oom.

Sure, but it reminds me to mention that vfork mm sharing is another reason
that having oom_kill set some persistent state in the mm seems wrong.  If a
vfork child is chosen for oom_kill and killed, then it's possible that will
relieve the memory pressure (e.g. much memory was held indirectly via its fd
table or whatever else is not shared with the parent via mm).  So once the child
is dead, there should not be any lingering bits in the parent's mm.

> Roland, any reason it should be uninterruptible? This doesn't look good
> in any case. Perhaps the pseudo-patch below makes sense?

I've long thought that we should make a vfork parent SIGKILL-able.  (Of
course the vfork wait can't be made interruptible by other signals, since
it must never do anything userish like signal handler setup until the child
has died or exec'd.)  I don't know offhand of any problem with your
straightforward change.  But I don't have much confidence that there isn't
any strange gotcha waiting there due to some other kind of implicit
assumption about vfork parent blocks that we are overlooking at the moment.
So I wouldn't change this without more thorough auditing and thinking about
everything related to vfork.

Personally, what I've really been interested in is changing the vfork wait
to use some different kind of blocking entirely.  My real motivation for
that is to let a vfork wait be morphed into and out of TASK_TRACED, so a
debugger can examine its registers and so forth.  That would entail letting
the vfork/clone syscall return fully back to the asm level so it could stop
in a proper state some place like the syscall-exit or notify-resume points.
However, that has other wrinkles on machines like sparc and ia64, where
user_regset access can involve user memory access.  Since we can't allow
those while the user memory is still shared with the child, it might not
really be practical at all.


Thanks,
Roland
