From: Roland McGrath <roland@redhat.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>, LKML <linux-kernel@vger.kernel.org>, linux-mm <linux-mm@kvack.org>, David Rientjes <rientjes@google.com>, Andrew Morton <akpm@linux-foundation.org>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>, Nick Piggin <npiggin@suse.de>
Subject: Re: uninterruptible CLONE_VFORK (Was: oom: Make coredump interruptible)
Date: Mon, 14 Jun 2010 12:17:10 -0700 (PDT)
Message-ID: <20100614191710.18C0E403B2@magilla.sf.frob.com>
In-Reply-To: Oleg Nesterov's message of Monday, 14 June 2010 18:33:05 +0200 <20100614163304.GA21313@redhat.com>

> But note that oom_kill_process() doesn't kill the children with the
> same ->mm. I never understood this code.

Yes, odd. This is the first time I've really looked at oom_kill.

> Anyway I agree. Even if I am right, this is not very serious problem
> from oom-kill pov. To me, the uninterruptible CLONE_VFORK is bad by
> itself.

Agreed.

> Yes sure. That is why wait_for_completion_killable(), not _interruptible.

Right, your code was fine. I was just being pedantic for the record
since you said "interruptible" in the text.

> But I assume you didn't mean that only SIGKILL should interrupt the
> parent, any sig_fatal() signal should.

Yes.

> Agreed. This needs auditing. And CLONE_VFORK can be used with/without all
> other CLONE_ flags... Probably we should mostly worry about vfork ==
> CLONE_VM | CLONE_VFORK case.

Yes. I hope it is fine to make clone refuse CLONE_VFORK set without
CLONE_VM in the future as a sanity check. I don't think any use of
CLONE_VFORK other than the actual vfork use is something we ever
intended to support.

> Anyway. ->vfork_done is per-thread. This means that without any changes
> do_fork(CLONE_VFORK) can return (to user-mode) before the child's thread
> group exits/execs. Perhaps this means we shouldn't worry too much.

You mean some other thread in the parent's group can run in user mode.
Yes. The real reason for the vfork wait is just that the parent/child
will share the user stack memory, so in practice it's fine if other
threads with other stacks are touching other memory (i.e. it's just the
user's problem).

> Hmm. Even without debugger, the parent doesn't react to SIGSTOP.

Yes. It's been a long time since I thought about the vfork stuff much.
But I now recall thinking about the SIGSTOP/SIGTSTP issue too. It does
seem bad. OTOH, it has lurked there for many years now without
complaints.

Note that supporting stop/fatal signals in the normal way means that the
call has to return and pass the syscall-exit tracing point first. This
means a change in the order of events seen by a debugger. It also
complicates the subject of PTRACE_EVENT_VFORK_DONE reports, which today
happen before syscall-exit or signal stuff is possible.

For proper stopping in the normal way, the vfork-wait would be restarted
via sys_restart_syscall or something. But the way that happens
ordinarily is to get all the way back to user mode and reenter with a
normal syscall. That doesn't touch the user stack itself, but it sure
makes one nervous. It's hard to see how we could ever do that and then
prevent normal signals from being handled before the restart. (Instead,
we'd have the actual blocking done inside get_signal_to_deliver so we
just never get to user mode until the vfork hold is released, and not
actually need to restart.)

So there are multiple cans of worms cascading from a change, even though
the actual work to do the block in a new way might not be very complex.
It all seems kind of doable, at least if we accept a change in the
userland debugger experience of which ptrace reports a vfork parent
might make in what order. But plenty of hair to worry about.

Thanks,
Roland
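To make two of the points above concrete (the vfork wait exists because parent and child share memory, and CLONE_VFORK arguably only makes sense together with CLONE_VM), here is a minimal userspace sketch in C. It is not kernel code and is not taken from this thread: vfork_flags_check() and vfork_wait_demo() are hypothetical names, the check merely mirrors the sanity check proposed above, and the demo assumes Linux with glibc's clone() wrapper. The child is given its own stack so that only the shared address space, not a shared stack, is exercised.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* needed for clone() and the CLONE_* flags */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sanity check (not an existing kernel function):
 * refuse CLONE_VFORK unless CLONE_VM is set as well.
 */
static int vfork_flags_check(unsigned long flags)
{
    if ((flags & CLONE_VFORK) && !(flags & CLONE_VM))
        return -EINVAL;
    return 0;
}

/* Written by the child, read by the parent; shared because of CLONE_VM. */
static volatile int child_progress;

static int child_fn(void *arg)
{
    (void)arg;
    usleep(100 * 1000);       /* the parent would get ahead here if it were not blocked */
    child_progress = 1;
    return 0;                 /* returning from fn terminates the child */
}

/*
 * Returns the value of child_progress as seen by the parent immediately
 * after clone() returns, or -1 on error. With CLONE_VFORK the parent is
 * held until the child exits (or execs), so the expected value is 1.
 */
static int vfork_wait_demo(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    child_progress = 0;

    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int seen = child_progress;    /* sampled before reaping the child */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        seen = -1;
    free(stack);
    return seen;
}
```

Calling vfork_wait_demo() from a main() should yield 1 on a kernel that implements the vfork hold as described. Note that only the calling thread is held; other threads in the parent's group keep running, which is the per-thread ->vfork_done behaviour discussed above.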