From: Oleg Nesterov <oleg@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	anfei <anfei.zhou@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	nishimura@mxp.nes.nec.co.jp,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Mel Gorman <mel@csn.ul.ie>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] oom: give current access to memory reserves if it has been killed
Date: Thu, 1 Apr 2010 17:26:38 +0200
Message-ID: <20100401152638.GC14603@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1004010044320.6285@chino.kir.corp.google.com>

On 04/01, David Rientjes wrote:
>
> On Thu, 1 Apr 2010, Oleg Nesterov wrote:
>
> > Why? You ignored this part:
> >
> > 	Say, right after exit_mm() we are doing acct_process(), and f_op->write()
> > 	needs a page. So, you are saying that in this case __page_cache_alloc()
> > 	can never trigger out_of_memory() ?
> >
> > why this is not possible?
> >
>
> It can, but the check for p->mm is sufficient since exit_notify()

Yes, but I meant out_of_memory()->__oom_kill_task(current). OK, we
already discussed this in the previous emails.

> We cannot rely on oom_badness() to filter this task because we still
> select it as our chosen task even with a badness score of 0 if !chosen

Yes, see another email from me.

> Your point about p->mm being non-NULL for kthreads using use_mm() is
> taken, we should probably just change the is_global_init() check in
> select_bad_process() to p->flags & PF_KTHREAD and ensure we reject
> oom_kill_process() for them.

Yes, but we have to check both is_global_init() and PF_KTHREAD.

The "patch" I sent checks PF_KTHREAD in find_lock_task_mm(), but as I
said select_bad_process() is the better place.

> > OK, a bad user does
> >
> > 	int sleep_forever(void *)
> > 	{
> > 		pause();
> > 	}
> >
> > 	int main(void)
> > 	{
> > 		pthread_create(sleep_forever);
> > 		syscall(__NR_exit);
> > 	}
> >
> > Now, every time select_bad_process() is called it will find this process
> > and PF_EXITING is true, so it just returns ERR_PTR(-1UL). And note that
> > this process is not going to exit.
> >
>
> Hmm, so it looks like we need to filter on !p->mm before checking for
> PF_EXITING so that tasks that are EXIT_ZOMBIE won't make the oom killer
> into a no-op.

As it was already discussed, it is not easy to check !p->mm. Once
again, we must not filter out the task just because its ->mm == NULL.

Probably the best change for now is

	-	if (p->flags & PF_EXITING) {
	+	if (p->flags & PF_EXITING && p->mm) {

This is not perfect too, but much better.

> > > > Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children).
> > > > Again, this is not right even if we forget about !child->mm check.
> > > > This list_for_each_entry() can only see the processes forked by the
> > > > main thread.
> > > >
> > >
> > > That's the intention.
> >
> > Why? shouldn't oom_badness() return the same result for any thread
> > in thread group? We should take all childs into account.
> >
>
> oom_forkbomb_penalty() only cares about first-descendant children that
> do not share the same memory,

I see, but the code doesn't really do this. I mean, it doesn't really
see the first-descendant children, only those which were forked by the
main thread.

Look. We have a main thread M and the sub-thread T. T forks a lot of
processes which use a lot of memory. These processes _are_ the first
descendant children of the M+T thread group, they should be accounted.
But M->children list is empty.

oom_forkbomb_penalty() and oom_kill_process() should do

	t = tsk;
	do {
		list_for_each_entry(child, &t->children, sibling) {
			... take child into account ...
		}
	} while_each_thread(tsk, t);

> > > > Hmm. Why oom_forkbomb_penalty() does thread_group_cputime() under
> > > > task_lock() ? It seems, ->alloc_lock() is only needed for get_mm_rss().
> > > >
> [...snip...]
> We need task_lock() to ensure child->mm hasn't detached between the check
> for child->mm == tsk->mm and get_mm_rss(child->mm). So I'm not sure what
> you're trying to improve with this variation, it's a tradeoff between
> calling thread_group_cputime() under task_lock() for a subset of a task's
> threads when we already need to hold task_lock() anyway vs. calling it for
> all threads unconditionally.

See the patch below. Yes, this is minor, but it is always good to avoid
the unnecessary locks, and thread_group_cputime() is O(N).

Not only for performance reasons. This allows to change the locking in
thread_group_cputime() if needed without fear to deadlock with task_lock().

Oleg.

--- x/mm/oom_kill.c
+++ x/mm/oom_kill.c
@@ -97,13 +97,16 @@ static unsigned long oom_forkbomb_penalt
 		return 0;
 	list_for_each_entry(child, &tsk->children, sibling) {
 		struct task_cputime task_time;
-		unsigned long runtime;
+		unsigned long runtime, this_rss;
 
 		task_lock(child);
 		if (!child->mm || child->mm == tsk->mm) {
 			task_unlock(child);
 			continue;
 		}
+		this_rss = get_mm_rss(child->mm);
+		task_unlock(child);
+
 		thread_group_cputime(child, &task_time);
 		runtime = cputime_to_jiffies(task_time.utime) +
 			  cputime_to_jiffies(task_time.stime);
@@ -113,10 +116,9 @@ static unsigned long oom_forkbomb_penalt
 		 * get to execute at all in such cases anyway.
 		 */
 		if (runtime < HZ) {
-			child_rss += get_mm_rss(child->mm);
+			child_rss += this_rss;
 			forkcount++;
 		}
-		task_unlock(child);
 	}
 
 	/*