All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Nick Piggin <npiggin@suse.de>,
	Oleg Nesterov <oleg@redhat.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org
Subject: Re: [patch 09/18] oom: select task from tasklist for mempolicy ooms
Date: Tue, 8 Jun 2010 17:46:45 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.00.1006081741150.19582@chino.kir.corp.google.com> (raw)
In-Reply-To: <20100608140818.b413c335.akpm@linux-foundation.org>

On Tue, 8 Jun 2010, Andrew Morton wrote:

> > The oom killer presently kills current whenever there is no more memory
> > free or reclaimable on its mempolicy's nodes.  There is no guarantee that
> > current is a memory-hogging task or that killing it will free any
> > substantial amount of memory, however.
> > 
> > In such situations, it is better to scan the tasklist for nodes that are
> > allowed to allocate on current's set of nodes and kill the task with the
> > highest badness() score.  This ensures that the most memory-hogging task,
> > or the one configured by the user with /proc/pid/oom_adj, is always
> > selected in such scenarios.
> > 
> >
> > ...
> >
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/module.h>
> >  #include <linux/notifier.h>
> >  #include <linux/memcontrol.h>
> > +#include <linux/mempolicy.h>
> >  #include <linux/security.h>
> >  
> >  int sysctl_panic_on_oom;
> > @@ -36,20 +37,36 @@ static DEFINE_SPINLOCK(zone_scan_lock);
> >  /* #define DEBUG */
> >  
> >  /*
> > - * Is all threads of the target process nodes overlap ours?
> > + * Do all threads of the target process overlap our allowed nodes?
> > + * @tsk: task struct of which task to consider
> > + * @mask: nodemask passed to page allocator for mempolicy ooms
> 
> The comment uses kerneldoc annotation but isn't a kerneldoc comment.
> 

I'll fix it.

> >   */
> > -static int has_intersects_mems_allowed(struct task_struct *tsk)
> > +static bool has_intersects_mems_allowed(struct task_struct *tsk,
> > +					const nodemask_t *mask)
> >  {
> > -	struct task_struct *t;
> > +	struct task_struct *start = tsk;
> >  
> > -	t = tsk;
> >  	do {
> > -		if (cpuset_mems_allowed_intersects(current, t))
> > -			return 1;
> > -		t = next_thread(t);
> > -	} while (t != tsk);
> > -
> > -	return 0;
> > +		if (mask) {
> > +			/*
> > +			 * If this is a mempolicy constrained oom, tsk's
> > +			 * cpuset is irrelevant.  Only return true if its
> > +			 * mempolicy intersects current, otherwise it may be
> > +			 * needlessly killed.
> > +			 */
> > +			if (mempolicy_nodemask_intersects(tsk, mask))
> > +				return true;
> 
> The comment refers to `current' but the code does not?
> 

mempolicy_nodemask_intersects() compares tsk's mempolicy to current's, we 
don't need to pass current into the function (and we optimize for that 
since we don't need to do task_lock(current): nothing else can change its 
mempolicy).

> > +		} else {
> > +			/*
> > +			 * This is not a mempolicy constrained oom, so only
> > +			 * check the mems of tsk's cpuset.
> > +			 */
> 
> The comment doesn't refer to `current', but the code does.  Confused.
> 

This simply compares the cpuset mems_allowed of both tasks passed into the 
function.

> > +			if (cpuset_mems_allowed_intersects(current, tsk))
> > +				return true;
> > +		}
> > +		tsk = next_thread(tsk);
> 
> hm, next_thread() uses list_entry_rcu().  What are the locking rules
> here?  It's one of both of rcu_read_lock() and read_lock(&tasklist_lock),
> I think?
> 

Oleg addressed this in his response.

> > +	} while (tsk != start);
> > +	return false;
> >  }
> 
> This is all bloat and overhead for non-NUMA builds.  I doubt if gcc is
> able to eliminate the task_struct walk (although I didn't check).
> 
> The function isn't oom-killer-specific at all - give it a better name
> then move it to mempolicy.c or similar?  If so, the text "oom"
> shouldn't appear in the comments.
> 

It's the only place where we want to filter tasks based on whether they 
share mempolicy nodes or cpuset mems, though, so I think it's 
appropriately placed in mm/oom_kill.c.  I agree that we can add a
#ifndef CONFIG_NUMA variant and I'll do so, thanks.

> >
> > ...
> >
> > @@ -676,24 +699,19 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> >  	 */
> >  	constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
> >  	read_lock(&tasklist_lock);
> > -
> > -	switch (constraint) {
> > -	case CONSTRAINT_MEMORY_POLICY:
> > -		oom_kill_process(current, gfp_mask, order, 0, NULL,
> > -				"No available memory (MPOL_BIND)");
> > -		break;
> > -
> > -	case CONSTRAINT_NONE:
> > -		if (sysctl_panic_on_oom) {
> > +	if (unlikely(sysctl_panic_on_oom)) {
> > +		/*
> > +		 * panic_on_oom only affects CONSTRAINT_NONE, the kernel
> > +		 * should not panic for cpuset or mempolicy induced memory
> > +		 * failures.
> > +		 */
> 
> This wasn't changelogged?
> 

It's not a functional change, sysctl_panic_on_oom == 2 is already handled 
earlier in the function.  This was intended to elaborate on why we're only 
concerned about CONSTRAINT_NONE here since the switch statement was 
removed.

> > +		if (constraint == CONSTRAINT_NONE) {
> >  			dump_header(NULL, gfp_mask, order, NULL);
> > -			panic("out of memory. panic_on_oom is selected\n");
> > +			read_unlock(&tasklist_lock);
> > +			panic("Out of memory: panic_on_oom is enabled\n");
> >  		}
> > -		/* Fall-through */
> > -	case CONSTRAINT_CPUSET:
> > -		__out_of_memory(gfp_mask, order);
> > -		break;
> >  	}
> > -
> > +	__out_of_memory(gfp_mask, order, constraint, nodemask);
> >  	read_unlock(&tasklist_lock);
> >  
> >  	/*
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2010-06-09  0:46 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-06 22:33 [patch 00/18] oom killer rewrite David Rientjes
2010-06-06 22:34 ` [patch 01/18] oom: check PF_KTHREAD instead of !mm to skip kthreads David Rientjes
2010-06-07 12:12   ` Balbir Singh
2010-06-07 19:50     ` David Rientjes
2010-06-08 19:33   ` Andrew Morton
2010-06-08 23:40     ` David Rientjes
2010-06-08 23:52       ` Andrew Morton
2010-06-06 22:34 ` [patch 02/18] oom: introduce find_lock_task_mm() to fix !mm false positives David Rientjes
2010-06-07 12:58   ` Balbir Singh
2010-06-07 13:49     ` Minchan Kim
2010-06-07 19:49       ` David Rientjes
2010-06-08 19:42   ` Andrew Morton
2010-06-08 20:14     ` Oleg Nesterov
2010-06-08 20:17       ` Oleg Nesterov
2010-06-08 21:34         ` Andrew Morton
2010-06-08 23:50     ` David Rientjes
2010-06-06 22:34 ` [patch 03/18] oom: dump_tasks use find_lock_task_mm too David Rientjes
2010-06-08 19:55   ` Andrew Morton
2010-06-09  0:06     ` David Rientjes
2010-06-06 22:34 ` [patch 04/18] oom: PF_EXITING check should take mm into account David Rientjes
2010-06-08 20:00   ` Andrew Morton
2010-06-06 22:34 ` [patch 05/18] oom: give current access to memory reserves if it has been killed David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:47     ` David Rientjes
2010-06-14 11:08       ` KOSAKI Motohiro
2010-06-08 20:12     ` Andrew Morton
2010-06-13 11:24       ` KOSAKI Motohiro
2010-06-08 20:08   ` Andrew Morton
2010-06-09  0:14     ` David Rientjes
2010-06-06 22:34 ` [patch 06/18] oom: avoid sending exiting tasks a SIGKILL David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:48     ` David Rientjes
2010-06-08 20:17   ` Andrew Morton
2010-06-08 20:26   ` Oleg Nesterov
2010-06-09  6:32     ` David Rientjes
2010-06-09 16:25       ` Oleg Nesterov
2010-06-09 19:44         ` David Rientjes
2010-06-09 20:14           ` Oleg Nesterov
2010-06-10  0:15             ` KAMEZAWA Hiroyuki
2010-06-10  1:21               ` Oleg Nesterov
2010-06-10  1:43                 ` KAMEZAWA Hiroyuki
2010-06-10  1:51                   ` Oleg Nesterov
2010-06-06 22:34 ` [patch 07/18] oom: filter tasks not sharing the same cpuset David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:51     ` David Rientjes
2010-06-08 19:27       ` Andrew Morton
2010-06-13 11:24         ` KOSAKI Motohiro
2010-07-02 22:35           ` Andrew Morton
2010-07-04 22:08             ` David Rientjes
2010-07-09  3:00             ` KOSAKI Motohiro
2010-06-08 20:23   ` Andrew Morton
2010-06-09  0:25     ` David Rientjes
2010-06-06 22:34 ` [patch 08/18] oom: sacrifice child with highest badness score for parent David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 18:53     ` David Rientjes
2010-06-08 20:33   ` Andrew Morton
2010-06-09  0:30     ` David Rientjes
2010-06-06 22:34 ` [patch 09/18] oom: select task from tasklist for mempolicy ooms David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 21:08   ` Andrew Morton
2010-06-08 21:17     ` Oleg Nesterov
2010-06-09  0:46     ` David Rientjes [this message]
2010-06-08 23:43   ` Andrew Morton
2010-06-09  0:40     ` David Rientjes
2010-06-06 22:34 ` [patch 10/18] oom: enable oom tasklist dump by default David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 18:56     ` David Rientjes
2010-06-08 21:13   ` Andrew Morton
2010-06-09  0:52     ` David Rientjes
2010-06-06 22:34 ` [patch 11/18] oom: avoid oom killer for lowmem allocations David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 21:19   ` Andrew Morton
2010-06-06 22:34 ` [patch 12/18] oom: extract panic helper function David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-06 22:34 ` [patch 13/18] oom: remove special handling for pagefault ooms David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 18:57     ` David Rientjes
2010-06-08 21:27   ` Andrew Morton
2010-06-06 22:34 ` [patch 14/18] oom: move sysctl declarations to oom.h David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-06 22:34 ` [patch 15/18] oom: remove unnecessary code and cleanup David Rientjes
2010-06-06 22:34 ` [patch 16/18] oom: badness heuristic rewrite David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 23:02     ` Andrew Morton
2010-06-13 11:24       ` KOSAKI Motohiro
2010-06-17  5:14       ` David Rientjes
2010-06-21 11:45         ` KOSAKI Motohiro
2010-06-21 20:47           ` David Rientjes
2010-06-30  9:26             ` KOSAKI Motohiro
2010-06-17  5:12     ` David Rientjes
2010-06-21 11:45       ` KOSAKI Motohiro
2010-06-08 22:58   ` Andrew Morton
2010-06-17  5:32     ` David Rientjes
2010-06-06 22:34 ` [patch 17/18] oom: add forkbomb penalty to badness heuristic David Rientjes
2010-06-08 11:41   ` KOSAKI Motohiro
2010-06-08 23:15   ` Andrew Morton
2010-06-06 22:35 ` [patch 18/18] oom: deprecate oom_adj tunable David Rientjes
2010-06-08 11:42   ` KOSAKI Motohiro
2010-06-08 19:00     ` David Rientjes
2010-06-08 23:18     ` Andrew Morton
2010-06-13 11:24       ` KOSAKI Motohiro
2010-06-17  3:36         ` David Rientjes
2010-06-21 11:45           ` KOSAKI Motohiro
2010-06-21 20:54             ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.1006081741150.19582@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.