From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Dave Jones <davej@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch v2] mm, oom: normalize oom scores to oom_score_adj scale only for userspace
Date: Wed, 23 May 2012 23:02:10 -0700 (PDT)
Message-ID: <alpine.DEB.2.00.1205232259040.15547@chino.kir.corp.google.com>
In-Reply-To: <20120523153718.b70bb762.akpm@linux-foundation.org>

On Wed, 23 May 2012, Andrew Morton wrote:

> > @@ -367,12 +354,13 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> >  		}
> >  
> >  		points = oom_badness(p, memcg, nodemask, totalpages);
> > -		if (points > *ppoints) {
> > +		if (points > chosen_points) {
> >  			chosen = p;
> > -			*ppoints = points;
> > +			chosen_points = points;
> >  		}
> >  	} while_each_thread(g, p);
> >  
> > +	*ppoints = chosen_points * 1000 / totalpages;
> >  	return chosen;
> >  }
> >  
> 
> It's still not obvious that we always avoid the divide-by-zero here. 
> If there's some weird way of convincing constrained_alloc() to look at
> an empty nodemask, or a nodemask which covers only empty nodes then
> blam.
> 
> Now, it's probably the case that this is a can't-happen but that
> guarantee would be pretty convoluted and fragile?
> 

It can only happen for a memcg with a zero limit, something I tried to 
prevent in a different patch by not allowing tasks to be attached to 
memcgs with such a limit, but you didn't like that :)

So I fixed it in this patch with this:

@@ -572,7 +560,7 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	}
 
 	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, order, NULL);
-	limit = mem_cgroup_get_limit(memcg) >> PAGE_SHIFT;
+	limit = mem_cgroup_get_limit(memcg) >> PAGE_SHIFT ? : 1;
 	read_lock(&tasklist_lock);
 	p = select_bad_process(&points, limit, memcg, NULL, false);
 	if (p && PTR_ERR(p) != -1UL)

Cpusets do not allow threads to be attached without a set of mems, nor do 
they allow the final mem in a cpuset to be removed while tasks are still 
attached.  The page allocator certainly wouldn't be calling the oom killer 
for a set of zones that span no pages.

Any suggestion on where to put the check for !totalpages so it's easier to 
understand?
