From: Andrew Morton <akpm@linux-foundation.org>
To: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Dave Jones <davej@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch v2] mm, oom: normalize oom scores to oom_score_adj scale only for userspace
Date: Wed, 23 May 2012 15:37:18 -0700
Message-ID: <20120523153718.b70bb762.akpm@linux-foundation.org>
In-Reply-To: <alpine.DEB.2.00.1205230014450.9290@chino.kir.corp.google.com>

On Wed, 23 May 2012 00:15:03 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> The oom_score_adj scale ranges from -1000 to 1000 and represents the
> proportion of memory available to the process at allocation time.  This
> means an oom_score_adj value of 300, for example, will bias a process as
> though it was using an extra 30.0% of available memory and a value of
> -350 will discount 35.0% of available memory from its usage.
> 
> The oom killer badness heuristic also uses this scale to report the oom
> score for each eligible process in determining the "best" process to
> kill.  Thus, it can only differentiate each process's memory usage by
> 0.1% of system RAM.
> 
> On large systems, this can end up being a large amount of memory: 256MB
> on 256GB systems, for example.
> 
> This can be fixed by having the badness heuristic use the actual
> memory usage in scoring threads and then normalizing it to the
> oom_score_adj scale for userspace.  This results in better comparison
> between eligible threads for kill and no change from the userspace
> perspective.
> 
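
To make the granularity point above concrete, here is a minimal sketch,
not the kernel code. permille_score() is a hypothetical helper, and the
4KB page size and the two task sizes are assumptions picked for the
example. On a 256GB machine, a task using 10GB and a task using 10GB
plus 100MB land on the same 0..1000 score, while their raw page counts
still differ:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: score a task directly on
 * the 0..1000 oom_score_adj scale, which loses everything below 0.1%
 * of totalpages. Assumes totalpages is nonzero.
 */
static unsigned int permille_score(uint64_t task_pages, uint64_t totalpages)
{
	return (unsigned int)(task_pages * 1000 / totalpages);
}

/* Assumed example values: 256GB of RAM in 4KB pages. */
static const uint64_t example_totalpages = 67108864;	/* 256GB / 4KB */
static const uint64_t example_task_a = 2621440;		/* 10GB */
static const uint64_t example_task_b = 2647040;		/* 10GB + 100MB */
```

Both example tasks score 39 on the permille scale, so only a comparison
of the raw page counts can tell them apart.
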
> ...
>
> @@ -367,12 +354,13 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  		}
>  
>  		points = oom_badness(p, memcg, nodemask, totalpages);
> -		if (points > *ppoints) {
> +		if (points > chosen_points) {
>  			chosen = p;
> -			*ppoints = points;
> +			chosen_points = points;
>  		}
>  	} while_each_thread(g, p);
>  
> +	*ppoints = chosen_points * 1000 / totalpages;
>  	return chosen;
>  }
>  

It's still not obvious that we always avoid the divide-by-zero here.
If there's some weird way of convincing constrained_alloc() to look at
an empty nodemask, or a nodemask which covers only empty nodes, then
totalpages could be zero and blam.

Now, it's probably the case that this is a can't-happen but that
guarantee would be pretty convoluted and fragile?
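
One way to avoid relying on that guarantee would be to guard the
division itself. The following is only a sketch under stated
assumptions: normalize_points() is a hypothetical helper, not something
in the patch, and returning 0 for an empty totalpages is an arbitrary
choice made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: convert a raw badness
 * value (in pages) to the 0..1000 oom_score_adj scale for userspace.
 * Assumption: when totalpages is 0 we report 0 instead of dividing.
 */
static unsigned int normalize_points(uint64_t chosen_points,
				     uint64_t totalpages)
{
	if (totalpages == 0)
		return 0;	/* empty nodemask or only empty nodes */

	return (unsigned int)(chosen_points * 1000 / totalpages);
}
```

With something like this, the last assignment in select_bad_process()
would not depend on what constrained_alloc() computed. Clamping
totalpages to 1 would be another option, at the cost of reporting
values above 1000.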

