[PATCH]oom-kill: direct hardware access processes should get bonus


* [PATCH]oom-kill: direct hardware access processes should get bonus
@ 2010-11-02  1:43 Figo.zhang
  2010-11-02  3:10 ` David Rientjes
                   ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Figo.zhang @ 2010-11-02  1:43 UTC (permalink / raw)
  To: lkml, linux-mm, Andrew Morton, rientjes


The OOM victim should not be a process that directly accesses hardware
devices, such as the Xorg server, because the hardware could be left in an
unpredictable state. A user application can set /proc/pid/oom_score_adj to
protect itself, but I think such processes should get a 3% bonus for protection.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
---
mm/oom_kill.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4029583..df6a9da 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -195,10 +195,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	task_unlock(p);
 
 	/*
-	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
+	 * Root and direct hardware access processes get 3% bonus, just like the
+	 * __vm_enough_memory() implementation used by LSMs.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
 		points -= 30;
 
 	/*




* Re: [PATCH]oom-kill: direct hardware access processes should get bonus
  2010-11-02  1:43 [PATCH]oom-kill: direct hardware access processes should get bonus Figo.zhang
@ 2010-11-02  3:10 ` David Rientjes
  2010-11-02 14:24   ` Figo.zhang
  2010-11-03 23:43 ` [PATCH v2]oom-kill: CAP_SYS_RESOURCE " Figo.zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-02  3:10 UTC (permalink / raw)
  To: Figo.zhang; +Cc: lkml, linux-mm, Andrew Morton

On Tue, 2 Nov 2010, Figo.zhang wrote:

> The OOM victim should not be a process that directly accesses hardware
> devices, such as the Xorg server, because the hardware could be left in an
> unpredictable state. A user application can set /proc/pid/oom_score_adj to
> protect itself, but I think such processes should get a 3% bonus for protection.
> 

Which applications are you referring to that cannot gracefully exit if 
killed?

> Signed-off-by: Figo.zhang <figo1802@gmail.com>
> ---
> mm/oom_kill.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4029583..df6a9da 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -195,10 +195,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  	task_unlock(p);
>  
>  	/*
> -	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> -	 * implementation used by LSMs.
> +	 * Root and direct hardware access processes get 3% bonus, just like the
> +	 * __vm_enough_memory() implementation used by LSMs.

LSMs have this bonus for CAP_SYS_ADMIN, but not for CAP_SYS_RAWIO, so 
this comment is incorrect.

>  	 */
> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
>  		points -= 30;
>  
>  	/*

CAP_SYS_RAWIO had a much more dramatic impact in the previous heuristic to 
such a point that it would often allow memory hogging tasks to elude the 
oom killer at the expense of innocent tasks.  I'm not sure this is the 
best way to go.


* Re: [PATCH]oom-kill: direct hardware access processes should get bonus
  2010-11-02  3:10 ` David Rientjes
@ 2010-11-02 14:24   ` Figo.zhang
  2010-11-02 19:34     ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-02 14:24 UTC (permalink / raw)
  To: David Rientjes; +Cc: lkml, linux-mm, Andrew Morton


> 
> Which applications are you referring to that cannot gracefully exit if 
> killed?

For example the Xorg server: if the Xorg server is killed, the GNOME
desktop crashes.


> 
> CAP_SYS_RAWIO had a much more dramatic impact in the previous heuristic to 
> such a point that it would often allow memory hogging tasks to elude the 
> oom killer at the expense of innocent tasks.  I'm not sure this is the 
> best way to go.

Are there experiments demonstrating that CAP_SYS_RAWIO tasks elude
the oom killer?






* Re: [PATCH]oom-kill: direct hardware access processes should get bonus
  2010-11-02 14:24   ` Figo.zhang
@ 2010-11-02 19:34     ` David Rientjes
  0 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-02 19:34 UTC (permalink / raw)
  To: Figo.zhang; +Cc: lkml, linux-mm, Andrew Morton

On Tue, 2 Nov 2010, Figo.zhang wrote:

> > Which applications are you referring to that cannot gracefully exit if 
> > killed?
> 
> For example the Xorg server: if the Xorg server is killed, the GNOME
> desktop crashes.
> 

Right, but you didn't explicitly prohibit such applications from being 
killed, so that suggests that doing so may be inconvenient but doesn't 
incur something like corruption or data loss, which is what I would 
consider "unstable" or "inconsistent" state.

We're trying to avoid any additional heuristics from being introduced for 
specific usecases, even for Xorg.  That ensures that the heuristic remains 
as predictable as possible and frees a large amount of memory.  If Xorg is 
being killed first instead of a true memory hogger, then it seems like a 
forkbomb scenario instead; could you please post your kernel log so that 
we can diagnose that issue separately?

> > CAP_SYS_RAWIO had a much more dramatic impact in the previous heuristic to 
> > such a point that it would often allow memory hogging tasks to elude the 
> > oom killer at the expense of innocent tasks.  I'm not sure this is the 
> > best way to go.
> 
> Are there experiments demonstrating that CAP_SYS_RAWIO tasks elude
> the oom killer?
> 

The old heuristic would allow it to elude the oom killer because it would 
divide the score by four if a task had the capability, which is a much 
more drastic "bonus" than you suggest here.  That would reduce the score 
for the memory hogging task significantly enough that we killed tons of 
innocent tasks instead before eventually killing the task that was leaking 
memory but failed to be identified because it had CAP_SYS_RAWIO.  I'm 
trying to avoid any such repeats.
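
To put the two bonuses side by side, here is a minimal sketch in C.  It 
assumes a simplified badness score on a 0..1000 scale (1000 meaning the task 
uses all of memory); the function names are hypothetical and this is not the 
kernel's oom_badness() code.

```c
/*
 * Simplified model (an assumption, not kernel code): "points" is a badness
 * score in the range 0..1000, proportional to the memory a task uses.
 */

/* Old-style bonus as described above: divide the score by four. */
unsigned int old_style_bonus(unsigned int points)
{
	return points / 4;
}

/* Flat bonus as proposed in the patch: subtract 30 points, clamped at zero. */
unsigned int flat_bonus(unsigned int points)
{
	return points > 30 ? points - 30 : 0;
}

/* Returns 1 if the hog would be selected ahead of the other task, else 0. */
int hog_is_chosen(unsigned int hog_points, unsigned int other_points)
{
	return hog_points > other_points;
}
```

Under these assumptions, a hog scoring 800 drops to 200 with the 
divide-by-four rule and ranks below a task scoring 250, while the flat 
30-point bonus leaves it at 770 and still the first choice.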


* [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-02  1:43 [PATCH]oom-kill: direct hardware access processes should get bonus Figo.zhang
  2010-11-02  3:10 ` David Rientjes
@ 2010-11-03 23:43 ` Figo.zhang
  2010-11-03 23:47   ` David Rientjes
  2010-11-09 10:41 ` [PATCH]oom-kill: direct hardware access processes " KOSAKI Motohiro
  2010-11-09 12:24 ` [PATCH v2]mm/oom-kill: " Figo.zhang
  3 siblings, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-03 23:43 UTC (permalink / raw)
  To: lkml; +Cc: linux-mm, Andrew Morton, rientjes


Processes with CAP_SYS_RESOURCE should also get the 3% bonus for protection.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
--- 
mm/oom_kill.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4029583..30b24b9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -198,7 +198,8 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 * Root processes get 3% bonus, just like the __vm_enough_memory()
 	 * implementation used by LSMs.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
 		points -= 30;
 
 	/*




* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-03 23:43 ` [PATCH v2]oom-kill: CAP_SYS_RESOURCE " Figo.zhang
@ 2010-11-03 23:47   ` David Rientjes
       [not found]     ` <AANLkTimjfmLzr_9+Sf4gk0xGkFjffQ1VcCnwmCXA88R8@mail.gmail.com>
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-03 23:47 UTC (permalink / raw)
  To: Figo.zhang; +Cc: lkml, linux-mm, Andrew Morton

On Thu, 4 Nov 2010, Figo.zhang wrote:

> Processes with CAP_SYS_RESOURCE should also get the 3% bonus for protection.
> 

Would you like to elaborate as to why?

> Signed-off-by: Figo.zhang <figo1802@gmail.com>
> --- 
> mm/oom_kill.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4029583..30b24b9 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -198,7 +198,8 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
>  	 * implementation used by LSMs.
>  	 */
> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
>  		points -= 30;
>  
>  	/*


* Re:[PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
       [not found]     ` <AANLkTimjfmLzr_9+Sf4gk0xGkFjffQ1VcCnwmCXA88R8@mail.gmail.com>
@ 2010-11-04  1:38       ` Figo.zhang
  2010-11-04  1:50         ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-04  1:38 UTC (permalink / raw)
  To: figo zhang, David Rientjes; +Cc: lkml, linux-mm, Andrew Morton


> 
> 
> 
> On Thu, 4 Nov 2010, Figo.zhang wrote:
> 
> > Processes with CAP_SYS_RESOURCE should also get the 3% bonus for protection.
> >
> 
> 
> Would you like to elaborate as to why?
> 
> 

A process with the CAP_SYS_RESOURCE capability is one that has system resource
limits, such as journaling resources on ext3/4 filesystems or the RTC clock, so it
should get the same treatment as a process with CAP_SYS_ADMIN.

Best,

Figo.zhang





* Re:[PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-04  1:38       ` Figo.zhang
@ 2010-11-04  1:50         ` David Rientjes
  2010-11-04  2:12           ` Figo.zhang
  2010-11-09 11:01           ` [PATCH " KOSAKI Motohiro
  0 siblings, 2 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-04  1:50 UTC (permalink / raw)
  To: Figo.zhang; +Cc: figo zhang, lkml, linux-mm, Andrew Morton

On Thu, 4 Nov 2010, Figo.zhang wrote:

> > > Processes with CAP_SYS_RESOURCE should also get the 3% bonus for protection.
> > >
> > > 
> > > 
> > > Would you like to elaborate as to why?
> > > 
> > > 
> 
> A process with the CAP_SYS_RESOURCE capability is one that has system resource
> limits, such as journaling resources on ext3/4 filesystems or the RTC clock, so it
> should get the same treatment as a process with CAP_SYS_ADMIN.
> 

NACK, there's no justification that these tasks should be given a 3% 
memory bonus in the oom killer heuristic; in fact, since they can allocate 
without limits it is more important to target these tasks if they are 
using an egregious amount of memory.  CAP_SYS_RESOURCE threads have the 
ability to lower their own oom_score_adj values, thus, they should protect 
themselves if necessary like everything else.


* Re: Re:[PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-04  1:50         ` David Rientjes
@ 2010-11-04  2:12           ` Figo.zhang
  2010-11-04  2:54             ` David Rientjes
  2010-11-09 11:01           ` [PATCH " KOSAKI Motohiro
  1 sibling, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-04  2:12 UTC (permalink / raw)
  To: David Rientjes; +Cc: figo zhang, lkml, linux-mm, Andrew Morton

On Wed, 2010-11-03 at 18:50 -0700, David Rientjes wrote:
> On Thu, 4 Nov 2010, Figo.zhang wrote:
> 
> > > > Processes with CAP_SYS_RESOURCE should also get the 3% bonus for protection.
> > > >
> > > 
> > > 
> > > Would you like to elaborate as to why?
> > > 
> > > 
> > 
> > A process with the CAP_SYS_RESOURCE capability is one that has system resource
> > limits, such as journaling resources on ext3/4 filesystems or the RTC clock, so it
> > should get the same treatment as a process with CAP_SYS_ADMIN.
> > 
> 
> NACK, there's no justification that these tasks should be given a 3% 
> memory bonus in the oom killer heuristic; in fact, since they can allocate 
> without limits it is more important to target these tasks if they are 
> using an egregious amount of memory.  CAP_SYS_RESOURCE threads have the 
> ability to lower their own oom_score_adj values, thus, they should protect 
> themselves if necessary like everything else.

Your new heuristic also uses CAP_SYS_RESOURCE for protection;
see fs/proc/base.c, line 1167:
	if (oom_score_adj < task->signal->oom_score_adj &&
			!capable(CAP_SYS_RESOURCE)) {
		err = -EACCES;
		goto err_sighand;
	}

So suppose I want to protect a normal process that lacks
CAP_SYS_RESOURCE and I set a smaller oom_score_adj. If the new oom_score_adj
is smaller than the current one and the process lacks the capability, the
value is not adjusted. That does not seem right?







* Re: Re:[PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-04  2:12           ` Figo.zhang
@ 2010-11-04  2:54             ` David Rientjes
  2010-11-04  4:42               ` Figo.zhang
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-04  2:54 UTC (permalink / raw)
  To: Figo.zhang; +Cc: figo zhang, lkml, linux-mm, Andrew Morton

On Thu, 4 Nov 2010, Figo.zhang wrote:

> Your new heuristic also uses CAP_SYS_RESOURCE for protection;
> see fs/proc/base.c, line 1167:
> 	if (oom_score_adj < task->signal->oom_score_adj &&
> 			!capable(CAP_SYS_RESOURCE)) {
> 		err = -EACCES;
> 		goto err_sighand;
> 	}

That's unchanged from the old behavior with oom_adj.

> So suppose I want to protect a normal process that lacks
> CAP_SYS_RESOURCE and I set a smaller oom_score_adj. If the new oom_score_adj
> is smaller than the current one and the process lacks the capability, the
> value is not adjusted. That does not seem right?
> 

Tasks without CAP_SYS_RESOURCE cannot lower their own oom_score_adj, 
otherwise they could trivially cause other tasks to be killed.  They can, however, 
increase their own oom_score_adj so the oom killer prefers to kill them first.

I think you may be confused: CAP_SYS_RESOURCE overrides resource limits.
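
The rule can be modelled in userspace as follows.  This is a hedged sketch: 
check_oom_score_adj_write() is a hypothetical helper that mirrors the 
condition quoted from fs/proc/base.c, with the capability passed in as a 
plain boolean instead of calling capable().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check quoted above.
 * Lowering the value (more protection) requires CAP_SYS_RESOURCE;
 * raising it, or leaving it unchanged, is always allowed.
 * Returns 0 if the write would be accepted, -EACCES otherwise.
 */
int check_oom_score_adj_write(int current_adj, int new_adj,
			      bool has_cap_sys_resource)
{
	if (new_adj < current_adj && !has_cap_sys_resource)
		return -EACCES;
	return 0;
}
```

In this model an unprivileged task may move from 0 to 30 but not from 0 to 
-30, which matches the behaviour described above.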


* Re: Re:[PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-04  2:54             ` David Rientjes
@ 2010-11-04  4:42               ` Figo.zhang
  2010-11-04  5:08                 ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-04  4:42 UTC (permalink / raw)
  To: David Rientjes; +Cc: figo zhang, lkml, linux-mm, Andrew Morton

On Wed, 2010-11-03 at 19:54 -0700, David Rientjes wrote:
> On Thu, 4 Nov 2010, Figo.zhang wrote:
> 
> > Your new heuristic also uses CAP_SYS_RESOURCE for protection;
> > see fs/proc/base.c, line 1167:
> > 	if (oom_score_adj < task->signal->oom_score_adj &&
> > 			!capable(CAP_SYS_RESOURCE)) {
> > 		err = -EACCES;
> > 		goto err_sighand;
> > 	}
> 
> That's unchanged from the old behavior with oom_adj.
> 
> > So suppose I want to protect a normal process that lacks
> > CAP_SYS_RESOURCE and I set a smaller oom_score_adj. If the new oom_score_adj
> > is smaller than the current one and the process lacks the capability, the
> > value is not adjusted. That does not seem right?
> > 
> 
> Tasks without CAP_SYS_RESOURCE cannot lower their own oom_score_adj, 

CAP_SYS_RESOURCE == 1 means without resource limits just like a
superuser,
CAP_SYS_RESOURCE == 0 means hold resource limits, like normal user,
right?

a new lower oom_score_adj will protect the process, right?

A task without CAP_SYS_RESOURCE just means it is not a superuser task, so why
can't the user protect it with oom_score_adj?

For example, I want to protect my program such as gnome-terminal, which runs
without CAP_SYS_RESOURCE (it has resource limits):

[figo@myhost ~]$ ps -ax | grep gnome-ter
Warning: bad ps syntax, perhaps a bogus '-'? See
http://procps.sf.net/faq.html
 2280 ?        Sl     0:01 gnome-terminal
 8839 pts/0    S+     0:00 grep gnome-ter
[figo@myhost ~]$ cat /proc/2280/oom_adj 
3
[figo@myhost ~]$ echo -17 >  /proc/2280/oom_adj 
bash: echo: write error: Permission denied
[figo@myhost ~]$ 

So I cannot protect my program.


> otherwise they could trivially cause other tasks to be killed.  They can, however, 
> increase their own oom_score_adj so the oom killer prefers to kill them first.
> 
> I think you may be confused: CAP_SYS_RESOURCE overrides resource limits.




* Re: Re:[PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-04  4:42               ` Figo.zhang
@ 2010-11-04  5:08                 ` David Rientjes
  0 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-04  5:08 UTC (permalink / raw)
  To: Figo.zhang; +Cc: figo zhang, lkml, linux-mm, Andrew Morton

On Thu, 4 Nov 2010, Figo.zhang wrote:

> CAP_SYS_RESOURCE == 1 means without resource limits just like a
> superuser,
> CAP_SYS_RESOURCE == 0 means hold resource limits, like normal user,
> right?
> 

Yes.

> a new lower oom_score_adj will protect the process, right?
> 

Yes.

> A task without CAP_SYS_RESOURCE just means it is not a superuser task, so why
> can't the user protect it with oom_score_adj?
> 

Because, as I said, it would be trivial for a user program to deplete all 
memory (either intentionally or unintentionally) and cause every other task 
on the system to be oom killed as a result.  That's an undesired result of 
a blatantly obvious DoS.

> For example, I want to protect my program such as gnome-terminal, which runs
> without CAP_SYS_RESOURCE (it has resource limits):
> 
> [figo@myhost ~]$ ps -ax | grep gnome-ter
> Warning: bad ps syntax, perhaps a bogus '-'? See
> http://procps.sf.net/faq.html
>  2280 ?        Sl     0:01 gnome-terminal
>  8839 pts/0    S+     0:00 grep gnome-ter
> [figo@myhost ~]$ cat /proc/2280/oom_adj 
> 3
> [figo@myhost ~]$ echo -17 >  /proc/2280/oom_adj 
> bash: echo: write error: Permission denied
> [figo@myhost ~]$ 
> 
> So I cannot protect my program.
> 

If this is your system, you can either give yourself CAP_SYS_RESOURCE or 
do it through the superuser.  This isn't exactly new, it's been the case 
for the past four years.

I'm still struggling to find out the problem that you're trying to address 
with your various patches, perhaps because you haven't said what it is.


* Re: [PATCH]oom-kill: direct hardware access processes should get bonus
  2010-11-02  1:43 [PATCH]oom-kill: direct hardware access processes should get bonus Figo.zhang
  2010-11-02  3:10 ` David Rientjes
  2010-11-03 23:43 ` [PATCH v2]oom-kill: CAP_SYS_RESOURCE " Figo.zhang
@ 2010-11-09 10:41 ` KOSAKI Motohiro
  2010-11-09 12:24 ` [PATCH v2]mm/oom-kill: " Figo.zhang
  3 siblings, 0 replies; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-09 10:41 UTC (permalink / raw)
  To: Figo.zhang; +Cc: kosaki.motohiro, lkml, linux-mm, Andrew Morton, rientjes

> 
> The OOM victim should not be a process that directly accesses hardware
> devices, such as the Xorg server, because the hardware could be left in an
> unpredictable state. A user application can set /proc/pid/oom_score_adj to
> protect itself, but I think such processes should get a 3% bonus for protection.
> 
> Signed-off-by: Figo.zhang <figo1802@gmail.com>

I am surprised this issue is still there. It was already pointed out half a
year ago :-/


> ---
> mm/oom_kill.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4029583..df6a9da 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -195,10 +195,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  	task_unlock(p);
>  
>  	/*
> -	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> -	 * implementation used by LSMs.
> +	 * Root and direct hardware access processes get 3% bonus, just like the
> +	 * __vm_enough_memory() implementation used by LSMs.
>  	 */

This comment is incorrect. The LSM code only cares about CAP_SYS_ADMIN.

> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
>  		points -= 30;

But yes, the OOM killer needs to consider both CAP_SYS_RESOURCE and CAP_SYS_RAWIO.

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>






* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-04  1:50         ` David Rientjes
  2010-11-04  2:12           ` Figo.zhang
@ 2010-11-09 11:01           ` KOSAKI Motohiro
  2010-11-09 12:24             ` Alan Cox
  1 sibling, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-09 11:01 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Figo.zhang, figo zhang, lkml, linux-mm, Andrew Morton

> On Thu, 4 Nov 2010, Figo.zhang wrote:
> 
> > > > Processes with CAP_SYS_RESOURCE should also get the 3% bonus for protection.
> > > >
> > > 
> > > 
> > > Would you like to elaborate as to why?
> > > 
> > > 
> > 
> > A process with the CAP_SYS_RESOURCE capability is one that has system resource
> > limits, such as journaling resources on ext3/4 filesystems or the RTC clock, so it
> > should get the same treatment as a process with CAP_SYS_ADMIN.
> > 
> 
> NACK, there's no justification that these tasks should be given a 3% 
> memory bonus in the oom killer heuristic; in fact, since they can allocate 
> without limits it is more important to target these tasks if they are 
> using an egregious amount of memory.  CAP_SYS_RESOURCE threads have the 
> ability to lower their own oom_score_adj values, thus, they should protect 
> themselves if necessary like everything else.

David, the stupid one is YOU. You removed the CAP_SYS_RESOURCE condition with
ZERO explanation, and Figo reported a regression. That is reason enough to
revert it. YOU are obliged to explain why you want the change and why
you think it is justified.

Don't blame the bug reporter. That's completely wrong.






* [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-02  1:43 [PATCH]oom-kill: direct hardware access processes should get bonus Figo.zhang
                   ` (2 preceding siblings ...)
  2010-11-09 10:41 ` [PATCH]oom-kill: direct hardware access processes " KOSAKI Motohiro
@ 2010-11-09 12:24 ` Figo.zhang
  2010-11-09 21:16   ` David Rientjes
  2010-11-10 15:14   ` [PATCH v3]mm/oom-kill: " Figo.zhang
  3 siblings, 2 replies; 77+ messages in thread
From: Figo.zhang @ 2010-11-09 12:24 UTC (permalink / raw)
  To: lkml, KOSAKI Motohiro; +Cc: linux-mm, Andrew Morton, rientjes, Linus Torvalds

 
The OOM victim should not be a process that directly accesses hardware
devices, such as the Xorg server, because the hardware could be left in an
unpredictable state. A user application can set /proc/pid/oom_score_adj to
protect itself, but I think such processes should get a 3% bonus for protection.

in v2, fix the incorrect comment.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/oom_kill.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4029583..9b06f56 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -196,9 +196,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 
 	/*
 	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
+	 * implementation used by LSMs. And direct hardware access processes
+	 * also get 3% bonus.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
 		points -= 30;
 
 	/*




* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-09 11:01           ` [PATCH " KOSAKI Motohiro
@ 2010-11-09 12:24             ` Alan Cox
  2010-11-09 21:06               ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Alan Cox @ 2010-11-09 12:24 UTC (permalink / raw)
  To: KOSAKI Motohiro, Figo.zhang
  Cc: David Rientjes, figo zhang, lkml, linux-mm, Andrew Morton

> > > A process with the CAP_SYS_RESOURCE capability is one that has system resource
> > > limits, such as journaling resources on ext3/4 filesystems or the RTC clock, so it
> > > should get the same treatment as a process with CAP_SYS_ADMIN.
> > > 
> > 
> > NACK, there's no justification that these tasks should be given a 3% 
> > memory bonus in the oom killer heuristic; in fact, since they can allocate 
> > without limits it is more important to target these tasks if they are 
> > using an egregious amount of memory.
> 
> David, the stupid one is YOU. You removed the CAP_SYS_RESOURCE condition with
> ZERO explanation, and Figo reported a regression. That is reason enough to
> revert it. YOU are obliged to explain why you want the change and why
> you think it is justified.
> 
> Don't blame the bug reporter. That's completely wrong.

Can people stop throwing things at each other and worry about the facts?

- If it's a regression it should get reverted or fixed. But is it
  actually a regression? Has the underlying behaviour changed in a
  problematic way?

"CAP_SYS_RESOURCE threads have the ability to lower their own oom_score_adj
 values, thus, they should protect themselves if necessary like
 everything else."

The reverse can be argued equally - that they can unprotect themselves if
necessary. In fact it seems to be a "point of view" sort of question
which way you deal with CAP_SYS_RESOURCE, and that to me argues that
changing from old expected behaviour to a new behaviour is a regression.





* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-09 12:24             ` Alan Cox
@ 2010-11-09 21:06               ` David Rientjes
  2010-11-09 21:25                 ` David Rientjes
  2010-11-10 14:38                 ` Figo.zhang
  0 siblings, 2 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-09 21:06 UTC (permalink / raw)
  To: Alan Cox
  Cc: KOSAKI Motohiro, Figo.zhang, figo zhang, lkml, linux-mm, Andrew Morton

On Tue, 9 Nov 2010, Alan Cox wrote:

> The reverse can be argued equally - that they can unprotect themselves if
> necessary. In fact it seems to be a "point of view" sort of question
> which way you deal with CAP_SYS_RESOURCE, and that to me argues that
> changing from old expected behaviour to a new behaviour is a regression.
> 

I didn't check earlier, but CAP_SYS_RESOURCE hasn't had a place in the oom 
killer's heuristic in over five years, so what regression are we referring 
to in this thread?  These tasks already have full control over 
oom_score_adj to modify its oom killing priority in either direction.

And, as I said, giving these threads a bonus to be less preferred doesn't 
seem appropriate since (1) it's not a defined or expected behavior of 
CAP_SYS_RESOURCE like it is for sysadmin tasks, and (2) these threads are 
not bound by resource limits and thus have a higher likelihood of consuming 
larger amounts of memory.

That's why I nack'd the patch in the first place and still do, there's no 
regression here and it's not in the best interest of freeing a large 
amount of memory which is the sole purpose of the oom killer.

Furthermore, the heuristic was entirely rewritten, but I wouldn't consider 
all the old factors such as cputime and nice level being removed as 
"regressions" since the aim was to make it more predictable and more 
likely to kill a large consumer of memory such that we don't have to kill 
more tasks in the near future.


* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-09 12:24 ` [PATCH v2]mm/oom-kill: " Figo.zhang
@ 2010-11-09 21:16   ` David Rientjes
  2010-11-10 14:48     ` Figo.zhang
  2010-11-14  5:07     ` KOSAKI Motohiro
  2010-11-10 15:14   ` [PATCH v3]mm/oom-kill: " Figo.zhang
  1 sibling, 2 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-09 21:16 UTC (permalink / raw)
  To: Figo.zhang; +Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds

On Tue, 9 Nov 2010, Figo.zhang wrote:

>  
> The OOM victim should not be a process that directly accesses hardware
> devices, such as the Xorg server, because the hardware could be left in an
> unpredictable state. A user application can set /proc/pid/oom_score_adj to
> protect itself, but I think such processes should get a 3% bonus for protection.
> 

The logic here is wrong: if killing these tasks can leave hardware in an 
unpredictable state (and that state is presumably harmful), then they 
should be completely immune from oom killing since you're still leaving 
them exposed here to be killed.

So the question that needs to be answered is: why do these threads deserve 
to use 3% more memory (not >4%) than others without getting killed?  If 
there was some evidence that these threads have a certain quantity of 
memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I 
have no objection, but that's going to be expressed in a memory quantity 
not a percentage as you have here.

The CAP_SYS_ADMIN heuristic has a background: it is used in the oom killer 
because we have used the same 3% in __vm_enough_memory() for a long time 
and we want consistency amongst the heuristics.  Adding additional bonuses 
with arbitrary values like 3% of memory for things like CAP_SYS_RAWIO 
makes the heuristic less predictable and moves us back toward the old 
heuristic which was almost entirely arbitrary.

Now before KOSAKI-san comes out and says the old heuristic considered 
CAP_SYS_RAWIO and the new one does not so it _must_ be a regression: the 
old heuristic also divided the badness score by 4 for that capability as a 
completely arbitrary value (just like 3% is here).  Other traits like 
runtime and nice levels were also removed from the heuristic.  What needs 
to be shown is that CAP_SYS_RAWIO requires additional memory just to run 
or we should neglect to free 3% of memory, which could be gigabytes, 
because it has this trait.
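
For a sense of scale, the 30 points out of 1000 can be converted to bytes.  
A minimal sketch, assuming the bonus corresponds to 3% of total memory; 
bonus_bytes() is a hypothetical helper, not a kernel function.

```c
/*
 * Hypothetical helper: the amount of memory covered by a 30-point bonus on
 * a 0..1000 scale, i.e. 3% of total memory, rounded down.
 * Assumes total_bytes is small enough that total_bytes * 30 does not
 * overflow an unsigned long long.
 */
unsigned long long bonus_bytes(unsigned long long total_bytes)
{
	return total_bytes * 30 / 1000;
}
```

On a machine with 64 GiB of memory this comes to roughly 1.92 GiB.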


* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-09 21:06               ` David Rientjes
@ 2010-11-09 21:25                 ` David Rientjes
  2010-11-10 14:38                 ` Figo.zhang
  1 sibling, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-09 21:25 UTC (permalink / raw)
  To: Alan Cox
  Cc: KOSAKI Motohiro, Figo.zhang, figo zhang, lkml, linux-mm, Andrew Morton

On Tue, 9 Nov 2010, David Rientjes wrote:

> I didn't check earlier, but CAP_SYS_RESOURCE hasn't had a place in the oom 
> killer's heuristic in over five years, so what regression are we referring 
> to in this thread?  These tasks already have full control over 
> oom_score_adj to modify its oom killing priority in either direction.
> 

Yes, CAP_SYS_RESOURCE was a part of the heuristic in 2.6.25 along with 
CAP_SYS_ADMIN and was removed with the rewrite; when I said it "hasn't had 
a place in the oom killer's heuristic," I meant it's an unnecessary 
extension to CAP_SYS_ADMIN and allows for killing innocent tasks when a 
CAP_SYS_RESOURCE task is using too much memory.

The fundamental issue here is whether or not we should give a bonus to 
CAP_SYS_RESOURCE tasks because they are, by definition, allowed to access 
extra resources and we're willing to sacrifice other tasks for that.  This 
is antagonistic to the oom killer's sole goal, however, which is to kill the 
task consuming the largest amount of memory unless protected by userspace 
(which CAP_SYS_RESOURCE tasks have complete control over doing).

Since these threads have complete ability to give themselves this bonus 
(echo -30 > /proc/self/oom_score_adj), I don't think this needs to be a 
part of the core heuristic nor with such an arbitrary value of 3% (the old 
heuristic divided its badness score by 4, another arbitrary value).
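
For illustration, the same adjustment as the shell one-liner above could be
made from within a program.  This is only a minimal C sketch, assuming the
task is allowed to lower its own value (the thread states that
CAP_SYS_RESOURCE tasks are).  set_oom_score_adj() is a hypothetical helper
name, not a kernel or libc API, and the path is a parameter so the helper can
be tried against an ordinary file.

```c
#include <stdio.h>

/*
 * Write an oom_score_adj value to the given file.
 *
 * For a real task the path would be /proc/self/oom_score_adj or
 * /proc/<pid>/oom_score_adj.  set_oom_score_adj() is a hypothetical
 * helper for this sketch, not an existing API.
 *
 * Returns 0 on success, -1 on failure.
 */
static int set_oom_score_adj(const char *path, int adj)
{
	FILE *f;
	int ok;

	/* The tunable ranges from -1000 (never kill) to +1000 (always prefer). */
	if (adj < -1000 || adj > 1000)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", adj) > 0;

	/* A rejected write may only be reported once the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A task wanting the equivalent of the 3% bonus would call
set_oom_score_adj("/proc/self/oom_score_adj", -30) early in its startup.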

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-09 21:06               ` David Rientjes
  2010-11-09 21:25                 ` David Rientjes
@ 2010-11-10 14:38                 ` Figo.zhang
  2010-11-10 20:50                   ` David Rientjes
  1 sibling, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-10 14:38 UTC (permalink / raw)
  To: David Rientjes
  Cc: Alan Cox, KOSAKI Motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton

On Tue, 2010-11-09 at 13:06 -0800, David Rientjes wrote:
> On Tue, 9 Nov 2010, Alan Cox wrote:
> 
> > The reverse can be argued equally - that they can unprotect themselves if
> > necessary. In fact it seems to be a "point of view" sort of question
> > which way you deal with CAP_SYS_RESOURCE, and that to me argues that
> > changing from old expected behaviour to a new behaviour is a regression.
> > 
> 
> I didn't check earlier, but CAP_SYS_RESOURCE hasn't had a place in the oom 
> killer's heuristic in over five years, so what regression are we referring 
> to in this thread?  These tasks already have full control over 
> oom_score_adj to modify its oom killing priority in either direction.

Yes, the user can control it, but will system administrators really adjust
every process one by one in the real world? Suppose a database system has
thousands of processes.

> Futhermore, the heuristic was entirely rewritten, but I wouldn't consider 
> all the old factors such as cputime and nice level being removed as 
> "regressions" since the aim was to make it more predictable and more 
> likely to kill a large consumer of memory such that we don't have to kill 
> more tasks in the near future.

The goal of the oom_killer is to find the best process to kill. That
process should be:
1. the most memory-consuming process of all processes, and
2. a proper process to kill, one whose death will, as far as possible,
not leave the system in an unpredictable state.

If a user process such as the email client "evolution" and a process with
direct hardware access such as "Xorg" have consumed equal amounts of memory,
which process do you want to kill?



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-09 21:16   ` David Rientjes
@ 2010-11-10 14:48     ` Figo.zhang
  2010-11-14  5:07     ` KOSAKI Motohiro
  1 sibling, 0 replies; 77+ messages in thread
From: Figo.zhang @ 2010-11-10 14:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds

On Tue, 2010-11-09 at 13:16 -0800, David Rientjes wrote:
> On Tue, 9 Nov 2010, Figo.zhang wrote:
> 
> >  
> > the victim should not directly access hardware devices like Xorg server,
> > because the hardware could be left in an unpredictable state, although 
> > user-application can set /proc/pid/oom_score_adj to protect it. so i think
> > those processes should get 3% bonus for protection.
> > 
> 
> The logic here is wrong: if killing these tasks can leave hardware in an 
> unpredictable state (and that state is presumably harmful), then they 
> should be completely immune from oom killing since you're still leaving 
> them exposed here to be killed.

We give processes with hardware access a bonus for protection. The goal is
to avoid selecting them to be killed, as far as possible.


> 
> So the question that needs to be answered is: why do these threads deserve 
> to use 3% more memory (not >4%) than others without getting killed?  If 
> there was some evidence that these threads have a certain quantity of 
> memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I 
> have no objection, but that's going to be expressed in a memory quantity 
> not a percentage as you have here.
> 
> The CAP_SYS_ADMIN heuristic has a background: it is used in the oom killer 
> because we have used the same 3% in __vm_enough_memory() for a long time 
> and we want consistency amongst the heuristics.  Adding additional bonuses 
> with arbitrary values like 3% of memory for things like CAP_SYS_RAWIO 
> makes the heuristic less predictable and moves us back toward the old 
> heuristic which was almost entirely arbitrary.


Yes, I think it would be better to divide the badness score of the protected
processes by 4, as the old heuristic did.





^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-09 12:24 ` [PATCH v2]mm/oom-kill: " Figo.zhang
  2010-11-09 21:16   ` David Rientjes
@ 2010-11-10 15:14   ` Figo.zhang
  2010-11-10 15:24     ` Figo.zhang
  1 sibling, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-10 15:14 UTC (permalink / raw)
  To: lkml
  Cc: KOSAKI Motohiro, linux-mm, Andrew Morton, rientjes,
	Linus Torvalds, Figo.zhang

The victim should not be a process that directly accesses hardware devices,
like the Xorg server, because the hardware could be left in an unpredictable
state, although a user application can set /proc/pid/oom_score_adj to protect
it. So I think those processes should get a bonus for protection.

In v2, fix the incorrect comment.
In v3, change the bonus to dividing the badness score by 4, like the old
heuristic, for protection. We just want the oom_killer to avoid selecting
Root/RESOURCE/RAWIO processes as far as possible.

Suppose a user process A such as the email client "evolution" and a process B
with direct hardware access such as "Xorg" have consumed equal memory (the
badness score is the same). Which process do you want to kill? With the new
heuristic, it will kill process B, but in reality we want to kill process A.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
mm/oom_kill.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4029583..f43d759 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 		points -= 30;

 	/*
+	 * Root and direct hareware access processor are usually more 
+	 * important, so them should get bonus for protection. 
+	 */
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
+		points /= 4;
+
+	/*
 	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
 	 * either completely disable oom killing or always prefer a certain
 	 * task.

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-10 15:14   ` [PATCH v3]mm/oom-kill: " Figo.zhang
@ 2010-11-10 15:24     ` Figo.zhang
  2010-11-10 21:00       ` David Rientjes
                         ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Figo.zhang @ 2010-11-10 15:24 UTC (permalink / raw)
  To: lkml
  Cc: KOSAKI Motohiro, linux-mm, Andrew Morton, rientjes,
	Linus Torvalds, Figo.zhang

The victim should not be a process that directly accesses hardware devices,
like the Xorg server, because the hardware could be left in an unpredictable
state, although a user application can set /proc/pid/oom_score_adj to protect
it. So I think those processes should get a bonus for protection.

In v2, fix the incorrect comment.
In v3, change the bonus to dividing the badness score by 4, like the old
heuristic, for protection. We just want the oom_killer to avoid selecting
Root/RESOURCE/RAWIO processes as far as possible.

Suppose a user process A such as the email client "evolution" and a process B
with direct hardware access such as "Xorg" have consumed equal memory (the
badness score is the same). Which process do you want to kill? With the new
heuristic, it will kill process B, but in reality we want to kill process A.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/oom_kill.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4029583..f43d759 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 		points -= 30;

 	/*
+	 * Root and direct hareware access processes are usually more 
+	 * important, so they should get bonus for protection. 
+	 */
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
+		points /= 4;
+
+	/*
 	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
 	 * either completely disable oom killing or always prefer a certain
 	 * task.

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]oom-kill: CAP_SYS_RESOURCE should get bonus
  2010-11-10 14:38                 ` Figo.zhang
@ 2010-11-10 20:50                   ` David Rientjes
  0 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-10 20:50 UTC (permalink / raw)
  To: Figo.zhang
  Cc: Alan Cox, KOSAKI Motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton

On Wed, 10 Nov 2010, Figo.zhang wrote:

> > I didn't check earlier, but CAP_SYS_RESOURCE hasn't had a place in the oom 
> > killer's heuristic in over five years, so what regression are we referring 
> > to in this thread?  These tasks already have full control over 
> > oom_score_adj to modify its oom killing priority in either direction.
> 
> yes, it can control by user, but is it all system administrators will
> adjust all of the processes by each one and one in real word? suppose if
> it has thousands of processes in database system.
> 

Yes, the kernel can't possibly know the oom killing priorities of your 
task so if you have such requirements then you must use the userspace 
tunable.

> > Futhermore, the heuristic was entirely rewritten, but I wouldn't consider 
> > all the old factors such as cputime and nice level being removed as 
> > "regressions" since the aim was to make it more predictable and more 
> > likely to kill a large consumer of memory such that we don't have to kill 
> > more tasks in the near future.
> 
> the goal of oom_killer is to find out the best process to kill, the one
> should be:
> 1. it is a most memory comsuming process in all processes
> 2. and it was a proper process to kill, which will not be let system 
> into unpredictable state as possible.
> 

There are four types of tasks that are improper to kill and this is 
relatively unchanged in the past five years of the oom killer:

 - init,

 - kthreads,

 - tasks that are bound to a disjoint set of cpuset mems or mempolicy 
   nodes that are not oom, and

 - those disabled from oom killing by userspace.

That does not include CAP_SYS_RESOURCE, nor CAP_SYS_ADMIN.  Your argument 
about killing some tasks that have CAP_SYS_RESOURCE leaving hardware in an 
unpredictable state isn't even addressed by your own patch, you only give 
them a 3% memory bonus so they are still eligible.
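
The four exclusions listed above can be modelled as a simple eligibility
check.  This is a userspace sketch only, not the kernel's code: struct
model_task and model_oom_eligible() are hypothetical names, and the value
-1000 for the "disabled by userspace" case is taken from the oom_score_adj
range quoted in the patch context.

```c
#include <stdbool.h>

/* Lowest oom_score_adj value; the patch context gives the range -1000..+1000. */
#define MODEL_OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical, simplified view of a task; not the kernel's task_struct. */
struct model_task {
	bool is_init;           /* the init process */
	bool is_kthread;        /* a kernel thread */
	bool shares_oom_nodes;  /* can allocate from the nodes that are oom */
	int  oom_score_adj;     /* userspace tunable */
	bool cap_sys_admin;     /* present only to show it is not consulted */
	bool cap_sys_resource;  /* present only to show it is not consulted */
};

/* Returns true if the task may be chosen as an oom victim in this model. */
static bool model_oom_eligible(const struct model_task *t)
{
	if (t->is_init)
		return false;
	if (t->is_kthread)
		return false;
	/* Killing a task on disjoint nodes would not free the memory needed. */
	if (!t->shares_oom_nodes)
		return false;
	/* Userspace has disabled oom killing for this task. */
	if (t->oom_score_adj == MODEL_OOM_SCORE_ADJ_MIN)
		return false;

	/* Capabilities play no part in eligibility. */
	return true;
}
```

In this model a task holding CAP_SYS_RESOURCE stays eligible unless userspace
sets its oom_score_adj to the minimum.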

As mentioned previously, for this patch to make sense, you would need to 
show that CAP_SYS_RESOURCE equates to 3% of the available memory's 
capacity for a task.  I don't believe that evidence has been presented.  
This has nothing to do with preventing these threads from being killed (at 
the risk of possibly panicking the machine) since your patch doesn't do 
that.

> if a user process and a process such email cleint "evolution" with
> ditecly hareware access such as "Xorg", they have eat the equal memory,
> so which process are you want to kill?
> 

Both have equal oom killing priority according to the heuristic if they 
are not run by root.  If you would like to protect Xorg, then you need to 
use the userspace tunable to protect it just like everything else does.  
This is completely unchanged from the oom killer rewrite.

If you actually have a problem that you're reporting, however, it would 
probably be better to show the oom killer log from that event and let us 
address it instead of introducing arbitrary heuristics into something 
which aims to be as predictable as possible.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-10 15:24     ` Figo.zhang
@ 2010-11-10 21:00       ` David Rientjes
  2010-11-14  5:21       ` KOSAKI Motohiro
  2011-01-04  7:51       ` [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus Figo.zhang
  2 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-10 21:00 UTC (permalink / raw)
  To: Figo.zhang
  Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds,
	Figo.zhang

On Wed, 10 Nov 2010, Figo.zhang wrote:

> the victim should not directly access hardware devices like Xorg server,
> because the hardware could be left in an unpredictable state, although 
> user-application can set /proc/pid/oom_score_adj to protect it. so i think
> those processes should get bonus for protection.
> 

Again, this argument doesn't work: if killing the task leaves hardware in 
an unpredictable state (and that's presumably harmful), then they 
shouldn't be killed at all.

Please show why CAP_SYS_RESOURCE equates to 3% additional memory for such 
tasks.

CAP_SYS_RESOURCE allows those threads to override resource limits, so 
these have potentially unbounded amounts of memory usage.  Thus, they may 
have the highest memory usage on the machine and now your patch has caused 
other innocent tasks to be killed before this is actually targeted.  
That's a bad result.  Why do we need this type of hack in the oom killer 
when these threads have the privilege to modify oom killing priorities for 
all tasks on the system?  Laziness, at the cost of a less predictable 
heuristic?

Why aren't you doing the same change for __vm_enough_memory() for LSMs?

> in v2, fix the incorrect comment.
> in v3, change the divided the badness score by 4, like old heuristic for protection. we just
> want the oom_killer don't select Root/RESOURCE/RAWIO process as possible.
> 
> suppose that if a user process A such as email cleint "evolution" and a process B with
> ditecly hareware access such as "Xorg", they have eat the equal memory (the badness score is 
> the same),so which process are you want to kill? so in new heuristic, it will kill the process B.
> but in reality, we want to kill process A.
> 

Then you need to protect process B accordingly and since it has 
CAP_SYS_RESOURCE it can easily do that on its own or the admin can protect 
Xorg.

> Signed-off-by: Figo.zhang <figo1802@gmail.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Unless you did this in private, I didn't see KOSAKI-san's reviewed-by line 
for this change and it is drastically different from what you've proposed 
before.

> ---
> mm/oom_kill.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4029583..f43d759 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  		points -= 30;
>  
>  	/*
> +	 * Root and direct hareware access processes are usually more 
> +	 * important, so they should get bonus for protection. 
> +	 */
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
> +		points /= 4;
> +

What on earth?  So now CAP_SYS_ADMIN gets a 3% bonus in the if-clause 
above this, then we divide a percentage of memory use by 4?  What does 
that mean AT ALL?

And now you've thrown CAP_SYS_RAWIO in there without any mention in the 
changelog?

Are you just trying to introduce all the old arbitrary heuristics from 
before the rewrite back into the oom killer like this?

Do you actually have a log from an event where the oom killer targeted the 
incorrect task?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-09 21:16   ` David Rientjes
  2010-11-10 14:48     ` Figo.zhang
@ 2010-11-14  5:07     ` KOSAKI Motohiro
  2010-11-14 21:29       ` David Rientjes
  1 sibling, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:07 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton,
	Linus Torvalds

> > the victim should not directly access hardware devices like Xorg server,
> > because the hardware could be left in an unpredictable state, although 
> > user-application can set /proc/pid/oom_score_adj to protect it. so i think
> > those processes should get 3% bonus for protection.
> > 
> 
> The logic here is wrong: if killing these tasks can leave hardware in an 
> unpredictable state (and that state is presumably harmful), then they 
> should be completely immune from oom killing since you're still leaving 
> them exposed here to be killed.
> 
> So the question that needs to be answered is: why do these threads deserve 
> to use 3% more memory (not >4%) than others without getting killed?  If 
> there was some evidence that these threads have a certain quantity of 
> memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I 
> have no objection, but that's going to be expressed in a memory quantity 
> not a percentage as you have here.

3% was chosen by you :-/


> The CAP_SYS_ADMIN heuristic has a background: it is used in the oom killer 
> because we have used the same 3% in __vm_enough_memory() for a long time 
> and we want consistency amongst the heuristics.  Adding additional bonuses 
> with arbitrary values like 3% of memory for things like CAP_SYS_RAWIO 
> makes the heuristic less predictable and moves us back toward the old 
> heuristic which was almost entirely arbitrary.

That's bogus. __vm_enough_memory() tracks virtual address space. The
oom-killer doesn't. They're unrelated.


> Now before KOSAKI-san comes out and says the old heuristic considered 
> CAP_SYS_RAWIO and the new one does not so it _must_ be a regression: the 
> old heuristic also divided the badness score by 4 for that capability as a 
> completely arbitrary value (just like 3% is here).  Other traits like 
> runtime and nice levels were also removed from the heuristic.  What needs 
> to be shown is that CAP_SYS_RAWIO requires additional memory just to run 
> or we should neglect to free 3% of memory, which could be gigabytes, 
> because it has this trait.

The old background is very simple and cleaner.

CAP_SYS_RESOURCE means the process has the privilege of using more resources,
so the oom-killer gave it an additional bonus.

CAP_SYS_RAWIO means the process has a direct hardware access privilege
(e.g. X.org, RDB), so killing it might make the system crash.


On a separate note, some doubt whether the 4x bonus is good or not, but 3%
has the same problem.





^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-10 15:24     ` Figo.zhang
  2010-11-10 21:00       ` David Rientjes
@ 2010-11-14  5:21       ` KOSAKI Motohiro
  2010-11-14 21:33         ` David Rientjes
  2011-01-04  7:51       ` [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus Figo.zhang
  2 siblings, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:21 UTC (permalink / raw)
  To: Figo.zhang
  Cc: kosaki.motohiro, lkml, linux-mm, Andrew Morton, rientjes,
	Linus Torvalds, Figo.zhang

> the victim should not directly access hardware devices like Xorg server,
> because the hardware could be left in an unpredictable state, although 
> user-application can set /proc/pid/oom_score_adj to protect it. so i think
> those processes should get bonus for protection.
> 
> in v2, fix the incorrect comment.
> in v3, change the divided the badness score by 4, like old heuristic for protection. we just
> want the oom_killer don't select Root/RESOURCE/RAWIO process as possible.
> 
> suppose that if a user process A such as email cleint "evolution" and a process B with
> ditecly hareware access such as "Xorg", they have eat the equal memory (the badness score is 
> the same),so which process are you want to kill? so in new heuristic, it will kill the process B.
> but in reality, we want to kill process A.
> 
> Signed-off-by: Figo.zhang <figo1802@gmail.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Sorry for the delay. I've sent a complete revert patch to Linus. I believe
it will make your headache disappear. I'm sorry that our development
caused you harm. We really didn't want that.

Thanks.


> ---
> mm/oom_kill.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4029583..f43d759 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  		points -= 30;
>  
>  	/*
> +	 * Root and direct hareware access processes are usually more 
> +	 * important, so they should get bonus for protection. 
> +	 */
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
> +		points /= 4;
> +
> +	/*
>  	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
>  	 * either completely disable oom killing or always prefer a certain
>  	 * task.
> 
> 




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-14  5:07     ` KOSAKI Motohiro
@ 2010-11-14 21:29       ` David Rientjes
  2010-11-15  1:24         ` KOSAKI Motohiro
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-14 21:29 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Figo.zhang, lkml, linux-mm, Andrew Morton, Linus Torvalds

On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:

> > So the question that needs to be answered is: why do these threads deserve 
> > to use 3% more memory (not >4%) than others without getting killed?  If 
> > there was some evidence that these threads have a certain quantity of 
> > memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I 
> > have no objection, but that's going to be expressed in a memory quantity 
> > not a percentage as you have here.
> 
> 3% is choosed by you :-/
> 

No, 3% was chosen in __vm_enough_memory() for LSMs as the comment in the 
oom killer shows:

        /*
         * Root processes get 3% bonus, just like the __vm_enough_memory()
         * implementation used by LSMs.
         */

and is described in Documentation/filesystems/proc.txt.

I think in cases of heuristics like this where we obviously want to give 
some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses 
given elsewhere in the kernel.

> Old background is very simple and cleaner. 
> 

The old heuristic divided the arbitrary badness score by 4 with 
CAP_SYS_RESOURCE.  The new heuristic doesn't consider it.

How is that more clean?

> CAP_SYS_RESOURCE mean the process has a privilege of using more resource.
> then, oom-killer gave it additonal bonus.
> 

As a side-effect of being given more resources to allocate, those 
applications are relatively unbounded in terms of memory consumption to 
other tasks.  Thus, it's possible that these applications are using a 
massive amount of memory (say, 75%) and now with the proposed change a 
task using 25% of memory would be killed instead.  This increases the 
likelihood that the CAP_SYS_RESOURCE thread will have to be killed 
eventually, anyway, and the goal is to kill as few tasks as possible to 
free sufficient amount of memory.

Since threads having CAP_SYS_RESOURCE have full control over their 
oom_score_adj, they can take the additional precautions to protect 
themselves if necessary.  It doesn't need to be a part of the heuristic to 
bias these tasks which will lead to the undesired result described above 
by default rather than intentionally from userspace.
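
The 75% versus 25% case described above can be worked through with a small
userspace model.  This is a sketch under stated assumptions, not the kernel's
oom_badness(): it assumes a score of 0..1000 representing a task's share of
memory (inferred from "points -= 30" being called a 3% bonus), and the names
model_badness(), privileged_is_victim() and the bonus_policy values are
hypothetical.

```c
/* Which capability adjustment to model; the names are hypothetical. */
enum bonus_policy {
	BONUS_NONE,       /* no capability adjustment */
	BONUS_3_PERCENT,  /* points -= 30, the existing root bonus */
	BONUS_DIV_4       /* points /= 4, as in the v3 patch */
};

/*
 * Simplified badness on a 0..1000 scale: the task's share of memory in
 * tenths of a percent, adjusted if the task is privileged.
 */
static int model_badness(int mem_permille, int privileged,
			 enum bonus_policy policy)
{
	int points = mem_permille;

	if (privileged) {
		if (policy == BONUS_3_PERCENT)
			points -= 30;
		else if (policy == BONUS_DIV_4)
			points /= 4;
	}

	return points < 0 ? 0 : points;
}

/* Returns 1 if the privileged task is the victim, 0 if the other task is. */
static int privileged_is_victim(int priv_permille, int other_permille,
				enum bonus_policy policy)
{
	return model_badness(priv_permille, 1, policy) >
	       model_badness(other_permille, 0, policy);
}
```

With the privileged task at 750 and the other at 250, the 3% bonus gives
720 against 250 and the large consumer is still chosen; dividing by 4 gives
187 against 250, so the smaller task is killed instead.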

> CAP_SYS_RAWIO mean the process has a direct hardware access privilege
> (eg X.org, RDB). and then, killing it might makes system crash.
> 

Then you would want to explicitly filter these tasks from oom kill just as 
OOM_SCORE_ADJ_MIN works rather than giving them a memory quantity bonus.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-14  5:21       ` KOSAKI Motohiro
@ 2010-11-14 21:33         ` David Rientjes
  2010-11-15  3:26           ` [PATCH] Revert oom rewrite series Figo.zhang
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-14 21:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Figo.zhang, lkml, linux-mm, Andrew Morton, Linus Torvalds, Figo.zhang

On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:

> > the victim should not directly access hardware devices like Xorg server,
> > because the hardware could be left in an unpredictable state, although 
> > user-application can set /proc/pid/oom_score_adj to protect it. so i think
> > those processes should get bonus for protection.
> > 
> > in v2, fix the incorrect comment.
> > in v3, change the divided the badness score by 4, like old heuristic for protection. we just
> > want the oom_killer don't select Root/RESOURCE/RAWIO process as possible.
> > 
> > suppose that if a user process A such as email cleint "evolution" and a process B with
> > ditecly hareware access such as "Xorg", they have eat the equal memory (the badness score is 
> > the same),so which process are you want to kill? so in new heuristic, it will kill the process B.
> > but in reality, we want to kill process A.
> > 
> > Signed-off-by: Figo.zhang <figo1802@gmail.com>
> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> Sorry for the delay. I've sent completely revert patch to linus. It will
> disappear your headache, I believe. I'm sorry that our development
> caused your harm. We really don't want it.
> 

Oh please, your dramatics are getting better and better.

Figo.zhang never described a problem that was being addressed but rather 
proposed several different variants of a patch (some with CAP_SYS_ADMIN, 
some with CAP_SYS_RESOURCE, some with CAP_SYS_RAWIO, some with a 
combination, some with a 3% bonus, some with an order-of-2 bonus, etc) to 
return the same heuristic used in the old oom killer.  I asked several 
times to show the oom killer log from the problematic behavior and none 
were presented.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-14 21:29       ` David Rientjes
@ 2010-11-15  1:24         ` KOSAKI Motohiro
  2010-11-15 10:03           ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-15  1:24 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton,
	Linus Torvalds

> On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:
> 
> > > So the question that needs to be answered is: why do these threads deserve 
> > > to use 3% more memory (not >4%) than others without getting killed?  If 
> > > there was some evidence that these threads have a certain quantity of 
> > > memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I 
> > > have no objection, but that's going to be expressed in a memory quantity 
> > > not a percentage as you have here.
> > 
> > 3% is choosed by you :-/
> > 
> 
> No, 3% was chosen in __vm_enough_memory() for LSMs as the comment in the 
> oom killer shows:
> 
>         /*
>          * Root processes get 3% bonus, just like the __vm_enough_memory()
>          * implementation used by LSMs.
>          */
> 
> and is described in Documentation/filesystems/proc.txt.
> 
> I think in cases of heuristics like this where we obviously want to give 
> some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses 
> given elsewhere in the kernel.

Keep the comparison apples to apples. vm_enough_memory() accounts _virtual_
memory. The oom-killer tries to free _physical_ memory. They're unrelated.


> 
> > Old background is very simple and cleaner. 
> > 
> 
> The old heuristic divided the arbitrary badness score by 4 with 
> CAP_SYS_RESOURCE.  The new heuristic doesn't consider it.
> 
> How is that more clean?
> 
> > CAP_SYS_RESOURCE mean the process has a privilege of using more resource.
> > then, oom-killer gave it additonal bonus.
> > 
> 
> As a side-effect of being given more resources to allocate, those 
> applications are relatively unbounded in terms of memory consumption to 
> other tasks.  Thus, it's possible that these applications are using a 
> massive amount of memory (say, 75%) and now with the proposed change a 
> task using 25% of memory would be killed instead.  This increases the 
> liklihood that the CAP_SYS_RESOURCE thread will have to be killed 
> eventually, anyway, and the goal is to kill as few tasks as possible to 
> free sufficient amount of memory.

You are talking about two differences at once: 3% vs 4x, and
CAP_SYS_RESOURCE vs CAP_SYS_ADMIN.

Please keep comparing apples to apples.


> 
> Since threads having CAP_SYS_RESOURCE have full control over their 
> oom_score_adj, they can take the additional precautions to protect 
> themselves if necessary.  It doesn't need to be a part of the heuristic to 
> bias these tasks which will lead to the undesired result described above 
> by default rather than intentionally from userspace.
> 
> > CAP_SYS_RAWIO mean the process has a direct hardware access privilege
> > (eg X.org, RDB). and then, killing it might makes system crash.
> > 
> 
> Then you would want to explicitly filter these tasks from oom kill just as 
> OOM_SCORE_ADJ_MIN works rather than giving them a memory quantity bonus.

No. Why should userland have to recover from your mistake?





^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-14 21:33         ` David Rientjes
@ 2010-11-15  3:26           ` Figo.zhang
  2010-11-15 10:14             ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2010-11-15  3:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton,
	Linus Torvalds

 >Nothing to say, really.  Seems each time we're told about a bug or a
 >regression, David either fixes the bug or points out why it wasn't a
 >bug or why it wasn't a regression or how it was a deliberate behaviour
 >change for the better.

 >I just haven't seen any solid reason to be concerned about the state of
 >the current oom-killer, sorry.

 >I'm concerned that you're concerned!  A lot.  When someone such as
 >yourself is unhappy with part of MM then I sit up and pay attention.
 >But after all this time I simply don't understand the technical issues
 >which you're seeing here.

We are just talking about the oom-killer's technical issues.

I have doubts about a new rewrite whose author cannot provide evidence and
experimental results. Why did you do that? What is the prominent change
in your new algorithm?

as KOSAKI Motohiro said, "you removed CAP_SYS_RESOURCE condition with 
ZERO explanation".

David just said to please use the userspace tunable oom_score_adj for 
protection. but may i ask some questions:

1. what is the innovation in your new algorithm? the old one offered the 
same kind of user tunable, oom_adj.

2. if a server like a db-server/financial-server has a huge number of 
important processes (such as root/hardware access processes) that want 
protection, you leave it to the administrator to find out which processes 
should be protected. you will drive the financial-server administrator 
crazy!! and lose so much money!! ^~^

3. i see your email in LKML, you just said
"I have repeatedly said that the oom killer no longer kills KDE when run 
on my desktop in the presence of a memory hogging task that was written 
specifically to oom the machine."
http://thread.gmane.org/gmane.linux.kernel.mm/48998

so you just tested your new oom_killer algorithm on your desktop with KDE. 
have you provided the details of how you did the test? can anyone repeat 
the experiment and get the same result as in your comment?

as KOSAKI Motohiro said, in the real world we have to run 5-6 mental 
simulations: embedded, desktop, web server, db server, hpc, finance. 
Different workloads certainly make a big impact. have you done those 
experiments?

i think that technology should be based on experiment, not on imagination.

Best,
Figo.zhang

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-15  1:24         ` KOSAKI Motohiro
@ 2010-11-15 10:03           ` David Rientjes
  2010-11-23  7:16             ` KOSAKI Motohiro
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-15 10:03 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Figo.zhang, lkml, linux-mm, Andrew Morton, Linus Torvalds

On Mon, 15 Nov 2010, KOSAKI Motohiro wrote:

> > I think in cases of heuristics like this where we obviously want to give 
> > some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses 
> > given elsewhere in the kernel.
> 
> Keep the comparison apples to apples. vm_enough_memory() accounts _virtual_
> memory. The oom-killer tries to free _physical_ memory. It's unrelated.
> 

It's not unrelated, the LSM function gives an arbitrary 3% bonus to 
CAP_SYS_ADMIN.  Such threads should also be preferred in the oom killer 
over other threads since they tend to be more important but not an overly 
drastic bias such that they don't get killed when using an egregious 
amount of memory.  So in selecting a small percentage of memory that tends 
to be a significant bias but not overwhelming, I went with the 3% found 
elsewhere in the kernel.  __vm_enough_memory() doesn't have that 
preference for any scientifically calculated reason, it's a heuristic just 
like oom_badness().

> > > CAP_SYS_RAWIO means the process has a direct hardware access privilege
> > > (eg X.org, RDB), so killing it might make the system crash.
> > > 
> > 
> > Then you would want to explicitly filter these tasks from oom kill just as 
> > OOM_SCORE_ADJ_MIN works rather than giving them a memory quantity bonus.
> 
> No. Why should userland have to recover from your mistake?
> 

You just said killing any CAP_SYS_RAWIO task may make the system crash, so 
presuming that you don't want the system to crash, you are suggesting we 
should make these threads completely immune?  That's never been the case 
(and isn't for oom_kill_allocating_task, either), so there's no history 
you can draw from to support your argument.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15  3:26           ` [PATCH] Revert oom rewrite series Figo.zhang
@ 2010-11-15 10:14             ` David Rientjes
  2010-11-15 10:57               ` Alan Cox
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-15 10:14 UTC (permalink / raw)
  To: Figo.zhang
  Cc: KOSAKI Motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton,
	Linus Torvalds

On Mon, 15 Nov 2010, Figo.zhang wrote:

> i doubt the new rewrite because the author cannot provide any evidence or
> experiment results. why did you do that? what is the prominent change in your
> new algorithm?
> 
> as KOSAKI Motohiro said, "you removed CAP_SYS_RESOURCE condition with ZERO
> explanation".
> 
> David just said to please use the userspace tunable oom_score_adj for
> protection. but may i ask some questions:
> 
> 1. what is the innovation in your new algorithm? the old one offered the same
> kind of user tunable, oom_adj.
> 

The goal was to make the oom killer heuristic as predictable as possible 
and to kill the most memory-hogging task to avoid having to recall it and 
needlessly kill several tasks.

The goal behind oom_score_adj vs. oom_adj was for several reasons, as 
pointed out before:

 - give it a unit (proportion of available memory), oom_adj had no unit,

 - allow it to work on a linear scale for more control over 
   prioritization, oom_adj had an exponential scale,

 - give it a much higher resolution so it can be fine-tuned, it works with 
   a granularity of 0.1% of memory (~128M on a 128G machine), and

 - allow it to describe the oom killing priority of a task regardless of 
   its cpuset attachment, mempolicy, or memcg, or when their respective
   limits change.
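
The difference in scale listed above can be sketched in userspace C. This
is only a rough model, not kernel code: the function names
adj_unit_bytes(), new_adjust() and old_adjust() are hypothetical, and
old_adjust() is a simplified rendering of the bit-shift scaling that the
pre-rewrite badness() applied for oom_adj.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by one oom_score_adj unit, i.e. 0.1% of memory
 * (hypothetical helper, for illustration only). */
uint64_t adj_unit_bytes(uint64_t total_bytes)
{
	return total_bytes / 1000;
}

/* New interface: linear. Each step moves the score by one point
 * on the 0..1000 scale, whatever the current score is. */
long new_adjust(long points, int oom_score_adj)
{
	return points + oom_score_adj;
}

/* Old interface: exponential. Each oom_adj step doubles or halves
 * the score, so the effect depends on the score it is applied to. */
unsigned long old_adjust(unsigned long points, int oom_adj)
{
	/* -17 (OOM_DISABLE) is handled separately and not modelled here */
	assert(oom_adj >= -16 && oom_adj <= 15);

	if (oom_adj > 0) {
		if (!points)
			points = 1;
		return points << oom_adj;
	}
	return points >> -oom_adj;
}
```

With these helpers, adj_unit_bytes(128ULL << 30) gives the per-unit
granularity on a 128G machine, which is where the "~128M" figure above
comes from.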

> 2. if a server like a db-server/financial-server has a huge number of
> important processes (such as root/hardware access processes) that want
> protection, you leave it to the administrator to find out which processes
> should be protected. you will drive the financial-server administrator
> crazy!! and lose so much money!! ^~^
> 

You have full control over disabling a task from being considered with 
oom_score_adj just like you did with oom_adj.  Since oom_adj is 
deprecated for two years, you can even use the old interface until then.
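
As an illustration of the tunable mentioned above, here is a minimal
sketch of setting it from C. The file /proc/<pid>/oom_score_adj and its
-1000..1000 range are the real interface; the helper names
oom_score_adj_path() and set_oom_score_adj() are made up for this
example. Lowering the value normally requires CAP_SYS_RESOURCE.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define OOM_SCORE_ADJ_MIN	(-1000)	/* task is never selected */
#define OOM_SCORE_ADJ_MAX	1000

/* Build "/proc/<pid>/oom_score_adj" in buf.
 * Returns 0 on success, -1 if buf is too small. (Hypothetical helper.) */
int oom_score_adj_path(pid_t pid, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/proc/%ld/oom_score_adj", (long)pid);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write value to the task's oom_score_adj file.
 * Returns 0 on success, -1 on a bad value or any I/O error. */
int set_oom_score_adj(pid_t pid, int value)
{
	char path[64];
	FILE *f;
	int ok;

	if (value < OOM_SCORE_ADJ_MIN || value > OOM_SCORE_ADJ_MAX)
		return -1;
	if (oom_score_adj_path(pid, path, sizeof(path)) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)	/* the write error shows up at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

A daemon that must not be killed would call something like
set_oom_score_adj(getpid(), OOM_SCORE_ADJ_MIN) at startup and check the
return value, since the write fails without sufficient privilege.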

> 3. i see your email in LKML, you just said
> "I have repeatedly said that the oom killer no longer kills KDE when run on my
> desktop in the presence of a memory hogging task that was written specifically
> to oom the machine."
> http://thread.gmane.org/gmane.linux.kernel.mm/48998
> 
> so you just tested your new oom_killer algorithm on your desktop with KDE.
> have you provided the details of how you did the test? can anyone repeat
> the experiment and get the same result as in your comment?
> 

Xorg tends to be killed less because of the change to the heuristic's 
baseline, which is now based on rss and swap instead of total_vm.  This is 
separate from the issues you list above, but is a benefit to the oom 
killer that desktop users especially will notice.  I, personally, am 
interested more in the server market and that's why I looked for a more 
robust userspace tunable that would still be applicable when things like 
cpusets have a node added or removed.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 10:14             ` David Rientjes
@ 2010-11-15 10:57               ` Alan Cox
  2010-11-15 20:54                 ` David Rientjes
  2010-11-23  7:16                 ` KOSAKI Motohiro
  0 siblings, 2 replies; 77+ messages in thread
From: Alan Cox @ 2010-11-15 10:57 UTC (permalink / raw)
  To: David Rientjes
  Cc: Figo.zhang, KOSAKI Motohiro, Figo.zhang, lkml, linux-mm,
	Andrew Morton, Linus Torvalds

> The goal was to make the oom killer heuristic as predictable as possible 
> and to kill the most memory-hogging task to avoid having to recall it and 
> needlessly kill several tasks.

Meta question - why is that a good thing. In a desktop environment it's
frequently wrong, in a server environment it is often wrong. We had this
before where people spend months fiddling with the vm and make it work
slightly differently and it suits their workload, then other workloads go
downhill. Then the cycle repeats.

> You have full control over disabling a task from being considered with 
> oom_score_adj just like you did with oom_adj.  Since oom_adj is 
> deprecated for two years, you can even use the old interface until then.

Which changeset added it to the Documentation directory as deprecated ?

Alan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 10:57               ` Alan Cox
@ 2010-11-15 20:54                 ` David Rientjes
  2010-11-23  7:16                 ` KOSAKI Motohiro
  1 sibling, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-15 20:54 UTC (permalink / raw)
  To: Alan Cox
  Cc: Figo.zhang, KOSAKI Motohiro, Figo.zhang, lkml, linux-mm,
	Andrew Morton, Linus Torvalds

On Mon, 15 Nov 2010, Alan Cox wrote:

> > The goal was to make the oom killer heuristic as predictable as possible 
> > and to kill the most memory-hogging task to avoid having to recall it and 
> > needlessly kill several tasks.
> 
> Meta question - why is that a good thing. In a desktop environment it's
> frequently wrong, in a server environment it is often wrong. We had this
> before where people spend months fiddling with the vm and make it work
> slightly differently and it suits their workload, then other workloads go
> downhill. Then the cycle repeats.
> 

Most of the arbitrary heuristics were removed from oom_badness(), things 
like nice level, runtime, CAP_SYS_RESOURCE, etc., so that we only consider 
the rss and swap usage of each application in comparison to each other 
when deciding which task to kill.  We give root tasks a 3% bonus since 
they tend to be more important to the productivity or uptime of the 
machine, which did exist -- albeit with a more dramatic impact -- in the 
old heuristic.

You'll find that the new heuristic always kills the task consuming the 
most amount of rss unless influenced by userspace via the tunables (or 
within 3% of root tasks).

We always want to kill the most memory-hogging task because it avoids 
needlessly killing additional tasks when we must immediately recall the 
oom killer because we continue to allocate memory.  If that task happens 
to be of vital importance to userspace, then the user has full control 
over tuning the oom killer priorities in such circumstances.
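
To make the description above concrete, here is a small userspace model
of the scoring. It is a sketch under stated assumptions, not the kernel's
oom_badness(): the name model_oom_badness() is hypothetical, is_root
stands in for the CAP_SYS_ADMIN capability check, and total_pages stands
in for whatever amount of memory the kernel considers available to the
task.

```c
#include <assert.h>

#define OOM_SCORE_ADJ_MIN	(-1000)
#define ROOT_BONUS		30	/* 3% on the 0..1000 scale */

/* Model of the rewritten heuristic: the score is the task's share of
 * memory (rss + swap) in units of 0.1%, biased by the root bonus and
 * by the userspace tunable, then clamped to 0..1000. */
long long model_oom_badness(long long rss_pages, long long swap_pages,
			    long long total_pages, int is_root,
			    int oom_score_adj)
{
	long long points;

	assert(total_pages > 0);

	/* Tasks set to the minimum are never considered. */
	if (oom_score_adj == OOM_SCORE_ADJ_MIN)
		return 0;

	points = (rss_pages + swap_pages) * 1000 / total_pages;

	if (is_root)
		points -= ROOT_BONUS;

	points += oom_score_adj;

	if (points < 0)
		return 0;
	return points < 1000 ? points : 1000;
}
```

In this model a root task is only spared in favour of a user task when
the two are within 3% of memory of each other, which matches the "within
3% of root tasks" exception above.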

> > You have full control over disabling a task from being considered with 
> > oom_score_adj just like you did with oom_adj.  Since oom_adj is 
> > deprecated for two years, you can even use the old interface until then.
> 
> Which changeset added it to the Documentation directory as deprecated ?
> 

51b1bd2a was the actual change that deprecated it, which was a direct 
follow-up to a63d83f4 which actually obsoleted it.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 10:57               ` Alan Cox
  2010-11-15 20:54                 ` David Rientjes
@ 2010-11-23  7:16                 ` KOSAKI Motohiro
  1 sibling, 0 replies; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: kosaki.motohiro, David Rientjes, Figo.zhang, Figo.zhang, lkml,
	linux-mm, Andrew Morton, Linus Torvalds


sorry for the delay.

> > The goal was to make the oom killer heuristic as predictable as possible 
> > and to kill the most memory-hogging task to avoid having to recall it and 
> > needlessly kill several tasks.
> 
> Meta question - why is that a good thing. In a desktop environment it's
> frequently wrong, in a server environment it is often wrong. We had this
> before where people spend months fiddling with the vm and make it work
> slightly differently and it suits their workload, then other workloads go
> downhill. Then the cycle repeats.
> 
> > You have full control over disabling a task from being considered with 
> > oom_score_adj just like you did with oom_adj.  Since oom_adj is 
> > deprecated for two years, you can even use the old interface until then.
> 
> Which changeset added it to the Documentation directory as deprecated ?

It's insufficient.
a63d83f427fbce97a6cea0db2e64b0eb8435cd10 (oom: badness heuristic rewrite)
introduced a lot of incompatibility to oom_adj and oom_score.
Therefore I would suggest a full revert, then resubmitting patches that
cherry-pick the painless pieces.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-15 10:03           ` David Rientjes
@ 2010-11-23  7:16             ` KOSAKI Motohiro
  2010-11-28  1:36               ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton,
	Linus Torvalds

> On Mon, 15 Nov 2010, KOSAKI Motohiro wrote:
> 
> > > I think in cases of heuristics like this where we obviously want to give 
> > > some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses 
> > > given elsewhere in the kernel.
> > 
> > Keep the comparison apples to apples. vm_enough_memory() accounts _virtual_
> > memory. The oom-killer tries to free _physical_ memory. It's unrelated.
> > 
> 
> It's not unrelated, the LSM function gives an arbitrary 3% bonus to 
> CAP_SYS_ADMIN.  

Unrelated. LSM _is_ a security module, and it only accounts virtual memory.


> Such threads should also be preferred in the oom killer 
> over other threads since they tend to be more important but not an overly 
> drastic bias such that they don't get killed when using an egregious 
> amount of memory.  So in selecting a small percentage of memory that tends 
> to be a significant bias but not overwhelming, I went with the 3% found 
> elsewhere in the kernel.  __vm_enough_memory() doesn't have that 
> preference for any scientifically calculated reason, it's a heuristic just 
> like oom_badness().

__vm_enough_memory() only guards against memory overcommit, and it has no
recovery path of its own: we expect the admin to recover by HAND. On the
other hand, the oom-killer _is_ an automatic recovery path and needs no
admin intervention. That's why CAP_SYS_ADMIN matters in one case and not
in the other.




> > > > CAP_SYS_RAWIO means the process has a direct hardware access privilege
> > > > (eg X.org, RDB), so killing it might make the system crash.
> > > > 
> > > 
> > > Then you would want to explicitly filter these tasks from oom kill just as 
> > > OOM_SCORE_ADJ_MIN works rather than giving them a memory quantity bonus.
> > 
> > No. Why should userland have to recover from your mistake?
> > 
> 
> You just said killing any CAP_SYS_RAWIO task may make the system crash, so 
> presuming that you don't want the system to crash, you are suggesting we 
> should make these threads completely immune?  That's never been the case 
> (and isn't for oom_kill_allocating_task, either), so there's no history 
> you can draw from to support your argument.

No. I only require that YOU investigate userland use cases BEFORE making
changes.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-23  7:16             ` KOSAKI Motohiro
@ 2010-11-28  1:36               ` David Rientjes
  2010-11-30 13:00                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-28  1:36 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Figo.zhang, lkml, linux-mm, Andrew Morton, Linus Torvalds

On Tue, 23 Nov 2010, KOSAKI Motohiro wrote:

> > > > I think in cases of heuristics like this where we obviously want to give 
> > > > some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses 
> > > > given elsewhere in the kernel.
> > > 
> > > Keep the comparison apples to apples. vm_enough_memory() accounts _virtual_
> > > memory. The oom-killer tries to free _physical_ memory. It's unrelated.
> > > 
> > 
> > It's not unrelated, the LSM function gives an arbitrary 3% bonus to 
> > CAP_SYS_ADMIN.  
> 
> Unrelated. LSM _is_ a security module, and it only accounts virtual memory.
> 

I needed a small bias for CAP_SYS_ADMIN tasks so I chose 3% since it's the 
same proportion used elsewhere in the kernel and works nicely since the 
badness score is now a proportion.  If you'd like to propose a different 
percentage or suggest removing the bias for root tasks altogether, feel 
free to propose a patch.  Thanks!

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-28  1:36               ` David Rientjes
@ 2010-11-30 13:00                 ` KOSAKI Motohiro
  2010-11-30 20:05                   ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-30 13:00 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Figo.zhang, lkml, linux-mm, Andrew Morton,
	Linus Torvalds

> On Tue, 23 Nov 2010, KOSAKI Motohiro wrote:
> 
> > > > > I think in cases of heuristics like this where we obviously want to give 
> > > > > some bonus to CAP_SYS_ADMIN that there is consistency with other bonuses 
> > > > > given elsewhere in the kernel.
> > > > 
> > > > Keep the comparison apples to apples. vm_enough_memory() accounts _virtual_
> > > > memory. The oom-killer tries to free _physical_ memory. It's unrelated.
> > > > 
> > > 
> > > It's not unrelated, the LSM function gives an arbitrary 3% bonus to 
> > > CAP_SYS_ADMIN.  
> > 
> > Unrelated. LSM _is_ a security module, and it only accounts virtual memory.
> > 
> 
> I needed a small bias for CAP_SYS_ADMIN tasks so I chose 3% since it's the 
> same proportion used elsewhere in the kernel and works nicely since the 
> badness score is now a proportion.  

Why? Is this more important than X?

> If you'd like to propose a different 
> percentage or suggest removing the bias for root tasks altogether, feel 
> free to propose a patch.  Thanks!

I only need to revert the bad change.


Thanks.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-30 13:00                 ` KOSAKI Motohiro
@ 2010-11-30 20:05                   ` David Rientjes
  0 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-30 20:05 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Figo.zhang, lkml, linux-mm, Andrew Morton, Linus Torvalds

On Tue, 30 Nov 2010, KOSAKI Motohiro wrote:

> > I needed a small bias for CAP_SYS_ADMIN tasks so I chose 3% since it's the 
> > same proportion used elsewhere in the kernel and works nicely since the 
> > badness score is now a proportion.  
> 
> Why? Is this more important than X?
> 

We have always broken ties between applications in the oom killer by 
sparing the root task and killing the user task.  If you'd like to remove 
this bonus for CAP_SYS_ADMIN, please propose a patch.  Thanks!

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2010-11-10 15:24     ` Figo.zhang
  2010-11-10 21:00       ` David Rientjes
  2010-11-14  5:21       ` KOSAKI Motohiro
@ 2011-01-04  7:51       ` Figo.zhang
  2011-01-04  8:28         ` KAMEZAWA Hiroyuki
  2011-01-05  3:32         ` David Rientjes
  2 siblings, 2 replies; 77+ messages in thread
From: Figo.zhang @ 2011-01-04  7:51 UTC (permalink / raw)
  To: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds,
	KAMEZAWA Hiroyuki
  Cc: Figo.zhang, rientjes, Wu Fengguang


i sent a patch before to protect hardware access processes from the 
oom-killer, but rientjes did not agree with me.

but today i caught a log from my desktop. the oom-killer killed my 
"minicom" and "Xorg". so i think protection for such processes should be 
added.


my desktop runs linux-2.6.36.

[figo@figo-desktop android]$ uname -a
Linux figo-desktop 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:01:53 UTC 
2010 i686 Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux
[figo@figo-desktop android]$

pls see the log:

> Jan  4 15:22:17 figo-desktop kernel: Mem-Info:
> Jan  4 15:22:17 figo-desktop kernel: DMA per-cpu:
> Jan  4 15:22:17 figo-desktop kernel: CPU    0: hi:    0, btch:   1 usd:   0
> Jan  4 15:22:17 figo-desktop kernel: CPU    1: hi:    0, btch:   1 usd:   0
> Jan  4 15:22:17 figo-desktop kernel: Normal per-cpu:
> Jan  4 15:22:17 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  61
> Jan  4 15:22:17 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:  31
> Jan  4 15:22:17 figo-desktop kernel: HighMem per-cpu:
> Jan  4 15:22:17 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  51
> Jan  4 15:22:17 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:  50
> Jan  4 15:22:17 figo-desktop kernel: active_anon:123081 inactive_anon:170583 isolated_anon:0
> Jan  4 15:22:17 figo-desktop kernel: active_file:53958 inactive_file:122582 isolated_file:0
> Jan  4 15:22:17 figo-desktop kernel: unevictable:17 dirty:6 writeback:31 unstable:0
> Jan  4 15:22:17 figo-desktop kernel: free:11919 slab_reclaimable:5454 slab_unreclaimable:6124
> Jan  4 15:22:17 figo-desktop kernel: mapped:49964 shmem:29206 pagetables:2585 bounce:0
> Jan  4 15:22:17 figo-desktop kernel: DMA free:7976kB min:64kB low:80kB high:96kB active_anon:1272kB inactive_anon:4672kB active_file:516kB inactive_file:760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15788kB mlocked:0kB dirty:0kB writeback:0kB mapped:752kB shmem:488kB slab_reclaimable:480kB slab_unreclaimable:112kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2513 all_unreclaimable? no
> Jan  4 15:22:17 figo-desktop kernel: lowmem_reserve[]: 0 865 1980 1980
> Jan  4 15:22:17 figo-desktop kernel: Normal free:39220kB min:3728kB low:4660kB high:5592kB active_anon:137716kB inactive_anon:312976kB active_file:85368kB inactive_file:207988kB unevictable:68kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:68kB dirty:0kB writeback:0kB mapped:57464kB shmem:45012kB slab_reclaimable:21336kB slab_unreclaimable:24384kB kernel_stack:3272kB pagetables:10340kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:441679 all_unreclaimable? yes
> Jan  4 15:22:17 figo-desktop kernel: lowmem_reserve[]: 0 0 8921 8921
> Jan  4 15:22:17 figo-desktop kernel: HighMem free:480kB min:512kB low:1712kB high:2912kB active_anon:353336kB inactive_anon:364684kB active_file:129948kB inactive_file:281580kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1141984kB mlocked:0kB dirty:24kB writeback:124kB mapped:141640kB shmem:71324kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:650648 all_unreclaimable? yes
> Jan  4 15:22:17 figo-desktop kernel: lowmem_reserve[]: 0 0 0 0
> Jan  4 15:22:17 figo-desktop kernel: DMA: 1482*4kB 256*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7976kB
> Jan  4 15:22:17 figo-desktop kernel: Normal: 1409*4kB 3862*8kB 44*16kB 8*32kB 3*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 39220kB
> Jan  4 15:22:17 figo-desktop kernel: HighMem: 20*4kB 10*8kB 18*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 480kB
> Jan  4 15:22:17 figo-desktop kernel: 212446 total pagecache pages
> Jan  4 15:22:17 figo-desktop kernel: 6668 pages in swap cache
> Jan  4 15:22:17 figo-desktop kernel: Swap cache stats: add 789540, delete 782872, find 277051/391346
> Jan  4 15:22:17 figo-desktop kernel: Free swap  = -28688kB
> Jan  4 15:22:17 figo-desktop kernel: Total swap = 0kB
> Jan  4 15:22:17 figo-desktop kernel: 515070 pages RAM
> Jan  4 15:22:17 figo-desktop kernel: 287745 pages HighMem
> Jan  4 15:22:17 figo-desktop kernel: 8297 pages reserved
> Jan  4 15:22:17 figo-desktop kernel: 306524 pages shared
> Jan  4 15:22:17 figo-desktop kernel: 304318 pages non-shared
> Jan  4 15:22:17 figo-desktop kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> Jan  4 15:22:17 figo-desktop kernel: [  583]     0   583      581      221   0     -17         -1000 udevd
> Jan  4 15:22:17 figo-desktop kernel: [ 1303]     0  1303      868       37   0       0             0 syslog-ng
> Jan  4 15:22:17 figo-desktop kernel: [ 1304]     0  1304     1604      358   0       0             0 syslog-ng
> Jan  4 15:22:17 figo-desktop kernel: [ 1306]    81  1306      808      351   0       0             0 dbus-daemon
> Jan  4 15:22:17 figo-desktop kernel: [ 1309]    82  1309     3777      366   1       0             0 hald
> Jan  4 15:22:17 figo-desktop kernel: [ 1310]     0  1310      903      128   1       0             0 hald-runner
> Jan  4 15:22:17 figo-desktop kernel: [ 1339]     0  1339      919       92   0       0             0 hald-addon-inpu
> Jan  4 15:22:17 figo-desktop kernel: [ 1357]     0  1357      919      184   0       0             0 hald-addon-stor
> Jan  4 15:22:17 figo-desktop kernel: [ 1359]    82  1359      824      107   1       0             0 hald-addon-acpi
> Jan  4 15:22:17 figo-desktop kernel: [ 1448]     0  1448      616      294   1       0             0 crond
> Jan  4 15:22:17 figo-desktop kernel: [ 1477]     0  1477      720      107   0       0             0 mysqld_safe
> Jan  4 15:22:17 figo-desktop kernel: [ 1484]     0  1484     3580      269   0       0             0 gdm-binary
> Jan  4 15:22:17 figo-desktop kernel: [ 1497]     0  1497      440       67   1       0             0 agetty
> Jan  4 15:22:17 figo-desktop kernel: [ 1498]     0  1498      440       67   0       0             0 agetty
> Jan  4 15:22:17 figo-desktop kernel: [ 1499]     0  1499      440       67   0       0             0 agetty
> Jan  4 15:22:17 figo-desktop kernel: [ 1500]     0  1500      440       67   1       0             0 agetty
> Jan  4 15:22:17 figo-desktop kernel: [ 1501]     0  1501      440       67   0       0             0 agetty
> Jan  4 15:22:17 figo-desktop kernel: [ 1502]     0  1502      440       68   0       0             0 agetty
> Jan  4 15:22:17 figo-desktop kernel: [ 1553]     0  1553     6607      615   0       0             0 NetworkManager
> Jan  4 15:22:17 figo-desktop kernel: [ 1592]     0  1592     2256      329   0       0             0 cupsd
> Jan  4 15:22:17 figo-desktop kernel: [ 1597]    89  1597    29903     2811   1       0             0 mysqld
> Jan  4 15:22:17 figo-desktop kernel: [ 1598]     0  1598     1652      147   1     -17         -1000 sshd
> Jan  4 15:22:17 figo-desktop kernel: [ 1612]     0  1612     6263      506   1       0             0 polkitd
> Jan  4 15:22:17 figo-desktop kernel: [ 1613]     0  1613     2015      122   0       0             0 vmware-usbarbit
> Jan  4 15:22:17 figo-desktop kernel: [ 1617]     0  1617     2887     1483   1       0             0 cntlm
> Jan  4 15:22:17 figo-desktop kernel: [ 1620]     0  1620     4398      365   0       0             0 gdm-simple-slav
> Jan  4 15:22:17 figo-desktop kernel: [ 1636]     0  1636    37300    12287   0       0             0 Xorg
> Jan  4 15:22:17 figo-desktop kernel: [ 1638]     0  1638     1248       88   0       0             0 wpa_supplicant
> Jan  4 15:22:17 figo-desktop kernel: [ 1720]     0  1720     4789     1551   1       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [ 1722]     0  1722      488       93   0       0             0 dhcpcd
> Jan  4 15:22:17 figo-desktop kernel: [ 1732]     0  1732      519       62   1       0             0 vmnet-bridge
> Jan  4 15:22:17 figo-desktop kernel: [ 1750]     0  1750     4922      302   1       0             0 smbd
> Jan  4 15:22:17 figo-desktop kernel: [ 1764]     0  1764      655       39   1       0             0 vmnet-dhcpd
> Jan  4 15:22:17 figo-desktop kernel: [ 1772]     0  1772      513       15   0       0             0 vmnet-netifup
> Jan  4 15:22:17 figo-desktop kernel: [ 1774]     0  1774      655       39   0       0             0 vmnet-dhcpd
> Jan  4 15:22:17 figo-desktop kernel: [ 1776]     0  1776     4922      224   0       0             0 smbd
> Jan  4 15:22:17 figo-desktop kernel: [ 1778]     0  1778      634       44   0       0             0 vmnet-natd
> Jan  4 15:22:17 figo-desktop kernel: [ 1780]     0  1780      513       14   1       0             0 vmnet-netifup
> Jan  4 15:22:17 figo-desktop kernel: [ 1796]     0  1796     6677      326   0       0             0 console-kit-dae
> Jan  4 15:22:17 figo-desktop kernel: [ 1899]   120  1899     6587      598   0       0             0 polkit-gnome-au
> Jan  4 15:22:17 figo-desktop kernel: [ 1903]     0  1903     3905      245   0       0             0 gdm-session-wor
> Jan  4 15:22:17 figo-desktop kernel: [ 1906]     0  1906     3677      314   1       0             0 upowerd
> Jan  4 15:22:17 figo-desktop kernel: [ 1974]  1000  1974    10963     1246   0       0             0 gnome-keyring-d
> Jan  4 15:22:17 figo-desktop kernel: [ 1993]  1000  1993     9253      768   0       0             0 gnome-session
> Jan  4 15:22:17 figo-desktop kernel: [ 2012]  1000  2012      796      102   0       0             0 dbus-launch
> Jan  4 15:22:17 figo-desktop kernel: [ 2013]  1000  2013     5854     3940   0       0             0 dbus-daemon
> Jan  4 15:22:17 figo-desktop kernel: [ 2015]  1000  2015      887       54   0       0             0 ssh-agent
> Jan  4 15:22:17 figo-desktop kernel: [ 2022]  1000  2022    10904     4577   0       0             0 fcitx
> Jan  4 15:22:17 figo-desktop kernel: [ 2032]  1000  2032    41764     1992   0       0             0 gnome-settings-
> Jan  4 15:22:17 figo-desktop kernel: [ 2036]  1000  2036     2367      413   0       0             0 gvfsd
> Jan  4 15:22:17 figo-desktop kernel: [ 2039]  1000  2039    85188    29574   1       0             0 metacity
> Jan  4 15:22:17 figo-desktop kernel: [ 2049]  1000  2049    14162      277   1       0             0 gvfs-fuse-daemo
> Jan  4 15:22:17 figo-desktop kernel: [ 2053]  1000  2053    76657     3521   1       0             0 gnome-panel
> Jan  4 15:22:17 figo-desktop kernel: [ 2056]  1000  2056    10978      456   1       0             0 gvfs-gdu-volume
> Jan  4 15:22:17 figo-desktop kernel: [ 2058]     0  2058     5767      451   1       0             0 udisks-daemon
> Jan  4 15:22:17 figo-desktop kernel: [ 2059]     0  2059     1291       72   1       0             0 udisks-daemon
> Jan  4 15:22:17 figo-desktop kernel: [ 2069]  1000  2069   168292    49344   1       0             0 nautilus
> Jan  4 15:22:17 figo-desktop kernel: [ 2071]  1000  2071    12907      223   0       0             0 bonobo-activati
> Jan  4 15:22:17 figo-desktop kernel: [ 2081]  1000  2081     1633      136   0       0             0 sh
> Jan  4 15:22:17 figo-desktop kernel: [ 2082]  1000  2082     1666      136   1       0             0 thunderbird
> Jan  4 15:22:17 figo-desktop kernel: [ 2084]  1000  2084    46718     3505   1       0             0 wnck-applet
> Jan  4 15:22:17 figo-desktop kernel: [ 2086]  1000  2086    42862     1463   0       0             0 polkit-gnome-au
> Jan  4 15:22:17 figo-desktop kernel: [ 2087]  1000  2087   285933     1579   0       0             0 nm-applet
> Jan  4 15:22:17 figo-desktop kernel: [ 2104]  1000  2104     5056      693   1       0             0 gdu-notificatio
> Jan  4 15:22:17 figo-desktop kernel: [ 2107]  1000  2107     1666      142   1       0             0 run-mozilla.sh
> Jan  4 15:22:17 figo-desktop kernel: [ 2108]  1000  2108    39395      919   0       0             0 gnome-power-man
> Jan  4 15:22:17 figo-desktop kernel: [ 2109]  1000  2109     7886      701   0       0             0 vino-server
> Jan  4 15:22:17 figo-desktop kernel: [ 2110]  1000  2110    10764      791   1       0             0 evolution-alarm
> Jan  4 15:22:17 figo-desktop kernel: [ 2114]  1000  2114   167865    28925   1       0             0 thunderbird-bin
> Jan  4 15:22:17 figo-desktop kernel: [ 2121]  1000  2121    42098     1969   1       0             0 notify-osd
> Jan  4 15:22:17 figo-desktop kernel: [ 2125]  1000  2125    42488     1203   0       0             0 cpufreq-applet
> Jan  4 15:22:17 figo-desktop kernel: [ 2126]  1000  2126    41394     1145   1       0             0 multiload-apple
> Jan  4 15:22:17 figo-desktop kernel: [ 2129]  1000  2129    64405     1826   0       0             0 mixer_applet2
> Jan  4 15:22:17 figo-desktop kernel: [ 2131]  1000  2131    75346     2473   0       0             0 clock-applet
> Jan  4 15:22:17 figo-desktop kernel: [ 2132]  1000  2132    41163      931   0       0             0 notification-ar
> Jan  4 15:22:17 figo-desktop kernel: [ 2149]  1000  2149    15325      695   1       0             0 e-calendar-fact
> Jan  4 15:22:17 figo-desktop kernel: [ 2153]  1000  2153     7497     1010   0       0             0 gnome-screensav
> Jan  4 15:22:17 figo-desktop kernel: [ 2155]  1000  2155     3848      202   1       0             0 pxgconf
> Jan  4 15:22:17 figo-desktop kernel: [ 2163]  1000  2163     1781      341   0       0             0 mission-control
> Jan  4 15:22:17 figo-desktop kernel: [ 2173]  1000  2173     4486      424   0       0             0 gvfsd-trash
> Jan  4 15:22:17 figo-desktop kernel: [ 2186]     0  2186     3543      190   1       0             0 system-tools-ba
> Jan  4 15:22:17 figo-desktop kernel: [ 2210]  1000  2210     2202      238   0       0             0 gvfsd-burn
> Jan  4 15:22:17 figo-desktop kernel: [ 2245]  1000  2245     3895     1586   0       0             0 gvfsd-metadata
> Jan  4 15:22:17 figo-desktop kernel: [ 2274]  1000  2274    22823      424   1       0             0 conky
> Jan  4 15:22:17 figo-desktop kernel: [ 2281]     0  2281     3295     2278   0       0             0 SystemToolsBack
> Jan  4 15:22:17 figo-desktop kernel: [ 2663]  1000  2663    68807     3072   0       0             0 gnome-terminal
> Jan  4 15:22:17 figo-desktop kernel: [ 2683]  1000  2683      451       76   0       0             0 gnome-pty-helpe
> Jan  4 15:22:17 figo-desktop kernel: [ 2685]  1000  2685     2072      566   0       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [ 2885]     0  2885     1489      119   0       0             0 sudo
> Jan  4 15:22:17 figo-desktop kernel: [ 2886]     0  2886     6273      352   1       0             0 vim
> Jan  4 15:22:17 figo-desktop kernel: [ 2887]  1000  2887      472       84   0       0             0 ping
> Jan  4 15:22:17 figo-desktop kernel: [ 2892]  1000  2892      472       83   1       0             0 ping
> Jan  4 15:22:17 figo-desktop kernel: [ 2894]  1000  2894    76113     5872   1       0             0 vmware
> Jan  4 15:22:17 figo-desktop kernel: [ 2919]  1000  2919    51497     3314   1       0             0 vmware-tray
> Jan  4 15:22:17 figo-desktop kernel: [ 2954]  1000  2954    48676     1589   1       0             0 vmware-unity-he
> Jan  4 15:22:17 figo-desktop kernel: [ 2988]  1000  2988   190471    42233   1       0             0 vmware-vmx
> Jan  4 15:22:17 figo-desktop kernel: [ 3207]  1000  3207     4377      360   1       0             0 gvfsd-computer
> Jan  4 15:22:17 figo-desktop kernel: [ 3211]  1000  3211     9920      506   0       0             0 gvfsd-smb-brows
> Jan  4 15:22:17 figo-desktop kernel: [ 3217]  1000  3217     9876      564   0       0             0 gvfsd-smb
> Jan  4 15:22:17 figo-desktop kernel: [15186]  1000 15186     2069      558   1       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [17451]  1000 17451     5679      496   0       0             0 dconf-service
> Jan  4 15:22:17 figo-desktop kernel: [19085]  1000 19085     1576      149   0       0             0 ssh
> Jan  4 15:22:17 figo-desktop kernel: [19261]  1000 19261     1682      962   0       0             0 wineserver
> Jan  4 15:22:17 figo-desktop kernel: [19266]  1000 19266   399085      212   0       0             0 services.exe
> Jan  4 15:22:17 figo-desktop kernel: [19269]  1000 19269   399117      158   1       0             0 winedevice.exe
> Jan  4 15:22:17 figo-desktop kernel: [19342]  1000 19342   404518      639   0       0             0 explorer.exe
> Jan  4 15:22:17 figo-desktop kernel: [19344]  1000 19344   550020    10958   0       0             0 insight3.exe
> Jan  4 15:22:17 figo-desktop kernel: [  360]  1000   360     2069      540   1       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [ 9821]  1000  9821   166228     2900   0       0             0 stardict
> Jan  4 15:22:17 figo-desktop kernel: [14614]  1000 14614     2040      521   1       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [17002]  1000 17002     2069      541   0       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [18612]     0 18612     1536      125   1       0             0 sudo
> Jan  4 15:22:17 figo-desktop kernel: [18613]     0 18613     1988      608   1       0             0 minicom
> Jan  4 15:22:17 figo-desktop kernel: [21183]  1000 21183     2041      517   1       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [21194]  1000 21194     1611      125   0       0             0 ssh
> Jan  4 15:22:17 figo-desktop kernel: [22451]  1000 22451     2069      571   1       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [23428]  1000 23428     6475      552   1       0             0 vim
> Jan  4 15:22:17 figo-desktop kernel: [23484]  1000 23484     6501      593   1       0             0 vim
> Jan  4 15:22:17 figo-desktop kernel: [23549]  1000 23549     6501      591   0       0             0 vim
> Jan  4 15:22:17 figo-desktop kernel: [23642]  1000 23642     9865      578   0       0             0 gvfsd-smb
> Jan  4 15:22:17 figo-desktop kernel: [26358]  1000 26358   407339     5016   1       0             0 insight3.exe
> Jan  4 15:22:17 figo-desktop kernel: [29711]  1000 29711     9943      606   0       0             0 gvfsd-smb
> Jan  4 15:22:17 figo-desktop kernel: [26156]  1000 26156    61269     9117   0       0             0 skype
> Jan  4 15:22:17 figo-desktop kernel: [32490]  1000 32490     2647      684   0       0             0 gconfd-2
> Jan  4 15:22:17 figo-desktop kernel: [10622]  1000 10622     2072      734   0       0             0 bash
> Jan  4 15:22:17 figo-desktop kernel: [10634]  1000 10634     1576      156   0       0             0 ssh
> Jan  4 15:22:17 figo-desktop kernel: [15410]  1000 15410    76559    12123   0       0             0 evince
> Jan  4 15:22:17 figo-desktop kernel: [15415]  1000 15415     5490      216   1       0             0 evinced
> Jan  4 15:22:17 figo-desktop kernel: [16754]  1000 16754     9899      482   0       0             0 gvfsd-smb
> Jan  4 15:22:17 figo-desktop kernel: [16772]  1000 16772     9900      485   0       0             0 gvfsd-smb
> Jan  4 15:22:17 figo-desktop kernel: [25390]  1000 25390   407306     2149   0       0             0 insight3.exe
> Jan  4 15:22:17 figo-desktop kernel: [ 2127]  1000  2127     1609      122   0       0             0 ssh
> Jan  4 15:22:17 figo-desktop kernel: [10661]    33 10661     4775     1503   1       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [10662]    33 10662     4823     1565   0       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [10663]    33 10663     4823     1566   0       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [10664]    33 10664     4823     1565   0       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [10665]    33 10665     4823     1565   0       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [10666]    33 10666     4823     1565   1       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [32159]  1000 32159   166096    38997   0       0             0 firefox
> Jan  4 15:22:17 figo-desktop kernel: [32228]    33 32228     4823     1508   0       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [32233]    33 32233     4857     1591   1       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [32234]    33 32234     4857     1593   1       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [32241]    33 32241     4789     1532   1       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [32246]    33 32246     4789     1532   0       0             0 httpd
> Jan  4 15:22:17 figo-desktop kernel: [32268]  1000 32268    27543     4106   1       0             0 plugin-containe
> Jan  4 15:22:17 figo-desktop kernel: [32320]  1000 32320    16230     1439   1       0             0 GoogleTalkPlugi
> Jan  4 15:22:17 figo-desktop kernel: [  970]  1000   970   407197     2574   1       0             0 insight3.exe
> Jan  4 15:22:17 figo-desktop kernel: [ 1240]  1000  1240      601      101   1       0             0 top
> Jan  4 15:22:17 figo-desktop kernel: [ 2038]  1000  2038   407785     2659   1       0             0 insight3.exe
> Jan  4 15:22:17 figo-desktop kernel: [12415]     0 12415      580      192   1     -17         -1000 udevd
> Jan  4 15:22:17 figo-desktop kernel: [12416]     0 12416      580      192   1     -17         -1000 udevd
> Jan  4 15:22:17 figo-desktop kernel: [13760]     0 13760      719      217   1       0             0 sh
> Jan  4 15:22:17 figo-desktop kernel: [13762]     0 13762      719      213   0       0             0 sh
> Jan  4 15:22:17 figo-desktop kernel: [13765]     0 13765     1187      161   1       0             0 git
> Jan  4 15:22:17 figo-desktop kernel: [13766]     0 13766      719      236   1       0             0 git-pull
> Jan  4 15:22:17 figo-desktop kernel: [13782]     0 13782     1189      172   0       0             0 git
> Jan  4 15:22:17 figo-desktop kernel: [13784]     0 13784     1575      353   1       0             0 ssh
> Jan  4 15:22:17 figo-desktop kernel: [13824]     0 13824     1535      244   1       0             0 sudo
> Jan  4 15:22:17 figo-desktop kernel: [13825]     0 13825     1386      124   1       0             0 swapoff
> Jan  4 15:22:51 figo-desktop kernel: minicom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
> Jan  4 15:22:51 figo-desktop kernel: minicom cpuset=/ mems_allowed=0
> Jan  4 15:22:51 figo-desktop kernel: Pid: 18613, comm: minicom Not tainted 2.6.36-ARCH #1
> Jan  4 15:22:51 figo-desktop kernel: Call Trace:
> Jan  4 15:22:51 figo-desktop kernel: [<c10c11d0>] dump_header.clone.5+0x80/0x1e0
> Jan  4 15:22:51 figo-desktop kernel: [<c10c153c>] oom_kill_process+0x5c/0x1c0
> Jan  4 15:22:51 figo-desktop kernel: [<c10c171d>] ? select_bad_process.clone.7+0x7d/0xd0
> Jan  4 15:22:51 figo-desktop kernel: [<c10c1a0f>] out_of_memory+0xbf/0x1d0
> Jan  4 15:22:51 figo-desktop kernel: [<c10c18d8>] ? try_set_zonelist_oom+0xc8/0xe0
> Jan  4 15:22:51 figo-desktop kernel: [<c10c5288>] __alloc_pages_nodemask+0x5e8/0x600
> Jan  4 15:22:51 figo-desktop kernel: [<c10c6c35>] __do_page_cache_readahead+0x105/0x230
> Jan  4 15:22:51 figo-desktop kernel: [<c10c6fc1>] ra_submit+0x21/0x30
> Jan  4 15:22:51 figo-desktop kernel: [<c10bf31b>] filemap_fault+0x36b/0x3e0
> Jan  4 15:22:51 figo-desktop kernel: [<c10d637b>] __do_fault+0x3b/0x4f0
> Jan  4 15:22:51 figo-desktop kernel: [<c122e619>] ? check_modem_status+0x19/0x1d0
> Jan  4 15:22:51 figo-desktop kernel: [<c10befb0>] ? filemap_fault+0x0/0x3e0
> Jan  4 15:22:51 figo-desktop kernel: [<c10d95c1>] handle_mm_fault+0x111/0x970
> Jan  4 15:22:51 figo-desktop kernel: [<c1172121>] ? tomoyo_init_request_info+0x41/0x50
> Jan  4 15:22:51 figo-desktop kernel: [<c1028d60>] ? do_page_fault+0x0/0x3e0
> Jan  4 15:22:51 figo-desktop kernel: [<c1028eb0>] do_page_fault+0x150/0x3e0
> Jan  4 15:22:51 figo-desktop kernel: [<c1170eb2>] ? tomoyo_file_ioctl+0x12/0x20
> Jan  4 15:22:51 figo-desktop kernel: [<c110c43f>] ? sys_ioctl+0x5f/0x80
> Jan  4 15:22:51 figo-desktop kernel: [<c1028d60>] ? do_page_fault+0x0/0x3e0
> Jan  4 15:22:51 figo-desktop kernel: [<c130753b>] error_code+0x67/0x6c
> Jan  4 15:22:51 figo-desktop kernel: Mem-Info:
> Jan  4 15:22:51 figo-desktop kernel: DMA per-cpu:
> Jan  4 15:22:51 figo-desktop kernel: CPU    0: hi:    0, btch:   1 usd:   0
> Jan  4 15:22:51 figo-desktop kernel: CPU    1: hi:    0, btch:   1 usd:   0
> Jan  4 15:22:51 figo-desktop kernel: Normal per-cpu:
> Jan  4 15:22:51 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  61
> Jan  4 15:22:54 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:   1
> Jan  4 15:22:54 figo-desktop kernel: HighMem per-cpu:
> Jan  4 15:22:54 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  30
> Jan  4 15:22:54 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:  30
> Jan  4 15:22:54 figo-desktop kernel: active_anon:191656 inactive_anon:101947 isolated_anon:0
> Jan  4 15:22:54 figo-desktop kernel: active_file:54186 inactive_file:122506 isolated_file:0
> Jan  4 15:22:54 figo-desktop kernel: unevictable:17 dirty:0 writeback:0 unstable:0
> Jan  4 15:22:54 figo-desktop kernel: free:11958 slab_reclaimable:5450 slab_unreclaimable:6123
> Jan  4 15:22:54 figo-desktop kernel: mapped:49649 shmem:29219 pagetables:2559 bounce:0
> Jan  4 15:22:54 figo-desktop kernel: DMA free:7960kB min:64kB low:80kB high:96kB active_anon:4772kB inactive_anon:1180kB active_file:524kB inactive_file:768kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15788kB mlocked:0kB dirty:0kB writeback:0kB mapped:592kB shmem:348kB slab_reclaimable:480kB slab_unreclaimable:112kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2689 all_unreclaimable? yes
> Jan  4 15:22:55 figo-desktop kernel: lowmem_reserve[]: 0 865 1980 1980
> Jan  4 15:22:55 figo-desktop kernel: Normal free:39364kB min:3728kB low:4660kB high:5592kB active_anon:224324kB inactive_anon:226020kB active_file:86076kB inactive_file:207880kB unevictable:68kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:68kB dirty:0kB writeback:0kB mapped:57272kB shmem:42072kB slab_reclaimable:21320kB slab_unreclaimable:24380kB kernel_stack:3224kB pagetables:10236kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:445029 all_unreclaimable? yes
> Jan  4 15:22:55 figo-desktop kernel: lowmem_reserve[]: 0 0 8921 8921
> Jan  4 15:22:55 figo-desktop kernel: HighMem free:508kB min:512kB low:1712kB high:2912kB active_anon:537528kB inactive_anon:180588kB active_file:130144kB inactive_file:281376kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1141984kB mlocked:0kB dirty:0kB writeback:0kB mapped:140732kB shmem:74456kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:644938 all_unreclaimable? yes
> Jan  4 15:22:55 figo-desktop kernel: lowmem_reserve[]: 0 0 0 0
> Jan  4 15:22:55 figo-desktop kernel: DMA: 1500*4kB 245*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7960kB
> Jan  4 15:22:55 figo-desktop kernel: Normal: 1399*4kB 3723*8kB 91*16kB 21*32kB 5*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 39364kB
> Jan  4 15:22:55 figo-desktop kernel: HighMem: 27*4kB 10*8kB 18*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508kB
> Jan  4 15:22:55 figo-desktop kernel: 206073 total pagecache pages
> Jan  4 15:22:55 figo-desktop kernel: 151 pages in swap cache
> Jan  4 15:22:55 figo-desktop kernel: Swap cache stats: add 889409, delete 889258, find 277087/391428
> Jan  4 15:22:55 figo-desktop kernel: Free swap  = -1636kB
> Jan  4 15:22:55 figo-desktop kernel: Total swap = 0kB
> Jan  4 15:22:55 figo-desktop kernel: 515070 pages RAM
> Jan  4 15:22:55 figo-desktop kernel: 287745 pages HighMem
> Jan  4 15:22:55 figo-desktop kernel: 8297 pages reserved
> Jan  4 15:22:55 figo-desktop kernel: 304095 pages shared
> Jan  4 15:22:55 figo-desktop kernel: 306148 pages non-shared
> Jan  4 15:22:55 figo-desktop kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> Jan  4 15:22:55 figo-desktop kernel: [  583]     0   583      581      207   1     -17         -1000 udevd
> Jan  4 15:22:55 figo-desktop kernel: [ 1303]     0  1303      868       37   0       0             0 syslog-ng
> Jan  4 15:22:55 figo-desktop kernel: [ 1304]     0  1304     1604      350   0       0             0 syslog-ng
> Jan  4 15:22:55 figo-desktop kernel: [ 1306]    81  1306      808      396   1       0             0 dbus-daemon
> Jan  4 15:22:55 figo-desktop kernel: [ 1309]    82  1309     3777      406   0       0             0 hald
> Jan  4 15:22:55 figo-desktop kernel: [ 1310]     0  1310      903      130   1       0             0 hald-runner
> Jan  4 15:22:55 figo-desktop kernel: [ 1339]     0  1339      919       90   1       0             0 hald-addon-inpu
> Jan  4 15:22:55 figo-desktop kernel: [ 1357]     0  1357      919      184   1       0             0 hald-addon-stor
> Jan  4 15:22:55 figo-desktop kernel: [ 1359]    82  1359      824      107   1       0             0 hald-addon-acpi
> Jan  4 15:22:55 figo-desktop kernel: [ 1448]     0  1448      616      294   0       0             0 crond
> Jan  4 15:22:55 figo-desktop kernel: [ 1477]     0  1477      720      108   0       0             0 mysqld_safe
> Jan  4 15:22:55 figo-desktop kernel: [ 1484]     0  1484     3580      269   0       0             0 gdm-binary
> Jan  4 15:22:55 figo-desktop kernel: [ 1497]     0  1497      440       67   1       0             0 agetty
> Jan  4 15:22:55 figo-desktop kernel: [ 1498]     0  1498      440       67   0       0             0 agetty
> Jan  4 15:22:55 figo-desktop kernel: [ 1499]     0  1499      440       67   0       0             0 agetty
> Jan  4 15:22:55 figo-desktop kernel: [ 1500]     0  1500      440       67   1       0             0 agetty
> Jan  4 15:22:55 figo-desktop kernel: [ 1501]     0  1501      440       67   0       0             0 agetty
> Jan  4 15:22:55 figo-desktop kernel: [ 1502]     0  1502      440       68   0       0             0 agetty
> Jan  4 15:22:55 figo-desktop kernel: [ 1553]     0  1553     6607      619   0       0             0 NetworkManager
> Jan  4 15:22:55 figo-desktop kernel: [ 1592]     0  1592     2256      330   0       0             0 cupsd
> Jan  4 15:22:55 figo-desktop kernel: [ 1597]    89  1597    29903     2812   1       0             0 mysqld
> Jan  4 15:22:55 figo-desktop kernel: [ 1598]     0  1598     1652      147   1     -17         -1000 sshd
> Jan  4 15:22:55 figo-desktop kernel: [ 1612]     0  1612     6263      508   1       0             0 polkitd
> Jan  4 15:22:55 figo-desktop kernel: [ 1613]     0  1613     2015      123   1       0             0 vmware-usbarbit
> Jan  4 15:22:55 figo-desktop kernel: [ 1617]     0  1617     2887     1603   0       0             0 cntlm
> Jan  4 15:22:55 figo-desktop kernel: [ 1620]     0  1620     4398      366   0       0             0 gdm-simple-slav
> Jan  4 15:22:55 figo-desktop kernel: [ 1636]     0  1636    37298    12186   0       0             0 Xorg
> Jan  4 15:22:55 figo-desktop kernel: [ 1638]     0  1638     1248       88   0       0             0 wpa_supplicant
> Jan  4 15:22:55 figo-desktop kernel: [ 1720]     0  1720     4789     1555   1       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [ 1722]     0  1722      488       93   0       0             0 dhcpcd
> Jan  4 15:22:55 figo-desktop kernel: [ 1732]     0  1732      519       62   1       0             0 vmnet-bridge
> Jan  4 15:22:55 figo-desktop kernel: [ 1750]     0  1750     4922      304   1       0             0 smbd
> Jan  4 15:22:55 figo-desktop kernel: [ 1764]     0  1764      655       39   1       0             0 vmnet-dhcpd
> Jan  4 15:22:55 figo-desktop kernel: [ 1772]     0  1772      513       15   0       0             0 vmnet-netifup
> Jan  4 15:22:55 figo-desktop kernel: [ 1774]     0  1774      655       39   0       0             0 vmnet-dhcpd
> Jan  4 15:22:55 figo-desktop kernel: [ 1776]     0  1776     4922      225   0       0             0 smbd
> Jan  4 15:22:55 figo-desktop kernel: [ 1778]     0  1778      634       44   0       0             0 vmnet-natd
> Jan  4 15:22:55 figo-desktop kernel: [ 1780]     0  1780      513       14   1       0             0 vmnet-netifup
> Jan  4 15:22:55 figo-desktop kernel: [ 1796]     0  1796     6677      329   0       0             0 console-kit-dae
> Jan  4 15:22:55 figo-desktop kernel: [ 1899]   120  1899     6587      598   0       0             0 polkit-gnome-au
> Jan  4 15:22:55 figo-desktop kernel: [ 1903]     0  1903     3905      245   0       0             0 gdm-session-wor
> Jan  4 15:22:55 figo-desktop kernel: [ 1906]     0  1906     3677      314   1       0             0 upowerd
> Jan  4 15:22:55 figo-desktop kernel: [ 1974]  1000  1974    10963     1248   0       0             0 gnome-keyring-d
> Jan  4 15:22:55 figo-desktop kernel: [ 1993]  1000  1993     9253      776   0       0             0 gnome-session
> Jan  4 15:22:55 figo-desktop kernel: [ 2012]  1000  2012      796      103   0       0             0 dbus-launch
> Jan  4 15:22:55 figo-desktop kernel: [ 2013]  1000  2013     5854     4985   0       0             0 dbus-daemon
> Jan  4 15:22:55 figo-desktop kernel: [ 2015]  1000  2015      887       54   0       0             0 ssh-agent
> Jan  4 15:22:55 figo-desktop kernel: [ 2022]  1000  2022    10904     4789   0       0             0 fcitx
> Jan  4 15:22:55 figo-desktop kernel: [ 2032]  1000  2032    41764     2007   0       0             0 gnome-settings-
> Jan  4 15:22:55 figo-desktop kernel: [ 2036]  1000  2036     2367      416   0       0             0 gvfsd
> Jan  4 15:22:55 figo-desktop kernel: [ 2039]  1000  2039    85188    29636   1       0             0 metacity
> Jan  4 15:22:55 figo-desktop kernel: [ 2049]  1000  2049    14162      280   1       0             0 gvfs-fuse-daemo
> Jan  4 15:22:55 figo-desktop kernel: [ 2053]  1000  2053    76657     3620   0       0             0 gnome-panel
> Jan  4 15:22:55 figo-desktop kernel: [ 2056]  1000  2056    10978      563   1       0             0 gvfs-gdu-volume
> Jan  4 15:22:55 figo-desktop kernel: [ 2058]     0  2058     5767      503   0       0             0 udisks-daemon
> Jan  4 15:22:55 figo-desktop kernel: [ 2059]     0  2059     1291       72   0       0             0 udisks-daemon
> Jan  4 15:22:55 figo-desktop kernel: [ 2069]  1000  2069   168292    50943   0       0             0 nautilus
> Jan  4 15:22:55 figo-desktop kernel: [ 2071]  1000  2071    12907      223   0       0             0 bonobo-activati
> Jan  4 15:22:55 figo-desktop kernel: [ 2081]  1000  2081     1633      136   0       0             0 sh
> Jan  4 15:22:55 figo-desktop kernel: [ 2082]  1000  2082     1666      136   1       0             0 thunderbird
> Jan  4 15:22:55 figo-desktop kernel: [ 2084]  1000  2084    46718     3501   1       0             0 wnck-applet
> Jan  4 15:22:55 figo-desktop kernel: [ 2086]  1000  2086    42862     1487   0       0             0 polkit-gnome-au
> Jan  4 15:22:55 figo-desktop kernel: [ 2087]  1000  2087   285933     1584   0       0             0 nm-applet
> Jan  4 15:22:55 figo-desktop kernel: [ 2104]  1000  2104     5056      715   0       0             0 gdu-notificatio
> Jan  4 15:22:55 figo-desktop kernel: [ 2107]  1000  2107     1666      142   1       0             0 run-mozilla.sh
> Jan  4 15:22:55 figo-desktop kernel: [ 2108]  1000  2108    39395      973   0       0             0 gnome-power-man
> Jan  4 15:22:55 figo-desktop kernel: [ 2109]  1000  2109     7886      709   0       0             0 vino-server
> Jan  4 15:22:55 figo-desktop kernel: [ 2110]  1000  2110    10764      791   1       0             0 evolution-alarm
> Jan  4 15:22:55 figo-desktop kernel: [ 2114]  1000  2114   167865    29673   1       0             0 thunderbird-bin
> Jan  4 15:22:55 figo-desktop kernel: [ 2121]  1000  2121    42098     1965   0       0             0 notify-osd
> Jan  4 15:22:55 figo-desktop kernel: [ 2125]  1000  2125    42488     1203   1       0             0 cpufreq-applet
> Jan  4 15:22:55 figo-desktop kernel: [ 2126]  1000  2126    41394     1148   1       0             0 multiload-apple
> Jan  4 15:22:55 figo-desktop kernel: [ 2129]  1000  2129    64405     1827   0       0             0 mixer_applet2
> Jan  4 15:22:55 figo-desktop kernel: [ 2131]  1000  2131    75346     2489   0       0             0 clock-applet
> Jan  4 15:22:55 figo-desktop kernel: [ 2132]  1000  2132    41163      930   1       0             0 notification-ar
> Jan  4 15:22:55 figo-desktop kernel: [ 2149]  1000  2149    15325      697   1       0             0 e-calendar-fact
> Jan  4 15:22:55 figo-desktop kernel: [ 2153]  1000  2153     7497     1008   0       0             0 gnome-screensav
> Jan  4 15:22:55 figo-desktop kernel: [ 2155]  1000  2155     3848      202   1       0             0 pxgconf
> Jan  4 15:22:55 figo-desktop kernel: [ 2163]  1000  2163     1781      342   0       0             0 mission-control
> Jan  4 15:22:55 figo-desktop kernel: [ 2173]  1000  2173     4486      428   0       0             0 gvfsd-trash
> Jan  4 15:22:55 figo-desktop kernel: [ 2186]     0  2186     3543      192   1       0             0 system-tools-ba
> Jan  4 15:22:55 figo-desktop kernel: [ 2210]  1000  2210     2202      238   0       0             0 gvfsd-burn
> Jan  4 15:22:55 figo-desktop kernel: [ 2245]  1000  2245     3895     1730   0       0             0 gvfsd-metadata
> Jan  4 15:22:55 figo-desktop kernel: [ 2274]  1000  2274    22823      424   0       0             0 conky
> Jan  4 15:22:55 figo-desktop kernel: [ 2281]     0  2281     3295     2278   0       0             0 SystemToolsBack
> Jan  4 15:22:55 figo-desktop kernel: [ 2663]  1000  2663    68807     3042   1       0             0 gnome-terminal
> Jan  4 15:22:55 figo-desktop kernel: [ 2683]  1000  2683      451       76   0       0             0 gnome-pty-helpe
> Jan  4 15:22:55 figo-desktop kernel: [ 2685]  1000  2685     2072      570   0       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [ 2885]     0  2885     1489      119   0       0             0 sudo
> Jan  4 15:22:55 figo-desktop kernel: [ 2886]     0  2886     6273      355   1       0             0 vim
> Jan  4 15:22:55 figo-desktop kernel: [ 2887]  1000  2887      472       84   0       0             0 ping
> Jan  4 15:22:55 figo-desktop kernel: [ 2892]  1000  2892      472       83   1       0             0 ping
> Jan  4 15:22:55 figo-desktop kernel: [ 2894]  1000  2894    76113     5890   0       0             0 vmware
> Jan  4 15:22:55 figo-desktop kernel: [ 2919]  1000  2919    51497     3315   1       0             0 vmware-tray
> Jan  4 15:22:55 figo-desktop kernel: [ 2954]  1000  2954    48676     1589   0       0             0 vmware-unity-he
> Jan  4 15:22:55 figo-desktop kernel: [ 2988]  1000  2988   190471    42400   0       0             0 vmware-vmx
> Jan  4 15:22:55 figo-desktop kernel: [ 3207]  1000  3207     4377      362   1       0             0 gvfsd-computer
> Jan  4 15:22:55 figo-desktop kernel: [ 3211]  1000  3211     9920      509   0       0             0 gvfsd-smb-brows
> Jan  4 15:22:55 figo-desktop kernel: [ 3217]  1000  3217     9876      569   0       0             0 gvfsd-smb
> Jan  4 15:22:55 figo-desktop kernel: [15186]  1000 15186     2069      558   1       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [17451]  1000 17451     5679      503   0       0             0 dconf-service
> Jan  4 15:22:55 figo-desktop kernel: [19085]  1000 19085     1576      149   0       0             0 ssh
> Jan  4 15:22:55 figo-desktop kernel: [19261]  1000 19261     1682      967   0       0             0 wineserver
> Jan  4 15:22:55 figo-desktop kernel: [19266]  1000 19266   399085      212   0       0             0 services.exe
> Jan  4 15:22:55 figo-desktop kernel: [19269]  1000 19269   399117      158   1       0             0 winedevice.exe
> Jan  4 15:22:55 figo-desktop kernel: [19342]  1000 19342   404518      643   0       0             0 explorer.exe
> Jan  4 15:22:55 figo-desktop kernel: [19344]  1000 19344   550020    11028   1       0             0 insight3.exe
> Jan  4 15:22:55 figo-desktop kernel: [  360]  1000   360     2069      541   1       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [ 9821]  1000  9821   166228     2915   0       0             0 stardict
> Jan  4 15:22:55 figo-desktop kernel: [14614]  1000 14614     2040      523   1       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [17002]  1000 17002     2069      541   0       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [18612]     0 18612     1536      125   1       0             0 sudo
> Jan  4 15:22:55 figo-desktop kernel: [18613]     0 18613     1988      608   1       0             0 minicom
> Jan  4 15:22:55 figo-desktop kernel: [21183]  1000 21183     2041      517   1       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [21194]  1000 21194     1611      125   0       0             0 ssh
> Jan  4 15:22:55 figo-desktop kernel: [22451]  1000 22451     2069      578   1       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [23428]  1000 23428     6475      554   1       0             0 vim
> Jan  4 15:22:55 figo-desktop kernel: [23484]  1000 23484     6501      594   1       0             0 vim
> Jan  4 15:22:55 figo-desktop kernel: [23549]  1000 23549     6501      594   0       0             0 vim
> Jan  4 15:22:55 figo-desktop kernel: [23642]  1000 23642     9865      594   0       0             0 gvfsd-smb
> Jan  4 15:22:55 figo-desktop kernel: [26358]  1000 26358   407339     5019   1       0             0 insight3.exe
> Jan  4 15:22:55 figo-desktop kernel: [29711]  1000 29711     9943      617   0       0             0 gvfsd-smb
> Jan  4 15:22:55 figo-desktop kernel: [26156]  1000 26156    61269     9179   0       0             0 skype
> Jan  4 15:22:55 figo-desktop kernel: [32490]  1000 32490     2647      684   0       0             0 gconfd-2
> Jan  4 15:22:55 figo-desktop kernel: [10622]  1000 10622     2072      725   1       0             0 bash
> Jan  4 15:22:55 figo-desktop kernel: [10634]  1000 10634     1576      156   0       0             0 ssh
> Jan  4 15:22:55 figo-desktop kernel: [15410]  1000 15410    76559    12161   0       0             0 evince
> Jan  4 15:22:55 figo-desktop kernel: [15415]  1000 15415     5490      217   1       0             0 evinced
> Jan  4 15:22:55 figo-desktop kernel: [16754]  1000 16754     9899      485   0       0             0 gvfsd-smb
> Jan  4 15:22:55 figo-desktop kernel: [16772]  1000 16772     9900      500   0       0             0 gvfsd-smb
> Jan  4 15:22:55 figo-desktop kernel: [25390]  1000 25390   407306     2164   1       0             0 insight3.exe
> Jan  4 15:22:55 figo-desktop kernel: [ 2127]  1000  2127     1609      125   0       0             0 ssh
> Jan  4 15:22:55 figo-desktop kernel: [10661]    33 10661     4775     1510   1       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [10662]    33 10662     4823     1569   0       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [10663]    33 10663     4823     1570   0       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [10664]    33 10664     4823     1569   0       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [10665]    33 10665     4823     1569   0       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [10666]    33 10666     4823     1569   1       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [32159]  1000 32159   166096    39479   0       0             0 firefox
> Jan  4 15:22:55 figo-desktop kernel: [32228]    33 32228     4823     1512   0       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [32233]    33 32233     4857     1596   1       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [32234]    33 32234     4857     1597   1       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [32241]    33 32241     4789     1536   1       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [32246]    33 32246     4789     1536   0       0             0 httpd
> Jan  4 15:22:55 figo-desktop kernel: [32268]  1000 32268    27543     4279   1       0             0 plugin-containe
> Jan  4 15:22:55 figo-desktop kernel: [32320]  1000 32320    16230     1445   1       0             0 GoogleTalkPlugi
> Jan  4 15:22:55 figo-desktop kernel: [  970]  1000   970   407197     2636   1       0             0 insight3.exe
> Jan  4 15:22:55 figo-desktop kernel: [ 1240]  1000  1240      601      116   1       0             0 top
> Jan  4 15:22:55 figo-desktop kernel: [ 2038]  1000  2038   407786     2842   0       0             0 insight3.exe
> Jan  4 15:22:55 figo-desktop kernel: [12415]     0 12415      543      232   1     -17         -1000 udevd
> Jan  4 15:22:55 figo-desktop kernel: [12416]     0 12416      580      195   1     -17         -1000 udevd
> Jan  4 15:22:55 figo-desktop kernel: [13904]     0 13904     1488      219   1       0             0 sudo
> Jan  4 15:22:55 figo-desktop kernel: [13906]     0 13906     1386      131   0       0             0 swapoff
> Jan  4 15:23:30 figo-desktop kernel: hald-addon-stor invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
> Jan  4 15:23:30 figo-desktop kernel: hald-addon-stor cpuset=/ mems_allowed=0
> Jan  4 15:23:30 figo-desktop kernel: Pid: 1357, comm: hald-addon-stor Not tainted 2.6.36-ARCH #1
> Jan  4 15:23:30 figo-desktop kernel: Call Trace:
> Jan  4 15:23:30 figo-desktop kernel: [<c10c11d0>] dump_header.clone.5+0x80/0x1e0
> Jan  4 15:23:30 figo-desktop kernel: [<c10c153c>] oom_kill_process+0x5c/0x1c0
> Jan  4 15:23:30 figo-desktop kernel: [<c10c171d>] ? select_bad_process.clone.7+0x7d/0xd0
> Jan  4 15:23:30 figo-desktop kernel: [<c10c1a0f>] out_of_memory+0xbf/0x1d0
> Jan  4 15:23:30 figo-desktop kernel: [<c10c18d8>] ? try_set_zonelist_oom+0xc8/0xe0
> Jan  4 15:23:30 figo-desktop kernel: [<c10c5288>] __alloc_pages_nodemask+0x5e8/0x600
> Jan  4 15:23:35 figo-desktop kernel: [<c10d9b42>] handle_mm_fault+0x692/0x970
> Jan  4 15:23:35 figo-desktop kernel: [<c110f2c9>] ? prepend_path+0x59/0x140
> Jan  4 15:23:35 figo-desktop kernel: [<c10f1b85>] ? __kmalloc+0xb5/0x1b0
> Jan  4 15:23:35 figo-desktop kernel: [<c1028d60>] ? do_page_fault+0x0/0x3e0
> Jan  4 15:23:35 figo-desktop kernel: [<c1028eb0>] do_page_fault+0x150/0x3e0
> Jan  4 15:23:35 figo-desktop kernel: [<c1171e27>] ? tomoyo_fill_path_info+0x17/0xf0
> Jan  4 15:23:35 figo-desktop kernel: [<c113e74a>] ? dquot_file_open+0x1a/0x50
> Jan  4 15:23:35 figo-desktop kernel: [<f81e2225>] ? ext4_file_open+0x45/0xe0 [ext4]
> Jan  4 15:23:35 figo-desktop kernel: [<c1028d60>] ? do_page_fault+0x0/0x3e0
> Jan  4 15:23:35 figo-desktop kernel: [<c130753b>] error_code+0x67/0x6c
> Jan  4 15:23:35 figo-desktop kernel: [<c10be3eb>] ? file_read_actor+0x2b/0xd0
> Jan  4 15:23:35 figo-desktop kernel: [<c10be719>] ? find_get_page+0x29/0xb0
> Jan  4 15:23:35 figo-desktop kernel: [<c10c0381>] generic_file_aio_read+0x331/0x760
> Jan  4 15:23:35 figo-desktop kernel: [<c10fd2dc>] do_sync_read+0x9c/0xd0
> Jan  4 15:23:35 figo-desktop kernel: [<c10fd5ed>] ? rw_verify_area+0x5d/0xd0
> Jan  4 15:23:35 figo-desktop kernel: [<c10fda87>] vfs_read+0x97/0x160
> Jan  4 15:23:35 figo-desktop kernel: [<c10fd240>] ? do_sync_read+0x0/0xd0
> Jan  4 15:23:35 figo-desktop kernel: [<c10fdb8d>] sys_read+0x3d/0x70
> Jan  4 15:23:35 figo-desktop kernel: [<c100379f>] sysenter_do_call+0x12/0x28
> Jan  4 15:23:35 figo-desktop kernel: Mem-Info:
> Jan  4 15:23:35 figo-desktop kernel: DMA per-cpu:
> Jan  4 15:23:35 figo-desktop kernel: CPU    0: hi:    0, btch:   1 usd:   0
> Jan  4 15:23:35 figo-desktop kernel: CPU    1: hi:    0, btch:   1 usd:   0
> Jan  4 15:23:35 figo-desktop kernel: Normal per-cpu:
> Jan  4 15:23:35 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  31
> Jan  4 15:23:35 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:   0
> Jan  4 15:23:35 figo-desktop kernel: HighMem per-cpu:
> Jan  4 15:23:35 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  30
> Jan  4 15:23:35 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:   0
> Jan  4 15:23:35 figo-desktop kernel: active_anon:184575 inactive_anon:108966 isolated_anon:0
> Jan  4 15:23:35 figo-desktop kernel: active_file:54381 inactive_file:122394 isolated_file:0
> Jan  4 15:23:35 figo-desktop kernel: unevictable:17 dirty:1 writeback:0 unstable:0
> Jan  4 15:23:35 figo-desktop kernel: free:11938 slab_reclaimable:5448 slab_unreclaimable:6124
> Jan  4 15:23:35 figo-desktop kernel: mapped:48017 shmem:28204 pagetables:2587 bounce:0
> Jan  4 15:23:35 figo-desktop kernel: DMA free:7964kB min:64kB low:80kB high:96kB active_anon:4624kB inactive_anon:1316kB active_file:532kB inactive_file:760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15788kB mlocked:0kB dirty:0kB writeback:0kB mapped:548kB shmem:72kB slab_reclaimable:480kB slab_unreclaimable:112kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1963 all_unreclaimable? yes
> Jan  4 15:23:35 figo-desktop kernel: lowmem_reserve[]: 0 865 1980 1980
> Jan  4 15:23:35 figo-desktop kernel: Normal free:39344kB min:3728kB low:4660kB high:5592kB active_anon:234180kB inactive_anon:216232kB active_file:86148kB inactive_file:207704kB unevictable:68kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:68kB dirty:4kB writeback:0kB mapped:55852kB shmem:36852kB slab_reclaimable:21312kB slab_unreclaimable:24384kB kernel_stack:3272kB pagetables:10348kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:441365 all_unreclaimable? yes
> Jan  4 15:23:35 figo-desktop kernel: lowmem_reserve[]: 0 0 8921 8921
> Jan  4 15:23:35 figo-desktop kernel: HighMem free:444kB min:512kB low:1712kB high:2912kB active_anon:499496kB inactive_anon:218316kB active_file:130844kB inactive_file:281112kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1141984kB mlocked:0kB dirty:0kB writeback:0kB mapped:135668kB shmem:75892kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:686994 all_unreclaimable? yes
> Jan  4 15:23:35 figo-desktop kernel: lowmem_reserve[]: 0 0 0 0
> Jan  4 15:23:35 figo-desktop kernel: DMA: 1549*4kB 221*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7964kB
> Jan  4 15:23:35 figo-desktop kernel: Normal: 1472*4kB 3840*8kB 47*16kB 8*32kB 3*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 39344kB
> Jan  4 15:23:35 figo-desktop kernel: HighMem: 13*4kB 9*8kB 18*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 444kB
> Jan  4 15:23:35 figo-desktop kernel: 224327 total pagecache pages
> Jan  4 15:23:35 figo-desktop kernel: 19316 pages in swap cache
> Jan  4 15:23:35 figo-desktop kernel: Swap cache stats: add 999341, delete 980025, find 277475/391898
> Jan  4 15:23:35 figo-desktop kernel: Free swap  = -79552kB
> Jan  4 15:23:35 figo-desktop kernel: Total swap = 0kB
> Jan  4 15:23:35 figo-desktop kernel: 515070 pages RAM
> Jan  4 15:23:35 figo-desktop kernel: 287745 pages HighMem
> Jan  4 15:23:35 figo-desktop kernel: 8297 pages reserved
> Jan  4 15:23:35 figo-desktop kernel: 300103 pages shared
> Jan  4 15:23:35 figo-desktop kernel: 310097 pages non-shared
> Jan  4 15:23:35 figo-desktop kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> Jan  4 15:23:35 figo-desktop kernel: [  583]     0   583      581      195   0     -17         -1000 udevd
> Jan  4 15:23:35 figo-desktop kernel: [ 1303]     0  1303      868       35   0       0             0 syslog-ng
> Jan  4 15:23:35 figo-desktop kernel: [ 1304]     0  1304     1604      387   0       0             0 syslog-ng
> Jan  4 15:23:35 figo-desktop kernel: [ 1306]    81  1306      808      375   0       0             0 dbus-daemon
> Jan  4 15:23:35 figo-desktop kernel: [ 1309]    82  1309     3777      391   0       0             0 hald
> Jan  4 15:23:35 figo-desktop kernel: [ 1310]     0  1310      903      128   1       0             0 hald-runner
> Jan  4 15:23:35 figo-desktop kernel: [ 1339]     0  1339      919       87   1       0             0 hald-addon-inpu
> Jan  4 15:23:35 figo-desktop kernel: [ 1357]     0  1357      920      184   1       0             0 hald-addon-stor
> Jan  4 15:23:35 figo-desktop kernel: [ 1359]    82  1359      824      106   1       0             0 hald-addon-acpi
> Jan  4 15:23:35 figo-desktop kernel: [ 1448]     0  1448      616      294   1       0             0 crond
> Jan  4 15:23:35 figo-desktop kernel: [ 1477]     0  1477      720      108   0       0             0 mysqld_safe
> Jan  4 15:23:35 figo-desktop kernel: [ 1484]     0  1484     3580      268   0       0             0 gdm-binary
> Jan  4 15:23:35 figo-desktop kernel: [ 1497]     0  1497      440       66   1       0             0 agetty
> Jan  4 15:23:35 figo-desktop kernel: [ 1498]     0  1498      440       66   0       0             0 agetty
> Jan  4 15:23:35 figo-desktop kernel: [ 1499]     0  1499      440       66   0       0             0 agetty
> Jan  4 15:23:35 figo-desktop kernel: [ 1500]     0  1500      440       67   1       0             0 agetty
> Jan  4 15:23:35 figo-desktop kernel: [ 1501]     0  1501      440       66   0       0             0 agetty
> Jan  4 15:23:35 figo-desktop kernel: [ 1502]     0  1502      440       67   0       0             0 agetty
> Jan  4 15:23:35 figo-desktop kernel: [ 1553]     0  1553     6607      650   1       0             0 NetworkManager
> Jan  4 15:23:35 figo-desktop kernel: [ 1592]     0  1592     2256      329   0       0             0 cupsd
> Jan  4 15:23:35 figo-desktop kernel: [ 1597]    89  1597    29903     2812   1       0             0 mysqld
> Jan  4 15:23:35 figo-desktop kernel: [ 1598]     0  1598     1652      147   1     -17         -1000 sshd
> Jan  4 15:23:35 figo-desktop kernel: [ 1612]     0  1612     6263      506   1       0             0 polkitd
> Jan  4 15:23:35 figo-desktop kernel: [ 1613]     0  1613     2015      121   1       0             0 vmware-usbarbit
> Jan  4 15:23:35 figo-desktop kernel: [ 1617]     0  1617     2887     1481   0       0             0 cntlm
> Jan  4 15:23:35 figo-desktop kernel: [ 1620]     0  1620     4398      365   0       0             0 gdm-simple-slav
> Jan  4 15:23:35 figo-desktop kernel: [ 1636]     0  1636    37287    10088   1       0             0 Xorg
> Jan  4 15:23:35 figo-desktop kernel: [ 1638]     0  1638     1248       88   0       0             0 wpa_supplicant
> Jan  4 15:23:35 figo-desktop kernel: [ 1720]     0  1720     4789     1527   1       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [ 1722]     0  1722      488       92   0       0             0 dhcpcd
> Jan  4 15:23:35 figo-desktop kernel: [ 1732]     0  1732      519       61   1       0             0 vmnet-bridge
> Jan  4 15:23:35 figo-desktop kernel: [ 1750]     0  1750     4922      302   1       0             0 smbd
> Jan  4 15:23:35 figo-desktop kernel: [ 1764]     0  1764      655       37   1       0             0 vmnet-dhcpd
> Jan  4 15:23:35 figo-desktop kernel: [ 1772]     0  1772      513       15   0       0             0 vmnet-netifup
> Jan  4 15:23:35 figo-desktop kernel: [ 1774]     0  1774      655       38   0       0             0 vmnet-dhcpd
> Jan  4 15:23:35 figo-desktop kernel: [ 1776]     0  1776     4922      223   0       0             0 smbd
> Jan  4 15:23:35 figo-desktop kernel: [ 1778]     0  1778      634       43   0       0             0 vmnet-natd
> Jan  4 15:23:35 figo-desktop kernel: [ 1780]     0  1780      513       13   1       0             0 vmnet-netifup
> Jan  4 15:23:35 figo-desktop kernel: [ 1796]     0  1796     6677      327   0       0             0 console-kit-dae
> Jan  4 15:23:35 figo-desktop kernel: [ 1899]   120  1899     6587      597   0       0             0 polkit-gnome-au
> Jan  4 15:23:35 figo-desktop kernel: [ 1903]     0  1903     3905      244   0       0             0 gdm-session-wor
> Jan  4 15:23:35 figo-desktop kernel: [ 1906]     0  1906     3677      313   1       0             0 upowerd
> Jan  4 15:23:35 figo-desktop kernel: [ 1974]  1000  1974    10963      917   0       0             0 gnome-keyring-d
> Jan  4 15:23:35 figo-desktop kernel: [ 1993]  1000  1993     9253      771   0       0             0 gnome-session
> Jan  4 15:23:35 figo-desktop kernel: [ 2012]  1000  2012      796      103   0       0             0 dbus-launch
> Jan  4 15:23:35 figo-desktop kernel: [ 2013]  1000  2013     5854     4749   0       0             0 dbus-daemon
> Jan  4 15:23:35 figo-desktop kernel: [ 2015]  1000  2015      887       54   1       0             0 ssh-agent
> Jan  4 15:23:35 figo-desktop kernel: [ 2022]  1000  2022    10904     2634   1       0             0 fcitx
> Jan  4 15:23:35 figo-desktop kernel: [ 2032]  1000  2032    41764     1971   1       0             0 gnome-settings-
> Jan  4 15:23:35 figo-desktop kernel: [ 2036]  1000  2036     2367      408   0       0             0 gvfsd
> Jan  4 15:23:35 figo-desktop kernel: [ 2039]  1000  2039    85188    22437   0       0             0 metacity
> Jan  4 15:23:35 figo-desktop kernel: [ 2049]  1000  2049    14162      267   1       0             0 gvfs-fuse-daemo
> Jan  4 15:23:35 figo-desktop kernel: [ 2053]  1000  2053    76657     3448   0       0             0 gnome-panel
> Jan  4 15:23:35 figo-desktop kernel: [ 2056]  1000  2056    10978      563   0       0             0 gvfs-gdu-volume
> Jan  4 15:23:35 figo-desktop kernel: [ 2058]     0  2058     5767      451   1       0             0 udisks-daemon
> Jan  4 15:23:35 figo-desktop kernel: [ 2059]     0  2059     1291       72   0       0             0 udisks-daemon
> Jan  4 15:23:35 figo-desktop kernel: [ 2069]  1000  2069   168292    48504   0       0             0 nautilus
> Jan  4 15:23:35 figo-desktop kernel: [ 2071]  1000  2071    12907      223   0       0             0 bonobo-activati
> Jan  4 15:23:35 figo-desktop kernel: [ 2081]  1000  2081     1633      136   0       0             0 sh
> Jan  4 15:23:35 figo-desktop kernel: [ 2082]  1000  2082     1666      136   1       0             0 thunderbird
> Jan  4 15:23:35 figo-desktop kernel: [ 2084]  1000  2084    46718     3138   0       0             0 wnck-applet
> Jan  4 15:23:35 figo-desktop kernel: [ 2086]  1000  2086    42862     1439   0       0             0 polkit-gnome-au
> Jan  4 15:23:35 figo-desktop kernel: [ 2087]  1000  2087   285933     1524   0       0             0 nm-applet
> Jan  4 15:23:35 figo-desktop kernel: [ 2104]  1000  2104     5056      732   0       0             0 gdu-notificatio
> Jan  4 15:23:35 figo-desktop kernel: [ 2107]  1000  2107     1666      142   1       0             0 run-mozilla.sh
> Jan  4 15:23:35 figo-desktop kernel: [ 2108]  1000  2108    39395      953   1       0             0 gnome-power-man
> Jan  4 15:23:35 figo-desktop kernel: [ 2109]  1000  2109     7886      709   0       0             0 vino-server
> Jan  4 15:23:35 figo-desktop kernel: [ 2110]  1000  2110    10764      790   1       0             0 evolution-alarm
> Jan  4 15:23:35 figo-desktop kernel: [ 2114]  1000  2114   167865    29127   0       0             0 thunderbird-bin
> Jan  4 15:23:35 figo-desktop kernel: [ 2121]  1000  2121    42098     1885   0       0             0 notify-osd
> Jan  4 15:23:35 figo-desktop kernel: [ 2125]  1000  2125    42488     1202   1       0             0 cpufreq-applet
> Jan  4 15:23:35 figo-desktop kernel: [ 2126]  1000  2126    41394     1143   0       0             0 multiload-apple
> Jan  4 15:23:35 figo-desktop kernel: [ 2129]  1000  2129    64405     1748   0       0             0 mixer_applet2
> Jan  4 15:23:35 figo-desktop kernel: [ 2131]  1000  2131    75346     2380   1       0             0 clock-applet
> Jan  4 15:23:35 figo-desktop kernel: [ 2132]  1000  2132    41163      924   1       0             0 notification-ar
> Jan  4 15:23:35 figo-desktop kernel: [ 2149]  1000  2149    15325      696   1       0             0 e-calendar-fact
> Jan  4 15:23:35 figo-desktop kernel: [ 2153]  1000  2153     7497     1006   0       0             0 gnome-screensav
> Jan  4 15:23:35 figo-desktop kernel: [ 2155]  1000  2155     3848      202   1       0             0 pxgconf
> Jan  4 15:23:35 figo-desktop kernel: [ 2163]  1000  2163     1781      342   0       0             0 mission-control
> Jan  4 15:23:35 figo-desktop kernel: [ 2173]  1000  2173     4486      417   0       0             0 gvfsd-trash
> Jan  4 15:23:35 figo-desktop kernel: [ 2186]     0  2186     3543      192   1       0             0 system-tools-ba
> Jan  4 15:23:35 figo-desktop kernel: [ 2210]  1000  2210     2202      236   0       0             0 gvfsd-burn
> Jan  4 15:23:35 figo-desktop kernel: [ 2245]  1000  2245     3895     1713   0       0             0 gvfsd-metadata
> Jan  4 15:23:35 figo-desktop kernel: [ 2274]  1000  2274    22825      423   1       0             0 conky
> Jan  4 15:23:35 figo-desktop kernel: [ 2281]     0  2281     3295     2278   0       0             0 SystemToolsBack
> Jan  4 15:23:35 figo-desktop kernel: [ 2663]  1000  2663    68807     2924   1       0             0 gnome-terminal
> Jan  4 15:23:35 figo-desktop kernel: [ 2683]  1000  2683      451       76   0       0             0 gnome-pty-helpe
> Jan  4 15:23:35 figo-desktop kernel: [ 2685]  1000  2685     2072      555   0       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [ 2885]     0  2885     1489      118   0       0             0 sudo
> Jan  4 15:23:35 figo-desktop kernel: [ 2886]     0  2886     6273      353   1       0             0 vim
> Jan  4 15:23:35 figo-desktop kernel: [ 2887]  1000  2887      472       84   0       0             0 ping
> Jan  4 15:23:35 figo-desktop kernel: [ 2892]  1000  2892      472       83   1       0             0 ping
> Jan  4 15:23:35 figo-desktop kernel: [ 2894]  1000  2894    76113     5382   0       0             0 vmware
> Jan  4 15:23:35 figo-desktop kernel: [ 2919]  1000  2919    51497     3224   1       0             0 vmware-tray
> Jan  4 15:23:35 figo-desktop kernel: [ 2954]  1000  2954    48676     1588   1       0             0 vmware-unity-he
> Jan  4 15:23:35 figo-desktop kernel: [ 2988]  1000  2988   190471    40349   1       0             0 vmware-vmx
> Jan  4 15:23:35 figo-desktop kernel: [ 3207]  1000  3207     4377      279   1       0             0 gvfsd-computer
> Jan  4 15:23:35 figo-desktop kernel: [ 3211]  1000  3211     9920      416   0       0             0 gvfsd-smb-brows
> Jan  4 15:23:35 figo-desktop kernel: [ 3217]  1000  3217     9876      545   0       0             0 gvfsd-smb
> Jan  4 15:23:35 figo-desktop kernel: [15186]  1000 15186     2069      551   1       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [17451]  1000 17451     5679      500   0       0             0 dconf-service
> Jan  4 15:23:35 figo-desktop kernel: [19085]  1000 19085     1576      147   0       0             0 ssh
> Jan  4 15:23:35 figo-desktop kernel: [19261]  1000 19261     1682      957   0       0             0 wineserver
> Jan  4 15:23:35 figo-desktop kernel: [19266]  1000 19266   399085      102   0       0             0 services.exe
> Jan  4 15:23:35 figo-desktop kernel: [19269]  1000 19269   399117       70   1       0             0 winedevice.exe
> Jan  4 15:23:35 figo-desktop kernel: [19342]  1000 19342   404518      611   0       0             0 explorer.exe
> Jan  4 15:23:35 figo-desktop kernel: [19344]  1000 19344   550020    10567   0       0             0 insight3.exe
> Jan  4 15:23:35 figo-desktop kernel: [  360]  1000   360     2069      508   1       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [ 9821]  1000  9821   166228     2782   0       0             0 stardict
> Jan  4 15:23:35 figo-desktop kernel: [14614]  1000 14614     2040      515   1       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [17002]  1000 17002     2069      435   0       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [18612]     0 18612     1536      124   1       0             0 sudo
> Jan  4 15:23:35 figo-desktop kernel: [18613]     0 18613     1988      607   1       0             0 minicom
> Jan  4 15:23:35 figo-desktop kernel: [21183]  1000 21183     2041      383   1       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [21194]  1000 21194     1611      124   0       0             0 ssh
> Jan  4 15:23:35 figo-desktop kernel: [22451]  1000 22451     2069      575   1       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [23428]  1000 23428     6475      549   1       0             0 vim
> Jan  4 15:23:35 figo-desktop kernel: [23484]  1000 23484     6501      585   1       0             0 vim
> Jan  4 15:23:35 figo-desktop kernel: [23549]  1000 23549     6501      582   0       0             0 vim
> Jan  4 15:23:35 figo-desktop kernel: [23642]  1000 23642     9865      569   0       0             0 gvfsd-smb
> Jan  4 15:23:35 figo-desktop kernel: [26358]  1000 26358   407339     4454   0       0             0 insight3.exe
> Jan  4 15:23:35 figo-desktop kernel: [29711]  1000 29711     9943      549   0       0             0 gvfsd-smb
> Jan  4 15:23:35 figo-desktop kernel: [26156]  1000 26156    61269     9022   0       0             0 skype
> Jan  4 15:23:35 figo-desktop kernel: [32490]  1000 32490     2647      675   1       0             0 gconfd-2
> Jan  4 15:23:35 figo-desktop kernel: [10622]  1000 10622     2072      725   0       0             0 bash
> Jan  4 15:23:35 figo-desktop kernel: [10634]  1000 10634     1576      155   0       0             0 ssh
> Jan  4 15:23:35 figo-desktop kernel: [15410]  1000 15410    76559    12077   0       0             0 evince
> Jan  4 15:23:35 figo-desktop kernel: [15415]  1000 15415     5490      217   1       0             0 evinced
> Jan  4 15:23:35 figo-desktop kernel: [16754]  1000 16754     9899      484   0       0             0 gvfsd-smb
> Jan  4 15:23:35 figo-desktop kernel: [16772]  1000 16772     9900      498   0       0             0 gvfsd-smb
> Jan  4 15:23:35 figo-desktop kernel: [25390]  1000 25390   407306     2127   1       0             0 insight3.exe
> Jan  4 15:23:35 figo-desktop kernel: [ 2127]  1000  2127     1609      124   0       0             0 ssh
> Jan  4 15:23:35 figo-desktop kernel: [10661]    33 10661     4775     1324   1       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [10662]    33 10662     4823     1557   0       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [10663]    33 10663     4823     1558   0       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [10664]    33 10664     4823     1557   0       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [10665]    33 10665     4823     1557   0       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [10666]    33 10666     4823     1557   1       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [32159]  1000 32159   166096    39370   1       0             0 firefox
> Jan  4 15:23:35 figo-desktop kernel: [32228]    33 32228     4823     1500   0       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [32233]    33 32233     4857     1584   1       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [32234]    33 32234     4857     1585   1       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [32241]    33 32241     4789     1524   1       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [32246]    33 32246     4789     1524   0       0             0 httpd
> Jan  4 15:23:35 figo-desktop kernel: [32268]  1000 32268    27543     4243   1       0             0 plugin-containe
> Jan  4 15:23:35 figo-desktop kernel: [32320]  1000 32320    16230     1445   1       0             0 GoogleTalkPlugi
> Jan  4 15:23:35 figo-desktop kernel: [  970]  1000   970   407197     2605   1       0             0 insight3.exe
> Jan  4 15:23:35 figo-desktop kernel: [ 1240]  1000  1240      601      116   1       0             0 top
> Jan  4 15:23:35 figo-desktop kernel: [ 2038]  1000  2038   407785     2770   1       0             0 insight3.exe
> Jan  4 15:23:35 figo-desktop kernel: [12415]     0 12415      543      242   1     -17         -1000 udevd
> Jan  4 15:23:35 figo-desktop kernel: [12416]     0 12416      580      195   1     -17         -1000 udevd
> Jan  4 15:23:35 figo-desktop kernel: [13911]     0 13911      719      269   0       0             0 sh
> Jan  4 15:23:35 figo-desktop kernel: [13914]     0 13914      719      268   0       0             0 sh
> Jan  4 15:23:35 figo-desktop kernel: [13916]     0 13916     1187      188   1       0             0 git
> Jan  4 15:23:35 figo-desktop kernel: [13918]     0 13918      719      313   0       0             0 git-pull
> Jan  4 15:23:35 figo-desktop kernel: [13952]     0 13952     1189      263   1       0             0 git
> Jan  4 15:23:35 figo-desktop kernel: [13954]     0 13954     1575      460   1       0             0 ssh
> Jan  4 15:23:35 figo-desktop kernel: [13956]     0 13956     1488      219   1       0             0 sudo
> Jan  4 15:23:35 figo-desktop kernel: [13957]     0 13957     1386      130   0       0             0 swapoff
> Jan  4 15:23:43 figo-desktop kernel: Xorg invoked oom-killer: gfp_mask=0x80d2, order=0, oom_adj=0, oom_score_adj=0
> Jan  4 15:23:43 figo-desktop kernel: Xorg cpuset=/ mems_allowed=0
> Jan  4 15:23:43 figo-desktop kernel: Pid: 1636, comm: Xorg Not tainted 2.6.36-ARCH #1
> Jan  4 15:23:43 figo-desktop kernel: Call Trace:
> Jan  4 15:23:43 figo-desktop kernel: [<c10c11d0>] dump_header.clone.5+0x80/0x1e0
> Jan  4 15:23:43 figo-desktop kernel: [<c10c153c>] oom_kill_process+0x5c/0x1c0
> Jan  4 15:23:43 figo-desktop kernel: [<c10c171d>] ? select_bad_process.clone.7+0x7d/0xd0
> Jan  4 15:23:43 figo-desktop kernel: [<c10c1a0f>] out_of_memory+0xbf/0x1d0
> Jan  4 15:23:43 figo-desktop kernel: [<c10c18d8>] ? try_set_zonelist_oom+0xc8/0xe0
> Jan  4 15:23:43 figo-desktop kernel: [<c10c5288>] __alloc_pages_nodemask+0x5e8/0x600
> Jan  4 15:23:44 figo-desktop kernel: [<c10e36f6>] __vmalloc_area_node+0x76/0x100
> Jan  4 15:23:44 figo-desktop kernel: [<c102fb30>] ? __wake_up_common+0x40/0x70
> Jan  4 15:23:44 figo-desktop kernel: [<f8991f87>] ? i915_gem_do_execbuffer+0xa87/0x1110 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<c10e381a>] __vmalloc_node+0x9a/0xa0
> Jan  4 15:23:44 figo-desktop kernel: [<f8991f87>] ? i915_gem_do_execbuffer+0xa87/0x1110 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<f8991f87>] ? i915_gem_do_execbuffer+0xa87/0x1110 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<f8991f87>] ? i915_gem_do_execbuffer+0xa87/0x1110 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<c10e3975>] __vmalloc+0x25/0x30
> Jan  4 15:23:44 figo-desktop kernel: [<f8991f87>] ? i915_gem_do_execbuffer+0xa87/0x1110 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<f8991f87>] i915_gem_do_execbuffer+0xa87/0x1110 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<c12e49c8>] ? unix_stream_recvmsg+0x498/0x580
> Jan  4 15:23:44 figo-desktop kernel: [<c11a34d2>] ? _copy_from_user+0x32/0x50
> Jan  4 15:23:44 figo-desktop kernel: [<f8992687>] i915_gem_execbuffer2+0x77/0x1e0 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<f8323231>] drm_ioctl+0x1e1/0x470 [drm]
> Jan  4 15:23:44 figo-desktop kernel: [<f8992610>] ? i915_gem_execbuffer2+0x0/0x1e0 [i915]
> Jan  4 15:23:44 figo-desktop kernel: [<c10fd2dc>] ? do_sync_read+0x9c/0xd0
> Jan  4 15:23:44 figo-desktop kernel: [<c1064e06>] ? __hrtimer_start_range_ns+0x1a6/0x4d0
> Jan  4 15:23:44 figo-desktop kernel: [<f8323050>] ? drm_ioctl+0x0/0x470 [drm]
> Jan  4 15:23:44 figo-desktop kernel: [<c110beca>] do_vfs_ioctl+0x7a/0x590
> Jan  4 15:23:44 figo-desktop kernel: [<c1172121>] ? tomoyo_init_request_info+0x41/0x50
> Jan  4 15:23:44 figo-desktop kernel: [<c116ebb9>] ? tomoyo_path_number_perm+0x29/0xe0
> Jan  4 15:23:44 figo-desktop kernel: [<c10fdb39>] ? vfs_read+0x149/0x160
> Jan  4 15:23:44 figo-desktop kernel: [<c1170eb2>] ? tomoyo_file_ioctl+0x12/0x20
> Jan  4 15:23:44 figo-desktop kernel: [<c110c43f>] sys_ioctl+0x5f/0x80
> Jan  4 15:23:44 figo-desktop kernel: [<c100379f>] sysenter_do_call+0x12/0x28
> Jan  4 15:23:44 figo-desktop kernel: Mem-Info:
> Jan  4 15:23:44 figo-desktop kernel: DMA per-cpu:
> Jan  4 15:23:44 figo-desktop kernel: CPU    0: hi:    0, btch:   1 usd:   0
> Jan  4 15:23:44 figo-desktop kernel: CPU    1: hi:    0, btch:   1 usd:   0
> Jan  4 15:23:44 figo-desktop kernel: Normal per-cpu:
> Jan  4 15:23:44 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  37
> Jan  4 15:23:44 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:  45
> Jan  4 15:23:44 figo-desktop kernel: HighMem per-cpu:
> Jan  4 15:23:44 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:   2
> Jan  4 15:23:44 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:  36
> Jan  4 15:23:44 figo-desktop kernel: active_anon:196238 inactive_anon:97262 isolated_anon:0
> Jan  4 15:23:44 figo-desktop kernel: active_file:54385 inactive_file:122363 isolated_file:0
> Jan  4 15:23:44 figo-desktop kernel: unevictable:17 dirty:0 writeback:41 unstable:0
> Jan  4 15:23:44 figo-desktop kernel: free:11933 slab_reclaimable:5456 slab_unreclaimable:6120
> Jan  4 15:23:44 figo-desktop kernel: mapped:47868 shmem:29255 pagetables:2585 bounce:0
> Jan  4 15:23:44 figo-desktop kernel: DMA free:7976kB min:64kB low:80kB high:96kB active_anon:2908kB inactive_anon:3020kB active_file:532kB inactive_file:764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15788kB mlocked:0kB dirty:0kB writeback:0kB mapped:528kB shmem:52kB slab_reclaimable:480kB slab_unreclaimable:112kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2053 all_unreclaimable? yes
> Jan  4 15:23:44 figo-desktop kernel: lowmem_reserve[]: 0 865 1980 1980
> Jan  4 15:23:44 figo-desktop kernel: Normal free:39352kB min:3728kB low:4660kB high:5592kB active_anon:230308kB inactive_anon:219780kB active_file:86232kB inactive_file:207668kB unevictable:68kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:68kB dirty:0kB writeback:0kB mapped:55204kB shmem:36960kB slab_reclaimable:21344kB slab_unreclaimable:24368kB kernel_stack:3272kB pagetables:10340kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:448565 all_unreclaimable? yes
> Jan  4 15:23:44 figo-desktop kernel: lowmem_reserve[]: 0 0 8921 8921
> Jan  4 15:23:44 figo-desktop kernel: HighMem free:404kB min:512kB low:1712kB high:2912kB active_anon:551736kB inactive_anon:166248kB active_file:130776kB inactive_file:281020kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1141984kB mlocked:0kB dirty:0kB writeback:164kB mapped:135740kB shmem:80008kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:687864 all_unreclaimable? yes
> Jan  4 15:23:44 figo-desktop kernel: lowmem_reserve[]: 0 0 0 0
> Jan  4 15:23:44 figo-desktop kernel: DMA: 1546*4kB 224*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7976kB
> Jan  4 15:23:44 figo-desktop kernel: Normal: 1678*4kB 3750*8kB 49*16kB 6*32kB 2*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 39352kB
> Jan  4 15:23:44 figo-desktop kernel: HighMem: 20*4kB 7*8kB 18*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 456kB
> Jan  4 15:23:44 figo-desktop kernel: 209015 total pagecache pages
> Jan  4 15:23:44 figo-desktop kernel: 2993 pages in swap cache
> Jan  4 15:23:44 figo-desktop kernel: Swap cache stats: add 1046799, delete 1043806, find 277476/391899
> Jan  4 15:23:44 figo-desktop kernel: Free swap  = -14768kB
> Jan  4 15:23:44 figo-desktop kernel: Total swap = 0kB
> Jan  4 15:23:44 figo-desktop kernel: 515070 pages RAM
> Jan  4 15:23:44 figo-desktop kernel: 287745 pages HighMem
> Jan  4 15:23:44 figo-desktop kernel: 8297 pages reserved
> Jan  4 15:23:44 figo-desktop kernel: 302258 pages shared
> Jan  4 15:23:44 figo-desktop kernel: 308041 pages non-shared
> Jan  4 15:23:44 figo-desktop kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> Jan  4 15:23:44 figo-desktop kernel: [  583]     0   583      581      246   1     -17         -1000 udevd
> Jan  4 15:23:44 figo-desktop kernel: [ 1303]     0  1303      868       37   0       0             0 syslog-ng
> Jan  4 15:23:44 figo-desktop kernel: [ 1304]     0  1304     1604      383   1       0             0 syslog-ng
> Jan  4 15:23:44 figo-desktop kernel: [ 1306]    81  1306      808      362   0       0             0 dbus-daemon
> Jan  4 15:23:44 figo-desktop kernel: [ 1309]    82  1309     3777      376   1       0             0 hald
> Jan  4 15:23:44 figo-desktop kernel: [ 1310]     0  1310      903      130   1       0             0 hald-runner
> Jan  4 15:23:44 figo-desktop kernel: [ 1339]     0  1339      919       87   1       0             0 hald-addon-inpu
> Jan  4 15:23:44 figo-desktop kernel: [ 1357]     0  1357      919      184   0       0             0 hald-addon-stor
> Jan  4 15:23:44 figo-desktop kernel: [ 1359]    82  1359      824      107   1       0             0 hald-addon-acpi
> Jan  4 15:23:44 figo-desktop kernel: [ 1448]     0  1448      616      294   0       0             0 crond
> Jan  4 15:23:44 figo-desktop kernel: [ 1477]     0  1477      720      108   0       0             0 mysqld_safe
> Jan  4 15:23:44 figo-desktop kernel: [ 1484]     0  1484     3580      269   0       0             0 gdm-binary
> Jan  4 15:23:44 figo-desktop kernel: [ 1497]     0  1497      440       67   1       0             0 agetty
> Jan  4 15:23:44 figo-desktop kernel: [ 1498]     0  1498      440       67   0       0             0 agetty
> Jan  4 15:23:44 figo-desktop kernel: [ 1499]     0  1499      440       67   0       0             0 agetty
> Jan  4 15:23:44 figo-desktop kernel: [ 1500]     0  1500      440       67   1       0             0 agetty
> Jan  4 15:23:44 figo-desktop kernel: [ 1501]     0  1501      440       67   0       0             0 agetty
> Jan  4 15:23:44 figo-desktop kernel: [ 1502]     0  1502      440       68   0       0             0 agetty
> Jan  4 15:23:44 figo-desktop kernel: [ 1553]     0  1553     6607      652   0       0             0 NetworkManager
> Jan  4 15:23:44 figo-desktop kernel: [ 1592]     0  1592     2256      330   0       0             0 cupsd
> Jan  4 15:23:44 figo-desktop kernel: [ 1597]    89  1597    29903     2812   1       0             0 mysqld
> Jan  4 15:23:44 figo-desktop kernel: [ 1598]     0  1598     1652      147   1     -17         -1000 sshd
> Jan  4 15:23:44 figo-desktop kernel: [ 1612]     0  1612     6263      508   1       0             0 polkitd
> Jan  4 15:23:44 figo-desktop kernel: [ 1613]     0  1613     2015      123   1       0             0 vmware-usbarbit
> Jan  4 15:23:44 figo-desktop kernel: [ 1617]     0  1617     2887     1600   1       0             0 cntlm
> Jan  4 15:23:44 figo-desktop kernel: [ 1620]     0  1620     4398      366   0       0             0 gdm-simple-slav
> Jan  4 15:23:44 figo-desktop kernel: [ 1636]     0  1636    37333     9882   1       0             0 Xorg
> Jan  4 15:23:44 figo-desktop kernel: [ 1638]     0  1638     1248       88   0       0             0 wpa_supplicant
> Jan  4 15:23:44 figo-desktop kernel: [ 1720]     0  1720     4789     1537   1       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [ 1722]     0  1722      488       93   0       0             0 dhcpcd
> Jan  4 15:23:44 figo-desktop kernel: [ 1732]     0  1732      519       62   1       0             0 vmnet-bridge
> Jan  4 15:23:44 figo-desktop kernel: [ 1750]     0  1750     4922      304   1       0             0 smbd
> Jan  4 15:23:44 figo-desktop kernel: [ 1764]     0  1764      655       39   1       0             0 vmnet-dhcpd
> Jan  4 15:23:44 figo-desktop kernel: [ 1772]     0  1772      513       15   0       0             0 vmnet-netifup
> Jan  4 15:23:44 figo-desktop kernel: [ 1774]     0  1774      655       39   0       0             0 vmnet-dhcpd
> Jan  4 15:23:44 figo-desktop kernel: [ 1776]     0  1776     4922      225   0       0             0 smbd
> Jan  4 15:23:44 figo-desktop kernel: [ 1778]     0  1778      634       44   0       0             0 vmnet-natd
> Jan  4 15:23:44 figo-desktop kernel: [ 1780]     0  1780      513       14   1       0             0 vmnet-netifup
> Jan  4 15:23:44 figo-desktop kernel: [ 1796]     0  1796     6677      329   0       0             0 console-kit-dae
> Jan  4 15:23:44 figo-desktop kernel: [ 1899]   120  1899     6587      598   0       0             0 polkit-gnome-au
> Jan  4 15:23:44 figo-desktop kernel: [ 1903]     0  1903     3905      245   0       0             0 gdm-session-wor
> Jan  4 15:23:44 figo-desktop kernel: [ 1906]     0  1906     3677      314   1       0             0 upowerd
> Jan  4 15:23:44 figo-desktop kernel: [ 1974]  1000  1974    10963     1248   0       0             0 gnome-keyring-d
> Jan  4 15:23:44 figo-desktop kernel: [ 1993]  1000  1993     9253      776   0       0             0 gnome-session
> Jan  4 15:23:44 figo-desktop kernel: [ 2012]  1000  2012      796      103   0       0             0 dbus-launch
> Jan  4 15:23:44 figo-desktop kernel: [ 2013]  1000  2013     5854     4969   0       0             0 dbus-daemon
> Jan  4 15:23:44 figo-desktop kernel: [ 2015]  1000  2015      887       54   1       0             0 ssh-agent
> Jan  4 15:23:44 figo-desktop kernel: [ 2022]  1000  2022    10904     4786   1       0             0 fcitx
> Jan  4 15:23:44 figo-desktop kernel: [ 2032]  1000  2032    41764     1981   1       0             0 gnome-settings-
> Jan  4 15:23:44 figo-desktop kernel: [ 2036]  1000  2036     2367      416   0       0             0 gvfsd
> Jan  4 15:23:44 figo-desktop kernel: [ 2039]  1000  2039    85188    26593   0       0             0 metacity
> Jan  4 15:23:44 figo-desktop kernel: [ 2049]  1000  2049    14162      280   1       0             0 gvfs-fuse-daemo
> Jan  4 15:23:44 figo-desktop kernel: [ 2053]  1000  2053    76657     3497   0       0             0 gnome-panel
> Jan  4 15:23:44 figo-desktop kernel: [ 2056]  1000  2056    10978      563   0       0             0 gvfs-gdu-volume
> Jan  4 15:23:44 figo-desktop kernel: [ 2058]     0  2058     5767      465   1       0             0 udisks-daemon
> Jan  4 15:23:44 figo-desktop kernel: [ 2059]     0  2059     1291       72   1       0             0 udisks-daemon
> Jan  4 15:23:44 figo-desktop kernel: [ 2069]  1000  2069   168292    50767   0       0             0 nautilus
> Jan  4 15:23:44 figo-desktop kernel: [ 2071]  1000  2071    12907      223   0       0             0 bonobo-activati
> Jan  4 15:23:44 figo-desktop kernel: [ 2081]  1000  2081     1633      136   0       0             0 sh
> Jan  4 15:23:44 figo-desktop kernel: [ 2082]  1000  2082     1666      136   1       0             0 thunderbird
> Jan  4 15:23:44 figo-desktop kernel: [ 2084]  1000  2084    46718     3364   0       0             0 wnck-applet
> Jan  4 15:23:44 figo-desktop kernel: [ 2086]  1000  2086    42862     1443   0       0             0 polkit-gnome-au
> Jan  4 15:23:44 figo-desktop kernel: [ 2087]  1000  2087   285933     1529   0       0             0 nm-applet
> Jan  4 15:23:44 figo-desktop kernel: [ 2104]  1000  2104     5056      756   1       0             0 gdu-notificatio
> Jan  4 15:23:44 figo-desktop kernel: [ 2107]  1000  2107     1666      142   1       0             0 run-mozilla.sh
> Jan  4 15:23:44 figo-desktop kernel: [ 2108]  1000  2108    39395      941   0       0             0 gnome-power-man
> Jan  4 15:23:44 figo-desktop kernel: [ 2109]  1000  2109     7886      709   0       0             0 vino-server
> Jan  4 15:23:44 figo-desktop kernel: [ 2110]  1000  2110    10764      791   1       0             0 evolution-alarm
> Jan  4 15:23:44 figo-desktop kernel: [ 2114]  1000  2114   167865    29566   1       0             0 thunderbird-bin
> Jan  4 15:23:44 figo-desktop kernel: [ 2121]  1000  2121    42098     1946   0       0             0 notify-osd
> Jan  4 15:23:44 figo-desktop kernel: [ 2125]  1000  2125    42488     1203   1       0             0 cpufreq-applet
> Jan  4 15:23:44 figo-desktop kernel: [ 2126]  1000  2126    41394     1148   1       0             0 multiload-apple
> Jan  4 15:23:44 figo-desktop kernel: [ 2129]  1000  2129    64405     1758   0       0             0 mixer_applet2
> Jan  4 15:23:44 figo-desktop kernel: [ 2131]  1000  2131    75346     2389   1       0             0 clock-applet
> Jan  4 15:23:44 figo-desktop kernel: [ 2132]  1000  2132    41163      930   1       0             0 notification-ar
> Jan  4 15:23:44 figo-desktop kernel: [ 2149]  1000  2149    15325      697   1       0             0 e-calendar-fact
> Jan  4 15:23:44 figo-desktop kernel: [ 2153]  1000  2153     7497     1008   1       0             0 gnome-screensav
> Jan  4 15:23:44 figo-desktop kernel: [ 2155]  1000  2155     3848      202   1       0             0 pxgconf
> Jan  4 15:23:44 figo-desktop kernel: [ 2163]  1000  2163     1781      342   0       0             0 mission-control
> Jan  4 15:23:44 figo-desktop kernel: [ 2173]  1000  2173     4486      427   0       0             0 gvfsd-trash
> Jan  4 15:23:44 figo-desktop kernel: [ 2186]     0  2186     3543      192   1       0             0 system-tools-ba
> Jan  4 15:23:44 figo-desktop kernel: [ 2210]  1000  2210     2202      238   0       0             0 gvfsd-burn
> Jan  4 15:23:44 figo-desktop kernel: [ 2245]  1000  2245     3895     1728   0       0             0 gvfsd-metadata
> Jan  4 15:23:44 figo-desktop kernel: [ 2274]  1000  2274    22823      424   1       0             0 conky
> Jan  4 15:23:44 figo-desktop kernel: [ 2281]     0  2281     3295     2278   0       0             0 SystemToolsBack
> Jan  4 15:23:44 figo-desktop kernel: [ 2663]  1000  2663    68807     2944   1       0             0 gnome-terminal
> Jan  4 15:23:44 figo-desktop kernel: [ 2683]  1000  2683      451       76   0       0             0 gnome-pty-helpe
> Jan  4 15:23:44 figo-desktop kernel: [ 2685]  1000  2685     2072      570   0       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [ 2885]     0  2885     1489      119   0       0             0 sudo
> Jan  4 15:23:44 figo-desktop kernel: [ 2886]     0  2886     6273      355   1       0             0 vim
> Jan  4 15:23:44 figo-desktop kernel: [ 2887]  1000  2887      472       84   0       0             0 ping
> Jan  4 15:23:44 figo-desktop kernel: [ 2892]  1000  2892      472       83   1       0             0 ping
> Jan  4 15:23:44 figo-desktop kernel: [ 2894]  1000  2894    76113     5872   0       0             0 vmware
> Jan  4 15:23:44 figo-desktop kernel: [ 2919]  1000  2919    51497     3266   1       0             0 vmware-tray
> Jan  4 15:23:44 figo-desktop kernel: [ 2954]  1000  2954    48676     1589   0       0             0 vmware-unity-he
> Jan  4 15:23:44 figo-desktop kernel: [ 2988]  1000  2988   190471    42604   1       0             0 vmware-vmx
> Jan  4 15:23:44 figo-desktop kernel: [ 3207]  1000  3207     4377      362   1       0             0 gvfsd-computer
> Jan  4 15:23:44 figo-desktop kernel: [ 3211]  1000  3211     9920      509   0       0             0 gvfsd-smb-brows
> Jan  4 15:23:44 figo-desktop kernel: [ 3217]  1000  3217     9876      569   0       0             0 gvfsd-smb
> Jan  4 15:23:44 figo-desktop kernel: [15186]  1000 15186     2069      558   1       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [17451]  1000 17451     5679      503   0       0             0 dconf-service
> Jan  4 15:23:44 figo-desktop kernel: [19085]  1000 19085     1576      149   0       0             0 ssh
> Jan  4 15:23:44 figo-desktop kernel: [19261]  1000 19261     1682      967   1       0             0 wineserver
> Jan  4 15:23:44 figo-desktop kernel: [19266]  1000 19266   399085      212   0       0             0 services.exe
> Jan  4 15:23:44 figo-desktop kernel: [19269]  1000 19269   399117      158   1       0             0 winedevice.exe
> Jan  4 15:23:44 figo-desktop kernel: [19342]  1000 19342   404518      620   0       0             0 explorer.exe
> Jan  4 15:23:44 figo-desktop kernel: [19344]  1000 19344   550020    10921   0       0             0 insight3.exe
> Jan  4 15:23:44 figo-desktop kernel: [  360]  1000   360     2069      541   1       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [ 9821]  1000  9821   166228     2835   0       0             0 stardict
> Jan  4 15:23:44 figo-desktop kernel: [14614]  1000 14614     2040      523   1       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [17002]  1000 17002     2069      541   0       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [18612]     0 18612     1536      125   1       0             0 sudo
> Jan  4 15:23:44 figo-desktop kernel: [18613]     0 18613     1988      607   0       0             0 minicom
> Jan  4 15:23:44 figo-desktop kernel: [21183]  1000 21183     2041      517   1       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [21194]  1000 21194     1611      125   0       0             0 ssh
> Jan  4 15:23:44 figo-desktop kernel: [22451]  1000 22451     2069      578   1       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [23428]  1000 23428     6475      554   1       0             0 vim
> Jan  4 15:23:44 figo-desktop kernel: [23484]  1000 23484     6501      594   1       0             0 vim
> Jan  4 15:23:44 figo-desktop kernel: [23549]  1000 23549     6501      594   0       0             0 vim
> Jan  4 15:23:44 figo-desktop kernel: [23642]  1000 23642     9865      594   0       0             0 gvfsd-smb
> Jan  4 15:23:44 figo-desktop kernel: [26358]  1000 26358   407339     4819   1       0             0 insight3.exe
> Jan  4 15:23:44 figo-desktop kernel: [29711]  1000 29711     9943      617   0       0             0 gvfsd-smb
> Jan  4 15:23:44 figo-desktop kernel: [26156]  1000 26156    61269     9261   0       0             0 skype
> Jan  4 15:23:44 figo-desktop kernel: [32490]  1000 32490     2647      678   1       0             0 gconfd-2
> Jan  4 15:23:44 figo-desktop kernel: [10622]  1000 10622     2072      741   1       0             0 bash
> Jan  4 15:23:44 figo-desktop kernel: [10634]  1000 10634     1576      156   0       0             0 ssh
> Jan  4 15:23:44 figo-desktop kernel: [15410]  1000 15410    76559    12077   0       0             0 evince
> Jan  4 15:23:44 figo-desktop kernel: [15415]  1000 15415     5490      217   1       0             0 evinced
> Jan  4 15:23:44 figo-desktop kernel: [16754]  1000 16754     9899      485   0       0             0 gvfsd-smb
> Jan  4 15:23:44 figo-desktop kernel: [16772]  1000 16772     9900      500   0       0             0 gvfsd-smb
> Jan  4 15:23:44 figo-desktop kernel: [25390]  1000 25390   407306     2127   1       0             0 insight3.exe
> Jan  4 15:23:44 figo-desktop kernel: [ 2127]  1000  2127     1609      125   0       0             0 ssh
> Jan  4 15:23:44 figo-desktop kernel: [10661]    33 10661     4775     1510   1       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [10662]    33 10662     4823     1567   0       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [10663]    33 10663     4823     1568   0       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [10664]    33 10664     4823     1567   0       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [10665]    33 10665     4823     1567   0       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [10666]    33 10666     4823     1567   1       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [32159]  1000 32159   166096    39444   1       0             0 firefox
> Jan  4 15:23:44 figo-desktop kernel: [32228]    33 32228     4823     1510   0       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [32233]    33 32233     4857     1594   1       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [32234]    33 32234     4857     1595   1       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [32241]    33 32241     4789     1534   1       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [32246]    33 32246     4789     1534   0       0             0 httpd
> Jan  4 15:23:44 figo-desktop kernel: [32268]  1000 32268    27543     4273   1       0             0 plugin-containe
> Jan  4 15:23:44 figo-desktop kernel: [32320]  1000 32320    16230     1445   1       0             0 GoogleTalkPlugi
> Jan  4 15:23:44 figo-desktop kernel: [  970]  1000   970   407197     2621   0       0             0 insight3.exe
> Jan  4 15:23:44 figo-desktop kernel: [ 1240]  1000  1240      601      116   1       0             0 top
> Jan  4 15:23:44 figo-desktop kernel: [ 2038]  1000  2038   407785     2796   1       0             0 insight3.exe
> Jan  4 15:23:44 figo-desktop kernel: [12415]     0 12415      543      242   1     -17         -1000 udevd
> Jan  4 15:23:44 figo-desktop kernel: [12416]     0 12416      580      195   1     -17         -1000 udevd
> Jan  4 15:23:44 figo-desktop kernel: [13911]     0 13911      719      269   0       0             0 sh
> Jan  4 15:23:44 figo-desktop kernel: [13914]     0 13914      719      268   0       0             0 sh
> Jan  4 15:23:44 figo-desktop kernel: [13916]     0 13916     1187      188   1       0             0 git
> Jan  4 15:23:44 figo-desktop kernel: [13918]     0 13918      719      313   0       0             0 git-pull
> Jan  4 15:23:44 figo-desktop kernel: [13952]     0 13952     1189      263   1       0             0 git
> Jan  4 15:23:44 figo-desktop kernel: [13954]     0 13954     1575      460   1       0             0 ssh
> Jan  4 15:23:44 figo-desktop kernel: [13960]     0 13960     1488      224   1       0             0 sudo
> Jan  4 15:23:44 figo-desktop kernel: [13962]     0 13962     1386      130   0       0             0 swapoff






On 11/10/2010 11:24 PM, Figo.zhang wrote:
> The OOM victim should not be a process that directly accesses hardware
> devices, such as the Xorg server, because the hardware could be left in an
> unpredictable state. A user application can set /proc/pid/oom_score_adj to
> protect itself, but I still think those processes should get a bonus for protection.
>
> In v2, fix the incorrect comment.
> In v3, divide the badness score by 4 for protection, like the old heuristic did. We just
> want the oom_killer to avoid selecting root/RESOURCE/RAWIO processes where possible.
>
> Suppose a user process A, such as the email client "evolution", and a process B with
> direct hardware access, such as "Xorg", have consumed equal amounts of memory (their badness
> scores are the same). Which process do you want to kill? The new heuristic will kill
> process B, but in reality we want to kill process A.
>
> Signed-off-by: Figo.zhang<figo1802@gmail.com>
> Reviewed-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> ---
> mm/oom_kill.c |    9 +++++++++
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4029583..f43d759 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -202,6 +202,15 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>   		points -= 30;
>
>   	/*
> +	 * Root and direct hardware access processes are usually more
> +	 * important, so they should get bonus for protection.
> +	 */
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
> +		points /= 4;
> +
> +	/*
>   	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
>   	 * either completely disable oom killing or always prefer a certain
>   	 * task.
>
>
>
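
As a rough illustration of what the v3 hunk would do to a score, here is a minimal user-space sketch. It is not kernel code: adjust_points() and its boolean parameters are hypothetical stand-ins for oom_badness() and the has_capability_noaudit() checks, and it models only the two capability adjustments, nothing else in the scoring.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the capability adjustments in the v3
 * hunk quoted above. The names are made up for illustration; only the
 * "-= 30" and "/= 4" steps come from the patch.
 */
static int adjust_points(int points, bool sys_admin, bool sys_resource,
			 bool sys_rawio)
{
	/* The baseline score is a non-negative proportion of 1000. */
	assert(points >= 0);

	/* Existing behaviour: CAP_SYS_ADMIN tasks get a 3% bonus. */
	if (sys_admin)
		points -= 30;

	/* Proposed in v3: quarter the score for any of the three caps. */
	if (sys_admin || sys_resource || sys_rawio)
		points /= 4;

	return points;
}
```

Under this model, two tasks that both start at 400 end up far apart: an unprivileged task stays at 400, while a task with CAP_SYS_ADMIN drops to (400 - 30) / 4 = 92 with integer division.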


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2011-01-04  7:51       ` [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus Figo.zhang
@ 2011-01-04  8:28         ` KAMEZAWA Hiroyuki
  2011-01-04  8:56           ` Figo.zhang
  2011-01-05  3:32         ` David Rientjes
  1 sibling, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-04  8:28 UTC (permalink / raw)
  To: Figo.zhang
  Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds,
	Figo.zhang, rientjes, Wu Fengguang

On Tue, 04 Jan 2011 15:51:44 +0800
"Figo.zhang" <zhangtianfei@leadcoretech.com> wrote:

> 
> I sent a patch to protect hardware-access processes from the
> oom-killer before, but rientjes did not agree with me.
> 
> Today I caught a log from my desktop: the oom-killer killed my "minicom"
> and "Xorg". So I think protection for such processes should be added.
> 

Off topic.

In this log, I found

> > Jan  4 15:22:55 figo-desktop kernel: Free swap  = -1636kB
> > Jan  4 15:22:55 figo-desktop kernel: Total swap = 0kB
> > Jan  4 15:22:55 figo-desktop kernel: 515070 pages RAM

... This means total_swap_pages = 0 while pages are still being read in during swapoff.

Let's look at how 'points' is computed for oom:
==
points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
                        totalpages;
==

Here, totalpages = total_ram + total_swap, but total_swap is 0.

So, points can be > 1000, easily.
(This seems not to be related to the Xorg's death itself)



Thanks,
-Kame
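
The arithmetic in the message above can be tried out with a small user-space sketch. This is only a model of the quoted formula, not the kernel function: oom_points() and its parameters are hypothetical names, and the sample numbers in the note below are invented apart from the 515070 pages of RAM reported in the log.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the baseline score quoted above:
 *   points = (rss + swapents) * 1000 / totalpages
 * where totalpages = total RAM pages + total swap pages.
 */
static unsigned long oom_points(unsigned long rss_pages,
				unsigned long swapents,
				unsigned long total_ram_pages,
				unsigned long total_swap_pages)
{
	unsigned long totalpages = total_ram_pages + total_swap_pages;

	/* Guard against division by zero in this sketch. */
	assert(totalpages > 0);

	return (rss_pages + swapents) * 1000 / totalpages;
}
```

With total swap already reported as 0 but swap entries still charged to the task, for example oom_points(400000, 200000, 515070, 0), the result is above 1000. If the swap-entry count is also 0, a task whose rss does not exceed total RAM stays at or below 1000.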


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2011-01-04  8:28         ` KAMEZAWA Hiroyuki
@ 2011-01-04  8:56           ` Figo.zhang
  2011-01-06  0:55             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Figo.zhang @ 2011-01-04  8:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds,
	Figo.zhang, rientjes, Wu Fengguang

On 01/04/2011 04:28 PM, KAMEZAWA Hiroyuki wrote:
> On Tue, 04 Jan 2011 15:51:44 +0800
> "Figo.zhang"<zhangtianfei@leadcoretech.com>  wrote:
>
>>
>> I sent a patch to protect hardware-access processes from the
>> oom-killer before, but rientjes did not agree with me.
>>
>> Today I caught a log from my desktop: the oom-killer killed my "minicom"
>> and "Xorg". So I think protection for such processes should be added.
>>
>
> Off topic.
>
> In this log, I found
>
>>> Jan  4 15:22:55 figo-desktop kernel: Free swap  = -1636kB
>>> Jan  4 15:22:55 figo-desktop kernel: Total swap = 0kB
>>> Jan  4 15:22:55 figo-desktop kernel: 515070 pages RAM
>
> ... This means total_swap_pages = 0 while pages are still being read in during swapoff.
>
> Let's look at how 'points' is computed for oom:
> ==
> points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
>                          totalpages;
> ==
>
> Here, totalpages = total_ram + total_swap, but total_swap is 0.
>
> So, points can be > 1000, easily.
> (This seems not to be related to the Xorg's death itself)

total_swap is 0, so
totalpages = total_ram,
get_mm_counter(p->mm, MM_SWAPENTS) = 0,

so
points = (get_mm_rss(p->mm)) * 1000 / totalpages;

so points cannot be larger than 1000.




>
>
>
> Thanks,
> -Kame
>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2011-01-04  7:51       ` [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus Figo.zhang
  2011-01-04  8:28         ` KAMEZAWA Hiroyuki
@ 2011-01-05  3:32         ` David Rientjes
  1 sibling, 0 replies; 77+ messages in thread
From: David Rientjes @ 2011-01-05  3:32 UTC (permalink / raw)
  To: Figo.zhang
  Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds,
	KAMEZAWA Hiroyuki, Figo.zhang, Wu Fengguang

On Tue, 4 Jan 2011, Figo.zhang wrote:

> I sent a patch to protect hardware-access processes from the oom-killer
> before, but rientjes did not agree with me.
> 

My objection wasn't just limited to CAP_SYS_RAWIO but rather to all 
arbitrary heuristics that make the badness scoring less than predictable.  
oom_badness()'s sole responsibility as implemented is to identify the most 
memory-hogging task that is eligible for kill in the current context and 
adding additional heuristics on top of that beyond CAP_SYS_ADMIN will 
slowly make it evolve into what we had before.  It also depreciates the 
unit of the new userspace tunable.

> Today I caught a log from my desktop: the oom-killer killed my "minicom" and
> "Xorg". So I think protection for such processes should be added.
> 

The fact that Xorg was killed does not necessarily mean we need to add more 
heuristics to the oom killer.  In fact, if you are to again suggest the 3% 
memory bonus for these tasks, you should be able to show that the same 
result cannot happen even with your patch.  I don't believe you've made a 
case for that and making all CAP_SYS_RAWIO tasks immune from oom killing 
as a default choice would be inappropriate.

If the kernel is killing Xorg, that means it is the most memory-hogging 
task on the system and it is following the heuristic's core principle: 
kill the most memory-hogging task to free a large amount of memory to 
prevent additional oom kills in the near future if allowed by userspace.  
The kernel doesn't realize that Xorg is important to you and me; userspace 
needs to tell it using /proc/pid/oom_score_adj.  Throwing in additional 
heuristics for things like CAP_SYS_RAWIO is short-sighted, however, because 
the capability itself doesn't have any direct correlation to a memory 
quantity or oom killing preference.
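
For illustration, setting the tunable from userspace could look roughly like the sketch below. write_oom_score_adj() is a hypothetical helper, not an existing API, and taking the file path as a parameter is a choice made only for this sketch; for a real task the path would be /proc/<pid>/oom_score_adj, and the -1000 to +1000 range is the one documented in the oom_badness() comment.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an oom_score_adj value to the given file.
 * Returns 0 on success, -1 on an out-of-range value or an I/O error.
 */
static int write_oom_score_adj(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	/* The tunable ranges from -1000 to +1000. */
	if (value < -1000 || value > 1000)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;

	/* fclose() flushes the buffer; a failed flush means the write failed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A call such as write_oom_score_adj("/proc/self/oom_score_adj", -1000) would ask the kernel to exempt the calling process, which is the value udevd and sshd carry in the logs in this thread. Lowering the value may require privilege, so a caller should expect the write to fail and check the return code.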

[snipped the first oom killer log because it was incomplete]

> > Jan  4 15:22:51 figo-desktop kernel: minicom invoked oom-killer:
> > gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
> > Jan  4 15:22:51 figo-desktop kernel: minicom cpuset=/ mems_allowed=0
> > Jan  4 15:22:51 figo-desktop kernel: Pid: 18613, comm: minicom Not tainted
> > 2.6.36-ARCH #1
> > Jan  4 15:22:51 figo-desktop kernel: Call Trace:
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c11d0>]
> > dump_header.clone.5+0x80/0x1e0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c153c>]
> > oom_kill_process+0x5c/0x1c0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c171d>] ?
> > select_bad_process.clone.7+0x7d/0xd0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c1a0f>] out_of_memory+0xbf/0x1d0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c18d8>] ?
> > try_set_zonelist_oom+0xc8/0xe0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c5288>]
> > __alloc_pages_nodemask+0x5e8/0x600
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c6c35>]
> > __do_page_cache_readahead+0x105/0x230
> > Jan  4 15:22:51 figo-desktop kernel: [<c10c6fc1>] ra_submit+0x21/0x30
> > Jan  4 15:22:51 figo-desktop kernel: [<c10bf31b>] filemap_fault+0x36b/0x3e0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10d637b>] __do_fault+0x3b/0x4f0
> > Jan  4 15:22:51 figo-desktop kernel: [<c122e619>] ?
> > check_modem_status+0x19/0x1d0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10befb0>] ? filemap_fault+0x0/0x3e0
> > Jan  4 15:22:51 figo-desktop kernel: [<c10d95c1>]
> > handle_mm_fault+0x111/0x970
> > Jan  4 15:22:51 figo-desktop kernel: [<c1172121>] ?
> > tomoyo_init_request_info+0x41/0x50
> > Jan  4 15:22:51 figo-desktop kernel: [<c1028d60>] ? do_page_fault+0x0/0x3e0
> > Jan  4 15:22:51 figo-desktop kernel: [<c1028eb0>] do_page_fault+0x150/0x3e0
> > Jan  4 15:22:51 figo-desktop kernel: [<c1170eb2>] ?
> > tomoyo_file_ioctl+0x12/0x20
> > Jan  4 15:22:51 figo-desktop kernel: [<c110c43f>] ? sys_ioctl+0x5f/0x80
> > Jan  4 15:22:51 figo-desktop kernel: [<c1028d60>] ? do_page_fault+0x0/0x3e0
> > Jan  4 15:22:51 figo-desktop kernel: [<c130753b>] error_code+0x67/0x6c
> > Jan  4 15:22:51 figo-desktop kernel: Mem-Info:
> > Jan  4 15:22:51 figo-desktop kernel: DMA per-cpu:
> > Jan  4 15:22:51 figo-desktop kernel: CPU    0: hi:    0, btch:   1 usd:   0
> > Jan  4 15:22:51 figo-desktop kernel: CPU    1: hi:    0, btch:   1 usd:   0
> > Jan  4 15:22:51 figo-desktop kernel: Normal per-cpu:
> > Jan  4 15:22:51 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  61
> > Jan  4 15:22:54 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:   1
> > Jan  4 15:22:54 figo-desktop kernel: HighMem per-cpu:
> > Jan  4 15:22:54 figo-desktop kernel: CPU    0: hi:  186, btch:  31 usd:  30
> > Jan  4 15:22:54 figo-desktop kernel: CPU    1: hi:  186, btch:  31 usd:  30
> > Jan  4 15:22:54 figo-desktop kernel: active_anon:191656 inactive_anon:101947
> > isolated_anon:0
> > Jan  4 15:22:54 figo-desktop kernel: active_file:54186 inactive_file:122506
> > isolated_file:0
> > Jan  4 15:22:54 figo-desktop kernel: unevictable:17 dirty:0 writeback:0
> > unstable:0
> > Jan  4 15:22:54 figo-desktop kernel: free:11958 slab_reclaimable:5450
> > slab_unreclaimable:6123
> > Jan  4 15:22:54 figo-desktop kernel: mapped:49649 shmem:29219
> > pagetables:2559 bounce:0
> > Jan  4 15:22:54 figo-desktop kernel: DMA free:7960kB min:64kB low:80kB
> > high:96kB active_anon:4772kB inactive_anon:1180kB active_file:524kB
> > inactive_file:768kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > present:15788kB mlocked:0kB dirty:0kB writeback:0kB mapped:592kB shmem:348kB
> > slab_reclaimable:480kB slab_unreclaimable:112kB kernel_stack:8kB
> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2689
> > all_unreclaimable? yes
> > Jan  4 15:22:55 figo-desktop kernel: lowmem_reserve[]: 0 865 1980 1980
> > Jan  4 15:22:55 figo-desktop kernel: Normal free:39364kB min:3728kB
> > low:4660kB high:5592kB active_anon:224324kB inactive_anon:226020kB
> > active_file:86076kB inactive_file:207880kB unevictable:68kB
> > isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:68kB
> > dirty:0kB writeback:0kB mapped:57272kB shmem:42072kB
> > slab_reclaimable:21320kB slab_unreclaimable:24380kB kernel_stack:3224kB
> > pagetables:10236kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:445029 all_unreclaimable? yes
> > Jan  4 15:22:55 figo-desktop kernel: lowmem_reserve[]: 0 0 8921 8921
> > Jan  4 15:22:55 figo-desktop kernel: HighMem free:508kB min:512kB low:1712kB
> > high:2912kB active_anon:537528kB inactive_anon:180588kB active_file:130144kB
> > inactive_file:281376kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > present:1141984kB mlocked:0kB dirty:0kB writeback:0kB mapped:140732kB
> > shmem:74456kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:644938 all_unreclaimable? yes
> > Jan  4 15:22:55 figo-desktop kernel: lowmem_reserve[]: 0 0 0 0
> > Jan  4 15:22:55 figo-desktop kernel: DMA: 1500*4kB 245*8kB 0*16kB 0*32kB
> > 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7960kB
> > Jan  4 15:22:55 figo-desktop kernel: Normal: 1399*4kB 3723*8kB 91*16kB
> > 21*32kB 5*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 39364kB
> > Jan  4 15:22:55 figo-desktop kernel: HighMem: 27*4kB 10*8kB 18*16kB 1*32kB
> > 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508kB
> > Jan  4 15:22:55 figo-desktop kernel: 206073 total pagecache pages
> > Jan  4 15:22:55 figo-desktop kernel: 151 pages in swap cache
> > Jan  4 15:22:55 figo-desktop kernel: Swap cache stats: add 889409, delete
> > 889258, find 277087/391428
> > Jan  4 15:22:55 figo-desktop kernel: Free swap  = -1636kB
> > Jan  4 15:22:55 figo-desktop kernel: Total swap = 0kB
> > Jan  4 15:22:55 figo-desktop kernel: 515070 pages RAM
> > Jan  4 15:22:55 figo-desktop kernel: 287745 pages HighMem
> > Jan  4 15:22:55 figo-desktop kernel: 8297 pages reserved
> > Jan  4 15:22:55 figo-desktop kernel: 304095 pages shared
> > Jan  4 15:22:55 figo-desktop kernel: 306148 pages non-shared
> > Jan  4 15:22:55 figo-desktop kernel: [ pid ]   uid  tgid total_vm      rss
> > cpu oom_adj oom_score_adj name
> > Jan  4 15:22:55 figo-desktop kernel: [  583]     0   583      581      207
> > 1     -17         -1000 udevd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1303]     0  1303      868       37
> > 0       0             0 syslog-ng
> > Jan  4 15:22:55 figo-desktop kernel: [ 1304]     0  1304     1604      350
> > 0       0             0 syslog-ng
> > Jan  4 15:22:55 figo-desktop kernel: [ 1306]    81  1306      808      396
> > 1       0             0 dbus-daemon
> > Jan  4 15:22:55 figo-desktop kernel: [ 1309]    82  1309     3777      406
> > 0       0             0 hald
> > Jan  4 15:22:55 figo-desktop kernel: [ 1310]     0  1310      903      130
> > 1       0             0 hald-runner
> > Jan  4 15:22:55 figo-desktop kernel: [ 1339]     0  1339      919       90
> > 1       0             0 hald-addon-inpu
> > Jan  4 15:22:55 figo-desktop kernel: [ 1357]     0  1357      919      184
> > 1       0             0 hald-addon-stor
> > Jan  4 15:22:55 figo-desktop kernel: [ 1359]    82  1359      824      107
> > 1       0             0 hald-addon-acpi
> > Jan  4 15:22:55 figo-desktop kernel: [ 1448]     0  1448      616      294
> > 0       0             0 crond
> > Jan  4 15:22:55 figo-desktop kernel: [ 1477]     0  1477      720      108
> > 0       0             0 mysqld_safe
> > Jan  4 15:22:55 figo-desktop kernel: [ 1484]     0  1484     3580      269
> > 0       0             0 gdm-binary
> > Jan  4 15:22:55 figo-desktop kernel: [ 1497]     0  1497      440       67
> > 1       0             0 agetty
> > Jan  4 15:22:55 figo-desktop kernel: [ 1498]     0  1498      440       67
> > 0       0             0 agetty
> > Jan  4 15:22:55 figo-desktop kernel: [ 1499]     0  1499      440       67
> > 0       0             0 agetty
> > Jan  4 15:22:55 figo-desktop kernel: [ 1500]     0  1500      440       67
> > 1       0             0 agetty
> > Jan  4 15:22:55 figo-desktop kernel: [ 1501]     0  1501      440       67
> > 0       0             0 agetty
> > Jan  4 15:22:55 figo-desktop kernel: [ 1502]     0  1502      440       68
> > 0       0             0 agetty
> > Jan  4 15:22:55 figo-desktop kernel: [ 1553]     0  1553     6607      619
> > 0       0             0 NetworkManager
> > Jan  4 15:22:55 figo-desktop kernel: [ 1592]     0  1592     2256      330
> > 0       0             0 cupsd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1597]    89  1597    29903     2812
> > 1       0             0 mysqld
> > Jan  4 15:22:55 figo-desktop kernel: [ 1598]     0  1598     1652      147
> > 1     -17         -1000 sshd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1612]     0  1612     6263      508
> > 1       0             0 polkitd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1613]     0  1613     2015      123
> > 1       0             0 vmware-usbarbit
> > Jan  4 15:22:55 figo-desktop kernel: [ 1617]     0  1617     2887     1603
> > 0       0             0 cntlm
> > Jan  4 15:22:55 figo-desktop kernel: [ 1620]     0  1620     4398      366
> > 0       0             0 gdm-simple-slav
> > Jan  4 15:22:55 figo-desktop kernel: [ 1636]     0  1636    37298    12186
> > 0       0             0 Xorg
> > Jan  4 15:22:55 figo-desktop kernel: [ 1638]     0  1638     1248       88
> > 0       0             0 wpa_supplicant
> > Jan  4 15:22:55 figo-desktop kernel: [ 1720]     0  1720     4789     1555
> > 1       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1722]     0  1722      488       93
> > 0       0             0 dhcpcd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1732]     0  1732      519       62
> > 1       0             0 vmnet-bridge
> > Jan  4 15:22:55 figo-desktop kernel: [ 1750]     0  1750     4922      304
> > 1       0             0 smbd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1764]     0  1764      655       39
> > 1       0             0 vmnet-dhcpd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1772]     0  1772      513       15
> > 0       0             0 vmnet-netifup
> > Jan  4 15:22:55 figo-desktop kernel: [ 1774]     0  1774      655       39
> > 0       0             0 vmnet-dhcpd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1776]     0  1776     4922      225
> > 0       0             0 smbd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1778]     0  1778      634       44
> > 0       0             0 vmnet-natd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1780]     0  1780      513       14
> > 1       0             0 vmnet-netifup
> > Jan  4 15:22:55 figo-desktop kernel: [ 1796]     0  1796     6677      329
> > 0       0             0 console-kit-dae
> > Jan  4 15:22:55 figo-desktop kernel: [ 1899]   120  1899     6587      598
> > 0       0             0 polkit-gnome-au
> > Jan  4 15:22:55 figo-desktop kernel: [ 1903]     0  1903     3905      245
> > 0       0             0 gdm-session-wor
> > Jan  4 15:22:55 figo-desktop kernel: [ 1906]     0  1906     3677      314
> > 1       0             0 upowerd
> > Jan  4 15:22:55 figo-desktop kernel: [ 1974]  1000  1974    10963     1248
> > 0       0             0 gnome-keyring-d
> > Jan  4 15:22:55 figo-desktop kernel: [ 1993]  1000  1993     9253      776
> > 0       0             0 gnome-session
> > Jan  4 15:22:55 figo-desktop kernel: [ 2012]  1000  2012      796      103
> > 0       0             0 dbus-launch
> > Jan  4 15:22:55 figo-desktop kernel: [ 2013]  1000  2013     5854     4985
> > 0       0             0 dbus-daemon
> > Jan  4 15:22:55 figo-desktop kernel: [ 2015]  1000  2015      887       54
> > 0       0             0 ssh-agent
> > Jan  4 15:22:55 figo-desktop kernel: [ 2022]  1000  2022    10904     4789
> > 0       0             0 fcitx
> > Jan  4 15:22:55 figo-desktop kernel: [ 2032]  1000  2032    41764     2007
> > 0       0             0 gnome-settings-
> > Jan  4 15:22:55 figo-desktop kernel: [ 2036]  1000  2036     2367      416
> > 0       0             0 gvfsd
> > Jan  4 15:22:55 figo-desktop kernel: [ 2039]  1000  2039    85188    29636
> > 1       0             0 metacity
> > Jan  4 15:22:55 figo-desktop kernel: [ 2049]  1000  2049    14162      280
> > 1       0             0 gvfs-fuse-daemo
> > Jan  4 15:22:55 figo-desktop kernel: [ 2053]  1000  2053    76657     3620
> > 0       0             0 gnome-panel
> > Jan  4 15:22:55 figo-desktop kernel: [ 2056]  1000  2056    10978      563
> > 1       0             0 gvfs-gdu-volume
> > Jan  4 15:22:55 figo-desktop kernel: [ 2058]     0  2058     5767      503
> > 0       0             0 udisks-daemon
> > Jan  4 15:22:55 figo-desktop kernel: [ 2059]     0  2059     1291       72
> > 0       0             0 udisks-daemon
> > Jan  4 15:22:55 figo-desktop kernel: [ 2069]  1000  2069   168292    50943
> > 0       0             0 nautilus
> > Jan  4 15:22:55 figo-desktop kernel: [ 2071]  1000  2071    12907      223
> > 0       0             0 bonobo-activati
> > Jan  4 15:22:55 figo-desktop kernel: [ 2081]  1000  2081     1633      136
> > 0       0             0 sh
> > Jan  4 15:22:55 figo-desktop kernel: [ 2082]  1000  2082     1666      136
> > 1       0             0 thunderbird
> > Jan  4 15:22:55 figo-desktop kernel: [ 2084]  1000  2084    46718     3501
> > 1       0             0 wnck-applet
> > Jan  4 15:22:55 figo-desktop kernel: [ 2086]  1000  2086    42862     1487
> > 0       0             0 polkit-gnome-au
> > Jan  4 15:22:55 figo-desktop kernel: [ 2087]  1000  2087   285933     1584
> > 0       0             0 nm-applet
> > Jan  4 15:22:55 figo-desktop kernel: [ 2104]  1000  2104     5056      715
> > 0       0             0 gdu-notificatio
> > Jan  4 15:22:55 figo-desktop kernel: [ 2107]  1000  2107     1666      142
> > 1       0             0 run-mozilla.sh
> > Jan  4 15:22:55 figo-desktop kernel: [ 2108]  1000  2108    39395      973
> > 0       0             0 gnome-power-man
> > Jan  4 15:22:55 figo-desktop kernel: [ 2109]  1000  2109     7886      709
> > 0       0             0 vino-server
> > Jan  4 15:22:55 figo-desktop kernel: [ 2110]  1000  2110    10764      791
> > 1       0             0 evolution-alarm
> > Jan  4 15:22:55 figo-desktop kernel: [ 2114]  1000  2114   167865    29673
> > 1       0             0 thunderbird-bin
> > Jan  4 15:22:55 figo-desktop kernel: [ 2121]  1000  2121    42098     1965
> > 0       0             0 notify-osd
> > Jan  4 15:22:55 figo-desktop kernel: [ 2125]  1000  2125    42488     1203
> > 1       0             0 cpufreq-applet
> > Jan  4 15:22:55 figo-desktop kernel: [ 2126]  1000  2126    41394     1148
> > 1       0             0 multiload-apple
> > Jan  4 15:22:55 figo-desktop kernel: [ 2129]  1000  2129    64405     1827
> > 0       0             0 mixer_applet2
> > Jan  4 15:22:55 figo-desktop kernel: [ 2131]  1000  2131    75346     2489
> > 0       0             0 clock-applet
> > Jan  4 15:22:55 figo-desktop kernel: [ 2132]  1000  2132    41163      930
> > 1       0             0 notification-ar
> > Jan  4 15:22:55 figo-desktop kernel: [ 2149]  1000  2149    15325      697
> > 1       0             0 e-calendar-fact
> > Jan  4 15:22:55 figo-desktop kernel: [ 2153]  1000  2153     7497     1008
> > 0       0             0 gnome-screensav
> > Jan  4 15:22:55 figo-desktop kernel: [ 2155]  1000  2155     3848      202
> > 1       0             0 pxgconf
> > Jan  4 15:22:55 figo-desktop kernel: [ 2163]  1000  2163     1781      342
> > 0       0             0 mission-control
> > Jan  4 15:22:55 figo-desktop kernel: [ 2173]  1000  2173     4486      428
> > 0       0             0 gvfsd-trash
> > Jan  4 15:22:55 figo-desktop kernel: [ 2186]     0  2186     3543      192
> > 1       0             0 system-tools-ba
> > Jan  4 15:22:55 figo-desktop kernel: [ 2210]  1000  2210     2202      238
> > 0       0             0 gvfsd-burn
> > Jan  4 15:22:55 figo-desktop kernel: [ 2245]  1000  2245     3895     1730
> > 0       0             0 gvfsd-metadata
> > Jan  4 15:22:55 figo-desktop kernel: [ 2274]  1000  2274    22823      424
> > 0       0             0 conky
> > Jan  4 15:22:55 figo-desktop kernel: [ 2281]     0  2281     3295     2278
> > 0       0             0 SystemToolsBack
> > Jan  4 15:22:55 figo-desktop kernel: [ 2663]  1000  2663    68807     3042
> > 1       0             0 gnome-terminal
> > Jan  4 15:22:55 figo-desktop kernel: [ 2683]  1000  2683      451       76
> > 0       0             0 gnome-pty-helpe
> > Jan  4 15:22:55 figo-desktop kernel: [ 2685]  1000  2685     2072      570
> > 0       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [ 2885]     0  2885     1489      119
> > 0       0             0 sudo
> > Jan  4 15:22:55 figo-desktop kernel: [ 2886]     0  2886     6273      355
> > 1       0             0 vim
> > Jan  4 15:22:55 figo-desktop kernel: [ 2887]  1000  2887      472       84
> > 0       0             0 ping
> > Jan  4 15:22:55 figo-desktop kernel: [ 2892]  1000  2892      472       83
> > 1       0             0 ping
> > Jan  4 15:22:55 figo-desktop kernel: [ 2894]  1000  2894    76113     5890
> > 0       0             0 vmware
> > Jan  4 15:22:55 figo-desktop kernel: [ 2919]  1000  2919    51497     3315
> > 1       0             0 vmware-tray
> > Jan  4 15:22:55 figo-desktop kernel: [ 2954]  1000  2954    48676     1589
> > 0       0             0 vmware-unity-he
> > Jan  4 15:22:55 figo-desktop kernel: [ 2988]  1000  2988   190471    42400
> > 0       0             0 vmware-vmx
> > Jan  4 15:22:55 figo-desktop kernel: [ 3207]  1000  3207     4377      362
> > 1       0             0 gvfsd-computer
> > Jan  4 15:22:55 figo-desktop kernel: [ 3211]  1000  3211     9920      509
> > 0       0             0 gvfsd-smb-brows
> > Jan  4 15:22:55 figo-desktop kernel: [ 3217]  1000  3217     9876      569
> > 0       0             0 gvfsd-smb
> > Jan  4 15:22:55 figo-desktop kernel: [15186]  1000 15186     2069      558
> > 1       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [17451]  1000 17451     5679      503
> > 0       0             0 dconf-service
> > Jan  4 15:22:55 figo-desktop kernel: [19085]  1000 19085     1576      149
> > 0       0             0 ssh
> > Jan  4 15:22:55 figo-desktop kernel: [19261]  1000 19261     1682      967
> > 0       0             0 wineserver
> > Jan  4 15:22:55 figo-desktop kernel: [19266]  1000 19266   399085      212
> > 0       0             0 services.exe
> > Jan  4 15:22:55 figo-desktop kernel: [19269]  1000 19269   399117      158
> > 1       0             0 winedevice.exe
> > Jan  4 15:22:55 figo-desktop kernel: [19342]  1000 19342   404518      643
> > 0       0             0 explorer.exe
> > Jan  4 15:22:55 figo-desktop kernel: [19344]  1000 19344   550020    11028
> > 1       0             0 insight3.exe
> > Jan  4 15:22:55 figo-desktop kernel: [  360]  1000   360     2069      541
> > 1       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [ 9821]  1000  9821   166228     2915
> > 0       0             0 stardict
> > Jan  4 15:22:55 figo-desktop kernel: [14614]  1000 14614     2040      523
> > 1       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [17002]  1000 17002     2069      541
> > 0       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [18612]     0 18612     1536      125
> > 1       0             0 sudo
> > Jan  4 15:22:55 figo-desktop kernel: [18613]     0 18613     1988      608
> > 1       0             0 minicom
> > Jan  4 15:22:55 figo-desktop kernel: [21183]  1000 21183     2041      517
> > 1       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [21194]  1000 21194     1611      125
> > 0       0             0 ssh
> > Jan  4 15:22:55 figo-desktop kernel: [22451]  1000 22451     2069      578
> > 1       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [23428]  1000 23428     6475      554
> > 1       0             0 vim
> > Jan  4 15:22:55 figo-desktop kernel: [23484]  1000 23484     6501      594
> > 1       0             0 vim
> > Jan  4 15:22:55 figo-desktop kernel: [23549]  1000 23549     6501      594
> > 0       0             0 vim
> > Jan  4 15:22:55 figo-desktop kernel: [23642]  1000 23642     9865      594
> > 0       0             0 gvfsd-smb
> > Jan  4 15:22:55 figo-desktop kernel: [26358]  1000 26358   407339     5019
> > 1       0             0 insight3.exe
> > Jan  4 15:22:55 figo-desktop kernel: [29711]  1000 29711     9943      617
> > 0       0             0 gvfsd-smb
> > Jan  4 15:22:55 figo-desktop kernel: [26156]  1000 26156    61269     9179
> > 0       0             0 skype
> > Jan  4 15:22:55 figo-desktop kernel: [32490]  1000 32490     2647      684
> > 0       0             0 gconfd-2
> > Jan  4 15:22:55 figo-desktop kernel: [10622]  1000 10622     2072      725
> > 1       0             0 bash
> > Jan  4 15:22:55 figo-desktop kernel: [10634]  1000 10634     1576      156
> > 0       0             0 ssh
> > Jan  4 15:22:55 figo-desktop kernel: [15410]  1000 15410    76559    12161
> > 0       0             0 evince
> > Jan  4 15:22:55 figo-desktop kernel: [15415]  1000 15415     5490      217
> > 1       0             0 evinced
> > Jan  4 15:22:55 figo-desktop kernel: [16754]  1000 16754     9899      485
> > 0       0             0 gvfsd-smb
> > Jan  4 15:22:55 figo-desktop kernel: [16772]  1000 16772     9900      500
> > 0       0             0 gvfsd-smb
> > Jan  4 15:22:55 figo-desktop kernel: [25390]  1000 25390   407306     2164
> > 1       0             0 insight3.exe
> > Jan  4 15:22:55 figo-desktop kernel: [ 2127]  1000  2127     1609      125
> > 0       0             0 ssh
> > Jan  4 15:22:55 figo-desktop kernel: [10661]    33 10661     4775     1510
> > 1       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [10662]    33 10662     4823     1569
> > 0       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [10663]    33 10663     4823     1570
> > 0       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [10664]    33 10664     4823     1569
> > 0       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [10665]    33 10665     4823     1569
> > 0       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [10666]    33 10666     4823     1569
> > 1       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [32159]  1000 32159   166096    39479
> > 0       0             0 firefox
> > Jan  4 15:22:55 figo-desktop kernel: [32228]    33 32228     4823     1512
> > 0       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [32233]    33 32233     4857     1596
> > 1       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [32234]    33 32234     4857     1597
> > 1       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [32241]    33 32241     4789     1536
> > 1       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [32246]    33 32246     4789     1536
> > 0       0             0 httpd
> > Jan  4 15:22:55 figo-desktop kernel: [32268]  1000 32268    27543     4279
> > 1       0             0 plugin-containe
> > Jan  4 15:22:55 figo-desktop kernel: [32320]  1000 32320    16230     1445
> > 1       0             0 GoogleTalkPlugi
> > Jan  4 15:22:55 figo-desktop kernel: [  970]  1000   970   407197     2636
> > 1       0             0 insight3.exe
> > Jan  4 15:22:55 figo-desktop kernel: [ 1240]  1000  1240      601      116
> > 1       0             0 top
> > Jan  4 15:22:55 figo-desktop kernel: [ 2038]  1000  2038   407786     2842
> > 0       0             0 insight3.exe
> > Jan  4 15:22:55 figo-desktop kernel: [12415]     0 12415      543      232
> > 1     -17         -1000 udevd
> > Jan  4 15:22:55 figo-desktop kernel: [12416]     0 12416      580      195
> > 1     -17         -1000 udevd
> > Jan  4 15:22:55 figo-desktop kernel: [13904]     0 13904     1488      219
> > 1       0             0 sudo
> > Jan  4 15:22:55 figo-desktop kernel: [13906]     0 13906     1386      131
> > 0       0             0 swapoff

We don't know what task was killed here, if any, because it's not showing 
the "Killed process ... total-vm:...kB, anon-rss:...kB, file-rss:...kB" 
line that indicates anything was killed.  In fact, your entire log doesn't 
show that.  minicom simply invoking the oom killer (and Xorg later) 
doesn't indicate a problem.

Please post a complete log that shows the tasklist dump and which task was 
selected from that state.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus
  2011-01-04  8:56           ` Figo.zhang
@ 2011-01-06  0:55             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-06  0:55 UTC (permalink / raw)
  To: Figo.zhang
  Cc: lkml, KOSAKI Motohiro, linux-mm, Andrew Morton, Linus Torvalds,
	Figo.zhang, rientjes, Wu Fengguang

On Tue, 04 Jan 2011 16:56:47 +0800
"Figo.zhang" <zhangtianfei@leadcoretech.com> wrote:

> On 01/04/2011 04:28 PM, KAMEZAWA Hiroyuki wrote:
> > On Tue, 04 Jan 2011 15:51:44 +0800
> > "Figo.zhang"<zhangtianfei@leadcoretech.com>  wrote:
> >
> >>
> >> I sent a patch earlier to protect hardware-access processes from the
> >> oom-killer, but rientjes did not agree with me.
> >>
> >> But today I caught a log from my desktop: the oom-killer killed my "minicom"
> >> and "Xorg", so I think protection for them should be added.
> >>
> >
> > Off topic.
> >
> > In this log, I found
> >
> >>> Jan  4 15:22:55 figo-desktop kernel: Free swap  = -1636kB
> >>> Jan  4 15:22:55 figo-desktop kernel: Total swap = 0kB
> >>> Jan  4 15:22:55 figo-desktop kernel: 515070 pages RAM
> >
> > ... This means total_swap_pages = 0 while pages are read-in at swapoff.
> >
> > Let's see 'points' for oom
> > ==
> > points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
> >                          totalpages;
> > ==
> >
> > Here, totalpages = total_ram + total_swap but totalswap is 0 here.
> >
> > So, points can easily be > 1000.
> > (This seems not to be related to the Xorg's death itself)
> 
> total_swap is 0, so
> totalpages = total_ram,
> get_mm_counter(p->mm, MM_SWAPENTS) = 0,
> 
> so
> points = (get_mm_rss(p->mm)) * 1000 / totalpages;
> 
> so points cannot be larger than 1000.

mm_counter's swap count is reduced only when swapents are removed from
the page table. But total_swap is reduced to 0 before try_to_unuse().
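
A minimal sketch of the arithmetic behind this, assuming the simplified
formula quoted above. model_points() is a hypothetical helper, not the
kernel's oom_badness(); the 515070 pages of RAM come from the posted log,
and the rss/swapents figures are invented for illustration.

```c
/*
 * Simplified model of the quoted formula (not the kernel function itself):
 *   points = (rss + swapents) * 1000 / totalpages
 * where totalpages = total_ram + total_swap. All arguments are page counts.
 */
unsigned long model_points(unsigned long rss, unsigned long swapents,
			   unsigned long total_ram, unsigned long total_swap)
{
	unsigned long totalpages = total_ram + total_swap;

	/* swapents is still counted even if total_swap has dropped to 0 */
	return (rss + swapents) * 1000 / totalpages;
}
```

With 300000 pages resident and 300000 pages still accounted as swap
entries, model_points(300000, 300000, 515070, 300000) gives 736 while
swap is enabled, but model_points(300000, 300000, 515070, 0) gives 1164
once total_swap has been zeroed during swapoff.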


Thanks,
-Kame




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-30 13:04               ` KOSAKI Motohiro
@ 2010-11-30 20:02                 ` David Rientjes
  0 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-30 20:02 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Tue, 30 Nov 2010, KOSAKI Motohiro wrote:

> > > > You may remember that the initial version of my rewrite replaced oom_adj 
> > > > entirely with the new oom_score_adj semantics.  Others suggested that it 
> > > > be separated into a new tunable and the old tunable deprecated for a 
> > > > lengthy period of time.  I accepted that criticism and understood the 
> > > > drawbacks of replacing the tunable immediately and followed those 
> > > > suggestions.  I disagree with you that the deprecation of oom_adj for a 
> > > > period of two years is as dramatic as you imply and I disagree that users 
> > > > are experiencing problems with the linear scale that it now operates on 
> > > > versus the old exponential scale.
> > > 
> > > Yes and no. People wanted it separated AND the old one left unbroken.
> > > 
> > 
> > You're arguing on the behalf of applications that don't exist.
> 
> Why?
> You actually got the bug report.
> 

There have never been any bug reports related to applications using 
oom_score_adj and being impacted by its linear mapping onto oom_adj's 
exponential scale.  That's because no users prior to the rewrite were 
using oom_adj scores that were based on either the expected memory usage 
of the application or the capacity of the machine.
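
A rough sketch of the two scales being contrasted here. The bitshift
behaviour of oom_adj is described in the feature-removal text later in
this thread; the constants and the linear conversion below are my
recollection of the kernel of that era and should be read as assumptions,
and both function names are hypothetical.

```c
/* Constants as recalled from include/linux/oom.h (assumption). */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/* Old scale: oom_adj is applied as a bitshift, so each step doubles or halves. */
unsigned long old_bitshift_adjust(unsigned long points, int oom_adj)
{
	if (oom_adj > 0)
		return points << oom_adj;
	if (oom_adj < 0)
		return points >> -oom_adj;
	return points;
}

/* New scale: an oom_adj value is mapped linearly onto [-1000, 1000]. */
int oom_adj_to_score_adj(int oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;
	return oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE;
}
```

Under these assumptions oom_adj = -17 maps to -1000, which matches the
udevd and sshd rows in the tasklist dump posted earlier in the thread,
while oom_adj = 8 maps to 470 rather than multiplying the score by 256.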

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-28  1:45             ` David Rientjes
@ 2010-11-30 13:04               ` KOSAKI Motohiro
  2010-11-30 20:02                 ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-30 13:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andrew Morton, Linus Torvalds, LKML, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

> On Tue, 23 Nov 2010, KOSAKI Motohiro wrote:
> 
> > > You may remember that the initial version of my rewrite replaced oom_adj 
> > > entirely with the new oom_score_adj semantics.  Others suggested that it 
> > > be separated into a new tunable and the old tunable deprecated for a 
> > > lengthy period of time.  I accepted that criticism and understood the 
> > > drawbacks of replacing the tunable immediately and followed those 
> > > suggestions.  I disagree with you that the deprecation of oom_adj for a 
> > > period of two years is as dramatic as you imply and I disagree that users 
> > > are experiencing problems with the linear scale that it now operates on 
> > > versus the old exponential scale.
> > 
> > Yes and no. People wanted it separated AND the old one left unbroken.
> > 
> 
> You're arguing on the behalf of applications that don't exist.

Why?
You actually got the bug report.


> 
> > > > 1) About two months ago, Dave Hansen observed a strange OOM issue because he
> > > >    has a big machine and none of its processes are very big. Thus, eventually all
> > > >    processes got oom-score=0 and the oom-killer didn't work.
> > > > 
> > > >    https://kerneltrap.org/mailarchive/linux-driver-devel/2010/9/9/6886383
> > > > 
> > > >    DavidR changed oom-score to +1 in such a situation.
> > > > 
> > > >    http://kerneltrap.org/mailarchive/linux-kernel/2010/9/9/4617455
> > > > 
> > > >    But it is completely bogus. If all processes have score=1, the oom-killer falls
> > > >    back to a purely random killer. I anticipated this and explained the problem with
> > > >    his patch half a year ago, but he hasn't fixed it yet.
> > > > 
> > > 
> > > The resolution with which the oom killer considers memory is at 0.1% of 
> > > system RAM at its highest (smaller when you have a memory controller, 
> > > cpuset, or mempolicy constrained oom).  It considers a task within 0.1% of 
> > > memory of another task to have equal "badness" to kill, we don't break 
> > > ties in between that resolution -- it all depends on which one shows up in 
> > > the tasklist first.  If you disagree with that resolution, which I support 
> > > as being high enough, then you may certainly propose a patch to make it 
> > > even finer at 0.01%, 0.001%, etc.  It would only change oom_badness() to 
> > > range between [0,10000], [0,100000], etc.
> > 
> > No.
> > > Think of Moore's Law. A ratio-based value will not be able to work in the future anyway.
> > > 10 years ago, I used a desktop machine with 20MB of memory, and I'm now using 2GB.
> > > Memory size keeps growing, and the size of bash isn't growing nearly as fast.
> > 
> 
> If you'd like to suggest an increase to the upper-bound of the badness 
> score, please do so, although I don't think we need to break ties amongst 
> tasks that differ by at most <0.1% of the system's capacity.

No. I dislike that. I dislike a proportional score.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-23  7:16           ` KOSAKI Motohiro
@ 2010-11-28  1:45             ` David Rientjes
  2010-11-30 13:04               ` KOSAKI Motohiro
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-28  1:45 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Tue, 23 Nov 2010, KOSAKI Motohiro wrote:

> > You may remember that the initial version of my rewrite replaced oom_adj 
> > entirely with the new oom_score_adj semantics.  Others suggested that it 
> > be separated into a new tunable and the old tunable deprecated for a 
> > lengthy period of time.  I accepted that criticism and understood the 
> > drawbacks of replacing the tunable immediately and followed those 
> > suggestions.  I disagree with you that the deprecation of oom_adj for a 
> > period of two years is as dramatic as you imply and I disagree that users 
> > are experiencing problems with the linear scale that it now operates on 
> > versus the old exponential scale.
> 
> Yes and no. People wanted it separated AND the old one left unbroken.
> 

You're arguing on the behalf of applications that don't exist.

> > > 1) About two months ago, Dave Hansen observed a strange OOM issue because he
> > >    has a big machine and none of its processes are very big. Thus, eventually all
> > >    processes got oom-score=0 and the oom-killer didn't work.
> > > 
> > >    https://kerneltrap.org/mailarchive/linux-driver-devel/2010/9/9/6886383
> > > 
> > >    DavidR changed oom-score to +1 in such a situation.
> > > 
> > >    http://kerneltrap.org/mailarchive/linux-kernel/2010/9/9/4617455
> > > 
> > >    But it is completely bogus. If all processes have score=1, the oom-killer falls
> > >    back to a purely random killer. I anticipated this and explained the problem with
> > >    his patch half a year ago, but he hasn't fixed it yet.
> > > 
> > 
> > The resolution with which the oom killer considers memory is at 0.1% of 
> > system RAM at its highest (smaller when you have a memory controller, 
> > cpuset, or mempolicy constrained oom).  It considers a task within 0.1% of 
> > memory of another task to have equal "badness" to kill, we don't break 
> > ties in between that resolution -- it all depends on which one shows up in 
> > the tasklist first.  If you disagree with that resolution, which I support 
> > as being high enough, then you may certainly propose a patch to make it 
> > even finer at 0.01%, 0.001%, etc.  It would only change oom_badness() to 
> > range between [0,10000], [0,100000], etc.
> 
> No.
> Think of Moore's Law. A ratio-based value will not be able to work in the future anyway.
> 10 years ago, I used a desktop machine with 20MB of memory, and I'm now using 2GB.
> Memory size keeps growing, and the size of bash isn't growing nearly as fast.
> 

If you'd like to suggest an increase to the upper-bound of the badness 
score, please do so, although I don't think we need to break ties amongst 
tasks that differ by at most <0.1% of the system's capacity.
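
To make the resolution argument concrete, here is a small sketch of a
proportional score at a configurable scale. Both helpers are hypothetical
models, not kernel code, and score_with_floor() only approximates the
"+1" change mentioned in the quoted text; the machine size and task sizes
are invented.

```c
/*
 * Proportional score in [0, scale]; scale = 1000 gives the 0.1% resolution
 * discussed above. Inputs are page counts; all values are hypothetical.
 */
unsigned long long proportional_score(unsigned long long rss,
				      unsigned long long totalpages,
				      unsigned long long scale)
{
	return rss * scale / totalpages;
}

/* Model of the "+1" change: an eligible task never scores 0. */
unsigned long long score_with_floor(unsigned long long rss,
				    unsigned long long totalpages,
				    unsigned long long scale)
{
	unsigned long long points = proportional_score(rss, totalpages, scale);

	return points ? points : 1;
}
```

On a machine with 16777216 pages (64GB at 4kB per page), tasks with 10000
and 16000 resident pages both score 0 at scale 1000, and both score 1
with the floor applied, so the first one found in the tasklist wins the
tie. At scale 10000 the same tasks score 5 and 9 and are ranked by size.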

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-14 19:32 ` Linus Torvalds
  2010-11-15  0:54   ` KOSAKI Motohiro
@ 2010-11-23 23:51   ` KOSAKI Motohiro
  1 sibling, 0 replies; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23 23:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kosaki.motohiro, LKML, David Rientjes, Andrew Morton, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

> 2010/11/13 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> >
> > Please apply this. This patch reverts the oom changes made since v2.6.35.
> 
> I'm not getting involved in this whole flame-war. You need to convince
> Andrew, who has been the person everything went through.

I did.

Therefore, I will resend the patch to you. Thanks.


--------------------------------------------------------------------------
Subject: [PATCH] Revert oom rewrite series

This reverts the following commits. They broke an ABI and prompted multiple
end-user complaints.

9c28ab662a8e3d19d07077ac0a8931c015e8afec Revert "oom: badness heuristic rewrite"
74cd8c6cb3e093c4d67ac3eb3581e246e4981dad Revert "oom: deprecate oom_adj tunable"
79a0bd5796e754c4b4e22071c4edddef3517d010 Revert "memcg: use find_lock_task_mm() in memory cgroups oom"
a465ef80c2a9fe73c85029fcea5c68ffee8dbb69 Revert "oom: always return a badness score of non-zero for eligible tas
516fcbb0c45d943df1b739d3be3d417aee2275f3 Revert "oom: filter unkillable tasks from tasklist dump"
b1c98f95a7954c450dadd809280f86863ea9d05d Revert "oom: add per-mm oom disable count"
fd79f3f47c82a0af5288afe7556905dd171bfc43 Revert "oom: avoid killing a task if a thread sharing its mm cannot be
2d72175528870dcef577db4a2a0b49d819c6eaff Revert "oom: kill all threads sharing oom killed task's mm"
be212960618ddcdb9526ce2cb73fd081fd3e90ea Revert "oom: rewrite error handling for oom_adj and oom_score_adj tunab
1b17c41599c594c7d11ef415a92d47c205fe89ea Revert "oom: fix locking for oom_adj and oom_score_adj"

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 Documentation/feature-removal-schedule.txt |   25 ---
 Documentation/filesystems/proc.txt         |   97 ++++-----
 fs/exec.c                                  |    5 -
 fs/proc/base.c                             |  176 ++--------------
 include/linux/memcontrol.h                 |    8 -
 include/linux/mm_types.h                   |    2 -
 include/linux/oom.h                        |   19 +--
 include/linux/sched.h                      |    3 +-
 kernel/exit.c                              |    3 -
 kernel/fork.c                              |   16 +--
 mm/memcontrol.c                            |   28 +---
 mm/oom_kill.c                              |  323 ++++++++++++++--------------
 12 files changed, 227 insertions(+), 478 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index d8f36f9..9af16b9 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -166,31 +166,6 @@ Who:	Eric Biederman <ebiederm@xmission.com>
 
 ---------------------------
 
-What:	/proc/<pid>/oom_adj
-When:	August 2012
-Why:	/proc/<pid>/oom_adj allows userspace to influence the oom killer's
-	badness heuristic used to determine which task to kill when the kernel
-	is out of memory.
-
-	The badness heuristic has since been rewritten since the introduction of
-	this tunable such that its meaning is deprecated.  The value was
-	implemented as a bitshift on a score generated by the badness()
-	function that did not have any precise units of measure.  With the
-	rewrite, the score is given as a proportion of available memory to the
-	task allocating pages, so using a bitshift which grows the score
-	exponentially is, thus, impossible to tune with fine granularity.
-
-	A much more powerful interface, /proc/<pid>/oom_score_adj, was
-	introduced with the oom killer rewrite that allows users to increase or
-	decrease the badness() score linearly.  This interface will replace
-	/proc/<pid>/oom_adj.
-
-	A warning will be emitted to the kernel log if an application uses this
-	deprecated interface.  After it is printed once, future warnings will be
-	suppressed until the kernel is rebooted.
-
----------------------------
-
 What:	remove EXPORT_SYMBOL(kernel_thread)
 When:	August 2006
 Files:	arch/*/kernel/*_ksyms.c
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index e73df27..030e3a1 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -33,8 +33,7 @@ Table of Contents
   2	Modifying System Parameters
 
   3	Per-Process Parameters
-  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
-								score
+  3.1	/proc/<pid>/oom_adj - Adjust the oom-killer score
   3.2	/proc/<pid>/oom_score - Display current oom-killer score
   3.3	/proc/<pid>/io - Display the IO accounting fields
   3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
@@ -1246,64 +1245,42 @@ of the kernel.
 CHAPTER 3: PER-PROCESS PARAMETERS
 ------------------------------------------------------------------------------
 
-3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
---------------------------------------------------------------------------------
-
-These file can be used to adjust the badness heuristic used to select which
-process gets killed in out of memory conditions.
-
-The badness heuristic assigns a value to each candidate task ranging from 0
-(never kill) to 1000 (always kill) to determine which process is targeted.  The
-units are roughly a proportion along that range of allowed memory the process
-may allocate from based on an estimation of its current memory and swap use.
-For example, if a task is using all allowed memory, its badness score will be
-1000.  If it is using half of its allowed memory, its score will be 500.
-
-There is an additional factor included in the badness score: root
-processes are given 3% extra memory over other tasks.
-
-The amount of "allowed" memory depends on the context in which the oom killer
-was called.  If it is due to the memory assigned to the allocating task's cpuset
-being exhausted, the allowed memory represents the set of mems assigned to that
-cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
-memory represents the set of mempolicy nodes.  If it is due to a memory
-limit (or swap limit) being reached, the allowed memory is that configured
-limit.  Finally, if it is due to the entire system being out of memory, the
-allowed memory represents all allocatable resources.
-
-The value of /proc/<pid>/oom_score_adj is added to the badness score before it
-is used to determine which task to kill.  Acceptable values range from -1000
-(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
-polarize the preference for oom killing either by always preferring a certain
-task or completely disabling it.  The lowest possible value, -1000, is
-equivalent to disabling oom killing entirely for that task since it will always
-report a badness score of 0.
-
-Consequently, it is very simple for userspace to define the amount of memory to
-consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
-example, is roughly equivalent to allowing the remainder of tasks sharing the
-same system, cpuset, mempolicy, or memory controller resources to use at least
-50% more memory.  A value of -500, on the other hand, would be roughly
-equivalent to discounting 50% of the task's allowed memory from being considered
-as scoring against the task.
-
-For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
-be used to tune the badness score.  Its acceptable values range from -16
-(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
-(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
-scaled linearly with /proc/<pid>/oom_score_adj.
-
-Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the
-other with its scaled value.
-
-NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see
-Documentation/feature-removal-schedule.txt.
-
-Caveat: when a parent task is selected, the oom killer will sacrifice any first
-generation children with seperate address spaces instead, if possible.  This
-avoids servers and important system daemons from being killed and loses the
-minimal amount of work.
-
+3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
+------------------------------------------------------
+
+This file can be used to adjust the score used to select which processes
+should be killed in an  out-of-memory  situation.  Giving it a high score will
+increase the likelihood of this process being killed by the oom-killer.  Valid
+values are in the range -16 to +15, plus the special value -17, which disables
+oom-killing altogether for this process.
+
+The process to be killed in an out-of-memory situation is selected among all others
+based on its badness score. This value equals the original memory size of the process
+and is then updated according to its CPU time (utime + stime) and the
+run time (uptime - start time). The longer it runs the smaller is the score.
+Badness score is divided by the square root of the CPU time and then by
+the double square root of the run time.
+
+Swapped out tasks are killed first. Half of each child's memory size is added to
+the parent's score if they do not share the same memory. Thus forking servers
+are the prime candidates to be killed. Having only one 'hungry' child will make
+parent less preferable than the child.
+
+/proc/<pid>/oom_score shows process' current badness score.
+
+The following heuristics are then applied:
+ * if the task was reniced, its score doubles
+ * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
+ 	or CAP_SYS_RAWIO) have their score divided by 4
+ * if oom condition happened in one cpuset and checked process does not belong
+ 	to it, its score is divided by 8
+ * the resulting score is multiplied by two to the power of oom_adj, i.e.
+	points <<= oom_adj when it is positive and
+	points >>= -(oom_adj) otherwise
+
+The task with the highest badness score is then selected and its children
+are killed, process itself will be killed in an OOM situation when it does
+not have children or some of them disabled oom like described above.
 
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
diff --git a/fs/exec.c b/fs/exec.c
index 99d33a1..47986fb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -54,7 +54,6 @@
 #include <linux/fsnotify.h>
 #include <linux/fs_struct.h>
 #include <linux/pipe_fs_i.h>
-#include <linux/oom.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -766,10 +765,6 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
-	if (old_mm && tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
-		atomic_dec(&old_mm->oom_disable_count);
-		atomic_inc(&tsk->mm->oom_disable_count);
-	}
 	task_unlock(tsk);
 	arch_pick_mmap_layout(mm);
 	if (old_mm) {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index f3d02ca..ed7d18e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -63,7 +63,6 @@
 #include <linux/namei.h>
 #include <linux/mnt_namespace.h>
 #include <linux/mm.h>
-#include <linux/swap.h>
 #include <linux/rcupdate.h>
 #include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
@@ -431,11 +430,12 @@ static const struct file_operations proc_lstats_operations = {
 static int proc_oom_score(struct task_struct *task, char *buffer)
 {
 	unsigned long points = 0;
+	struct timespec uptime;
 
+	do_posix_clock_monotonic_gettime(&uptime);
 	read_lock(&tasklist_lock);
 	if (pid_alive(task))
-		points = oom_badness(task, NULL, NULL,
-					totalram_pages + total_swap_pages);
+		points = badness(task, NULL, NULL, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
 	return sprintf(buffer, "%lu\n", points);
 }
@@ -1025,74 +1025,36 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
 	memset(buffer, 0, sizeof(buffer));
 	if (count > sizeof(buffer) - 1)
 		count = sizeof(buffer) - 1;
-	if (copy_from_user(buffer, buf, count)) {
-		err = -EFAULT;
-		goto out;
-	}
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
 
 	err = strict_strtol(strstrip(buffer), 0, &oom_adjust);
 	if (err)
-		goto out;
+		return -EINVAL;
 	if ((oom_adjust < OOM_ADJUST_MIN || oom_adjust > OOM_ADJUST_MAX) &&
-	     oom_adjust != OOM_DISABLE) {
-		err = -EINVAL;
-		goto out;
-	}
+	     oom_adjust != OOM_DISABLE)
+		return -EINVAL;
 
 	task = get_proc_task(file->f_path.dentry->d_inode);
-	if (!task) {
-		err = -ESRCH;
-		goto out;
-	}
-
-	task_lock(task);
-	if (!task->mm) {
-		err = -EINVAL;
-		goto err_task_lock;
-	}
-
+	if (!task)
+		return -ESRCH;
 	if (!lock_task_sighand(task, &flags)) {
-		err = -ESRCH;
-		goto err_task_lock;
+		put_task_struct(task);
+		return -ESRCH;
 	}
 
 	if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
-		err = -EACCES;
-		goto err_sighand;
-	}
-
-	if (oom_adjust != task->signal->oom_adj) {
-		if (oom_adjust == OOM_DISABLE)
-			atomic_inc(&task->mm->oom_disable_count);
-		if (task->signal->oom_adj == OOM_DISABLE)
-			atomic_dec(&task->mm->oom_disable_count);
+		unlock_task_sighand(task, &flags);
+		put_task_struct(task);
+		return -EACCES;
 	}
 
-	/*
-	 * Warn that /proc/pid/oom_adj is deprecated, see
-	 * Documentation/feature-removal-schedule.txt.
-	 */
-	printk_once(KERN_WARNING "%s (%d): /proc/%d/oom_adj is deprecated, "
-			"please use /proc/%d/oom_score_adj instead.\n",
-			current->comm, task_pid_nr(current),
-			task_pid_nr(task), task_pid_nr(task));
 	task->signal->oom_adj = oom_adjust;
-	/*
-	 * Scale /proc/pid/oom_score_adj appropriately ensuring that a maximum
-	 * value is always attainable.
-	 */
-	if (task->signal->oom_adj == OOM_ADJUST_MAX)
-		task->signal->oom_score_adj = OOM_SCORE_ADJ_MAX;
-	else
-		task->signal->oom_score_adj = (oom_adjust * OOM_SCORE_ADJ_MAX) /
-								-OOM_DISABLE;
-err_sighand:
+
 	unlock_task_sighand(task, &flags);
-err_task_lock:
-	task_unlock(task);
 	put_task_struct(task);
-out:
-	return err < 0 ? err : count;
+
+	return count;
 }
 
 static const struct file_operations proc_oom_adjust_operations = {
@@ -1101,106 +1063,6 @@ static const struct file_operations proc_oom_adjust_operations = {
 	.llseek		= generic_file_llseek,
 };
 
-static ssize_t oom_score_adj_read(struct file *file, char __user *buf,
-					size_t count, loff_t *ppos)
-{
-	struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
-	char buffer[PROC_NUMBUF];
-	int oom_score_adj = OOM_SCORE_ADJ_MIN;
-	unsigned long flags;
-	size_t len;
-
-	if (!task)
-		return -ESRCH;
-	if (lock_task_sighand(task, &flags)) {
-		oom_score_adj = task->signal->oom_score_adj;
-		unlock_task_sighand(task, &flags);
-	}
-	put_task_struct(task);
-	len = snprintf(buffer, sizeof(buffer), "%d\n", oom_score_adj);
-	return simple_read_from_buffer(buf, count, ppos, buffer, len);
-}
-
-static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
-					size_t count, loff_t *ppos)
-{
-	struct task_struct *task;
-	char buffer[PROC_NUMBUF];
-	unsigned long flags;
-	long oom_score_adj;
-	int err;
-
-	memset(buffer, 0, sizeof(buffer));
-	if (count > sizeof(buffer) - 1)
-		count = sizeof(buffer) - 1;
-	if (copy_from_user(buffer, buf, count)) {
-		err = -EFAULT;
-		goto out;
-	}
-
-	err = strict_strtol(strstrip(buffer), 0, &oom_score_adj);
-	if (err)
-		goto out;
-	if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
-			oom_score_adj > OOM_SCORE_ADJ_MAX) {
-		err = -EINVAL;
-		goto out;
-	}
-
-	task = get_proc_task(file->f_path.dentry->d_inode);
-	if (!task) {
-		err = -ESRCH;
-		goto out;
-	}
-
-	task_lock(task);
-	if (!task->mm) {
-		err = -EINVAL;
-		goto err_task_lock;
-	}
-
-	if (!lock_task_sighand(task, &flags)) {
-		err = -ESRCH;
-		goto err_task_lock;
-	}
-
-	if (oom_score_adj < task->signal->oom_score_adj &&
-			!capable(CAP_SYS_RESOURCE)) {
-		err = -EACCES;
-		goto err_sighand;
-	}
-
-	if (oom_score_adj != task->signal->oom_score_adj) {
-		if (oom_score_adj == OOM_SCORE_ADJ_MIN)
-			atomic_inc(&task->mm->oom_disable_count);
-		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-			atomic_dec(&task->mm->oom_disable_count);
-	}
-	task->signal->oom_score_adj = oom_score_adj;
-	/*
-	 * Scale /proc/pid/oom_adj appropriately ensuring that OOM_DISABLE is
-	 * always attainable.
-	 */
-	if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-		task->signal->oom_adj = OOM_DISABLE;
-	else
-		task->signal->oom_adj = (oom_score_adj * OOM_ADJUST_MAX) /
-							OOM_SCORE_ADJ_MAX;
-err_sighand:
-	unlock_task_sighand(task, &flags);
-err_task_lock:
-	task_unlock(task);
-	put_task_struct(task);
-out:
-	return err < 0 ? err : count;
-}
-
-static const struct file_operations proc_oom_score_adj_operations = {
-	.read		= oom_score_adj_read,
-	.write		= oom_score_adj_write,
-	.llseek		= default_llseek,
-};
-
 #ifdef CONFIG_AUDITSYSCALL
 #define TMPBUFLEN 21
 static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
@@ -2779,7 +2641,6 @@ static const struct pid_entry tgid_base_stuff[] = {
 #endif
 	INF("oom_score",  S_IRUGO, proc_oom_score),
 	REG("oom_adj",    S_IRUGO|S_IWUSR, proc_oom_adjust_operations),
-	REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
 #ifdef CONFIG_AUDITSYSCALL
 	REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
@@ -3115,7 +2976,6 @@ static const struct pid_entry tid_base_stuff[] = {
 #endif
 	INF("oom_score", S_IRUGO, proc_oom_score),
 	REG("oom_adj",   S_IRUGO|S_IWUSR, proc_oom_adjust_operations),
-	REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
 #ifdef CONFIG_AUDITSYSCALL
 	REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUSR, proc_sessionid_operations),
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 159a076..b13fc2a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,8 +124,6 @@ static inline bool mem_cgroup_disabled(void)
 void mem_cgroup_update_file_mapped(struct page *page, int val);
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						gfp_t gfp_mask);
-u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
-
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
@@ -305,12 +303,6 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 	return 0;
 }
 
-static inline
-u64 mem_cgroup_get_limit(struct mem_cgroup *mem)
-{
-	return 0;
-}
-
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index bb7288a..cb57d65 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -310,8 +310,6 @@ struct mm_struct {
 #ifdef CONFIG_MMU_NOTIFIER
 	struct mmu_notifier_mm *mmu_notifier_mm;
 #endif
-	/* How many tasks sharing this mm are OOM_DISABLE */
-	atomic_t oom_disable_count;
 };
 
 /* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 5e3aa83..40e5e3a 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -1,27 +1,14 @@
 #ifndef __INCLUDE_LINUX_OOM_H
 #define __INCLUDE_LINUX_OOM_H
 
-/*
- * /proc/<pid>/oom_adj is deprecated, see
- * Documentation/feature-removal-schedule.txt.
- *
- * /proc/<pid>/oom_adj set to -17 protects from the oom-killer
- */
+/* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */
 #define OOM_DISABLE (-17)
 /* inclusive */
 #define OOM_ADJUST_MIN (-16)
 #define OOM_ADJUST_MAX 15
 
-/*
- * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
- * pid.
- */
-#define OOM_SCORE_ADJ_MIN	(-1000)
-#define OOM_SCORE_ADJ_MAX	1000
-
 #ifdef __KERNEL__
 
-#include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
 
@@ -40,8 +27,6 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
-extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
-			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 
@@ -66,8 +51,6 @@ static inline void oom_killer_enable(void)
 extern unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
 		      const nodemask_t *nodemask, unsigned long uptime);
 
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
-
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
 extern int sysctl_oom_kill_allocating_task;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d0036e5..a35acb6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -624,8 +624,7 @@ struct signal_struct {
 	struct tty_audit_buf *tty_audit_buf;
 #endif
 
-	int oom_adj;		/* OOM kill score adjustment (bit shift) */
-	int oom_score_adj;	/* OOM kill score adjustment */
+	int oom_adj;	/* OOM kill score adjustment (bit shift) */
 
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
diff --git a/kernel/exit.c b/kernel/exit.c
index 21aa7b3..c806406 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -50,7 +50,6 @@
 #include <linux/perf_event.h>
 #include <trace/events/sched.h>
 #include <linux/hw_breakpoint.h>
-#include <linux/oom.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -696,8 +695,6 @@ static void exit_mm(struct task_struct * tsk)
 	enter_lazy_tlb(mm, current);
 	/* We don't want this task to be frozen prematurely */
 	clear_freeze_flag(tsk);
-	if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-		atomic_dec(&mm->oom_disable_count);
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
 	mmput(mm);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b159c5..cca5e8b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -65,7 +65,6 @@
 #include <linux/perf_event.h>
 #include <linux/posix-timers.h>
 #include <linux/user-return-notifier.h>
-#include <linux/oom.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -489,7 +488,6 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
 	mm->cached_hole_size = ~0UL;
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
-	atomic_set(&mm->oom_disable_count, 0);
 
 	if (likely(!mm_alloc_pgd(mm))) {
 		mm->def_flags = 0;
@@ -743,8 +741,6 @@ good_mm:
 	/* Initializing for Swap token stuff */
 	mm->token_priority = 0;
 	mm->last_interval = 0;
-	if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-		atomic_inc(&mm->oom_disable_count);
 
 	tsk->mm = mm;
 	tsk->active_mm = mm;
@@ -906,7 +902,6 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	tty_audit_fork(sig);
 
 	sig->oom_adj = current->signal->oom_adj;
-	sig->oom_score_adj = current->signal->oom_score_adj;
 
 	mutex_init(&sig->cred_guard_mutex);
 
@@ -1305,13 +1300,8 @@ bad_fork_cleanup_io:
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-	if (p->mm) {
-		task_lock(p);
-		if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-			atomic_dec(&p->mm->oom_disable_count);
-		task_unlock(p);
+	if (p->mm)
 		mmput(p->mm);
-	}
 bad_fork_cleanup_signal:
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
@@ -1704,10 +1694,6 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 			active_mm = current->active_mm;
 			current->mm = new_mm;
 			current->active_mm = new_mm;
-			if (current->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
-				atomic_dec(&mm->oom_disable_count);
-				atomic_inc(&new_mm->oom_disable_count);
-			}
 			activate_mm(active_mm, new_mm);
 			new_mm = mm;
 		}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9a99cfa..c628370 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -47,7 +47,6 @@
 #include <linux/mm_inline.h>
 #include <linux/page_cgroup.h>
 #include <linux/cpu.h>
-#include <linux/oom.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -917,13 +916,10 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 {
 	int ret;
 	struct mem_cgroup *curr = NULL;
-	struct task_struct *p;
 
-	p = find_lock_task_mm(task);
-	if (!p)
-		return 0;
-	curr = try_get_mem_cgroup_from_mm(p->mm);
-	task_unlock(p);
+	task_lock(task);
+	curr = try_get_mem_cgroup_from_mm(task->mm);
+	task_unlock(task);
 	if (!curr)
 		return 0;
 	/*
@@ -1297,24 +1293,6 @@ static int mem_cgroup_count_children(struct mem_cgroup *mem)
 }
 
 /*
- * Return the memory (and swap, if configured) limit for a memcg.
- */
-u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
-{
-	u64 limit;
-	u64 memsw;
-
-	limit = res_counter_read_u64(&memcg->res, RES_LIMIT) +
-			total_swap_pages;
-	memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
-	/*
-	 * If memsw is finite and limits the amount of swap space available
-	 * to this memcg, return that limit.
-	 */
-	return min(limit, memsw);
-}
-
-/*
  * Visit the first child (need not be the first child as per the ordering
  * of the cgroup list, since we track last_scanned_child) of @mem and use
  * that to reclaim free pages from.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7dcca55..f251ddb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -4,8 +4,6 @@
  *  Copyright (C)  1998,2000  Rik van Riel
  *	Thanks go out to Claus Fischer for some serious inspiration and
  *	for goading me into coding this file...
- *  Copyright (C)  2010  Google, Inc.
- *	Rewritten by David Rientjes
  *
  *  The routines in this file are used to kill a process when
  *  we're seriously out of memory. This gets called from __alloc_pages()
@@ -36,6 +34,7 @@ int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
+/* #define DEBUG */
 
 #ifdef CONFIG_NUMA
 /**
@@ -106,7 +105,7 @@ static void boost_dying_task_prio(struct task_struct *p,
  * pointer.  Return p, or any of its subthreads with a valid ->mm, with
  * task_lock() held.
  */
-struct task_struct *find_lock_task_mm(struct task_struct *p)
+static struct task_struct *find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t = p;
 
@@ -121,8 +120,8 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
 }
 
 /* return true if the task is not adequate as candidate victim task. */
-static bool oom_unkillable_task(struct task_struct *p,
-		const struct mem_cgroup *mem, const nodemask_t *nodemask)
+static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
+			   const nodemask_t *nodemask)
 {
 	if (is_global_init(p))
 		return true;
@@ -141,82 +140,137 @@ static bool oom_unkillable_task(struct task_struct *p,
 }
 
 /**
- * oom_badness - heuristic function to determine which candidate task to kill
+ * badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
- * @totalpages: total present RAM allowed for page allocation
+ * @uptime: current uptime in seconds
  *
- * The heuristic for determining which task to kill is made to be as simple and
- * predictable as possible.  The goal is to return the highest value for the
- * task consuming the most memory to avoid subsequent oom failures.
+ * The formula used is relatively simple and documented inline in the
+ * function. The main rationale is that we want to select a good task
+ * to kill when we run out of memory.
+ *
+ * Good in this context means that:
+ * 1) we lose the minimum amount of work done
+ * 2) we recover a large amount of memory
+ * 3) we don't kill anything innocent of eating tons of memory
+ * 4) we want to kill the minimum amount of processes (one)
+ * 5) we try to kill the process the user expects us to kill, this
+ *    algorithm has been meticulously tuned to meet the principle
+ *    of least surprise ... (be careful when you change it)
  */
-unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
-		      const nodemask_t *nodemask, unsigned long totalpages)
+unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
+		      const nodemask_t *nodemask, unsigned long uptime)
 {
-	int points;
+	unsigned long points, cpu_time, run_time;
+	struct task_struct *child;
+	struct task_struct *c, *t;
+	int oom_adj = p->signal->oom_adj;
+	struct task_cputime task_time;
+	unsigned long utime;
+	unsigned long stime;
 
 	if (oom_unkillable_task(p, mem, nodemask))
 		return 0;
+	if (oom_adj == OOM_DISABLE)
+		return 0;
 
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 0;
 
 	/*
-	 * Shortcut check for a thread sharing p->mm that is OOM_SCORE_ADJ_MIN
-	 * so the entire heuristic doesn't need to be executed for something
-	 * that cannot be killed.
+	 * The memory size of the process is the basis for the badness.
 	 */
-	if (atomic_read(&p->mm->oom_disable_count)) {
-		task_unlock(p);
-		return 0;
-	}
+	points = p->mm->total_vm;
+	task_unlock(p);
 
 	/*
-	 * When the PF_OOM_ORIGIN bit is set, it indicates the task should have
-	 * priority for oom killing.
+	 * swapoff can easily use up all memory, so kill those first.
 	 */
-	if (p->flags & PF_OOM_ORIGIN) {
-		task_unlock(p);
-		return 1000;
-	}
+	if (p->flags & PF_OOM_ORIGIN)
+		return ULONG_MAX;
 
 	/*
-	 * The memory controller may have a limit of 0 bytes, so avoid a divide
-	 * by zero, if necessary.
+	 * Processes which fork a lot of child processes are likely
+	 * a good choice. We add half the vmsize of the children if they
+	 * have an own mm. This prevents forking servers to flood the
+	 * machine with an endless amount of children. In case a single
+	 * child is eating the vast majority of memory, adding only half
+	 * to the parents will make the child our kill candidate of choice.
 	 */
-	if (!totalpages)
-		totalpages = 1;
+	t = p;
+	do {
+		list_for_each_entry(c, &t->children, sibling) {
+			child = find_lock_task_mm(c);
+			if (child) {
+				if (child->mm != p->mm)
+					points += child->mm->total_vm/2 + 1;
+				task_unlock(child);
+			}
+		}
+	} while_each_thread(p, t);
 
 	/*
-	 * The baseline for the badness score is the proportion of RAM that each
-	 * task's rss and swap space use.
+	 * CPU time is in tens of seconds and run time is in thousands
+         * of seconds. There is no particular reason for this other than
+         * that it turned out to work very well in practice.
 	 */
-	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
-			totalpages;
-	task_unlock(p);
+	thread_group_cputime(p, &task_time);
+	utime = cputime_to_jiffies(task_time.utime);
+	stime = cputime_to_jiffies(task_time.stime);
+	cpu_time = (utime + stime) >> (SHIFT_HZ + 3);
+
+
+	if (uptime >= p->start_time.tv_sec)
+		run_time = (uptime - p->start_time.tv_sec) >> 10;
+	else
+		run_time = 0;
+
+	if (cpu_time)
+		points /= int_sqrt(cpu_time);
+	if (run_time)
+		points /= int_sqrt(int_sqrt(run_time));
 
 	/*
-	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
+	 * Niced processes are most likely less important, so double
+	 * their badness points.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= 30;
+	if (task_nice(p) > 0)
+		points *= 2;
 
 	/*
-	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
-	 * either completely disable oom killing or always prefer a certain
-	 * task.
+	 * Superuser processes are usually more important, so we make it
+	 * less likely that we kill those.
 	 */
-	points += p->signal->oom_score_adj;
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
+		points /= 4;
 
 	/*
-	 * Never return 0 for an eligible task that may be killed since it's
-	 * possible that no single user task uses more than 0.1% of memory and
-	 * no single admin tasks uses more than 3.0%.
+	 * We don't want to kill a process with direct hardware access.
+	 * Not only could that mess up the hardware, but usually users
+	 * tend to only have this flag set on applications they think
+	 * of as important.
 	 */
-	if (points <= 0)
-		return 1;
-	return (points < 1000) ? points : 1000;
+	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
+		points /= 4;
+
+	/*
+	 * Adjust the score by oom_adj.
+	 */
+	if (oom_adj) {
+		if (oom_adj > 0) {
+			if (!points)
+				points = 1;
+			points <<= oom_adj;
+		} else
+			points >>= -(oom_adj);
+	}
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "OOMkill: task %d (%s) got %lu points\n",
+	p->pid, p->comm, points);
+#endif
+	return points;
 }
 
 /*
@@ -224,20 +278,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
  */
 #ifdef CONFIG_NUMA
 static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
-				gfp_t gfp_mask, nodemask_t *nodemask,
-				unsigned long *totalpages)
+				    gfp_t gfp_mask, nodemask_t *nodemask)
 {
 	struct zone *zone;
 	struct zoneref *z;
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
-	bool cpuset_limited = false;
-	int nid;
-
-	/* Default to all available memory */
-	*totalpages = totalram_pages + total_swap_pages;
 
-	if (!zonelist)
-		return CONSTRAINT_NONE;
 	/*
 	 * Reach here only when __GFP_NOFAIL is used. So, we should avoid
 	 * to kill current.We have to random task kill in this case.
@@ -247,37 +293,26 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
 		return CONSTRAINT_NONE;
 
 	/*
-	 * This is not a __GFP_THISNODE allocation, so a truncated nodemask in
-	 * the page allocator means a mempolicy is in effect.  Cpuset policy
-	 * is enforced in get_page_from_freelist().
+	 * The nodemask here is a nodemask passed to alloc_pages(). Now,
+	 * cpuset doesn't use this nodemask for its hardwall/softwall/hierarchy
+	 * feature. mempolicy is an only user of nodemask here.
+	 * check mempolicy's nodemask contains all N_HIGH_MEMORY
 	 */
-	if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
-		*totalpages = total_swap_pages;
-		for_each_node_mask(nid, *nodemask)
-			*totalpages += node_spanned_pages(nid);
+	if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask))
 		return CONSTRAINT_MEMORY_POLICY;
-	}
 
 	/* Check this allocation failure is caused by cpuset's wall function */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 			high_zoneidx, nodemask)
 		if (!cpuset_zone_allowed_softwall(zone, gfp_mask))
-			cpuset_limited = true;
+			return CONSTRAINT_CPUSET;
 
-	if (cpuset_limited) {
-		*totalpages = total_swap_pages;
-		for_each_node_mask(nid, cpuset_current_mems_allowed)
-			*totalpages += node_spanned_pages(nid);
-		return CONSTRAINT_CPUSET;
-	}
 	return CONSTRAINT_NONE;
 }
 #else
 static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
-				gfp_t gfp_mask, nodemask_t *nodemask,
-				unsigned long *totalpages)
+				gfp_t gfp_mask, nodemask_t *nodemask)
 {
-	*totalpages = totalram_pages + total_swap_pages;
 	return CONSTRAINT_NONE;
 }
 #endif
@@ -288,16 +323,17 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
  *
  * (not docbooked, we don't want this one cluttering up the manual)
  */
-static struct task_struct *select_bad_process(unsigned int *ppoints,
-		unsigned long totalpages, struct mem_cgroup *mem,
-		const nodemask_t *nodemask)
+static struct task_struct *select_bad_process(unsigned long *ppoints,
+		struct mem_cgroup *mem, const nodemask_t *nodemask)
 {
 	struct task_struct *p;
 	struct task_struct *chosen = NULL;
+	struct timespec uptime;
 	*ppoints = 0;
 
+	do_posix_clock_monotonic_gettime(&uptime);
 	for_each_process(p) {
-		unsigned int points;
+		unsigned long points;
 
 		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
@@ -329,11 +365,11 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 				return ERR_PTR(-1UL);
 
 			chosen = p;
-			*ppoints = 1000;
+			*ppoints = ULONG_MAX;
 		}
 
-		points = oom_badness(p, mem, nodemask, totalpages);
-		if (points > *ppoints) {
+		points = badness(p, mem, nodemask, uptime.tv_sec);
+		if (points > *ppoints || !chosen) {
 			chosen = p;
 			*ppoints = points;
 		}
@@ -345,24 +381,27 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 /**
  * dump_tasks - dump current memory state of all system tasks
  * @mem: current's memory controller, if constrained
- * @nodemask: nodemask passed to page allocator for mempolicy ooms
  *
- * Dumps the current memory state of all eligible tasks.  Tasks not in the same
- * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
- * are not shown.
+ * Dumps the current memory state of all system tasks, excluding kernel threads.
  * State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj
- * value, oom_score_adj value, and name.
+ * score, and name.
+ *
+ * If the actual is non-NULL, only tasks that are a member of the mem_cgroup are
+ * shown.
  *
  * Call with tasklist_lock read-locked.
  */
-static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
+static void dump_tasks(const struct mem_cgroup *mem)
 {
 	struct task_struct *p;
 	struct task_struct *task;
 
-	pr_info("[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name\n");
+	printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
+	       "name\n");
 	for_each_process(p) {
-		if (oom_unkillable_task(p, mem, nodemask))
+		if (p->flags & PF_KTHREAD)
+			continue;
+		if (mem && !task_in_mem_cgroup(p, mem))
 			continue;
 
 		task = find_lock_task_mm(p);
@@ -375,69 +414,43 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 			continue;
 		}
 
-		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
+		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d %s\n",
 			task->pid, task_uid(task), task->tgid,
 			task->mm->total_vm, get_mm_rss(task->mm),
-			task_cpu(task), task->signal->oom_adj,
-			task->signal->oom_score_adj, task->comm);
+			task_cpu(task), task->signal->oom_adj, task->comm);
 		task_unlock(task);
 	}
 }
 
 static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
-			struct mem_cgroup *mem, const nodemask_t *nodemask)
+							struct mem_cgroup *mem)
 {
 	task_lock(current);
 	pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
-		"oom_adj=%d, oom_score_adj=%d\n",
-		current->comm, gfp_mask, order, current->signal->oom_adj,
-		current->signal->oom_score_adj);
+		"oom_adj=%d\n",
+		current->comm, gfp_mask, order, current->signal->oom_adj);
 	cpuset_print_task_mems_allowed(current);
 	task_unlock(current);
 	dump_stack();
 	mem_cgroup_print_oom_info(mem, p);
 	show_mem();
 	if (sysctl_oom_dump_tasks)
-		dump_tasks(mem, nodemask);
+		dump_tasks(mem);
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
 static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 {
-	struct task_struct *q;
-	struct mm_struct *mm;
-
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 1;
 
-	/* mm cannot be safely dereferenced after task_unlock(p) */
-	mm = p->mm;
-
 	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
 		task_pid_nr(p), p->comm, K(p->mm->total_vm),
 		K(get_mm_counter(p->mm, MM_ANONPAGES)),
 		K(get_mm_counter(p->mm, MM_FILEPAGES)));
 	task_unlock(p);
 
-	/*
-	 * Kill all processes sharing p->mm in other thread groups, if any.
-	 * They don't get access to memory reserves or a higher scheduler
-	 * priority, though, to avoid depletion of all memory or task
-	 * starvation.  This prevents mm->mmap_sem livelock when an oom killed
-	 * task cannot exit because it requires the semaphore and its contended
-	 * by another thread trying to allocate memory itself.  That thread will
-	 * now get access to memory reserves since it has a pending fatal
-	 * signal.
-	 */
-	for_each_process(q)
-		if (q->mm == mm && !same_thread_group(q, p)) {
-			task_lock(q);	/* Protect ->comm from prctl() */
-			pr_err("Kill process %d (%s) sharing same memory\n",
-				task_pid_nr(q), q->comm);
-			task_unlock(q);
-			force_sig(SIGKILL, q);
-		}
 
 	set_tsk_thread_flag(p, TIF_MEMDIE);
 	force_sig(SIGKILL, p);
@@ -454,17 +467,17 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 #undef K
 
 static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
-			    unsigned int points, unsigned long totalpages,
-			    struct mem_cgroup *mem, nodemask_t *nodemask,
-			    const char *message)
+			    unsigned long points, struct mem_cgroup *mem,
+			    nodemask_t *nodemask, const char *message)
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
 	struct task_struct *t = p;
-	unsigned int victim_points = 0;
+	unsigned long victim_points = 0;
+	struct timespec uptime;
 
 	if (printk_ratelimit())
-		dump_header(p, gfp_mask, order, mem, nodemask);
+		dump_header(p, gfp_mask, order, mem);
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -477,7 +490,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	}
 
 	task_lock(p);
-	pr_err("%s: Kill process %d (%s) score %d or sacrifice child\n",
+	pr_err("%s: Kill process %d (%s) score %lu or sacrifice child\n",
 		message, task_pid_nr(p), p->comm, points);
 	task_unlock(p);
 
@@ -487,15 +500,14 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * parent.  This attempts to lose the minimal amount of work done while
 	 * still freeing memory.
 	 */
+	do_posix_clock_monotonic_gettime(&uptime);
 	do {
 		list_for_each_entry(child, &t->children, sibling) {
-			unsigned int child_points;
+			unsigned long child_points;
 
-			/*
-			 * oom_badness() returns 0 if the thread is unkillable
-			 */
-			child_points = oom_badness(child, mem, nodemask,
-								totalpages);
+			/* badness() returns 0 if the thread is unkillable */
+			child_points = badness(child, mem, nodemask,
+					       uptime.tv_sec);
 			if (child_points > victim_points) {
 				victim = child;
 				victim_points = child_points;
@@ -510,7 +522,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
  * Determines whether the kernel must panic because of the panic_on_oom sysctl.
  */
 static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
-				int order, const nodemask_t *nodemask)
+				int order)
 {
 	if (likely(!sysctl_panic_on_oom))
 		return;
@@ -524,7 +536,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
 			return;
 	}
 	read_lock(&tasklist_lock);
-	dump_header(NULL, gfp_mask, order, NULL, nodemask);
+	dump_header(NULL, gfp_mask, order, NULL);
 	read_unlock(&tasklist_lock);
 	panic("Out of memory: %s panic_on_oom is enabled\n",
 		sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
@@ -533,19 +545,17 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
 {
-	unsigned long limit;
-	unsigned int points = 0;
+	unsigned long points = 0;
 	struct task_struct *p;
 
-	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0, NULL);
-	limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;
+	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0);
 	read_lock(&tasklist_lock);
 retry:
-	p = select_bad_process(&points, limit, mem, NULL);
+	p = select_bad_process(&points, mem, NULL);
 	if (!p || PTR_ERR(p) == -1UL)
 		goto out;
 
-	if (oom_kill_process(p, gfp_mask, 0, points, limit, mem, NULL,
+	if (oom_kill_process(p, gfp_mask, 0, points, mem, NULL,
 				"Memory cgroup out of memory"))
 		goto retry;
 out:
@@ -669,11 +679,9 @@ static void clear_system_oom(void)
 void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *nodemask)
 {
-	const nodemask_t *mpol_mask;
 	struct task_struct *p;
-	unsigned long totalpages;
 	unsigned long freed = 0;
-	unsigned int points;
+	unsigned long points;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 	int killed = 0;
 
@@ -697,40 +705,41 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 * Check if there were limitations on the allocation (only relevant for
 	 * NUMA) that may require different handling.
 	 */
-	constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
-						&totalpages);
-	mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
-	check_panic_on_oom(constraint, gfp_mask, order, mpol_mask);
+	if (zonelist)
+		constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
+	check_panic_on_oom(constraint, gfp_mask, order);
 
 	read_lock(&tasklist_lock);
 	if (sysctl_oom_kill_allocating_task &&
 	    !oom_unkillable_task(current, NULL, nodemask) &&
-	    current->mm && !atomic_read(&current->mm->oom_disable_count)) {
+	    (current->signal->oom_adj != OOM_DISABLE)) {
 		/*
 		 * oom_kill_process() needs tasklist_lock held.  If it returns
 		 * non-zero, current could not be killed so we must fallback to
 		 * the tasklist scan.
 		 */
-		if (!oom_kill_process(current, gfp_mask, order, 0, totalpages,
-				NULL, nodemask,
+		if (!oom_kill_process(current, gfp_mask, order, 0, NULL,
+				nodemask,
 				"Out of memory (oom_kill_allocating_task)"))
 			goto out;
 	}
 
 retry:
-	p = select_bad_process(&points, totalpages, NULL, mpol_mask);
+	p = select_bad_process(&points, NULL,
+			constraint == CONSTRAINT_MEMORY_POLICY ? nodemask :
+								 NULL);
 	if (PTR_ERR(p) == -1UL)
 		goto out;
 
 	/* Found nothing?!?! Either we hang forever, or we panic. */
 	if (!p) {
-		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
+		dump_header(NULL, gfp_mask, order, NULL);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
 	}
 
-	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
-				nodemask, "Out of memory"))
+	if (oom_kill_process(p, gfp_mask, order, points, NULL, nodemask,
+			     "Out of memory"))
 		goto retry;
 	killed = 1;
 out:
-- 
1.6.5.2

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15  6:57       ` KOSAKI Motohiro
  2010-11-15 10:34         ` David Rientjes
@ 2010-11-23  7:16         ` KOSAKI Motohiro
  1 sibling, 0 replies; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, Linus Torvalds, LKML, David Rientjes, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

> If you still have questions, please ask me. Maybe I can answer all of
> your questions.

Zero questions?  If so, I'll resend the revert to Linus.

Actually, I don't intend to listen to the shouting. It isn't discussion; it's
only shouting. Googlers have to think about why nobody agrees with their claim.
ZERO, even though more than 20 people discussed it with them. DavidR seems to
keep flaming, but I don't care. He has to learn that flaming doesn't solve
ANYTHING.

And they have to learn how to discuss properly, what the difference between
discussion and shouting is, and why we have to learn userland workloads and
avoid any breakage. I'm angry that Googlers frequently break the kernel
and frequently ignore userland complaints.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 10:34         ` David Rientjes
  2010-11-15 23:31           ` Jesper Juhl
@ 2010-11-23  7:16           ` KOSAKI Motohiro
  2010-11-28  1:45             ` David Rientjes
  1 sibling, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andrew Morton, Linus Torvalds, LKML, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

Sorry for the delay.

> On Mon, 15 Nov 2010, KOSAKI Motohiro wrote:
> 
> > Of course, I denied it. He seems to think the number of emails is more meaningful
> > than what they say, but that is incorrect and makes no sense. Why not? Also, he
> > has to argue logically. "Hey, I think it's not a bug" makes no sense.
> > Such a claim doesn't solve anything. Userland is still unhappy. Why not?
> > I want quick action.
> 
> If there are pending complaints or bugs that I haven't addressed, please 
> bring them to my attention.  To date, I know of no issues that have been 
> raised that I have not addressed; you're always free to disagree with my 
> position, but in the end you may find that when the kernel moves in a 
> different direction that you should begin to accept it.

I can't understand. Why do I need to ignore userland folks? WHY?
I have no reason to ignore userland complaints. I would rather spare userland
folks pain than kernel developers.


> 
> > That said, if anyone wants to change a userland ABI, be careful. They have
> > to investigate userland use cases carefully and, again, carefully avoid
> > breaking them. If someone thinks "hey, it's no big matter, rewriting userland
> > can solve the issue", I strongly disagree. They don't understand why forcing
> > all userland applications to be rewritten is harmful.
>
> You may remember that the initial version of my rewrite replaced oom_adj 
> entirely with the new oom_score_adj semantics.  Others suggested that it 
> be separated into a new tunable and the old tunable deprecated for a 
> lengthy period of time.  I accepted that criticism and understood the 
> drawbacks of replacing the tunable immediately and followed those 
> suggestions.  I disagree with you that the deprecation of oom_adj for a 
> period of two years is as dramatic as you imply and I disagree that users 
> are experiencing problems with the linear scale that it now operates on 
> versus the old exponential scale.

Yes and no. People wanted a separate tunable AND no breakage of the old one.


> 
> > 1) About two months ago, Dave Hansen observed a strange OOM issue because he
> >    has a big machine and none of its processes are very big. Thus, eventually
> >    all processes got oom-score=0 and the oom-killer didn't work.
> > 
> >    https://kerneltrap.org/mailarchive/linux-driver-devel/2010/9/9/6886383
> > 
> >    DavidR changed the oom-score to +1 in such a situation.
> > 
> >    http://kerneltrap.org/mailarchive/linux-kernel/2010/9/9/4617455
> > 
> >    But it is completely bogus. If all processes have score=1, the oom-killer
> >    falls back to a purely random killer. I expected this and explained that his
> >    patch has this problem half a year ago, but he hasn't fixed it yet.
> > 
> 
> The resolution with which the oom killer considers memory is at 0.1% of 
> system RAM at its highest (smaller when you have a memory controller, 
> cpuset, or mempolicy constrained oom).  It considers a task within 0.1% of 
> memory of another task to have equal "badness" to kill, we don't break 
> ties in between that resolution -- it all depends on which one shows up in 
> the tasklist first.  If you disagree with that resolution, which I support 
> as being high enough, then you may certainly propose a patch to make it 
> even finer at 0.01%, 0.001%, etc.  It would only change oom_badness() to 
> range between [0,10000], [0,100000], etc.

No.
Think of Moore's Law. A ratio-based value will not keep working in the future anyway.
10 years ago I used a desktop machine with 20MB of memory, and now I'm using 2GB.
Memory keeps growing and growing, and the size of bash isn't growing as fast.
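
The resolution argument in this exchange can be illustrated with a minimal
sketch. It assumes, based only on the description quoted above, that a task is
scored as its share of available memory in thousandths (0.1% steps). The helper
name oom_points() and the sizes are hypothetical and are not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's oom_badness(): score a task as its
 * memory use in thousandths of the memory available, i.e. with the 0.1%
 * resolution described in the thread.  Sizes are in kilobytes.
 */
static unsigned int oom_points(uint64_t task_kb, uint64_t total_kb)
{
	if (total_kb == 0)
		return 0;
	/* Integer division truncates, so small tasks collapse to 0. */
	return (unsigned int)(task_kb * 1000 / total_kb);
}
```

Under these assumptions a 4MB shell scores 200 on a 20MB machine
(oom_points(4096, 20480)) but only 1 on a 2GB machine
(oom_points(4096, 2097152)), and on a 64GB machine tasks of 5MB and 6MB both
score 0, so only their order in the tasklist separates them.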


> 
> > 2) Also half a year ago, I explained that oom_adj is used by multiple
> >    applications, and we can't break them. But DavidR didn't fix it.
> > 
> 
> And we didn't.  oom_adj is still there and maps linearly to oom_score_adj; 
> you just can't show a single application where that mapping breaks because 
> it was based on an actual calculation.
> 
> If you would like to cite these "multiple" applications that need to be 
> converted to use oom_score_adj (I know of udev), please let me know and 
> if they're open-source applications then I will commit to submitting 
> patches for them myself.  I believe the two year window is sufficient for 
> everyone else, though.

If you want that, you have to change userland first, and by yourself. Don't
claim anyone else should do the work for you.


> > 3) Also about four months ago, kamezawa-san and I pointed out that his patch
> >    doesn't work on memcg. It also hasn't been fixed.
> 
> I don't know what you're referring to here, sorry.

You should have read my patch. Even though you don't use memcg, we do.



>    As kamezawa-san pointed out, this breaks cgroup and lxc environments.
>    He said,
> 	> Assume 2 processes A, B which have oom_score_adj of 300 and 0
> 	> And A uses 200M, B uses 1G of memory under 4G system
> 	>
> 	> Under the system.
> 	> 	A's score = (200M *1000)/4G + 300 = 350
> 	> 	B's score = (1G * 1000)/4G = 250.
> 	>
> 	> In the cpuset, it has 2G of memory.
> 	> 	A's score = (200M * 1000)/2G + 300 = 400
> 	> 	B's score = (1G * 1000)/2G = 500
> 	>
> 	> This priority inversion doesn't happen in the current system.
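
The arithmetic in the quoted example can be checked with a short sketch. It
models the score exactly as the example writes it (usage * 1000 / available
memory + oom_score_adj); the function names are hypothetical and nothing here
is taken from kernel source.

```c
#include <stdint.h>

/*
 * Hypothetical model of the formula used in the quoted example:
 *   score = usage * 1000 / available + oom_score_adj
 * Sizes are in megabytes.
 */
static int oom_score(uint64_t usage_mb, uint64_t avail_mb, int oom_score_adj)
{
	return (int)(usage_mb * 1000 / avail_mb) + oom_score_adj;
}

/* Returns nonzero if task A would be picked before task B. */
static int a_picked_first(uint64_t avail_mb)
{
	int a = oom_score(200, avail_mb, 300);	/* A: 200M, oom_score_adj 300 */
	int b = oom_score(1024, avail_mb, 0);	/* B: 1G, oom_score_adj 0 */

	return a > b;
}
```

With integer division the system-wide case (4096MB) gives 348 for A and 250
for B, so A is picked first; in the 2048MB cpuset it gives 397 for A and 500
for B, so B is picked first. The example above rounds these to 350 and 400,
and the ordering flips in the same way.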



> 
> > On the other hand, you can't explain what value the OOM rewrite patch has,
> > because there is none. It is only "powerful"(TM) for Google;
> > it has zero value for everyone else. This is just a technical
> > issue. Bah.
> > 
> 
> Please see my reply to Figo.zhang where I enumerate the four reasons why 
> the new userspace tunable is more powerful than oom_adj.

I'm NOT interested in *powerful* crap. Please DON'T talk about which is more powerful.
I can only say that it's useful only for you.



> At this point, I can only speculate that your distaste for the new oom 
> killer is one of disposition; it seems like every time you reply to an 
> email (or, more regularly, just repost your revert) that you come into it 
> with the attitude that my response cannot possibly be correct and that the 
> way you see things is exactly as they should be.  If you were to consider 
> other people's opinions, however, you may find some common ground that can 
> be met.  I certainly did that when I introduced oom_score_adj instead of 
> replacing oom_adj immediately.  I also did it when I removed the forkbomb 
> detector from the rewrite.  I also did it when considering swap in the 
> heuristic when it initially was only rss.  Andrew is in the position where 
> he has to make a judgment call on what should be included and what 
> shouldn't and it should be pretty darn clear after you post your revert 
> the first time, then the second time, then the third time, then the fourth 
> time, and now the fifth time.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16 14:55                   ` Alan Cox
  2010-11-16 20:57                     ` David Rientjes
@ 2010-11-17  4:04                     ` Valdis.Kletnieks
  1 sibling, 0 replies; 77+ messages in thread
From: Valdis.Kletnieks @ 2010-11-17  4:04 UTC (permalink / raw)
  To: Alan Cox
  Cc: Florian Mickler, Jesper Juhl, David Rientjes, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

On Tue, 16 Nov 2010 14:55:51 GMT, Alan Cox said:
> > How does one mark it appropriately?
> > The commit 51b1bd2 (oom: deprecate oom_adj tunable, see below) 
> > added it to feature-removal-schedule.txt, a patch for
> > Documentation/ABI has also been provided in the meantime, if i'm not
> > mistaken. 
> 
> Yes - so why is it spewing crap, annoying users and trying to irritate
> application authors. It's not 2012 yet.

Aug 2012 is only 6 kernel releases or so away....

Presumably the whinging is so we start tracking down the offending userspace
and getting it fixed before 2012 gets here.  Sticking the warning in just one
or two kernel releases before it becomes official leads to "I can't run the new
kernel because my userspace isn't patched yet".  We really can't win here:
if we don't whinge, stuff doesn't get tracked down and fixed; if we do whinge,
that gets people upset too.

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-17  0:06       ` Bodo Eggert
  2010-11-17  0:25         ` David Rientjes
@ 2010-11-17  0:48         ` Mandeep Singh Baines
  1 sibling, 0 replies; 77+ messages in thread
From: Mandeep Singh Baines @ 2010-11-17  0:48 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: David Rientjes, KOSAKI Motohiro, LKML, Linus Torvalds,
	Andrew Morton, Ying Han, Bodo Eggert, Figo.zhang

Bodo Eggert (7eggert@gmx.de) wrote:
> On Mon, 15 Nov 2010, David Rientjes wrote:
> > On Tue, 16 Nov 2010, Bodo Eggert wrote:
> 
> > > > CAP_SYS_RESOURCE threads have full control over their oom killing priority
> > > > by /proc/pid/oom_score_adj
> > > 
> > > , but unless they are written in the last months and designed for linux
> > > and if the author took some time to research each external process invocation,
> > > they can not be aware of this possibility.
> > > 
> > 
> > You're clearly wrong, CAP_SYS_RESOURCE has been required to modify oom_adj 
> > for over five years (as long as the git history).  8fb4fc68, merged into 
> > 2.6.20, allowed tasks to raise their own oom_adj but not decrease it.  
> > That is unchanged by the rewrite.
> 
> You are misunderstanding me. It was allowed to do this, but it did not need 
> to do it yet. It was enough to be a well-written POSIX application without 
> linux-specific OOM hacks for some specific kernel versions.
> 
> > > Besides that, if each process is supposed to change the default, the default
> > > is wrong.
> > 
> > > That doesn't make any sense, if you want to protect a thread from the oom 
> > killer you're going to need to modify oom_score_adj, the kernel can't know 
> > what you perceive as being vital.  Having CAP_SYS_RESOURCE alone does not 
> > imply that, it only allows unbounded access to resources.  That's 
> > completely orthogonal to the goal of the oom killer heuristic, which is to 
> > find the most memory-hogging task to kill.
> 
> The old oom killer's task was to guess the best victim to kill. For me, it 
> did a good job (but the system kept thrashing for too long until it kicked

Here's a patch I've been working on to control thrashing.

http://lkml.org/lkml/2010/10/28/289

It works well for our app, a web browser. We'd rather OOM quickly and kill
a browser tab than thrash for a few minutes and then OOM. It works well for
us but I'm working on a more generally useful solution.

> the offender). Looking at CAP_SYS_RESOURCE was one way to recognize 
> important processes.
> 
> > > 1) The exponential scale did have a low resolution.
> > > 
> > > 2) The heuristics were developed using much brain power and much
> > >    trial-and-error. You are going back to basics, and some people
> > >    are not convinced that this is better. I googled and I did not
> > >    find a discussion about how and why the new score was designed
> > >    this way.
> > >    looking at the output of:
> > >    cd /proc; for a in [0-9]*; do
> > >      echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
> > >    done|grep -v ^0|sort -n |less
> > >    , I'm not convinced either.
> > > 
> > 
> > The old heuristics were a mixture of arbitrary values that didn't adjust 
> > scores based on a unit and would often cause the incorrect task to be 
> > targeted because there was no clear goal being achieved.  The new 
> > heuristic has a solid goal: to identify and kill the most memory-hogging 
> > task that is eligible given the context in which the oom occurs.  If you 
> > disagree with that goal and want any of the old heuristics reintroduced, 
> > please show that it makes sense in the oom killer.
> 
> The first old OOM killer did the same as you promise the current one does,
> except for your bugfixes. That's why it killed the wrong applications and
> all the heuristics were added until the complaints stopped.
> 
> Of course I did not yet test your OOM killer; maybe it really is better.
> Heuristics tend to rot and you did much work to make it right.
> 
> I don't want the old OOM killer back, but I don't want you to fall
> into the same pits as the pre-old OOM killer used to do.
> 
> > > PS) Mapping an exponential value to a linear score is bad. E.g. an
> > >     oom_adj of 8 should make a 1-MB process as likely to be killed as
> > >     a 256-MB process with oom_adj=0.
> > > 
> > 
> > To show that, you would have to show that an application that exists today 
> > uses an oom_adj for something other than polarization and is based on a 
> > calculation of allowable memory usage.  It simply doesn't exist.
> 
> No such application should exist because the OOM killer should DTRT.
> oom_adj was supposed to let the sysadmin lower his mission-critical
> DB's score to be just lower than the less-important tasks, or to
> point the kernel to his ever-faulty and easily-restarted browser.
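
The exponential-versus-linear point in this exchange can be sketched as
follows. Two things are assumptions made for illustration and are not copied
from kernel source: that the old oom_adj acted as a bit shift on a
memory-based score (which is what the 1-MB versus 256-MB example implies), and
that the linear mapping scales oom_adj by 1000/17 onto a 0..1000 scale. Both
function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed old scheme: oom_adj shifts the memory-based points, so every
 * step doubles or halves the score.  Sizes are in megabytes.
 */
static uint64_t old_points(uint64_t size_mb, int oom_adj)
{
	if (oom_adj > 0)
		return size_mb << oom_adj;
	if (oom_adj < 0)
		return size_mb >> -oom_adj;
	return size_mb;
}

/*
 * Assumed linear scheme: the task's share of memory in thousandths plus
 * oom_adj scaled by 1000/17.
 */
static int64_t new_points(uint64_t size_mb, uint64_t total_mb, int oom_adj)
{
	int64_t adj = (int64_t)oom_adj * 1000 / 17;

	return (int64_t)(size_mb * 1000 / total_mb) + adj;
}
```

Under these assumptions old_points(1, 8) and old_points(256, 0) are both 256,
matching the example. On a 1024MB machine the linear form gives 470 for the
1-MB task with oom_adj=8 and 250 for the 256-MB task, and the gap depends on
the machine size rather than only on the two tasks.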
> 
> > > PS2) Because I saw this in your presentation PDF: (@udev-people)
> > >     The -17 score of udevd is wrong, since it will even prevent
> > >     the OOM killer from working correctly if it grows to 100 MB:
> > > 
> > 
> > Threads with CAP_SYS_RESOURCE are free to lower the oom_score_adj of any 
> > thread they deem fit and that includes applications that lower its own 
> > oom_score_adj.  The kernel isn't going to prohibit users from setting 
> > their own oom_score_adj.
> 
> My point is: The udev people should not prevent the OOM killer 
> unconditionally, it has an important task in case something goes wrong.
> I just didn't want to start a new thread at that time of day.
> -- 
> How do I set my laser printer on stun?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-17  0:06       ` Bodo Eggert
@ 2010-11-17  0:25         ` David Rientjes
  2010-11-17  0:48         ` Mandeep Singh Baines
  1 sibling, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-17  0:25 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: KOSAKI Motohiro, LKML, Linus Torvalds, Andrew Morton, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

On Wed, 17 Nov 2010, Bodo Eggert wrote:

> The old oom killer's task was to guess the best victim to kill. For me, it 
> did a good job (but the system kept thrashing for too long until it kicked
> the offender). Looking at CAP_SYS_RESOURCE was one way to recognize 
> important processes.
> 

CAP_SYS_RESOURCE does not imply the task is important.

There's a problem when the kernel is oom; killing a thread that is getting 
work done is one of the most serious remedies the kernel will ever do to 
allow forward progress.  In almost all scenarios (except in some cpuset or 
memcg configurations), it's a userspace configuration issue that exhausts 
memory and the VM finds no other alternative.  CAP_SYS_RESOURCE threads 
have access to unbounded amounts of resources and thus can use an 
extremely large amount of memory very quickly and at a detriment to other 
threads that may be as important or more important.  Considering them any 
different is an unsubstantiated and undefined behavior that should not be 
considered in the heuristic _unless_ the administrator or the task itself 
tells the kernel via oom_score_adj of its priority.

> > The old heuristics were a mixture of arbitrary values that didn't adjust 
> > scores based on a unit and would often cause the incorrect task to be 
> > targeted because there was no clear goal being achieved.  The new 
> > heuristic has a solid goal: to identify and kill the most memory-hogging 
> > task that is eligible given the context in which the oom occurs.  If you 
> > disagree with that goal and want any of the old heuristics reintroduced, 
> > please show that it makes sense in the oom killer.
> 
> The first old OOM killer did the same as you promise the current one does,
> except for your bugfixes. That's why it killed the wrong applications and
> all the heuristics were added until the complaints stopped.
> 

No, the old oom killer did not always kill the application that used the 
most amount of memory; it considered other factors with arbitrary point 
deductions such as nice level, runtime, CAP_SYS_RAWIO, CAP_SYS_RESOURCE, 
etc.  We had to remove those heuristics internally in older kernels as 
well because it would often allow a task to run away using a massive amount 
of memory because of leaks and kill everything else on the system before 
targeting the appropriate task.  At that point, it left the system with 
barely anything running and no work was getting done.
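
The failure mode described above can be shown with a toy model. The
divide-by-four weighting per capability is an assumption chosen for
illustration, and the flag and function names are hypothetical; the sketch
only shows how a privilege-based deduction can hide a leaking task behind
smaller ones.

```c
#include <stdint.h>

#define CAP_RESOURCE 0x1	/* stands in for CAP_SYS_RESOURCE */
#define CAP_RAWIO    0x2	/* stands in for CAP_SYS_RAWIO */

/*
 * Hypothetical capability-weighted score: start from memory use in MB and
 * divide by an arbitrary factor of four for each privilege held.
 */
static uint64_t weighted_points(uint64_t usage_mb, unsigned int caps)
{
	uint64_t points = usage_mb;

	if (caps & CAP_RESOURCE)
		points /= 4;
	if (caps & CAP_RAWIO)
		points /= 4;
	return points;
}

/* The goal stated for the rewrite: memory use alone decides. */
static uint64_t plain_points(uint64_t usage_mb)
{
	return usage_mb;
}
```

In this model a leaking daemon holding both capabilities and using 8192MB
scores 512, below an unprivileged 1024MB task that scores 1024, so the smaller
task is killed first. With plain_points() the 8192MB task is the first target.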

> Of course I did not yet test your OOM killer; maybe it really is better.
> Heuristics tend to rot and you did much work to make it right.
> 
> I don't want the old OOM killer back, but I don't want you to fall
> into the same pits as the pre-old OOM killer used to do.
> 

Thanks, and that's why I'm trying to avoid additional heuristics such as 
CAP_SYS_RESOURCE where the priority is _implied_ rather than _proven_.  If 
CAP_SYS_RESOURCE was defined to be more preferred to stay alive, then I'd 
have no argument; it isn't.

> > > PS) Mapping an exponential value to a linear score is bad. E.g. an
> > >     oom_adj of 8 should make a 1-MB process as likely to be killed as
> > >     a 256-MB process with oom_adj=0.
> > > 
> > 
> > To show that, you would have to show that an application that exists today 
> > uses an oom_adj for something other than polarization and is based on a 
> > calculation of allowable memory usage.  It simply doesn't exist.
> 
> No such application should exist because the OOM killer should DTRT.
> oom_adj was supposed to let the sysadmin lower his mission-critical
> DB's score to be just lower than the less-important tasks, or to
> point the kernel to his ever-faulty and easily-restarted browser.
> 

oom_score_adj allows us to define when an application is using more 
memory than expected and is often helpful in cpuset, memcg, or mempolicy 
constrained cases as well.  We'd like to be able to say that 30% of 
available memory should be discounted from a particular task that is 
expected to use 30% more memory than others without getting preferred.  
oom_score_adj can do that, oom_adj could not.
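
The 30% discount mentioned above can be sketched like this. It assumes the
score is the task's share of available memory in thousandths plus
oom_score_adj, as described elsewhere in this thread, so a 30% discount
corresponds to an oom_score_adj of -300. The floor at zero and the name
adjusted_score() are assumptions made for this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical model: share of available memory in thousandths plus
 * oom_score_adj, floored at zero.  Sizes are in megabytes.
 */
static int adjusted_score(uint64_t usage_mb, uint64_t avail_mb,
			  int oom_score_adj)
{
	int64_t score = (int64_t)(usage_mb * 1000 / avail_mb) + oom_score_adj;

	return score < 0 ? 0 : (int)score;
}
```

On an 8192MB system in this model, a task using 4096MB with oom_score_adj
-300 scores 200, below an unadjusted 2048MB task at 250. If the first task
grows to 6144MB it scores 450 and becomes the preferred target, which is the
"using more memory than expected" case.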

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 23:50     ` David Rientjes
@ 2010-11-17  0:06       ` Bodo Eggert
  2010-11-17  0:25         ` David Rientjes
  2010-11-17  0:48         ` Mandeep Singh Baines
  0 siblings, 2 replies; 77+ messages in thread
From: Bodo Eggert @ 2010-11-17  0:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Bodo Eggert, KOSAKI Motohiro, LKML, Linus Torvalds,
	Andrew Morton, Ying Han, Bodo Eggert, Mandeep Singh Baines,
	Figo.zhang

On Mon, 15 Nov 2010, David Rientjes wrote:
> On Tue, 16 Nov 2010, Bodo Eggert wrote:

> > > CAP_SYS_RESOURCE threads have full control over their oom killing priority
> > > by /proc/pid/oom_score_adj
> > 
> > , but unless they are written in the last months and designed for linux
> > and if the author took some time to research each external process invocation,
> > they can not be aware of this possibility.
> > 
> 
> You're clearly wrong, CAP_SYS_RESOURCE has been required to modify oom_adj 
> for over five years (as long as the git history).  8fb4fc68, merged into 
> 2.6.20, allowed tasks to raise their own oom_adj but not decrease it.  
> That is unchanged by the rewrite.

You are misunderstanding me. It was allowed to do this, but it did not need 
to do it yet. It was enough to be a well-written POSIX application without 
linux-specific OOM hacks for some specific kernel versions.

> > Besides that, if each process is supposed to change the default, the default
> > is wrong.
> 
> That doesn't make any sense, if you want to protect a thread from the oom 
> killer you're going to need to modify oom_score_adj, the kernel can't know 
> what you perceive as being vital.  Having CAP_SYS_RESOURCE alone does not 
> imply that, it only allows unbounded access to resources.  That's 
> completely orthogonal to the goal of the oom killer heuristic, which is to 
> find the most memory-hogging task to kill.

The old oom killer's task was to guess the best victim to kill. For me, it 
did a good job (but the system kept thrashing for too long until it kicked
the offender). Looking at CAP_SYS_RESOURCE was one way to recognize 
important processes.

> > 1) The exponential scale did have a low resolution.
> > 
> > 2) The heuristics were developed using much brain power and much
> >    trial-and-error. You are going back to basics, and some people
> >    are not convinced that this is better. I googled and I did not
> >    find a discussion about how and why the new score was designed
> >    this way.
> >    looking at the output of:
> >    cd /proc; for a in [0-9]*; do
> >      echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
> >    done|grep -v ^0|sort -n |less
> >    , I'm not convinced, either.
> > 
> 
> The old heuristics were a mixture of arbitrary values that didn't adjust 
> scores based on a unit and would often cause the incorrect task to be 
> targeted because there was no clear goal being achieved.  The new 
> heuristic has a solid goal: to identify and kill the most memory-hogging 
> task that is eligible given the context in which the oom occurs.  If you 
> disagree with that goal and want any of the old heuristics reintroduced, 
> please show that it makes sense in the oom killer.

The first old OOM killer did the same as you promise the current one does,
except for your bugfixes. That's why it killed the wrong applications and
all the heuristics were added until the complaints stopped.

Of course I did not yet test your OOM killer, maybe it really is better.
Heuristics tend to rot and you did much work to make it right.

I don't want the old OOM killer back, but I don't want you to fall
into the same pits as the pre-old OOM killer used to do.

> > PS) Mapping an exponential value to a linear score is bad. E.g. A
> >     oom_adj of 8 should make an 1-MB-process as likely to kill as
> >     a 256-MB-process with oom_adj=0.
> > 
> 
> To show that, you would have to show that an application that exists today 
> uses an oom_adj for something other than polarization and is based on a 
> calculation of allowable memory usage.  It simply doesn't exist.

No such application should exist because the OOM killer should DTRT.
oom_adj was supposed to let the sysadmin lower his mission-critical
DB's score to be just lower than the less-important tasks, or to
point the kernel to his ever-faulty and easily-restarted browser.

> > PS2) Because I saw this in your presentation PDF: (@udev-people)
> >     The -17 score of udevd is wrong, since it will even prevent
> >     the OOM killer from working correctly if it grows to 100 MB:
> > 
> 
> Threads with CAP_SYS_RESOURCE are free to lower the oom_score_adj of any 
> thread they deem fit and that includes applications that lower their own 
> oom_score_adj.  The kernel isn't going to prohibit users from setting 
> their own oom_score_adj.

My point is: The udev people should not prevent the OOM killer 
unconditionally, it has an important task in case something goes wrong.
I just didn't want to start a new thread at that time of day.
-- 
How do I set my laser printer on stun?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16 20:57                     ` David Rientjes
@ 2010-11-16 21:01                       ` Fabio Comolli
  0 siblings, 0 replies; 77+ messages in thread
From: Fabio Comolli @ 2010-11-16 21:01 UTC (permalink / raw)
  To: David Rientjes; +Cc: LKML

[CC: list trimmed again for sanity]

Another one:

[   34.709156] chromium-browse (1439): /proc/1480/oom_adj is
deprecated, please use /proc/1480/oom_score_adj instead.

2.6.37-rc2 - archlinux - package chromium-browser-ppa from AUR




On Tue, Nov 16, 2010 at 9:57 PM, David Rientjes <rientjes@google.com> wrote:
> On Tue, 16 Nov 2010, Alan Cox wrote:
>
>> Yes - so why is it spewing crap, annoying users and trying to irritate
>> application authors. It's not 2012 yet.
>>
>
> It's a WARN_ON_ONCE() so it will only spew a single line as a reminder
> that the application needs to be updated; would you prefer that to be
> suppressed until a year before removal, for example?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16 14:55                   ` Alan Cox
@ 2010-11-16 20:57                     ` David Rientjes
  2010-11-16 21:01                       ` Fabio Comolli
  2010-11-17  4:04                     ` Valdis.Kletnieks
  1 sibling, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-16 20:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Florian Mickler, Valdis.Kletnieks, Jesper Juhl, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Tue, 16 Nov 2010, Alan Cox wrote:

> Yes - so why is it spewing crap, annoying users and trying to irritate
> application authors. It's not 2012 yet.
> 

It's a WARN_ON_ONCE() so it will only spew a single line as a reminder 
that the application needs to be updated; would you prefer that to be 
suppressed until a year before removal, for example?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16  0:13             ` Valdis.Kletnieks
  2010-11-16  6:43               ` David Rientjes
  2010-11-16 11:03               ` Alan Cox
@ 2010-11-16 15:15               ` Alejandro Riveira Fernández
  2 siblings, 0 replies; 77+ messages in thread
From: Alejandro Riveira Fernández @ 2010-11-16 15:15 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Jesper Juhl, David Rientjes, KOSAKI Motohiro, Andrew Morton,
	Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

[-- Attachment #1: Type: text/plain, Size: 1921 bytes --]

On Mon, 15 Nov 2010 19:13:15 -0500
Valdis.Kletnieks@vt.edu wrote:

> On Tue, 16 Nov 2010 00:31:00 +0100, Jesper Juhl said:
> 
> > I'm not going into the debate about whether or not deprecating one tunable 
> > for two years is sufficient or not. I'm simply going to mention one app 
> > that I know of that needs to be converted to use "oom_score_adj" on my 
> > box :
> > 
> > [jj@dragon ~]$ uname -a
> > Linux dragon 2.6.37-rc1-ARCH-00542-g0143832-dirty #1 SMP PREEMPT Mon Nov 15 22:01:52 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
> > [jj@dragon ~]$ dmesg | grep oom_adj
> > start_kdeinit (1502): /proc/1502/oom_adj is deprecated, please use /proc/1502/oom_score_adj instead.
> 
> Make that 2 common apps:
> 
> % uname -a
> Linux turing-police.cc.vt.edu 2.6.37-rc1-mmotm1109 #1 SMP PREEMPT Wed Nov 10 12:30:17 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> % dmesg | grep oom
> [   89.981594] sshd (4168): /proc/4168/oom_adj is deprecated, please use /proc/4168/oom_score_adj instead.
> % rpm -q openssh
> openssh-5.6p1-16.fc15.x86_64
> 
> 5.6p1 is the latest-n-greatest released version on www.openssh.org, so somebody
> probably needs to rattle their chain...

$ dmesg | grep deprecated
[    1.473365] udevd (662): /proc/662/oom_adj is deprecated, please use /proc/662/oom_score_adj instead.
$ apt-cache policy udev
udev:
  Installed: 151-12.3
  Candidate: 151-12.3
  Version table:
 *** 151-12.3 0
        500 http://es.archive.ubuntu.com/ubuntu/ lucid-proposed/main Packages
        100 /var/lib/dpkg/status
     151-12.2 0
        500 http://es.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
     151-12 0
        500 http://es.archive.ubuntu.com/ubuntu/ lucid/main Packages
$ uname -a
Linux varda 2.6.36-00001-g90d39e9 #145 SMP PREEMPT Wed Oct 20 23:27:44 CEST 2010 x86_64 GNU/Linux

 Ubuntu 10.04 LTS

> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16 13:03                 ` Florian Mickler
@ 2010-11-16 14:55                   ` Alan Cox
  2010-11-16 20:57                     ` David Rientjes
  2010-11-17  4:04                     ` Valdis.Kletnieks
  0 siblings, 2 replies; 77+ messages in thread
From: Alan Cox @ 2010-11-16 14:55 UTC (permalink / raw)
  To: Florian Mickler
  Cc: Valdis.Kletnieks, Jesper Juhl, David Rientjes, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

> How does one mark it appropriately?
> The commit 51b1bd2 (oom: deprecate oom_adj tunable, see below) 
> added it to feature-removal-schedule.txt, a patch for
> Documentation/ABI has also been provided in the meantime, if i'm not
> mistaken. 

Yes - so why is it spewing crap, annoying users and trying to irritate
application authors. It's not 2012 yet.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16 11:03               ` Alan Cox
@ 2010-11-16 13:03                 ` Florian Mickler
  2010-11-16 14:55                   ` Alan Cox
  0 siblings, 1 reply; 77+ messages in thread
From: Florian Mickler @ 2010-11-16 13:03 UTC (permalink / raw)
  To: Alan Cox
  Cc: Valdis.Kletnieks, Jesper Juhl, David Rientjes, KOSAKI Motohiro,
	Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Tue, 16 Nov 2010 11:03:10 +0000
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > 5.6p1 is the latest-n-greatest released version on www.openssh.org, so somebody
> > probably needs to rattle their chain...
> 
> But current openssh needs to support old kernels.
> 
> This is why this kind of obsoleting doesn't work well. It's not "update
> your app" so much as "drop support for older stuff or start doing
> complicated crap dependent on version"
> 
> and it's why for tiny amounts of code it is the *wrong* thing to force
> obsolete stuff especially when it still doesn't seem to have been
> properly marked for deprecation in the first place.
> 

How does one mark it appropriately?
The commit 51b1bd2 (oom: deprecate oom_adj tunable, see below) 
added it to feature-removal-schedule.txt, a patch for
Documentation/ABI has also been provided in the meantime, if i'm not
mistaken. 

And there is already a patch for openssh:
https://bugzilla.mindrot.org/show_bug.cgi?id=1838

Regards,
Flo

commit 51b1bd2ace1595b72956224deda349efa880b693
Author: David Rientjes <rientjes@google.com>
Date:   Mon Aug 9 17:19:47 2010 -0700

    oom: deprecate oom_adj tunable
    
    /proc/pid/oom_adj is now deprecated so that it may eventually be
    removed.  The target date for removal is August 2012.
    
    A warning will be printed to the kernel log if a task attempts to use this
    interface.  Future warning will be suppressed until the kernel is rebooted
    to prevent spamming the kernel log.
    
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16  0:13             ` Valdis.Kletnieks
  2010-11-16  6:43               ` David Rientjes
@ 2010-11-16 11:03               ` Alan Cox
  2010-11-16 13:03                 ` Florian Mickler
  2010-11-16 15:15               ` Alejandro Riveira Fernández
  2 siblings, 1 reply; 77+ messages in thread
From: Alan Cox @ 2010-11-16 11:03 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Jesper Juhl, David Rientjes, KOSAKI Motohiro, Andrew Morton,
	Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

> 5.6p1 is the latest-n-greatest released version on www.openssh.org, so somebody
> probably needs to rattle their chain...

But current openssh needs to support old kernels.

This is why this kind of obsoleting doesn't work well. It's not "update
your app" so much as "drop support for older stuff or start doing
complicated crap dependent on version"

and it's why for tiny amounts of code it is the *wrong* thing to force
obsolete stuff especially when it still doesn't seem to have been
properly marked for deprecation in the first place.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16 10:04               ` Martin Knoblauch
@ 2010-11-16 10:33                 ` Alessandro Suardi
  0 siblings, 0 replies; 77+ messages in thread
From: Alessandro Suardi @ 2010-11-16 10:33 UTC (permalink / raw)
  To: Martin Knoblauch; +Cc: David Rientjes, LKML

On Tue, Nov 16, 2010 at 11:04 AM, Martin Knoblauch
<spamtrap@knobisoft.de> wrote:
> CC trimmed for sanity ...
>
> ----- Original Message ----
>
>> From: David Rientjes <rientjes@google.com>
>> To: Jesper Juhl <jj@chaosbits.net>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>; Andrew Morton
>><akpm@linux-foundation.org>; Linus Torvalds <torvalds@linux-foundation.org>;
>>LKML <linux-kernel@vger.kernel.org>; Ying Han <yinghan@google.com>; Bodo Eggert
>><7eggert@web.de>; Mandeep@yahoo.com
>> Sent: Tue, November 16, 2010 1:06:21 AM
>> Subject: Re: [PATCH] Revert oom rewrite series
>>
>>
>> Thanks for the report!  I'll get  involved with kde-devel and send a patch
>> to remove this dependency on newer  kernels to expedite the process.
>>
>>  [ Others with reports of deprecated use  of oom_adj can contact me
>>    privately and I'll find the parties of  interest to avoid topics
>>    unrelated to the kernel itself on LKML.  ]
> David,
>
>  another one for your collection. You asked for it :-) This is CentOS-5.5
> running on top of kernel 2.6.36, likely out of initrd:
>
> $ dmesg | grep deprecated
> [    2.430330] nash-hotplug (67): /proc/67/oom_adj is deprecated, please use
> /proc/67/oom_score_adj instead.

...and another, on Fedora 14, 2.6.37-rc1-git11:

auditd (2583): /proc/2583/oom_adj is deprecated, please use
/proc/2583/oom_score_adj instead.

Cheers,

--alessandro

 "There's always a siren singing you to shipwreck"

   (Radiohead, "There There")

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16  0:06             ` David Rientjes
@ 2010-11-16 10:04               ` Martin Knoblauch
  2010-11-16 10:33                 ` Alessandro Suardi
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Knoblauch @ 2010-11-16 10:04 UTC (permalink / raw)
  To: David Rientjes; +Cc: LKML

CC trimmed for sanity ...

----- Original Message ----

> From: David Rientjes <rientjes@google.com>
> To: Jesper Juhl <jj@chaosbits.net>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>; Andrew Morton 
><akpm@linux-foundation.org>; Linus Torvalds <torvalds@linux-foundation.org>; 
>LKML <linux-kernel@vger.kernel.org>; Ying Han <yinghan@google.com>; Bodo Eggert 
><7eggert@web.de>; Mandeep@yahoo.com
> Sent: Tue, November 16, 2010 1:06:21 AM
> Subject: Re: [PATCH] Revert oom rewrite series
> 
> 
> Thanks for the report!  I'll get  involved with kde-devel and send a patch 
> to remove this dependency on newer  kernels to expedite the process.
> 
>  [ Others with reports of deprecated use  of oom_adj can contact me 
>    privately and I'll find the parties of  interest to avoid topics 
>    unrelated to the kernel itself on LKML.  ]
David,

 another one for your collection. You asked for it :-) This is CentOS-5.5 
running on top of kernel 2.6.36, likely out of initrd:

$ dmesg | grep deprecated
[    2.430330] nash-hotplug (67): /proc/67/oom_adj is deprecated, please use 
/proc/67/oom_score_adj instead.

Cheers
Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-16  0:13             ` Valdis.Kletnieks
@ 2010-11-16  6:43               ` David Rientjes
  2010-11-16 11:03               ` Alan Cox
  2010-11-16 15:15               ` Alejandro Riveira Fernández
  2 siblings, 0 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-16  6:43 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Jesper Juhl, KOSAKI Motohiro, Andrew Morton, Linus Torvalds,
	LKML, Ying Han, Bodo Eggert, Mandeep Singh Baines, Figo.zhang

On Mon, 15 Nov 2010, Valdis.Kletnieks@vt.edu wrote:

> Make that 2 common apps:
> 
> % uname -a
> Linux turing-police.cc.vt.edu 2.6.37-rc1-mmotm1109 #1 SMP PREEMPT Wed Nov 10 12:30:17 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> % dmesg | grep oom
> [   89.981594] sshd (4168): /proc/4168/oom_adj is deprecated, please use /proc/4168/oom_score_adj instead.
> % rpm -q openssh
> openssh-5.6p1-16.fc15.x86_64
> 
> 5.6p1 is the latest-n-greatest released version on www.openssh.org, so somebody
> probably needs to rattle their chain...
> 

Thanks, Darren Tucker fixed this a few hours after I reported it on the 
openssh bugzilla, the patch is at 
https://bugzilla.mindrot.org/show_bug.cgi?id=1838 -- it uses oom_score_adj 
if it exists and then falls back to oom_adj if running on an older kernel.
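
A minimal sketch of that fallback pattern, not the actual openssh patch:
the function name, its parameters and the example values (-1000 and -17)
are assumptions chosen for illustration.

```python
import os

def write_oom_setting(proc_pid_dir, score_adj, legacy_adj):
    """Try oom_score_adj first, then fall back to oom_adj.

    Returns the name of the file that was written, or None if neither
    tunable exists under proc_pid_dir (e.g. "/proc/self").
    """
    candidates = (("oom_score_adj", score_adj), ("oom_adj", legacy_adj))
    for name, value in candidates:
        path = os.path.join(proc_pid_dir, name)
        try:
            # No O_CREAT: a missing file means this kernel lacks the tunable.
            fd = os.open(path, os.O_WRONLY)
        except FileNotFoundError:
            continue
        with os.fdopen(fd, "w") as f:
            f.write(f"{value}\n")
        return name
    return None
```

Called as write_oom_setting("/proc/self", -1000, -17), this would touch
only the new interface on a kernel that provides it, so the deprecation
warning is not triggered, while older kernels still get the legacy
setting. Lowering either value normally requires CAP_SYS_RESOURCE.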

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 23:31           ` Jesper Juhl
  2010-11-16  0:06             ` David Rientjes
@ 2010-11-16  0:13             ` Valdis.Kletnieks
  2010-11-16  6:43               ` David Rientjes
                                 ` (2 more replies)
  1 sibling, 3 replies; 77+ messages in thread
From: Valdis.Kletnieks @ 2010-11-16  0:13 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: David Rientjes, KOSAKI Motohiro, Andrew Morton, Linus Torvalds,
	LKML, Ying Han, Bodo Eggert, Mandeep Singh Baines, Figo.zhang

[-- Attachment #1: Type: text/plain, Size: 1089 bytes --]

On Tue, 16 Nov 2010 00:31:00 +0100, Jesper Juhl said:

> I'm not going into the debate about whether or not deprecating one tunable 
> for two years is sufficient or not. I'm simply going to mention one app 
> that I know of that needs to be converted to use "oom_score_adj" on my 
> box :
> 
> [jj@dragon ~]$ uname -a
> Linux dragon 2.6.37-rc1-ARCH-00542-g0143832-dirty #1 SMP PREEMPT Mon Nov 15 22:01:52 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
> [jj@dragon ~]$ dmesg | grep oom_adj
> start_kdeinit (1502): /proc/1502/oom_adj is deprecated, please use /proc/1502/oom_score_adj instead.

Make that 2 common apps:

% uname -a
Linux turing-police.cc.vt.edu 2.6.37-rc1-mmotm1109 #1 SMP PREEMPT Wed Nov 10 12:30:17 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
% dmesg | grep oom
[   89.981594] sshd (4168): /proc/4168/oom_adj is deprecated, please use /proc/4168/oom_score_adj instead.
% rpm -q openssh
openssh-5.6p1-16.fc15.x86_64

5.6p1 is the latest-n-greatest released version on www.openssh.org, so somebody
probably needs to rattle their chain...


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 23:31           ` Jesper Juhl
@ 2010-11-16  0:06             ` David Rientjes
  2010-11-16 10:04               ` Martin Knoblauch
  2010-11-16  0:13             ` Valdis.Kletnieks
  1 sibling, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-16  0:06 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: KOSAKI Motohiro, Andrew Morton, Linus Torvalds, LKML, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

On Tue, 16 Nov 2010, Jesper Juhl wrote:

> [jj@dragon ~]$ uname -a
> Linux dragon 2.6.37-rc1-ARCH-00542-g0143832-dirty #1 SMP PREEMPT Mon Nov 15 22:01:52 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
> [jj@dragon ~]$ dmesg | grep oom_adj
> start_kdeinit (1502): /proc/1502/oom_adj is deprecated, please use /proc/1502/oom_score_adj instead.
> [jj@dragon ~]$ /usr/lib/kde4/libexec/start_kdeinit --version
> 
> Qt: 4.7.1
> KDE: 4.5.3 (KDE 4.5.3)
> 

Thanks for the report!  I'll get involved with kde-devel and send a patch 
to remove this dependency on newer kernels to expedite the process.

 [ Others with reports of deprecated use of oom_adj can contact me 
   privately and I'll find the parties of interest to avoid topics 
   unrelated to the kernel itself on LKML. ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 23:33   ` Bodo Eggert
@ 2010-11-15 23:50     ` David Rientjes
  2010-11-17  0:06       ` Bodo Eggert
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-15 23:50 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: KOSAKI Motohiro, LKML, Linus Torvalds, Andrew Morton, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

On Tue, 16 Nov 2010, Bodo Eggert wrote:

> > CAP_SYS_RESOURCE threads have full control over their oom killing priority
> > by /proc/pid/oom_score_adj
> 
> , but unless they are written in the last months and designed for linux
> and if the author took some time to research each external process invocation,
> they can not be aware of this possibility.
> 

You're clearly wrong, CAP_SYS_RESOURCE has been required to modify oom_adj 
for over five years (as long as the git history).  8fb4fc68, merged into 
2.6.20, allowed tasks to raise their own oom_adj but not decrease it.  
That is unchanged by the rewrite.

> Besides that, if each process is supposed to change the default, the default
> is wrong.
> 

That doesn't make any sense, if you want to protect a thread from the oom 
killer you're going to need to modify oom_score_adj, the kernel can't know 
what you perceive as being vital.  Having CAP_SYS_RESOURCE alone does not 
imply that, it only allows unbounded access to resources.  That's 
completely orthogonal to the goal of the oom killer heuristic, which is to 
find the most memory-hogging task to kill.

> 1) The exponential scale did have a low resolution.
> 
> 2) The heuristics were developed using much brain power and much
>    trial-and-error. You are going back to basics, and some people
>    are not convinced that this is better. I googled and I did not
>    find a discussion about how and why the new score was designed
>    this way.
>    looking at the output of:
>    cd /proc; for a in [0-9]*; do
>      echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
>    done|grep -v ^0|sort -n |less
>    , I'm not convinced, either.
> 

The old heuristics were a mixture of arbitrary values that didn't adjust 
scores based on a unit and would often cause the incorrect task to be 
targeted because there was no clear goal being achieved.  The new 
heuristic has a solid goal: to identify and kill the most memory-hogging 
task that is eligible given the context in which the oom occurs.  If you 
disagree with that goal and want any of the old heuristics reintroduced, 
please show that it makes sense in the oom killer.

> PS) Mapping an exponential value to a linear score is bad. E.g. A
>     oom_adj of 8 should make an 1-MB-process as likely to kill as
>     a 256-MB-process with oom_adj=0.
> 

To show that, you would have to show that an application that exists today 
uses an oom_adj for something other than polarization and is based on a 
calculation of allowable memory usage.  It simply doesn't exist.

> PS2) Because I saw this in your presentation PDF: (@udev-people)
>     The -17 score of udevd is wrong, since it will even prevent
>     the OOM killer from working correctly if it grows to 100 MB:
> 

Threads with CAP_SYS_RESOURCE are free to lower the oom_score_adj of any 
thread they deem fit and that includes applications that lower their own 
oom_score_adj.  The kernel isn't going to prohibit users from setting 
their own oom_score_adj.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-14 21:58 ` David Rientjes
@ 2010-11-15 23:33   ` Bodo Eggert
  2010-11-15 23:50     ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Bodo Eggert @ 2010-11-15 23:33 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, LKML, Linus Torvalds, Andrew Morton, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

On Sun, 14 Nov 2010, David Rientjes wrote:

> Also, stating that the new heuristic doesn't address CAP_SYS_RESOURCE
> appropriately isn't a bug report, it's the desired behavior.  I eliminated
> all of the arbitrary heuristics in the old heuristic that we had to
> remove internally as well so that it is as predictable as possible and achieves
> the oom killer's sole goal: to kill the most memory-hogging task that is
> eligible to allow memory allocations in the current context to succeed.

> CAP_SYS_RESOURCE threads have full control over their oom killing priority
> by /proc/pid/oom_score_adj

, but unless they are written in the last months and designed for linux
and if the author took some time to research each external process 
invocation, they can not be aware of this possibility.

Besides that, if each process is supposed to change the default, the 
default is wrong.

> and need no consideration in the heuristic by
> default since it otherwise allows for the probability that multiple tasks
> will need to be killed when a CAP_SYS_RESOURCE thread uses an egregious
> amount of memory.

If it happens to use an egregious amount of memory, it SHOULD score
enough to get killed.

>> The problem is, DavidR patches don't reflect real world usecase at all
>> and breaking them. He can talk about the userland is wrong. but such
>> excuse doesn't solve real world issue. it makes no sense.
>
> As mentioned just a few minutes ago in another thread, there is no
> userspace breakage with the rewrite and you're only complaining here about
> the deprecation of /proc/pid/oom_adj for a period of two years.  Until
> it's removed in 2012 or later, it maps to the linear scale that
> oom_score_adj uses rather than its old exponential scale that was
> unusable for prioritization because of (1) the extremely low resolution,
> and (2) the arbitrary heuristics that preceded it.

1) The exponential scale did have a low resolution.

2) The heuristics were developed using much brain power and much
    trial-and-error. You are going back to basics, and some people
    are not convinced that this is better. I googled and I did not
    find a discussion about how and why the new score was designed
    this way.
    looking at the output of:
    cd /proc; for a in [0-9]*; do
      echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
    done|grep -v ^0|sort -n |less
    , I'm not convinced, either.

PS) Mapping an exponential value to a linear score is bad. E.g. A
     oom_adj of 8 should make an 1-MB-process as likely to kill as
     a 256-MB-process with oom_adj=0.
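
     The arithmetic behind that example can be sketched as follows,
     assuming the pre-rewrite behaviour of shifting the score left by a
     positive oom_adj and right by a negative one (the function name is
     illustrative, and the real kernel code had further special cases):

```python
def old_exponential_badness(points, oom_adj):
    """Hypothetical model of the old scale: each oom_adj step doubles or halves."""
    if oom_adj > 0:
        return points << oom_adj
    return points >> -oom_adj

# A 1 MB process with oom_adj=8 ends up level with a 256 MB process at oom_adj=0.
small_boosted = old_exponential_badness(1, 8)
large_plain = old_exponential_badness(256, 0)
print(small_boosted, large_plain)
```

     Under this model each step of oom_adj scales a score relative to
     the task's own size, which a fixed linear offset cannot reproduce.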

PS2) Because I saw this in your presentation PDF: (@udev-people)
     The -17 score of udevd is wrong, since it will even prevent
     the OOM killer from working correctly if it grows to 100 MB:

     Its default OOM score is 13, while root's shell is at 190
     and some KDE processes are at 200 000. It will not get killed
     under normal circumstances.

     If udevd grows enough to score 190 as well, it has a bug
     that causes it to eat memory and it needs to be killed. Having
     a -17 oom_adj, it will cause the system to fail instead.
     Considering udevd's size, an adj of -1 or -2 should be enough on
     embedded systems, while desktop systems should not need it.
     If you are worried about udevd getting killed, protect it using
     a wrapper.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15 10:34         ` David Rientjes
@ 2010-11-15 23:31           ` Jesper Juhl
  2010-11-16  0:06             ` David Rientjes
  2010-11-16  0:13             ` Valdis.Kletnieks
  2010-11-23  7:16           ` KOSAKI Motohiro
  1 sibling, 2 replies; 77+ messages in thread
From: Jesper Juhl @ 2010-11-15 23:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, Andrew Morton, Linus Torvalds, LKML, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

On Mon, 15 Nov 2010, David Rientjes wrote:

[...]
> If you would like to cite these "multiple" applications that need to be 
> converted to use oom_score_adj (I know of udev), please let me know and 
> if they're open-source applications then I will commit to submitting 
> patches for them myself.  I believe the two year window is sufficient for 
> everyone else, though.
[...]

I'm not going into the debate about whether or not deprecating one tunable 
for two years is sufficient or not. I'm simply going to mention one app 
that I know of that needs to be converted to use "oom_score_adj" on my 
box :

[jj@dragon ~]$ uname -a
Linux dragon 2.6.37-rc1-ARCH-00542-g0143832-dirty #1 SMP PREEMPT Mon Nov 15 22:01:52 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
[jj@dragon ~]$ dmesg | grep oom_adj
start_kdeinit (1502): /proc/1502/oom_adj is deprecated, please use /proc/1502/oom_score_adj instead.
[jj@dragon ~]$ /usr/lib/kde4/libexec/start_kdeinit --version

Qt: 4.7.1
KDE: 4.5.3 (KDE 4.5.3)



-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15  6:57       ` KOSAKI Motohiro
@ 2010-11-15 10:34         ` David Rientjes
  2010-11-15 23:31           ` Jesper Juhl
  2010-11-23  7:16           ` KOSAKI Motohiro
  2010-11-23  7:16         ` KOSAKI Motohiro
  1 sibling, 2 replies; 77+ messages in thread
From: David Rientjes @ 2010-11-15 10:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Linus Torvalds, LKML, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Mon, 15 Nov 2010, KOSAKI Motohiro wrote:

> Of course, I denied. He seems to think number of email is meaningful than
> how talk about. but it's incorrect and makes no sense. Why not? Also, He
> have to talk about logically. "Hey, I think it's not bug" makes no sense.
> Such claim don't solve anything. userland is still unhappy. Why not?
> I want to quickly action.
> 

If there are pending complaints or bugs that I haven't addressed, please 
bring them to my attention.  To date, I know of no issues that have been 
raised that I have not addressed; you're always free to disagree with my 
position, but in the end you may find that when the kernel moves in a 
different direction that you should begin to accept it.

> That said, If anyone want to change userland ABI, Be carefully. They have
> to investigate userland usecase carefully and avoid to break them carefully 
> again. If someone think "hey, It's no big matter. userland rewritten can solve
> an issue", I strongly disagree. they don't understand why all of userland 
> applications rewritten is harmful.
> 

You may remember that the initial version of my rewrite replaced oom_adj 
entirely with the new oom_score_adj semantics.  Others suggested that it 
be separated into a new tunable and the old tunable deprecated for a 
lengthy period of time.  I accepted that criticism and understood the 
drawbacks of replacing the tunable immediately and followed those 
suggestions.  I disagree with you that the deprecation of oom_adj for a 
period of two years is as dramatic as you imply and I disagree that users 
are experiencing problems with the linear scale that it now operates on 
versus the old exponential scale.

> 1) About two months ago, Dave Hansen observed a strange OOM issue because he
>    has a big machine and none of its processes are very big. Thus, eventually
>    every process got oom-score=0 and the oom-killer didn't work.
> 
>    https://kerneltrap.org/mailarchive/linux-driver-devel/2010/9/9/6886383
> 
>    DavidR changed the oom-score to +1 in such a situation.
> 
>    http://kerneltrap.org/mailarchive/linux-kernel/2010/9/9/4617455
> 
>    But it is completely bogus. If all processes have score=1, the oom-killer
>    falls back to a purely random killer. I expected this problem and explained
>    it half a year ago, but he hasn't fixed it yet.
> 

The resolution with which the oom killer considers memory is at 0.1% of 
system RAM at its highest (smaller when you have a memory controller, 
cpuset, or mempolicy constrained oom).  It considers a task within 0.1% of 
memory of another task to have equal "badness" to kill, we don't break 
ties in between that resolution -- it all depends on which one shows up in 
the tasklist first.  If you disagree with that resolution, which I support 
as being high enough, then you may certainly propose a patch to make it 
even finer at 0.01%, 0.001%, etc.  It would only change oom_badness() to 
range between [0,10000], [0,100000], etc.
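
A minimal user-space sketch of the resolution point made above. This is not
the kernel's oom_badness(); the helper name sketch_badness() and the page
counts in the note below are assumptions chosen only for illustration. It
shows how scoring a task as an integer share of the allowed memory makes two
tasks tie when they differ by less than one unit of the scale.

```c
/*
 * Illustrative only: score a task as its share of the allowed memory,
 * expressed on a 0..scale range.  The name sketch_badness() and its
 * parameters are hypothetical, not kernel API.
 */
static unsigned long long sketch_badness(unsigned long long task_pages,
					 unsigned long long total_pages,
					 unsigned long long scale)
{
	if (total_pages == 0)
		return 0;
	/* Integer division discards anything below one unit of 1/scale. */
	return task_pages * scale / total_pages;
}
```

With 1,000,000 allowed pages, tasks using 500,100 and 500,900 pages both
score 500 on a [0,1000] scale, so the tie goes to whichever is found first.
On a [0,10000] scale they score 5001 and 5009 and no longer tie.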

> 2) Also half a year ago, I explained that oom_adj is used by multiple
>    applications, and we can't break them. But DavidR didn't fix that.
> 

And we didn't.  oom_adj is still there and maps linearly to oom_score_adj; 
you just can't show a single application where that mapping breaks because 
it was based on an actual calculation.
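
The linear mapping mentioned above can be sketched in user space as follows.
The arithmetic is copied from the oom_adjust_write() hunk in the revert patch
later in this thread, and the constants follow the ranges given in its
Documentation/filesystems/proc.txt hunk. The wrapper name
sketch_oom_adj_to_score_adj() is made up here, and the snippet is a
stand-alone illustration rather than kernel code.

```c
/* Values taken from the ranges documented in this thread's patch. */
#define OOM_DISABLE		(-17)
#define OOM_ADJUST_MAX		15
#define OOM_SCORE_ADJ_MAX	1000

/*
 * Hypothetical helper mirroring the scaling done in oom_adjust_write():
 * the maximum is special-cased so that +15 reaches +1000 exactly.
 */
static int sketch_oom_adj_to_score_adj(int oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;
	return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```

Under these assumptions -17 maps to -1000, 0 maps to 0, +8 maps to +470 and
+15 maps to +1000; C integer division truncates toward zero, so -16 maps to
-941.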

If you would like to cite these "multiple" applications that need to be 
converted to use oom_score_adj (I know of udev), please let me know and 
if they're open-source applications then I will commit to submitting 
patches for them myself.  I believe the two year window is sufficient for 
everyone else, though.

> 3) Also, about four months ago, kamezawa-san and I pointed out that his patch
>    doesn't work on memcg. That hasn't been fixed either.
> 

I don't know what you're referring to here, sorry.

> On the other hand, you can't explain what the OOM rewrite patch is worth,
> because there is nothing. It is only "powerful"(TM) for Google, and
> it has zero worth for everybody else. This is just a technical
> issue. Bah.
> 

Please see my reply to Figo.zhang where I enumerate the four reasons why 
the new userspace tunable is more powerful than oom_adj.

At this point, I can only speculate that your distaste for the new oom 
killer is one of disposition; it seems like every time you reply to an 
email (or, more regularly, just repost your revert) that you come into it 
with the attitude that my response cannot possibly be correct and that the 
way you see things is exactly as they should be.  If you were to consider 
other people's opinions, however, you may find some common ground that can 
be met.  I certainly did that when I introduced oom_score_adj instead of 
replacing oom_adj immediately.  I also did it when I removed the forkbomb 
detector from the rewrite.  I also did it when considering swap in the 
heuristic when it initially was only rss.  Andrew is in the position where 
he has to make a judgment call on what should be included and what 
shouldn't and it should be pretty darn clear after you post your revert 
the first time, then the second time, then the third time, then the fourth 
time, and now the fifth time.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15  2:19     ` Andrew Morton
       [not found]       ` <AANLkTik_SDaiu2eQsJ9+4ywLR5K5V1Od-hwop6gwas3F@mail.gmail.com>
@ 2010-11-15  6:57       ` KOSAKI Motohiro
  2010-11-15 10:34         ` David Rientjes
  2010-11-23  7:16         ` KOSAKI Motohiro
  1 sibling, 2 replies; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-15  6:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, Linus Torvalds, LKML, David Rientjes, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

> On Mon, 15 Nov 2010 09:54:14 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > > 2010/11/13 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> > > >
> > > > Please apply this. This patch reverts the oom changes since v2.6.35.
> > > 
> > > I'm not getting involved in this whole flame-war. You need to convince
> > > Andrew, who has been the person everything went through.
> > 
> > I wonder why he keeps such a deep silence.
> 
> Nothing to say, really.  Seems each time we're told about a bug or a
> regression, David either fixes the bug or points out why it wasn't a
> bug or why it wasn't a regression or how it was a deliberate behaviour
> change for the better.

Of course, I objected. He seems to think the number of emails matters more
than what they say, but that is incorrect and makes no sense. Also, he
has to argue logically. "Hey, I think it's not a bug" makes no sense.
Such a claim doesn't solve anything; userland is still unhappy.
I want quick action.

I would like to suggest that they join and contribute to a distro kernel
maintenance team. Many community-based distributions welcome developers,
and bugfix work teaches a lot: which use cases are frequently used,
which bug reports are frequently raised, etc.

That said, anyone who wants to change a userland ABI must be careful. They
have to investigate userland use cases carefully and take care not to break
them. If someone thinks "hey, it's no big matter, rewriting userland can
solve the issue", I strongly disagree. They don't understand why forcing
every userland application to be rewritten is harmful.

> I just haven't seen any solid reason to be concerned about the state of
> the current oom-killer, sorry.

You can't say "I haven't seen". I always CCed you.

> I'm concerned that you're concerned!  A lot.  When someone such as
> yourself is unhappy with part of MM then I sit up and pay attention. 
> But after all this time I simply don't understand the technical issues
> which you're seeing here.

You should have read my patch descriptions which I sent and my e-mail.

1) About two months ago, Dave Hansen observed a strange OOM issue because he
   has a big machine and none of its processes are very big. Thus, eventually
   every process got oom-score=0 and the oom-killer didn't work.

   https://kerneltrap.org/mailarchive/linux-driver-devel/2010/9/9/6886383

   DavidR changed the oom-score to +1 in such a situation.

   http://kerneltrap.org/mailarchive/linux-kernel/2010/9/9/4617455

   But it is completely bogus. If all processes have score=1, the oom-killer
   falls back to a purely random killer. I expected this problem and explained
   it half a year ago, but he hasn't fixed it yet.

2) Also half a year ago, I explained that oom_adj is used by multiple
   applications, and we can't break them. But DavidR didn't fix that.

3) Also, about four months ago, kamezawa-san and I pointed out that his patch
   doesn't work on memcg. That hasn't been fixed either.

On the other hand, you can't explain what the OOM rewrite patch is worth,
because there is nothing. It is only "powerful"(TM) for Google, and
it has zero worth for everybody else. This is just a technical
issue. Bah.

And I just don't understand why some people are trying to remove or obsolete
oom_adj. It's just eight lines of code and it's used by multiple applications.
There is no reason to break userland at all.
--------------------------------------------------------
	/*
	 * Adjust the score by oom_adj.
	 */
	if (oom_adj) {
		if (oom_adj > 0) {
			if (!points)
				points = 1;
			points <<= oom_adj;
		} else
			points >>= -(oom_adj);
	}
--------------------------------------------------------
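
For readers who want to run the fragment above outside the kernel, it can be
wrapped in a function. This is only a sketch: sketch_legacy_adjust() is a
made-up name, and it assumes oom_adj stays within the documented -16..+15
range so that the shifts are well defined.

```c
/*
 * Hypothetical user-space wrapper around the quoted fragment: each step
 * of oom_adj doubles (positive) or halves (negative) the score.
 */
static unsigned long sketch_legacy_adjust(unsigned long points, int oom_adj)
{
	if (oom_adj) {
		if (oom_adj > 0) {
			/* A zero score is bumped to 1 so the shift has an effect. */
			if (!points)
				points = 1;
			points <<= oom_adj;
		} else
			points >>= -(oom_adj);
	}
	return points;
}
```

For example, a score of 100 becomes 800 with oom_adj = 3 and 12 with
oom_adj = -3.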

If you still have questions, please ask me. I can probably answer all of
them.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
       [not found]       ` <AANLkTik_SDaiu2eQsJ9+4ywLR5K5V1Od-hwop6gwas3F@mail.gmail.com>
@ 2010-11-15  4:41         ` Figo.zhang
  0 siblings, 0 replies; 77+ messages in thread
From: Figo.zhang @ 2010-11-15  4:41 UTC (permalink / raw)
  To: Andrew Morton, David Rientjes
  Cc: figo zhang, KOSAKI Motohiro, Linus Torvalds, LKML, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, linux-mm

 >Nothing to say, really.  Seems each time we're told about a bug or a
 >regression, David either fixes the bug or points out why it wasn't a
 >bug or why it wasn't a regression or how it was a deliberate behaviour
 >change for the better.

 >I just haven't seen any solid reason to be concerned about the state of
 >the current oom-killer, sorry.

 >I'm concerned that you're concerned!  A lot.  When someone such as
 >yourself is unhappy with part of MM then I sit up and pay attention.
 >But after all this time I simply don't understand the technical issues
 >which you're seeing here.

We are just talking about the oom-killer's technical issues.

I have doubts about a new rewrite whose author cannot provide any evidence
or experimental results. Why did you do it? What is the prominent change
in your new algorithm?

As KOSAKI Motohiro said, "you removed CAP_SYS_RESOURCE condition with
ZERO explanation".

David just said to use the userspace tunable, oom_score_adj, for
protection. But may I ask some questions:

1. What is the innovation in your new algorithm? The old one offers the
same kind of user tunable, oom_adj.

2. If a server such as a db-server or financial-server has many important
processes (such as root or hardware access processes) that need protection,
you leave it to the administrator to find out which processes should be
protected. You will drive the financial-server administrator crazy, and
lose a lot of money!! ^~^

3. I saw your email on LKML, where you said
"I have repeatedly said that the oom killer no longer kills KDE when run
on my desktop in the presence of a memory hogging task that was written
specifically to oom the machine."
http://thread.gmane.org/gmane.linux.kernel.mm/48998

So you only tested your new oom_killer algorithm on your desktop with KDE.
Have you provided the details of how you did the test? Can anyone repeat
the experiment and get the same result as you describe?

As KOSAKI Motohiro said, in the real world we have to run 5-6 mental
simulations: embedded, desktop, web server, db server, hpc, finance.
Different workloads certainly make a big impact. Have you done those
experiments?

I think that technology should be based on experiment, not on imagination.

Best,
Figo.zhang

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-15  0:54   ` KOSAKI Motohiro
@ 2010-11-15  2:19     ` Andrew Morton
       [not found]       ` <AANLkTik_SDaiu2eQsJ9+4ywLR5K5V1Od-hwop6gwas3F@mail.gmail.com>
  2010-11-15  6:57       ` KOSAKI Motohiro
  0 siblings, 2 replies; 77+ messages in thread
From: Andrew Morton @ 2010-11-15  2:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Linus Torvalds, LKML, David Rientjes, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Mon, 15 Nov 2010 09:54:14 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > 2010/11/13 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> > >
> > > Please apply this. This patch reverts the oom changes since v2.6.35.
> > 
> > I'm not getting involved in this whole flame-war. You need to convince
> > Andrew, who has been the person everything went through.
> 
> I wonder why he keeps such a deep silence.

Nothing to say, really.  Seems each time we're told about a bug or a
regression, David either fixes the bug or points out why it wasn't a
bug or why it wasn't a regression or how it was a deliberate behaviour
change for the better.

I just haven't seen any solid reason to be concerned about the state of
the current oom-killer, sorry.

I'm concerned that you're concerned!  A lot.  When someone such as
yourself is unhappy with part of MM then I sit up and pay attention. 
But after all this time I simply don't understand the technical issues
which you're seeing here.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-14 19:32 ` Linus Torvalds
@ 2010-11-15  0:54   ` KOSAKI Motohiro
  2010-11-15  2:19     ` Andrew Morton
  2010-11-23 23:51   ` KOSAKI Motohiro
  1 sibling, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-15  0:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kosaki.motohiro, LKML, David Rientjes, Andrew Morton, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

> 2010/11/13 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> >
> > Please apply this. This patch reverts the oom changes since v2.6.35.
> 
> I'm not getting involved in this whole flame-war. You need to convince
> Andrew, who has been the person everything went through.

I wonder why he keeps such a deep silence. But _I_ strongly don't want to ignore
bug reports and userland complaints. I hope to fix any bug as far as my development
time allows.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-14  5:07 [PATCH] Revert oom rewrite series KOSAKI Motohiro
  2010-11-14 19:32 ` Linus Torvalds
@ 2010-11-14 21:58 ` David Rientjes
  2010-11-15 23:33   ` Bodo Eggert
  1 sibling, 1 reply; 77+ messages in thread
From: David Rientjes @ 2010-11-14 21:58 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Linus Torvalds, Andrew Morton, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:

> Linus,
> 
> Please apply this. This patch reverts the oom changes since v2.6.35.
> 
> Briefly, "oom: badness heuristic rewrite" was merged by mistake.
> It has not passed our design or code review, and multiple bug reports
> have since popped up. I believe every patch should pass a use-case and a
> code review :-/
> 

That's inaccurate, there haven't been multiple bug reports popping up 
since the rewrite; in fact, there hasn't been a single bug report.

There have been two changes to the oom killer since the rewrite:

 - we now kill all threads sharing the oom killed task that share the ->mm 
   since we can't free any memory without them exiting as well, and

 - we count threads that are immune from oom kill attached to an ->mm so 
   we can avoid needlessly killing tasks that aren't immune themselves but 
   have other threads sharing the ->mm that are.

Both of those changes were needed in the old oom killer as well, they have 
nothing to do with the rewrite.

Also, stating that the new heuristic doesn't address CAP_SYS_RESOURCE 
appropriately isn't a bug report, it's the desired behavior.  I eliminated 
all of the arbitrary heuristics in the old heuristic that we had to 
remove internally as well so that it is as predictable as possible and achieves 
the oom killer's sole goal: to kill the most memory-hogging task that is 
eligible to allow memory allocations in the current context to succeed.  
CAP_SYS_RESOURCE threads have full control over their oom killing priority 
by /proc/pid/oom_score_adj and need no consideration in the heuristic by 
default since it otherwise allows for the probability that multiple tasks 
will need to be killed when a CAP_SYS_RESOURCE thread uses an egregious 
amount of memory.

> The problem is that DavidR's patches don't reflect real-world use cases at
> all and break them. He can say that userland is wrong, but such an
> excuse doesn't solve the real-world issue. It makes no sense.
> 

As mentioned just a few minutes ago in another thread, there is no 
userspace breakage with the rewrite and you're only complaining here about 
the deprecation of /proc/pid/oom_adj for a period of two years.  Until 
it's removed in 2012 or later, it maps to the linear scale that 
oom_score_adj uses rather than its old exponential scale that was 
unusable for prioritization because of (1) the extremely low resolution, 
and (2) the arbitrary heuristics that preceded it.

You've proposed various forms of your revert (this is the fifth one) and 
I've responded in a very respectful and technical way each time even 
though you have repeatedly called me stupid.  Linus is under the 
impression that this is some kind of flamewar when in reality it's only a 
desperate attempt of yours to start one, this kind of thing just really 
bounces off of me on a personal level.  I will, however, continue to 
remain professional.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] Revert oom rewrite series
  2010-11-14  5:07 [PATCH] Revert oom rewrite series KOSAKI Motohiro
@ 2010-11-14 19:32 ` Linus Torvalds
  2010-11-15  0:54   ` KOSAKI Motohiro
  2010-11-23 23:51   ` KOSAKI Motohiro
  2010-11-14 21:58 ` David Rientjes
  1 sibling, 2 replies; 77+ messages in thread
From: Linus Torvalds @ 2010-11-14 19:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, David Rientjes, Andrew Morton, Ying Han, Bodo Eggert,
	Mandeep Singh Baines, Figo.zhang

2010/11/13 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
>
> Please apply this. This patch reverts the oom changes since v2.6.35.

I'm not getting involved in this whole flame-war. You need to convince
Andrew, who has been the person everything went through.

                    Linus

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH] Revert oom rewrite series
@ 2010-11-14  5:07 KOSAKI Motohiro
  2010-11-14 19:32 ` Linus Torvalds
  2010-11-14 21:58 ` David Rientjes
  0 siblings, 2 replies; 77+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:07 UTC (permalink / raw)
  To: LKML, Linus Torvalds
  Cc: kosaki.motohiro, David Rientjes, Andrew Morton, Ying Han,
	Bodo Eggert, Mandeep Singh Baines, Figo.zhang

Linus,

Please apply this. This patch reverts the oom changes since v2.6.35.

Briefly, "oom: badness heuristic rewrite" was merged by mistake.
It has not passed our design or code review, and multiple bug reports
have since popped up. I believe every patch should pass a use-case and a
code review :-/

The problem is that DavidR's patches don't reflect real-world use cases at
all and break them. He can say that userland is wrong, but such an
excuse doesn't solve the real-world issue. It makes no sense.

I hope every developer keeps developing honestly. Googlers are NOT an
exception.


David, at least the rss-based oom score passed our design review.
So, if you resubmit that part, we will ack it. Please remember that.
Also, I can accept the oom_score_adj feature if you can remove the
incompatibility issue. OK?

Linus, if you want to check the patch, please use the following command.
  % git diff a63d83f427fbce97a6cea0db2e64b0eb8435cd10^ mm/oom_kill.c include/linux/oom.h fs/proc/base.c


Thanks.

--------------------------------------------------------------------------
Subject: [PATCH] Revert oom rewrite series

This reverts the following commits. They broke an ABI and caused multiple
end-user complaints.

9c28ab662a8e3d19d07077ac0a8931c015e8afec Revert "oom: badness heuristic rewrite"
74cd8c6cb3e093c4d67ac3eb3581e246e4981dad Revert "oom: deprecate oom_adj tunable"
79a0bd5796e754c4b4e22071c4edddef3517d010 Revert "memcg: use find_lock_task_mm() in memory cgroups oom"
a465ef80c2a9fe73c85029fcea5c68ffee8dbb69 Revert "oom: always return a badness score of non-zero for eligible tas
516fcbb0c45d943df1b739d3be3d417aee2275f3 Revert "oom: filter unkillable tasks from tasklist dump"
b1c98f95a7954c450dadd809280f86863ea9d05d Revert "oom: add per-mm oom disable count"
fd79f3f47c82a0af5288afe7556905dd171bfc43 Revert "oom: avoid killing a task if a thread sharing its mm cannot be
2d72175528870dcef577db4a2a0b49d819c6eaff Revert "oom: kill all threads sharing oom killed task's mm"
be212960618ddcdb9526ce2cb73fd081fd3e90ea Revert "oom: rewrite error handling for oom_adj and oom_score_adj tunab
1b17c41599c594c7d11ef415a92d47c205fe89ea Revert "oom: fix locking for oom_adj and oom_score_adj"

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 Documentation/feature-removal-schedule.txt |   25 ---
 Documentation/filesystems/proc.txt         |   97 ++++-----
 fs/exec.c                                  |    5 -
 fs/proc/base.c                             |  176 ++--------------
 include/linux/memcontrol.h                 |    8 -
 include/linux/mm_types.h                   |    2 -
 include/linux/oom.h                        |   19 +--
 include/linux/sched.h                      |    3 +-
 kernel/exit.c                              |    3 -
 kernel/fork.c                              |   16 +--
 mm/memcontrol.c                            |   28 +---
 mm/oom_kill.c                              |  323 ++++++++++++++--------------
 12 files changed, 227 insertions(+), 478 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index d8f36f9..9af16b9 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -166,31 +166,6 @@ Who:	Eric Biederman <ebiederm@xmission.com>
 
 ---------------------------
 
-What:	/proc/<pid>/oom_adj
-When:	August 2012
-Why:	/proc/<pid>/oom_adj allows userspace to influence the oom killer's
-	badness heuristic used to determine which task to kill when the kernel
-	is out of memory.
-
-	The badness heuristic has since been rewritten since the introduction of
-	this tunable such that its meaning is deprecated.  The value was
-	implemented as a bitshift on a score generated by the badness()
-	function that did not have any precise units of measure.  With the
-	rewrite, the score is given as a proportion of available memory to the
-	task allocating pages, so using a bitshift which grows the score
-	exponentially is, thus, impossible to tune with fine granularity.
-
-	A much more powerful interface, /proc/<pid>/oom_score_adj, was
-	introduced with the oom killer rewrite that allows users to increase or
-	decrease the badness() score linearly.  This interface will replace
-	/proc/<pid>/oom_adj.
-
-	A warning will be emitted to the kernel log if an application uses this
-	deprecated interface.  After it is printed once, future warnings will be
-	suppressed until the kernel is rebooted.
-
----------------------------
-
 What:	remove EXPORT_SYMBOL(kernel_thread)
 When:	August 2006
 Files:	arch/*/kernel/*_ksyms.c
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index e73df27..030e3a1 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -33,8 +33,7 @@ Table of Contents
   2	Modifying System Parameters
 
   3	Per-Process Parameters
-  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
-								score
+  3.1	/proc/<pid>/oom_adj - Adjust the oom-killer score
   3.2	/proc/<pid>/oom_score - Display current oom-killer score
   3.3	/proc/<pid>/io - Display the IO accounting fields
   3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
@@ -1246,64 +1245,42 @@ of the kernel.
 CHAPTER 3: PER-PROCESS PARAMETERS
 ------------------------------------------------------------------------------
 
-3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
---------------------------------------------------------------------------------
-
-These file can be used to adjust the badness heuristic used to select which
-process gets killed in out of memory conditions.
-
-The badness heuristic assigns a value to each candidate task ranging from 0
-(never kill) to 1000 (always kill) to determine which process is targeted.  The
-units are roughly a proportion along that range of allowed memory the process
-may allocate from based on an estimation of its current memory and swap use.
-For example, if a task is using all allowed memory, its badness score will be
-1000.  If it is using half of its allowed memory, its score will be 500.
-
-There is an additional factor included in the badness score: root
-processes are given 3% extra memory over other tasks.
-
-The amount of "allowed" memory depends on the context in which the oom killer
-was called.  If it is due to the memory assigned to the allocating task's cpuset
-being exhausted, the allowed memory represents the set of mems assigned to that
-cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
-memory represents the set of mempolicy nodes.  If it is due to a memory
-limit (or swap limit) being reached, the allowed memory is that configured
-limit.  Finally, if it is due to the entire system being out of memory, the
-allowed memory represents all allocatable resources.
-
-The value of /proc/<pid>/oom_score_adj is added to the badness score before it
-is used to determine which task to kill.  Acceptable values range from -1000
-(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
-polarize the preference for oom killing either by always preferring a certain
-task or completely disabling it.  The lowest possible value, -1000, is
-equivalent to disabling oom killing entirely for that task since it will always
-report a badness score of 0.
-
-Consequently, it is very simple for userspace to define the amount of memory to
-consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
-example, is roughly equivalent to allowing the remainder of tasks sharing the
-same system, cpuset, mempolicy, or memory controller resources to use at least
-50% more memory.  A value of -500, on the other hand, would be roughly
-equivalent to discounting 50% of the task's allowed memory from being considered
-as scoring against the task.
-
-For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
-be used to tune the badness score.  Its acceptable values range from -16
-(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
-(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
-scaled linearly with /proc/<pid>/oom_score_adj.
-
-Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the
-other with its scaled value.
-
-NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see
-Documentation/feature-removal-schedule.txt.
-
-Caveat: when a parent task is selected, the oom killer will sacrifice any first
-generation children with seperate address spaces instead, if possible.  This
-avoids servers and important system daemons from being killed and loses the
-minimal amount of work.
-
+3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
+------------------------------------------------------
+
+This file can be used to adjust the score used to select which processes
+should be killed in an  out-of-memory  situation.  Giving it a high score will
+increase the likelihood of this process being killed by the oom-killer.  Valid
+values are in the range -16 to +15, plus the special value -17, which disables
+oom-killing altogether for this process.
+
+The process to be killed in an out-of-memory situation is selected among all others
+based on its badness score. This value equals the original memory size of the process
+and is then updated according to its CPU time (utime + stime) and the
+run time (uptime - start time). The longer it runs the smaller is the score.
+Badness score is divided by the square root of the CPU time and then by
+the double square root of the run time.
+
+Swapped out tasks are killed first. Half of each child's memory size is added to
+the parent's score if they do not share the same memory. Thus forking servers
+are the prime candidates to be killed. Having only one 'hungry' child will make
+parent less preferable than the child.
+
+/proc/<pid>/oom_score shows process' current badness score.
+
+The following heuristics are then applied:
+ * if the task was reniced, its score doubles
+ * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
+ 	or CAP_SYS_RAWIO) have their score divided by 4
+ * if oom condition happened in one cpuset and checked process does not belong
+ 	to it, its score is divided by 8
+ * the resulting score is multiplied by two to the power of oom_adj, i.e.
+	points <<= oom_adj when it is positive and
+	points >>= -(oom_adj) otherwise
+
+The task with the highest badness score is then selected and its children
+are killed, process itself will be killed in an OOM situation when it does
+not have children or some of them disabled oom like described above.
 
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
diff --git a/fs/exec.c b/fs/exec.c
index 99d33a1..47986fb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -54,7 +54,6 @@
 #include <linux/fsnotify.h>
 #include <linux/fs_struct.h>
 #include <linux/pipe_fs_i.h>
-#include <linux/oom.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -766,10 +765,6 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
-	if (old_mm && tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
-		atomic_dec(&old_mm->oom_disable_count);
-		atomic_inc(&tsk->mm->oom_disable_count);
-	}
 	task_unlock(tsk);
 	arch_pick_mmap_layout(mm);
 	if (old_mm) {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index f3d02ca..ed7d18e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -63,7 +63,6 @@
 #include <linux/namei.h>
 #include <linux/mnt_namespace.h>
 #include <linux/mm.h>
-#include <linux/swap.h>
 #include <linux/rcupdate.h>
 #include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
@@ -431,11 +430,12 @@ static const struct file_operations proc_lstats_operations = {
 static int proc_oom_score(struct task_struct *task, char *buffer)
 {
 	unsigned long points = 0;
+	struct timespec uptime;
 
+	do_posix_clock_monotonic_gettime(&uptime);
 	read_lock(&tasklist_lock);
 	if (pid_alive(task))
-		points = oom_badness(task, NULL, NULL,
-					totalram_pages + total_swap_pages);
+		points = badness(task, NULL, NULL, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
 	return sprintf(buffer, "%lu\n", points);
 }
@@ -1025,74 +1025,36 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
 	memset(buffer, 0, sizeof(buffer));
 	if (count > sizeof(buffer) - 1)
 		count = sizeof(buffer) - 1;
-	if (copy_from_user(buffer, buf, count)) {
-		err = -EFAULT;
-		goto out;
-	}
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
 
 	err = strict_strtol(strstrip(buffer), 0, &oom_adjust);
 	if (err)
-		goto out;
+		return -EINVAL;
 	if ((oom_adjust < OOM_ADJUST_MIN || oom_adjust > OOM_ADJUST_MAX) &&
-	     oom_adjust != OOM_DISABLE) {
-		err = -EINVAL;
-		goto out;
-	}
+	     oom_adjust != OOM_DISABLE)
+		return -EINVAL;
 
 	task = get_proc_task(file->f_path.dentry->d_inode);
-	if (!task) {
-		err = -ESRCH;
-		goto out;
-	}
-
-	task_lock(task);
-	if (!task->mm) {
-		err = -EINVAL;
-		goto err_task_lock;
-	}
-
+	if (!task)
+		return -ESRCH;
 	if (!lock_task_sighand(task, &flags)) {
-		err = -ESRCH;
-		goto err_task_lock;
+		put_task_struct(task);
+		return -ESRCH;
 	}
 
 	if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
-		err = -EACCES;
-		goto err_sighand;
-	}
-
-	if (oom_adjust != task->signal->oom_adj) {
-		if (oom_adjust == OOM_DISABLE)
-			atomic_inc(&task->mm->oom_disable_count);
-		if (task->signal->oom_adj == OOM_DISABLE)
-			atomic_dec(&task->mm->oom_disable_count);
+		unlock_task_sighand(task, &flags);
+		put_task_struct(task);
+		return -EACCES;
 	}
 
-	/*
-	 * Warn that /proc/pid/oom_adj is deprecated, see
-	 * Documentation/feature-removal-schedule.txt.
-	 */
-	printk_once(KERN_WARNING "%s (%d): /proc/%d/oom_adj is deprecated, "
-			"please use /proc/%d/oom_score_adj instead.\n",
-			current->comm, task_pid_nr(current),
-			task_pid_nr(task), task_pid_nr(task));
 	task->signal->oom_adj = oom_adjust;
-	/*
-	 * Scale /proc/pid/oom_score_adj appropriately ensuring that a maximum
-	 * value is always attainable.
-	 */
-	if (task->signal->oom_adj == OOM_ADJUST_MAX)
-		task->signal->oom_score_adj = OOM_SCORE_ADJ_MAX;
-	else
-		task->signal->oom_score_adj = (oom_adjust * OOM_SCORE_ADJ_MAX) /
-								-OOM_DISABLE;
-err_sighand:
+
 	unlock_task_sighand(task, &flags);
-err_task_lock:
-	task_unlock(task);
 	put_task_struct(task);
-out:
-	return err < 0 ? err : count;
+
+	return count;
 }
 
 static const struct file_operations proc_oom_adjust_operations = {
@@ -1101,106 +1063,6 @@ static const struct file_operations proc_oom_adjust_operations = {
 	.llseek		= generic_file_llseek,
 };
 
-static ssize_t oom_score_adj_read(struct file *file, char __user *buf,
-					size_t count, loff_t *ppos)
-{
-	struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
-	char buffer[PROC_NUMBUF];
-	int oom_score_adj = OOM_SCORE_ADJ_MIN;
-	unsigned long flags;
-	size_t len;
-
-	if (!task)
-		return -ESRCH;
-	if (lock_task_sighand(task, &flags)) {
-		oom_score_adj = task->signal->oom_score_adj;
-		unlock_task_sighand(task, &flags);
-	}
-	put_task_struct(task);
-	len = snprintf(buffer, sizeof(buffer), "%d\n", oom_score_adj);
-	return simple_read_from_buffer(buf, count, ppos, buffer, len);
-}
-
-static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
-					size_t count, loff_t *ppos)
-{
-	struct task_struct *task;
-	char buffer[PROC_NUMBUF];
-	unsigned long flags;
-	long oom_score_adj;
-	int err;
-
-	memset(buffer, 0, sizeof(buffer));
-	if (count > sizeof(buffer) - 1)
-		count = sizeof(buffer) - 1;
-	if (copy_from_user(buffer, buf, count)) {
-		err = -EFAULT;
-		goto out;
-	}
-
-	err = strict_strtol(strstrip(buffer), 0, &oom_score_adj);
-	if (err)
-		goto out;
-	if (oom_score_adj < OOM_SCORE_ADJ_MIN ||
-			oom_score_adj > OOM_SCORE_ADJ_MAX) {
-		err = -EINVAL;
-		goto out;
-	}
-
-	task = get_proc_task(file->f_path.dentry->d_inode);
-	if (!task) {
-		err = -ESRCH;
-		goto out;
-	}
-
-	task_lock(task);
-	if (!task->mm) {
-		err = -EINVAL;
-		goto err_task_lock;
-	}
-
-	if (!lock_task_sighand(task, &flags)) {
-		err = -ESRCH;
-		goto err_task_lock;
-	}
-
-	if (oom_score_adj < task->signal->oom_score_adj &&
-			!capable(CAP_SYS_RESOURCE)) {
-		err = -EACCES;
-		goto err_sighand;
-	}
-
-	if (oom_score_adj != task->signal->oom_score_adj) {
-		if (oom_score_adj == OOM_SCORE_ADJ_MIN)
-			atomic_inc(&task->mm->oom_disable_count);
-		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-			atomic_dec(&task->mm->oom_disable_count);
-	}
-	task->signal->oom_score_adj = oom_score_adj;
-	/*
-	 * Scale /proc/pid/oom_adj appropriately ensuring that OOM_DISABLE is
-	 * always attainable.
-	 */
-	if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-		task->signal->oom_adj = OOM_DISABLE;
-	else
-		task->signal->oom_adj = (oom_score_adj * OOM_ADJUST_MAX) /
-							OOM_SCORE_ADJ_MAX;
-err_sighand:
-	unlock_task_sighand(task, &flags);
-err_task_lock:
-	task_unlock(task);
-	put_task_struct(task);
-out:
-	return err < 0 ? err : count;
-}
-
-static const struct file_operations proc_oom_score_adj_operations = {
-	.read		= oom_score_adj_read,
-	.write		= oom_score_adj_write,
-	.llseek		= default_llseek,
-};
-
 #ifdef CONFIG_AUDITSYSCALL
 #define TMPBUFLEN 21
 static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
@@ -2779,7 +2641,6 @@ static const struct pid_entry tgid_base_stuff[] = {
 #endif
 	INF("oom_score",  S_IRUGO, proc_oom_score),
 	REG("oom_adj",    S_IRUGO|S_IWUSR, proc_oom_adjust_operations),
-	REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
 #ifdef CONFIG_AUDITSYSCALL
 	REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
@@ -3115,7 +2976,6 @@ static const struct pid_entry tid_base_stuff[] = {
 #endif
 	INF("oom_score", S_IRUGO, proc_oom_score),
 	REG("oom_adj",   S_IRUGO|S_IWUSR, proc_oom_adjust_operations),
-	REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
 #ifdef CONFIG_AUDITSYSCALL
 	REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUSR, proc_sessionid_operations),
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 159a076..b13fc2a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,8 +124,6 @@ static inline bool mem_cgroup_disabled(void)
 void mem_cgroup_update_file_mapped(struct page *page, int val);
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						gfp_t gfp_mask);
-u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
-
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
@@ -305,12 +303,6 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 	return 0;
 }
 
-static inline
-u64 mem_cgroup_get_limit(struct mem_cgroup *mem)
-{
-	return 0;
-}
-
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index bb7288a..cb57d65 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -310,8 +310,6 @@ struct mm_struct {
 #ifdef CONFIG_MMU_NOTIFIER
 	struct mmu_notifier_mm *mmu_notifier_mm;
 #endif
-	/* How many tasks sharing this mm are OOM_DISABLE */
-	atomic_t oom_disable_count;
 };
 
 /* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 5e3aa83..40e5e3a 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -1,27 +1,14 @@
 #ifndef __INCLUDE_LINUX_OOM_H
 #define __INCLUDE_LINUX_OOM_H
 
-/*
- * /proc/<pid>/oom_adj is deprecated, see
- * Documentation/feature-removal-schedule.txt.
- *
- * /proc/<pid>/oom_adj set to -17 protects from the oom-killer
- */
+/* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */
 #define OOM_DISABLE (-17)
 /* inclusive */
 #define OOM_ADJUST_MIN (-16)
 #define OOM_ADJUST_MAX 15
 
-/*
- * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
- * pid.
- */
-#define OOM_SCORE_ADJ_MIN	(-1000)
-#define OOM_SCORE_ADJ_MAX	1000
-
 #ifdef __KERNEL__
 
-#include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
 
@@ -40,8 +27,6 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
-extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
-			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 
@@ -66,8 +51,6 @@ static inline void oom_killer_enable(void)
 extern unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
 		      const nodemask_t *nodemask, unsigned long uptime);
 
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
-
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
 extern int sysctl_oom_kill_allocating_task;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d0036e5..a35acb6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -624,8 +624,7 @@ struct signal_struct {
 	struct tty_audit_buf *tty_audit_buf;
 #endif
 
-	int oom_adj;		/* OOM kill score adjustment (bit shift) */
-	int oom_score_adj;	/* OOM kill score adjustment */
+	int oom_adj;	/* OOM kill score adjustment (bit shift) */
 
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
diff --git a/kernel/exit.c b/kernel/exit.c
index 21aa7b3..c806406 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -50,7 +50,6 @@
 #include <linux/perf_event.h>
 #include <trace/events/sched.h>
 #include <linux/hw_breakpoint.h>
-#include <linux/oom.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -696,8 +695,6 @@ static void exit_mm(struct task_struct * tsk)
 	enter_lazy_tlb(mm, current);
 	/* We don't want this task to be frozen prematurely */
 	clear_freeze_flag(tsk);
-	if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-		atomic_dec(&mm->oom_disable_count);
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
 	mmput(mm);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b159c5..cca5e8b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -65,7 +65,6 @@
 #include <linux/perf_event.h>
 #include <linux/posix-timers.h>
 #include <linux/user-return-notifier.h>
-#include <linux/oom.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -489,7 +488,6 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
 	mm->cached_hole_size = ~0UL;
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
-	atomic_set(&mm->oom_disable_count, 0);
 
 	if (likely(!mm_alloc_pgd(mm))) {
 		mm->def_flags = 0;
@@ -743,8 +741,6 @@ good_mm:
 	/* Initializing for Swap token stuff */
 	mm->token_priority = 0;
 	mm->last_interval = 0;
-	if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-		atomic_inc(&mm->oom_disable_count);
 
 	tsk->mm = mm;
 	tsk->active_mm = mm;
@@ -906,7 +902,6 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	tty_audit_fork(sig);
 
 	sig->oom_adj = current->signal->oom_adj;
-	sig->oom_score_adj = current->signal->oom_score_adj;
 
 	mutex_init(&sig->cred_guard_mutex);
 
@@ -1305,13 +1300,8 @@ bad_fork_cleanup_io:
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-	if (p->mm) {
-		task_lock(p);
-		if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
-			atomic_dec(&p->mm->oom_disable_count);
-		task_unlock(p);
+	if (p->mm)
 		mmput(p->mm);
-	}
 bad_fork_cleanup_signal:
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
@@ -1704,10 +1694,6 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 			active_mm = current->active_mm;
 			current->mm = new_mm;
 			current->active_mm = new_mm;
-			if (current->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
-				atomic_dec(&mm->oom_disable_count);
-				atomic_inc(&new_mm->oom_disable_count);
-			}
 			activate_mm(active_mm, new_mm);
 			new_mm = mm;
 		}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9a99cfa..c628370 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -47,7 +47,6 @@
 #include <linux/mm_inline.h>
 #include <linux/page_cgroup.h>
 #include <linux/cpu.h>
-#include <linux/oom.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -917,13 +916,10 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 {
 	int ret;
 	struct mem_cgroup *curr = NULL;
-	struct task_struct *p;
 
-	p = find_lock_task_mm(task);
-	if (!p)
-		return 0;
-	curr = try_get_mem_cgroup_from_mm(p->mm);
-	task_unlock(p);
+	task_lock(task);
+	curr = try_get_mem_cgroup_from_mm(task->mm);
+	task_unlock(task);
 	if (!curr)
 		return 0;
 	/*
@@ -1297,24 +1293,6 @@ static int mem_cgroup_count_children(struct mem_cgroup *mem)
 }
 
 /*
- * Return the memory (and swap, if configured) limit for a memcg.
- */
-u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
-{
-	u64 limit;
-	u64 memsw;
-
-	limit = res_counter_read_u64(&memcg->res, RES_LIMIT) +
-			total_swap_pages;
-	memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
-	/*
-	 * If memsw is finite and limits the amount of swap space available
-	 * to this memcg, return that limit.
-	 */
-	return min(limit, memsw);
-}
-
-/*
  * Visit the first child (need not be the first child as per the ordering
  * of the cgroup list, since we track last_scanned_child) of @mem and use
  * that to reclaim free pages from.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7dcca55..f251ddb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -4,8 +4,6 @@
  *  Copyright (C)  1998,2000  Rik van Riel
  *	Thanks go out to Claus Fischer for some serious inspiration and
  *	for goading me into coding this file...
- *  Copyright (C)  2010  Google, Inc.
- *	Rewritten by David Rientjes
  *
  *  The routines in this file are used to kill a process when
  *  we're seriously out of memory. This gets called from __alloc_pages()
@@ -36,6 +34,7 @@ int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
+/* #define DEBUG */
 
 #ifdef CONFIG_NUMA
 /**
@@ -106,7 +105,7 @@ static void boost_dying_task_prio(struct task_struct *p,
  * pointer.  Return p, or any of its subthreads with a valid ->mm, with
  * task_lock() held.
  */
-struct task_struct *find_lock_task_mm(struct task_struct *p)
+static struct task_struct *find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t = p;
 
@@ -121,8 +120,8 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
 }
 
 /* return true if the task is not adequate as candidate victim task. */
-static bool oom_unkillable_task(struct task_struct *p,
-		const struct mem_cgroup *mem, const nodemask_t *nodemask)
+static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
+			   const nodemask_t *nodemask)
 {
 	if (is_global_init(p))
 		return true;
@@ -141,82 +140,137 @@ static bool oom_unkillable_task(struct task_struct *p,
 }
 
 /**
- * oom_badness - heuristic function to determine which candidate task to kill
+ * badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
- * @totalpages: total present RAM allowed for page allocation
+ * @uptime: current uptime in seconds
  *
- * The heuristic for determining which task to kill is made to be as simple and
- * predictable as possible.  The goal is to return the highest value for the
- * task consuming the most memory to avoid subsequent oom failures.
+ * The formula used is relatively simple and documented inline in the
+ * function. The main rationale is that we want to select a good task
+ * to kill when we run out of memory.
+ *
+ * Good in this context means that:
+ * 1) we lose the minimum amount of work done
+ * 2) we recover a large amount of memory
+ * 3) we don't kill anything innocent of eating tons of memory
+ * 4) we want to kill the minimum amount of processes (one)
+ * 5) we try to kill the process the user expects us to kill, this
+ *    algorithm has been meticulously tuned to meet the principle
+ *    of least surprise ... (be careful when you change it)
  */
-unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
-		      const nodemask_t *nodemask, unsigned long totalpages)
+unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
+		      const nodemask_t *nodemask, unsigned long uptime)
 {
-	int points;
+	unsigned long points, cpu_time, run_time;
+	struct task_struct *child;
+	struct task_struct *c, *t;
+	int oom_adj = p->signal->oom_adj;
+	struct task_cputime task_time;
+	unsigned long utime;
+	unsigned long stime;
 
 	if (oom_unkillable_task(p, mem, nodemask))
 		return 0;
+	if (oom_adj == OOM_DISABLE)
+		return 0;
 
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 0;
 
 	/*
-	 * Shortcut check for a thread sharing p->mm that is OOM_SCORE_ADJ_MIN
-	 * so the entire heuristic doesn't need to be executed for something
-	 * that cannot be killed.
+	 * The memory size of the process is the basis for the badness.
 	 */
-	if (atomic_read(&p->mm->oom_disable_count)) {
-		task_unlock(p);
-		return 0;
-	}
+	points = p->mm->total_vm;
+	task_unlock(p);
 
 	/*
-	 * When the PF_OOM_ORIGIN bit is set, it indicates the task should have
-	 * priority for oom killing.
+	 * swapoff can easily use up all memory, so kill those first.
 	 */
-	if (p->flags & PF_OOM_ORIGIN) {
-		task_unlock(p);
-		return 1000;
-	}
+	if (p->flags & PF_OOM_ORIGIN)
+		return ULONG_MAX;
 
 	/*
-	 * The memory controller may have a limit of 0 bytes, so avoid a divide
-	 * by zero, if necessary.
+	 * Processes which fork a lot of child processes are likely
+	 * a good choice. We add half the vmsize of the children if they
+	 * have an own mm. This prevents forking servers to flood the
+	 * machine with an endless amount of children. In case a single
+	 * child is eating the vast majority of memory, adding only half
+	 * to the parents will make the child our kill candidate of choice.
 	 */
-	if (!totalpages)
-		totalpages = 1;
+	t = p;
+	do {
+		list_for_each_entry(c, &t->children, sibling) {
+			child = find_lock_task_mm(c);
+			if (child) {
+				if (child->mm != p->mm)
+					points += child->mm->total_vm/2 + 1;
+				task_unlock(child);
+			}
+		}
+	} while_each_thread(p, t);
 
 	/*
-	 * The baseline for the badness score is the proportion of RAM that each
-	 * task's rss and swap space use.
+	 * CPU time is in tens of seconds and run time is in thousands
+         * of seconds. There is no particular reason for this other than
+         * that it turned out to work very well in practice.
 	 */
-	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
-			totalpages;
-	task_unlock(p);
+	thread_group_cputime(p, &task_time);
+	utime = cputime_to_jiffies(task_time.utime);
+	stime = cputime_to_jiffies(task_time.stime);
+	cpu_time = (utime + stime) >> (SHIFT_HZ + 3);
+
+
+	if (uptime >= p->start_time.tv_sec)
+		run_time = (uptime - p->start_time.tv_sec) >> 10;
+	else
+		run_time = 0;
+
+	if (cpu_time)
+		points /= int_sqrt(cpu_time);
+	if (run_time)
+		points /= int_sqrt(int_sqrt(run_time));
 
 	/*
-	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
+	 * Niced processes are most likely less important, so double
+	 * their badness points.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= 30;
+	if (task_nice(p) > 0)
+		points *= 2;
 
 	/*
-	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
-	 * either completely disable oom killing or always prefer a certain
-	 * task.
+	 * Superuser processes are usually more important, so we make it
+	 * less likely that we kill those.
 	 */
-	points += p->signal->oom_score_adj;
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
+		points /= 4;
 
 	/*
-	 * Never return 0 for an eligible task that may be killed since it's
-	 * possible that no single user task uses more than 0.1% of memory and
-	 * no single admin tasks uses more than 3.0%.
+	 * We don't want to kill a process with direct hardware access.
+	 * Not only could that mess up the hardware, but usually users
+	 * tend to only have this flag set on applications they think
+	 * of as important.
 	 */
-	if (points <= 0)
-		return 1;
-	return (points < 1000) ? points : 1000;
+	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
+		points /= 4;
+
+	/*
+	 * Adjust the score by oom_adj.
+	 */
+	if (oom_adj) {
+		if (oom_adj > 0) {
+			if (!points)
+				points = 1;
+			points <<= oom_adj;
+		} else
+			points >>= -(oom_adj);
+	}
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "OOMkill: task %d (%s) got %lu points\n",
+	p->pid, p->comm, points);
+#endif
+	return points;
 }
 
 /*
@@ -224,20 +278,12 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
  */
 #ifdef CONFIG_NUMA
 static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
-				gfp_t gfp_mask, nodemask_t *nodemask,
-				unsigned long *totalpages)
+				    gfp_t gfp_mask, nodemask_t *nodemask)
 {
 	struct zone *zone;
 	struct zoneref *z;
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
-	bool cpuset_limited = false;
-	int nid;
-
-	/* Default to all available memory */
-	*totalpages = totalram_pages + total_swap_pages;
 
-	if (!zonelist)
-		return CONSTRAINT_NONE;
 	/*
 	 * Reach here only when __GFP_NOFAIL is used. So, we should avoid
 	 * to kill current.We have to random task kill in this case.
@@ -247,37 +293,26 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
 		return CONSTRAINT_NONE;
 
 	/*
-	 * This is not a __GFP_THISNODE allocation, so a truncated nodemask in
-	 * the page allocator means a mempolicy is in effect.  Cpuset policy
-	 * is enforced in get_page_from_freelist().
+	 * The nodemask here is a nodemask passed to alloc_pages(). Now,
+	 * cpuset doesn't use this nodemask for its hardwall/softwall/hierarchy
+	 * feature. mempolicy is an only user of nodemask here.
+	 * check mempolicy's nodemask contains all N_HIGH_MEMORY
 	 */
-	if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
-		*totalpages = total_swap_pages;
-		for_each_node_mask(nid, *nodemask)
-			*totalpages += node_spanned_pages(nid);
+	if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask))
 		return CONSTRAINT_MEMORY_POLICY;
-	}
 
 	/* Check this allocation failure is caused by cpuset's wall function */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 			high_zoneidx, nodemask)
 		if (!cpuset_zone_allowed_softwall(zone, gfp_mask))
-			cpuset_limited = true;
+			return CONSTRAINT_CPUSET;
 
-	if (cpuset_limited) {
-		*totalpages = total_swap_pages;
-		for_each_node_mask(nid, cpuset_current_mems_allowed)
-			*totalpages += node_spanned_pages(nid);
-		return CONSTRAINT_CPUSET;
-	}
 	return CONSTRAINT_NONE;
 }
 #else
 static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
-				gfp_t gfp_mask, nodemask_t *nodemask,
-				unsigned long *totalpages)
+				gfp_t gfp_mask, nodemask_t *nodemask)
 {
-	*totalpages = totalram_pages + total_swap_pages;
 	return CONSTRAINT_NONE;
 }
 #endif
@@ -288,16 +323,17 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
  *
  * (not docbooked, we don't want this one cluttering up the manual)
  */
-static struct task_struct *select_bad_process(unsigned int *ppoints,
-		unsigned long totalpages, struct mem_cgroup *mem,
-		const nodemask_t *nodemask)
+static struct task_struct *select_bad_process(unsigned long *ppoints,
+		struct mem_cgroup *mem, const nodemask_t *nodemask)
 {
 	struct task_struct *p;
 	struct task_struct *chosen = NULL;
+	struct timespec uptime;
 	*ppoints = 0;
 
+	do_posix_clock_monotonic_gettime(&uptime);
 	for_each_process(p) {
-		unsigned int points;
+		unsigned long points;
 
 		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
@@ -329,11 +365,11 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 				return ERR_PTR(-1UL);
 
 			chosen = p;
-			*ppoints = 1000;
+			*ppoints = ULONG_MAX;
 		}
 
-		points = oom_badness(p, mem, nodemask, totalpages);
-		if (points > *ppoints) {
+		points = badness(p, mem, nodemask, uptime.tv_sec);
+		if (points > *ppoints || !chosen) {
 			chosen = p;
 			*ppoints = points;
 		}
@@ -345,24 +381,27 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 /**
  * dump_tasks - dump current memory state of all system tasks
  * @mem: current's memory controller, if constrained
- * @nodemask: nodemask passed to page allocator for mempolicy ooms
  *
- * Dumps the current memory state of all eligible tasks.  Tasks not in the same
- * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
- * are not shown.
+ * Dumps the current memory state of all system tasks, excluding kernel threads.
  * State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj
- * value, oom_score_adj value, and name.
+ * score, and name.
+ *
+ * If the actual is non-NULL, only tasks that are a member of the mem_cgroup are
+ * shown.
  *
  * Call with tasklist_lock read-locked.
  */
-static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
+static void dump_tasks(const struct mem_cgroup *mem)
 {
 	struct task_struct *p;
 	struct task_struct *task;
 
-	pr_info("[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name\n");
+	printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
+	       "name\n");
 	for_each_process(p) {
-		if (oom_unkillable_task(p, mem, nodemask))
+		if (p->flags & PF_KTHREAD)
+			continue;
+		if (mem && !task_in_mem_cgroup(p, mem))
 			continue;
 
 		task = find_lock_task_mm(p);
@@ -375,69 +414,43 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 			continue;
 		}
 
-		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
+		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d %s\n",
 			task->pid, task_uid(task), task->tgid,
 			task->mm->total_vm, get_mm_rss(task->mm),
-			task_cpu(task), task->signal->oom_adj,
-			task->signal->oom_score_adj, task->comm);
+			task_cpu(task), task->signal->oom_adj, task->comm);
 		task_unlock(task);
 	}
 }
 
 static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
-			struct mem_cgroup *mem, const nodemask_t *nodemask)
+							struct mem_cgroup *mem)
 {
 	task_lock(current);
 	pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
-		"oom_adj=%d, oom_score_adj=%d\n",
-		current->comm, gfp_mask, order, current->signal->oom_adj,
-		current->signal->oom_score_adj);
+		"oom_adj=%d\n",
+		current->comm, gfp_mask, order, current->signal->oom_adj);
 	cpuset_print_task_mems_allowed(current);
 	task_unlock(current);
 	dump_stack();
 	mem_cgroup_print_oom_info(mem, p);
 	show_mem();
 	if (sysctl_oom_dump_tasks)
-		dump_tasks(mem, nodemask);
+		dump_tasks(mem);
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
 static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 {
-	struct task_struct *q;
-	struct mm_struct *mm;
-
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 1;
 
-	/* mm cannot be safely dereferenced after task_unlock(p) */
-	mm = p->mm;
-
 	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
 		task_pid_nr(p), p->comm, K(p->mm->total_vm),
 		K(get_mm_counter(p->mm, MM_ANONPAGES)),
 		K(get_mm_counter(p->mm, MM_FILEPAGES)));
 	task_unlock(p);
 
-	/*
-	 * Kill all processes sharing p->mm in other thread groups, if any.
-	 * They don't get access to memory reserves or a higher scheduler
-	 * priority, though, to avoid depletion of all memory or task
-	 * starvation.  This prevents mm->mmap_sem livelock when an oom killed
-	 * task cannot exit because it requires the semaphore and its contended
-	 * by another thread trying to allocate memory itself.  That thread will
-	 * now get access to memory reserves since it has a pending fatal
-	 * signal.
-	 */
-	for_each_process(q)
-		if (q->mm == mm && !same_thread_group(q, p)) {
-			task_lock(q);	/* Protect ->comm from prctl() */
-			pr_err("Kill process %d (%s) sharing same memory\n",
-				task_pid_nr(q), q->comm);
-			task_unlock(q);
-			force_sig(SIGKILL, q);
-		}
 
 	set_tsk_thread_flag(p, TIF_MEMDIE);
 	force_sig(SIGKILL, p);
@@ -454,17 +467,17 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 #undef K
 
 static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
-			    unsigned int points, unsigned long totalpages,
-			    struct mem_cgroup *mem, nodemask_t *nodemask,
-			    const char *message)
+			    unsigned long points, struct mem_cgroup *mem,
+			    nodemask_t *nodemask, const char *message)
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
 	struct task_struct *t = p;
-	unsigned int victim_points = 0;
+	unsigned long victim_points = 0;
+	struct timespec uptime;
 
 	if (printk_ratelimit())
-		dump_header(p, gfp_mask, order, mem, nodemask);
+		dump_header(p, gfp_mask, order, mem);
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -477,7 +490,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	}
 
 	task_lock(p);
-	pr_err("%s: Kill process %d (%s) score %d or sacrifice child\n",
+	pr_err("%s: Kill process %d (%s) score %lu or sacrifice child\n",
 		message, task_pid_nr(p), p->comm, points);
 	task_unlock(p);
 
@@ -487,15 +500,14 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * parent.  This attempts to lose the minimal amount of work done while
 	 * still freeing memory.
 	 */
+	do_posix_clock_monotonic_gettime(&uptime);
 	do {
 		list_for_each_entry(child, &t->children, sibling) {
-			unsigned int child_points;
+			unsigned long child_points;
 
-			/*
-			 * oom_badness() returns 0 if the thread is unkillable
-			 */
-			child_points = oom_badness(child, mem, nodemask,
-								totalpages);
+			/* badness() returns 0 if the thread is unkillable */
+			child_points = badness(child, mem, nodemask,
+					       uptime.tv_sec);
 			if (child_points > victim_points) {
 				victim = child;
 				victim_points = child_points;
@@ -510,7 +522,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
  * Determines whether the kernel must panic because of the panic_on_oom sysctl.
  */
 static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
-				int order, const nodemask_t *nodemask)
+				int order)
 {
 	if (likely(!sysctl_panic_on_oom))
 		return;
@@ -524,7 +536,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
 			return;
 	}
 	read_lock(&tasklist_lock);
-	dump_header(NULL, gfp_mask, order, NULL, nodemask);
+	dump_header(NULL, gfp_mask, order, NULL);
 	read_unlock(&tasklist_lock);
 	panic("Out of memory: %s panic_on_oom is enabled\n",
 		sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
@@ -533,19 +545,17 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
 {
-	unsigned long limit;
-	unsigned int points = 0;
+	unsigned long points = 0;
 	struct task_struct *p;
 
-	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0, NULL);
-	limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;
+	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0);
 	read_lock(&tasklist_lock);
 retry:
-	p = select_bad_process(&points, limit, mem, NULL);
+	p = select_bad_process(&points, mem, NULL);
 	if (!p || PTR_ERR(p) == -1UL)
 		goto out;
 
-	if (oom_kill_process(p, gfp_mask, 0, points, limit, mem, NULL,
+	if (oom_kill_process(p, gfp_mask, 0, points, mem, NULL,
 				"Memory cgroup out of memory"))
 		goto retry;
 out:
@@ -669,11 +679,9 @@ static void clear_system_oom(void)
 void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *nodemask)
 {
-	const nodemask_t *mpol_mask;
 	struct task_struct *p;
-	unsigned long totalpages;
 	unsigned long freed = 0;
-	unsigned int points;
+	unsigned long points;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 	int killed = 0;
 
@@ -697,40 +705,41 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 * Check if there were limitations on the allocation (only relevant for
 	 * NUMA) that may require different handling.
 	 */
-	constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
-						&totalpages);
-	mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
-	check_panic_on_oom(constraint, gfp_mask, order, mpol_mask);
+	if (zonelist)
+		constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
+	check_panic_on_oom(constraint, gfp_mask, order);
 
 	read_lock(&tasklist_lock);
 	if (sysctl_oom_kill_allocating_task &&
 	    !oom_unkillable_task(current, NULL, nodemask) &&
-	    current->mm && !atomic_read(&current->mm->oom_disable_count)) {
+	    (current->signal->oom_adj != OOM_DISABLE)) {
 		/*
 		 * oom_kill_process() needs tasklist_lock held.  If it returns
 		 * non-zero, current could not be killed so we must fallback to
 		 * the tasklist scan.
 		 */
-		if (!oom_kill_process(current, gfp_mask, order, 0, totalpages,
-				NULL, nodemask,
+		if (!oom_kill_process(current, gfp_mask, order, 0, NULL,
+				nodemask,
 				"Out of memory (oom_kill_allocating_task)"))
 			goto out;
 	}
 
 retry:
-	p = select_bad_process(&points, totalpages, NULL, mpol_mask);
+	p = select_bad_process(&points, NULL,
+			constraint == CONSTRAINT_MEMORY_POLICY ? nodemask :
+								 NULL);
 	if (PTR_ERR(p) == -1UL)
 		goto out;
 
 	/* Found nothing?!?! Either we hang forever, or we panic. */
 	if (!p) {
-		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
+		dump_header(NULL, gfp_mask, order, NULL);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
 	}
 
-	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
-				nodemask, "Out of memory"))
+	if (oom_kill_process(p, gfp_mask, order, points, NULL, nodemask,
+			     "Out of memory"))
 		goto retry;
 	killed = 1;
 out:
-- 
1.6.5.2





end of thread, other threads:[~2011-01-06  1:01 UTC | newest]

Thread overview: 77+ messages
2010-11-02  1:43 [PATCH]oom-kill: direct hardware access processes should get bonus Figo.zhang
2010-11-02  3:10 ` David Rientjes
2010-11-02 14:24   ` Figo.zhang
2010-11-02 19:34     ` David Rientjes
2010-11-03 23:43 ` [PATCH v2]oom-kill: CAP_SYS_RESOURCE " Figo.zhang
2010-11-03 23:47   ` David Rientjes
     [not found]     ` <AANLkTimjfmLzr_9+Sf4gk0xGkFjffQ1VcCnwmCXA88R8@mail.gmail.com>
2010-11-04  1:38       ` Figo.zhang
2010-11-04  1:50         ` David Rientjes
2010-11-04  2:12           ` Figo.zhang
2010-11-04  2:54             ` David Rientjes
2010-11-04  4:42               ` Figo.zhang
2010-11-04  5:08                 ` David Rientjes
2010-11-09 11:01           ` [PATCH " KOSAKI Motohiro
2010-11-09 12:24             ` Alan Cox
2010-11-09 21:06               ` David Rientjes
2010-11-09 21:25                 ` David Rientjes
2010-11-10 14:38                 ` Figo.zhang
2010-11-10 20:50                   ` David Rientjes
2010-11-09 10:41 ` [PATCH]oom-kill: direct hardware access processes " KOSAKI Motohiro
2010-11-09 12:24 ` [PATCH v2]mm/oom-kill: " Figo.zhang
2010-11-09 21:16   ` David Rientjes
2010-11-10 14:48     ` Figo.zhang
2010-11-14  5:07     ` KOSAKI Motohiro
2010-11-14 21:29       ` David Rientjes
2010-11-15  1:24         ` KOSAKI Motohiro
2010-11-15 10:03           ` David Rientjes
2010-11-23  7:16             ` KOSAKI Motohiro
2010-11-28  1:36               ` David Rientjes
2010-11-30 13:00                 ` KOSAKI Motohiro
2010-11-30 20:05                   ` David Rientjes
2010-11-10 15:14   ` [PATCH v3]mm/oom-kill: " Figo.zhang
2010-11-10 15:24     ` Figo.zhang
2010-11-10 21:00       ` David Rientjes
2010-11-14  5:21       ` KOSAKI Motohiro
2010-11-14 21:33         ` David Rientjes
2010-11-15  3:26           ` [PATCH] Revert oom rewrite series Figo.zhang
2010-11-15 10:14             ` David Rientjes
2010-11-15 10:57               ` Alan Cox
2010-11-15 20:54                 ` David Rientjes
2010-11-23  7:16                 ` KOSAKI Motohiro
2011-01-04  7:51       ` [PATCH v3]mm/oom-kill: direct hardware access processes should get bonus Figo.zhang
2011-01-04  8:28         ` KAMEZAWA Hiroyuki
2011-01-04  8:56           ` Figo.zhang
2011-01-06  0:55             ` KAMEZAWA Hiroyuki
2011-01-05  3:32         ` David Rientjes
2010-11-14  5:07 [PATCH] Revert oom rewrite series KOSAKI Motohiro
2010-11-14 19:32 ` Linus Torvalds
2010-11-15  0:54   ` KOSAKI Motohiro
2010-11-15  2:19     ` Andrew Morton
     [not found]       ` <AANLkTik_SDaiu2eQsJ9+4ywLR5K5V1Od-hwop6gwas3F@mail.gmail.com>
2010-11-15  4:41         ` Figo.zhang
2010-11-15  6:57       ` KOSAKI Motohiro
2010-11-15 10:34         ` David Rientjes
2010-11-15 23:31           ` Jesper Juhl
2010-11-16  0:06             ` David Rientjes
2010-11-16 10:04               ` Martin Knoblauch
2010-11-16 10:33                 ` Alessandro Suardi
2010-11-16  0:13             ` Valdis.Kletnieks
2010-11-16  6:43               ` David Rientjes
2010-11-16 11:03               ` Alan Cox
2010-11-16 13:03                 ` Florian Mickler
2010-11-16 14:55                   ` Alan Cox
2010-11-16 20:57                     ` David Rientjes
2010-11-16 21:01                       ` Fabio Comolli
2010-11-17  4:04                     ` Valdis.Kletnieks
2010-11-16 15:15               ` Alejandro Riveira Fernández
2010-11-23  7:16           ` KOSAKI Motohiro
2010-11-28  1:45             ` David Rientjes
2010-11-30 13:04               ` KOSAKI Motohiro
2010-11-30 20:02                 ` David Rientjes
2010-11-23  7:16         ` KOSAKI Motohiro
2010-11-23 23:51   ` KOSAKI Motohiro
2010-11-14 21:58 ` David Rientjes
2010-11-15 23:33   ` Bodo Eggert
2010-11-15 23:50     ` David Rientjes
2010-11-17  0:06       ` Bodo Eggert
2010-11-17  0:25         ` David Rientjes
2010-11-17  0:48         ` Mandeep Singh Baines
