linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
@ 2009-11-24  5:57 Daisuke Nishimura
  2009-11-24  7:28 ` [BUGFIX][PATCH -stable] " Daisuke Nishimura
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Daisuke Nishimura @ 2009-11-24  5:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, linux-mm, stable, Balbir Singh, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Daisuke Nishimura

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

	<some path>/00		use_hierarchy == 0	<- hitting limit
	  <some path>/00/aa	use_hierarchy == 1	<- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
The bug exists and should be fixed in 2.6.31.y too.
I'll post a patch for -stable later.

 mm/memcontrol.c |    4 ++--
 mm/oom_kill.c   |   13 +++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ea00a93..d02f9f8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -783,7 +783,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	task_unlock(task);
 	if (!curr)
 		return 0;
-	if (curr->use_hierarchy)
+	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
@@ -1032,7 +1032,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 	static char memcg_name[PATH_MAX];
 	int ret;
 
-	if (!memcg)
+	if (!memcg || !p)
 		return;
 
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab04537..be56461 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -356,7 +356,8 @@ static void dump_tasks(const struct mem_cgroup *mem)
 	} while_each_thread(g, p);
 }
 
-static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
+static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
+							struct mem_cgroup *mem)
 {
 	pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
 		"oom_adj=%d\n",
@@ -365,7 +366,7 @@ static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
 	cpuset_print_task_mems_allowed(current);
 	task_unlock(current);
 	dump_stack();
-	mem_cgroup_print_oom_info(mem, current);
+	mem_cgroup_print_oom_info(mem, p);
 	show_mem();
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(mem);
@@ -440,7 +441,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	struct task_struct *c;
 
 	if (printk_ratelimit())
-		dump_header(gfp_mask, order, mem);
+		dump_header(p, gfp_mask, order, mem);
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -576,7 +577,7 @@ retry:
 	/* Found nothing?!?! Either we hang forever, or we panic. */
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		dump_header(gfp_mask, order, NULL);
+		dump_header(NULL, gfp_mask, order, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
 
@@ -644,7 +645,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		return;
 
 	if (sysctl_panic_on_oom == 2) {
-		dump_header(gfp_mask, order, NULL);
+		dump_header(NULL, gfp_mask, order, NULL);
 		panic("out of memory. Compulsory panic_on_oom is selected.\n");
 	}
 
@@ -663,7 +664,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 
 	case CONSTRAINT_NONE:
 		if (sysctl_panic_on_oom) {
-			dump_header(gfp_mask, order, NULL);
+			dump_header(NULL, gfp_mask, order, NULL);
 			panic("out of memory. panic_on_oom is selected\n");
 		}
 		/* Fall-through */
-- 
1.5.6.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24  5:57 [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy Daisuke Nishimura
@ 2009-11-24  7:28 ` Daisuke Nishimura
  2009-11-25  0:00   ` KAMEZAWA Hiroyuki
  2009-11-25  4:08   ` [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy Balbir Singh
  2009-11-24 13:31 ` [BUGFIX][PATCH -mmotm] " Balbir Singh
  2009-11-25  4:09 ` Balbir Singh
  2 siblings, 2 replies; 20+ messages in thread
From: Daisuke Nishimura @ 2009-11-24  7:28 UTC (permalink / raw)
  To: stable
  Cc: LKML, linux-mm, Andrew Morton, Balbir Singh, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Daisuke Nishimura

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

	<some path>/00		use_hierarchy == 0	<- hitting limit
	  <some path>/00/aa	use_hierarchy == 1	<- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |    2 +-
 mm/oom_kill.c   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..3acc226 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	task_unlock(task);
 	if (!curr)
 		return 0;
-	if (curr->use_hierarchy)
+	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		cpuset_print_task_mems_allowed(current);
 		task_unlock(current);
 		dump_stack();
-		mem_cgroup_print_oom_info(mem, current);
+		mem_cgroup_print_oom_info(mem, p);
 		show_mem();
 		if (sysctl_oom_dump_tasks)
 			dump_tasks(mem);
-- 
1.5.6.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24  5:57 [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy Daisuke Nishimura
  2009-11-24  7:28 ` [BUGFIX][PATCH -stable] " Daisuke Nishimura
@ 2009-11-24 13:31 ` Balbir Singh
  2009-11-24 14:00   ` Daisuke Nishimura
  2009-11-25  4:09 ` Balbir Singh
  2 siblings, 1 reply; 20+ messages in thread
From: Balbir Singh @ 2009-11-24 13:31 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
<nishimura@mxp.nes.nec.co.jp> wrote:
> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
>        <some path>/00          use_hierarchy == 0      <- hitting limit
>          <some path>/00/aa     use_hierarchy == 1      <- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>

Quick Question: What happens if <some path>/00 has no tasks in it
after your patches?

Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in  case of use_hierarchy
  2009-11-24 13:31 ` [BUGFIX][PATCH -mmotm] " Balbir Singh
@ 2009-11-24 14:00   ` Daisuke Nishimura
  2009-11-24 17:04     ` Balbir Singh
  0 siblings, 1 reply; 20+ messages in thread
From: Daisuke Nishimura @ 2009-11-24 14:00 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Daisuke Nishimura

On Tue, 24 Nov 2009 19:01:54 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> <nishimura@mxp.nes.nec.co.jp> wrote:
> > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > a task can be a candidate for being oom-killed from memcg's limit, checks
> > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> >
> > But this check return true(it's false positive) when:
> >
> > A  A  A  A <some path>/00 A  A  A  A  A use_hierarchy == 0 A  A  A <- hitting limit
> > A  A  A  A  A <some path>/00/aa A  A  use_hierarchy == 1 A  A  A <- "curr"
> >
> > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > should print information of mem_cgroup which the task being killed, not current,
> > belongs to.
> >
> 
> Quick Question: What happens if <some path>/00 has no tasks in it
> after your patches?
> 
Nothing would happen because <some path>/00 never hit its limit.

The bug that this patch fixes is:

- create a dir <some path>/00 and set some limits.
- create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
- run some programs in both in 00 and 00/aa. programs in 00 should be
  big enough to cause oom by its limit.
- when oom happens by 00's limit, tasks in 00/aa can also be killed.


Regards,
Daisuke Nishimura.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24 14:00   ` Daisuke Nishimura
@ 2009-11-24 17:04     ` Balbir Singh
  2009-11-24 23:49       ` Daisuke Nishimura
  2009-11-25  0:07       ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 20+ messages in thread
From: Balbir Singh @ 2009-11-24 17:04 UTC (permalink / raw)
  To: nishimura
  Cc: Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

* Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> [2009-11-24 23:00:29]:

> On Tue, 24 Nov 2009 19:01:54 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > <nishimura@mxp.nes.nec.co.jp> wrote:
> > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > >
> > > But this check return true(it's false positive) when:
> > >
> > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > >
> > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > should print information of mem_cgroup which the task being killed, not current,
> > > belongs to.
> > >
> > 
> > Quick Question: What happens if <some path>/00 has no tasks in it
> > after your patches?
> > 
> Nothing would happen because <some path>/00 never hit its limit.

Why not? I am talking of a scenario where <some path>/00 is set to a
limit (similar to your example) and hits its limit, but the groups
under it have no limits, but tasks. Shouldn't we be scanning
<some path>/00/aa as well?

> 
> The bug that this patch fixes is:
> 
> - create a dir <some path>/00 and set some limits.
> - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> - run some programs in both in 00 and 00/aa. programs in 00 should be
>   big enough to cause oom by its limit.
> - when oom happens by 00's limit, tasks in 00/aa can also be killed.
>

To be honest, the last part is fair, specifically if 00/aa has a task
that is really the heaviest task as per the oom logic. no? Are you
suggesting that only tasks in <some path>/00 should be selected by the
oom logic? 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24 17:04     ` Balbir Singh
@ 2009-11-24 23:49       ` Daisuke Nishimura
  2009-11-25  3:29         ` Balbir Singh
  2009-11-25  0:07       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 20+ messages in thread
From: Daisuke Nishimura @ 2009-11-24 23:49 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Daisuke Nishimura

On Tue, 24 Nov 2009 22:34:02 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> [2009-11-24 23:00:29]:
> 
> > On Tue, 24 Nov 2009 19:01:54 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > >
> > > > But this check return true(it's false positive) when:
> > > >
> > > > A  A  A  A <some path>/00 A  A  A  A  A use_hierarchy == 0 A  A  A <- hitting limit
> > > > A  A  A  A  A <some path>/00/aa A  A  use_hierarchy == 1 A  A  A <- "curr"
> > > >
> > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > should print information of mem_cgroup which the task being killed, not current,
> > > > belongs to.
> > > >
> > > 
> > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > after your patches?
> > > 
> > Nothing would happen because <some path>/00 never hit its limit.
> 
> Why not? I am talking of a scenario where <some path>/00 is set to a
> limit (similar to your example) and hits its limit, but the groups
> under it have no limits, but tasks. Shouldn't we be scanning
> <some path>/00/aa as well?
> 
> > 
> > The bug that this patch fixes is:
> > 
> > - create a dir <some path>/00 and set some limits.
> > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > - run some programs in both in 00 and 00/aa. programs in 00 should be
> >   big enough to cause oom by its limit.
> > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> >
> 
> To be honest, the last part is fair, specifically if 00/aa has a task
> that is really the heaviest task as per the oom logic. no? Are you
> suggesting that only tasks in <some path>/00 should be selected by the
> oom logic? 
> 
All of your comments would be rational if hierarchy is enabled in 00(it's
also enabled in 00/aa automatically in this case).
I'm saying about the case where it's disabled in 00 but enabled in 00/aa.

In this scenario, charges by tasks in 00/aa is(and should not be) charged to 00.
And oom caused by 00's limit should not affect the task in 00/aa.


Regards,
Daisuke Nishimura.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24  7:28 ` [BUGFIX][PATCH -stable] " Daisuke Nishimura
@ 2009-11-25  0:00   ` KAMEZAWA Hiroyuki
  2009-11-25  5:32     ` [BUGFIX][PATCH v2 " Daisuke Nishimura
  2009-11-25  4:08   ` [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy Balbir Singh
  1 sibling, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-25  0:00 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: stable, LKML, linux-mm, Andrew Morton, Balbir Singh,
	David Rientjes, KOSAKI Motohiro

On Tue, 24 Nov 2009 16:28:54 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> 
> But this check return true(it's false positive) when:
> 
> 	<some path>/00		use_hierarchy == 0	<- hitting limit
> 	  <some path>/00/aa	use_hierarchy == 1	<- "curr"
> 
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
>  mm/memcontrol.c |    2 +-
>  mm/oom_kill.c   |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fd4529d..3acc226 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -496,7 +496,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
>  	task_unlock(task);
>  	if (!curr)
>  		return 0;
> -	if (curr->use_hierarchy)
> +	if (mem->use_hierarchy)
>  		ret = css_is_ancestor(&curr->css, &mem->css);
>  	else
>  		ret = (curr == mem);

Hmm. Maybe not-expected behavior...could you add comment ?

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
(*) I'm sorry I can't work enough in these days.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24 17:04     ` Balbir Singh
  2009-11-24 23:49       ` Daisuke Nishimura
@ 2009-11-25  0:07       ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-25  0:07 UTC (permalink / raw)
  To: balbir
  Cc: nishimura, Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KOSAKI Motohiro

On Tue, 24 Nov 2009 22:34:02 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> [2009-11-24 23:00:29]:
> 
> > On Tue, 24 Nov 2009 19:01:54 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > >
> > > > But this check return true(it's false positive) when:
> > > >
> > > > A  A  A  A <some path>/00 A  A  A  A  A use_hierarchy == 0 A  A  A <- hitting limit
> > > > A  A  A  A  A <some path>/00/aa A  A  use_hierarchy == 1 A  A  A <- "curr"
> > > >
> > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > should print information of mem_cgroup which the task being killed, not current,
> > > > belongs to.
> > > >
> > > 
> > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > after your patches?
> > > 
> > Nothing would happen because <some path>/00 never hit its limit.
> 
> Why not? I am talking of a scenario where <some path>/00 is set to a
> limit (similar to your example) and hits its limit, but the groups
> under it have no limits, but tasks. Shouldn't we be scanning
> <some path>/00/aa as well?
> 
No.  <some path>/00 == use_hierarchy=0 means _all_ children's accounting
information is never added up to <some path>/00.

If there is no task in <some path>/00, it means <some path>/00 contains only
file cache and not-migrated rss. To hit limit, the admin has to make 
memory.(memsw).limit_in_bytes smaller. But in this case, oom is not called.
-ENOMEM is returned to users. IIUC.




> > 
> > The bug that this patch fixes is:
> > 
> > - create a dir <some path>/00 and set some limits.
> > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > - run some programs in both in 00 and 00/aa. programs in 00 should be
> >   big enough to cause oom by its limit.
> > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> >
> 
> To be honest, the last part is fair, specifically if 00/aa has a task
> that is really the heaviest task as per the oom logic. no? Are you
> suggesting that only tasks in <some path>/00 should be selected by the
> oom logic? 
> 
<some path>/00 and <some path>/00/aa has completely different accounting set.
There are no hierarchy relationship. The directory tree shows "virtual"
hierarchy but in reality, their relationship is horizontal rather than hierarchycal.

So, killing tasks only in <some path>/00 is better.

Thanks,
-Kame



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24 23:49       ` Daisuke Nishimura
@ 2009-11-25  3:29         ` Balbir Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Balbir Singh @ 2009-11-25  3:29 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-11-25 08:49:10]:

> On Tue, 24 Nov 2009 22:34:02 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > * Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> [2009-11-24 23:00:29]:
> > 
> > > On Tue, 24 Nov 2009 19:01:54 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > 
> > > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > > <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > > >
> > > > > But this check return true(it's false positive) when:
> > > > >
> > > > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > > > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > > > >
> > > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > > should print information of mem_cgroup which the task being killed, not current,
> > > > > belongs to.
> > > > >
> > > > 
> > > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > > after your patches?
> > > > 
> > > Nothing would happen because <some path>/00 never hit its limit.
> > 
> > Why not? I am talking of a scenario where <some path>/00 is set to a
> > limit (similar to your example) and hits its limit, but the groups
> > under it have no limits, but tasks. Shouldn't we be scanning
> > <some path>/00/aa as well?
> > 
> > > 
> > > The bug that this patch fixes is:
> > > 
> > > - create a dir <some path>/00 and set some limits.
> > > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > >   big enough to cause oom by its limit.
> > > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> > >
> > 
> > To be honest, the last part is fair, specifically if 00/aa has a task
> > that is really the heaviest task as per the oom logic. no? Are you
> > suggesting that only tasks in <some path>/00 should be selected by the
> > oom logic? 
> > 
> All of your comments would be rational if hierarchy is enabled in 00(it's
> also enabled in 00/aa automatically in this case).
> I'm saying about the case where it's disabled in 00 but enabled in 00/aa.
> 

OK, I misunderstood the example then, so even though hierarchy is
disabled, we kill a task in 00/aa when 00 hits the limit. Thanks for
clarifying.

> In this scenario, charges by tasks in 00/aa is(and should not be) charged to 00.
> And oom caused by 00's limit should not affect the task in 00/aa.
> 
> 
> Regards,
> Daisuke Nishimura.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24  7:28 ` [BUGFIX][PATCH -stable] " Daisuke Nishimura
  2009-11-25  0:00   ` KAMEZAWA Hiroyuki
@ 2009-11-25  4:08   ` Balbir Singh
  1 sibling, 0 replies; 20+ messages in thread
From: Balbir Singh @ 2009-11-25  4:08 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: stable, LKML, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-11-24 16:28:54]:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> 
> But this check return true(it's false positive) when:
> 
> 	<some path>/00		use_hierarchy == 0	<- hitting limit
> 	  <some path>/00/aa	use_hierarchy == 1	<- "curr"
> 
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
>  mm/memcontrol.c |    2 +-
>  mm/oom_kill.c   |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fd4529d..3acc226 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -496,7 +496,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
>  	task_unlock(task);
>  	if (!curr)
>  		return 0;
> -	if (curr->use_hierarchy)
> +	if (mem->use_hierarchy)
>  		ret = css_is_ancestor(&curr->css, &mem->css);
>  	else
>  		ret = (curr == mem);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index a7b2460..ed452e9 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  		cpuset_print_task_mems_allowed(current);
>  		task_unlock(current);
>  		dump_stack();
> -		mem_cgroup_print_oom_info(mem, current);
> +		mem_cgroup_print_oom_info(mem, p);
>  		show_mem();
>  		if (sysctl_oom_dump_tasks)
>  			dump_tasks(mem);
>

 
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-24  5:57 [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy Daisuke Nishimura
  2009-11-24  7:28 ` [BUGFIX][PATCH -stable] " Daisuke Nishimura
  2009-11-24 13:31 ` [BUGFIX][PATCH -mmotm] " Balbir Singh
@ 2009-11-25  4:09 ` Balbir Singh
  2 siblings, 0 replies; 20+ messages in thread
From: Balbir Singh @ 2009-11-25  4:09 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, LKML, linux-mm, stable, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-11-24 14:57:59]:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> 
> But this check return true(it's false positive) when:
> 
> 	<some path>/00		use_hierarchy == 0	<- hitting limit
> 	  <some path>/00/aa	use_hierarchy == 1	<- "curr"
> 
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>


Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
 
-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-25  0:00   ` KAMEZAWA Hiroyuki
@ 2009-11-25  5:32     ` Daisuke Nishimura
  2009-11-25  5:50       ` KAMEZAWA Hiroyuki
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Daisuke Nishimura @ 2009-11-25  5:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: stable, LKML, linux-mm, Andrew Morton, Balbir Singh,
	David Rientjes, KOSAKI Motohiro, Daisuke Nishimura

> Hmm. Maybe not-expected behavior...could you add comment ?
> 
How about this ?

> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> (*) I'm sorry I can't work enough in these days.
> 

BTW, this patch conflict with oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
in current mmotm(that's why I post mmotm version separately), so this bug will not be fixed
till 2.6.33 in linus-tree.
So I think this patch should go in 2.6.32.y too.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

	<some path>/00		use_hierarchy == 0	<- hitting limit
	  <some path>/00/aa	use_hierarchy == 1	<- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 mm/memcontrol.c |    8 +++++++-
 mm/oom_kill.c   |    2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..566925e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	task_unlock(task);
 	if (!curr)
 		return 0;
-	if (curr->use_hierarchy)
+	/*
+	 * We should check use_hierarchy of "mem" not "curr". Because checking
+	 * use_hierarchy of "curr" here make this function true if hierarchy is
+	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+	 * hierarchy(even if use_hierarchy is disabled in "mem").
+	 */
+	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		cpuset_print_task_mems_allowed(current);
 		task_unlock(current);
 		dump_stack();
-		mem_cgroup_print_oom_info(mem, current);
+		mem_cgroup_print_oom_info(mem, p);
 		show_mem();
 		if (sysctl_oom_dump_tasks)
 			dump_tasks(mem);
-- 
1.5.6.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-25  5:32     ` [BUGFIX][PATCH v2 " Daisuke Nishimura
@ 2009-11-25  5:50       ` KAMEZAWA Hiroyuki
  2009-11-25 20:45       ` Andrew Morton
  2009-12-17  0:47       ` [BUGFIX][PATCH v2 -stable] " Daisuke Nishimura
  2 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-25  5:50 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: stable, LKML, linux-mm, Andrew Morton, Balbir Singh,
	David Rientjes, KOSAKI Motohiro

On Wed, 25 Nov 2009 14:32:18 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> > Hmm. Maybe not-expected behavior...could you add comment ?
> > 
> How about this ?
> 
seems nice. Thank you very much.

Regards,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-25  5:32     ` [BUGFIX][PATCH v2 " Daisuke Nishimura
  2009-11-25  5:50       ` KAMEZAWA Hiroyuki
@ 2009-11-25 20:45       ` Andrew Morton
  2009-11-26  0:11         ` [BUGFIX][PATCH v2 -mmotm] " Daisuke Nishimura
  2009-12-17  0:47       ` [BUGFIX][PATCH v2 -stable] " Daisuke Nishimura
  2 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2009-11-25 20:45 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, stable, LKML, linux-mm, Balbir Singh,
	David Rientjes, KOSAKI Motohiro

On Wed, 25 Nov 2009 14:32:18 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> > Hmm. Maybe not-expected behavior...could you add comment ?
> > 
> How about this ?
> 
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > (*) I'm sorry I can't work enough in these days.
> > 
> 
> BTW, this patch conflict with oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
> in current mmotm(that's why I post mmotm version separately), so this bug will not be fixed
> till 2.6.33 in linus-tree.
> So I think this patch should go in 2.6.32.y too.

I don't actually have a 2.6.33 version of this patch yet.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [BUGFIX][PATCH v2 -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-25 20:45       ` Andrew Morton
@ 2009-11-26  0:11         ` Daisuke Nishimura
  0 siblings, 0 replies; 20+ messages in thread
From: Daisuke Nishimura @ 2009-11-26  0:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, stable, LKML, linux-mm, Balbir Singh,
	David Rientjes, KOSAKI Motohiro, Daisuke Nishimura

On Wed, 25 Nov 2009 12:45:51 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 25 Nov 2009 14:32:18 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > > Hmm. Maybe not-expected behavior...could you add comment ?
> > > 
> > How about this ?
> > 
> > > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > (*) I'm sorry I can't work enough in these days.
> > > 
> > 
> > BTW, this patch conflict with oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
> > in current mmotm(that's why I post mmotm version separately), so this bug will not be fixed
> > till 2.6.33 in linus-tree.
> > So I think this patch should go in 2.6.32.y too.
> 
> I don't actually have a 2.6.33 version of this patch yet.

I add comments as I did in for-stable version and attach the updated patch
for-mmotm to this mail.

It can be applied on current mmotm(2009-11-24-16-47).

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

	<some path>/aa		use_hierarchy == 0	<- hitting limit
	  <some path>/aa/00	use_hierarchy == 1	<- the task belongs to

This leads to killing an innocent task in aa/00. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 mm/memcontrol.c |   10 ++++++++--
 mm/oom_kill.c   |   13 +++++++------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 661b8c6..951c103 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -759,7 +759,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	task_unlock(task);
 	if (!curr)
 		return 0;
-	if (curr->use_hierarchy)
+	/*
+	 * We should check use_hierarchy of "mem" not "curr". Because checking
+	 * use_hierarchy of "curr" here make this function true if hierarchy is
+	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+	 * hierarchy(even if use_hierarchy is disabled in "mem").
+	 */
+	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
@@ -1008,7 +1014,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 	static char memcg_name[PATH_MAX];
 	int ret;
 
-	if (!memcg)
+	if (!memcg || !p)
 		return;
 
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab04537..be56461 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -356,7 +356,8 @@ static void dump_tasks(const struct mem_cgroup *mem)
 	} while_each_thread(g, p);
 }
 
-static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
+static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
+							struct mem_cgroup *mem)
 {
 	pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
 		"oom_adj=%d\n",
@@ -365,7 +366,7 @@ static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
 	cpuset_print_task_mems_allowed(current);
 	task_unlock(current);
 	dump_stack();
-	mem_cgroup_print_oom_info(mem, current);
+	mem_cgroup_print_oom_info(mem, p);
 	show_mem();
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(mem);
@@ -440,7 +441,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	struct task_struct *c;
 
 	if (printk_ratelimit())
-		dump_header(gfp_mask, order, mem);
+		dump_header(p, gfp_mask, order, mem);
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -576,7 +577,7 @@ retry:
 	/* Found nothing?!?! Either we hang forever, or we panic. */
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		dump_header(gfp_mask, order, NULL);
+		dump_header(NULL, gfp_mask, order, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
 
@@ -644,7 +645,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		return;
 
 	if (sysctl_panic_on_oom == 2) {
-		dump_header(gfp_mask, order, NULL);
+		dump_header(NULL, gfp_mask, order, NULL);
 		panic("out of memory. Compulsory panic_on_oom is selected.\n");
 	}
 
@@ -663,7 +664,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 
 	case CONSTRAINT_NONE:
 		if (sysctl_panic_on_oom) {
-			dump_header(gfp_mask, order, NULL);
+			dump_header(NULL, gfp_mask, order, NULL);
 			panic("out of memory. panic_on_oom is selected\n");
 		}
 		/* Fall-through */
-- 
1.5.6.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-11-25  5:32     ` [BUGFIX][PATCH v2 " Daisuke Nishimura
  2009-11-25  5:50       ` KAMEZAWA Hiroyuki
  2009-11-25 20:45       ` Andrew Morton
@ 2009-12-17  0:47       ` Daisuke Nishimura
  2010-01-04 22:28         ` [stable] " Greg KH
  2 siblings, 1 reply; 20+ messages in thread
From: Daisuke Nishimura @ 2009-12-17  0:47 UTC (permalink / raw)
  To: stable
  Cc: LKML, linux-mm, Andrew Morton, Balbir Singh, David Rientjes,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, Daisuke Nishimura

Stable team.

Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?

This is a for-stable version of a bugfix patch that corresponds to the
upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

	<some path>/00		use_hierarchy == 0	<- hitting limit
	  <some path>/00/aa	use_hierarchy == 1	<- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 mm/memcontrol.c |    8 +++++++-
 mm/oom_kill.c   |    2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd4529d..566925e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -496,7 +496,13 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	task_unlock(task);
 	if (!curr)
 		return 0;
-	if (curr->use_hierarchy)
+	/*
+	 * We should check use_hierarchy of "mem" not "curr". Because checking
+	 * use_hierarchy of "curr" here make this function true if hierarchy is
+	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
+	 * hierarchy(even if use_hierarchy is disabled in "mem").
+	 */
+	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a7b2460..ed452e9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -400,7 +400,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		cpuset_print_task_mems_allowed(current);
 		task_unlock(current);
 		dump_stack();
-		mem_cgroup_print_oom_info(mem, current);
+		mem_cgroup_print_oom_info(mem, p);
 		show_mem();
 		if (sysctl_oom_dump_tasks)
 			dump_tasks(mem);
-- 
1.5.6.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [stable] [BUGFIX][PATCH v2 -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2009-12-17  0:47       ` [BUGFIX][PATCH v2 -stable] " Daisuke Nishimura
@ 2010-01-04 22:28         ` Greg KH
  2010-01-05  3:26           ` [stable][BUGFIX][PATCH v3] " Daisuke Nishimura
  0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2010-01-04 22:28 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: stable, LKML, linux-mm, KOSAKI Motohiro, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh

On Thu, Dec 17, 2009 at 09:47:24AM +0900, Daisuke Nishimura wrote:
> Stable team.
> 
> Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?
> 
> This is a for-stable version of a bugfix patch that corresponds to the
> upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.

I've applied it to the .32-stable tree, but it does not apply to .31.
Care to provide a version of the patch for that kernel if you want it
applied there?

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [stable][BUGFIX][PATCH v3] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2010-01-04 22:28         ` [stable] " Greg KH
@ 2010-01-05  3:26           ` Daisuke Nishimura
  2010-01-05 19:26             ` [stable] [BUGFIX][PATCH " Greg KH
  2010-01-05 19:33             ` patch memcg-avoid-oom-killing-innocent-task-in-case-of-use_hierarchy.patch added to 2.6.31-stable tree gregkh
  0 siblings, 2 replies; 20+ messages in thread
From: Daisuke Nishimura @ 2010-01-05  3:26 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, LKML, linux-mm, KOSAKI Motohiro, David Rientjes,
	Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh,
	Daisuke Nishimura

On Mon, 4 Jan 2010 14:28:19 -0800
Greg KH <greg@kroah.com> wrote:

> On Thu, Dec 17, 2009 at 09:47:24AM +0900, Daisuke Nishimura wrote:
> > Stable team.
> > 
> > Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?
> > 
> > This is a for-stable version of a bugfix patch that corresponds to the
> > upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.
> 
> I've applied it to the .32-stable tree, but it does not apply to .31.
> Care to provide a version of the patch for that kernel if you want it
> applied there?
> 
hmm, strange. I can apply it onto 2.6.31.9. It might conflict with other patches
in 2.6.31.y queue ?
Anyway, I've attached the patch that is rebased on 2.6.31.9. Please tell me if you
have any problem with it.

v3: rebased on 2.6.31.9
===

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [stable] [BUGFIX][PATCH v3] memcg: avoid oom-killing innocent task in case of use_hierarchy
  2010-01-05  3:26           ` [stable][BUGFIX][PATCH v3] " Daisuke Nishimura
@ 2010-01-05 19:26             ` Greg KH
  2010-01-05 19:33             ` patch memcg-avoid-oom-killing-innocent-task-in-case-of-use_hierarchy.patch added to 2.6.31-stable tree gregkh
  1 sibling, 0 replies; 20+ messages in thread
From: Greg KH @ 2010-01-05 19:26 UTC (permalink / raw)
  To: nishimura
  Cc: LKML, linux-mm, KOSAKI Motohiro, David Rientjes, Andrew Morton,
	stable, KAMEZAWA Hiroyuki, Balbir Singh

On Tue, Jan 05, 2010 at 12:26:33PM +0900, Daisuke Nishimura wrote:
> On Mon, 4 Jan 2010 14:28:19 -0800
> Greg KH <greg@kroah.com> wrote:
> 
> > On Thu, Dec 17, 2009 at 09:47:24AM +0900, Daisuke Nishimura wrote:
> > > Stable team.
> > > 
> > > Cay you pick this up for 2.6.32.y(and 2.6.31.y if it will be released) ?
> > > 
> > > This is a for-stable version of a bugfix patch that corresponds to the
> > > upstream commmit d31f56dbf8bafaacb0c617f9a6f137498d5c7aed.
> > 
> > I've applied it to the .32-stable tree, but it does not apply to .31.
> > Care to provide a version of the patch for that kernel if you want it
> > applied there?
> > 
> hmm, strange. I can apply it onto 2.6.31.9. It might conflict with other patches
> in 2.6.31.y queue ?
> Anyway, I've attached the patch that is rebased on 2.6.31.9. Please tell me if you
> have any problem with it.
> 
> v3: rebased on 2.6.31.9

This version worked, thanks for regenerating it.

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* patch memcg-avoid-oom-killing-innocent-task-in-case-of-use_hierarchy.patch added to 2.6.31-stable tree
  2010-01-05  3:26           ` [stable][BUGFIX][PATCH v3] " Daisuke Nishimura
  2010-01-05 19:26             ` [stable] [BUGFIX][PATCH " Greg KH
@ 2010-01-05 19:33             ` gregkh
  1 sibling, 0 replies; 20+ messages in thread
From: gregkh @ 2010-01-05 19:33 UTC (permalink / raw)
  To: d-nishimura, akpm, balbir, gregkh, greg, kamezawa.hiroyu,
	kosaki.motohiro, linux-mm, nishimura, rientjes, stable
  Cc: stable-commits


This is a note to let you know that we have just queued up the patch titled

    Subject: memcg: avoid oom-killing innocent task in case of use_hierarchy

to the 2.6.31-stable tree.  Its filename is

    memcg-avoid-oom-killing-innocent-task-in-case-of-use_hierarchy.patch

A git repo of this tree can be found at 
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-01-05 19:33 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-24  5:57 [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy Daisuke Nishimura
2009-11-24  7:28 ` [BUGFIX][PATCH -stable] " Daisuke Nishimura
2009-11-25  0:00   ` KAMEZAWA Hiroyuki
2009-11-25  5:32     ` [BUGFIX][PATCH v2 " Daisuke Nishimura
2009-11-25  5:50       ` KAMEZAWA Hiroyuki
2009-11-25 20:45       ` Andrew Morton
2009-11-26  0:11         ` [BUGFIX][PATCH v2 -mmotm] " Daisuke Nishimura
2009-12-17  0:47       ` [BUGFIX][PATCH v2 -stable] " Daisuke Nishimura
2010-01-04 22:28         ` [stable] " Greg KH
2010-01-05  3:26           ` [stable][BUGFIX][PATCH v3] " Daisuke Nishimura
2010-01-05 19:26             ` [stable] [BUGFIX][PATCH " Greg KH
2010-01-05 19:33             ` patch memcg-avoid-oom-killing-innocent-task-in-case-of-use_hierarchy.patch added to 2.6.31-stable tree gregkh
2009-11-25  4:08   ` [BUGFIX][PATCH -stable] memcg: avoid oom-killing innocent task in case of use_hierarchy Balbir Singh
2009-11-24 13:31 ` [BUGFIX][PATCH -mmotm] " Balbir Singh
2009-11-24 14:00   ` Daisuke Nishimura
2009-11-24 17:04     ` Balbir Singh
2009-11-24 23:49       ` Daisuke Nishimura
2009-11-25  3:29         ` Balbir Singh
2009-11-25  0:07       ` KAMEZAWA Hiroyuki
2009-11-25  4:09 ` Balbir Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).