linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/9] oom: don't try to kill oom_unkillable child
@ 2010-06-17  1:45 KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 3/9] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:45 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


Now, badness() doesn't care neigher CPUSET nor mempolicy. Then
if the victim child process have disjoint nodemask, __out_of_memory()
can makes kernel hang eventually.

This patch fixes it.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 26ae697..0aeacb2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -429,7 +429,7 @@ static int oom_kill_task(struct task_struct *p)
 
 static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			    unsigned long points, struct mem_cgroup *mem,
-			    const char *message)
+			    nodemask_t *nodemask, const char *message)
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
@@ -469,6 +469,8 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 				continue;
 			if (mem && !task_in_mem_cgroup(child, mem))
 				continue;
+			if (!has_intersects_mems_allowed(child, nodemask))
+				continue;
 
 			/* badness() returns 0 if the thread is unkillable */
 			child_points = badness(child, uptime.tv_sec);
@@ -519,7 +521,7 @@ retry:
 	if (!p || PTR_ERR(p) == -1UL)
 		goto out;
 
-	if (oom_kill_process(p, gfp_mask, 0, points, mem,
+	if (oom_kill_process(p, gfp_mask, 0, points, mem, NULL,
 				"Memory cgroup out of memory"))
 		goto retry;
 out:
@@ -678,7 +680,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		 * non-zero, current could not be killed so we must fallback to
 		 * the tasklist scan.
 		 */
-		if (!oom_kill_process(current, gfp_mask, order, 0, NULL,
+		if (!oom_kill_process(current, gfp_mask, order, 0, NULL, nodemask,
 				"Out of memory (oom_kill_allocating_task)"))
 			return;
 	}
@@ -697,7 +699,7 @@ retry:
 		panic("Out of memory and no killable processes...\n");
 	}
 
-	if (oom_kill_process(p, gfp_mask, order, points, NULL,
+	if (oom_kill_process(p, gfp_mask, order, points, NULL, nodemask,
 			     "Out of memory"))
 		goto retry;
 	read_unlock(&tasklist_lock);
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/9] oom: make oom_unkillable_task() helper function
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-21 20:15   ` David Rientjes
  2010-06-17  1:51 ` [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


Now, we have the same task check in two places. Unify it.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   33 ++++++++++++++++++++++-----------
 1 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dc8589e..afcbf17 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -277,6 +277,26 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
 }
 #endif
 
+/* return true if the task is not adequate as candidate victim task. */
+static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
+			   const nodemask_t *nodemask)
+{
+	if (is_global_init(p))
+		return true;
+	if (p->flags & PF_KTHREAD)
+		return true;
+
+	/* When mem_cgroup_out_of_memory() and p is not member of the group */
+	if (mem && !task_in_mem_cgroup(p, mem))
+		return true;
+
+	/* p may not have freeable memory in nodemask */
+	if (!has_intersects_mems_allowed(p, nodemask))
+		return true;
+
+	return false;
+}
+
 /*
  * Simple selection loop. We chose the process with the highest
  * number of 'points'. We expect the caller will lock the tasklist.
@@ -295,12 +315,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 	for_each_process(p) {
 		unsigned long points;
 
-		/* skip the init task and kthreads */
-		if (is_global_init(p) || (p->flags & PF_KTHREAD))
-			continue;
-		if (mem && !task_in_mem_cgroup(p, mem))
-			continue;
-		if (!has_intersects_mems_allowed(p, nodemask))
+		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
 
 		/*
@@ -467,11 +482,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 			if (child->mm == p->mm)
 				continue;
-			if (child->flags & PF_KTHREAD)
-				continue;
-			if (mem && !task_in_mem_cgroup(child, mem))
-				continue;
-			if (!has_intersects_mems_allowed(child, nodemask))
+			if (oom_unkillable_task(p, mem, nodemask))
 				continue;
 
 			/* badness() returns 0 if the thread is unkillable */
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 3/9] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-21 20:14   ` David Rientjes
  2010-06-17  1:51 ` [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


Now, select_bad_process() have PF_KTHREAD check, but oom_kill_process
doesn't. It mean oom_kill_process() may choose wrong task, especially,
when the child are using use_mm().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0aeacb2..dc8589e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -467,6 +467,8 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 			if (child->mm == p->mm)
 				continue;
+			if (child->flags & PF_KTHREAD)
+				continue;
 			if (mem && !task_in_mem_cgroup(child, mem))
 				continue;
 			if (!has_intersects_mems_allowed(child, nodemask))
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/9] oom: cleanup has_intersects_mems_allowed()
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
                   ` (2 preceding siblings ...)
  2010-06-17  1:51 ` [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-17  4:20   ` David Rientjes
  2010-06-17  1:51 ` [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check KOSAKI Motohiro
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


Now has_intersects_mems_allowed() has own thread iterate logic, but
it should use while_each_thread().

It slightly improve the code readability.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 541a59b..b27db90 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -46,10 +46,10 @@ static DEFINE_SPINLOCK(zone_scan_lock);
  * shares the same mempolicy nodes as current if it is bound by such a policy
  * and whether or not it has the same set of allowed cpuset nodes.
  */
-static bool has_intersects_mems_allowed(struct task_struct *tsk,
+static bool has_intersects_mems_allowed(struct task_struct *p,
 					const nodemask_t *mask)
 {
-	struct task_struct *start = tsk;
+	struct task_struct *tsk = p;
 
 	do {
 		if (mask) {
@@ -69,8 +69,8 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 			if (cpuset_mems_allowed_intersects(current, tsk))
 				return true;
 		}
-		tsk = next_thread(tsk);
-	} while (tsk != start);
+	} while_each_thread(p, tsk);
+
 	return false;
 }
 #else
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 3/9] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-17  4:19   ` David Rientjes
  2010-06-17  1:51 ` [PATCH 5/9] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

When oom_kill_allocating_task is enabled, an argument task of
oom_kill_process is not selected by select_bad_process(), It's
just out_of_memory() caller task. It mean the task can be
unkillable. check it first.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index afcbf17..541a59b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -687,7 +687,8 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	check_panic_on_oom(constraint, gfp_mask, order);
 
 	read_lock(&tasklist_lock);
-	if (sysctl_oom_kill_allocating_task) {
+	if (sysctl_oom_kill_allocating_task &&
+	    !oom_unkillable_task(current, NULL, nodemask)) {
 		/*
 		 * oom_kill_process() needs tasklist_lock held.  If it returns
 		 * non-zero, current could not be killed so we must fallback to
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 8/9] oom: give the dying task a higher priority
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
                   ` (4 preceding siblings ...)
  2010-06-17  1:51 ` [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 7/9] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 9/9] oom: multi threaded process coredump don't make deadlock KOSAKI Motohiro
  7 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

From: Luis Claudio R. Goncalves <lclaudio@uudg.org>

In a system under heavy load it was observed that even after the
oom-killer selects a task to die, the task may take a long time to die.

Right after sending a SIGKILL to the task selected by the oom-killer
this task has it's priority increased so that it can exit() exit soon,
freeing memory. That is accomplished by:

        /*
         * We give our sacrificial lamb high priority and access to
         * all the memory it needs. That way it should be able to
         * exit() and clear out its resources quickly...
         */
 	p->rt.time_slice = HZ;
 	set_tsk_thread_flag(p, TIF_MEMDIE);

It sounds plausible giving the dying task an even higher priority to be
sure it will be scheduled sooner and free the desired memory. It was
suggested on LKML using SCHED_FIFO:1, the lowest RT priority so that
this task won't interfere with any running RT task.

If the dying task is already an RT task, leave it untouched.
Another good suggestion, implemented here, was to avoid boosting the
dying task priority in case of mem_cgroup OOM.

Signed-off-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   34 +++++++++++++++++++++++++++++++---
 1 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a6bb2d7..b2ea2d8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -82,6 +82,24 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 #endif /* CONFIG_NUMA */
 
 /*
+ * If this is a system OOM (not a memcg OOM) and the task selected to be
+ * killed is not already running at high (RT) priorities, speed up the
+ * recovery by boosting the dying task to the lowest FIFO priority.
+ * That helps with the recovery and avoids interfering with RT tasks.
+ */
+static void boost_dying_task_prio(struct task_struct *p,
+				  struct mem_cgroup *mem)
+{
+	struct sched_param param = { .sched_priority = 1 };
+
+	if (mem)
+		return;
+
+	if (!rt_task(p))
+		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+}
+
+/*
  * The process p may have detached its own ->mm while exiting or through
  * use_mm(), but one or more of its subthreads may still have a valid
  * pointer.  Return p, or any of its subthreads with a valid ->mm, with
@@ -417,7 +435,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
-static int oom_kill_task(struct task_struct *p)
+static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 {
 	p = find_lock_task_mm(p);
 	if (!p || p->signal->oom_adj == OOM_DISABLE) {
@@ -430,9 +448,17 @@ static int oom_kill_task(struct task_struct *p)
 		K(get_mm_counter(p->mm, MM_FILEPAGES)));
 	task_unlock(p);
 
-	p->rt.time_slice = HZ;
+
 	set_tsk_thread_flag(p, TIF_MEMDIE);
 	force_sig(SIGKILL, p);
+
+	/*
+	 * We give our sacrificial lamb high priority and access to
+	 * all the memory it needs. That way it should be able to
+	 * exit() and clear out its resources quickly...
+	 */
+	boost_dying_task_prio(p, mem);
+
 	return 0;
 }
 #undef K
@@ -456,6 +482,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 */
 	if (p->flags & PF_EXITING) {
 		set_tsk_thread_flag(p, TIF_MEMDIE);
+		boost_dying_task_prio(p, mem);
 		return 0;
 	}
 
@@ -487,7 +514,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		}
 	} while_each_thread(p, t);
 
-	return oom_kill_task(victim);
+	return oom_kill_task(victim, mem);
 }
 
 /*
@@ -668,6 +695,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 */
 	if (fatal_signal_pending(current)) {
 		set_thread_flag(TIF_MEMDIE);
+		boost_dying_task_prio(current, NULL);
 		return;
 	}
 
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
                   ` (3 preceding siblings ...)
  2010-06-17  1:51 ` [PATCH 5/9] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-17  4:18   ` David Rientjes
  2010-06-17  1:51 ` [PATCH 8/9] oom: give the dying task a higher priority KOSAKI Motohiro
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


Now, CAP_SYS_RAWIO check is very strange. if the user have both
CAP_SYS_ADMIN and CAP_SYS_RAWIO, points will makes 1/16.

Superuser's 1/4 bonus worthness is quite a bit dubious, but
considerable. However 1/16 is obviously insane.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   17 ++++++-----------
 1 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b27db90..7f91151 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -199,19 +199,14 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 
 	/*
 	 * Superuser processes are usually more important, so we make it
-	 * less likely that we kill those.
+	 * less likely that we kill those. And we don't want to kill a
+	 * process with direct hardware access. Not only could that mess
+	 * up the hardware, but usually users tend to only have this
+	 * flag set on applications they think of as important.
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
-	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
-		points /= 4;
-
-	/*
-	 * We don't want to kill a process with direct hardware access.
-	 * Not only could that mess up the hardware, but usually users
-	 * tend to only have this flag set on applications they think
-	 * of as important.
-	 */
-	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
 		points /= 4;
 
 	/*
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/9] oom: remove child->mm check from oom_kill_process()
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
                   ` (5 preceding siblings ...)
  2010-06-17  1:51 ` [PATCH 8/9] oom: give the dying task a higher priority KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  2010-06-17  1:51 ` [PATCH 9/9] oom: multi threaded process coredump don't make deadlock KOSAKI Motohiro
  7 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki, Oleg Nesterov
  Cc: kosaki.motohiro


Current "child->mm == p->mm" mean prevent to select vfork() task.
But we don't have any reason to prevent it.

Remvoed.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7f91151..a6bb2d7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -475,8 +475,6 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		list_for_each_entry(child, &t->children, sibling) {
 			unsigned long child_points;
 
-			if (child->mm == p->mm)
-				continue;
 			if (oom_unkillable_task(p, mem, nodemask))
 				continue;
 
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 9/9] oom: multi threaded process coredump don't make deadlock
  2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
                   ` (6 preceding siblings ...)
  2010-06-17  1:51 ` [PATCH 7/9] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
@ 2010-06-17  1:51 ` KOSAKI Motohiro
  7 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-17  1:51 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, David Rientjes, Minchan Kim,
	KAMEZAWA Hiroyuki, Oleg Nesterov
  Cc: kosaki.motohiro


Oleg pointed out current PF_EXITING check is wrong. Because PF_EXITING
is per-thread flag, not per-process flag. He said,

   Two threads, group-leader L and its sub-thread T. T dumps the code.
   In this case both threads have ->mm != NULL, L has PF_EXITING.

   The first problem is, select_bad_process() always return -1 in this
   case (even if the caller is T, this doesn't matter).

   The second problem is that we should add TIF_MEMDIE to T, not L.

I think we can remove this dubious PF_EXITING check. but as first step,
This patch add the protection of multi threaded issue.

Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b2ea2d8..4abc5c1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -353,7 +353,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 		 * the process of exiting and releasing its resources.
 		 * Otherwise we could get an easy OOM deadlock.
 		 */
-		if ((p->flags & PF_EXITING) && p->mm) {
+		if (thread_group_empty(p) && (p->flags & PF_EXITING) && p->mm) {
 			if (p != current)
 				return ERR_PTR(-1UL);
 
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check
  2010-06-17  1:51 ` [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check KOSAKI Motohiro
@ 2010-06-17  4:18   ` David Rientjes
  2010-06-21 11:46     ` KOSAKI Motohiro
  0 siblings, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-17  4:18 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:

> 
> Now, CAP_SYS_RAWIO check is very strange. if the user have both
> CAP_SYS_ADMIN and CAP_SYS_RAWIO, points will makes 1/16.
> 
> Superuser's 1/4 bonus worthness is quite a bit dubious, but
> considerable. However 1/16 is obviously insane.
> 

This is already obsoleted by the heuristic rewrite that is already under 
review.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable
  2010-06-17  1:51 ` [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
@ 2010-06-17  4:19   ` David Rientjes
  2010-06-21 11:45     ` KOSAKI Motohiro
  0 siblings, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-17  4:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:

> When oom_kill_allocating_task is enabled, an argument task of
> oom_kill_process is not selected by select_bad_process(), It's
> just out_of_memory() caller task. It mean the task can be
> unkillable. check it first.
> 

This should be unnecessary if oom_kill_process() appropriately returns 
non-zero when it cannot kill a task.  What problem are you addressing with 
this fix?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] oom: cleanup has_intersects_mems_allowed()
  2010-06-17  1:51 ` [PATCH 5/9] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
@ 2010-06-17  4:20   ` David Rientjes
  2010-06-21 11:45     ` KOSAKI Motohiro
  0 siblings, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-17  4:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:

> 
> Now has_intersects_mems_allowed() has own thread iterate logic, but
> it should use while_each_thread().
> 
> It slightly improve the code readability.
> 
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

I disagree that the renaming of the variables is necessary, please simply 
change the while (tsk != start) to use while_each_thread(tsk, start);

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable
  2010-06-17  4:19   ` David Rientjes
@ 2010-06-21 11:45     ` KOSAKI Motohiro
  2010-06-21 14:00       ` Minchan Kim
  2010-06-21 20:04       ` David Rientjes
  0 siblings, 2 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-21 11:45 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:
> 
> > When oom_kill_allocating_task is enabled, an argument task of
> > oom_kill_process is not selected by select_bad_process(), It's
> > just out_of_memory() caller task. It mean the task can be
> > unkillable. check it first.
> > 
> 
> This should be unnecessary if oom_kill_process() appropriately returns 
> non-zero when it cannot kill a task.  What problem are you addressing with 
> this fix?

oom_kill_process() only check its children are unkillable, not its own.
To add check oom_kill_process() also solve the issue. as my previous patch does.
but Minchan pointed out it's unnecessary. because when !oom_kill_allocating_task
case, we have the same check in select_bad_process(). 




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] oom: cleanup has_intersects_mems_allowed()
  2010-06-17  4:20   ` David Rientjes
@ 2010-06-21 11:45     ` KOSAKI Motohiro
  2010-06-21 20:09       ` David Rientjes
  0 siblings, 1 reply; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-21 11:45 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:
> 
> > 
> > Now has_intersects_mems_allowed() has own thread iterate logic, but
> > it should use while_each_thread().
> > 
> > It slightly improve the code readability.
> > 
> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> I disagree that the renaming of the variables is necessary, please simply 
> change the while (tsk != start) to use while_each_thread(tsk, start);

This is common naming rule of while_each_thread(). please grep.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check
  2010-06-17  4:18   ` David Rientjes
@ 2010-06-21 11:46     ` KOSAKI Motohiro
  0 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-21 11:46 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:
> 
> > 
> > Now, CAP_SYS_RAWIO check is very strange. if the user have both
> > CAP_SYS_ADMIN and CAP_SYS_RAWIO, points will makes 1/16.
> > 
> > Superuser's 1/4 bonus worthness is quite a bit dubious, but
> > considerable. However 1/16 is obviously insane.
> > 
> 
> This is already obsoleted by the heuristic rewrite that is already under 
> review.

Huh? I don't think this patch conflict against your. but I'm ok to pending
awhile.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable
  2010-06-21 11:45     ` KOSAKI Motohiro
@ 2010-06-21 14:00       ` Minchan Kim
  2010-06-21 20:04       ` David Rientjes
  1 sibling, 0 replies; 24+ messages in thread
From: Minchan Kim @ 2010-06-21 14:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: David Rientjes, LKML, linux-mm, Andrew Morton, KAMEZAWA Hiroyuki

On Mon, Jun 21, 2010 at 08:45:45PM +0900, KOSAKI Motohiro wrote:
> > On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:
> > 
> > > When oom_kill_allocating_task is enabled, an argument task of
> > > oom_kill_process is not selected by select_bad_process(), It's
> > > just out_of_memory() caller task. It mean the task can be
> > > unkillable. check it first.
> > > 
> > 
> > This should be unnecessary if oom_kill_process() appropriately returns 
> > non-zero when it cannot kill a task.  What problem are you addressing with 
> > this fix?
> 
> oom_kill_process() only check its children are unkillable, not its own.
> To add check oom_kill_process() also solve the issue. as my previous patch does.
> but Minchan pointed out it's unnecessary. because when !oom_kill_allocating_task
> case, we have the same check in select_bad_process(). 
> 
> 
> 

If kthread doesn't use other process's mm, oom_kill_process can return non-zero.
and it might be no problem. 
but let's consider following case that kthread use use_mm. 

if (oom_kill_allocating_task)
        oom_kill_process
                pr_err("kill process.."); <-- false alarm
                oom_kill_task
                        find_lock_task_mm if kthread use use_mm
                        kill kernel thread

Yes. it's a just theory that kthread use use_mm and is selected as victim.
But although kthread doesn't use use_mm, oom_kill_process emits false alarm.
As a matter of fact, it doesn't kill itself or sacrifice child.

I think victim process selection should be done before calling 
oom_kill_process. oom_kill_process and oom_kill_task's role is  
just to try to kill the process or process's children by best effort 
as function's name.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable
  2010-06-21 11:45     ` KOSAKI Motohiro
  2010-06-21 14:00       ` Minchan Kim
@ 2010-06-21 20:04       ` David Rientjes
  2010-06-30  9:26         ` KOSAKI Motohiro
  1 sibling, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-21 20:04 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Mon, 21 Jun 2010, KOSAKI Motohiro wrote:

> > > When oom_kill_allocating_task is enabled, an argument task of
> > > oom_kill_process is not selected by select_bad_process(), It's
> > > just out_of_memory() caller task. It mean the task can be
> > > unkillable. check it first.
> > > 
> > 
> > This should be unnecessary if oom_kill_process() appropriately returns 
> > non-zero when it cannot kill a task.  What problem are you addressing with 
> > this fix?
> 
> oom_kill_process() only check its children are unkillable, not its own.

No, oom_kill_process() returns the value of oom_kill_task(victim) which is 
non-zero for !victim->mm in mmotm-2010-06-11-16-40 (and 2.6.34 although 
victim == p in that case).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] oom: cleanup has_intersects_mems_allowed()
  2010-06-21 11:45     ` KOSAKI Motohiro
@ 2010-06-21 20:09       ` David Rientjes
  2010-06-30  9:26         ` KOSAKI Motohiro
  0 siblings, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-21 20:09 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Mon, 21 Jun 2010, KOSAKI Motohiro wrote:

> > I disagree that the renaming of the variables is necessary, please simply 
> > change the while (tsk != start) to use while_each_thread(tsk, start);
> 
> This is common naming rule of while_each_thread(). please grep.
> 

I disagree, there's no sense in substituting variable names like "tsk" for 
`p' and removing a very clear and obvious "start" task: it doesn't improve 
code readability.

I'm in favor of changing the while (tsk != start) to 
while_each_thread(tsk, start) which is very trivial to understand and much 
more readable than while_each_thread(p, tsk).  With the latter, it's not 
clear whether `p' or "tsk" is the iterator and which is the constant.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child
  2010-06-17  1:51 ` [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
@ 2010-06-21 20:14   ` David Rientjes
  2010-06-30  9:26     ` KOSAKI Motohiro
  0 siblings, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-21 20:14 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:

> 
> Now, select_bad_process() have PF_KTHREAD check, but oom_kill_process
> doesn't. It mean oom_kill_process() may choose wrong task, especially,
> when the child are using use_mm().
> 

This type of check should be moved to badness(), it will prevent these 
types of tasks from being selected both in select_bad_process() and 
oom_kill_process() if the score it returns is 0.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/9] oom: make oom_unkillable_task() helper function
  2010-06-17  1:51 ` [PATCH 3/9] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
@ 2010-06-21 20:15   ` David Rientjes
  2010-06-30  9:26     ` KOSAKI Motohiro
  0 siblings, 1 reply; 24+ messages in thread
From: David Rientjes @ 2010-06-21 20:15 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Minchan Kim, KAMEZAWA Hiroyuki

On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:

> 
> Now, we have the same task check in two places. Unify it.
> 

We should exclude tasks from select_bad_process() and oom_kill_process() 
by having badness() return a score of 0, just like it's done for 
OOM_DISABLE.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] oom: cleanup has_intersects_mems_allowed()
  2010-06-21 20:09       ` David Rientjes
@ 2010-06-30  9:26         ` KOSAKI Motohiro
  0 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Mon, 21 Jun 2010, KOSAKI Motohiro wrote:
> 
> > > I disagree that the renaming of the variables is necessary, please simply 
> > > change the while (tsk != start) to use while_each_thread(tsk, start);
> > 
> > This is common naming rule of while_each_thread(). please grep.
> > 
> 
> I disagree, there's no sense in substituting variable names like "tsk" for 
> `p' and removing a very clear and obvious "start" task: it doesn't improve 
> code readability.
> 
> I'm in favor of changing the while (tsk != start) to 
> while_each_thread(tsk, start) which is very trivial to understand and much 
> more readable than while_each_thread(p, tsk).  With the latter, it's not 
> clear whether `p' or "tsk" is the iterator and which is the constant.

Heh, I'm ok this. It isn't big matter at all.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable
  2010-06-21 20:04       ` David Rientjes
@ 2010-06-30  9:26         ` KOSAKI Motohiro
  0 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Mon, 21 Jun 2010, KOSAKI Motohiro wrote:
> 
> > > > When oom_kill_allocating_task is enabled, an argument task of
> > > > oom_kill_process is not selected by select_bad_process(), It's
> > > > just out_of_memory() caller task. It mean the task can be
> > > > unkillable. check it first.
> > > > 
> > > 
> > > This should be unnecessary if oom_kill_process() appropriately returns 
> > > non-zero when it cannot kill a task.  What problem are you addressing with 
> > > this fix?
> > 
> > oom_kill_process() only check its children are unkillable, not its own.
> 
> No, oom_kill_process() returns the value of oom_kill_task(victim) which is 
> non-zero for !victim->mm in mmotm-2010-06-11-16-40 (and 2.6.34 although 
> victim == p in that case).

oom_kill_task() only check OOM_DISABLE. and Minchan elaborated more detailed
concern. please see his mail.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/9] oom: make oom_unkillable_task() helper function
  2010-06-21 20:15   ` David Rientjes
@ 2010-06-30  9:26     ` KOSAKI Motohiro
  0 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:
> 
> > 
> > Now, we have the same task check in two places. Unify it.
> > 
> 
> We should exclude tasks from select_bad_process() and oom_kill_process() 
> by having badness() return a score of 0, just like it's done for 
> OOM_DISABLE.

No. again, select_bad_process() have meaningful check order.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child
  2010-06-21 20:14   ` David Rientjes
@ 2010-06-30  9:26     ` KOSAKI Motohiro
  0 siblings, 0 replies; 24+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	KAMEZAWA Hiroyuki

> On Thu, 17 Jun 2010, KOSAKI Motohiro wrote:
> 
> > 
> > Now, select_bad_process() have PF_KTHREAD check, but oom_kill_process
> > doesn't. It mean oom_kill_process() may choose wrong task, especially,
> > when the child are using use_mm().
> > 
> 
> This type of check should be moved to badness(), it will prevent these 
> types of tasks from being selected both in select_bad_process() and 
> oom_kill_process() if the score it returns is 0.

No, process check order of select_bad_process() is certain meaningful.
Only PF_KTHREAD check can move into badness(). Okey, that's fix to
incorrect /proc/<pid>/oom_score issue.




^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2010-06-30  9:26 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-17  1:45 [PATCH 1/9] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 3/9] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
2010-06-21 20:15   ` David Rientjes
2010-06-30  9:26     ` KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 2/9] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
2010-06-21 20:14   ` David Rientjes
2010-06-30  9:26     ` KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 4/9] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
2010-06-17  4:19   ` David Rientjes
2010-06-21 11:45     ` KOSAKI Motohiro
2010-06-21 14:00       ` Minchan Kim
2010-06-21 20:04       ` David Rientjes
2010-06-30  9:26         ` KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 5/9] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
2010-06-17  4:20   ` David Rientjes
2010-06-21 11:45     ` KOSAKI Motohiro
2010-06-21 20:09       ` David Rientjes
2010-06-30  9:26         ` KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 6/9] oom: unify CAP_SYS_RAWIO check into other superuser check KOSAKI Motohiro
2010-06-17  4:18   ` David Rientjes
2010-06-21 11:46     ` KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 8/9] oom: give the dying task a higher priority KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 7/9] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
2010-06-17  1:51 ` [PATCH 9/9] oom: multi threaded process coredump don't make deadlock KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).