linux-kernel.vger.kernel.org archive mirror
* [mmotm 0611][PATCH 00/11] various OOM bugfixes v3
@ 2010-06-30  9:25 KOSAKI Motohiro
  2010-06-30  9:27 ` [PATCH 01/11] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
                   ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:25 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

Hi

Here is the updated series of various OOM fixes.
Most of the fixes are trivial.

One big improvement is Luis's dying-task priority boost patch,
which is important for RT folks.


  oom: don't try to kill oom_unkillable child
  oom: oom_kill_process() doesn't select kthread child
  oom: make oom_unkillable_task() helper function
  oom: oom_kill_process() need to check p is unkillable
  oom: /proc/<pid>/oom_score treat kernel thread honestly
  oom: kill duplicate OOM_DISABLE check
  oom: move OOM_DISABLE check from oom_kill_task to out_of_memory()
  oom: cleanup has_intersects_mems_allowed()
  oom: remove child->mm check from oom_kill_process()
  oom: give the dying task a higher priority
  oom: multi threaded process coredump don't make deadlock

 fs/proc/base.c |    5 ++-
 mm/oom_kill.c  |  100 +++++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 32 deletions(-)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/11] oom: don't try to kill oom_unkillable child
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
@ 2010-06-30  9:27 ` KOSAKI Motohiro
  2010-06-30  9:27 ` [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:27 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

Currently, badness() cares about neither cpusets nor mempolicy. So if
a victim's child process has a disjoint nodemask, the OOM killer might
kill an innocent process.

This patch fixes it.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 26ae697..0aeacb2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -429,7 +429,7 @@ static int oom_kill_task(struct task_struct *p)
 
 static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			    unsigned long points, struct mem_cgroup *mem,
-			    const char *message)
+			    nodemask_t *nodemask, const char *message)
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
@@ -469,6 +469,8 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 				continue;
 			if (mem && !task_in_mem_cgroup(child, mem))
 				continue;
+			if (!has_intersects_mems_allowed(child, nodemask))
+				continue;
 
 			/* badness() returns 0 if the thread is unkillable */
 			child_points = badness(child, uptime.tv_sec);
@@ -519,7 +521,7 @@ retry:
 	if (!p || PTR_ERR(p) == -1UL)
 		goto out;
 
-	if (oom_kill_process(p, gfp_mask, 0, points, mem,
+	if (oom_kill_process(p, gfp_mask, 0, points, mem, NULL,
 				"Memory cgroup out of memory"))
 		goto retry;
 out:
@@ -678,7 +680,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		 * non-zero, current could not be killed so we must fallback to
 		 * the tasklist scan.
 		 */
-		if (!oom_kill_process(current, gfp_mask, order, 0, NULL,
+		if (!oom_kill_process(current, gfp_mask, order, 0, NULL, nodemask,
 				"Out of memory (oom_kill_allocating_task)"))
 			return;
 	}
@@ -697,7 +699,7 @@ retry:
 		panic("Out of memory and no killable processes...\n");
 	}
 
-	if (oom_kill_process(p, gfp_mask, order, points, NULL,
+	if (oom_kill_process(p, gfp_mask, order, points, NULL, nodemask,
 			     "Out of memory"))
 		goto retry;
 	read_unlock(&tasklist_lock);
-- 
1.6.5.2





* [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
  2010-06-30  9:27 ` [PATCH 01/11] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
@ 2010-06-30  9:27 ` KOSAKI Motohiro
  2010-06-30 13:55   ` Minchan Kim
  2010-06-30  9:28 ` [PATCH 03/11] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:27 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

Currently, select_bad_process() has a PF_KTHREAD check, but
oom_kill_process() doesn't. That means oom_kill_process() may choose the
wrong task, especially when a child is a kernel thread using use_mm().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0aeacb2..dc8589e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -467,6 +467,8 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 			if (child->mm == p->mm)
 				continue;
+			if (child->flags & PF_KTHREAD)
+				continue;
 			if (mem && !task_in_mem_cgroup(child, mem))
 				continue;
 			if (!has_intersects_mems_allowed(child, nodemask))
-- 
1.6.5.2





* [PATCH 03/11] oom: make oom_unkillable_task() helper function
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
  2010-06-30  9:27 ` [PATCH 01/11] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
  2010-06-30  9:27 ` [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
@ 2010-06-30  9:28 ` KOSAKI Motohiro
  2010-06-30 14:19   ` Minchan Kim
  2010-06-30  9:29 ` [PATCH 04/11] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:28 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

Currently, we have the same task checks in two places. Unify them.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   33 ++++++++++++++++++++++-----------
 1 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dc8589e..a4a5439 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -101,6 +101,26 @@ static struct task_struct *find_lock_task_mm(struct task_struct *p)
 	return NULL;
 }
 
+/* return true if the task is not adequate as candidate victim task. */
+static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
+			   const nodemask_t *nodemask)
+{
+	if (is_global_init(p))
+		return true;
+	if (p->flags & PF_KTHREAD)
+		return true;
+
+	/* When mem_cgroup_out_of_memory() and p is not member of the group */
+	if (mem && !task_in_mem_cgroup(p, mem))
+		return true;
+
+	/* p may not have freeable memory in nodemask */
+	if (!has_intersects_mems_allowed(p, nodemask))
+		return true;
+
+	return false;
+}
+
 /**
  * badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
@@ -295,12 +315,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 	for_each_process(p) {
 		unsigned long points;
 
-		/* skip the init task and kthreads */
-		if (is_global_init(p) || (p->flags & PF_KTHREAD))
-			continue;
-		if (mem && !task_in_mem_cgroup(p, mem))
-			continue;
-		if (!has_intersects_mems_allowed(p, nodemask))
+		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
 
 		/*
@@ -467,11 +482,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 			if (child->mm == p->mm)
 				continue;
-			if (child->flags & PF_KTHREAD)
-				continue;
-			if (mem && !task_in_mem_cgroup(child, mem))
-				continue;
-			if (!has_intersects_mems_allowed(child, nodemask))
+			if (oom_unkillable_task(p, mem, nodemask))
 				continue;
 
 			/* badness() returns 0 if the thread is unkillable */
-- 
1.6.5.2





* [PATCH 04/11] oom: oom_kill_process() need to check p is unkillable
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (2 preceding siblings ...)
  2010-06-30  9:28 ` [PATCH 03/11] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
@ 2010-06-30  9:29 ` KOSAKI Motohiro
  2010-06-30 13:57   ` Minchan Kim
  2010-06-30  9:30 ` [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly KOSAKI Motohiro
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:29 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

When oom_kill_allocating_task is enabled, the task passed to
oom_kill_process() is not chosen by select_bad_process(); it is simply
the task that called out_of_memory(). That means the task may be
unkillable, so check that first.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index a4a5439..ee00817 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -687,7 +687,8 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	check_panic_on_oom(constraint, gfp_mask, order);
 
 	read_lock(&tasklist_lock);
-	if (sysctl_oom_kill_allocating_task) {
+	if (sysctl_oom_kill_allocating_task &&
+	    !oom_unkillable_task(current, NULL, nodemask)) {
 		/*
 		 * oom_kill_process() needs tasklist_lock held.  If it returns
 		 * non-zero, current could not be killed so we must fallback to
-- 
1.6.5.2





* [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (3 preceding siblings ...)
  2010-06-30  9:29 ` [PATCH 04/11] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
@ 2010-06-30  9:30 ` KOSAKI Motohiro
  2010-06-30 14:03   ` Minchan Kim
  2010-06-30  9:31 ` [PATCH 06/11] oom: kill duplicate OOM_DISABLE check KOSAKI Motohiro
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:30 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

If a kernel thread is using use_mm(), badness() returns a positive
value. This is not a big issue because callers handle it correctly,
but there is one exception: /proc/<pid>/oom_score calls badness()
directly and doesn't check whether the task is a regular process.

Another example: /proc/1/oom_score returns a non-zero value even though
init is unkillable. This inconsistency can confuse administrators a bit.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/base.c |    5 +++--
 mm/oom_kill.c  |   13 +++++++------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 28099a1..56b8d3e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -428,7 +428,8 @@ static const struct file_operations proc_lstats_operations = {
 #endif
 
 /* The badness from the OOM killer */
-unsigned long badness(struct task_struct *p, unsigned long uptime);
+unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
+		      nodemask_t *nodemask, unsigned long uptime);
 static int proc_oom_score(struct task_struct *task, char *buffer)
 {
 	unsigned long points = 0;
@@ -437,7 +438,7 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
 	do_posix_clock_monotonic_gettime(&uptime);
 	read_lock(&tasklist_lock);
 	if (pid_alive(task))
-		points = badness(task, uptime.tv_sec);
+		points = badness(task, NULL, NULL, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
 	return sprintf(buffer, "%lu\n", points);
 }
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ee00817..fcbd21b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -139,8 +139,8 @@ static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
  *    algorithm has been meticulously tuned to meet the principle
  *    of least surprise ... (be careful when you change it)
  */
-
-unsigned long badness(struct task_struct *p, unsigned long uptime)
+unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
+		      const nodemask_t *nodemask, unsigned long uptime)
 {
 	unsigned long points, cpu_time, run_time;
 	struct task_struct *child;
@@ -150,6 +150,8 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 	unsigned long utime;
 	unsigned long stime;
 
+	if (oom_unkillable_task(p, mem, nodemask))
+		return 0;
 	if (oom_adj == OOM_DISABLE)
 		return 0;
 
@@ -351,7 +353,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 		if (p->signal->oom_adj == OOM_DISABLE)
 			continue;
 
-		points = badness(p, uptime.tv_sec);
+		points = badness(p, mem, nodemask, uptime.tv_sec);
 		if (points > *ppoints || !chosen) {
 			chosen = p;
 			*ppoints = points;
@@ -482,11 +484,10 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 			if (child->mm == p->mm)
 				continue;
-			if (oom_unkillable_task(p, mem, nodemask))
-				continue;
 
 			/* badness() returns 0 if the thread is unkillable */
-			child_points = badness(child, uptime.tv_sec);
+			child_points = badness(child, mem, nodemask,
+					       uptime.tv_sec);
 			if (child_points > victim_points) {
 				victim = child;
 				victim_points = child_points;
-- 
1.6.5.2





* [PATCH 06/11] oom: kill duplicate OOM_DISABLE check
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (4 preceding siblings ...)
  2010-06-30  9:30 ` [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly KOSAKI Motohiro
@ 2010-06-30  9:31 ` KOSAKI Motohiro
  2010-06-30 14:10   ` Minchan Kim
  2010-06-30  9:31 ` [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory() KOSAKI Motohiro
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:31 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


select_bad_process() and badness() perform the same OOM_DISABLE check.
This patch removes the duplicate.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index fcbd21b..f5d903f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -350,9 +350,6 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 			*ppoints = ULONG_MAX;
 		}
 
-		if (p->signal->oom_adj == OOM_DISABLE)
-			continue;
-
 		points = badness(p, mem, nodemask, uptime.tv_sec);
 		if (points > *ppoints || !chosen) {
 			chosen = p;
-- 
1.6.5.2





* [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory()
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (5 preceding siblings ...)
  2010-06-30  9:31 ` [PATCH 06/11] oom: kill duplicate OOM_DISABLE check KOSAKI Motohiro
@ 2010-06-30  9:31 ` KOSAKI Motohiro
  2010-06-30 14:20   ` Minchan Kim
  2010-06-30  9:32 ` [PATCH 08/11] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:31 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

Currently, if oom_kill_allocating_task is enabled and current has
OOM_DISABLE set, the following printk in oom_kill_process() is called twice.

    pr_err("%s: Kill process %d (%s) score %lu or sacrifice child\n",
            message, task_pid_nr(p), p->comm, points);

So the OOM_DISABLE check should be done earlier.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f5d903f..75f6dbc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -424,7 +424,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 static int oom_kill_task(struct task_struct *p)
 {
 	p = find_lock_task_mm(p);
-	if (!p || p->signal->oom_adj == OOM_DISABLE) {
+	if (!p) {
 		task_unlock(p);
 		return 1;
 	}
@@ -686,7 +686,8 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 
 	read_lock(&tasklist_lock);
 	if (sysctl_oom_kill_allocating_task &&
-	    !oom_unkillable_task(current, NULL, nodemask)) {
+	    !oom_unkillable_task(current, NULL, nodemask) &&
+	    (current->signal->oom_adj != OOM_DISABLE)) {
 		/*
 		 * oom_kill_process() needs tasklist_lock held.  If it returns
 		 * non-zero, current could not be killed so we must fallback to
-- 
1.6.5.2





* [PATCH 08/11] oom: cleanup has_intersects_mems_allowed()
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (6 preceding siblings ...)
  2010-06-30  9:31 ` [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory() KOSAKI Motohiro
@ 2010-06-30  9:32 ` KOSAKI Motohiro
  2010-06-30  9:32 ` [PATCH 09/11] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:32 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

Currently, has_intersects_mems_allowed() has its own thread-iteration
logic, but it should use while_each_thread().

This slightly improves code readability.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 75f6dbc..136708b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -69,8 +69,8 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 			if (cpuset_mems_allowed_intersects(current, tsk))
 				return true;
 		}
-		tsk = next_thread(tsk);
-	} while (tsk != start);
+	} while_each_thread(start, tsk);
+
 	return false;
 }
 #else
-- 
1.6.5.2





* [PATCH 09/11] oom: remove child->mm check from oom_kill_process()
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (7 preceding siblings ...)
  2010-06-30  9:32 ` [PATCH 08/11] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
@ 2010-06-30  9:32 ` KOSAKI Motohiro
  2010-06-30 14:30   ` Minchan Kim
  2010-06-30  9:33 ` [PATCH 10/11] oom: give the dying task a higher priority KOSAKI Motohiro
  2010-06-30  9:34 ` [PATCH 11/11] oom: multi threaded process coredump don't make deadlock KOSAKI Motohiro
  10 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:32 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro


Currently, the "child->mm == p->mm" check prevents selecting a vfork()
child. But we have no reason not to consider vfork() children.

Remove it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 136708b..b5678bf 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -479,9 +479,6 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		list_for_each_entry(child, &t->children, sibling) {
 			unsigned long child_points;
 
-			if (child->mm == p->mm)
-				continue;
-
 			/* badness() returns 0 if the thread is unkillable */
 			child_points = badness(child, mem, nodemask,
 					       uptime.tv_sec);
-- 
1.6.5.2





* [PATCH 10/11] oom: give the dying task a higher priority
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (8 preceding siblings ...)
  2010-06-30  9:32 ` [PATCH 09/11] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
@ 2010-06-30  9:33 ` KOSAKI Motohiro
  2010-06-30  9:35   ` KOSAKI Motohiro
  2010-07-02 21:49   ` Andrew Morton
  2010-06-30  9:34 ` [PATCH 11/11] oom: multi threaded process coredump don't make deadlock KOSAKI Motohiro
  10 siblings, 2 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:33 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro

From: Luis Claudio R. Goncalves <lclaudio@uudg.org>

In a system under heavy load it was observed that even after the
oom-killer selects a task to die, the task may take a long time to die.

Right after sending a SIGKILL to the task selected by the oom-killer,
this task has its priority increased so that it can exit() soon,
freeing memory. That is accomplished by:

        /*
         * We give our sacrificial lamb high priority and access to
         * all the memory it needs. That way it should be able to
         * exit() and clear out its resources quickly...
         */
 	p->rt.time_slice = HZ;
 	set_tsk_thread_flag(p, TIF_MEMDIE);

It sounds plausible to give the dying task an even higher priority,
to be sure it will be scheduled sooner and free the desired memory.
It was suggested on LKML to use SCHED_FIFO:1, the lowest RT priority,
so that this task won't interfere with any running RT task.

If the dying task is already an RT task, leave it untouched.
Another good suggestion, implemented here, was to avoid boosting the
dying task priority in case of mem_cgroup OOM.

Signed-off-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   34 +++++++++++++++++++++++++++++++---
 1 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b5678bf..0858b18 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -82,6 +82,24 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 #endif /* CONFIG_NUMA */
 
 /*
+ * If this is a system OOM (not a memcg OOM) and the task selected to be
+ * killed is not already running at high (RT) priorities, speed up the
+ * recovery by boosting the dying task to the lowest FIFO priority.
+ * That helps with the recovery and avoids interfering with RT tasks.
+ */
+static void boost_dying_task_prio(struct task_struct *p,
+				  struct mem_cgroup *mem)
+{
+	struct sched_param param = { .sched_priority = 1 };
+
+	if (mem)
+		return;
+
+	if (!rt_task(p))
+		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+}
+
+/*
  * The process p may have detached its own ->mm while exiting or through
  * use_mm(), but one or more of its subthreads may still have a valid
  * pointer.  Return p, or any of its subthreads with a valid ->mm, with
@@ -421,7 +439,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
-static int oom_kill_task(struct task_struct *p)
+static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 {
 	p = find_lock_task_mm(p);
 	if (!p) {
@@ -434,9 +452,17 @@ static int oom_kill_task(struct task_struct *p)
 		K(get_mm_counter(p->mm, MM_FILEPAGES)));
 	task_unlock(p);
 
-	p->rt.time_slice = HZ;
+
 	set_tsk_thread_flag(p, TIF_MEMDIE);
 	force_sig(SIGKILL, p);
+
+	/*
+	 * We give our sacrificial lamb high priority and access to
+	 * all the memory it needs. That way it should be able to
+	 * exit() and clear out its resources quickly...
+	 */
+	boost_dying_task_prio(p, mem);
+
 	return 0;
 }
 #undef K
@@ -460,6 +486,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 */
 	if (p->flags & PF_EXITING) {
 		set_tsk_thread_flag(p, TIF_MEMDIE);
+		boost_dying_task_prio(p, mem);
 		return 0;
 	}
 
@@ -489,7 +516,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		}
 	} while_each_thread(p, t);
 
-	return oom_kill_task(victim);
+	return oom_kill_task(victim, mem);
 }
 
 /*
@@ -670,6 +697,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 */
 	if (fatal_signal_pending(current)) {
 		set_thread_flag(TIF_MEMDIE);
+		boost_dying_task_prio(current, NULL);
 		return;
 	}
 
-- 
1.6.5.2





* [PATCH 11/11] oom: multi threaded process coredump don't make deadlock
  2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
                   ` (9 preceding siblings ...)
  2010-06-30  9:33 ` [PATCH 10/11] oom: give the dying task a higher priority KOSAKI Motohiro
@ 2010-06-30  9:34 ` KOSAKI Motohiro
  10 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:34 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Oleg Nesterov

Oleg pointed out that the current PF_EXITING check is wrong, because
PF_EXITING is a per-thread flag, not a per-process flag. He said,

   Two threads, group-leader L and its sub-thread T. T dumps the core.
   In this case both threads have ->mm != NULL, L has PF_EXITING.

   The first problem is, select_bad_process() always return -1 in this
   case (even if the caller is T, this doesn't matter).

   The second problem is that we should add TIF_MEMDIE to T, not L.

I think we can remove this dubious PF_EXITING check, but as a first step,
this patch adds protection against the multi-threaded issue.

Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0858b18..b04e557 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -360,7 +360,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 		 * the process of exiting and releasing its resources.
 		 * Otherwise we could get an easy OOM deadlock.
 		 */
-		if ((p->flags & PF_EXITING) && p->mm) {
+		if (thread_group_empty(p) && (p->flags & PF_EXITING) && p->mm) {
 			if (p != current)
 				return ERR_PTR(-1UL);
 
-- 
1.6.5.2





* Re: [PATCH 10/11] oom: give the dying task a higher priority
  2010-06-30  9:33 ` [PATCH 10/11] oom: give the dying task a higher priority KOSAKI Motohiro
@ 2010-06-30  9:35   ` KOSAKI Motohiro
  2010-06-30 14:40     ` Minchan Kim
  2010-07-02 21:49   ` Andrew Morton
  1 sibling, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-06-30  9:35 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Minchan Kim,
	David Rientjes, KAMEZAWA Hiroyuki


Sorry, I forgot to cc Luis. Resending.


(intentional full quote)

> From: Luis Claudio R. Goncalves <lclaudio@uudg.org>
> 
> In a system under heavy load it was observed that even after the
> oom-killer selects a task to die, the task may take a long time to die.
> 
> Right after sending a SIGKILL to the task selected by the oom-killer,
> this task has its priority increased so that it can exit() soon,
> freeing memory. That is accomplished by:
> 
>         /*
>          * We give our sacrificial lamb high priority and access to
>          * all the memory it needs. That way it should be able to
>          * exit() and clear out its resources quickly...
>          */
>  	p->rt.time_slice = HZ;
>  	set_tsk_thread_flag(p, TIF_MEMDIE);
> 
> It sounds plausible to give the dying task an even higher priority,
> to be sure it will be scheduled sooner and free the desired memory.
> It was suggested on LKML to use SCHED_FIFO:1, the lowest RT priority,
> so that this task won't interfere with any running RT task.
> 
> If the dying task is already an RT task, leave it untouched.
> Another good suggestion, implemented here, was to avoid boosting the
> dying task priority in case of mem_cgroup OOM.
> 
> Signed-off-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/oom_kill.c |   34 +++++++++++++++++++++++++++++++---
>  1 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b5678bf..0858b18 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -82,6 +82,24 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
>  #endif /* CONFIG_NUMA */
>  
>  /*
> + * If this is a system OOM (not a memcg OOM) and the task selected to be
> + * killed is not already running at high (RT) priorities, speed up the
> + * recovery by boosting the dying task to the lowest FIFO priority.
> + * That helps with the recovery and avoids interfering with RT tasks.
> + */
> +static void boost_dying_task_prio(struct task_struct *p,
> +				  struct mem_cgroup *mem)
> +{
> +	struct sched_param param = { .sched_priority = 1 };
> +
> +	if (mem)
> +		return;
> +
> +	if (!rt_task(p))
> +		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> +}
> +
> +/*
>   * The process p may have detached its own ->mm while exiting or through
>   * use_mm(), but one or more of its subthreads may still have a valid
>   * pointer.  Return p, or any of its subthreads with a valid ->mm, with
> @@ -421,7 +439,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> -static int oom_kill_task(struct task_struct *p)
> +static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
>  {
>  	p = find_lock_task_mm(p);
>  	if (!p) {
> @@ -434,9 +452,17 @@ static int oom_kill_task(struct task_struct *p)
>  		K(get_mm_counter(p->mm, MM_FILEPAGES)));
>  	task_unlock(p);
>  
> -	p->rt.time_slice = HZ;
> +
>  	set_tsk_thread_flag(p, TIF_MEMDIE);
>  	force_sig(SIGKILL, p);
> +
> +	/*
> +	 * We give our sacrificial lamb high priority and access to
> +	 * all the memory it needs. That way it should be able to
> +	 * exit() and clear out its resources quickly...
> +	 */
> +	boost_dying_task_prio(p, mem);
> +
>  	return 0;
>  }
>  #undef K
> @@ -460,6 +486,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	 */
>  	if (p->flags & PF_EXITING) {
>  		set_tsk_thread_flag(p, TIF_MEMDIE);
> +		boost_dying_task_prio(p, mem);
>  		return 0;
>  	}
>  
> @@ -489,7 +516,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  		}
>  	} while_each_thread(p, t);
>  
> -	return oom_kill_task(victim);
> +	return oom_kill_task(victim, mem);
>  }
>  
>  /*
> @@ -670,6 +697,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>  	 */
>  	if (fatal_signal_pending(current)) {
>  		set_thread_flag(TIF_MEMDIE);
> +		boost_dying_task_prio(current, NULL);
>  		return;
>  	}
>  
> -- 
> 1.6.5.2
> 
> 
> 





* Re: [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child
  2010-06-30  9:27 ` [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
@ 2010-06-30 13:55   ` Minchan Kim
  2010-07-01  0:07     ` KOSAKI Motohiro
  0 siblings, 1 reply; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 13:55 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Wed, Jun 30, 2010 at 06:27:52PM +0900, KOSAKI Motohiro wrote:
> Now, select_bad_process() has a PF_KTHREAD check, but oom_kill_process()
> doesn't. That means oom_kill_process() may choose the wrong task,
> especially when a child is using use_mm().

Is it possible for a child to be a kthread even though its parent isn't?

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 04/11] oom: oom_kill_process() need to check p is unkillable
  2010-06-30  9:29 ` [PATCH 04/11] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
@ 2010-06-30 13:57   ` Minchan Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 13:57 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Wed, Jun 30, 2010 at 06:29:22PM +0900, KOSAKI Motohiro wrote:
> When oom_kill_allocating_task is enabled, the task argument of
> oom_kill_process() has not been selected by select_bad_process(); it is
> just the task that called out_of_memory(). That means the task can be
> unkillable. Check that first.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
-- 
Kind regards,
Minchan Kim


* Re: [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly
  2010-06-30  9:30 ` [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly KOSAKI Motohiro
@ 2010-06-30 14:03   ` Minchan Kim
  2010-07-01  0:07     ` KOSAKI Motohiro
  0 siblings, 1 reply; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 14:03 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Wed, Jun 30, 2010 at 06:30:19PM +0900, KOSAKI Motohiro wrote:
> If a kernel thread is using use_mm(), badness() returns a positive
> value. This is not a big issue because the callers handle it correctly,
> but there is one exception: /proc/<pid>/oom_score calls badness()
> directly and doesn't care whether the task is a regular process.
> 
> As another example, /proc/1/oom_score returns a non-zero value, but
> init is unkillable. This incorrectness is a bit confusing to admins.

Hmm. If it is really a problem, could we solve it in proc_oom_score() itself?

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 06/11] oom: kill duplicate OOM_DISABLE check
  2010-06-30  9:31 ` [PATCH 06/11] oom: kill duplicate OOM_DISABLE check KOSAKI Motohiro
@ 2010-06-30 14:10   ` Minchan Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 14:10 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Wed, Jun 30, 2010 at 06:31:00PM +0900, KOSAKI Motohiro wrote:
> 
> select_bad_process() and badness() have the same OOM_DISABLE check.
> This patch kills one of them.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 03/11] oom: make oom_unkillable_task() helper function
  2010-06-30  9:28 ` [PATCH 03/11] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
@ 2010-06-30 14:19   ` Minchan Kim
  2010-07-01  0:07     ` KOSAKI Motohiro
  0 siblings, 1 reply; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 14:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Wed, Jun 30, 2010 at 06:28:37PM +0900, KOSAKI Motohiro wrote:
> Now, we have the same task check in two places. Unify it.
> 
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/oom_kill.c |   33 ++++++++++++++++++++++-----------
>  1 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index dc8589e..a4a5439 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -101,6 +101,26 @@ static struct task_struct *find_lock_task_mm(struct task_struct *p)
>  	return NULL;
>  }
>  
> +/* return true if the task is not adequate as candidate victim task. */
> +static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
> +			   const nodemask_t *nodemask)
> +{
> +	if (is_global_init(p))
> +		return true;
> +	if (p->flags & PF_KTHREAD)
> +		return true;
> +
> +	/* When mem_cgroup_out_of_memory() and p is not member of the group */
> +	if (mem && !task_in_mem_cgroup(p, mem))
> +		return true;
> +
> +	/* p may not have freeable memory in nodemask */
> +	if (!has_intersects_mems_allowed(p, nodemask))
> +		return true;
> +
> +	return false;
> +}
> +

I returned to this patch while reviewing 7/11.
Why didn't you check p->signal->oom_adj == OOM_DISABLE here?
I haven't fully worked out how the code looks after all your patches are
applied, but I think it should be checked in this function, as the
function's name suggests.


-- 
Kind regards,
Minchan Kim


* Re: [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory()
  2010-06-30  9:31 ` [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory() KOSAKI Motohiro
@ 2010-06-30 14:20   ` Minchan Kim
  2010-07-01  0:07     ` KOSAKI Motohiro
  0 siblings, 1 reply; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 14:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Wed, Jun 30, 2010 at 06:31:36PM +0900, KOSAKI Motohiro wrote:
> Now, if oom_kill_allocating_task is enabled and current has
> OOM_DISABLE, the following printk in oom_kill_process() is called twice:
> 
>     pr_err("%s: Kill process %d (%s) score %lu or sacrifice child\n",
>             message, task_pid_nr(p), p->comm, points);
> 
> So the OOM_DISABLE check should happen earlier.

If we check it in oom_unkillable_task, we don't need this patch. 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 09/11] oom: remove child->mm check from oom_kill_process()
  2010-06-30  9:32 ` [PATCH 09/11] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
@ 2010-06-30 14:30   ` Minchan Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 14:30 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
	Andrea Arcangeli

On Wed, Jun 30, 2010 at 06:32:44PM +0900, KOSAKI Motohiro wrote:
> 
> The current "child->mm == p->mm" check is meant to prevent selecting a
> vfork() task. But we don't have any reason not to consider vfork().

I guess "child->mm == p->mm" is about losing the minimal amount of work
done, as the comment says. But frankly speaking, I don't understand it
either. Maybe a "one shot, two kills" problem?

Andrea. Could you explain it?

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 10/11] oom: give the dying task a higher priority
  2010-06-30  9:35   ` KOSAKI Motohiro
@ 2010-06-30 14:40     ` Minchan Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Minchan Kim @ 2010-06-30 14:40 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Luis Claudio R. Goncalves, LKML, linux-mm, Andrew Morton,
	David Rientjes, KAMEZAWA Hiroyuki, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra

On Wed, Jun 30, 2010 at 06:35:08PM +0900, KOSAKI Motohiro wrote:
> 
> Sorry, I forgot to cc Luis. resend.
> 
> 
> (intentional full quote)
> 
> > From: Luis Claudio R. Goncalves <lclaudio@uudg.org>
> > 
> > In a system under heavy load it was observed that even after the
> > oom-killer selects a task to die, the task may take a long time to die.
> > 
> > Right after sending a SIGKILL to the task selected by the oom-killer,
> > the task has its priority increased so that it can exit() soon,
> > freeing memory. That is accomplished by:
> > 
> >         /*
> >          * We give our sacrificial lamb high priority and access to
> >          * all the memory it needs. That way it should be able to
> >          * exit() and clear out its resources quickly...
> >          */
> >  	p->rt.time_slice = HZ;
> >  	set_tsk_thread_flag(p, TIF_MEMDIE);
> > 
> > It sounds plausible to give the dying task an even higher priority to
> > be sure it will be scheduled sooner and free the desired memory. It
> > was suggested on LKML to use SCHED_FIFO:1, the lowest RT priority, so
> > that this task won't interfere with any running RT task.
> > 
> > If the dying task is already an RT task, leave it untouched.
> > Another good suggestion, implemented here, was to avoid boosting the
> > dying task's priority in the case of a mem_cgroup OOM.
> > 
> > Signed-off-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
> > Cc: Minchan Kim <minchan.kim@gmail.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

It seems the code itself doesn't have a problem, so I give my
Reviewed-by. But this patch might break fairness for normal processes in
corner cases. If keeping the system working is more important than
fairness between processes, it makes sense. But the scheduler guys might
have a different opinion.

So at least we need ACKs from the scheduler guys.
Cc'ed Ingo, Peter, and Thomas.

> > ---
> >  mm/oom_kill.c |   34 +++++++++++++++++++++++++++++++---
> >  1 files changed, 31 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index b5678bf..0858b18 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -82,6 +82,24 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
> >  #endif /* CONFIG_NUMA */
> >  
> >  /*
> > + * If this is a system OOM (not a memcg OOM) and the task selected to be
> > + * killed is not already running at high (RT) priorities, speed up the
> > + * recovery by boosting the dying task to the lowest FIFO priority.
> > + * That helps with the recovery and avoids interfering with RT tasks.
> > + */
> > +static void boost_dying_task_prio(struct task_struct *p,
> > +				  struct mem_cgroup *mem)
> > +{
> > +	struct sched_param param = { .sched_priority = 1 };
> > +
> > +	if (mem)
> > +		return;
> > +
> > +	if (!rt_task(p))
> > +		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> > +}
> > +
> > +/*
> >   * The process p may have detached its own ->mm while exiting or through
> >   * use_mm(), but one or more of its subthreads may still have a valid
> >   * pointer.  Return p, or any of its subthreads with a valid ->mm, with
> > @@ -421,7 +439,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
> >  }
> >  
> >  #define K(x) ((x) << (PAGE_SHIFT-10))
> > -static int oom_kill_task(struct task_struct *p)
> > +static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> >  {
> >  	p = find_lock_task_mm(p);
> >  	if (!p) {
> > @@ -434,9 +452,17 @@ static int oom_kill_task(struct task_struct *p)
> >  		K(get_mm_counter(p->mm, MM_FILEPAGES)));
> >  	task_unlock(p);
> >  
> > -	p->rt.time_slice = HZ;
> > +
> >  	set_tsk_thread_flag(p, TIF_MEMDIE);
> >  	force_sig(SIGKILL, p);
> > +
> > +	/*
> > +	 * We give our sacrificial lamb high priority and access to
> > +	 * all the memory it needs. That way it should be able to
> > +	 * exit() and clear out its resources quickly...
> > +	 */
> > +	boost_dying_task_prio(p, mem);
> > +
> >  	return 0;
> >  }
> >  #undef K
> > @@ -460,6 +486,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> >  	 */
> >  	if (p->flags & PF_EXITING) {
> >  		set_tsk_thread_flag(p, TIF_MEMDIE);
> > +		boost_dying_task_prio(p, mem);
> >  		return 0;
> >  	}
> >  
> > @@ -489,7 +516,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> >  		}
> >  	} while_each_thread(p, t);
> >  
> > -	return oom_kill_task(victim);
> > +	return oom_kill_task(victim, mem);
> >  }
> >  
> >  /*
> > @@ -670,6 +697,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> >  	 */
> >  	if (fatal_signal_pending(current)) {
> >  		set_thread_flag(TIF_MEMDIE);
> > +		boost_dying_task_prio(current, NULL);
> >  		return;
> >  	}
> >  
> > -- 
> > 1.6.5.2
> > 
> > 
> > 
> 
> 
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 03/11] oom: make oom_unkillable_task() helper function
  2010-06-30 14:19   ` Minchan Kim
@ 2010-07-01  0:07     ` KOSAKI Motohiro
  0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-01  0:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki

> On Wed, Jun 30, 2010 at 06:28:37PM +0900, KOSAKI Motohiro wrote:
> > Now, we have the same task check in two places. Unify it.
> > 
> > Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> >  mm/oom_kill.c |   33 ++++++++++++++++++++++-----------
> >  1 files changed, 22 insertions(+), 11 deletions(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index dc8589e..a4a5439 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -101,6 +101,26 @@ static struct task_struct *find_lock_task_mm(struct task_struct *p)
> >  	return NULL;
> >  }
> >  
> > +/* return true if the task is not adequate as candidate victim task. */
> > +static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
> > +			   const nodemask_t *nodemask)
> > +{
> > +	if (is_global_init(p))
> > +		return true;
> > +	if (p->flags & PF_KTHREAD)
> > +		return true;
> > +
> > +	/* When mem_cgroup_out_of_memory() and p is not member of the group */
> > +	if (mem && !task_in_mem_cgroup(p, mem))
> > +		return true;
> > +
> > +	/* p may not have freeable memory in nodemask */
> > +	if (!has_intersects_mems_allowed(p, nodemask))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> 
> I returned to this patch while reviewing 7/11.
> Why didn't you check p->signal->oom_adj == OOM_DISABLE here?
> I haven't fully worked out how the code looks after all your patches
> are applied, but I think it should be checked in this function, as the
> function's name suggests.

To preserve select_bad_process() semantics. It has

        for_each_process(p) {
                if (oom_unkillable_task(p, mem, nodemask))
			continue;

                if (thread_group_empty(p) && (p->flags & PF_EXITING) && p->mm) {
                        if (p != current)
                                return ERR_PTR(-1UL);

                        chosen = p;
                        *ppoints = ULONG_MAX;
                }

	        if (oom_adj == OOM_DISABLE)
			continue;

That said, the current OOM killer intends to kill a PF_EXITING process
even if it has OOM_DISABLE. (Practically, it's not a kill; it only gives
an allocation bonus to the PF_EXITING process.)

My trivial-fixes series doesn't intend to make large semantic changes.





* Re: [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child
  2010-06-30 13:55   ` Minchan Kim
@ 2010-07-01  0:07     ` KOSAKI Motohiro
  2010-07-01 13:38       ` Minchan Kim
  0 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-01  0:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki

> On Wed, Jun 30, 2010 at 06:27:52PM +0900, KOSAKI Motohiro wrote:
> > Now, select_bad_process() has a PF_KTHREAD check, but oom_kill_process()
> > doesn't. That means oom_kill_process() may choose the wrong task,
> > especially when a child is using use_mm().
> 
> Is it possible for a child to be a kthread even though its parent isn't?

That usually doesn't happen, but a crappy driver can do any strange
thing freely. As I said, the OOM code should make conservative
assumptions as far as possible.





* Re: [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly
  2010-06-30 14:03   ` Minchan Kim
@ 2010-07-01  0:07     ` KOSAKI Motohiro
  2010-07-01 14:36       ` Minchan Kim
  0 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-01  0:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki

> On Wed, Jun 30, 2010 at 06:30:19PM +0900, KOSAKI Motohiro wrote:
> > If a kernel thread is using use_mm(), badness() returns a positive
> > value. This is not a big issue because the callers handle it
> > correctly, but there is one exception: /proc/<pid>/oom_score calls
> > badness() directly and doesn't care whether the task is a regular
> > process.
> > 
> > As another example, /proc/1/oom_score returns a non-zero value, but
> > init is unkillable. This incorrectness is a bit confusing to admins.
> 
> Hmm. If it is really a problem, could we solve it in proc_oom_score() itself?

Probably not a good idea. From a maintenance view, all OOM-related code
should be gathered in oom_kill.c.
If you dislike adding mess to badness(), I'd like to make a
badness_for_oom_score() or something like it instead.





* Re: [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory()
  2010-06-30 14:20   ` Minchan Kim
@ 2010-07-01  0:07     ` KOSAKI Motohiro
  0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-01  0:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, David Rientjes,
	KAMEZAWA Hiroyuki

> On Wed, Jun 30, 2010 at 06:31:36PM +0900, KOSAKI Motohiro wrote:
> > Now, if oom_kill_allocating_task is enabled and current has
> > OOM_DISABLE, the following printk in oom_kill_process() is called twice:
> > 
> >     pr_err("%s: Kill process %d (%s) score %lu or sacrifice child\n",
> >             message, task_pid_nr(p), p->comm, points);
> > 
> > So the OOM_DISABLE check should happen earlier.
> 
> If we check it in oom_unkillable_task, we don't need this patch. 

Yup, but please read the comment on [3/11].





* Re: [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child
  2010-07-01  0:07     ` KOSAKI Motohiro
@ 2010-07-01 13:38       ` Minchan Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Minchan Kim @ 2010-07-01 13:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Thu, Jul 01, 2010 at 09:07:02AM +0900, KOSAKI Motohiro wrote:
> > On Wed, Jun 30, 2010 at 06:27:52PM +0900, KOSAKI Motohiro wrote:
> > > Now, select_bad_process() has a PF_KTHREAD check, but
> > > oom_kill_process() doesn't. That means oom_kill_process() may choose
> > > the wrong task, especially when a child is using use_mm().
> > 
> > Is it possible for a child to be a kthread even though its parent isn't?
> 
> That usually doesn't happen, but a crappy driver can do any strange
> thing freely. As I said, the OOM code should make conservative
> assumptions as far as possible.

Okay. You changed the check to oom_unkillable_task() in the end.
oom_unkillable_task() is a generic function, so keeping the kthread
check in oom_kill_process() is trivial, I think.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


> 
> 
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly
  2010-07-01  0:07     ` KOSAKI Motohiro
@ 2010-07-01 14:36       ` Minchan Kim
  0 siblings, 0 replies; 36+ messages in thread
From: Minchan Kim @ 2010-07-01 14:36 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki

On Thu, Jul 01, 2010 at 09:07:02AM +0900, KOSAKI Motohiro wrote:
> > On Wed, Jun 30, 2010 at 06:30:19PM +0900, KOSAKI Motohiro wrote:
> > > If a kernel thread is using use_mm(), badness() returns a positive
> > > value. This is not a big issue because the callers handle it
> > > correctly, but there is one exception: /proc/<pid>/oom_score calls
> > > badness() directly and doesn't care whether the task is a regular
> > > process.
> > > 
> > > As another example, /proc/1/oom_score returns a non-zero value, but
> > > init is unkillable. This incorrectness is a bit confusing to admins.
> > 
> > Hmm. If it is really a problem, could we solve it in proc_oom_score() itself?
> 
> Probably not a good idea. From a maintenance view, all OOM-related code
> should be gathered in oom_kill.c.
> If you dislike adding mess to badness(), I'd like to make a badness_for_oom_score()

I am looking forward to seeing your next series.
Thanks, Kosaki. 

P.S.)
If a patch series has more than about 10 patches, I think it would be
better to include, or point to a URL for, an all-at-once patch along
with the series.

In the case of your patches, later patches change earlier ones, which
can make review hard unless the reviewer merges the patches into a tree
to see the final result.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 10/11] oom: give the dying task a higher priority
  2010-06-30  9:33 ` [PATCH 10/11] oom: give the dying task a higher priority KOSAKI Motohiro
  2010-06-30  9:35   ` KOSAKI Motohiro
@ 2010-07-02 21:49   ` Andrew Morton
  2010-07-06  0:49     ` KOSAKI Motohiro
  1 sibling, 1 reply; 36+ messages in thread
From: Andrew Morton @ 2010-07-02 21:49 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Minchan Kim, David Rientjes, KAMEZAWA Hiroyuki,
	Ingo Molnar, Peter Zijlstra

On Wed, 30 Jun 2010 18:33:23 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> +static void boost_dying_task_prio(struct task_struct *p,
> +				  struct mem_cgroup *mem)
> +{
> +	struct sched_param param = { .sched_priority = 1 };
> +
> +	if (mem)
> +		return;
> +
> +	if (!rt_task(p))
> +		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> +}

We can actually make `param' static here.  That saves a teeny bit of
code and a little bit of stack.  The oom-killer can be called when
we're using a lot of stack.

But if we make that change we really should make the param arg to
sched_setscheduler_nocheck() be const.  I did that (and was able to
convert lots of callers to use a static `param') but to complete the
job we'd need to chase through all the security goop, fixing up
security_task_setscheduler() and callees, and I got bored.


 include/linux/sched.h |    2 +-
 kernel/kthread.c      |    2 +-
 kernel/sched.c        |    4 ++--
 kernel/softirq.c      |    4 +++-
 kernel/stop_machine.c |    2 +-
 kernel/workqueue.c    |    2 +-
 6 files changed, 9 insertions(+), 7 deletions(-)

diff -puN kernel/kthread.c~a kernel/kthread.c
--- a/kernel/kthread.c~a
+++ a/kernel/kthread.c
@@ -131,7 +131,7 @@ struct task_struct *kthread_create(int (
 	wait_for_completion(&create.done);
 
 	if (!IS_ERR(create.result)) {
-		struct sched_param param = { .sched_priority = 0 };
+		static struct sched_param param = { .sched_priority = 0 };
 		va_list args;
 
 		va_start(args, namefmt);
diff -puN kernel/workqueue.c~a kernel/workqueue.c
--- a/kernel/workqueue.c~a
+++ a/kernel/workqueue.c
@@ -962,7 +962,7 @@ init_cpu_workqueue(struct workqueue_stru
 
 static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 {
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+	static struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	struct workqueue_struct *wq = cwq->wq;
 	const char *fmt = is_wq_single_threaded(wq) ? "%s" : "%s/%d";
 	struct task_struct *p;
diff -puN kernel/stop_machine.c~a kernel/stop_machine.c
--- a/kernel/stop_machine.c~a
+++ a/kernel/stop_machine.c
@@ -291,7 +291,7 @@ repeat:
 static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
 					   unsigned long action, void *hcpu)
 {
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+	static struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
 	unsigned int cpu = (unsigned long)hcpu;
 	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
 	struct task_struct *p;
diff -puN kernel/sched.c~a kernel/sched.c
--- a/kernel/sched.c~a
+++ a/kernel/sched.c
@@ -4570,7 +4570,7 @@ static bool check_same_owner(struct task
 }
 
 static int __sched_setscheduler(struct task_struct *p, int policy,
-				struct sched_param *param, bool user)
+				const struct sched_param *param, bool user)
 {
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	unsigned long flags;
@@ -4734,7 +4734,7 @@ EXPORT_SYMBOL_GPL(sched_setscheduler);
  * but our caller might not have that capability.
  */
 int sched_setscheduler_nocheck(struct task_struct *p, int policy,
-			       struct sched_param *param)
+			       const struct sched_param *param)
 {
 	return __sched_setscheduler(p, policy, param, false);
 }
diff -puN kernel/softirq.c~a kernel/softirq.c
--- a/kernel/softirq.c~a
+++ a/kernel/softirq.c
@@ -827,7 +827,9 @@ static int __cpuinit cpu_callback(struct
 			     cpumask_any(cpu_online_mask));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN: {
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+		static struct sched_param param = {
+				.sched_priority = MAX_RT_PRIO-1,
+			};
 
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
diff -puN include/linux/sched.h~a include/linux/sched.h
--- a/include/linux/sched.h~a
+++ a/include/linux/sched.h
@@ -1924,7 +1924,7 @@ extern int task_curr(const struct task_s
 extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int,
-				      struct sched_param *);
+				      const struct sched_param *);
 extern struct task_struct *idle_task(int cpu);
 extern struct task_struct *curr_task(int cpu);
 extern void set_curr_task(int cpu, struct task_struct *p);
_



* Re: [PATCH 10/11] oom: give the dying task a higher priority
  2010-07-02 21:49   ` Andrew Morton
@ 2010-07-06  0:49     ` KOSAKI Motohiro
  2010-07-06  0:50       ` [PATCH 1/2] security: add const to security_task_setscheduler() KOSAKI Motohiro
  2010-07-06  0:51       ` [PATCH 2/2] sched: make sched_param arugment static variables in some sched_setscheduler() caller KOSAKI Motohiro
  0 siblings, 2 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-06  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, LKML, linux-mm, Minchan Kim, David Rientjes,
	KAMEZAWA Hiroyuki, Ingo Molnar, Peter Zijlstra, James Morris

(cc to James)

> On Wed, 30 Jun 2010 18:33:23 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > +static void boost_dying_task_prio(struct task_struct *p,
> > +				  struct mem_cgroup *mem)
> > +{
> > +	struct sched_param param = { .sched_priority = 1 };
> > +
> > +	if (mem)
> > +		return;
> > +
> > +	if (!rt_task(p))
> > +		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
> > +}
> 
> We can actually make `param' static here.  That saves a teeny bit of
> code and a little bit of stack.  The oom-killer can be called when
> we're using a lot of stack.
> 
> But if we make that change we really should make the param arg to
> sched_setscheduler_nocheck() be const.  I did that (and was able to
> convert lots of callers to use a static `param') but to complete the
> job we'd need to chase through all the security goop, fixing up
> security_task_setscheduler() and callees, and I got bored.

ok, I've finished this work. I made two patches, one for the security
tree and one for core; the diffstat is below.

I'll post the patches as reply of this mail.


KOSAKI Motohiro (2):
  security: add const to security_task_setscheduler()
  sched: make sched_param arugment static variables in some
    sched_setscheduler() caller

 include/linux/sched.h         |    5 +++--
 include/linux/security.h      |    9 +++++----
 kernel/irq/manage.c           |    4 +++-
 kernel/kthread.c              |    2 +-
 kernel/sched.c                |    6 +++---
 kernel/softirq.c              |    4 +++-
 kernel/stop_machine.c         |    2 +-
 kernel/trace/trace_selftest.c |    2 +-
 kernel/watchdog.c             |    2 +-
 kernel/workqueue.c            |    2 +-
 security/commoncap.c          |    2 +-
 security/security.c           |    4 ++--
 security/selinux/hooks.c      |    3 ++-
 security/smack/smack_lsm.c    |    2 +-
 14 files changed, 28 insertions(+), 21 deletions(-)

 - kosaki


> 
> 
>  include/linux/sched.h |    2 +-
>  kernel/kthread.c      |    2 +-
>  kernel/sched.c        |    4 ++--
>  kernel/softirq.c      |    4 +++-
>  kernel/stop_machine.c |    2 +-
>  kernel/workqueue.c    |    2 +-
>  6 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff -puN kernel/kthread.c~a kernel/kthread.c
> --- a/kernel/kthread.c~a
> +++ a/kernel/kthread.c
> @@ -131,7 +131,7 @@ struct task_struct *kthread_create(int (
>  	wait_for_completion(&create.done);
>  
>  	if (!IS_ERR(create.result)) {
> -		struct sched_param param = { .sched_priority = 0 };
> +		static struct sched_param param = { .sched_priority = 0 };
>  		va_list args;
>  
>  		va_start(args, namefmt);
> diff -puN kernel/workqueue.c~a kernel/workqueue.c
> --- a/kernel/workqueue.c~a
> +++ a/kernel/workqueue.c
> @@ -962,7 +962,7 @@ init_cpu_workqueue(struct workqueue_stru
>  
>  static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
>  {
> -	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> +	static struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
>  	struct workqueue_struct *wq = cwq->wq;
>  	const char *fmt = is_wq_single_threaded(wq) ? "%s" : "%s/%d";
>  	struct task_struct *p;
> diff -puN kernel/stop_machine.c~a kernel/stop_machine.c
> --- a/kernel/stop_machine.c~a
> +++ a/kernel/stop_machine.c
> @@ -291,7 +291,7 @@ repeat:
>  static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
>  					   unsigned long action, void *hcpu)
>  {
> -	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
> +	static struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
>  	unsigned int cpu = (unsigned long)hcpu;
>  	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
>  	struct task_struct *p;
> diff -puN kernel/sched.c~a kernel/sched.c
> --- a/kernel/sched.c~a
> +++ a/kernel/sched.c
> @@ -4570,7 +4570,7 @@ static bool check_same_owner(struct task
>  }
>  
>  static int __sched_setscheduler(struct task_struct *p, int policy,
> -				struct sched_param *param, bool user)
> +				const struct sched_param *param, bool user)
>  {
>  	int retval, oldprio, oldpolicy = -1, on_rq, running;
>  	unsigned long flags;
> @@ -4734,7 +4734,7 @@ EXPORT_SYMBOL_GPL(sched_setscheduler);
>   * but our caller might not have that capability.
>   */
>  int sched_setscheduler_nocheck(struct task_struct *p, int policy,
> -			       struct sched_param *param)
> +			       const struct sched_param *param)
>  {
>  	return __sched_setscheduler(p, policy, param, false);
>  }
> diff -puN kernel/softirq.c~a kernel/softirq.c
> --- a/kernel/softirq.c~a
> +++ a/kernel/softirq.c
> @@ -827,7 +827,9 @@ static int __cpuinit cpu_callback(struct
>  			     cpumask_any(cpu_online_mask));
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN: {
> -		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> +		static struct sched_param param = {
> +				.sched_priority = MAX_RT_PRIO-1,
> +			};
>  
>  		p = per_cpu(ksoftirqd, hotcpu);
>  		per_cpu(ksoftirqd, hotcpu) = NULL;
> diff -puN include/linux/sched.h~a include/linux/sched.h
> --- a/include/linux/sched.h~a
> +++ a/include/linux/sched.h
> @@ -1924,7 +1924,7 @@ extern int task_curr(const struct task_s
>  extern int idle_cpu(int cpu);
>  extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
>  extern int sched_setscheduler_nocheck(struct task_struct *, int,
> -				      struct sched_param *);
> +				      const struct sched_param *);
>  extern struct task_struct *idle_task(int cpu);
>  extern struct task_struct *curr_task(int cpu);
>  extern void set_curr_task(int cpu, struct task_struct *p);
> _
> 




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/2] security: add const to security_task_setscheduler()
  2010-07-06  0:49     ` KOSAKI Motohiro
@ 2010-07-06  0:50       ` KOSAKI Motohiro
  2010-07-06  0:51       ` [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers KOSAKI Motohiro
  1 sibling, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-06  0:50 UTC (permalink / raw)
  To: James Morris
  Cc: kosaki.motohiro, Andrew Morton, LKML, linux-mm, Minchan Kim,
	David Rientjes, KAMEZAWA Hiroyuki, Ingo Molnar, Peter Zijlstra

No security module should modify the sched_param argument of
security_task_setscheduler(). Doing so is not only meaningless, it can
also produce harmful results if the caller passes a static variable.

This patch adds const to the argument.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/security.h   |    9 +++++----
 security/commoncap.c       |    2 +-
 security/security.c        |    4 ++--
 security/selinux/hooks.c   |    3 ++-
 security/smack/smack_lsm.c |    2 +-
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 5bcb395..07e94e5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -74,7 +74,8 @@ extern int cap_file_mmap(struct file *file, unsigned long reqprot,
 extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
 extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			  unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp);
+extern int cap_task_setscheduler(struct task_struct *p, int policy,
+				 const struct sched_param *lp);
 extern int cap_task_setioprio(struct task_struct *p, int ioprio);
 extern int cap_task_setnice(struct task_struct *p, int nice);
 extern int cap_syslog(int type, bool from_file);
@@ -1501,7 +1502,7 @@ struct security_operations {
 	int (*task_getioprio) (struct task_struct *p);
 	int (*task_setrlimit) (unsigned int resource, struct rlimit *new_rlim);
 	int (*task_setscheduler) (struct task_struct *p, int policy,
-				  struct sched_param *lp);
+				  const struct sched_param *lp);
 	int (*task_getscheduler) (struct task_struct *p);
 	int (*task_movememory) (struct task_struct *p);
 	int (*task_kill) (struct task_struct *p,
@@ -1750,8 +1751,8 @@ int security_task_setnice(struct task_struct *p, int nice);
 int security_task_setioprio(struct task_struct *p, int ioprio);
 int security_task_getioprio(struct task_struct *p);
 int security_task_setrlimit(unsigned int resource, struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p,
-				int policy, struct sched_param *lp);
+int security_task_setscheduler(struct task_struct *p, int policy,
+			       const struct sched_param *lp);
 int security_task_getscheduler(struct task_struct *p);
 int security_task_movememory(struct task_struct *p);
 int security_task_kill(struct task_struct *p, struct siginfo *info,
diff --git a/security/commoncap.c b/security/commoncap.c
index 4e01599..b74d460 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -726,7 +726,7 @@ static int cap_safe_nice(struct task_struct *p)
  * specified task, returning 0 if permission is granted, -ve if denied.
  */
 int cap_task_setscheduler(struct task_struct *p, int policy,
-			   struct sched_param *lp)
+			  const struct sched_param *lp)
 {
 	return cap_safe_nice(p);
 }
diff --git a/security/security.c b/security/security.c
index 7461b1b..6151322 100644
--- a/security/security.c
+++ b/security/security.c
@@ -785,8 +785,8 @@ int security_task_setrlimit(unsigned int resource, struct rlimit *new_rlim)
 	return security_ops->task_setrlimit(resource, new_rlim);
 }
 
-int security_task_setscheduler(struct task_struct *p,
-				int policy, struct sched_param *lp)
+int security_task_setscheduler(struct task_struct *p, int policy,
+			       const struct sched_param *lp)
 {
 	return security_ops->task_setscheduler(p, policy, lp);
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 5c9f25b..dd136bd 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3385,7 +3385,8 @@ static int selinux_task_setrlimit(unsigned int resource, struct rlimit *new_rlim
 	return 0;
 }
 
-static int selinux_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp)
+static int selinux_task_setscheduler(struct task_struct *p, int policy,
+				     const struct sched_param *lp)
 {
 	int rc;
 
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 07abc9c..c3336f1 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1280,7 +1280,7 @@ static int smack_task_getioprio(struct task_struct *p)
  * Return 0 if read access is permitted
  */
 static int smack_task_setscheduler(struct task_struct *p, int policy,
-				   struct sched_param *lp)
+				   const struct sched_param *lp)
 {
 	int rc;
 
-- 
1.6.5.2





* [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers
  2010-07-06  0:49     ` KOSAKI Motohiro
  2010-07-06  0:50       ` [PATCH 1/2] security: add const to security_task_setscheduler() KOSAKI Motohiro
@ 2010-07-06  0:51       ` KOSAKI Motohiro
  2010-07-06 22:13         ` Steven Rostedt
  1 sibling, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2010-07-06  0:51 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: kosaki.motohiro, Andrew Morton, LKML, linux-mm, Minchan Kim,
	David Rientjes, KAMEZAWA Hiroyuki, James Morris

Andrew Morton pointed out that almost all sched_setscheduler() callers
use a fixed parameter, which can be made static. This reduces runtime
memory waste a bit.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/sched.h         |    5 +++--
 kernel/irq/manage.c           |    4 +++-
 kernel/kthread.c              |    2 +-
 kernel/sched.c                |    6 +++---
 kernel/softirq.c              |    4 +++-
 kernel/stop_machine.c         |    2 +-
 kernel/trace/trace_selftest.c |    2 +-
 kernel/watchdog.c             |    2 +-
 kernel/workqueue.c            |    2 +-
 9 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index cfcd13e..97b7fe3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1917,9 +1917,10 @@ extern int task_nice(const struct task_struct *p);
 extern int can_nice(const struct task_struct *p, const int nice);
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
-extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
+extern int sched_setscheduler(struct task_struct *, int,
+			      const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int,
-				      struct sched_param *);
+				      const struct sched_param *);
 extern struct task_struct *idle_task(int cpu);
 extern struct task_struct *curr_task(int cpu);
 extern void set_curr_task(int cpu, struct task_struct *p);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 3164ba7..ce1dbb8 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -569,7 +569,9 @@ irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
  */
 static int irq_thread(void *data)
 {
-	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO/2, };
+	static struct sched_param param = {
+		.sched_priority = MAX_USER_RT_PRIO/2,
+	};
 	struct irqaction *action = data;
 	struct irq_desc *desc = irq_to_desc(action->irq);
 	int wake, oneshot = desc->status & IRQ_ONESHOT;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 83911c7..3b55a26 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -131,7 +131,7 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 	wait_for_completion(&create.done);
 
 	if (!IS_ERR(create.result)) {
-		struct sched_param param = { .sched_priority = 0 };
+		static struct sched_param param = { .sched_priority = 0 };
 		va_list args;
 
 		va_start(args, namefmt);
diff --git a/kernel/sched.c b/kernel/sched.c
index 18faf4d..8f2f4b4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4396,7 +4396,7 @@ static bool check_same_owner(struct task_struct *p)
 }
 
 static int __sched_setscheduler(struct task_struct *p, int policy,
-				struct sched_param *param, bool user)
+				const struct sched_param *param, bool user)
 {
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	unsigned long flags;
@@ -4540,7 +4540,7 @@ recheck:
  * NOTE that the task may be already dead.
  */
 int sched_setscheduler(struct task_struct *p, int policy,
-		       struct sched_param *param)
+		       const struct sched_param *param)
 {
 	return __sched_setscheduler(p, policy, param, true);
 }
@@ -4558,7 +4558,7 @@ EXPORT_SYMBOL_GPL(sched_setscheduler);
  * but our caller might not have that capability.
  */
 int sched_setscheduler_nocheck(struct task_struct *p, int policy,
-			       struct sched_param *param)
+			       const struct sched_param *param)
 {
 	return __sched_setscheduler(p, policy, param, false);
 }
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 07b4f1b..be3ec11 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -827,7 +827,9 @@ static int __cpuinit cpu_callback(struct notifier_block *nfb,
 			     cpumask_any(cpu_online_mask));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN: {
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+		static struct sched_param param = {
+			.sched_priority = MAX_RT_PRIO-1
+		};
 
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 70f8d90..da9bb42 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -291,7 +291,7 @@ repeat:
 static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
 					   unsigned long action, void *hcpu)
 {
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+	static struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
 	unsigned int cpu = (unsigned long)hcpu;
 	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
 	struct task_struct *p;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 250e7f9..d2ebe6e 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -560,7 +560,7 @@ trace_selftest_startup_nop(struct tracer *trace, struct trace_array *tr)
 static int trace_wakeup_test_thread(void *data)
 {
 	/* Make this a RT thread, doesn't need to be too high */
-	struct sched_param param = { .sched_priority = 5 };
+	static struct sched_param param = { .sched_priority = 5 };
 	struct completion *x = data;
 
 	sched_setscheduler(current, SCHED_FIFO, &param);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 91b0b26..95d8463 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -310,7 +310,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
  */
 static int watchdog(void *unused)
 {
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+	static struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
 
 	sched_setscheduler(current, SCHED_FIFO, &param);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 327d2de..036939a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -947,7 +947,7 @@ init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
 
 static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 {
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+	static struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	struct workqueue_struct *wq = cwq->wq;
 	const char *fmt = is_wq_single_threaded(wq) ? "%s" : "%s/%d";
 	struct task_struct *p;
-- 
1.6.5.2





* Re: [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers
  2010-07-06  0:51       ` [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers KOSAKI Motohiro
@ 2010-07-06 22:13         ` Steven Rostedt
  2010-07-06 23:12           ` Andrew Morton
  0 siblings, 1 reply; 36+ messages in thread
From: Steven Rostedt @ 2010-07-06 22:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, LKML, linux-mm,
	Minchan Kim, David Rientjes, KAMEZAWA Hiroyuki, James Morris

On Tue, 2010-07-06 at 09:51 +0900, KOSAKI Motohiro wrote:
> Andrew Morton pointed out that almost all sched_setscheduler() callers
> use a fixed parameter, which can be made static. This reduces runtime
> memory waste a bit.

We are replacing runtime waste with permanent waste?

> 
> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



> --- a/kernel/trace/trace_selftest.c
> +++ b/kernel/trace/trace_selftest.c
> @@ -560,7 +560,7 @@ trace_selftest_startup_nop(struct tracer *trace, struct trace_array *tr)
>  static int trace_wakeup_test_thread(void *data)
>  {
>  	/* Make this a RT thread, doesn't need to be too high */
> -	struct sched_param param = { .sched_priority = 5 };
> +	static struct sched_param param = { .sched_priority = 5 };
>  	struct completion *x = data;
>  

This is a thread that runs on boot up to test the sched_wakeup tracer.
Then it is deleted and all memory is reclaimed.

Thus, this patch just took memory that was usable at run time and
removed it permanently.

Please Cc me on all tracing changes.

-- Steve




* Re: [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers
  2010-07-06 22:13         ` Steven Rostedt
@ 2010-07-06 23:12           ` Andrew Morton
  2010-07-06 23:49             ` Steven Rostedt
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Morton @ 2010-07-06 23:12 UTC (permalink / raw)
  To: rostedt
  Cc: KOSAKI Motohiro, Ingo Molnar, Peter Zijlstra, LKML, linux-mm,
	Minchan Kim, David Rientjes, KAMEZAWA Hiroyuki, James Morris

On Tue, 06 Jul 2010 18:13:58 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 2010-07-06 at 09:51 +0900, KOSAKI Motohiro wrote:
> > Andrew Morton pointed out that almost all sched_setscheduler() callers
> > use a fixed parameter, which can be made static. This reduces runtime
> > memory waste a bit.
> 
> We are replacing runtime waste with permanent waste?

Confused.  kernel/trace/ appears to waste resources by design, so what's
the issue?

I don't think this change will cause more waste.  It'll consume 4 bytes
of .data and will save a little more .text.

> > 
> > Reported-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> 
> 
> > --- a/kernel/trace/trace_selftest.c
> > +++ b/kernel/trace/trace_selftest.c
> > @@ -560,7 +560,7 @@ trace_selftest_startup_nop(struct tracer *trace, struct trace_array *tr)
> >  static int trace_wakeup_test_thread(void *data)
> >  {
> >  	/* Make this a RT thread, doesn't need to be too high */
> > -	struct sched_param param = { .sched_priority = 5 };
> > +	static struct sched_param param = { .sched_priority = 5 };
> >  	struct completion *x = data;
> >  
> 
> This is a thread that runs on boot up to test the sched_wakeup tracer.
> Then it is deleted and all memory is reclaimed.
> 
> Thus, this patch just took memory that was usable at run time and
> removed it permanently.
> 
> Please Cc me on all tracing changes.

Well if we're so worried about resource wastage then how about making
all boot-time-only text and data reside in __init and __initdata
sections rather than hanging around uselessly in memory for ever?

Only that's going to be hard because we went and added pointers into
.init.text from .data due to `struct tracer.selftest', which will cause
a storm of section mismatch warnings.  Doh, should have invoked the
selftests from initcalls.  That might open the opportunity of running
the selftests by modprobing the selftest module, too.

And I _do_ wish the selftest module was modprobeable, rather than this
monstrosity:

#ifdef CONFIG_FTRACE_SELFTEST
/* Let selftest have access to static functions in this file */
#include "trace_selftest.c"
#endif

Really?  Who had a tastebudectomy over there?  At least call it
trace_selftest.inc or something, so poor schmucks don't go scrabbling
around wondering "how the hell does this thing get built oh no they
didn't really go and #include it did they?"




* Re: [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers
  2010-07-06 23:12           ` Andrew Morton
@ 2010-07-06 23:49             ` Steven Rostedt
  2010-07-07  0:02               ` Andrew Morton
  0 siblings, 1 reply; 36+ messages in thread
From: Steven Rostedt @ 2010-07-06 23:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Ingo Molnar, Peter Zijlstra, LKML, linux-mm,
	Minchan Kim, David Rientjes, KAMEZAWA Hiroyuki, James Morris

On Tue, 2010-07-06 at 16:12 -0700, Andrew Morton wrote:

> Well if we're so worried about resource wastage then how about making
> all boot-time-only text and data reside in __init and __initdata
> sections rather than hanging around uselessly in memory for ever?

That would be a patch I would like :-)

I could probably do that when I get some time.

> 
> Only that's going to be hard because we went and added pointers into
> .init.text from .data due to `struct tracer.selftest', which will cause
> a storm of section mismatch warnings.  Doh, should have invoked the
> selftests from initcalls.  That might open the opportunity of running
> the selftests by modprobing the selftest module, too.

They are called by initcalls. The initcalls register the tracers and
that is the time we call the selftest. No other time.

Is there a way that we set up a function pointer to let the section
checks know that it is only called at bootup?

> 
> And I _do_ wish the selftest module was modprobeable, rather than this
> monstrosity:

The selftests are done by individual tracers at boot up. It would be
hard to modprobe them at that time.


> #ifdef CONFIG_FTRACE_SELFTEST
> /* Let selftest have access to static functions in this file */
> #include "trace_selftest.c"
> #endif
> 
> Really?  Who had a tastebudectomy over there?  At least call it
> trace_selftest.inc or something, so poor schmucks don't go scrabbling
> around wondering "how the hell does this thing get built oh no they
> didn't really go and #include it did they?"


Well this is also the way sched.c adds all its extra code. Making it
trace_selftest.inc would make it hard to know what the hell it was. And
also hard for editors to know what type of file it is, or things can be
missed with a 'find . -name "*.[ch]" | xargs grep blahblah'

Yes, the self tests are ugly and can probably go with an overhaul. Since
we are trying to get away from the tracer plugins anyway, they will
start disappearing when the plugins do.

We should have some main selftests anyway: the TRACE_EVENT tests (which
are not even in the trace_selftest.c file), the function tests (which
currently are), as well as the latency testers.

The trace_selftest.c should eventually be replaced with more compact
tests for the specific types of tracing.

-- Steve




* Re: [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers
  2010-07-06 23:49             ` Steven Rostedt
@ 2010-07-07  0:02               ` Andrew Morton
  2010-07-07 19:43                 ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Morton @ 2010-07-07  0:02 UTC (permalink / raw)
  To: rostedt
  Cc: KOSAKI Motohiro, Ingo Molnar, Peter Zijlstra, LKML, linux-mm,
	Minchan Kim, David Rientjes, KAMEZAWA Hiroyuki, James Morris

On Tue, 06 Jul 2010 19:49:47 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 2010-07-06 at 16:12 -0700, Andrew Morton wrote:
> 
> > Well if we're so worried about resource wastage then how about making
> > all boot-time-only text and data reside in __init and __initdata
> > sections rather than hanging around uselessly in memory for ever?
> 
> That would be a patch I would like :-)
> 
> I could probably do that when I get some time.
> 
> > 
> > Only that's going to be hard because we went and added pointers into
> > .init.text from .data due to `struct tracer.selftest', which will cause
> > a storm of section mismatch warnings.  Doh, should have invoked the
> > selftests from initcalls.  That might open the opportunity of running
> > the selftests by modprobing the selftest module, too.
> 
> They are called by initcalls. The initcalls register the tracers and
> that is the time we call the selftest. No other time.

It should all be __init!

> Is there a way that we set up a function pointer to let the section
> checks know that it is only called at bootup?

<sticks his nose in modpost.c for the first time>

There are various whitelisting hacks in there based on the name of the
offending symbol.  Search term: DEFAULT_SYMBOL_WHITE_LIST.

It'd be cleaner to just zap the tracer.selftest field altogether and
run the tests from initcalls if possible?

> > 
> > And I _do_ wish the selftest module was modprobeable, rather than this
> > monstrosity:
> 
> The selftests are done by individual tracers at boot up. It would be
> hard to modprobe them at that time.

No, if tracer_selftest.o was linked into vmlinux then the tests get run
within do_initcalls().  If tracer_selftest.o is a module, then the tests
get run at modprobe-time.  The latter option may not be terribly useful
but it comes basically for free as a reward for doing stuff correctly.

> 
> > #ifdef CONFIG_FTRACE_SELFTEST
> > /* Let selftest have access to static functions in this file */
> > #include "trace_selftest.c"
> > #endif
> > 
> > Really?  Who had a tastebudectomy over there?  At least call it
> > trace_selftest.inc or something, so poor schmucks don't go scrabbling
> > around wondering "how the hell does this thing get built oh no they
> > didn't really go and #include it did they?"
> 
> 
> Well this is also the way sched.c adds all its extra code.

The sched.c hack sucks too.

> Making it
> trace_selftest.inc would make it hard to know what the hell it was.

trace_selftest.i_really_suck?

> And
> also hard for editors to know what type of file it is, or things can be
> missed with a 'find . -name "*.[ch]" | xargs grep blahblah'

Well bad luck.  Of _course_ it makes a mess.  It's already a mess.

How's about just removing the `static' from whichever symbols
tracer_selftest needs?  That'd surely be better than #including a .c
file.

> Yes, the self tests are ugly and can probably go with an overhaul. Since
> we are trying to get away from the tracer plugins anyway, they will
> start disappearing when the plugins do.
> 
> We should have some main selftests anyway. Those are for the TRACE_EVENT
> tests (which are not even in the trace_selftest.c file, and the function
> testing which currently are, as well as the latency testers.
> 
> The trace_selftest.c should eventually be replaced with more compact
> tests for the specific types of tracing.



* Re: [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers
  2010-07-07  0:02               ` Andrew Morton
@ 2010-07-07 19:43                 ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2010-07-07 19:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rostedt, KOSAKI Motohiro, Ingo Molnar, LKML, linux-mm,
	Minchan Kim, David Rientjes, KAMEZAWA Hiroyuki, James Morris

On Tue, 2010-07-06 at 17:02 -0700, Andrew Morton wrote:
> > Well this is also the way sched.c adds all its extra code.
> 
> The sched.c hack sucks too. 

Agreed, moving things to kernel/sched/ and adding some internal.h thing
could cure that, but I simply haven't gotten around to cleaning that
up..


end of thread, other threads:[~2010-07-07 19:44 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-30  9:25 [mmotm 0611][PATCH 00/11] various OOM bugfixes v3 KOSAKI Motohiro
2010-06-30  9:27 ` [PATCH 01/11] oom: don't try to kill oom_unkillable child KOSAKI Motohiro
2010-06-30  9:27 ` [PATCH 02/11] oom: oom_kill_process() doesn't select kthread child KOSAKI Motohiro
2010-06-30 13:55   ` Minchan Kim
2010-07-01  0:07     ` KOSAKI Motohiro
2010-07-01 13:38       ` Minchan Kim
2010-06-30  9:28 ` [PATCH 03/11] oom: make oom_unkillable_task() helper function KOSAKI Motohiro
2010-06-30 14:19   ` Minchan Kim
2010-07-01  0:07     ` KOSAKI Motohiro
2010-06-30  9:29 ` [PATCH 04/11] oom: oom_kill_process() need to check p is unkillable KOSAKI Motohiro
2010-06-30 13:57   ` Minchan Kim
2010-06-30  9:30 ` [PATCH 05/11] oom: /proc/<pid>/oom_score treat kernel thread honestly KOSAKI Motohiro
2010-06-30 14:03   ` Minchan Kim
2010-07-01  0:07     ` KOSAKI Motohiro
2010-07-01 14:36       ` Minchan Kim
2010-06-30  9:31 ` [PATCH 06/11] oom: kill duplicate OOM_DISABLE check KOSAKI Motohiro
2010-06-30 14:10   ` Minchan Kim
2010-06-30  9:31 ` [PATCH 07/11] oom: move OOM_DISABLE check from oom_kill_task to out_of_memory() KOSAKI Motohiro
2010-06-30 14:20   ` Minchan Kim
2010-07-01  0:07     ` KOSAKI Motohiro
2010-06-30  9:32 ` [PATCH 08/11] oom: cleanup has_intersects_mems_allowed() KOSAKI Motohiro
2010-06-30  9:32 ` [PATCH 09/11] oom: remove child->mm check from oom_kill_process() KOSAKI Motohiro
2010-06-30 14:30   ` Minchan Kim
2010-06-30  9:33 ` [PATCH 10/11] oom: give the dying task a higher priority KOSAKI Motohiro
2010-06-30  9:35   ` KOSAKI Motohiro
2010-06-30 14:40     ` Minchan Kim
2010-07-02 21:49   ` Andrew Morton
2010-07-06  0:49     ` KOSAKI Motohiro
2010-07-06  0:50       ` [PATCH 1/2] security: add const to security_task_setscheduler() KOSAKI Motohiro
2010-07-06  0:51       ` [PATCH 2/2] sched: make sched_param argument static variables in some sched_setscheduler() callers KOSAKI Motohiro
2010-07-06 22:13         ` Steven Rostedt
2010-07-06 23:12           ` Andrew Morton
2010-07-06 23:49             ` Steven Rostedt
2010-07-07  0:02               ` Andrew Morton
2010-07-07 19:43                 ` Peter Zijlstra
2010-06-30  9:34 ` [PATCH 11/11] oom: multi threaded process coredump don't make deadlock KOSAKI Motohiro
