* [PATCH v2 0/5] Fix oom killer not working at all if the system has many gigabytes of memory (aka the issue CAI found)
@ 2011-05-20  8:00 ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2011-05-20  8:00 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, minchan.kim, oleg
  Cc: kosaki.motohiro


CAI Qian reported that the current oom logic doesn't work at all on his
16GB RAM machine: the oom killer killed all of the system daemons first,
and his system stopped responding.

A brief excerpt of the log is below.

> Out of memory: Kill process 1175 (dhclient) score 1 or sacrifice child
> Out of memory: Kill process 1247 (rsyslogd) score 1 or sacrifice child
> Out of memory: Kill process 1284 (irqbalance) score 1 or sacrifice child
> Out of memory: Kill process 1303 (rpcbind) score 1 or sacrifice child
> Out of memory: Kill process 1321 (rpc.statd) score 1 or sacrifice child
> Out of memory: Kill process 1333 (mdadm) score 1 or sacrifice child
> Out of memory: Kill process 1365 (rpc.idmapd) score 1 or sacrifice child
> Out of memory: Kill process 1403 (dbus-daemon) score 1 or sacrifice child
> Out of memory: Kill process 1438 (acpid) score 1 or sacrifice child
> Out of memory: Kill process 1447 (hald) score 1 or sacrifice child
> Out of memory: Kill process 1447 (hald) score 1 or sacrifice child
> Out of memory: Kill process 1487 (hald-addon-inpu) score 1 or sacrifice child
> Out of memory: Kill process 1488 (hald-addon-acpi) score 1 or sacrifice child
> Out of memory: Kill process 1507 (automount) score 1 or sacrifice child


There are three problems:

1) If two processes have the same oom score, we should kill the younger
process, but the current logic kills the older one. Typically the oldest
processes are system daemons.
2) The current logic uses 'unsigned int' for the internal score
calculation (more precisely, it only uses the values 0-1000). This very
low precision produces many identical oom scores, so an ineligible
process can be killed; the sketch below makes the rounding concrete.
3) The current logic gives root processes a bonus of 3% of system RAM,
which is obviously too big if you have plenty of memory. On a 16GB
machine a fork-bomb process gets a ~500MB OOM-immunity bonus, so the
fork bomb is never killed.
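
For illustration, a minimal userspace sketch of problem 2 (this is not
kernel code, and the machine size and daemon RSS values are made up):

#include <stdio.h>

int main(void)
{
	/* assume 16GB RAM + 8GB swap, 4KB pages */
	unsigned long totalpages = (16UL + 8UL) * 1024 * 1024 / 4;
	/* resident set sizes of three typical daemons, in pages */
	unsigned long rss[] = { 441, 1100, 2560 };

	for (int i = 0; i < 3; i++) {
		/* old scheme: normalize to 0-1000, round 0 up to 1 */
		unsigned long points = rss[i] * 1000 / totalpages;
		if (points == 0)
			points = 1;
		printf("rss=%5lu pages -> oom score %lu\n", rss[i], points);
	}
	return 0;
}

All three daemons print "oom score 1", so the choice of victim is
decided by tasklist order, not by memory usage.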


KOSAKI Motohiro (5):
  oom: improve dump_tasks() show items
  oom: kill younger process first
  oom: oom-killer don't use proportion of system-ram internally
  oom: don't kill random process
  oom: merge oom_kill_process() with oom_kill_task()

 fs/proc/base.c        |   13 ++-
 include/linux/oom.h   |   10 +--
 include/linux/sched.h |   11 +++
 mm/oom_kill.c         |  201 +++++++++++++++++++++++++++----------------------
 4 files changed, 135 insertions(+), 100 deletions(-)

-- 
1.7.3.1




* [PATCH 1/5] oom: improve dump_tasks() show items
  2011-05-20  8:00 ` KOSAKI Motohiro
@ 2011-05-20  8:01   ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2011-05-20  8:01 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, minchan.kim, oleg

Recently, the oom internal logic was dramatically changed, so
dump_tasks() no longer shows enough information for bug-report
analysis: it prints some now-meaningless items and omits some
oom-score-related ones.

This patch adapts the displayed fields to the new oom logic.

details
--------
removed:  pid (we always kill a whole process; the thread id isn't needed)
          signal->oom_adj (we no longer use it internally)
          cpu (we no longer use it)
added:    ppid (we often kill a sacrificed child process)
          swap (it is accounted now)
modified: rss (now accounts mm->nr_ptes too)

<old>
[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 3886]     0  3886     2893      441   1       0             0 bash
[ 3905]     0  3905    29361    25833   0       0             0 memtoy

<new>
[   pid]   ppid   uid total_vm      rss     swap score_adj name
[   417]      1     0     3298       12      184     -1000 udevd
[   830]      1     0     1776       11       16         0 system-setup-ke
[   973]      1     0    61179       35      116         0 rsyslogd
[  1733]   1732     0  1052337   958582        0         0 memtoy

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f52e85c..43d32ae 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -355,7 +355,7 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 	struct task_struct *p;
 	struct task_struct *task;

-	pr_info("[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name\n");
+	pr_info("[   pid]   ppid   uid total_vm      rss     swap score_adj name\n");
 	for_each_process(p) {
 		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
@@ -370,11 +370,14 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 			continue;
 		}

-		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
-			task->pid, task_uid(task), task->tgid,
-			task->mm->total_vm, get_mm_rss(task->mm),
-			task_cpu(task), task->signal->oom_adj,
-			task->signal->oom_score_adj, task->comm);
+		pr_info("[%6d] %6d %5d %8lu %8lu %8lu %9d %s\n",
+			task_tgid_nr(task), task_tgid_nr(task->real_parent),
+			task_uid(task),
+			task->mm->total_vm,
+			get_mm_rss(task->mm) + task->mm->nr_ptes,
+			get_mm_counter(task->mm, MM_SWAPENTS),
+			task->signal->oom_score_adj,
+			task->comm);
 		task_unlock(task);
 	}
 }
-- 
1.7.3.1





* [PATCH 2/5] oom: kill younger process first
  2011-05-20  8:00 ` KOSAKI Motohiro
@ 2011-05-20  8:02   ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2011-05-20  8:02 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, minchan.kim, oleg

This patch introduces do_each_thread_reverse() and makes
select_bad_process() use it. The benefits are twofold: 1) when two
processes have the same oom score, the oom killer now kills the younger
one, which is usually less important; 2) younger tasks often have
PF_EXITING set, because shell scripts create many short-lived
processes, and a reverse-order search finds them faster.
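
For reference, a minimal sketch (not part of the patch) of how a caller
uses the new macro. Like do_each_thread(), it pairs with
while_each_thread(); per the comment added below, the caller must hold
tasklist_lock:

	struct task_struct *g, *p;

	read_lock(&tasklist_lock);	/* rcu_read_lock() is not enough */
	do_each_thread_reverse(g, p) {
		/* tasks are visited youngest-first, so score ties and
		 * short-lived PF_EXITING tasks are seen before the
		 * long-lived system daemons */
	} while_each_thread(g, p);
	read_unlock(&tasklist_lock);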

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/sched.h |   11 +++++++++++
 mm/oom_kill.c         |    2 +-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 013314a..3698379 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2194,6 +2194,9 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
 #define next_task(p) \
 	list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

+#define prev_task(p) \
+	list_entry((p)->tasks.prev, struct task_struct, tasks)
+
 #define for_each_process(p) \
 	for (p = &init_task ; (p = next_task(p)) != &init_task ; )

@@ -2206,6 +2209,14 @@ extern bool current_is_single_threaded(void);
 #define do_each_thread(g, t) \
 	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do

+/*
+ * Similar to do_each_thread(), but with two differences:
+ *  - it traverses tasks in reverse order (i.e. younger to older)
+ *  - the caller must hold tasklist_lock; rcu_read_lock() isn't enough
+ */
+#define do_each_thread_reverse(g, t) \
+	for (g = t = &init_task ; (g = t = prev_task(g)) != &init_task ; ) do
+
 #define while_each_thread(g, t) \
 	while ((t = next_thread(t)) != g)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 43d32ae..e6a6c6f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -282,7 +282,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 	struct task_struct *chosen = NULL;
 	*ppoints = 0;

-	do_each_thread(g, p) {
+	do_each_thread_reverse(g, p) {
 		unsigned int points;

 		if (!p->mm)
-- 
1.7.3.1





* [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-20  8:00 ` KOSAKI Motohiro
@ 2011-05-20  8:03   ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2011-05-20  8:03 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, minchan.kim, oleg

CAI Qian reported that his kernel hung up when he ran a fork-intensive
workload and then invoked the oom-killer.

The problem is that the current oom calculation uses a 0-1000
normalized value (the unit is a permillage of system RAM). Its low
precision produces many identical oom scores. IOW, in his case every
process had an oom score smaller than 1, and the internal calculation
rounded it up to 1.

Thus the oom-killer kills ineligible processes. This regression was
caused by commit a63d83f427 (oom: badness heuristic rewrite).

The solution is for the internal calculation to use a number of pages
instead of a permillage of system RAM, and to convert to a permillage
value only at display time.

This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj),
even though there is a lot I dislike about the current logic.
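
For scale, a worked example with made-up numbers (16GB RAM + 8GB swap
and 4KB pages, so totalpages is about 6291456). One oom_score_adj unit
still corresponds to 0.1% of totalpages; it is just applied at page
granularity and clamped, rather than added to a 0-1000 value:

	/* hypothetical task: ~10MB resident, oom_score_adj = +500 */
	unsigned long totalpages = 6291456;
	unsigned long points = 2560;
	unsigned long score_adj = 500 * (totalpages / 1000);	/* 3145500 */

	points += score_adj;	/* 3148060 pages, roughly half of
				 * totalpages -- +500 still means "pretend
				 * it uses 50% of memory", exactly as it
				 * did on the old 0-1000 scale */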

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/base.c      |   13 ++++++----
 include/linux/oom.h |    7 +----
 mm/oom_kill.c       |   60 +++++++++++++++++++++++++++++++++-----------------
 3 files changed, 49 insertions(+), 31 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index dfa5327..d6b0424 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -476,14 +476,17 @@ static const struct file_operations proc_lstats_operations = {

 static int proc_oom_score(struct task_struct *task, char *buffer)
 {
-	unsigned long points = 0;
+	unsigned long points;
+	unsigned long ratio = 0;
+	unsigned long totalpages = totalram_pages + total_swap_pages + 1;

 	read_lock(&tasklist_lock);
-	if (pid_alive(task))
-		points = oom_badness(task, NULL, NULL,
-					totalram_pages + total_swap_pages);
+	if (pid_alive(task)) {
+		points = oom_badness(task, NULL, NULL, totalpages);
+		ratio = points * 1000 / totalpages;
+	}
 	read_unlock(&tasklist_lock);
-	return sprintf(buffer, "%lu\n", points);
+	return sprintf(buffer, "%lu\n", ratio);
 }

 struct limit_names {
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 5e3aa83..0f5b588 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -40,7 +40,8 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };

-extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
+/* The badness from the OOM killer */
+extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
@@ -62,10 +63,6 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }

-/* The badness from the OOM killer */
-extern unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
-		      const nodemask_t *nodemask, unsigned long uptime);
-
 extern struct task_struct *find_lock_task_mm(struct task_struct *p);

 /* sysctls */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index e6a6c6f..8bbc3df 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -132,10 +132,12 @@ static bool oom_unkillable_task(struct task_struct *p,
  * predictable as possible.  The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
-unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
+unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 		      const nodemask_t *nodemask, unsigned long totalpages)
 {
-	int points;
+	unsigned long points;
+	unsigned long score_adj = 0;
+

 	if (oom_unkillable_task(p, mem, nodemask))
 		return 0;
@@ -160,7 +162,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 */
 	if (p->flags & PF_OOM_ORIGIN) {
 		task_unlock(p);
-		return 1000;
+		return ULONG_MAX;
 	}

 	/*
@@ -176,33 +178,49 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 */
 	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
 	points += get_mm_counter(p->mm, MM_SWAPENTS);
-
-	points *= 1000;
-	points /= totalpages;
 	task_unlock(p);

 	/*
 	 * Root processes get 3% bonus, just like the __vm_enough_memory()
 	 * implementation used by LSMs.
+	 *
+	 * XXX: Too large a bonus if, for example, the system has terabytes of memory.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= 30;
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
+		if (points >= totalpages / 32)
+			points -= totalpages / 32;
+		else
+			points = 0;
+	}

 	/*
 	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
 	 * either completely disable oom killing or always prefer a certain
 	 * task.
 	 */
-	points += p->signal->oom_score_adj;
+	if (p->signal->oom_score_adj >= 0) {
+		score_adj = p->signal->oom_score_adj * (totalpages / 1000);
+		if (ULONG_MAX - points >= score_adj)
+			points += score_adj;
+		else
+			points = ULONG_MAX;
+	} else {
+		score_adj = -p->signal->oom_score_adj * (totalpages / 1000);
+		if (points >= score_adj)
+			points -= score_adj;
+		else
+			points = 0;
+	}

 	/*
 	 * Never return 0 for an eligible task that may be killed since it's
 	 * possible that no single user task uses more than 0.1% of memory and
 	 * no single admin tasks uses more than 3.0%.
 	 */
-	if (points <= 0)
-		return 1;
-	return (points < 1000) ? points : 1000;
+	if (!points)
+		points = 1;
+
+	return points;
 }

 /*
@@ -274,7 +292,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
  *
  * (not docbooked, we don't want this one cluttering up the manual)
  */
-static struct task_struct *select_bad_process(unsigned int *ppoints,
+static struct task_struct *select_bad_process(unsigned long *ppoints,
 		unsigned long totalpages, struct mem_cgroup *mem,
 		const nodemask_t *nodemask)
 {
@@ -283,7 +301,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 	*ppoints = 0;

 	do_each_thread_reverse(g, p) {
-		unsigned int points;
+		unsigned long points;

 		if (!p->mm)
 			continue;
@@ -314,7 +332,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 			 */
 			if (p == current) {
 				chosen = p;
-				*ppoints = 1000;
+				*ppoints = ULONG_MAX;
 			} else {
 				/*
 				 * If this task is not being ptraced on exit,
@@ -445,14 +463,14 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 #undef K

 static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
-			    unsigned int points, unsigned long totalpages,
+			    unsigned long points, unsigned long totalpages,
 			    struct mem_cgroup *mem, nodemask_t *nodemask,
 			    const char *message)
 {
 	struct task_struct *victim = p;
 	struct task_struct *child;
 	struct task_struct *t = p;
-	unsigned int victim_points = 0;
+	unsigned long victim_points = 0;

 	if (printk_ratelimit())
 		dump_header(p, gfp_mask, order, mem, nodemask);
@@ -467,7 +485,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	}

 	task_lock(p);
-	pr_err("%s: Kill process %d (%s) score %d or sacrifice child\n",
+	pr_err("%s: Kill process %d (%s) points %lu or sacrifice child\n",
 		message, task_pid_nr(p), p->comm, points);
 	task_unlock(p);

@@ -479,7 +497,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 */
 	do {
 		list_for_each_entry(child, &t->children, sibling) {
-			unsigned int child_points;
+			unsigned long child_points;

 			if (child->mm == p->mm)
 				continue;
@@ -526,7 +544,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
 void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
 {
 	unsigned long limit;
-	unsigned int points = 0;
+	unsigned long points = 0;
 	struct task_struct *p;

 	/*
@@ -675,7 +693,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	struct task_struct *p;
 	unsigned long totalpages;
 	unsigned long freed = 0;
-	unsigned int points;
+	unsigned long points;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 	int killed = 0;

-- 
1.7.3.1





* [PATCH 4/5] oom: don't kill random process
  2011-05-20  8:00 ` KOSAKI Motohiro
@ 2011-05-20  8:04   ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2011-05-20  8:04 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, minchan.kim, oleg

CAI Qian reported that the oom-killer killed all of the system daemons
first when he ran a fork bomb as root. The problem is that the current
logic gives them a bonus of 3% of system RAM. For example, on his 16GB
machine, root processes get ~500MB of oom immunity. This brings a
crazily bad result: _all_ processes end up with oom_score=1, and the
oom killer then ignores memory usage and kills a random process. This
regression was caused by commit a63d83f427 (oom: badness heuristic
rewrite).
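
For scale, a sketch of the arithmetic behind the ~500MB figure
(illustrative, with 4KB pages and swap ignored):

	/* 16GB of RAM -> totalpages = 4194304 */
	unsigned long totalpages = 4UL * 1024 * 1024;
	unsigned long bonus = totalpages / 32;	/* ~3%: 131072 pages */

	/* 131072 pages * 4KB = 512MB that each root-owned process can
	 * consume before its score rises above the minimum of 1 */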

This patch changes select_bad_process() slightly. If the top oom score
is 1, that's a sign that the system has only root-privileged processes
or similar. In that case, select_bad_process() recalculates oom badness
without the root bonus and selects an eligible process.

Also, this patch moves the find-a-sacrificed-child logic into
select_bad_process(). That is necessary to implement an adequate
recalculation without the root bonus, and it has a good side effect:
the current logic doesn't behave as the documentation says.

Documentation/sysctl/vm.txt says

    oom_kill_allocating_task

    If this is set to non-zero, the OOM killer simply kills the task that
    triggered the out-of-memory condition.  This avoids the expensive
    tasklist scan.

IOW, oom_kill_allocating_task shouldn't search for a sacrificed child.
This patch fixes that issue as well.

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/base.c      |    2 +-
 include/linux/oom.h |    3 +-
 mm/oom_kill.c       |   89 ++++++++++++++++++++++++++++----------------------
 3 files changed, 53 insertions(+), 41 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d6b0424..b608b69 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -482,7 +482,7 @@ static int proc_oom_score(struct task_struct *task, char *buffer)

 	read_lock(&tasklist_lock);
 	if (pid_alive(task)) {
-		points = oom_badness(task, NULL, NULL, totalpages);
+		points = oom_badness(task, NULL, NULL, totalpages, 1);
 		ratio = points * 1000 / totalpages;
 	}
 	read_unlock(&tasklist_lock);
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 0f5b588..3dd3669 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -42,7 +42,8 @@ enum oom_constraint {

 /* The badness from the OOM killer */
 extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
-			const nodemask_t *nodemask, unsigned long totalpages);
+			const nodemask_t *nodemask, unsigned long totalpages,
+			int protect_root);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8bbc3df..7d280d4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -133,7 +133,8 @@ static bool oom_unkillable_task(struct task_struct *p,
  * task consuming the most memory to avoid subsequent oom failures.
  */
 unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
-		      const nodemask_t *nodemask, unsigned long totalpages)
+			 const nodemask_t *nodemask, unsigned long totalpages,
+			 int protect_root)
 {
 	unsigned long points;
 	unsigned long score_adj = 0;
@@ -186,7 +187,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 *
 	 * XXX: Too large a bonus if, for example, the system has terabytes of memory.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
+	if (protect_root && has_capability_noaudit(p, CAP_SYS_ADMIN)) {
 		if (points >= totalpages / 32)
 			points -= totalpages / 32;
 		else
@@ -298,8 +299,11 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 {
 	struct task_struct *g, *p;
 	struct task_struct *chosen = NULL;
-	*ppoints = 0;
+	int protect_root = 1;
+	unsigned long chosen_points = 0;
+	struct task_struct *child;

+ retry:
 	do_each_thread_reverse(g, p) {
 		unsigned long points;

@@ -332,7 +336,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 			 */
 			if (p == current) {
 				chosen = p;
-				*ppoints = ULONG_MAX;
+				chosen_points = ULONG_MAX;
 			} else {
 				/*
 				 * If this task is not being ptraced on exit,
@@ -345,13 +349,49 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 			}
 		}

-		points = oom_badness(p, mem, nodemask, totalpages);
-		if (points > *ppoints) {
+		points = oom_badness(p, mem, nodemask, totalpages, protect_root);
+		if (points > chosen_points) {
 			chosen = p;
-			*ppoints = points;
+			chosen_points = points;
 		}
 	} while_each_thread(g, p);

+	/*
+	 * chosen_points == 1 may be a sign that the root privilege bonus is too
+	 * large and we chose the wrong task. Let's recalculate the oom score
+	 * without the dubious bonus.
+	 */
+	if (protect_root && (chosen_points == 1)) {
+		protect_root = 0;
+		goto retry;
+	}
+
+	/*
+	 * If any of p's children has a different mm and is eligible for kill,
+	 * the one with the highest badness() score is sacrificed for its
+	 * parent.  This attempts to lose the minimal amount of work done while
+	 * still freeing memory.
+	 */
+	g = p = chosen;
+	do {
+		list_for_each_entry(child, &p->children, sibling) {
+			unsigned long child_points;
+
+			if (child->mm == p->mm)
+				continue;
+			/*
+			 * oom_badness() returns 0 if the thread is unkillable
+			 */
+			child_points = oom_badness(child, mem, nodemask,
+						   totalpages, protect_root);
+			if (child_points > chosen_points) {
+				chosen = child;
+				chosen_points = child_points;
+			}
+		}
+	} while_each_thread(g, p);
+
+	*ppoints = chosen_points;
 	return chosen;
 }

@@ -467,11 +507,6 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			    struct mem_cgroup *mem, nodemask_t *nodemask,
 			    const char *message)
 {
-	struct task_struct *victim = p;
-	struct task_struct *child;
-	struct task_struct *t = p;
-	unsigned long victim_points = 0;
-
 	if (printk_ratelimit())
 		dump_header(p, gfp_mask, order, mem, nodemask);

@@ -485,35 +520,11 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	}

 	task_lock(p);
-	pr_err("%s: Kill process %d (%s) points %lu or sacrifice child\n",
-		message, task_pid_nr(p), p->comm, points);
+	pr_err("%s: Kill process %d (%s) points %lu\n",
+	       message, task_pid_nr(p), p->comm, points);
 	task_unlock(p);

-	/*
-	 * If any of p's children has a different mm and is eligible for kill,
-	 * the one with the highest badness() score is sacrificed for its
-	 * parent.  This attempts to lose the minimal amount of work done while
-	 * still freeing memory.
-	 */
-	do {
-		list_for_each_entry(child, &t->children, sibling) {
-			unsigned long child_points;
-
-			if (child->mm == p->mm)
-				continue;
-			/*
-			 * oom_badness() returns 0 if the thread is unkillable
-			 */
-			child_points = oom_badness(child, mem, nodemask,
-								totalpages);
-			if (child_points > victim_points) {
-				victim = child;
-				victim_points = child_points;
-			}
-		}
-	} while_each_thread(p, t);
-
-	return oom_kill_task(victim, mem);
+	return oom_kill_task(p, mem);
 }

 /*
-- 
1.7.3.1





* [PATCH 5/5] oom: merge oom_kill_process() with oom_kill_task()
  2011-05-20  8:00 ` KOSAKI Motohiro
@ 2011-05-20  8:05   ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2011-05-20  8:05 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, minchan.kim, oleg

Now that oom_kill_process() has become an almost empty function, let's
merge it with oom_kill_task().

Also, this patch replaces task_pid_nr() with task_tgid_nr(), because
1) the oom killer kills a process, not a thread, and 2) userland
doesn't care about thread ids.
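
For reference, the pid/tgid distinction (a sketch, not part of the
patch): every thread has its own pid, while the tgid is the pid of the
thread-group leader, i.e. the process ID that userland sees:

	pid_t pid  = task_pid_nr(p);	/* unique per thread */
	pid_t tgid = task_tgid_nr(p);	/* shared by all threads of the
					 * process; what getpid() returns */

For the thread-group leader the two are equal, so the printed value
only changes when the chosen task happens to be a non-leader thread.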

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   53 ++++++++++++++++++++++-------------------------------
 1 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7d280d4..ec075cc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -458,11 +458,26 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 }

 #define K(x) ((x) << (PAGE_SHIFT-10))
-static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
+static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
+			    unsigned long points, unsigned long totalpages,
+			    struct mem_cgroup *mem, nodemask_t *nodemask,
+			    const char *message)
 {
 	struct task_struct *q;
 	struct mm_struct *mm;

+	if (printk_ratelimit())
+		dump_header(p, gfp_mask, order, mem, nodemask);
+
+	/*
+	 * If the task is already exiting, don't alarm the sysadmin or kill
+	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 */
+	if (p->flags & PF_EXITING) {
+		set_tsk_thread_flag(p, TIF_MEMDIE);
+		return 0;
+	}
+
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 1;
@@ -470,10 +485,11 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 	/* mm cannot be safely dereferenced after task_unlock(p) */
 	mm = p->mm;

-	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
-		task_pid_nr(p), p->comm, K(p->mm->total_vm),
-		K(get_mm_counter(p->mm, MM_ANONPAGES)),
-		K(get_mm_counter(p->mm, MM_FILEPAGES)));
+	pr_err("%s: Kill process %d (%s) points:%lu total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
+	       message, task_tgid_nr(p), p->comm, points,
+	       K(p->mm->total_vm),
+	       K(get_mm_counter(p->mm, MM_ANONPAGES)),
+	       K(get_mm_counter(p->mm, MM_FILEPAGES)));
 	task_unlock(p);

 	/*
@@ -490,7 +506,7 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 		if (q->mm == mm && !same_thread_group(q, p)) {
 			task_lock(q);	/* Protect ->comm from prctl() */
 			pr_err("Kill process %d (%s) sharing same memory\n",
-				task_pid_nr(q), q->comm);
+				task_tgid_nr(q), q->comm);
 			task_unlock(q);
 			force_sig(SIGKILL, q);
 		}
@@ -502,31 +518,6 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
 }
 #undef K

-static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
-			    unsigned long points, unsigned long totalpages,
-			    struct mem_cgroup *mem, nodemask_t *nodemask,
-			    const char *message)
-{
-	if (printk_ratelimit())
-		dump_header(p, gfp_mask, order, mem, nodemask);
-
-	/*
-	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
-	 */
-	if (p->flags & PF_EXITING) {
-		set_tsk_thread_flag(p, TIF_MEMDIE);
-		return 0;
-	}
-
-	task_lock(p);
-	pr_err("%s: Kill process %d (%s) points %lu\n",
-	       message, task_pid_nr(p), p->comm, points);
-	task_unlock(p);
-
-	return oom_kill_task(p, mem);
-}
-
 /*
  * Determines whether the kernel must panic because of the panic_on_oom sysctl.
  */
-- 
1.7.3.1





* Re: [PATCH 2/5] oom: kill younger process first
  2011-05-20  8:02   ` KOSAKI Motohiro
@ 2011-05-23  2:37     ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-23  2:37 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

2011/5/20 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> This patch introduces do_each_thread_reverse() and makes
> select_bad_process() use it. The benefits are twofold: 1) the
> oom-killer can kill the younger of two processes that have the same
> oom score. Usually the younger process is less important. 2) Younger
> tasks often have PF_EXITING set, because shell scripts create a lot of
> short-lived processes, and a reverse-order search can detect them
> faster.
>
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-20  8:03   ` KOSAKI Motohiro
@ 2011-05-23  3:59     ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-23  3:59 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

2011/5/20 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> CAI Qian reported that his kernel hung up when he ran a fork-intensive
> workload and then invoked the oom-killer.
>
> The problem is that the current oom calculation uses a 0-1000
> normalized value (the unit is a permillage of system RAM). Its low
> precision produces a lot of identical oom scores. IOW, in his case,
> all processes have an oom score smaller than 1, and the internal
> calculation rounds it up to 1.
>
> Thus the oom-killer kills an ineligible process. This regression was
> caused by commit a63d83f427 (oom: badness heuristic rewrite).
>
> The solution is to make the internal calculation just use the number
> of pages instead of a permillage of system RAM, and to convert it to a
> permillage value at display time.
>
> This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj)
> even though the current logic has a lot that I dislike.
>
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  fs/proc/base.c      |   13 ++++++----
>  include/linux/oom.h |    7 +----
>  mm/oom_kill.c       |   60 +++++++++++++++++++++++++++++++++-----------------
>  3 files changed, 49 insertions(+), 31 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index dfa5327..d6b0424 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -476,14 +476,17 @@ static const struct file_operations proc_lstats_operations = {
>
>  static int proc_oom_score(struct task_struct *task, char *buffer)
>  {
> -       unsigned long points = 0;
> +       unsigned long points;
> +       unsigned long ratio = 0;
> +       unsigned long totalpages = totalram_pages + total_swap_pages + 1;

Do we need the +1?
oom_badness already has that check.

>
>        read_lock(&tasklist_lock);
> -       if (pid_alive(task))
> -               points = oom_badness(task, NULL, NULL,
> -                                       totalram_pages + total_swap_pages);
> +       if (pid_alive(task)) {
> +               points = oom_badness(task, NULL, NULL, totalpages);
> +               ratio = points * 1000 / totalpages;
> +       }
>        read_unlock(&tasklist_lock);
> -       return sprintf(buffer, "%lu\n", points);
> +       return sprintf(buffer, "%lu\n", ratio);
>  }
>
>  struct limit_names {
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 5e3aa83..0f5b588 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -40,7 +40,8 @@ enum oom_constraint {
>        CONSTRAINT_MEMCG,
>  };
>
> -extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +/* The badness from the OOM killer */
> +extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>                        const nodemask_t *nodemask, unsigned long totalpages);
>  extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
>  extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> @@ -62,10 +63,6 @@ static inline void oom_killer_enable(void)
>        oom_killer_disabled = false;
>  }
>
> -/* The badness from the OOM killer */
> -extern unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
> -                     const nodemask_t *nodemask, unsigned long uptime);
> -
>  extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>
>  /* sysctls */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index e6a6c6f..8bbc3df 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -132,10 +132,12 @@ static bool oom_unkillable_task(struct task_struct *p,
>  * predictable as possible.  The goal is to return the highest value for the
>  * task consuming the most memory to avoid subsequent oom failures.
>  */
> -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>                      const nodemask_t *nodemask, unsigned long totalpages)
>  {
> -       int points;
> +       unsigned long points;
> +       unsigned long score_adj = 0;
> +
>
>        if (oom_unkillable_task(p, mem, nodemask))
>                return 0;
> @@ -160,7 +162,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>         */
>        if (p->flags & PF_OOM_ORIGIN) {
>                task_unlock(p);
> -               return 1000;
> +               return ULONG_MAX;
>        }
>
>        /*
> @@ -176,33 +178,49 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>         */
>        points = get_mm_rss(p->mm) + p->mm->nr_ptes;
>        points += get_mm_counter(p->mm, MM_SWAPENTS);
> -
> -       points *= 1000;
> -       points /= totalpages;
>        task_unlock(p);
>
>        /*
>         * Root processes get 3% bonus, just like the __vm_enough_memory()
>         * implementation used by LSMs.
> +        *
> +        * XXX: Too large bonus, example, if the system have tera-bytes memory..
>         */
> -       if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> -               points -= 30;
> +       if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> +               if (points >= totalpages / 32)
> +                       points -= totalpages / 32;
> +               else
> +                       points = 0;

Odd. Why do we set points to 0 here?

I think the idea is good.


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-20  8:03   ` KOSAKI Motohiro
@ 2011-05-23  4:02     ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-23  4:02 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

2011/5/20 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> CAI Qian reported that his kernel hung up when he ran a fork-intensive
> workload and then invoked the oom-killer.
>
> The problem is that the current oom calculation uses a 0-1000
> normalized value (the unit is a permillage of system RAM). Its low
> precision produces a lot of identical oom scores. IOW, in his case,
> all processes have an oom score smaller than 1, and the internal
> calculation rounds it up to 1.
>
> Thus the oom-killer kills an ineligible process. This regression was
> caused by commit a63d83f427 (oom: badness heuristic rewrite).
>
> The solution is to make the internal calculation just use the number
> of pages instead of a permillage of system RAM, and to convert it to a
> permillage value at display time.
>
> This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj)
> even though the current logic has a lot that I dislike.
>
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  fs/proc/base.c      |   13 ++++++----
>  include/linux/oom.h |    7 +----
>  mm/oom_kill.c       |   60 +++++++++++++++++++++++++++++++++-----------------
>  3 files changed, 49 insertions(+), 31 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index dfa5327..d6b0424 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -476,14 +476,17 @@ static const struct file_operations proc_lstats_operations = {
>
>  static int proc_oom_score(struct task_struct *task, char *buffer)
>  {
> -       unsigned long points = 0;
> +       unsigned long points;
> +       unsigned long ratio = 0;
> +       unsigned long totalpages = totalram_pages + total_swap_pages + 1;
>
>        read_lock(&tasklist_lock);
> -       if (pid_alive(task))
> -               points = oom_badness(task, NULL, NULL,
> -                                       totalram_pages + total_swap_pages);
> +       if (pid_alive(task)) {
> +               points = oom_badness(task, NULL, NULL, totalpages);
> +               ratio = points * 1000 / totalpages;
> +       }
>        read_unlock(&tasklist_lock);
> -       return sprintf(buffer, "%lu\n", points);
> +       return sprintf(buffer, "%lu\n", ratio);
>  }
>
>  struct limit_names {
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 5e3aa83..0f5b588 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -40,7 +40,8 @@ enum oom_constraint {
>        CONSTRAINT_MEMCG,
>  };
>
> -extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +/* The badness from the OOM killer */
> +extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>                        const nodemask_t *nodemask, unsigned long totalpages);
>  extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
>  extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> @@ -62,10 +63,6 @@ static inline void oom_killer_enable(void)
>        oom_killer_disabled = false;
>  }
>
> -/* The badness from the OOM killer */
> -extern unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
> -                     const nodemask_t *nodemask, unsigned long uptime);
> -
>  extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>
>  /* sysctls */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index e6a6c6f..8bbc3df 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -132,10 +132,12 @@ static bool oom_unkillable_task(struct task_struct *p,
>  * predictable as possible.  The goal is to return the highest value for the
>  * task consuming the most memory to avoid subsequent oom failures.
>  */
> -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>                      const nodemask_t *nodemask, unsigned long totalpages)
>  {
> -       int points;
> +       unsigned long points;
> +       unsigned long score_adj = 0;
> +
>
>        if (oom_unkillable_task(p, mem, nodemask))
>                return 0;
> @@ -160,7 +162,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>         */
>        if (p->flags & PF_OOM_ORIGIN) {
>                task_unlock(p);
> -               return 1000;
> +               return ULONG_MAX;
>        }
>
>        /*
> @@ -176,33 +178,49 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>         */
>        points = get_mm_rss(p->mm) + p->mm->nr_ptes;
>        points += get_mm_counter(p->mm, MM_SWAPENTS);
> -
> -       points *= 1000;
> -       points /= totalpages;
>        task_unlock(p);
>
>        /*
>         * Root processes get 3% bonus, just like the __vm_enough_memory()
>         * implementation used by LSMs.
> +        *
> +        * XXX: Too large bonus, example, if the system have tera-bytes memory..
>         */

Nitpick: I have no objection to adding this comment, but strictly
speaking it isn't related to this patch. No biggie, and it's up to
you.  :)

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-20  8:04   ` KOSAKI Motohiro
@ 2011-05-23  4:31     ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-23  4:31 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

2011/5/20 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:
> CAI Qian reported that the oom-killer killed all system daemons in his
> system first when he ran a fork bomb as root. The problem is that the
> current logic gives them a bonus of 3% of system RAM. For example, on
> his 16GB machine, root processes get ~500MB of oom immunity. That
> brings us a crazy bad result: _all_ processes have oom-score=1, and
> then the oom killer ignores process memory usage and kills a random
> process. This regression was caused by commit a63d83f427 (oom: badness
> heuristic rewrite).
>
> This patch changes select_bad_process() slightly. If oom points == 1,
> it's a sign that the system has only root-privileged processes or
> similar. Thus, select_bad_process() recalculates oom badness without
> the root bonus and selects an eligible process.
>
> Also, this patch moves the find-sacrifice-child logic into
> select_bad_process(). That's necessary to implement an adequate
> no-root-bonus recalculation, and it has a good side effect: the
> current logic doesn't behave as the documentation describes.
>
> Documentation/sysctl/vm.txt says
>
>    oom_kill_allocating_task
>
>    If this is set to non-zero, the OOM killer simply kills the task that
>    triggered the out-of-memory condition.  This avoids the expensive
>    tasklist scan.
>
> IOW, oom_kill_allocating_task shouldn't search for a sacrifice child.
> This patch also fixes that issue.
>
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  fs/proc/base.c      |    2 +-
>  include/linux/oom.h |    3 +-
>  mm/oom_kill.c       |   89 ++++++++++++++++++++++++++++----------------------
>  3 files changed, 53 insertions(+), 41 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index d6b0424..b608b69 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -482,7 +482,7 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
>
>        read_lock(&tasklist_lock);
>        if (pid_alive(task)) {
> -               points = oom_badness(task, NULL, NULL, totalpages);
> +               points = oom_badness(task, NULL, NULL, totalpages, 1);
>                ratio = points * 1000 / totalpages;
>        }
>        read_unlock(&tasklist_lock);
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 0f5b588..3dd3669 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -42,7 +42,8 @@ enum oom_constraint {
>
>  /* The badness from the OOM killer */
>  extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> -                       const nodemask_t *nodemask, unsigned long totalpages);
> +                       const nodemask_t *nodemask, unsigned long totalpages,
> +                       int protect_root);
>  extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
>  extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8bbc3df..7d280d4 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -133,7 +133,8 @@ static bool oom_unkillable_task(struct task_struct *p,
>  * task consuming the most memory to avoid subsequent oom failures.
>  */
>  unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> -                     const nodemask_t *nodemask, unsigned long totalpages)
> +                        const nodemask_t *nodemask, unsigned long totalpages,
> +                        int protect_root)
>  {
>        unsigned long points;
>        unsigned long score_adj = 0;
> @@ -186,7 +187,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>         *
>         * XXX: Too large bonus, example, if the system have tera-bytes memory..
>         */
> -       if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> +       if (protect_root && has_capability_noaudit(p, CAP_SYS_ADMIN)) {
>                if (points >= totalpages / 32)
>                        points -= totalpages / 32;
>                else
> @@ -298,8 +299,11 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>  {
>        struct task_struct *g, *p;
>        struct task_struct *chosen = NULL;
> -       *ppoints = 0;
> +       int protect_root = 1;
> +       unsigned long chosen_points = 0;
> +       struct task_struct *child;
>
> + retry:
>        do_each_thread_reverse(g, p) {
>                unsigned long points;
>
> @@ -332,7 +336,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>                         */
>                        if (p == current) {
>                                chosen = p;
> -                               *ppoints = ULONG_MAX;
> +                               chosen_points = ULONG_MAX;
>                        } else {
>                                /*
>                                 * If this task is not being ptraced on exit,
> @@ -345,13 +349,49 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>                        }
>                }
>
> -               points = oom_badness(p, mem, nodemask, totalpages);
> -               if (points > *ppoints) {
> +               points = oom_badness(p, mem, nodemask, totalpages, protect_root);
> +               if (points > chosen_points) {
>                        chosen = p;
> -                       *ppoints = points;
> +                       chosen_points = points;
>                }
>        } while_each_thread(g, p);
>
> +       /*
> +        * chosen_point==1 may be a sign that root privilege bonus is too large
> +        * and we choose wrong task. Let's recalculate oom score without the
> +        * dubious bonus.
> +        */
> +       if (protect_root && (chosen_points == 1)) {
> +               protect_root = 0;
> +               goto retry;
> +       }

The idea looks good to me.
But once we hit that case, should we give up protecting root-privileged
processes entirely? How about decaying the bonus instead?
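
Something like this, as a userspace sketch of what I mean (illustration
only; apply_root_bonus(), the shift values, and the page counts are all
made up for the example):

#include <stdio.h>

/*
 * Keep the root bonus, but halve it on each retry instead of dropping
 * it to zero, so root processes retain a shrinking advantage.
 * bonus_shift == 5 corresponds to the patch's totalpages / 32 (~3%).
 */
static unsigned long apply_root_bonus(unsigned long points,
                                      unsigned long totalpages,
                                      int bonus_shift)
{
        unsigned long bonus = totalpages >> bonus_shift;

        return points > bonus ? points - bonus : 0;
}

int main(void)
{
        unsigned long totalpages = 4UL << 20;   /* 16GB with 4KB pages */
        unsigned long points = 10000;           /* a modest daemon's usage */
        int shift;

        /* score stays 0 for a while, then decays back to a usable value */
        for (shift = 5; shift <= 10; shift++)
                printf("shift=%2d bonus=%7lu score=%lu\n", shift,
                       totalpages >> shift,
                       apply_root_bonus(points, totalpages, shift));
        return 0;
}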

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 1/5] oom: improve dump_tasks() show items
  2011-05-20  8:01   ` KOSAKI Motohiro
@ 2011-05-23 22:16     ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-23 22:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, Andrew Morton, caiqian, Hugh Dickins,
	KAMEZAWA Hiroyuki, Minchan Kim, Oleg Nesterov

On Fri, 20 May 2011, KOSAKI Motohiro wrote:

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f52e85c..43d32ae 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -355,7 +355,7 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
>  	struct task_struct *p;
>  	struct task_struct *task;
> 
> -	pr_info("[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name\n");
> +	pr_info("[   pid]   ppid   uid total_vm      rss     swap score_adj name\n");
>  	for_each_process(p) {
>  		if (oom_unkillable_task(p, mem, nodemask))
>  			continue;
> @@ -370,11 +370,14 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
>  			continue;
>  		}
> 
> -		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
> -			task->pid, task_uid(task), task->tgid,
> -			task->mm->total_vm, get_mm_rss(task->mm),
> -			task_cpu(task), task->signal->oom_adj,
> -			task->signal->oom_score_adj, task->comm);
> +		pr_info("[%6d] %6d %5d %8lu %8lu %8lu %9d %s\n",
> +			task_tgid_nr(task), task_tgid_nr(task->real_parent),
> +			task_uid(task),
> +			task->mm->total_vm,
> +			get_mm_rss(task->mm) + p->mm->nr_ptes,
> +			get_mm_counter(p->mm, MM_SWAPENTS),
> +			task->signal->oom_score_adj,
> +			task->comm);
>  		task_unlock(task);
>  	}
>  }

Looks good, with the exception that the "score_adj" header should remain 
"oom_score_adj", since that is the name of the procfs file that changes 
the tunable.

After that's fixed:

	Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 2/5] oom: kill younger process first
  2011-05-20  8:02   ` KOSAKI Motohiro
@ 2011-05-23 22:20     ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-23 22:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Fri, 20 May 2011, KOSAKI Motohiro wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 013314a..3698379 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2194,6 +2194,9 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
>  #define next_task(p) \
>  	list_entry_rcu((p)->tasks.next, struct task_struct, tasks)
> 
> +#define prev_task(p) \
> +	list_entry((p)->tasks.prev, struct task_struct, tasks)
> +
>  #define for_each_process(p) \
>  	for (p = &init_task ; (p = next_task(p)) != &init_task ; )
> 
> @@ -2206,6 +2209,14 @@ extern bool current_is_single_threaded(void);
>  #define do_each_thread(g, t) \
>  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> 
> +/*
> + * Similar to do_each_thread(). but two difference are there.
> + *  - traverse tasks reverse order (i.e. younger to older)
> + *  - caller must hold tasklist_lock. rcu_read_lock isn't enough
> +*/
> +#define do_each_thread_reverse(g, t) \
> +	for (g = t = &init_task ; (g = t = prev_task(g)) != &init_task ; ) do
> +
>  #define while_each_thread(g, t) \
>  	while ((t = next_thread(t)) != g)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 43d32ae..e6a6c6f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -282,7 +282,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  	struct task_struct *chosen = NULL;
>  	*ppoints = 0;
> 
> -	do_each_thread(g, p) {
> +	do_each_thread_reverse(g, p) {
>  		unsigned int points;
> 
>  		if (!p->mm)

Same response as when you initially proposed this patch: the comment needs 
to explicitly state that it is not break-safe, just like 
do_each_thread().  See http://marc.info/?l=linux-mm&m=130507027312785

A comment such as

	/*
	 * Reverse of do_each_thread(); still not break-safe.
	 * Must hold tasklist_lock.
	 */

would suffice.  There are no "callers" to a macro.
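
For readers unfamiliar with the pattern, here is a minimal userspace
analogue (my own illustration, not kernel code) of why walking the task
list backwards visits younger tasks first: new entries go on the tail of
the circular list, so following ->prev from the anchor starts at the
youngest.

#include <stdio.h>

struct task {
        const char *comm;
        struct task *next, *prev;
};

/* Add at the tail of the circular doubly-linked list. */
static void add_tail(struct task *anchor, struct task *t)
{
        t->next = anchor;
        t->prev = anchor->prev;
        anchor->prev->next = t;
        anchor->prev = t;
}

int main(void)
{
        struct task init = { "init", &init, &init };
        struct task a = { "oldest", NULL, NULL };
        struct task b = { "middle", NULL, NULL };
        struct task c = { "youngest", NULL, NULL };
        struct task *p;

        add_tail(&init, &a);
        add_tail(&init, &b);
        add_tail(&init, &c);

        /* Reverse walk: prints youngest, middle, oldest. */
        for (p = init.prev; p != &init; p = p->prev)
                printf("%s\n", p->comm);
        return 0;
}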

After that:

	Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-20  8:03   ` KOSAKI Motohiro
@ 2011-05-23 22:28     ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-23 22:28 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Fri, 20 May 2011, KOSAKI Motohiro wrote:

> CAI Qian reported that his kernel hung up when he ran a fork-intensive
> workload and then invoked the oom-killer.
>
> The problem is that the current oom calculation uses a 0-1000
> normalized value (the unit is a permillage of system RAM). Its low
> precision produces a lot of identical oom scores. IOW, in his case,
> all processes have an oom score smaller than 1, and the internal
> calculation rounds it up to 1.
>
> Thus the oom-killer kills an ineligible process. This regression was
> caused by commit a63d83f427 (oom: badness heuristic rewrite).
>
> The solution is to make the internal calculation just use the number
> of pages instead of a permillage of system RAM, and to convert it to a
> permillage value at display time.
>
> This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj)
> even though the current logic has a lot that I dislike.
> 

Same response as when you initially proposed this patch: 
http://marc.info/?l=linux-kernel&m=130507086613317 -- you never replied to 
that.

The changelog doesn't accurately represent CAI Qian's problem; the issue 
is that root processes are given too large a bonus in comparison to 
other threads that are using at most 1.9% of available memory.  That can 
be fixed, as I suggested, by giving a 1% bonus per 10% of memory used, so 
that a process would have to be using 10% before it even receives a bonus.

I already suggested an alternative patch to CAI Qian that greatly 
increases the granularity of the oom score from a range of 0-1000 to 
0-10000, to differentiate between tasks within 0.01% of available memory 
(the 0-1000 scale can only resolve ~16MB steps on CAI Qian's 16GB 
system).  I'll propose this officially in a separate email.
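
To put rough numbers on that (my arithmetic, assuming 4KB pages and no
swap on a 16GB machine):

#include <stdio.h>

/* How much memory one oom_score point represents at each granularity. */
int main(void)
{
        unsigned long long totalpages = (16ULL << 30) / 4096;  /* 4194304 */

        printf("0-1000:  1 point = %llu pages (~%llu MB)\n",
               totalpages / 1000, totalpages / 1000 * 4096 >> 20);
        printf("0-10000: 1 point = %llu pages (~%llu KB)\n",
               totalpages / 10000, totalpages / 10000 * 4096 >> 10);
        return 0;
}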

This patch also includes undocumented changes such as changing the bonus 
given to root processes.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-20  8:04   ` KOSAKI Motohiro
@ 2011-05-23 22:32     ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-23 22:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Fri, 20 May 2011, KOSAKI Motohiro wrote:

> CAI Qian reported that the oom-killer killed all system daemons in his
> system first when he ran a fork bomb as root. The problem is that the
> current logic gives them a bonus of 3% of system RAM. For example, on
> his 16GB machine, root processes get ~500MB of oom immunity. That
> brings us a crazy bad result: _all_ processes have oom-score=1, and
> then the oom killer ignores process memory usage and kills a random
> process. This regression was caused by commit a63d83f427 (oom: badness
> heuristic rewrite).
> 
> This patch changes select_bad_process() slightly. If oom points == 1,
> it's a sign that the system has only root-privileged processes or
> similar. Thus, select_bad_process() recalculates oom badness without
> the root bonus and selects an eligible process.
> 

You said earlier that you thought it was a good idea to do a 
proportion-based bonus for root processes.  Do you have a specific 
objection to giving root processes a 1% bonus for every 10% of used 
memory instead?
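
As a sketch of the arithmetic I have in mind (userspace illustration
only; the helper name and the integer math are mine, not a tested
patch):

#include <stdio.h>

/*
 * Proportional root bonus: a discount of 1% of total memory for every
 * full 10% of memory the task itself uses, so a small daemon gets
 * essentially no bonus while a memory hog keeps a meaningful one.
 */
static unsigned long root_bonus(unsigned long points,
                                unsigned long totalpages)
{
        unsigned long tenths = points * 10 / totalpages;   /* 0..10 */

        return tenths * (totalpages / 100);
}

int main(void)
{
        unsigned long totalpages = 4UL << 20;   /* 16GB with 4KB pages */
        unsigned long usage[] = { 10000, totalpages / 4, totalpages / 2 };
        int i;

        for (i = 0; i < 3; i++)
                printf("usage=%8lu bonus=%7lu score=%lu\n",
                       usage[i], root_bonus(usage[i], totalpages),
                       usage[i] - root_bonus(usage[i], totalpages));
        return 0;
}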

> Also, this patch moves the find-sacrifice-child logic into
> select_bad_process(). That's necessary to implement an adequate
> no-root-bonus recalculation, and it has a good side effect: the
> current logic doesn't behave as the documentation describes.
> 

This is unnecessary and just makes the oom killer take egregiously long.  
We are already diagnosing problems here at Google where the oom killer 
holds tasklist_lock on the read side for far too long, causing issues for 
other cpus spinning with irqs disabled while they wait for 
write_lock_irq(&tasklist_lock).  A second tasklist scan is simply a 
non-starter.

 [ This is also one of the reasons why we needed to introduce
   mm->oom_disable_count to prevent a second, expensive tasklist scan. ]

> Documentation/sysctl/vm.txt says
> 
>     oom_kill_allocating_task
> 
>     If this is set to non-zero, the OOM killer simply kills the task that
>     triggered the out-of-memory condition.  This avoids the expensive
>     tasklist scan.
> 
> IOW, oom_kill_allocating_task shouldn't search for a sacrifice child.
> This patch also fixes that issue.
> 

oom_kill_allocating_task was introduced for SGI to prevent the expensive 
tasklist scan; the task that is actually allocating the memory isn't 
particularly interesting and is usually random.  This should be turned 
into a documentation fix rather than a change to the implementation.

Thanks.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23 22:28     ` David Rientjes
@ 2011-05-23 22:48       ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-23 22:48 UTC (permalink / raw)
  To: KOSAKI Motohiro, caiqian
  Cc: linux-mm, linux-kernel, Andrew Morton, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Mon, 23 May 2011, David Rientjes wrote:

> I already suggested an alternative patch to CAI Qian to greatly increase 
> the granularity of the oom score from a range of 0-1000 to 0-10000 to 
> differentiate between tasks within 0.01% of available memory (16MB on CAI 
> Qian's 16GB system).  I'll propose this officially in a separate email.
> 

This is an alternative patch as earlier proposed with suggested 
improvements from Minchan.  CAI, would it be possible to test this out on 
your usecase?

I'm indifferent to the actual scale of OOM_SCORE_MAX_FACTOR; it could be 
10 as proposed in this patch or even increased higher for higher 
resolution.


diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -38,6 +38,9 @@ int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
 
+#define OOM_SCORE_MAX_FACTOR	10
+#define OOM_SCORE_MAX		(OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)
+
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
@@ -160,7 +163,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 */
 	if (p->flags & PF_OOM_ORIGIN) {
 		task_unlock(p);
-		return 1000;
+		return OOM_SCORE_MAX;
 	}
 
 	/*
@@ -177,32 +180,38 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
 	points += get_mm_counter(p->mm, MM_SWAPENTS);
 
-	points *= 1000;
+	points *= OOM_SCORE_MAX;
 	points /= totalpages;
 	task_unlock(p);
 
 	/*
-	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
+	 * Root processes get a bonus of 1% per 10% of memory used.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= 30;
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
+		int bonus;
+		int granularity;
+
+		bonus = OOM_SCORE_MAX / 100;		/* bonus is 1% */
+		granularity = OOM_SCORE_MAX / 10;	/* granularity is 10% */
+
+		points -= bonus * (points / granularity);
+	}
 
 	/*
 	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
 	 * either completely disable oom killing or always prefer a certain
 	 * task.
 	 */
-	points += p->signal->oom_score_adj;
+	points += p->signal->oom_score_adj * OOM_SCORE_MAX_FACTOR;
 
 	/*
 	 * Never return 0 for an eligible task that may be killed since it's
-	 * possible that no single user task uses more than 0.1% of memory and
+	 * possible that no single user task uses more than 0.01% of memory and
 	 * no single admin tasks uses more than 3.0%.
 	 */
 	if (points <= 0)
 		return 1;
-	return (points < 1000) ? points : 1000;
+	return (points < OOM_SCORE_MAX) ? points : OOM_SCORE_MAX;
 }
 
 /*
@@ -314,7 +323,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 			 */
 			if (p == current) {
 				chosen = p;
-				*ppoints = 1000;
+				*ppoints = OOM_SCORE_MAX;
 			} else {
 				/*
 				 * If this task is not being ptraced on exit,
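
For reference, the proportional bonus above works out as follows. This is
a minimal userspace sketch (not part of the patch) reusing the same
constants: a root task below 10% of memory keeps its full score, while one
using half of memory loses 5% of the scale (5000 -> 4500).

#include <stdio.h>

#define OOM_SCORE_MAX_FACTOR	10
#define OOM_SCORE_ADJ_MAX	1000
#define OOM_SCORE_MAX		(OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)

int main(void)
{
	/* scores of root tasks using 5%, 50% and 95% of memory */
	unsigned int samples[] = { 500, 5000, 9500 };
	int bonus = OOM_SCORE_MAX / 100;	/* 1% of the scale */
	int granularity = OOM_SCORE_MAX / 10;	/* 10% of the scale */
	unsigned int i;

	for (i = 0; i < 3; i++) {
		unsigned int points = samples[i];

		points -= bonus * (points / granularity);
		/* prints 500 -> 500, 5000 -> 4500, 9500 -> 8600 */
		printf("%u -> %u\n", samples[i], points);
	}
	return 0;
}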

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23  3:59     ` Minchan Kim
@ 2011-05-24  1:14       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  1:14 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

Hi


>> @@ -476,14 +476,17 @@ static const struct file_operations proc_lstats_operations = {
>>
>>   static int proc_oom_score(struct task_struct *task, char *buffer)
>>   {
>> -       unsigned long points = 0;
>> +       unsigned long points;
>> +       unsigned long ratio = 0;
>> +       unsigned long totalpages = totalram_pages + total_swap_pages + 1;
>
> Do we need the +1?
> oom_badness already has the check.

"ratio = points * 1000 / totalpages;" need to avoid zero divide.

>>         /*
>>          * Root processes get 3% bonus, just like the __vm_enough_memory()
>>          * implementation used by LSMs.
>> +        *
>> +        * XXX: Too large bonus, example, if the system have tera-bytes memory..
>>          */
>> -       if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>> -               points -= 30;
>> +       if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
>> +               if (points>= totalpages / 32)
>> +                       points -= totalpages / 32;
>> +               else
>> +                       points = 0;
>
> Odd. Why do we initialize points to 0?
>
> I think the idea is good.

points is unsigned; it's a common technique to avoid underflow.
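
A minimal userspace sketch of the wraparound that the guard above avoids
(the numbers are made up):

#include <stdio.h>

int main(void)
{
	unsigned long points = 10;	/* a small score */

	points -= 30;	/* unsigned: wraps to 2^64 - 20 instead of going negative */
	printf("%lu\n", points);	/* 18446744073709551596 on 64-bit */
	return 0;
}

With the "if (points >= totalpages / 32)" guard, points is clamped to 0
instead of wrapping to a huge value.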



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23 22:48       ` David Rientjes
@ 2011-05-24  1:21         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  1:21 UTC (permalink / raw)
  To: rientjes
  Cc: caiqian, linux-mm, linux-kernel, akpm, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/24 7:48), David Rientjes wrote:
> On Mon, 23 May 2011, David Rientjes wrote:
>
>> I already suggested an alternative patch to CAI Qian to greatly increase
>> the granularity of the oom score from a range of 0-1000 to 0-10000 to
>> differentiate between tasks within 0.01% of available memory (16MB on CAI
>> Qian's 16GB system).  I'll propose this officially in a separate email.
>>
>
> This is an alternative patch as earlier proposed with suggested
> improvements from Minchan.  CAI, would it be possible to test this out on
> your usecase?
>
> I'm indifferent to the actual scale of OOM_SCORE_MAX_FACTOR; it could be
> 10 as proposed in this patch or even increased higher for higher
> resolution.

I did explain why your proposal is unacceptable.

http://www.gossamer-threads.com/lists/linux/kernel/1378837#1378837


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-24  1:14       ` KOSAKI Motohiro
@ 2011-05-24  1:32         ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-24  1:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

On Tue, May 24, 2011 at 10:14 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi
>
>
>>> @@ -476,14 +476,17 @@ static const struct file_operations
>>> proc_lstats_operations = {
>>>
>>>  static int proc_oom_score(struct task_struct *task, char *buffer)
>>>  {
>>> -       unsigned long points = 0;
>>> +       unsigned long points;
>>> +       unsigned long ratio = 0;
>>> +       unsigned long totalpages = totalram_pages + total_swap_pages + 1;
>>
>> Do we need the +1?
>> oom_badness already has the check.
>
> "ratio = points * 1000 / totalpages;" need to avoid zero divide.
>
>>>        /*
>>>         * Root processes get 3% bonus, just like the __vm_enough_memory()
>>>         * implementation used by LSMs.
>>> +        *
>>> +        * XXX: Too large bonus, example, if the system have tera-bytes
>>> memory..
>>>         */
>>> -       if (has_capability_noaudit(p, CAP_SYS_ADMIN))
>>> -               points -= 30;
>>> +       if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
>>> +               if (points>= totalpages / 32)
>>> +                       points -= totalpages / 32;
>>> +               else
>>> +                       points = 0;
>>
>> Odd. Why do we initialize points to 0?
>>
>> I think the idea is good.
>
> points is unsigned; it's a common technique to avoid underflow.
>

Thanks for the explanation, KOSAKI.
I need sleep. :(



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-23 22:32     ` David Rientjes
@ 2011-05-24  1:35       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  1:35 UTC (permalink / raw)
  To: rientjes
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/24 7:32), David Rientjes wrote:
> On Fri, 20 May 2011, KOSAKI Motohiro wrote:
>
>> CAI Qian reported that the oom-killer killed all the system daemons in
>> his system first when he ran a fork bomb as root. The problem is that
>> the current logic gives them a bonus of 3% of system RAM. For example,
>> on his 16GB machine, root processes get ~500MB of oom immunity. That
>> brings a crazily bad result: _all_ processes have oom-score=1, so the
>> oom killer ignores process memory usage and kills a random process.
>> This regression was caused by commit a63d83f427 (oom: badness
>> heuristic rewrite).
>>
>> This patch changes select_bad_process() slightly. If oom points == 1,
>> it's a sign that the system has only root-privileged processes or
>> similar. Thus, select_bad_process() recalculates oom badness without
>> the root bonus and selects an eligible process.
>>
>
> You said earlier that you thought it was a good idea to use a
> proportion-based bonus for root processes.  Do you have a specific
> objection to giving root processes a 1% bonus for every 10% of used
> memory instead?

Because it's a completely different topic. You have to make another patch.



>> Also, this patch moves the find-a-sacrificial-child logic into
>> select_bad_process(). That's necessary to implement an adequate
>> no-root-bonus recalculation, and it has a good side effect: the
>> current logic doesn't behave as documented.
>>
>
> This is unnecessary and just makes the oom killer egregiously long.  We
> are already diagnosing problems here at Google where the oom killer holds
> tasklist_lock on the readside for far too long, causing other cpus waiting
> for a write_lock_irq(&tasklist_lock) to encounter issues when irqs are
> disabled and it is spinning.  A second tasklist scan is simply a
> non-starter.
>
>   [ This is also one of the reasons why we needed to introduce
>     mm->oom_disable_count to prevent a second, expensive tasklist scan. ]

You misunderstand the code. Both select_bad_process() and oom_kill_process()
run under tasklist_lock. IOW, the lock holding time doesn't change.


>> Documentation/sysctl/vm.txt says
>>
>>      oom_kill_allocating_task
>>
>>      If this is set to non-zero, the OOM killer simply kills the task that
>>      triggered the out-of-memory condition.  This avoids the expensive
>>      tasklist scan.
>>
>> IOW, oom_kill_allocating_task shouldn't search for a sacrificial child.
>> This patch also fixes this issue.
>>
>
> oom_kill_allocating_task was introduced for SGI to prevent the expensive
> tasklist scan, the task that is actually allocating the memory isn't
> actually interesting and is usually random.  This should be turned into a
> documentation fix rather than changing the implementation.

No benefit. I won't take it.



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  1:35       ` KOSAKI Motohiro
@ 2011-05-24  1:39         ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-24  1:39 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Tue, 24 May 2011, KOSAKI Motohiro wrote:

> > > Also, this patch moves the find-a-sacrificial-child logic into
> > > select_bad_process(). That's necessary to implement an adequate
> > > no-root-bonus recalculation, and it has a good side effect: the
> > > current logic doesn't behave as documented.
> > > 
> > 
> > This is unnecessary and just makes the oom killer egregiously long.  We
> > are already diagnosing problems here at Google where the oom killer holds
> > tasklist_lock on the readside for far too long, causing other cpus waiting
> > for a write_lock_irq(&tasklist_lock) to encounter issues when irqs are
> > disabled and it is spinning.  A second tasklist scan is simply a
> > non-starter.
> > 
> >   [ This is also one of the reasons why we needed to introduce
> >     mm->oom_disable_count to prevent a second, expensive tasklist scan. ]
> 
> You misunderstand the code. Both select_bad_process() and oom_kill_process()
> run under tasklist_lock. IOW, the lock holding time doesn't change.
> 

A second iteration through the tasklist in select_bad_process() will 
extend the time that tasklist_lock is held, which is what your patch does.  

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23  4:02     ` Minchan Kim
@ 2011-05-24  1:44       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  1:44 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

>> @@ -176,33 +178,49 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>>          */
>>         points = get_mm_rss(p->mm) + p->mm->nr_ptes;
>>         points += get_mm_counter(p->mm, MM_SWAPENTS);
>> -
>> -       points *= 1000;
>> -       points /= totalpages;
>>         task_unlock(p);
>>
>>         /*
>>          * Root processes get 3% bonus, just like the __vm_enough_memory()
>>          * implementation used by LSMs.
>> +        *
>> +        * XXX: Too large bonus, example, if the system have tera-bytes memory..
>>          */
>
> Nitpick: I have no objection to adding this comment.
> But strictly speaking, the comment isn't related to this patch.
> No biggie; it's up to you.  :)

ok, removed.

From 3dda8863e5acdba7a714f0e7506fae931865c442 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 24 May 2011 10:43:49 +0900
Subject: [PATCH] remove unrelated comments

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
  mm/oom_kill.c |    2 --
  1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ec075cc..b01fa64 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -184,8 +184,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
  	/*
  	 * Root processes get 3% bonus, just like the __vm_enough_memory()
  	 * implementation used by LSMs.
-	 *
-	 * XXX: Too large bonus, example, if the system have tera-bytes memory..
  	 */
  	if (protect_root && has_capability_noaudit(p, CAP_SYS_ADMIN)) {
  		if (points >= totalpages / 32)
-- 
1.7.3.1



^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-23  4:31     ` Minchan Kim
@ 2011-05-24  1:53       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  1:53 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

>> +       /*
>> +        * chosen_point==1 may be a sign that root privilege bonus is too large
>> +        * and we choose wrong task. Let's recalculate oom score without the
>> +        * dubious bonus.
>> +        */
>> +       if (protect_root&&  (chosen_points == 1)) {
>> +               protect_root = 0;
>> +               goto retry;
>> +       }
>
> The idea seems good to me.
> But once we hit that case, should we give up protecting root-privileged processes?
> How about decaying the bonus point?

After applying my patch, an unprivileged process never gets score 1 (note
that mapping anon pages naturally increases nr_ptes).

So decaying wouldn't add any accuracy. Am I missing something?
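
A worked example of that sign, as a userspace sketch with assumed numbers
for a 16GB machine (totalpages / 32 is the ~500MB root bonus from the
patch):

#include <stdio.h>

int main(void)
{
	unsigned long totalpages = 4UL * 1024 * 1024;	/* 16GB in 4KB pages */
	unsigned long points = 25600;			/* a ~100MB root daemon */
	unsigned long bonus = totalpages / 32;		/* 131072 pages, ~500MB */

	points = points >= bonus ? points - bonus : 0;
	/* every root task smaller than the bonus collapses to the same score */
	printf("%lu\n", points ? points : 1);	/* prints 1 */
	return 0;
}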



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  1:39         ` David Rientjes
@ 2011-05-24  1:55           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  1:55 UTC (permalink / raw)
  To: rientjes
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

>>> This is unnecessary and just makes the oom killer egregiously long.  We
>>> are already diagnosing problems here at Google where the oom killer holds
>>> tasklist_lock on the readside for far too long, causing other cpus waiting
>>> for a write_lock_irq(&tasklist_lock) to encounter issues when irqs are
>>> disabled and it is spinning.  A second tasklist scan is simply a
>>> non-starter.
>>>
>>>    [ This is also one of the reasons why we needed to introduce
>>>      mm->oom_disable_count to prevent a second, expensive tasklist scan. ]
>>
>> You misunderstand the code. Both select_bad_process() and oom_kill_process()
>> are under tasklist_lock(). IOW, no change lock holding time.
>>
>
> A second iteration through the tasklist in select_bad_process() will
> extend the time that tasklist_lock is held, which is what your patch does.

It never happens in the usual case. Please think about what happens when
all processes have score = 1.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  1:55           ` KOSAKI Motohiro
@ 2011-05-24  1:58             ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-24  1:58 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Tue, 24 May 2011, KOSAKI Motohiro wrote:

> > > > This is unnecessary and just makes the oom killer egregiously long.
> > > > We are already diagnosing problems here at Google where the oom
> > > > killer holds tasklist_lock on the readside for far too long, causing
> > > > other cpus waiting for a write_lock_irq(&tasklist_lock) to encounter
> > > > issues when irqs are disabled and it is spinning.  A second tasklist
> > > > scan is simply a non-starter.
> > > > 
> > > >    [ This is also one of the reasons why we needed to introduce
> > > >      mm->oom_disable_count to prevent a second, expensive tasklist
> > > >      scan. ]
> > > 
> > > You misunderstand the code. Both select_bad_process() and
> > > oom_kill_process() run under tasklist_lock. IOW, the lock holding
> > > time doesn't change.
> > > 
> > 
> > A second iteration through the tasklist in select_bad_process() will
> > extend the time that tasklist_lock is held, which is what your patch does.
> 
> It never happens in the usual case. Please think about what happens
> when all processes have score = 1.
> 

I don't care whether it happens in the usual case or in an extremely rare 
case.  It significantly increases the amount of time that tasklist_lock is 
held, which causes writelock starvation on other cpus and leads to real 
problems, especially if the cpu being starved has irqs disabled and is 
thus blocking timer updates, i.e. it is spinning in 
write_lock_irq(&tasklist_lock), usually in the clone or exit path.  We can 
do better than that, and that's why I proposed my patch to CAI that 
increases the resolution of the scoring and makes the root process bonus 
proportional to the amount of used memory.
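
The starvation pattern is easy to demonstrate in userspace. Below is a
minimal sketch, with a pthread rwlock standing in for tasklist_lock; it is
only an analogy, not kernel code: a reader holding the lock for a long
scan stalls any writer.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

static void *writer(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&lock);	/* blocks until the scan drops the lock */
	puts("writer finally got the lock");
	pthread_rwlock_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_rwlock_rdlock(&lock);	/* the oom killer's tasklist scan */
	pthread_create(&t, NULL, writer, NULL);
	sleep(1);			/* the scan; a second pass doubles this window */
	pthread_rwlock_unlock(&lock);
	pthread_join(t, NULL);
	return 0;
}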

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  1:58             ` David Rientjes
@ 2011-05-24  2:03               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  2:03 UTC (permalink / raw)
  To: rientjes
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/24 10:58), David Rientjes wrote:
> On Tue, 24 May 2011, KOSAKI Motohiro wrote:
>
>>>>> This is unnecessary and just makes the oom killer egregiously long.
>>>>> We are already diagnosing problems here at Google where the oom
>>>>> killer holds tasklist_lock on the readside for far too long, causing
>>>>> other cpus waiting for a write_lock_irq(&tasklist_lock) to encounter
>>>>> issues when irqs are disabled and it is spinning.  A second tasklist
>>>>> scan is simply a non-starter.
>>>>>
>>>>>     [ This is also one of the reasons why we needed to introduce
>>>>>       mm->oom_disable_count to prevent a second, expensive tasklist
>>>>>       scan. ]
>>>>
>>>> You misunderstand the code. Both select_bad_process() and
>>>> oom_kill_process() run under tasklist_lock. IOW, the lock holding
>>>> time doesn't change.
>>>>
>>>
>>> A second iteration through the tasklist in select_bad_process() will
>>> extend the time that tasklist_lock is held, which is what your patch does.
>>
>> It never happens in the usual case. Please think about what happens
>> when all processes have score = 1.
>>
>
> I don't care whether it happens in the usual case or in an extremely rare
> case.  It significantly increases the amount of time that tasklist_lock is
> held, which causes writelock starvation on other cpus and leads to real
> problems, especially if the cpu being starved has irqs disabled and is
> thus blocking timer updates, i.e. it is spinning in
> write_lock_irq(&tasklist_lock), usually in the clone or exit path.  We can
> do better than that, and that's why I proposed my patch to CAI that
> increases the resolution of the scoring and makes the root process bonus
> proportional to the amount of used memory.

Do I need to say the same words again? Please read the code first.




^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23 22:28     ` David Rientjes
@ 2011-05-24  2:07       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  2:07 UTC (permalink / raw)
  To: rientjes
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/24 7:28), David Rientjes wrote:
> On Fri, 20 May 2011, KOSAKI Motohiro wrote:
>
>> CAI Qian reported that his kernel hung up when he ran a fork intensive
>> workload and then invoked the oom-killer.
>>
>> The problem is that the current oom calculation uses a 0-1000 normalized
>> value (the unit is a permillage of system RAM). Its low precision produces
>> a lot of identical oom scores. IOW, in his case, all processes have an oom
>> score smaller than 1 and the internal calculation rounds it up to 1.
>>
>> Thus the oom-killer kills an ineligible process. This regression was
>> caused by commit a63d83f427 (oom: badness heuristic rewrite).
>>
>> The solution is for the internal calculation to just use the number of
>> pages instead of a permillage of system RAM, and to convert it to a
>> permillage value only at display time.
>>
>> This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj)
>> even though the current logic has a lot of things I dislike.
>>
>
> Same response as when you initially proposed this patch:
> http://marc.info/?l=linux-kernel&m=130507086613317 -- you never replied to
> that.

I did reply. Why don't you read it?
http://www.gossamer-threads.com/lists/linux/kernel/1378837#1378837

If you haven't understood the issue, you can apply the following patch and
run it.


diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b01fa64..f35909b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -718,6 +718,9 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
  	 */
  	constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
  						&totalpages);
+
+	totalpages *= 10;
+
  	mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
  	check_panic_on_oom(constraint, gfp_mask, order, mpol_mask);
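
The rounding this demonstrates is easy to see in userspace; a minimal
sketch with assumed numbers for a 16GB machine:

#include <stdio.h>

static unsigned long score(unsigned long pages, unsigned long totalpages)
{
	unsigned long points = pages * 1000 / totalpages;

	return points ? points : 1;	/* eligible tasks never report 0 */
}

int main(void)
{
	unsigned long totalpages = 4UL * 1024 * 1024;	/* 16GB in 4KB pages */

	/* an 8MB daemon and a 16MB daemon both collapse to score 1 */
	printf("%lu %lu\n", score(2048, totalpages), score(4096, totalpages));

	/* with "totalpages *= 10" above, even a ~100MB task rounds down to 1 */
	printf("%lu\n", score(25600, totalpages * 10));
	return 0;
}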



> The changelog doesn't accurately represent CAI Qian's problem; the issue
> is that root processes are given too large of a bonus in comparison to
> other threads that are using at most 1.9% of available memory.  That can
> be fixed, as I suggested by giving 1% bonus per 10% of memory used so that
> the process would have to be using 10% before it even receives a bonus.
>
> I already suggested an alternative patch to CAI Qian to greatly increase
> the granularity of the oom score from a range of 0-1000 to 0-10000 to
> differentiate between tasks within 0.01% of available memory (16MB on CAI
> Qian's 16GB system).  I'll propose this officially in a separate email.
>
> This patch also includes undocumented changes such as changing the bonus
> given to root processes.





^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-24  1:44       ` KOSAKI Motohiro
@ 2011-05-24  3:11         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  3:11 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

> ok, removed.

I'm sorry, the previous patch had whitespace damage.
Let me resend it.


From 3dda8863e5acdba7a714f0e7506fae931865c442 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 24 May 2011 10:43:49 +0900
Subject: [PATCH] remove unrelated comments

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ec075cc..b01fa64 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -184,8 +184,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	/*
 	 * Root processes get 3% bonus, just like the __vm_enough_memory()
 	 * implementation used by LSMs.
-	 *
-	 * XXX: Too large bonus, example, if the system have tera-bytes memory..
 	 */
 	if (protect_root && has_capability_noaudit(p, CAP_SYS_ADMIN)) {
 		if (points >= totalpages / 32)
-- 
1.7.3.1




^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23 22:48       ` David Rientjes
@ 2011-05-24  8:32         ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-24  8:32 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, hughd, kamezawa.hiroyu,
	minchan.kim, oleg, KOSAKI Motohiro



----- Original Message -----
> On Mon, 23 May 2011, David Rientjes wrote:
> 
> > I already suggested an alternative patch to CAI Qian to greatly increase
> > the granularity of the oom score from a range of 0-1000 to 0-10000 to
> > differentiate between tasks within 0.01% of available memory (16MB on
> > CAI Qian's 16GB system). I'll propose this officially in a separate
> > email.
> >
> 
> This is an alternative patch as earlier proposed with suggested
> improvements from Minchan. CAI, would it be possible to test this out on
> your usecase?
Sure, will test KOSAKI Motohiro's v2 patches plus this one.
> I'm indifferent to the actual scale of OOM_SCORE_MAX_FACTOR; it could be
> 10 as proposed in this patch or even increased higher for higher
> resolution.
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -38,6 +38,9 @@ int sysctl_oom_kill_allocating_task;
>  int sysctl_oom_dump_tasks = 1;
>  static DEFINE_SPINLOCK(zone_scan_lock);
>  
> +#define OOM_SCORE_MAX_FACTOR 10
> +#define OOM_SCORE_MAX (OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)
> +
>  #ifdef CONFIG_NUMA
>  /**
>   * has_intersects_mems_allowed() - check task eligiblity for kill
> @@ -160,7 +163,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  	 */
>  	if (p->flags & PF_OOM_ORIGIN) {
>  		task_unlock(p);
> -		return 1000;
> +		return OOM_SCORE_MAX;
>  	}
>  
>  	/*
> @@ -177,32 +180,38 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
>  	points += get_mm_counter(p->mm, MM_SWAPENTS);
>  
> -	points *= 1000;
> +	points *= OOM_SCORE_MAX;
>  	points /= totalpages;
>  	task_unlock(p);
>  
>  	/*
> -	 * Root processes get 3% bonus, just like the __vm_enough_memory()
> -	 * implementation used by LSMs.
> +	 * Root processes get a bonus of 1% per 10% of memory used.
>  	 */
> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> -		points -= 30;
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> +		int bonus;
> +		int granularity;
> +
> +		bonus = OOM_SCORE_MAX / 100;	/* bonus is 1% */
> +		granularity = OOM_SCORE_MAX / 10;	/* granularity is 10% */
> +
> +		points -= bonus * (points / granularity);
> +	}
>  
>  	/*
>  	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
>  	 * either completely disable oom killing or always prefer a certain
>  	 * task.
>  	 */
> -	points += p->signal->oom_score_adj;
> +	points += p->signal->oom_score_adj * OOM_SCORE_MAX_FACTOR;
>  
>  	/*
>  	 * Never return 0 for an eligible task that may be killed since it's
> -	 * possible that no single user task uses more than 0.1% of memory and
> +	 * possible that no single user task uses more than 0.01% of memory and
>  	 * no single admin tasks uses more than 3.0%.
>  	 */
>  	if (points <= 0)
>  		return 1;
> -	return (points < 1000) ? points : 1000;
> +	return (points < OOM_SCORE_MAX) ? points : OOM_SCORE_MAX;
>  }
>  
>  /*
> @@ -314,7 +323,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  	 */
>  	if (p == current) {
>  		chosen = p;
> -		*ppoints = 1000;
> +		*ppoints = OOM_SCORE_MAX;
>  	} else {
>  		/*
>  		 * If this task is not being ptraced on exit,
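
For reference, the arithmetic of the new root discount works out like this
(a quick standalone sketch with made-up RSS numbers, not kernel code),
assuming a 16GB machine:

#include <stdio.h>

#define OOM_SCORE_ADJ_MAX	1000
#define OOM_SCORE_MAX_FACTOR	10
#define OOM_SCORE_MAX		(OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)

int main(void)
{
	long long totalpages = 4194304;			/* 16GB of 4KB pages */
	long long rss[] = { 4096, 1048576, 2097152 };	/* ~0.1%, 25%, 50% */

	for (int i = 0; i < 3; i++) {
		long long points = rss[i] * OOM_SCORE_MAX / totalpages;
		long long bonus = OOM_SCORE_MAX / 100;		/* 1% */
		long long granularity = OOM_SCORE_MAX / 10;	/* 10% */

		printf("rss %7lld pages: score %4lld, root score %4lld\n",
		       rss[i], points,
		       points - bonus * (points / granularity));
	}
	return 0;
}

A root task using half of memory scores 4500 instead of 5000, while a tiny
daemon keeps its score of 9 untouched rather than being zeroed by a flat
discount.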

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  1:53       ` KOSAKI Motohiro
@ 2011-05-24  8:46         ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-24  8:46 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

On Tue, May 24, 2011 at 10:53 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>>> +       /*
>>> +        * chosen_point==1 may be a sign that root privilege bonus is too large
>>> +        * and we choose wrong task. Let's recalculate oom score without the
>>> +        * dubious bonus.
>>> +        */
>>> +       if (protect_root && (chosen_points == 1)) {
>>> +               protect_root = 0;
>>> +               goto retry;
>>> +       }
>>
>> The idea is good to me.
>> But once we meet it, should we give up protecting root privileged
>> processes?
>> How about decaying bonus point?
>
> After applying my patch, unprivileged process never get score-1. (note,
> mapping anon pages naturally makes to increase nr_ptes)

Hmm, if I understand your code correctly, an unprivileged process can
get a score of 1 via the 3% bonus, so in the end we can still get a
chosen_points of 1. The reason we hit chosen_points == 1 at all is that
the bonus is rather big, I think. So I would like to use a smaller bonus
on later iterations than on the first (i.e., decay the bonus); see the
sketch below.
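
Something like this is what I mean (a rough, untested userspace sketch;
the function and the numbers are mine, just to show the shape):

#include <stdio.h>

/* 3% bonus (~points/32), halved on every oom-killer retry */
static long badness(long points, int is_root, int bonus_shift)
{
	if (is_root)
		points -= (points / 32) >> bonus_shift;
	return points > 0 ? points : 1;
}

int main(void)
{
	long scores[] = { 4000, 3500, 3300 };	/* root tasks, close together */

	for (int shift = 0; shift < 3; shift++) {
		printf("retry %d:", shift);
		for (int i = 0; i < 3; i++)
			printf(" %ld", badness(scores[i], 1, shift));
		printf("\n");
	}
	return 0;
}

Each retry would shrink the discount (3%, 1.5%, 0.75%, ...) instead of
abandoning root protection outright.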

>
> Then, decaying don't make any accuracy. Am I missing something?

Maybe I'm missing something.  :(




-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  8:46         ` Minchan Kim
@ 2011-05-24  8:49           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  8:49 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

(2011/05/24 17:46), Minchan Kim wrote:
> On Tue, May 24, 2011 at 10:53 AM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
>>>> +       /*
>>>> +        * chosen_point==1 may be a sign that root privilege bonus is too large
>>>> +        * and we choose wrong task. Let's recalculate oom score without the
>>>> +        * dubious bonus.
>>>> +        */
>>>> +       if (protect_root && (chosen_points == 1)) {
>>>> +               protect_root = 0;
>>>> +               goto retry;
>>>> +       }
>>>
>>> The idea is good to me.
>>> But once we meet it, should we give up protecting root privileged
>>> processes?
>>> How about decaying bonus point?
>>
>> After applying my patch, unprivileged process never get score-1. (note,
>> mapping anon pages naturally makes to increase nr_ptes)
> 
> Hmm, If I understand your code correctly, unprivileged process can get
> a score 1 by 3% bonus.

The 3% bonus is for privileged processes. :)


> So after all, we can get a chosen_point with 1.
> Why I get a chosen_point with 1 is as bonus is rather big, I think.
> So I would like to use small bonus than first iteration(ie, decay bonus).
> 
>>
>> Then, decaying don't make any accuracy. Am I missing something?
> 
> Maybe I miss something.  :(
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  8:49           ` KOSAKI Motohiro
@ 2011-05-24  9:04             ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-24  9:04 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

On Tue, May 24, 2011 at 5:49 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> (2011/05/24 17:46), Minchan Kim wrote:
>> On Tue, May 24, 2011 at 10:53 AM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>>>>> +       /*
>>>>> +        * chosen_point==1 may be a sign that root privilege bonus is too large
>>>>> +        * and we choose wrong task. Let's recalculate oom score without the
>>>>> +        * dubious bonus.
>>>>> +        */
>>>>> +       if (protect_root && (chosen_points == 1)) {
>>>>> +               protect_root = 0;
>>>>> +               goto retry;
>>>>> +       }
>>>>
>>>> The idea is good to me.
>>>> But once we meet it, should we give up protecting root privileged
>>>> processes?
>>>> How about decaying bonus point?
>>>
>>> After applying my patch, unprivileged process never get score-1. (note,
>>> mapping anon pages naturally makes to increase nr_ptes)
>>
>> Hmm, If I understand your code correctly, unprivileged process can get
>> a score 1 by 3% bonus.
>
> 3% bonus is for privileged process. :)

OMG, a typo.
Anyway, my point is the following.
If chosen_points is 1, it means the root bonus is rather big, right?
If it is, your patch does a second loop that ignores the bonus for
root-privileged processes completely.
My point is: let's not drop the bonus entirely. Instead, let's
recalculate with, say, 1.5%.

But I don't insist on my idea.
Thanks.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  9:04             ` Minchan Kim
@ 2011-05-24  9:09               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  9:09 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

>>> Hmm, If I understand your code correctly, unprivileged process can get
>>> a score 1 by 3% bonus.
>>
>> 3% bonus is for privileged process. :)
> 
> OMG. Typo.
> Anyway, my point is following as.
> If chose_point is 1, it means root bonus is rather big. Right?
> If is is, your patch does second loop with completely ignore of bonus
> for root privileged process.
> My point is that let's not ignore bonus completely. Instead of it,
> let's recalculate 1.5% for example.

1) An unprivileged process can't get score 1 (because every process needs at
   least one anon page, one file page, and two or more ptes).
2) A score of 1 therefore means all candidates in the system are privileged,
   so decaying the bonus won't help.

IOW, the mixed privileged/unprivileged case never happens here (see the
illustration below).
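
Spelling the first point out with the per-task counters as plain numbers
(a trivial standalone illustration, not kernel code):

#include <stdio.h>

int main(void)
{
	/* the smallest possible unprivileged task under the patched scoring */
	long rss = 2;		/* one anon page + one file page */
	long nr_ptes = 2;	/* at least two pte pages to map them */
	long swapents = 0;

	/* points = rss + nr_ptes + swapents, in pages */
	printf("minimum unprivileged score = %ld\n", rss + nr_ptes + swapents);
	return 0;
}

So a chosen_points of 1 can only have been produced by the root bonus path.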


> 
> But I don't insist on my idea.
> Thanks.



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  9:09               ` KOSAKI Motohiro
@ 2011-05-24  9:20                 ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-05-24  9:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

On Tue, May 24, 2011 at 6:09 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>>>> Hmm, If I understand your code correctly, unprivileged process can get
>>>> a score 1 by 3% bonus.
>>>
>>> 3% bonus is for privileged process. :)
>>
>> OMG. Typo.
>> Anyway, my point is following as.
>> If chose_point is 1, it means root bonus is rather big. Right?
>> If is is, your patch does second loop with completely ignore of bonus
>> for root privileged process.
>> My point is that let's not ignore bonus completely. Instead of it,
>> let's recalculate 1.5% for example.
>
> 1) unpriviledged process can't get score 1 (because at least a process need one
>   anon, one file and two or more ptes).
> 2) then, score=1 mean all processes in the system are privileged. thus decay won't help.
>
> IOW, never happen privileged and unprivileged score in this case.

I was blind. Thanks for opening my eyes, KOSAKI.


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  9:20                 ` Minchan Kim
@ 2011-05-24  9:38                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-24  9:38 UTC (permalink / raw)
  To: minchan.kim
  Cc: linux-mm, linux-kernel, akpm, caiqian, rientjes, hughd,
	kamezawa.hiroyu, oleg

(2011/05/24 18:20), Minchan Kim wrote:
> On Tue, May 24, 2011 at 6:09 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
>>>>> Hmm, If I understand your code correctly, unprivileged process can get
>>>>> a score 1 by 3% bonus.
>>>>
>>>> 3% bonus is for privileged process. :)
>>>
>>> OMG. Typo.
>>> Anyway, my point is following as.
>>> If chose_point is 1, it means root bonus is rather big. Right?
>>> If is is, your patch does second loop with completely ignore of bonus
>>> for root privileged process.
>>> My point is that let's not ignore bonus completely. Instead of it,
>>> let's recalculate 1.5% for example.
>>
>> 1) unpriviledged process can't get score 1 (because at least a process need one
>>   anon, one file and two or more ptes).
>> 2) then, score=1 mean all processes in the system are privileged. thus decay won't help.
>>
>> IOW, never happen privileged and unprivileged score in this case.
> 
> I am blind. Thanks for open my eyes, KOSAKI.

No, your review was very sharp. Thank you for looking into this!




^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-24  2:03               ` KOSAKI Motohiro
@ 2011-05-25 23:50                 ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-25 23:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Tue, 24 May 2011, KOSAKI Motohiro wrote:

> > I don't care if it happens in the usual case or extremely rare case.  It
> > significantly increases the amount of time that tasklist_lock is held
> > which causes writelock starvation on other cpus and causes issues,
> > especially if the cpu being starved is updating the timer because it has
> > irqs disabled, i.e. write_lock_irq(&tasklist_lock) usually in the clone or
> > exit path.  We can do better than that, and that's why I proposed my patch
> > to CAI that increases the resolution of the scoring and makes the root
> > process bonus proportional to the amount of used memory.
> 
> Do I need to say the same word? Please read the code at first.
> 

I'm afraid that a second time through the tasklist in select_bad_process() 
is simply a non-starter for _any_ case; it significantly increases the 
amount of time that tasklist_lock is held and causes problems elsewhere on 
large systems -- such as some of ours -- since irqs are disabled while 
waiting for the writeside of the lock.  I think it would be better to use 
a proportional privilege for root processes based on the amount of memory 
they are using (discounting 1% of memory per 10% of memory used, as 
proposed earlier, seems sane) so we can always protect root when necessary 
and never iterate through the list again.
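
For comparison, a standalone sketch (made-up scores on the proposed
0-10000 scale, not kernel code) of the flat discount against the
proportional one:

#include <stdio.h>

#define OOM_SCORE_MAX 10000

static long clamp(long points)
{
	return points > 0 ? points : 1;	/* eligible tasks never score 0 */
}

static long flat(long points)		/* old behaviour: flat 3% of scale */
{
	return clamp(points - 3 * OOM_SCORE_MAX / 100);
}

static long prop(long points)		/* proposed: 1% discount per 10% used */
{
	return clamp(points -
		     (OOM_SCORE_MAX / 100) * (points / (OOM_SCORE_MAX / 10)));
}

int main(void)
{
	long scores[] = { 5, 12, 40, 2500, 5000 };	/* root tasks */

	for (int i = 0; i < 5; i++)
		printf("score %4ld -> flat %4ld, proportional %4ld\n",
		       scores[i], flat(scores[i]), prop(scores[i]));
	return 0;
}

Under the flat discount every small root daemon collapses to the same
score of 1 (the tie that was being broken by process age), while the
proportional discount preserves the ordering and still shields a root
task that uses a lot of memory.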

Please look into the earlier review comments on the other patches, refresh 
the series, and post it again.  Thanks!

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-23 22:48       ` David Rientjes
@ 2011-05-26  7:08         ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-26  7:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, hughd, kamezawa hiroyu,
	minchan kim, oleg, KOSAKI Motohiro



----- Original Message -----
> On Mon, 23 May 2011, David Rientjes wrote:
> 
> > I already suggested an alternative patch to CAI Qian to greatly increase
> > the granularity of the oom score from a range of 0-1000 to 0-10000 to
> > differentiate between tasks within 0.01% of available memory (16MB on
> > CAI Qian's 16GB system). I'll propose this officially in a separate
> > email.
> >
> 
> This is an alternative patch as earlier proposed with suggested
> improvements from Minchan. CAI, would it be possible to test this out on
> your usecase?
Here are the results of the testing. Running the reproducer as a non-root
user, the results look good: the OOM killer killed, in turn, each python
process that the reproducer had forked. However, when running it as the
root user, sshd and other random processes were killed.
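
(The reproducer itself isn't included in the thread; a hypothetical C
equivalent of that kind of fork bomb, where each child keeps dirtying
anonymous memory until the box goes OOM, would be roughly the following.
Run it only in a disposable VM.)

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		if (fork() == 0) {
			/* child: leak and dirty anonymous memory forever */
			for (;;) {
				char *p = malloc(1 << 20);
				if (p)
					memset(p, 1, 1 << 20);
				usleep(1000);
			}
		}
		sleep(1);	/* parent: keep forking more children */
	}
	return 0;
}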

[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  567]     0   567     2935      365   0     -17         -1000 udevd
[ 2116]     0  2116     3099      464   8     -17         -1000 udevd
[ 2117]     0  2117     3099      503   2     -17         -1000 udevd
[ 2317]     0  2317     6404       39   8     -17         -1000 auditd
[ 3221]     0  3221    15998      153   9       0             0 sshd
[ 3223]     0  3223    24421      204   0       0             0 sshd
[ 3227]     0  3227    27093       86   4       0             0 bash
[ 3246]     0  3246     1029       18   1       0             0 agetty
[ 3251]     0  3251   243710    98227  11       0             0 python
[ 3252]     0  3252   243710   109999   9       0             0 python
[ 3253]     0  3253   243710   111538  12       0             0 python
[ 3254]     0  3254   243710   106931   1       0             0 python
[ 3255]     0  3255   243710   103367   9       0             0 python
[ 3256]     0  3256   243710    97715   1       0             0 python
[ 3257]     0  3257   243710   107443   9       0             0 python
[ 3258]     0  3258   243710   101298   4       0             0 python
[ 3259]     0  3259   243710   118707   1       0             0 python
[ 3260]     0  3260   243710   104882   9       0             0 python
[ 3261]     0  3261   243710   108979  12       0             0 python
[ 3262]     0  3262   243710    93106   1       0             0 python
[ 3263]     0  3263   243710    97714  12       0             0 python
[ 3264]     0  3264   243710    91571  12       0             0 python
[ 3265]     0  3265   243710    93107   1       0             0 python
[ 3266]     0  3266   243710    83790   9       0             0 python
[ 3267]     0  3267   243710    81330   5       0             0 python
[ 3268]     0  3268   243710    83378   5       0             0 python
[ 3269]     0  3269   243710    77235   4       0             0 python
[ 3270]     0  3270   243710    80732   1       0             0 python
[ 3271]     0  3271   243710    72626  11       0             0 python
[ 3272]     0  3272   243710    81385   7       0             0 python
[ 3273]     0  3273   243710    71749   3       0             0 python
[ 3274]     0  3274   243710    70735   1       0             0 python
[ 3275]     0  3275   243710    84403   9       0             0 python
[ 3276]     0  3276   243710    72255  13       0             0 python
[ 3277]     0  3277   243710    65971   3       0             0 python
[ 3278]     0  3278   243710    66172  15       0             0 python
[ 3279]     0  3279   243710    69555   1       0             0 python
[ 3280]     0  3280   243710    68689   9       0             0 python
[ 3281]     0  3281   243710    69553   9       0             0 python
[ 3282]     0  3282   243710    64439   6       0             0 python
[ 3283]     0  3283   243710    56753  11       0             0 python
[ 3284]     0  3284   243710    57917   6       0             0 python
[ 3285]     0  3285   243710    55730   9       0             0 python
[ 3286]     0  3286   243710    54193   9       0             0 python
[ 3287]     0  3287   243710    51123   1       0             0 python
[ 3288]     0  3288   243710    52146  15       0             0 python
[ 3289]     0  3289   243710    48220   9       0             0 python
[ 3290]     0  3290   243710    48051   3       0             0 python
[ 3291]     0  3291   243710    40371   3       0             0 python
[ 3292]     0  3292   243710    49229  13       0             0 python
[ 3293]     0  3293   243710    40549   9       0             0 python
[ 3294]     0  3294   243710    41618   5       0             0 python
[ 3295]     0  3295   243710    40429   9       0             0 python
[ 3296]     0  3296   243710    36787   1       0             0 python
[ 3297]     0  3297   243710    39346  11       0             0 python
[ 3298]     0  3298   243710    35251   3       0             0 python
[ 3299]     0  3299   243710    32872   3       0             0 python
[ 3300]     0  3300   243710    29781   1       0             0 python
[ 3301]     0  3301   243710    27570  11       0             0 python
[ 3302]     0  3302   243710    28081   9       0             0 python
[ 3303]     0  3303   243710    24499   1       0             0 python
[ 3304]     0  3304   243710    21427   1       0             0 python
[ 3305]     0  3305   243710    25522   9       0             0 python
[ 3306]     0  3306   243710    28081   9       0             0 python
[ 3307]     0  3307   243710    21939   9       0             0 python
[ 3308]     0  3308   243710    19890   9       0             0 python
[ 3309]     0  3309   243710    18354   3       0             0 python
[ 3310]     0  3310   243710    16590  14       0             0 python
[ 3311]     0  3311   243710    18718  11       0             0 python
[ 3312]     0  3312   243710    17841   1       0             0 python
[ 3313]     0  3313   243710    14258  11       0             0 python
[ 3314]     0  3314   243710    14426   4       0             0 python
[ 3315]     0  3315   243710    15282   6       0             0 python
[ 3316]     0  3316   243710     9650   6       0             0 python
[ 3317]     0  3317   243710    11699   1       0             0 python
[ 3318]     0  3318   243710    11372   3       0             0 python
[ 3319]     0  3319   243710     9650   9       0             0 python
[ 3320]     0  3320   243710     8426  11       0             0 python
[ 3321]     0  3321   243710     4531   3       0             0 python
[ 3322]     0  3322   243710     8627   9       0             0 python
[ 3323]     0  3323   243710     6578   1       0             0 python
[ 3324]     0  3324   243710     5553   7       0             0 python
[ 3325]     0  3325   243710    10673   3       0             0 python
[ 3326]     0  3326   243710     6578  11       0             0 python
[ 3327]     0  3327   243710     3505   1       0             0 python
[ 3328]     0  3328   243710     3530   1       0             0 python
[ 3329]     0  3329   243710     5205  11       0             0 python
[ 3330]     0  3330   243710     1970   9       0             0 python
[ 3331]     0  3331   243710     4021  11       0             0 python
[ 3332]     0  3332   243710     5043   1       0             0 python
[ 3333]     0  3333   243710     2481   1       0             0 python
[ 3334]     0  3334   243710     4530   1       0             0 python
[ 3343]     0  3343    41835      773   9       0             0 python
[ 3344]     0  3344    41835      773   4       0             0 python
[ 3345]     0  3345    41835      773   1       0             0 python
[ 3346]     0  3346    41835      773   1       0             0 python
[ 3347]     0  3347    41835      773   9       0             0 python
[ 3348]     0  3348    41835      773   3       0             0 python
[ 3349]     0  3349    41835      773   1       0             0 python
[ 3350]     0  3350    41835      773  11       0             0 python
Out of memory: Kill process 3221 (sshd) score 1 or sacrifice child
Killed process 3223 (sshd) total-vm:97684kB, anon-rss:816kB, file-rss:0kB
sshd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
sshd cpuset=/ mems_allowed=0-1

[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  567]     0   567     2935        0   0     -17         -1000 udevd
[ 2103]     0  2103     1025        0   9       0             0 mingetty
[ 2105]     0  2105     1025        0   2       0             0 mingetty
[ 2109]     0  2109    19263      100   0       0             0 login
[ 2116]     0  2116     3099        0   8     -17         -1000 udevd
[ 2117]     0  2117     3099        0   2     -17         -1000 udevd
[ 2317]     0  2317     6404       20   0     -17         -1000 auditd
[ 2338]     0  2338    27093       11  10       0             0 bash
[ 2358]     0  2358   245248     8337   6       0             0 python
[ 2359]     0  2359   245248    11151   6       0             0 python
[ 2360]     0  2360   245248    12487  10       0             0 python
[ 2361]     0  2361   245248    11702   9       0             0 python
[ 2362]     0  2362   245248     6751   1       0             0 python
[ 2363]     0  2363   245248    10952   2       0             0 python
[ 2364]     0  2364   245248    12113   1       0             0 python
[ 2365]     0  2365   245248    11258   9       0             0 python
[ 2366]     0  2366   245248     9697  10       0             0 python
[ 2367]     0  2367   245248    12453   2       0             0 python
[ 2368]     0  2368   245248    14357  10       0             0 python
[ 2369]     0  2369   245248    11282  10       0             0 python
[ 2370]     0  2370   245248    11138   0       0             0 python
[ 2371]     0  2371   245248    10615  13       0             0 python
[ 2372]     0  2372   245248    10742   2       0             0 python
[ 2373]     0  2373   245248     9024   7       0             0 python
[ 2374]     0  2374   245248    12176  12       0             0 python
[ 2375]     0  2375   245248    13886  10       0             0 python
[ 2376]     0  2376   245248    10974   5       0             0 python
[ 2377]     0  2377   245248     8416  11       0             0 python
[ 2378]     0  2378   245248     9469  11       0             0 python
[ 2379]     0  2379   245248    11312  13       0             0 python
[ 2380]     0  2380   245248     9317   1       0             0 python
[ 2381]     0  2381   245248    10424   0       0             0 python
[ 2382]     0  2382   245248    15806   1       0             0 python
[ 2383]     0  2383   245248    15340   7       0             0 python
[ 2384]     0  2384   245248     7932   9       0             0 python
[ 2385]     0  2385   245248    10420   0       0             0 python
[ 2386]     0  2386   245248    14376   9       0             0 python
[ 2387]     0  2387   245248    12410   2       0             0 python
[ 2388]     0  2388   245248    14596   9       0             0 python
[ 2389]     0  2389   245248     7898   9       0             0 python
[ 2390]     0  2390   245248    10943  10       0             0 python
[ 2391]     0  2391   245248     8787   2       0             0 python
[ 2392]     0  2392   245248     7252  10       0             0 python
[ 2393]     0  2393   245248    12978  15       0             0 python
[ 2394]     0  2394   245248     7034  11       0             0 python
[ 2395]     0  2395   245248    10903   2       0             0 python
[ 2396]     0  2396   245248    10280  10       0             0 python
[ 2397]     0  2397   245248    10793   9       0             0 python
[ 2398]     0  2398   245248     8205   9       0             0 python
[ 2399]     0  2399   245248     9675   0       0             0 python
[ 2400]     0  2400   245248    11304   5       0             0 python
[ 2401]     0  2401   245248    15053   5       0             0 python
[ 2402]     0  2402   245248    14449  10       0             0 python
[ 2403]     0  2403   245248     8466   1       0             0 python
[ 2404]     0  2404   245248    14250  10       0             0 python
[ 2405]     0  2405   245248    11630   9       0             0 python
[ 2406]     0  2406   245248     9562   9       0             0 python
[ 2407]     0  2407   245248     8802   1       0             0 python
[ 2408]     0  2408   245248     9521   1       0             0 python
[ 2409]     0  2409   245248     4827  13       0             0 python
[ 2410]     0  2410   245248    10364   1       0             0 python
[ 2411]     0  2411   245248     8749   0       0             0 python
[ 2412]     0  2412   245248    15082   0       0             0 python
[ 2413]     0  2413   245248    11023  10       0             0 python
[ 2414]     0  2414   245248     9087   1       0             0 python
[ 2415]     0  2415   245248     9906   2       0             0 python
[ 2416]     0  2416   245248    13862   5       0             0 python
[ 2417]     0  2417   245248     9553   2       0             0 python
[ 2418]     0  2418   245248     8556  13       0             0 python
[ 2419]     0  2419   245248     9246   9       0             0 python
[ 2420]     0  2420   245248    11084   2       0             0 python
[ 2421]     0  2421   245248    16256   2       0             0 python
[ 2422]     0  2422   245248    13057  12       0             0 python
[ 2423]     0  2423   245248    10578   7       0             0 python
[ 2424]     0  2424   245248    10407   3       0             0 python
[ 2425]     0  2425   245248    10329   3       0             0 python
[ 2426]     0  2426   245248     9489   9       0             0 python
[ 2427]     0  2427   245248    10004   3       0             0 python
[ 2428]     0  2428   245248     7411   0       0             0 python
[ 2429]     0  2429   245248    13647   1       0             0 python
[ 2430]     0  2430   245248    10134   2       0             0 python
[ 2431]     0  2431   245248    12157  10       0             0 python
[ 2432]     0  2432   245248    11158   1       0             0 python
[ 2433]     0  2433   245248     9829  14       0             0 python
[ 2434]     0  2434   245248     5859   3       0             0 python
[ 2435]     0  2435   245248    11456   9       0             0 python
[ 2436]     0  2436   245248    12754   3       0             0 python
[ 2437]     0  2437   245248    11098   0       0             0 python
[ 2438]     0  2438   245248    10676   0       0             0 python
[ 2439]     0  2439   245248     9105   2       0             0 python
[ 2440]     0  2440   245248    10539  10       0             0 python
[ 2441]     0  2441   245248    11514  10       0             0 python
[ 2442]     0  2442   245248    10019   4       0             0 python
[ 2443]     0  2443   245248     7545  14       0             0 python
[ 2444]     0  2444   245248    11830  10       0             0 python
[ 2445]     0  2445   245248     4708  10       0             0 python
[ 2446]     0  2446   245248     8227  10       0             0 python
[ 2447]     0  2447   245248     6306  10       0             0 python
[ 2448]     0  2448   245248     8888   0       0             0 python
[ 2449]     0  2449   245248    11337   3       0             0 python
[ 2450]     0  2450   245248     4856   0       0             0 python
[ 2451]     0  2451   245248    12369   0       0             0 python
[ 2452]     0  2452   245248    11077  10       0             0 python
[ 2453]     0  2453   245248     6757   0       0             0 python
[ 2454]     0  2454   245248     6785  10       0             0 python
[ 2455]     0  2455   245248     6532   3       0             0 python
[ 2456]     0  2456   245248     6265   9       0             0 python
[ 2457]     0  2457   245248     8126   3       0             0 python
[ 2458]     0  2458   245248     9573  10       0             0 python
[ 2459]     0  2459   245248     6954  10       0             0 python
[ 2460]     0  2460   245248     7539   3       0             0 python
[ 2461]     0  2461   245248     7623   0       0             0 python
[ 2462]     0  2462   245248     4853   2       0             0 python
[ 2463]     0  2463   245248     9488  10       0             0 python
[ 2464]     0  2464   245248     6415   0       0             0 python
[ 2465]     0  2465   245248     9745   1       0             0 python
[ 2466]     0  2466   245248     7332   3       0             0 python
[ 2467]     0  2467   245248     7408  11       0             0 python
[ 2468]     0  2468   245248     8311   0       0             0 python
[ 2469]     0  2469   245248     6963   0       0             0 python
[ 2470]     0  2470   245248     8620  10       0             0 python
[ 2471]     0  2471   245248     5799  10       0             0 python
[ 2472]     0  2472   245248    12855  10       0             0 python
[ 2473]     0  2473   245248     8718   9       0             0 python
[ 2474]     0  2474   245248     6782   2       0             0 python
[ 2475]     0  2475   245248     9566   0       0             0 python
[ 2476]     0  2476   245248     8083   9       0             0 python
[ 2477]     0  2477   245248     8657  10       0             0 python
[ 2478]     0  2478   245248     8997   9       0             0 python
[ 2479]     0  2479   245248     6539  11       0             0 python
[ 2480]     0  2480   245248     8906   9       0             0 python
[ 2481]     0  2481   245248     8916  11       0             0 python
[ 2482]     0  2482   245248     8083   0       0             0 python
[ 2483]     0  2483   245248     9490   7       0             0 python
[ 2484]     0  2484   245248     8123   0       0             0 python
[ 2485]     0  2485   245248     7315  11       0             0 python
[ 2486]     0  2486   245248     9084   4       0             0 python
[ 2487]     0  2487   245248     8036  15       0             0 python
[ 2488]     0  2488   245248     6839   2       0             0 python
[ 2489]     0  2489   245248     9478  11       0             0 python
[ 2490]     0  2490   245248    11535  11       0             0 python
[ 2491]     0  2491   245248     7895   2       0             0 python
[ 2492]     0  2492   245248     8831   0       0             0 python
[ 2493]     0  2493   245248     9219   0       0             0 python
[ 2494]     0  2494   245248     8472  11       0             0 python
[ 2495]     0  2495   245248     6666   1       0             0 python
[ 2496]     0  2496   245248     4875  11       0             0 python
[ 2497]     0  2497   245248     6802  11       0             0 python
[ 2498]     0  2498   245248     4901   9       0             0 python
[ 2499]     0  2499   245248     8510  11       0             0 python
[ 2500]     0  2500   245248     8620  15       0             0 python
[ 2501]     0  2501   245248     7169  10       0             0 python
[ 2502]     0  2502   245248     6283   0       0             0 python
[ 2503]     0  2503   245248     9497   0       0             0 python
[ 2504]     0  2504   245248    10091   2       0             0 python
[ 2505]     0  2505   245248    11700   0       0             0 python
[ 2506]     0  2506   245248     8353   3       0             0 python
[ 2507]     0  2507   245248     8505   2       0             0 python
[ 2508]     0  2508   245248    10486   0       0             0 python
[ 2509]     0  2509   245248     6641   3       0             0 python
[ 2510]     0  2510   245248     7175  10       0             0 python
[ 2511]     0  2511   245248    10100   9       0             0 python
[ 2512]     0  2512   245248     6984  13       0             0 python
[ 2513]     0  2513   245248     7677  13       0             0 python
[ 2514]     0  2514   245248     7645  11       0             0 python
[ 2515]     0  2515   245248     8854   4       0             0 python
[ 2516]     0  2516   245248     6888   0       0             0 python
[ 2517]     0  2517   245248     6297  11       0             0 python
[ 2518]     0  2518   245248     8011  11       0             0 python
[ 2519]     0  2519   245248     6353  10       0             0 python
[ 2520]     0  2520   245248     5168   9       0             0 python
[ 2521]     0  2521   245248     7274  11       0             0 python
[ 2522]     0  2522   245248     6374  11       0             0 python
[ 2523]     0  2523   245248     9404   1       0             0 python
[ 2524]     0  2524   245248     7486   0       0             0 python
[ 2525]     0  2525   245248     7290  10       0             0 python
[ 2526]     0  2526   245248     5940   0       0             0 python
[ 2527]     0  2527   245248     7999  10       0             0 python
[ 2528]     0  2528   245248     8201   0       0             0 python
[ 2529]     0  2529   245248     8065   0       0             0 python
[ 2530]     0  2530   245248     6452   9       0             0 python
[ 2531]     0  2531   245248     6162  11       0             0 python
[ 2532]     0  2532   245248     6808   0       0             0 python
[ 2533]     0  2533   245248     4331   2       0             0 python
[ 2534]     0  2534   245248     6458   0       0             0 python
[ 2535]     0  2535   245248     3250   0       0             0 python
[ 2536]     0  2536   245248     5289   9       0             0 python
[ 2537]     0  2537   245248     9369  13       0             0 python
[ 2538]     0  2538   245248     9187  15       0             0 python
[ 2539]     0  2539   245248     8274   0       0             0 python
[ 2540]     0  2540   245248     8051   2       0             0 python
[ 2541]     0  2541   245248     4732   4       0             0 python
[ 2542]     0  2542   245248     4662   0       0             0 python
[ 2543]     0  2543   245248    12070   0       0             0 python
[ 2546]     0  2546   245248     6923   4       0             0 python
[ 2547]     0  2547   245248     4550   0       0             0 python
[ 2548]     0  2548   245248     4700  12       0             0 python
[ 2549]     0  2549   245248     5822  11       0             0 python
[ 2550]     0  2550   245248     6179  10       0             0 python
[ 2551]     0  2551   245248     7794   0       0             0 python
[ 2552]     0  2552   245248     6456  10       0             0 python
[ 2553]     0  2553   245248     4932   4       0             0 python
[ 2554]     0  2554   245248     7680  11       0             0 python
[ 2555]     0  2555   245248     1642  10       0             0 python
[ 2556]     0  2556   245248     7480  10       0             0 python
[ 2557]     0  2557   245248     3598   0       0             0 python
[ 2558]     0  2558   245248     7949   0       0             0 python
[ 2559]     0  2559   245248     4294   0       0             0 python
[ 2560]     0  2560   245248     5138   0       0             0 python
[ 2561]     0  2561   245248    11045   9       0             0 python
[ 2562]     0  2562   245248     4290   9       0             0 python
[ 2563]     0  2563   245248     7603   0       0             0 python
[ 2564]     0  2564   245248     8683  12       0             0 python
[ 2565]     0  2565   245248     6409  12       0             0 python
[ 2566]     0  2566   245248     8321   9       0             0 python
[ 2567]     0  2567   245248     7416   0       0             0 python
[ 2568]     0  2568   245248     5272   2       0             0 python
[ 2569]     0  2569   245248     7359  10       0             0 python
[ 2570]     0  2570   245248     4641   9       0             0 python
[ 2571]     0  2571   245248     7698   2       0             0 python
[ 2572]     0  2572   245248     6118  11       0             0 python
[ 2573]     0  2573   245248     4822   0       0             0 python
[ 2574]     0  2574   245248     4745   0       0             0 python
[ 2575]     0  2575   245248     8029   0       0             0 python
[ 2576]     0  2576   245248     6350   9       0             0 python
[ 2577]     0  2577   245248     5537   9       0             0 python
[ 2578]     0  2578   245248     6861   3       0             0 python
[ 2579]     0  2579   245248     5632   4       0             0 python
[ 2580]     0  2580   245248     6023   0       0             0 python
[ 2581]     0  2581   245248     7947  11       0             0 python
[ 2582]     0  2582   245248     6752   9       0             0 python
[ 2583]     0  2583   245248     4282  12       0             0 python
[ 2584]     0  2584   245248     6069   4       0             0 python
[ 2585]     0  2585   245248     5472  11       0             0 python
[ 2586]     0  2586   245248     4729   0       0             0 python
[ 2587]     0  2587   245248     8205   0       0             0 python
[ 2588]     0  2588   245248     6234  10       0             0 python
[ 2589]     0  2589   245248     7687  11       0             0 python
[ 2590]     0  2590   245248     8817  11       0             0 python
[ 2591]     0  2591   245248     5784  11       0             0 python
[ 2592]     0  2592   245248     7518  10       0             0 python
[ 2593]     0  2593   245248     7213  12       0             0 python
[ 2594]     0  2594   245248     9752   3       0             0 python
[ 2595]     0  2595   245248     7039   0       0             0 python
[ 2596]     0  2596   245248     8164   0       0             0 python
[ 2597]     0  2597   245248     4113  11       0             0 python
[ 2598]     0  2598   245248     4153   0       0             0 python
[ 2599]     0  2599   245248     6651  11       0             0 python
[ 2600]     0  2600   245248     3933   9       0             0 python
[ 2601]     0  2601   245248     7722  14       0             0 python
[ 2602]     0  2602   245248     7535   4       0             0 python
[ 2603]     0  2603   245248     4903   2       0             0 python
[ 2604]     0  2604   245248     5542   0       0             0 python
[ 2605]     0  2605   245248     4589  10       0             0 python
[ 2606]     0  2606   245248     7672   2       0             0 python
[ 2607]     0  2607   245248     6656   2       0             0 python
[ 2608]     0  2608   245248     6467   2       0             0 python
[ 2609]     0  2609   245248     8780   0       0             0 python
[ 2610]     0  2610   245248    11257   0       0             0 python
[ 2611]     0  2611   245248     6748   0       0             0 python
[ 2612]     0  2612   245248     8885  11       0             0 python
[ 2613]     0  2613   245248     4232   0       0             0 python
[ 2614]     0  2614   245248     5724  11       0             0 python
[ 2615]     0  2615   245248     2842  11       0             0 python
[ 2616]     0  2616   245248     4994  15       0             0 python
[ 2617]     0  2617   245248     5417  11       0             0 python
[ 2618]     0  2618   245248     4660   0       0             0 python
[ 2619]     0  2619   245248     5655  11       0             0 python
[ 2620]     0  2620   245248     5952   0       0             0 python
[ 2621]     0  2621   245248     6983  11       0             0 python
[ 2622]     0  2622   245248     6066  12       0             0 python
[ 2623]     0  2623   245248     7743  11       0             0 python
[ 2624]     0  2624   245248     3138  11       0             0 python
[ 2625]     0  2625   245248     6144   0       0             0 python
[ 2626]     0  2626   245248     5238   9       0             0 python
[ 2627]     0  2627   245248     9371  11       0             0 python
[ 2628]     0  2628   245248    13048  10       0             0 python
[ 2629]     0  2629   245248     6702   3       0             0 python
[ 2630]     0  2630   245248     5319  10       0             0 python
[ 2631]     0  2631   245248     7964   0       0             0 python
[ 2632]     0  2632   245248     5787  14       0             0 python
[ 2633]     0  2633   245248     9816   0       0             0 python
[ 2634]     0  2634   245248     5415   6       0             0 python
[ 2635]     0  2635   245248     6740   3       0             0 python
[ 2636]     0  2636   245248    10180   3       0             0 python
[ 2637]     0  2637   245248     5007  11       0             0 python
[ 2638]     0  2638   245248     5801   9       0             0 python
[ 2639]     0  2639   245248     7823   3       0             0 python
[ 2640]     0  2640   245248     9127   0       0             0 python
[ 2641]     0  2641   245248     5614   0       0             0 python
[ 2642]     0  2642   245248     4686  10       0             0 python
[ 2643]     0  2643   245248     4305  11       0             0 python
[ 2644]     0  2644   245248     4714   2       0             0 python
[ 2645]     0  2645   245248     5964  11       0             0 python
[ 2646]     0  2646   245248     7440  10       0             0 python
[ 2647]     0  2647   245248     6062   4       0             0 python
[ 2648]     0  2648   245248     5733   6       0             0 python
[ 2649]     0  2649   245248     5063   0       0             0 python
[ 2650]     0  2650   245248     4793   2       0             0 python
[ 2651]     0  2651   245248     5806   4       0             0 python
[ 2652]     0  2652   245248     8126  10       0             0 python
[ 2653]     0  2653   245248     5794   3       0             0 python
[ 2654]     0  2654   245248     4370  12       0             0 python
[ 2655]     0  2655   245248     5621   0       0             0 python
[ 2656]     0  2656   245248     6514  11       0             0 python
[ 2657]     0  2657   245248     6560   3       0             0 python
[ 2658]     0  2658   245248     7352   2       0             0 python
[ 2659]     0  2659   245248     4456   0       0             0 python
[ 2660]     0  2660   245248     6508   3       0             0 python
[ 2661]     0  2661   245248     4231   4       0             0 python
[ 2662]     0  2662   245248     5967   0       0             0 python
[ 2663]     0  2663   245248     5007   3       0             0 python
[ 2664]     0  2664   245248     5878   3       0             0 python
[ 2665]     0  2665   245248     7469  11       0             0 python
[ 2666]     0  2666   245248     4697   4       0             0 python
[ 2667]     0  2667   245248     3484  11       0             0 python
[ 2668]     0  2668   245248     4223   3       0             0 python
[ 2669]     0  2669   245248    10490  10       0             0 python
[ 2670]     0  2670   245248     3395   3       0             0 python
[ 2671]     0  2671   245248     7004  12       0             0 python
[ 2672]     0  2672   245248     6340   0       0             0 python
[ 2673]     0  2673   245248     3384   0       0             0 python
[ 2674]     0  2674   245248     5563   0       0             0 python
[ 2675]     0  2675   245248     4799  14       0             0 python
[ 2676]     0  2676   245248    10170  15       0             0 python
[ 2677]     0  2677   245248     4793  10       0             0 python
[ 2678]     0  2678   245248     6221   0       0             0 python
[ 2679]     0  2679   245248     4710  10       0             0 python
[ 2680]     0  2680   245248     6231   0       0             0 python
[ 2681]     0  2681   245248     3573   3       0             0 python
[ 2682]     0  2682   245248     3332   0       0             0 python
[ 2683]     0  2683   245248     6929   2       0             0 python
[ 2684]     0  2684   245248     6015  11       0             0 python
[ 2685]     0  2685   245248     5167  14       0             0 python
[ 2688]     0  2688   245248     5195   2       0             0 python
[ 2689]     0  2689   245248     5293   2       0             0 python
[ 2690]     0  2690   245248     4398  10       0             0 python
[ 2691]     0  2691   245248     4672  11       0             0 python
[ 2692]     0  2692   245248     5772   6       0             0 python
[ 2693]     0  2693   245248     4550   2       0             0 python
[ 2694]     0  2694   245248     6926   0       0             0 python
[ 2695]     0  2695   245248     3137   2       0             0 python
[ 2696]     0  2696   245248     4804  10       0             0 python
[ 2697]     0  2697   245248     7152   0       0             0 python
[ 2698]     0  2698   245248     3031   3       0             0 python
[ 2699]     0  2699   245248     6700   0       0             0 python
[ 2700]     0  2700   245248     4299   6       0             0 python
[ 2701]     0  2701   245248     3678   0       0             0 python
[ 2702]     0  2702   245248     4665   0       0             0 python
[ 2703]     0  2703   245248     5555   5       0             0 python
[ 2704]     0  2704   245248     5672   0       0             0 python
[ 2705]     0  2705   245248     3480   0       0             0 python
[ 2706]     0  2706   245248     4387  10       0             0 python
[ 2707]     0  2707   245248     4539   0       0             0 python
[ 2708]     0  2708   245248     3206  11       0             0 python
[ 2711]     0  2711   245248     6383  10       0             0 python
[ 2712]     0  2712   245248     6077   2       0             0 python
[ 2713]     0  2713   245248     4819   0       0             0 python
[ 2714]     0  2714   245248     6774   0       0             0 python
[ 2715]     0  2715   245248     4395   0       0             0 python
[ 2716]     0  2716   245248     9053  11       0             0 python
[ 2717]     0  2717   245248     8341   7       0             0 python
[ 2718]     0  2718   245248     4305   0       0             0 python
[ 2723]     0  2723  1027964      156   8       0             0 console-kit-dae
[ 2790]     0  2790    27092       54   4       0             0 bash
[ 2808]     0  2808   245248     4255  11       0             0 python
[ 2809]     0  2809   245248     7280   2       0             0 python
[ 2810]     0  2810   245248     5922  11       0             0 python
[ 2811]     0  2811   245248     4383   0       0             0 python
[ 2812]     0  2812   245248     4755  15       0             0 python
[ 2813]     0  2813   245248     6075  10       0             0 python
[ 2814]     0  2814   245248     4818   2       0             0 python
[ 2815]     0  2815   245248     4671   3       0             0 python
[ 2816]     0  2816   245248     5975   0       0             0 python
[ 2817]     0  2817   245248     4209   0       0             0 python
[ 2818]     0  2818   245248     5534  12       0             0 python
[ 2819]     0  2819   245248     2562   0       0             0 python
[ 2820]     0  2820   245248     4585   7       0             0 python
[ 2821]     0  2821   245248     6823  10       0             0 python
[ 2822]     0  2822   245248     5243  11       0             0 python
[ 2823]     0  2823   245248     7690   0       0             0 python
[ 2824]     0  2824   245248     5813  11       0             0 python
[ 2825]     0  2825   245248     3626   7       0             0 python
[ 2826]     0  2826   245248     4024   3       0             0 python
[ 2827]     0  2827   245248     6512   0       0             0 python
[ 2828]     0  2828   245248     4419   7       0             0 python
[ 2829]     0  2829   245248    13229   0       0             0 python
[ 2830]     0  2830   245248     2401   0       0             0 python
[ 2831]     0  2831   245248     2651  10       0             0 python
[ 2832]     0  2832   245248     4976   0       0             0 python
[ 2833]     0  2833   245248     6267  10       0             0 python
[ 2834]     0  2834   245248     3703  11       0             0 python
[ 2835]     0  2835   245248     4086   2       0             0 python
[ 2836]     0  2836   245248     6895  14       0             0 python
[ 2837]     0  2837   245248     3800  10       0             0 python
[ 2838]     0  2838   245248     8418  10       0             0 python
[ 2839]     0  2839   245248     3809  10       0             0 python
[ 2840]     0  2840   245248     2784  11       0             0 python
[ 2841]     0  2841   245248     3494   6       0             0 python
[ 2842]     0  2842   245248     4246   2       0             0 python
[ 2843]     0  2843   245248     5831   0       0             0 python
[ 2844]     0  2844   245248     7335   3       0             0 python
[ 2845]     0  2845   245248     5514   0       0             0 python
[ 2846]     0  2846   245248     6125   0       0             0 python
[ 2847]     0  2847   245248     5592  14       0             0 python
[ 2848]     0  2848   245248     5769   0       0             0 python
[ 2849]     0  2849   245248     4548   2       0             0 python
[ 2850]     0  2850   245248     7435   7       0             0 python
[ 2851]     0  2851   245248     6527   3       0             0 python
[ 2852]     0  2852   245248     3152   0       0             0 python
[ 2853]     0  2853   245248     5106   0       0             0 python
[ 2854]     0  2854   245248     5215  10       0             0 python
[ 2855]     0  2855   245248     4286   2       0             0 python
[ 2856]     0  2856   245248     6282   0       0             0 python
[ 2857]     0  2857   245248     3207  15       0             0 python
[ 2858]     0  2858   245248     5448  11       0             0 python
[ 2859]     0  2859   245248     3807  10       0             0 python
[ 2860]     0  2860   245248     3279  14       0             0 python
[ 2861]     0  2861   245248     4322   3       0             0 python
[ 2862]     0  2862   245248     4324   0       0             0 python
[ 2863]     0  2863   245248     3590  11       0             0 python
[ 2864]     0  2864   245248     7398   2       0             0 python
[ 2865]     0  2865   245248     5345   3       0             0 python
[ 2866]     0  2866   245248     5494   0       0             0 python
[ 2867]     0  2867   245248     5302   0       0             0 python
[ 2868]     0  2868   245248     6553   4       0             0 python
[ 2869]     0  2869   245248     4227   0       0             0 python
[ 2870]     0  2870   245248     4746  15       0             0 python
[ 2871]     0  2871   245248     5238   2       0             0 python
[ 2872]     0  2872   245248     4250  14       0             0 python
[ 2873]     0  2873   245248     7820   2       0             0 python
[ 2874]     0  2874   245248     3762   0       0             0 python
[ 2875]     0  2875   245248     4310   3       0             0 python
[ 2876]     0  2876   245248     3243   2       0             0 python
[ 2877]     0  2877   245248     3813  11       0             0 python
[ 2878]     0  2878   245248     5350  11       0             0 python
[ 2879]     0  2879   245248     5832  11       0             0 python
[ 2880]     0  2880   245248     4321   3       0             0 python
[ 2881]     0  2881   245248     4831   3       0             0 python
[ 2882]     0  2882   245248     3215   0       0             0 python
[ 2883]     0  2883   245248     2718   0       0             0 python
[ 2884]     0  2884   245248     5707   3       0             0 python
[ 2885]     0  2885   245248     4566   3       0             0 python
[ 2886]     0  2886   245248     5540   3       0             0 python
[ 2887]     0  2887   245248     6340   3       0             0 python
[ 2888]     0  2888   245248     4824   3       0             0 python
[ 2889]     0  2889   245248     4877  10       0             0 python
[ 2890]     0  2890   245248     3616   3       0             0 python
[ 2891]     0  2891   245248     3814   2       0             0 python
[ 2892]     0  2892   245248     4341   9       0             0 python
[ 2893]     0  2893   245248     5771   9       0             0 python
[ 2894]     0  2894   245248     3303   2       0             0 python
[ 2895]     0  2895   245248     4327  10       0             0 python
[ 2896]     0  2896   245248     2791   2       0             0 python
[ 2897]     0  2897   245248     4728   3       0             0 python
[ 2898]     0  2898   245248     4823   3       0             0 python
[ 2899]     0  2899   245248     4221   2       0             0 python
[ 2900]     0  2900   245248     3692  13       0             0 python
[ 2901]     0  2901   245248     7446   9       0             0 python
[ 2902]     0  2902   245248     3719  10       0             0 python
[ 2903]     0  2903   245248     6232   3       0             0 python
[ 2904]     0  2904   245248     4791   2       0             0 python
[ 2905]     0  2905   245248     6689   2       0             0 python
[ 2906]     0  2906   245248     6370   6       0             0 python
[ 2909]     0  2909   245248     3934   6       0             0 python
[ 2910]     0  2910   245248     2908  10       0             0 python
[ 2911]     0  2911   245248     2299  11       0             0 python
[ 2912]     0  2912   245248     5449   7       0             0 python
[ 2913]     0  2913   245248     3814   3       0             0 python
[ 2914]     0  2914   245248     3302  10       0             0 python
[ 2915]     0  2915   245248     4840   3       0             0 python
[ 2916]     0  2916   245248     3236   6       0             0 python
[ 2917]     0  2917   245248     4037  11       0             0 python
[ 2918]     0  2918   245248     2266  11       0             0 python
[ 2919]     0  2919   245248     2786   3       0             0 python
[ 2920]     0  2920   245248     8194  11       0             0 python
[ 2921]     0  2921   245248     2247  10       0             0 python
[ 2922]     0  2922   245248     4847   1       0             0 python
[ 2923]     0  2923   245248     3302   1       0             0 python
[ 2924]     0  2924   245248     3940   1       0             0 python
[ 2925]     0  2925   245248     4866   2       0             0 python
[ 2926]     0  2926   245248     3301   1       0             0 python
[ 2927]     0  2927   245248     1462  10       0             0 python
[ 2928]     0  2928   245248     1829   2       0             0 python
[ 2929]     0  2929   245248     4283   1       0             0 python
[ 2930]     0  2930   245248     3398   2       0             0 python
[ 2931]     0  2931   245248     7905   1       0             0 python
[ 2932]     0  2932   245248     4302   2       0             0 python
[ 2933]     0  2933   245248     2885   2       0             0 python
[ 2934]     0  2934   245248     6637   2       0             0 python
[ 2935]     0  2935   245248     2876  11       0             0 python
[ 2936]     0  2936   245248     3719   3       0             0 python
[ 2937]     0  2937   245248     2768   1       0             0 python
[ 2938]     0  2938   245248     1984  11       0             0 python
[ 2939]     0  2939   245248     2280  15       0             0 python
[ 2940]     0  2940   245248     1767   1       0             0 python
[ 2941]     0  2941   245248     3816  10       0             0 python
[ 2942]     0  2942   245248     2790   3       0             0 python
[ 2943]     0  2943   245248     3831   3       0             0 python
[ 2944]     0  2944   245248     3813   9       0             0 python
[ 2945]     0  2945   245248     4326  14       0             0 python
[ 2946]     0  2946   245248     2793   6       0             0 python
[ 2947]     0  2947   245248     4247   9       0             0 python
[ 2948]     0  2948   245248     3304   2       0             0 python
[ 2949]     0  2949   245248     4391   3       0             0 python
[ 2950]     0  2950   245248     3810  15       0             0 python
[ 2951]     0  2951   245248     2293  10       0             0 python
[ 2952]     0  2952   245248     4311   3       0             0 python
[ 2953]     0  2953   245248     4378   2       0             0 python
[ 2954]     0  2954   245248     4086   2       0             0 python
[ 2955]     0  2955   245248     2982   3       0             0 python
[ 2956]     0  2956   245248     2287   9       0             0 python
[ 2957]     0  2957   245248     5347  10       0             0 python
[ 2958]     0  2958   245248     5331  11       0             0 python
[ 2959]     0  2959   245248     1307   3       0             0 python
[ 2960]     0  2960   245248     4327  10       0             0 python
[ 2961]     0  2961   245248     3236   9       0             0 python
[ 2962]     0  2962   245248     3681   9       0             0 python
[ 2963]     0  2963   245248     3304   1       0             0 python
[ 2964]     0  2964   245248     3298  11       0             0 python
[ 2965]     0  2965   245248     5123  14       0             0 python
[ 2966]     0  2966   245248     4327   3       0             0 python
[ 2967]     0  2967   245248     4278   3       0             0 python
[ 2968]     0  2968   245248     2778   1       0             0 python
[ 2969]     0  2969   245248     3963   2       0             0 python
[ 2970]     0  2970   245248     3994   1       0             0 python
[ 2971]     0  2971   245248     3292   2       0             0 python
[ 2972]     0  2972   245248     3815   3       0             0 python
[ 2973]     0  2973   245248     5351   3       0             0 python
[ 2974]     0  2974   245248     6424  10       0             0 python
[ 2975]     0  2975   245248     2794   1       0             0 python
[ 2976]     0  2976   245248     4327   1       0             0 python
[ 2977]     0  2977   245248     3029   1       0             0 python
[ 2978]     0  2978   245248     4914   1       0             0 python
[ 2979]     0  2979   245248     6850   1       0             0 python
[ 2980]     0  2980   245248     3301   1       0             0 python
[ 2981]     0  2981   245248     3454   2       0             0 python
[ 2982]     0  2982   245248     2856   1       0             0 python
[ 2983]     0  2983   245248     2295   7       0             0 python
[ 2984]     0  2984   245248     4732  10       0             0 python
[ 2985]     0  2985   245248     3815   9       0             0 python
[ 2986]     0  2986   245248     1705  13       0             0 python
[ 2987]     0  2987   245248     2282   9       0             0 python
[ 2988]     0  2988   245248     3817   9       0             0 python
[ 2989]     0  2989   245248     2783   9       0             0 python
[ 2990]     0  2990   245248     4835   2       0             0 python
[ 2991]     0  2991   245248     4838   3       0             0 python
[ 2992]     0  2992   245248      229  12       0             0 python
[ 2993]     0  2993   245248     1768   3       0             0 python
[ 2994]     0  2994   245248     4802   3       0             0 python
[ 2995]     0  2995   245248     7995   9       0             0 python
[ 2996]     0  2996   245248     2141  12       0             0 python
[ 2997]     0  2997   245248     1741   2       0             0 python
[ 2998]     0  2998   245248     4905  14       0             0 python
[ 2999]     0  2999   245248     2789   3       0             0 python
[ 3000]     0  3000   245248     4321   2       0             0 python
[ 3001]     0  3001   245248     3816  11       0             0 python
[ 3002]     0  3002   245248     2790   2       0             0 python
[ 3003]     0  3003   245248     1760   2       0             0 python
[ 3004]     0  3004   245248     3290   9       0             0 python
[ 3005]     0  3005   245248     2793   3       0             0 python
[ 3006]     0  3006   245248     3811   3       0             0 python
[ 3007]     0  3007   245248     3302   9       0             0 python
[ 3008]     0  3008   245248     2304  12       0             0 python
[ 3009]     0  3009   245248     2797   9       0             0 python
[ 3010]     0  3010   245248     2723   9       0             0 python
[ 3011]     0  3011   245248     1769   9       0             0 python
[ 3017]     0  3017   245248     1823  11       0             0 python
[ 3018]     0  3018   245248     2794  11       0             0 python
[ 3019]     0  3019   245248     3817   3       0             0 python
[ 3020]     0  3020   245248     1769  14       0             0 python
[ 3022]     0  3022   245248     1837  15       0             0 python
[ 3023]     0  3023   245248     2282  10       0             0 python
[ 3024]     0  3024   245248     2282  10       0             0 python
[ 3025]     0  3025   245248     2278   3       0             0 python
[ 3026]     0  3026   245248     2282  14       0             0 python
[ 3027]     0  3027   245248     2791   2       0             0 python
[ 3028]     0  3028   245248     1461   9       0             0 python
[ 3029]     0  3029   245248     1773   3       0             0 python
[ 3030]     0  3030   245248     2280   9       0             0 python
[ 3031]     0  3031   245248     3862   9       0             0 python
[ 3032]     0  3032   245248     2381  11       0             0 python
[ 3033]     0  3033   245248     2437   9       0             0 python
[ 3034]     0  3034   245248     1769   9       0             0 python
[ 3035]     0  3035   245248     3144  10       0             0 python
[ 3036]     0  3036   245248     2676  11       0             0 python
[ 3037]     0  3037   245248      214  11       0             0 python
[ 3038]     0  3038   245248     2389   9       0             0 python
[ 3039]     0  3039   245248     2386   9       0             0 python
[ 3040]     0  3040   245248     2334   2       0             0 python
[ 3041]     0  3041   245248     3819   0       0             0 python
[ 3042]     0  3042   245248     2373   3       0             0 python
[ 3043]     0  3043   245248     1259   9       0             0 python
[ 3044]     0  3044   245248     2183   3       0             0 python
[ 3045]     0  3045   245248     5869  14       0             0 python
[ 3046]     0  3046   245248     2281  10       0             0 python
[ 3047]     0  3047   245248     2791   9       0             0 python
[ 3048]     0  3048   245248     3820  12       0             0 python
[ 3049]     0  3049   245248     2792  10       0             0 python
[ 3050]     0  3050   245248     1449   3       0             0 python
[ 3051]     0  3051   245248     1769   9       0             0 python
[ 3052]     0  3052   245248     4330  10       0             0 python
[ 3053]     0  3053   245248     1731   9       0             0 python
[ 3054]     0  3054   245248     1257   9       0             0 python
[ 3055]     0  3055   245248     1207  14       0             0 python
[ 3056]     0  3056   245248      184   9       0             0 python
[ 3057]     0  3057   245248     1255   4       0             0 python
[ 3058]     0  3058   245248     1769   2       0             0 python
[ 3059]     0  3059   245248     2234   9       0             0 python
[ 3060]     0  3060   245248     2795   4       0             0 python
[ 3061]     0  3061   245248     1768   4       0             0 python
[ 3062]     0  3062   245248      748  10       0             0 python
[ 3063]     0  3063   245248     1955  15       0             0 python
[ 3064]     0  3064   245248     1260   9       0             0 python
[ 3065]     0  3065   245248     1350   6       0             0 python
[ 3066]     0  3066   245248     1769   9       0             0 python
[ 3067]     0  3067   245248     3307   2       0             0 python
[ 3068]     0  3068   245248     2276   6       0             0 python
[ 3069]     0  3069   245248     1877  10       0             0 python
[ 3070]     0  3070   245248     2702   0       0             0 python
[ 3071]     0  3071   245248     1805  10       0             0 python
[ 3072]     0  3072   245248     1283   9       0             0 python
[ 3073]     0  3073   245248     2282   6       0             0 python
[ 3074]     0  3074   245248     3306   2       0             0 python
[ 3075]     0  3075   245248     2283   2       0             0 python
[ 3076]     0  3076   245248      216   3       0             0 python
[ 3077]     0  3077   245248     2282  11       0             0 python
[ 3078]     0  3078   245248     2045   2       0             0 python
[ 3079]     0  3079   245248     2794   7       0             0 python
[ 3080]     0  3080   245248     1764  10       0             0 python
[ 3081]     0  3081   245248     1769  13       0             0 python
[ 3082]     0  3082   245248     1258   3       0             0 python
[ 3083]     0  3083   245248     2283   9       0             0 python
[ 3084]     0  3084   245248     1351   9       0             0 python
[ 3085]     0  3085   245248     1256   9       0             0 python
[ 3086]     0  3086   245248     2282   9       0             0 python
[ 3087]     0  3087   245248     2771   4       0             0 python
[ 3088]     0  3088   245248     3839   3       0             0 python
[ 3089]     0  3089   245248     2271  11       0             0 python
[ 3090]     0  3090   245248     2082  10       0             0 python
[ 3091]     0  3091   245248     3285   2       0             0 python
[ 3092]     0  3092   245248      722   9       0             0 python
[ 3093]     0  3093   245248     1768   2       0             0 python
[ 3094]     0  3094   245248     1259   9       0             0 python
[ 3095]     0  3095   245248     2283   9       0             0 python
[ 3096]     0  3096   245248     1314  10       0             0 python
[ 3097]     0  3097   245248     2441   9       0             0 python
[ 3098]     0  3098   245248     1770   2       0             0 python
[ 3099]     0  3099   245248     1261  10       0             0 python
[ 3100]     0  3100   245248     2338   9       0             0 python
[ 3101]     0  3101   245248     1770   2       0             0 python
[ 3102]     0  3102   245248     1752   9       0             0 python
[ 3103]     0  3103   245248     1937  10       0             0 python
[ 3104]     0  3104   245248     1768  10       0             0 python
[ 3108]     0  3108   245248     1773   9       0             0 python
[ 3109]     0  3109   245248      746   2       0             0 python
[ 3110]     0  3110   245248     2794  11       0             0 python
[ 3111]     0  3111   245248     3546   9       0             0 python
[ 3112]     0  3112   245248     3307  10       0             0 python
[ 3113]     0  3113   245248     2665  11       0             0 python
[ 3114]     0  3114   245248      214   9       0             0 python
[ 3115]     0  3115   245248     2268   9       0             0 python
[ 3116]     0  3116   245248     1772   9       0             0 python
[ 3117]     0  3117   245248      216  11       0             0 python
[ 3118]     0  3118   245248     2791  10       0             0 python
[ 3119]     0  3119   245248      746   3       0             0 python
[ 3120]     0  3120   245248     1257  10       0             0 python
[ 3121]     0  3121   245248     1418  10       0             0 python
[ 3122]     0  3122   245248     1262   9       0             0 python
[ 3123]     0  3123   245248     1260   9       0             0 python
[ 3124]     0  3124   245248     1771  15       0             0 python
[ 3125]     0  3125   245248      216  11       0             0 python
[ 3126]     0  3126   245248     1305   9       0             0 python
[ 3127]     0  3127   245248     1247  12       0             0 python
[ 3128]     0  3128   245248     2221   4       0             0 python
[ 3129]     0  3129   245248      746   2       0             0 python
[ 3130]     0  3130   245248      746  11       0             0 python
[ 3131]     0  3131   245248      743  11       0             0 python
[ 3132]     0  3132   245248      218   4       0             0 python
[ 3133]     0  3133   245248     1770   2       0             0 python
[ 3134]     0  3134   245248      232  10       0             0 python
[ 3135]     0  3135    41834      474   2       0             0 python
[ 3136]     0  3136   245248      217   1       0             0 python
[ 3139]     0  3139   245248      215  11       0             0 python
[ 3140]     0  3140   245248      214   1       0             0 python
[ 3141]     0  3141   245248      215   3       0             0 python
[ 3142]     0  3142   245248      216   1       0             0 python
[ 3143]     0  3143   245248      215   7       0             0 python
[ 3144]     0  3144   245248      217  10       0             0 python
[ 3145]     0  3145   245248      216  12       0             0 python
[ 3146]     0  3146    41834      140   2       0             0 python
[ 3157]     0  3157    41834      140   0       0             0 python
[ 3158]     0  3158    41834      127   3       0             0 python
[ 3159]     0  3159    41834      133   2       0             0 python
[ 3160]     0  3160    41834      123   3       0             0 python
[ 3161]     0  3161    41834      117   3       0             0 python
[ 3162]     0  3162    41834      113   3       0             0 python
[ 3164]     0  3164    41834      107   1       0             0 python
[ 3166]     0  3166    41834       98   3       0             0 python
Out of memory: Kill process 2103 (mingetty) score 1 or sacrifice child
Killed process 2103 (mingetty) total-vm:4100kB, anon-rss:0kB, file-rss:0kB
python invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
python cpuset=/ mems_allowed=0-1

Out of memory: Kill process 3246 (agetty) score 1 or sacrifice child
Killed process 3246 (agetty) total-vm:4116kB, anon-rss:72kB, file-rss:0kB
init: tty (/dev/tty4) main process (3169) killed by KILL signal
init: tty (/dev/tty4) main process ended, respawning
init: tty (/dev/tty5) main process (3170) killed by KILL signal
init: tty (/dev/tty5) main process ended, respawning
init: tty (/dev/tty6) main process (3171) killed by KILL signal
init: tty (/dev/tty6) main process ended, respawning
init: serial (ttyS0) main process (3246) killed by KILL signal
init: serial (ttyS0) main process ended, respawning

> I'm indifferent to the actual scale of OOM_SCORE_MAX_FACTOR; it could be
> 10 as proposed in this patch or even increased higher for higher
> resolution.
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -38,6 +38,9 @@ int sysctl_oom_kill_allocating_task;
> int sysctl_oom_dump_tasks = 1;
> static DEFINE_SPINLOCK(zone_scan_lock);
> 
> +#define OOM_SCORE_MAX_FACTOR 10
> +#define OOM_SCORE_MAX (OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)
> +
> #ifdef CONFIG_NUMA
> /**
> * has_intersects_mems_allowed() - check task eligiblity for kill
> @@ -160,7 +163,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> */
> if (p->flags & PF_OOM_ORIGIN) {
> task_unlock(p);
> - return 1000;
> + return OOM_SCORE_MAX;
> }
> 
> /*
> @@ -177,32 +180,38 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> points += get_mm_counter(p->mm, MM_SWAPENTS);
> 
> - points *= 1000;
> + points *= OOM_SCORE_MAX;
> points /= totalpages;
> task_unlock(p);
> 
> /*
> - * Root processes get 3% bonus, just like the __vm_enough_memory()
> - * implementation used by LSMs.
> + * Root processes get a bonus of 1% per 10% of memory used.
> */
> - if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> - points -= 30;
> + if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> + int bonus;
> + int granularity;
> +
> + bonus = OOM_SCORE_MAX / 100; /* bonus is 1% */
> + granularity = OOM_SCORE_MAX / 10; /* granularity is 10% */
> +
> + points -= bonus * (points / granularity);
> + }
> 
> /*
> * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
> * either completely disable oom killing or always prefer a certain
> * task.
> */
> - points += p->signal->oom_score_adj;
> + points += p->signal->oom_score_adj * OOM_SCORE_MAX_FACTOR;
> 
> /*
> * Never return 0 for an eligible task that may be killed since it's
> - * possible that no single user task uses more than 0.1% of memory and
> + * possible that no single user task uses more than 0.01% of memory and
> * no single admin tasks uses more than 3.0%.
> */
> if (points <= 0)
> return 1;
> - return (points < 1000) ? points : 1000;
> + return (points < OOM_SCORE_MAX) ? points : OOM_SCORE_MAX;
> }
> 
> /*
> @@ -314,7 +323,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> */
> if (p == current) {
> chosen = p;
> - *ppoints = 1000;
> + *ppoints = OOM_SCORE_MAX;
> } else {
> /*
> * If this task is not being ptraced on exit,
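
To make the rescaled arithmetic in the quoted hunks concrete, here is a
minimal userspace sketch -- illustrative only, not kernel code: rss_pages
stands in for rss + nr_ptes + swap entries, and total_pages for
totalpages; none of these names are kernel symbols.

/* Userspace sketch of the rescaled badness heuristic from the quoted
 * patch; assumptions as noted above. Build with: cc -std=c99 badness.c
 */
#include <stdio.h>

#define OOM_SCORE_ADJ_MAX    1000
#define OOM_SCORE_MAX_FACTOR 10
#define OOM_SCORE_MAX        (OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)

static long long badness(long long rss_pages, long long total_pages,
			 int is_root, long long oom_score_adj)
{
	long long points = rss_pages * OOM_SCORE_MAX / total_pages;

	if (is_root) {
		long long bonus = OOM_SCORE_MAX / 100;      /* 1% of the scale */
		long long granularity = OOM_SCORE_MAX / 10; /* per 10% of RAM used */

		points -= bonus * (points / granularity);
	}

	points += oom_score_adj * OOM_SCORE_MAX_FACTOR;

	if (points <= 0)
		return 1;
	return points < OOM_SCORE_MAX ? points : OOM_SCORE_MAX;
}

int main(void)
{
	/* a 16GB box has ~4194304 4KiB pages */
	printf("%lld\n", badness(100000, 4194304, 1, 0));  /* ~2.4% used -> 238 */
	printf("%lld\n", badness(1000000, 4194304, 1, 0)); /* ~24% used -> 2184 */
	return 0;
}

Note that the root bonus here is stepwise: a root task sheds 1% of the
scale only for each full 10% of RAM it uses, so small root daemons keep
their full score while a root fork-bomb no longer gets a blanket 3%
discount.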

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
@ 2011-05-26  7:08         ` CAI Qian
  0 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-26  7:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, hughd, kamezawa.hiroyu,
	minchan.kim, oleg, KOSAKI Motohiro



----- Original Message -----
> On Mon, 23 May 2011, David Rientjes wrote:
> 
> > I already suggested an alternative patch to CAI Qian to greatly increase
> > the granularity of the oom score from a range of 0-1000 to 0-10000 to
> > differentiate between tasks within 0.01% of available memory (16MB on
> > CAI Qian's 16GB system). I'll propose this officially in a separate
> > email.
> >
> 
> This is an alternative patch as earlier proposed with suggested
> improvements from Minchan. CAI, would it be possible to test this out on
> your usecase?
Here are the results of the testing. Running the reproducer as a non-root
user, the results look good: the OOM killer just killed, in turn, each
python process that the reproducer had forked. However, when running it as
the root user, sshd and other random processes were killed.
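
The reproducer itself is not included in this excerpt. Purely as an
illustration of the kind of load described -- many forked children, each
dirtying anonymous memory until allocation fails -- a minimal C sketch
might look like the following; NCHILD and CHUNK are arbitrary assumptions,
not taken from the actual reproducer.

/* Illustration only -- not the actual reproducer. Forks NCHILD children
 * that each allocate and dirty memory until malloc() fails, then park.
 * Do not run this on a machine you care about.
 */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define NCHILD 512
#define CHUNK  (1024 * 1024)	/* dirty 1MB per iteration */

int main(void)
{
	for (int i = 0; i < NCHILD; i++) {
		if (fork() == 0) {
			for (;;) {
				char *p = malloc(CHUNK);
				if (!p)
					pause();	/* out of memory; wait to be killed */
				else
					memset(p, 1, CHUNK);	/* fault the pages in */
			}
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}

The task dump from the root run follows.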

[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  567]     0   567     2935      365   0     -17         -1000 udevd
[ 2116]     0  2116     3099      464   8     -17         -1000 udevd
[ 2117]     0  2117     3099      503   2     -17         -1000 udevd
[ 2317]     0  2317     6404       39   8     -17         -1000 auditd
[ 3221]     0  3221    15998      153   9       0             0 sshd
[ 3223]     0  3223    24421      204   0       0             0 sshd
[ 3227]     0  3227    27093       86   4       0             0 bash
[ 3246]     0  3246     1029       18   1       0             0 agetty
[ 3251]     0  3251   243710    98227  11       0             0 python
[ 3252]     0  3252   243710   109999   9       0             0 python
[ 3253]     0  3253   243710   111538  12       0             0 python
[ 3254]     0  3254   243710   106931   1       0             0 python
[ 3255]     0  3255   243710   103367   9       0             0 python
[ 3256]     0  3256   243710    97715   1       0             0 python
[ 3257]     0  3257   243710   107443   9       0             0 python
[ 3258]     0  3258   243710   101298   4       0             0 python
[ 3259]     0  3259   243710   118707   1       0             0 python
[ 3260]     0  3260   243710   104882   9       0             0 python
[ 3261]     0  3261   243710   108979  12       0             0 python
[ 3262]     0  3262   243710    93106   1       0             0 python
[ 3263]     0  3263   243710    97714  12       0             0 python
[ 3264]     0  3264   243710    91571  12       0             0 python
[ 3265]     0  3265   243710    93107   1       0             0 python
[ 3266]     0  3266   243710    83790   9       0             0 python
[ 3267]     0  3267   243710    81330   5       0             0 python
[ 3268]     0  3268   243710    83378   5       0             0 python
[ 3269]     0  3269   243710    77235   4       0             0 python
[ 3270]     0  3270   243710    80732   1       0             0 python
[ 3271]     0  3271   243710    72626  11       0             0 python
[ 3272]     0  3272   243710    81385   7       0             0 python
[ 3273]     0  3273   243710    71749   3       0             0 python
[ 3274]     0  3274   243710    70735   1       0             0 python
[ 3275]     0  3275   243710    84403   9       0             0 python
[ 3276]     0  3276   243710    72255  13       0             0 python
[ 3277]     0  3277   243710    65971   3       0             0 python
[ 3278]     0  3278   243710    66172  15       0             0 python
[ 3279]     0  3279   243710    69555   1       0             0 python
[ 3280]     0  3280   243710    68689   9       0             0 python
[ 3281]     0  3281   243710    69553   9       0             0 python
[ 3282]     0  3282   243710    64439   6       0             0 python
[ 3283]     0  3283   243710    56753  11       0             0 python
[ 3284]     0  3284   243710    57917   6       0             0 python
[ 3285]     0  3285   243710    55730   9       0             0 python
[ 3286]     0  3286   243710    54193   9       0             0 python
[ 3287]     0  3287   243710    51123   1       0             0 python
[ 3288]     0  3288   243710    52146  15       0             0 python
[ 3289]     0  3289   243710    48220   9       0             0 python
[ 3290]     0  3290   243710    48051   3       0             0 python
[ 3291]     0  3291   243710    40371   3       0             0 python
[ 3292]     0  3292   243710    49229  13       0             0 python
[ 3293]     0  3293   243710    40549   9       0             0 python
[ 3294]     0  3294   243710    41618   5       0             0 python
[ 3295]     0  3295   243710    40429   9       0             0 python
[ 3296]     0  3296   243710    36787   1       0             0 python
[ 3297]     0  3297   243710    39346  11       0             0 python
[ 3298]     0  3298   243710    35251   3       0             0 python
[ 3299]     0  3299   243710    32872   3       0             0 python
[ 3300]     0  3300   243710    29781   1       0             0 python
[ 3301]     0  3301   243710    27570  11       0             0 python
[ 3302]     0  3302   243710    28081   9       0             0 python
[ 3303]     0  3303   243710    24499   1       0             0 python
[ 3304]     0  3304   243710    21427   1       0             0 python
[ 3305]     0  3305   243710    25522   9       0             0 python
[ 3306]     0  3306   243710    28081   9       0             0 python
[ 3307]     0  3307   243710    21939   9       0             0 python
[ 3308]     0  3308   243710    19890   9       0             0 python
[ 3309]     0  3309   243710    18354   3       0             0 python
[ 3310]     0  3310   243710    16590  14       0             0 python
[ 3311]     0  3311   243710    18718  11       0             0 python
[ 3312]     0  3312   243710    17841   1       0             0 python
[ 3313]     0  3313   243710    14258  11       0             0 python
[ 3314]     0  3314   243710    14426   4       0             0 python
[ 3315]     0  3315   243710    15282   6       0             0 python
[ 3316]     0  3316   243710     9650   6       0             0 python
[ 3317]     0  3317   243710    11699   1       0             0 python
[ 3318]     0  3318   243710    11372   3       0             0 python
[ 3319]     0  3319   243710     9650   9       0             0 python
[ 3320]     0  3320   243710     8426  11       0             0 python
[ 3321]     0  3321   243710     4531   3       0             0 python
[ 3322]     0  3322   243710     8627   9       0             0 python
[ 3323]     0  3323   243710     6578   1       0             0 python
[ 3324]     0  3324   243710     5553   7       0             0 python
[ 3325]     0  3325   243710    10673   3       0             0 python
[ 3326]     0  3326   243710     6578  11       0             0 python
[ 3327]     0  3327   243710     3505   1       0             0 python
[ 3328]     0  3328   243710     3530   1       0             0 python
[ 3329]     0  3329   243710     5205  11       0             0 python
[ 3330]     0  3330   243710     1970   9       0             0 python
[ 3331]     0  3331   243710     4021  11       0             0 python
[ 3332]     0  3332   243710     5043   1       0             0 python
[ 3333]     0  3333   243710     2481   1       0             0 python
[ 3334]     0  3334   243710     4530   1       0             0 python
[ 3343]     0  3343    41835      773   9       0             0 python
[ 3344]     0  3344    41835      773   4       0             0 python
[ 3345]     0  3345    41835      773   1       0             0 python
[ 3346]     0  3346    41835      773   1       0             0 python
[ 3347]     0  3347    41835      773   9       0             0 python
[ 3348]     0  3348    41835      773   3       0             0 python
[ 3349]     0  3349    41835      773   1       0             0 python
[ 3350]     0  3350    41835      773  11       0             0 python
Out of memory: Kill process 3221 (sshd) score 1 or sacrifice child
Killed process 3223 (sshd) total-vm:97684kB, anon-rss:816kB, file-rss:0kB
sshd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
sshd cpuset=/ mems_allowed=0-1

[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  567]     0   567     2935        0   0     -17         -1000 udevd
[ 2103]     0  2103     1025        0   9       0             0 mingetty
[ 2105]     0  2105     1025        0   2       0             0 mingetty
[ 2109]     0  2109    19263      100   0       0             0 login
[ 2116]     0  2116     3099        0   8     -17         -1000 udevd
[ 2117]     0  2117     3099        0   2     -17         -1000 udevd
[ 2317]     0  2317     6404       20   0     -17         -1000 auditd
[ 2338]     0  2338    27093       11  10       0             0 bash
[ 2358]     0  2358   245248     8337   6       0             0 python
[ 2359]     0  2359   245248    11151   6       0             0 python
[ 2360]     0  2360   245248    12487  10       0             0 python
[ 2361]     0  2361   245248    11702   9       0             0 python
[ 2362]     0  2362   245248     6751   1       0             0 python
[ 2363]     0  2363   245248    10952   2       0             0 python
[ 2364]     0  2364   245248    12113   1       0             0 python
[ 2365]     0  2365   245248    11258   9       0             0 python
[ 2366]     0  2366   245248     9697  10       0             0 python
[ 2367]     0  2367   245248    12453   2       0             0 python
[ 2368]     0  2368   245248    14357  10       0             0 python
[ 2369]     0  2369   245248    11282  10       0             0 python
[ 2370]     0  2370   245248    11138   0       0             0 python
[ 2371]     0  2371   245248    10615  13       0             0 python
[ 2372]     0  2372   245248    10742   2       0             0 python
[ 2373]     0  2373   245248     9024   7       0             0 python
[ 2374]     0  2374   245248    12176  12       0             0 python
[ 2375]     0  2375   245248    13886  10       0             0 python
[ 2376]     0  2376   245248    10974   5       0             0 python
[ 2377]     0  2377   245248     8416  11       0             0 python
[ 2378]     0  2378   245248     9469  11       0             0 python
[ 2379]     0  2379   245248    11312  13       0             0 python
[ 2380]     0  2380   245248     9317   1       0             0 python
[ 2381]     0  2381   245248    10424   0       0             0 python
[ 2382]     0  2382   245248    15806   1       0             0 python
[ 2383]     0  2383   245248    15340   7       0             0 python
[ 2384]     0  2384   245248     7932   9       0             0 python
[ 2385]     0  2385   245248    10420   0       0             0 python
[ 2386]     0  2386   245248    14376   9       0             0 python
[ 2387]     0  2387   245248    12410   2       0             0 python
[ 2388]     0  2388   245248    14596   9       0             0 python
[ 2389]     0  2389   245248     7898   9       0             0 python
[ 2390]     0  2390   245248    10943  10       0             0 python
[ 2391]     0  2391   245248     8787   2       0             0 python
[ 2392]     0  2392   245248     7252  10       0             0 python
[ 2393]     0  2393   245248    12978  15       0             0 python
[ 2394]     0  2394   245248     7034  11       0             0 python
[ 2395]     0  2395   245248    10903   2       0             0 python
[ 2396]     0  2396   245248    10280  10       0             0 python
[ 2397]     0  2397   245248    10793   9       0             0 python
[ 2398]     0  2398   245248     8205   9       0             0 python
[ 2399]     0  2399   245248     9675   0       0             0 python
[ 2400]     0  2400   245248    11304   5       0             0 python
[ 2401]     0  2401   245248    15053   5       0             0 python
[ 2402]     0  2402   245248    14449  10       0             0 python
[ 2403]     0  2403   245248     8466   1       0             0 python
[ 2404]     0  2404   245248    14250  10       0             0 python
[ 2405]     0  2405   245248    11630   9       0             0 python
[ 2406]     0  2406   245248     9562   9       0             0 python
[ 2407]     0  2407   245248     8802   1       0             0 python
[ 2408]     0  2408   245248     9521   1       0             0 python
[ 2409]     0  2409   245248     4827  13       0             0 python
[ 2410]     0  2410   245248    10364   1       0             0 python
[ 2411]     0  2411   245248     8749   0       0             0 python
[ 2412]     0  2412   245248    15082   0       0             0 python
[ 2413]     0  2413   245248    11023  10       0             0 python
[ 2414]     0  2414   245248     9087   1       0             0 python
[ 2415]     0  2415   245248     9906   2       0             0 python
[ 2416]     0  2416   245248    13862   5       0             0 python
[ 2417]     0  2417   245248     9553   2       0             0 python
[ 2418]     0  2418   245248     8556  13       0             0 python
[ 2419]     0  2419   245248     9246   9       0             0 python
[ 2420]     0  2420   245248    11084   2       0             0 python
[ 2421]     0  2421   245248    16256   2       0             0 python
[ 2422]     0  2422   245248    13057  12       0             0 python
[ 2423]     0  2423   245248    10578   7       0             0 python
[ 2424]     0  2424   245248    10407   3       0             0 python
[ 2425]     0  2425   245248    10329   3       0             0 python
[ 2426]     0  2426   245248     9489   9       0             0 python
[ 2427]     0  2427   245248    10004   3       0             0 python
[ 2428]     0  2428   245248     7411   0       0             0 python
[ 2429]     0  2429   245248    13647   1       0             0 python
[ 2430]     0  2430   245248    10134   2       0             0 python
[ 2431]     0  2431   245248    12157  10       0             0 python
[ 2432]     0  2432   245248    11158   1       0             0 python
[ 2433]     0  2433   245248     9829  14       0             0 python
[ 2434]     0  2434   245248     5859   3       0             0 python
[ 2435]     0  2435   245248    11456   9       0             0 python
[ 2436]     0  2436   245248    12754   3       0             0 python
[ 2437]     0  2437   245248    11098   0       0             0 python
[ 2438]     0  2438   245248    10676   0       0             0 python
[ 2439]     0  2439   245248     9105   2       0             0 python
[ 2440]     0  2440   245248    10539  10       0             0 python
[ 2441]     0  2441   245248    11514  10       0             0 python
[ 2442]     0  2442   245248    10019   4       0             0 python
[ 2443]     0  2443   245248     7545  14       0             0 python
[ 2444]     0  2444   245248    11830  10       0             0 python
[ 2445]     0  2445   245248     4708  10       0             0 python
[ 2446]     0  2446   245248     8227  10       0             0 python
[ 2447]     0  2447   245248     6306  10       0             0 python
[ 2448]     0  2448   245248     8888   0       0             0 python
[ 2449]     0  2449   245248    11337   3       0             0 python
[ 2450]     0  2450   245248     4856   0       0             0 python
[ 2451]     0  2451   245248    12369   0       0             0 python
[ 2452]     0  2452   245248    11077  10       0             0 python
[ 2453]     0  2453   245248     6757   0       0             0 python
[ 2454]     0  2454   245248     6785  10       0             0 python
[ 2455]     0  2455   245248     6532   3       0             0 python
[ 2456]     0  2456   245248     6265   9       0             0 python
[ 2457]     0  2457   245248     8126   3       0             0 python
[ 2458]     0  2458   245248     9573  10       0             0 python
[ 2459]     0  2459   245248     6954  10       0             0 python
[ 2460]     0  2460   245248     7539   3       0             0 python
[ 2461]     0  2461   245248     7623   0       0             0 python
[ 2462]     0  2462   245248     4853   2       0             0 python
[ 2463]     0  2463   245248     9488  10       0             0 python
[ 2464]     0  2464   245248     6415   0       0             0 python
[ 2465]     0  2465   245248     9745   1       0             0 python
[ 2466]     0  2466   245248     7332   3       0             0 python
[ 2467]     0  2467   245248     7408  11       0             0 python
[ 2468]     0  2468   245248     8311   0       0             0 python
[ 2469]     0  2469   245248     6963   0       0             0 python
[ 2470]     0  2470   245248     8620  10       0             0 python
[ 2471]     0  2471   245248     5799  10       0             0 python
[ 2472]     0  2472   245248    12855  10       0             0 python
[ 2473]     0  2473   245248     8718   9       0             0 python
[ 2474]     0  2474   245248     6782   2       0             0 python
[ 2475]     0  2475   245248     9566   0       0             0 python
[ 2476]     0  2476   245248     8083   9       0             0 python
[ 2477]     0  2477   245248     8657  10       0             0 python
[ 2478]     0  2478   245248     8997   9       0             0 python
[ 2479]     0  2479   245248     6539  11       0             0 python
[ 2480]     0  2480   245248     8906   9       0             0 python
[ 2481]     0  2481   245248     8916  11       0             0 python
[ 2482]     0  2482   245248     8083   0       0             0 python
[ 2483]     0  2483   245248     9490   7       0             0 python
[ 2484]     0  2484   245248     8123   0       0             0 python
[ 2485]     0  2485   245248     7315  11       0             0 python
[ 2486]     0  2486   245248     9084   4       0             0 python
[ 2487]     0  2487   245248     8036  15       0             0 python
[ 2488]     0  2488   245248     6839   2       0             0 python
[ 2489]     0  2489   245248     9478  11       0             0 python
[ 2490]     0  2490   245248    11535  11       0             0 python
[ 2491]     0  2491   245248     7895   2       0             0 python
[ 2492]     0  2492   245248     8831   0       0             0 python
[ 2493]     0  2493   245248     9219   0       0             0 python
[ 2494]     0  2494   245248     8472  11       0             0 python
[ 2495]     0  2495   245248     6666   1       0             0 python
[ 2496]     0  2496   245248     4875  11       0             0 python
[ 2497]     0  2497   245248     6802  11       0             0 python
[ 2498]     0  2498   245248     4901   9       0             0 python
[ 2499]     0  2499   245248     8510  11       0             0 python
[ 2500]     0  2500   245248     8620  15       0             0 python
[ 2501]     0  2501   245248     7169  10       0             0 python
[ 2502]     0  2502   245248     6283   0       0             0 python
[ 2503]     0  2503   245248     9497   0       0             0 python
[ 2504]     0  2504   245248    10091   2       0             0 python
[ 2505]     0  2505   245248    11700   0       0             0 python
[ 2506]     0  2506   245248     8353   3       0             0 python
[ 2507]     0  2507   245248     8505   2       0             0 python
[ 2508]     0  2508   245248    10486   0       0             0 python
[ 2509]     0  2509   245248     6641   3       0             0 python
[ 2510]     0  2510   245248     7175  10       0             0 python
[ 2511]     0  2511   245248    10100   9       0             0 python
[ 2512]     0  2512   245248     6984  13       0             0 python
[ 2513]     0  2513   245248     7677  13       0             0 python
[ 2514]     0  2514   245248     7645  11       0             0 python
[ 2515]     0  2515   245248     8854   4       0             0 python
[ 2516]     0  2516   245248     6888   0       0             0 python
[ 2517]     0  2517   245248     6297  11       0             0 python
[ 2518]     0  2518   245248     8011  11       0             0 python
[ 2519]     0  2519   245248     6353  10       0             0 python
[ 2520]     0  2520   245248     5168   9       0             0 python
[ 2521]     0  2521   245248     7274  11       0             0 python
[ 2522]     0  2522   245248     6374  11       0             0 python
[ 2523]     0  2523   245248     9404   1       0             0 python
[ 2524]     0  2524   245248     7486   0       0             0 python
[ 2525]     0  2525   245248     7290  10       0             0 python
[ 2526]     0  2526   245248     5940   0       0             0 python
[ 2527]     0  2527   245248     7999  10       0             0 python
[ 2528]     0  2528   245248     8201   0       0             0 python
[ 2529]     0  2529   245248     8065   0       0             0 python
[ 2530]     0  2530   245248     6452   9       0             0 python
[ 2531]     0  2531   245248     6162  11       0             0 python
[ 2532]     0  2532   245248     6808   0       0             0 python
[ 2533]     0  2533   245248     4331   2       0             0 python
[ 2534]     0  2534   245248     6458   0       0             0 python
[ 2535]     0  2535   245248     3250   0       0             0 python
[ 2536]     0  2536   245248     5289   9       0             0 python
[ 2537]     0  2537   245248     9369  13       0             0 python
[ 2538]     0  2538   245248     9187  15       0             0 python
[ 2539]     0  2539   245248     8274   0       0             0 python
[ 2540]     0  2540   245248     8051   2       0             0 python
[ 2541]     0  2541   245248     4732   4       0             0 python
[ 2542]     0  2542   245248     4662   0       0             0 python
[ 2543]     0  2543   245248    12070   0       0             0 python
[ 2546]     0  2546   245248     6923   4       0             0 python
[ 2547]     0  2547   245248     4550   0       0             0 python
[ 2548]     0  2548   245248     4700  12       0             0 python
[ 2549]     0  2549   245248     5822  11       0             0 python
[ 2550]     0  2550   245248     6179  10       0             0 python
[ 2551]     0  2551   245248     7794   0       0             0 python
[ 2552]     0  2552   245248     6456  10       0             0 python
[ 2553]     0  2553   245248     4932   4       0             0 python
[ 2554]     0  2554   245248     7680  11       0             0 python
[ 2555]     0  2555   245248     1642  10       0             0 python
[ 2556]     0  2556   245248     7480  10       0             0 python
[ 2557]     0  2557   245248     3598   0       0             0 python
[ 2558]     0  2558   245248     7949   0       0             0 python
[ 2559]     0  2559   245248     4294   0       0             0 python
[ 2560]     0  2560   245248     5138   0       0             0 python
[ 2561]     0  2561   245248    11045   9       0             0 python
[ 2562]     0  2562   245248     4290   9       0             0 python
[ 2563]     0  2563   245248     7603   0       0             0 python
[ 2564]     0  2564   245248     8683  12       0             0 python
[ 2565]     0  2565   245248     6409  12       0             0 python
[ 2566]     0  2566   245248     8321   9       0             0 python
[ 2567]     0  2567   245248     7416   0       0             0 python
[ 2568]     0  2568   245248     5272   2       0             0 python
[ 2569]     0  2569   245248     7359  10       0             0 python
[ 2570]     0  2570   245248     4641   9       0             0 python
[ 2571]     0  2571   245248     7698   2       0             0 python
[ 2572]     0  2572   245248     6118  11       0             0 python
[ 2573]     0  2573   245248     4822   0       0             0 python
[ 2574]     0  2574   245248     4745   0       0             0 python
[ 2575]     0  2575   245248     8029   0       0             0 python
[ 2576]     0  2576   245248     6350   9       0             0 python
[ 2577]     0  2577   245248     5537   9       0             0 python
[ 2578]     0  2578   245248     6861   3       0             0 python
[ 2579]     0  2579   245248     5632   4       0             0 python
[ 2580]     0  2580   245248     6023   0       0             0 python
[ 2581]     0  2581   245248     7947  11       0             0 python
[ 2582]     0  2582   245248     6752   9       0             0 python
[ 2583]     0  2583   245248     4282  12       0             0 python
[ 2584]     0  2584   245248     6069   4       0             0 python
[ 2585]     0  2585   245248     5472  11       0             0 python
[ 2586]     0  2586   245248     4729   0       0             0 python
[ 2587]     0  2587   245248     8205   0       0             0 python
[ 2588]     0  2588   245248     6234  10       0             0 python
[ 2589]     0  2589   245248     7687  11       0             0 python
[ 2590]     0  2590   245248     8817  11       0             0 python
[ 2591]     0  2591   245248     5784  11       0             0 python
[ 2592]     0  2592   245248     7518  10       0             0 python
[ 2593]     0  2593   245248     7213  12       0             0 python
[ 2594]     0  2594   245248     9752   3       0             0 python
[ 2595]     0  2595   245248     7039   0       0             0 python
[ 2596]     0  2596   245248     8164   0       0             0 python
[ 2597]     0  2597   245248     4113  11       0             0 python
[ 2598]     0  2598   245248     4153   0       0             0 python
[ 2599]     0  2599   245248     6651  11       0             0 python
[ 2600]     0  2600   245248     3933   9       0             0 python
[ 2601]     0  2601   245248     7722  14       0             0 python
[ 2602]     0  2602   245248     7535   4       0             0 python
[ 2603]     0  2603   245248     4903   2       0             0 python
[ 2604]     0  2604   245248     5542   0       0             0 python
[ 2605]     0  2605   245248     4589  10       0             0 python
[ 2606]     0  2606   245248     7672   2       0             0 python
[ 2607]     0  2607   245248     6656   2       0             0 python
[ 2608]     0  2608   245248     6467   2       0             0 python
[ 2609]     0  2609   245248     8780   0       0             0 python
[ 2610]     0  2610   245248    11257   0       0             0 python
[ 2611]     0  2611   245248     6748   0       0             0 python
[ 2612]     0  2612   245248     8885  11       0             0 python
[ 2613]     0  2613   245248     4232   0       0             0 python
[ 2614]     0  2614   245248     5724  11       0             0 python
[ 2615]     0  2615   245248     2842  11       0             0 python
[ 2616]     0  2616   245248     4994  15       0             0 python
[ 2617]     0  2617   245248     5417  11       0             0 python
[ 2618]     0  2618   245248     4660   0       0             0 python
[ 2619]     0  2619   245248     5655  11       0             0 python
[ 2620]     0  2620   245248     5952   0       0             0 python
[ 2621]     0  2621   245248     6983  11       0             0 python
[ 2622]     0  2622   245248     6066  12       0             0 python
[ 2623]     0  2623   245248     7743  11       0             0 python
[ 2624]     0  2624   245248     3138  11       0             0 python
[ 2625]     0  2625   245248     6144   0       0             0 python
[ 2626]     0  2626   245248     5238   9       0             0 python
[ 2627]     0  2627   245248     9371  11       0             0 python
[ 2628]     0  2628   245248    13048  10       0             0 python
[ 2629]     0  2629   245248     6702   3       0             0 python
[ 2630]     0  2630   245248     5319  10       0             0 python
[ 2631]     0  2631   245248     7964   0       0             0 python
[ 2632]     0  2632   245248     5787  14       0             0 python
[ 2633]     0  2633   245248     9816   0       0             0 python
[ 2634]     0  2634   245248     5415   6       0             0 python
[ 2635]     0  2635   245248     6740   3       0             0 python
[ 2636]     0  2636   245248    10180   3       0             0 python
[ 2637]     0  2637   245248     5007  11       0             0 python
[ 2638]     0  2638   245248     5801   9       0             0 python
[ 2639]     0  2639   245248     7823   3       0             0 python
[ 2640]     0  2640   245248     9127   0       0             0 python
[ 2641]     0  2641   245248     5614   0       0             0 python
[ 2642]     0  2642   245248     4686  10       0             0 python
[ 2643]     0  2643   245248     4305  11       0             0 python
[ 2644]     0  2644   245248     4714   2       0             0 python
[ 2645]     0  2645   245248     5964  11       0             0 python
[ 2646]     0  2646   245248     7440  10       0             0 python
[ 2647]     0  2647   245248     6062   4       0             0 python
[ 2648]     0  2648   245248     5733   6       0             0 python
[ 2649]     0  2649   245248     5063   0       0             0 python
[ 2650]     0  2650   245248     4793   2       0             0 python
[ 2651]     0  2651   245248     5806   4       0             0 python
[ 2652]     0  2652   245248     8126  10       0             0 python
[ 2653]     0  2653   245248     5794   3       0             0 python
[ 2654]     0  2654   245248     4370  12       0             0 python
[ 2655]     0  2655   245248     5621   0       0             0 python
[ 2656]     0  2656   245248     6514  11       0             0 python
[ 2657]     0  2657   245248     6560   3       0             0 python
[ 2658]     0  2658   245248     7352   2       0             0 python
[ 2659]     0  2659   245248     4456   0       0             0 python
[ 2660]     0  2660   245248     6508   3       0             0 python
[ 2661]     0  2661   245248     4231   4       0             0 python
[ 2662]     0  2662   245248     5967   0       0             0 python
[ 2663]     0  2663   245248     5007   3       0             0 python
[ 2664]     0  2664   245248     5878   3       0             0 python
[ 2665]     0  2665   245248     7469  11       0             0 python
[ 2666]     0  2666   245248     4697   4       0             0 python
[ 2667]     0  2667   245248     3484  11       0             0 python
[ 2668]     0  2668   245248     4223   3       0             0 python
[ 2669]     0  2669   245248    10490  10       0             0 python
[ 2670]     0  2670   245248     3395   3       0             0 python
[ 2671]     0  2671   245248     7004  12       0             0 python
[ 2672]     0  2672   245248     6340   0       0             0 python
[ 2673]     0  2673   245248     3384   0       0             0 python
[ 2674]     0  2674   245248     5563   0       0             0 python
[ 2675]     0  2675   245248     4799  14       0             0 python
[ 2676]     0  2676   245248    10170  15       0             0 python
[ 2677]     0  2677   245248     4793  10       0             0 python
[ 2678]     0  2678   245248     6221   0       0             0 python
[ 2679]     0  2679   245248     4710  10       0             0 python
[ 2680]     0  2680   245248     6231   0       0             0 python
[ 2681]     0  2681   245248     3573   3       0             0 python
[ 2682]     0  2682   245248     3332   0       0             0 python
[ 2683]     0  2683   245248     6929   2       0             0 python
[ 2684]     0  2684   245248     6015  11       0             0 python
[ 2685]     0  2685   245248     5167  14       0             0 python
[ 2688]     0  2688   245248     5195   2       0             0 python
[ 2689]     0  2689   245248     5293   2       0             0 python
[ 2690]     0  2690   245248     4398  10       0             0 python
[ 2691]     0  2691   245248     4672  11       0             0 python
[ 2692]     0  2692   245248     5772   6       0             0 python
[ 2693]     0  2693   245248     4550   2       0             0 python
[ 2694]     0  2694   245248     6926   0       0             0 python
[ 2695]     0  2695   245248     3137   2       0             0 python
[ 2696]     0  2696   245248     4804  10       0             0 python
[ 2697]     0  2697   245248     7152   0       0             0 python
[ 2698]     0  2698   245248     3031   3       0             0 python
[ 2699]     0  2699   245248     6700   0       0             0 python
[ 2700]     0  2700   245248     4299   6       0             0 python
[ 2701]     0  2701   245248     3678   0       0             0 python
[ 2702]     0  2702   245248     4665   0       0             0 python
[ 2703]     0  2703   245248     5555   5       0             0 python
[ 2704]     0  2704   245248     5672   0       0             0 python
[ 2705]     0  2705   245248     3480   0       0             0 python
[ 2706]     0  2706   245248     4387  10       0             0 python
[ 2707]     0  2707   245248     4539   0       0             0 python
[ 2708]     0  2708   245248     3206  11       0             0 python
[ 2711]     0  2711   245248     6383  10       0             0 python
[ 2712]     0  2712   245248     6077   2       0             0 python
[ 2713]     0  2713   245248     4819   0       0             0 python
[ 2714]     0  2714   245248     6774   0       0             0 python
[ 2715]     0  2715   245248     4395   0       0             0 python
[ 2716]     0  2716   245248     9053  11       0             0 python
[ 2717]     0  2717   245248     8341   7       0             0 python
[ 2718]     0  2718   245248     4305   0       0             0 python
[ 2723]     0  2723  1027964      156   8       0             0 console-kit-dae
[ 2790]     0  2790    27092       54   4       0             0 bash
[ 2808]     0  2808   245248     4255  11       0             0 python
[ 2809]     0  2809   245248     7280   2       0             0 python
[ 2810]     0  2810   245248     5922  11       0             0 python
[ 2811]     0  2811   245248     4383   0       0             0 python
[ 2812]     0  2812   245248     4755  15       0             0 python
[ 2813]     0  2813   245248     6075  10       0             0 python
[ 2814]     0  2814   245248     4818   2       0             0 python
[ 2815]     0  2815   245248     4671   3       0             0 python
[ 2816]     0  2816   245248     5975   0       0             0 python
[ 2817]     0  2817   245248     4209   0       0             0 python
[ 2818]     0  2818   245248     5534  12       0             0 python
[ 2819]     0  2819   245248     2562   0       0             0 python
[ 2820]     0  2820   245248     4585   7       0             0 python
[ 2821]     0  2821   245248     6823  10       0             0 python
[ 2822]     0  2822   245248     5243  11       0             0 python
[ 2823]     0  2823   245248     7690   0       0             0 python
[ 2824]     0  2824   245248     5813  11       0             0 python
[ 2825]     0  2825   245248     3626   7       0             0 python
[ 2826]     0  2826   245248     4024   3       0             0 python
[ 2827]     0  2827   245248     6512   0       0             0 python
[ 2828]     0  2828   245248     4419   7       0             0 python
[ 2829]     0  2829   245248    13229   0       0             0 python
[ 2830]     0  2830   245248     2401   0       0             0 python
[ 2831]     0  2831   245248     2651  10       0             0 python
[ 2832]     0  2832   245248     4976   0       0             0 python
[ 2833]     0  2833   245248     6267  10       0             0 python
[ 2834]     0  2834   245248     3703  11       0             0 python
[ 2835]     0  2835   245248     4086   2       0             0 python
[ 2836]     0  2836   245248     6895  14       0             0 python
[ 2837]     0  2837   245248     3800  10       0             0 python
[ 2838]     0  2838   245248     8418  10       0             0 python
[ 2839]     0  2839   245248     3809  10       0             0 python
[ 2840]     0  2840   245248     2784  11       0             0 python
[ 2841]     0  2841   245248     3494   6       0             0 python
[ 2842]     0  2842   245248     4246   2       0             0 python
[ 2843]     0  2843   245248     5831   0       0             0 python
[ 2844]     0  2844   245248     7335   3       0             0 python
[ 2845]     0  2845   245248     5514   0       0             0 python
[ 2846]     0  2846   245248     6125   0       0             0 python
[ 2847]     0  2847   245248     5592  14       0             0 python
[ 2848]     0  2848   245248     5769   0       0             0 python
[ 2849]     0  2849   245248     4548   2       0             0 python
[ 2850]     0  2850   245248     7435   7       0             0 python
[ 2851]     0  2851   245248     6527   3       0             0 python
[ 2852]     0  2852   245248     3152   0       0             0 python
[ 2853]     0  2853   245248     5106   0       0             0 python
[ 2854]     0  2854   245248     5215  10       0             0 python
[ 2855]     0  2855   245248     4286   2       0             0 python
[ 2856]     0  2856   245248     6282   0       0             0 python
[ 2857]     0  2857   245248     3207  15       0             0 python
[ 2858]     0  2858   245248     5448  11       0             0 python
[ 2859]     0  2859   245248     3807  10       0             0 python
[ 2860]     0  2860   245248     3279  14       0             0 python
[ 2861]     0  2861   245248     4322   3       0             0 python
[ 2862]     0  2862   245248     4324   0       0             0 python
[ 2863]     0  2863   245248     3590  11       0             0 python
[ 2864]     0  2864   245248     7398   2       0             0 python
[ 2865]     0  2865   245248     5345   3       0             0 python
[ 2866]     0  2866   245248     5494   0       0             0 python
[ 2867]     0  2867   245248     5302   0       0             0 python
[ 2868]     0  2868   245248     6553   4       0             0 python
[ 2869]     0  2869   245248     4227   0       0             0 python
[ 2870]     0  2870   245248     4746  15       0             0 python
[ 2871]     0  2871   245248     5238   2       0             0 python
[ 2872]     0  2872   245248     4250  14       0             0 python
[ 2873]     0  2873   245248     7820   2       0             0 python
[ 2874]     0  2874   245248     3762   0       0             0 python
[ 2875]     0  2875   245248     4310   3       0             0 python
[ 2876]     0  2876   245248     3243   2       0             0 python
[ 2877]     0  2877   245248     3813  11       0             0 python
[ 2878]     0  2878   245248     5350  11       0             0 python
[ 2879]     0  2879   245248     5832  11       0             0 python
[ 2880]     0  2880   245248     4321   3       0             0 python
[ 2881]     0  2881   245248     4831   3       0             0 python
[ 2882]     0  2882   245248     3215   0       0             0 python
[ 2883]     0  2883   245248     2718   0       0             0 python
[ 2884]     0  2884   245248     5707   3       0             0 python
[ 2885]     0  2885   245248     4566   3       0             0 python
[ 2886]     0  2886   245248     5540   3       0             0 python
[ 2887]     0  2887   245248     6340   3       0             0 python
[ 2888]     0  2888   245248     4824   3       0             0 python
[ 2889]     0  2889   245248     4877  10       0             0 python
[ 2890]     0  2890   245248     3616   3       0             0 python
[ 2891]     0  2891   245248     3814   2       0             0 python
[ 2892]     0  2892   245248     4341   9       0             0 python
[ 2893]     0  2893   245248     5771   9       0             0 python
[ 2894]     0  2894   245248     3303   2       0             0 python
[ 2895]     0  2895   245248     4327  10       0             0 python
[ 2896]     0  2896   245248     2791   2       0             0 python
[ 2897]     0  2897   245248     4728   3       0             0 python
[ 2898]     0  2898   245248     4823   3       0             0 python
[ 2899]     0  2899   245248     4221   2       0             0 python
[ 2900]     0  2900   245248     3692  13       0             0 python
[ 2901]     0  2901   245248     7446   9       0             0 python
[ 2902]     0  2902   245248     3719  10       0             0 python
[ 2903]     0  2903   245248     6232   3       0             0 python
[ 2904]     0  2904   245248     4791   2       0             0 python
[ 2905]     0  2905   245248     6689   2       0             0 python
[ 2906]     0  2906   245248     6370   6       0             0 python
[ 2909]     0  2909   245248     3934   6       0             0 python
[ 2910]     0  2910   245248     2908  10       0             0 python
[ 2911]     0  2911   245248     2299  11       0             0 python
[ 2912]     0  2912   245248     5449   7       0             0 python
[ 2913]     0  2913   245248     3814   3       0             0 python
[ 2914]     0  2914   245248     3302  10       0             0 python
[ 2915]     0  2915   245248     4840   3       0             0 python
[ 2916]     0  2916   245248     3236   6       0             0 python
[ 2917]     0  2917   245248     4037  11       0             0 python
[ 2918]     0  2918   245248     2266  11       0             0 python
[ 2919]     0  2919   245248     2786   3       0             0 python
[ 2920]     0  2920   245248     8194  11       0             0 python
[ 2921]     0  2921   245248     2247  10       0             0 python
[ 2922]     0  2922   245248     4847   1       0             0 python
[ 2923]     0  2923   245248     3302   1       0             0 python
[ 2924]     0  2924   245248     3940   1       0             0 python
[ 2925]     0  2925   245248     4866   2       0             0 python
[ 2926]     0  2926   245248     3301   1       0             0 python
[ 2927]     0  2927   245248     1462  10       0             0 python
[ 2928]     0  2928   245248     1829   2       0             0 python
[ 2929]     0  2929   245248     4283   1       0             0 python
[ 2930]     0  2930   245248     3398   2       0             0 python
[ 2931]     0  2931   245248     7905   1       0             0 python
[ 2932]     0  2932   245248     4302   2       0             0 python
[ 2933]     0  2933   245248     2885   2       0             0 python
[ 2934]     0  2934   245248     6637   2       0             0 python
[ 2935]     0  2935   245248     2876  11       0             0 python
[ 2936]     0  2936   245248     3719   3       0             0 python
[ 2937]     0  2937   245248     2768   1       0             0 python
[ 2938]     0  2938   245248     1984  11       0             0 python
[ 2939]     0  2939   245248     2280  15       0             0 python
[ 2940]     0  2940   245248     1767   1       0             0 python
[ 2941]     0  2941   245248     3816  10       0             0 python
[ 2942]     0  2942   245248     2790   3       0             0 python
[ 2943]     0  2943   245248     3831   3       0             0 python
[ 2944]     0  2944   245248     3813   9       0             0 python
[ 2945]     0  2945   245248     4326  14       0             0 python
[ 2946]     0  2946   245248     2793   6       0             0 python
[ 2947]     0  2947   245248     4247   9       0             0 python
[ 2948]     0  2948   245248     3304   2       0             0 python
[ 2949]     0  2949   245248     4391   3       0             0 python
[ 2950]     0  2950   245248     3810  15       0             0 python
[ 2951]     0  2951   245248     2293  10       0             0 python
[ 2952]     0  2952   245248     4311   3       0             0 python
[ 2953]     0  2953   245248     4378   2       0             0 python
[ 2954]     0  2954   245248     4086   2       0             0 python
[ 2955]     0  2955   245248     2982   3       0             0 python
[ 2956]     0  2956   245248     2287   9       0             0 python
[ 2957]     0  2957   245248     5347  10       0             0 python
[ 2958]     0  2958   245248     5331  11       0             0 python
[ 2959]     0  2959   245248     1307   3       0             0 python
[ 2960]     0  2960   245248     4327  10       0             0 python
[ 2961]     0  2961   245248     3236   9       0             0 python
[ 2962]     0  2962   245248     3681   9       0             0 python
[ 2963]     0  2963   245248     3304   1       0             0 python
[ 2964]     0  2964   245248     3298  11       0             0 python
[ 2965]     0  2965   245248     5123  14       0             0 python
[ 2966]     0  2966   245248     4327   3       0             0 python
[ 2967]     0  2967   245248     4278   3       0             0 python
[ 2968]     0  2968   245248     2778   1       0             0 python
[ 2969]     0  2969   245248     3963   2       0             0 python
[ 2970]     0  2970   245248     3994   1       0             0 python
[ 2971]     0  2971   245248     3292   2       0             0 python
[ 2972]     0  2972   245248     3815   3       0             0 python
[ 2973]     0  2973   245248     5351   3       0             0 python
[ 2974]     0  2974   245248     6424  10       0             0 python
[ 2975]     0  2975   245248     2794   1       0             0 python
[ 2976]     0  2976   245248     4327   1       0             0 python
[ 2977]     0  2977   245248     3029   1       0             0 python
[ 2978]     0  2978   245248     4914   1       0             0 python
[ 2979]     0  2979   245248     6850   1       0             0 python
[ 2980]     0  2980   245248     3301   1       0             0 python
[ 2981]     0  2981   245248     3454   2       0             0 python
[ 2982]     0  2982   245248     2856   1       0             0 python
[ 2983]     0  2983   245248     2295   7       0             0 python
[ 2984]     0  2984   245248     4732  10       0             0 python
[ 2985]     0  2985   245248     3815   9       0             0 python
[ 2986]     0  2986   245248     1705  13       0             0 python
[ 2987]     0  2987   245248     2282   9       0             0 python
[ 2988]     0  2988   245248     3817   9       0             0 python
[ 2989]     0  2989   245248     2783   9       0             0 python
[ 2990]     0  2990   245248     4835   2       0             0 python
[ 2991]     0  2991   245248     4838   3       0             0 python
[ 2992]     0  2992   245248      229  12       0             0 python
[ 2993]     0  2993   245248     1768   3       0             0 python
[ 2994]     0  2994   245248     4802   3       0             0 python
[ 2995]     0  2995   245248     7995   9       0             0 python
[ 2996]     0  2996   245248     2141  12       0             0 python
[ 2997]     0  2997   245248     1741   2       0             0 python
[ 2998]     0  2998   245248     4905  14       0             0 python
[ 2999]     0  2999   245248     2789   3       0             0 python
[ 3000]     0  3000   245248     4321   2       0             0 python
[ 3001]     0  3001   245248     3816  11       0             0 python
[ 3002]     0  3002   245248     2790   2       0             0 python
[ 3003]     0  3003   245248     1760   2       0             0 python
[ 3004]     0  3004   245248     3290   9       0             0 python
[ 3005]     0  3005   245248     2793   3       0             0 python
[ 3006]     0  3006   245248     3811   3       0             0 python
[ 3007]     0  3007   245248     3302   9       0             0 python
[ 3008]     0  3008   245248     2304  12       0             0 python
[ 3009]     0  3009   245248     2797   9       0             0 python
[ 3010]     0  3010   245248     2723   9       0             0 python
[ 3011]     0  3011   245248     1769   9       0             0 python
[ 3017]     0  3017   245248     1823  11       0             0 python
[ 3018]     0  3018   245248     2794  11       0             0 python
[ 3019]     0  3019   245248     3817   3       0             0 python
[ 3020]     0  3020   245248     1769  14       0             0 python
[ 3022]     0  3022   245248     1837  15       0             0 python
[ 3023]     0  3023   245248     2282  10       0             0 python
[ 3024]     0  3024   245248     2282  10       0             0 python
[ 3025]     0  3025   245248     2278   3       0             0 python
[ 3026]     0  3026   245248     2282  14       0             0 python
[ 3027]     0  3027   245248     2791   2       0             0 python
[ 3028]     0  3028   245248     1461   9       0             0 python
[ 3029]     0  3029   245248     1773   3       0             0 python
[ 3030]     0  3030   245248     2280   9       0             0 python
[ 3031]     0  3031   245248     3862   9       0             0 python
[ 3032]     0  3032   245248     2381  11       0             0 python
[ 3033]     0  3033   245248     2437   9       0             0 python
[ 3034]     0  3034   245248     1769   9       0             0 python
[ 3035]     0  3035   245248     3144  10       0             0 python
[ 3036]     0  3036   245248     2676  11       0             0 python
[ 3037]     0  3037   245248      214  11       0             0 python
[ 3038]     0  3038   245248     2389   9       0             0 python
[ 3039]     0  3039   245248     2386   9       0             0 python
[ 3040]     0  3040   245248     2334   2       0             0 python
[ 3041]     0  3041   245248     3819   0       0             0 python
[ 3042]     0  3042   245248     2373   3       0             0 python
[ 3043]     0  3043   245248     1259   9       0             0 python
[ 3044]     0  3044   245248     2183   3       0             0 python
[ 3045]     0  3045   245248     5869  14       0             0 python
[ 3046]     0  3046   245248     2281  10       0             0 python
[ 3047]     0  3047   245248     2791   9       0             0 python
[ 3048]     0  3048   245248     3820  12       0             0 python
[ 3049]     0  3049   245248     2792  10       0             0 python
[ 3050]     0  3050   245248     1449   3       0             0 python
[ 3051]     0  3051   245248     1769   9       0             0 python
[ 3052]     0  3052   245248     4330  10       0             0 python
[ 3053]     0  3053   245248     1731   9       0             0 python
[ 3054]     0  3054   245248     1257   9       0             0 python
[ 3055]     0  3055   245248     1207  14       0             0 python
[ 3056]     0  3056   245248      184   9       0             0 python
[ 3057]     0  3057   245248     1255   4       0             0 python
[ 3058]     0  3058   245248     1769   2       0             0 python
[ 3059]     0  3059   245248     2234   9       0             0 python
[ 3060]     0  3060   245248     2795   4       0             0 python
[ 3061]     0  3061   245248     1768   4       0             0 python
[ 3062]     0  3062   245248      748  10       0             0 python
[ 3063]     0  3063   245248     1955  15       0             0 python
[ 3064]     0  3064   245248     1260   9       0             0 python
[ 3065]     0  3065   245248     1350   6       0             0 python
[ 3066]     0  3066   245248     1769   9       0             0 python
[ 3067]     0  3067   245248     3307   2       0             0 python
[ 3068]     0  3068   245248     2276   6       0             0 python
[ 3069]     0  3069   245248     1877  10       0             0 python
[ 3070]     0  3070   245248     2702   0       0             0 python
[ 3071]     0  3071   245248     1805  10       0             0 python
[ 3072]     0  3072   245248     1283   9       0             0 python
[ 3073]     0  3073   245248     2282   6       0             0 python
[ 3074]     0  3074   245248     3306   2       0             0 python
[ 3075]     0  3075   245248     2283   2       0             0 python
[ 3076]     0  3076   245248      216   3       0             0 python
[ 3077]     0  3077   245248     2282  11       0             0 python
[ 3078]     0  3078   245248     2045   2       0             0 python
[ 3079]     0  3079   245248     2794   7       0             0 python
[ 3080]     0  3080   245248     1764  10       0             0 python
[ 3081]     0  3081   245248     1769  13       0             0 python
[ 3082]     0  3082   245248     1258   3       0             0 python
[ 3083]     0  3083   245248     2283   9       0             0 python
[ 3084]     0  3084   245248     1351   9       0             0 python
[ 3085]     0  3085   245248     1256   9       0             0 python
[ 3086]     0  3086   245248     2282   9       0             0 python
[ 3087]     0  3087   245248     2771   4       0             0 python
[ 3088]     0  3088   245248     3839   3       0             0 python
[ 3089]     0  3089   245248     2271  11       0             0 python
[ 3090]     0  3090   245248     2082  10       0             0 python
[ 3091]     0  3091   245248     3285   2       0             0 python
[ 3092]     0  3092   245248      722   9       0             0 python
[ 3093]     0  3093   245248     1768   2       0             0 python
[ 3094]     0  3094   245248     1259   9       0             0 python
[ 3095]     0  3095   245248     2283   9       0             0 python
[ 3096]     0  3096   245248     1314  10       0             0 python
[ 3097]     0  3097   245248     2441   9       0             0 python
[ 3098]     0  3098   245248     1770   2       0             0 python
[ 3099]     0  3099   245248     1261  10       0             0 python
[ 3100]     0  3100   245248     2338   9       0             0 python
[ 3101]     0  3101   245248     1770   2       0             0 python
[ 3102]     0  3102   245248     1752   9       0             0 python
[ 3103]     0  3103   245248     1937  10       0             0 python
[ 3104]     0  3104   245248     1768  10       0             0 python
[ 3108]     0  3108   245248     1773   9       0             0 python
[ 3109]     0  3109   245248      746   2       0             0 python
[ 3110]     0  3110   245248     2794  11       0             0 python
[ 3111]     0  3111   245248     3546   9       0             0 python
[ 3112]     0  3112   245248     3307  10       0             0 python
[ 3113]     0  3113   245248     2665  11       0             0 python
[ 3114]     0  3114   245248      214   9       0             0 python
[ 3115]     0  3115   245248     2268   9       0             0 python
[ 3116]     0  3116   245248     1772   9       0             0 python
[ 3117]     0  3117   245248      216  11       0             0 python
[ 3118]     0  3118   245248     2791  10       0             0 python
[ 3119]     0  3119   245248      746   3       0             0 python
[ 3120]     0  3120   245248     1257  10       0             0 python
[ 3121]     0  3121   245248     1418  10       0             0 python
[ 3122]     0  3122   245248     1262   9       0             0 python
[ 3123]     0  3123   245248     1260   9       0             0 python
[ 3124]     0  3124   245248     1771  15       0             0 python
[ 3125]     0  3125   245248      216  11       0             0 python
[ 3126]     0  3126   245248     1305   9       0             0 python
[ 3127]     0  3127   245248     1247  12       0             0 python
[ 3128]     0  3128   245248     2221   4       0             0 python
[ 3129]     0  3129   245248      746   2       0             0 python
[ 3130]     0  3130   245248      746  11       0             0 python
[ 3131]     0  3131   245248      743  11       0             0 python
[ 3132]     0  3132   245248      218   4       0             0 python
[ 3133]     0  3133   245248     1770   2       0             0 python
[ 3134]     0  3134   245248      232  10       0             0 python
[ 3135]     0  3135    41834      474   2       0             0 python
[ 3136]     0  3136   245248      217   1       0             0 python
[ 3139]     0  3139   245248      215  11       0             0 python
[ 3140]     0  3140   245248      214   1       0             0 python
[ 3141]     0  3141   245248      215   3       0             0 python
[ 3142]     0  3142   245248      216   1       0             0 python
[ 3143]     0  3143   245248      215   7       0             0 python
[ 3144]     0  3144   245248      217  10       0             0 python
[ 3145]     0  3145   245248      216  12       0             0 python
[ 3146]     0  3146    41834      140   2       0             0 python
[ 3157]     0  3157    41834      140   0       0             0 python
[ 3158]     0  3158    41834      127   3       0             0 python
[ 3159]     0  3159    41834      133   2       0             0 python
[ 3160]     0  3160    41834      123   3       0             0 python
[ 3161]     0  3161    41834      117   3       0             0 python
[ 3162]     0  3162    41834      113   3       0             0 python
[ 3164]     0  3164    41834      107   1       0             0 python
[ 3166]     0  3166    41834       98   3       0             0 python
Out of memory: Kill process 2103 (mingetty) score 1 or sacrifice child
Killed process 2103 (mingetty) total-vm:4100kB, anon-rss:0kB, file-rss:0kB
python invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
python cpuset=/ mems_allowed=0-1

Out of memory: Kill process 3246 (agetty) score 1 or sacrifice child
Killed process 3246 (agetty) total-vm:4116kB, anon-rss:72kB, file-rss:0kB
init: tty (/dev/tty4) main process (3169) killed by KILL signal
init: tty (/dev/tty4) main process ended, respawning
init: tty (/dev/tty5) main process (3170) killed by KILL signal
init: tty (/dev/tty5) main process ended, respawning
init: tty (/dev/tty6) main process (3171) killed by KILL signal
init: tty (/dev/tty6) main process ended, respawning
init: serial (ttyS0) main process (3246) killed by KILL signal
init: serial (ttyS0) main process ended, respawning
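
The uniform "score 1" in these kills falls straight out of the 0-1000
normalization. A minimal user-space sketch of the rounding (the 16GB page
count and the daemon size are assumed values for illustration, not taken
from the log above):

#include <stdio.h>

int main(void)
{
	/* Assumed: 16GB of 4kB pages, no swap. */
	unsigned long totalpages = 4UL << 20;
	/* Assumed: a small daemon of ~5000 resident pages. */
	unsigned long rss_pages = 5000;

	/* 0-1000 normalized badness: nearly every task rounds to 1. */
	long points = rss_pages * 1000 / totalpages; /* 5000000/4194304 = 1 */
	if (points <= 0)
		points = 1;	/* an eligible task never scores 0 */

	printf("oom score = %ld\n", points);
	return 0;
}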

> I'm indifferent to the actual scale of OOM_SCORE_MAX_FACTOR; it could be
> 10 as proposed in this patch or even increased higher for higher resolution.
> 
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -38,6 +38,9 @@ int sysctl_oom_kill_allocating_task;
> int sysctl_oom_dump_tasks = 1;
> static DEFINE_SPINLOCK(zone_scan_lock);
> 
> +#define OOM_SCORE_MAX_FACTOR 10
> +#define OOM_SCORE_MAX (OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)
> +
> #ifdef CONFIG_NUMA
> /**
> * has_intersects_mems_allowed() - check task eligiblity for kill
> @@ -160,7 +163,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> */
> if (p->flags & PF_OOM_ORIGIN) {
> task_unlock(p);
> - return 1000;
> + return OOM_SCORE_MAX;
> }
> 
> /*
> @@ -177,32 +180,38 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> points += get_mm_counter(p->mm, MM_SWAPENTS);
> 
> - points *= 1000;
> + points *= OOM_SCORE_MAX;
> points /= totalpages;
> task_unlock(p);
> 
> /*
> - * Root processes get 3% bonus, just like the __vm_enough_memory()
> - * implementation used by LSMs.
> + * Root processes get a bonus of 1% per 10% of memory used.
> */
> - if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> - points -= 30;
> + if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> + int bonus;
> + int granularity;
> +
> + bonus = OOM_SCORE_MAX / 100; /* bonus is 1% */
> + granularity = OOM_SCORE_MAX / 10; /* granularity is 10% */
> +
> + points -= bonus * (points / granularity);
> + }
> 
> /*
> * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
> * either completely disable oom killing or always prefer a certain
> * task.
> */
> - points += p->signal->oom_score_adj;
> + points += p->signal->oom_score_adj * OOM_SCORE_MAX_FACTOR;
> 
> /*
> * Never return 0 for an eligible task that may be killed since it's
> - * possible that no single user task uses more than 0.1% of memory and
> + * possible that no single user task uses more than 0.01% of memory and
> * no single admin tasks uses more than 3.0%.
> */
> if (points <= 0)
> return 1;
> - return (points < 1000) ? points : 1000;
> + return (points < OOM_SCORE_MAX) ? points : OOM_SCORE_MAX;
> }
> 
> /*
> @@ -314,7 +323,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> */
> if (p == current) {
> chosen = p;
> - *ppoints = 1000;
> + *ppoints = OOM_SCORE_MAX;
> } else {
> /*
> * If this task is not being ptraced on exit,
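
As a rough feel for what the wider scale and the graduated bonus do to the
numbers, here is a stand-alone user-space sketch of the arithmetic in the
quoted patch (the constants come from the patch; the 16GB totalpages value
and the task sizes are assumptions for illustration):

#include <stdio.h>

#define OOM_SCORE_ADJ_MAX	1000
#define OOM_SCORE_MAX_FACTOR	10
#define OOM_SCORE_MAX		(OOM_SCORE_ADJ_MAX * OOM_SCORE_MAX_FACTOR)

/* Badness on the 0-10000 scale with the graduated root bonus. */
static long score(unsigned long rss, unsigned long totalpages, int root)
{
	long points = rss * OOM_SCORE_MAX / totalpages;

	if (root) {
		int bonus = OOM_SCORE_MAX / 100;	/* 1% of the scale */
		int granularity = OOM_SCORE_MAX / 10;	/* per 10% of memory */

		points -= bonus * (points / granularity);
	}
	return points;
}

int main(void)
{
	unsigned long totalpages = 4UL << 20;	/* assumed 16GB of 4kB pages */

	/* A small root daemon scores 11 instead of the old rounded 1. */
	printf("%ld\n", score(5000, totalpages, 1));
	/* A root task using 25% of memory loses two 1% steps: 2500 - 200 = 2300. */
	printf("%ld\n", score(1UL << 20, totalpages, 1));
	return 0;
}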


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-20  8:03   ` KOSAKI Motohiro
@ 2011-05-26  9:34     ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-26  9:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

Hello KOSAKI,

----- Original Message -----
> CAI Qian reported that his kernel hung up when he ran a fork-intensive
> workload and then invoked the oom-killer.
> 
> The problem is that the current oom calculation uses a 0-1000 normalized
> value (the unit is a permillage of system RAM). This low-precision
> calculation produces many identical oom scores. IOW, in his case, every
> process had an oom score smaller than 1, and the internal calculation
> rounded it up to 1.
> 
> Thus the oom-killer killed ineligible processes. This regression was
> caused by commit a63d83f427 (oom: badness heuristic rewrite).
> 
> The solution is for the internal calculation to use the raw number of
> pages instead of a permillage of system RAM, and to convert it to a
> permillage value only at display time.
> 
> This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj)
> even though I dislike a lot of the current logic.
> 
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> fs/proc/base.c | 13 ++++++----
> include/linux/oom.h | 7 +----
> mm/oom_kill.c | 60 +++++++++++++++++++++++++++++++++-----------------
> 3 files changed, 49 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index dfa5327..d6b0424 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -476,14 +476,17 @@ static const struct file_operations proc_lstats_operations = {
> 
> static int proc_oom_score(struct task_struct *task, char *buffer)
> {
> - unsigned long points = 0;
> + unsigned long points;
> + unsigned long ratio = 0;
> + unsigned long totalpages = totalram_pages + total_swap_pages + 1;
> 
> read_lock(&tasklist_lock);
> - if (pid_alive(task))
> - points = oom_badness(task, NULL, NULL,
> - totalram_pages + total_swap_pages);
> + if (pid_alive(task)) {
> + points = oom_badness(task, NULL, NULL, totalpages);
> + ratio = points * 1000 / totalpages;
> + }
> read_unlock(&tasklist_lock);
> - return sprintf(buffer, "%lu\n", points);
> + return sprintf(buffer, "%lu\n", ratio);
> }
> 
> struct limit_names {
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 5e3aa83..0f5b588 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -40,7 +40,8 @@ enum oom_constraint {
> CONSTRAINT_MEMCG,
> };
> 
> -extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +/* The badness from the OOM killer */
> +extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> const nodemask_t *nodemask, unsigned long totalpages);
> extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> @@ -62,10 +63,6 @@ static inline void oom_killer_enable(void)
> oom_killer_disabled = false;
> }
> 
> -/* The badness from the OOM killer */
> -extern unsigned long badness(struct task_struct *p, struct mem_cgroup *mem,
> - const nodemask_t *nodemask, unsigned long uptime);
> -
> extern struct task_struct *find_lock_task_mm(struct task_struct *p);
> 
> /* sysctls */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index e6a6c6f..8bbc3df 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -132,10 +132,12 @@ static bool oom_unkillable_task(struct task_struct *p,
> * predictable as possible. The goal is to return the highest value for the
> * task consuming the most memory to avoid subsequent oom failures.
> */
> -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> const nodemask_t *nodemask, unsigned long totalpages)
> {
> - int points;
> + unsigned long points;
> + unsigned long score_adj = 0;
> +
> 
> if (oom_unkillable_task(p, mem, nodemask))
> return 0;
> @@ -160,7 +162,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> */
> if (p->flags & PF_OOM_ORIGIN) {
> task_unlock(p);
> - return 1000;
> + return ULONG_MAX;
> }
This part failed to apply to the latest git tree, so I was unable to test
the patches this time. Can you fix that?

Thanks,
CAI Qian
> /*
> @@ -176,33 +178,49 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> */
> points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> points += get_mm_counter(p->mm, MM_SWAPENTS);
> -
> - points *= 1000;
> - points /= totalpages;
> task_unlock(p);
> 
> /*
> * Root processes get 3% bonus, just like the __vm_enough_memory()
> * implementation used by LSMs.
> + *
> + * XXX: Too large bonus, example, if the system have tera-bytes memory..
> */
> - if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> - points -= 30;
> + if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> + if (points >= totalpages / 32)
> + points -= totalpages / 32;
> + else
> + points = 0;
> + }
> 
> /*
> * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
> * either completely disable oom killing or always prefer a certain
> * task.
> */
> - points += p->signal->oom_score_adj;
> + if (p->signal->oom_score_adj >= 0) {
> + score_adj = p->signal->oom_score_adj * (totalpages / 1000);
> + if (ULONG_MAX - points >= score_adj)
> + points += score_adj;
> + else
> + points = ULONG_MAX;
> + } else {
> + score_adj = -p->signal->oom_score_adj * (totalpages / 1000);
> + if (points >= score_adj)
> + points -= score_adj;
> + else
> + points = 0;
> + }
> 
> /*
> * Never return 0 for an eligible task that may be killed since it's
> * possible that no single user task uses more than 0.1% of memory and
> * no single admin tasks uses more than 3.0%.
> */
> - if (points <= 0)
> - return 1;
> - return (points < 1000) ? points : 1000;
> + if (!points)
> + points = 1;
> +
> + return points;
> }
> 
> /*
> @@ -274,7 +292,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
> *
> * (not docbooked, we don't want this one cluttering up the manual)
> */
> -static struct task_struct *select_bad_process(unsigned int *ppoints,
> +static struct task_struct *select_bad_process(unsigned long *ppoints,
> unsigned long totalpages, struct mem_cgroup *mem,
> const nodemask_t *nodemask)
> {
> @@ -283,7 +301,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> *ppoints = 0;
> 
> do_each_thread_reverse(g, p) {
> - unsigned int points;
> + unsigned long points;
> 
> if (!p->mm)
> continue;
> @@ -314,7 +332,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> */
> if (p == current) {
> chosen = p;
> - *ppoints = 1000;
> + *ppoints = ULONG_MAX;
> } else {
> /*
> * If this task is not being ptraced on exit,
> @@ -445,14 +463,14 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> #undef K
> 
> static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> - unsigned int points, unsigned long totalpages,
> + unsigned long points, unsigned long totalpages,
> struct mem_cgroup *mem, nodemask_t *nodemask,
> const char *message)
> {
> struct task_struct *victim = p;
> struct task_struct *child;
> struct task_struct *t = p;
> - unsigned int victim_points = 0;
> + unsigned long victim_points = 0;
> 
> if (printk_ratelimit())
> dump_header(p, gfp_mask, order, mem, nodemask);
> @@ -467,7 +485,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> }
> 
> task_lock(p);
> - pr_err("%s: Kill process %d (%s) score %d or sacrifice child\n",
> + pr_err("%s: Kill process %d (%s) points %lu or sacrifice child\n",
> message, task_pid_nr(p), p->comm, points);
> task_unlock(p);
> 
> @@ -479,7 +497,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> */
> do {
> list_for_each_entry(child, &t->children, sibling) {
> - unsigned int child_points;
> + unsigned long child_points;
> 
> if (child->mm == p->mm)
> continue;
> @@ -526,7 +544,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
> void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
> {
> unsigned long limit;
> - unsigned int points = 0;
> + unsigned long points = 0;
> struct task_struct *p;
> 
> /*
> @@ -675,7 +693,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> struct task_struct *p;
> unsigned long totalpages;
> unsigned long freed = 0;
> - unsigned int points;
> + unsigned long points;
> enum oom_constraint constraint = CONSTRAINT_NONE;
> int killed = 0;
> 
> --
> 1.7.3.1
> 
> 
> 
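Condensed into a stand-alone user-space sketch (the 16GB totalpages value
and the task sizes are assumptions; the kernel diff above is the
authoritative version), the new scheme keeps full page resolution
internally and flattens to 0-1000 only when /proc/<pid>/oom_score is read:

#include <stdio.h>
#include <limits.h>

static unsigned long badness(unsigned long rss_pages, long oom_score_adj,
			     int root, unsigned long totalpages)
{
	unsigned long points = rss_pages;	/* stays in pages internally */

	/* Root bonus of totalpages/32 (~3%), clamped against underflow. */
	if (root)
		points = points >= totalpages / 32 ? points - totalpages / 32 : 0;

	/* One oom_score_adj unit is worth totalpages/1000 pages. */
	if (oom_score_adj >= 0) {
		unsigned long adj = oom_score_adj * (totalpages / 1000);
		points = ULONG_MAX - points >= adj ? points + adj : ULONG_MAX;
	} else {
		unsigned long adj = -oom_score_adj * (totalpages / 1000);
		points = points >= adj ? points - adj : 0;
	}
	return points ? points : 1;	/* an eligible task never scores 0 */
}

int main(void)
{
	unsigned long totalpages = (4UL << 20) + 1;	/* RAM + swap + 1 */
	unsigned long daemon = badness(5000, 0, 1, totalpages);
	unsigned long child  = badness(5000, 0, 0, totalpages);

	/* Internally 1 vs 5000; the displayed permillage is 0 vs 1. */
	printf("daemon=%lu (%lu)  child=%lu (%lu)\n",
	       daemon, daemon * 1000 / totalpages,
	       child, child * 1000 / totalpages);
	return 0;
}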

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
@ 2011-05-26  9:34     ` CAI Qian
  0 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-26  9:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa hiroyu,
	minchan kim, oleg

Hello KOSAKI,

----- Original Message -----
> CAI Qian reported his kernel did hang-up if he ran fork intensive
> workload and then invoke oom-killer.
> 
> The problem is, current oom calculation uses 0-1000 normalized value
> (The unit is a permillage of system-ram). Its low precision make
> a lot of same oom score. IOW, in his case, all processes have smaller
> oom score than 1 and internal calculation round it to 1.
> 
> Thus oom-killer kill ineligible process. This regression is caused by
> commit a63d83f427 (oom: badness heuristic rewrite).
> 
> The solution is, the internal calculation just use number of pages
> instead of permillage of system-ram. And convert it to permillage
> value at displaying time.
> 
> This patch doesn't change any ABI (included /proc/<pid>/oom_score_adj)
> even though current logic has a lot of my dislike thing.
> 
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> fs/proc/base.c | 13 ++++++----
> include/linux/oom.h | 7 +----
> mm/oom_kill.c | 60 +++++++++++++++++++++++++++++++++-----------------
> 3 files changed, 49 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index dfa5327..d6b0424 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -476,14 +476,17 @@ static const struct file_operations
> proc_lstats_operations = {
> 
> static int proc_oom_score(struct task_struct *task, char *buffer)
> {
> - unsigned long points = 0;
> + unsigned long points;
> + unsigned long ratio = 0;
> + unsigned long totalpages = totalram_pages + total_swap_pages + 1;
> 
> read_lock(&tasklist_lock);
> - if (pid_alive(task))
> - points = oom_badness(task, NULL, NULL,
> - totalram_pages + total_swap_pages);
> + if (pid_alive(task)) {
> + points = oom_badness(task, NULL, NULL, totalpages);
> + ratio = points * 1000 / totalpages;
> + }
> read_unlock(&tasklist_lock);
> - return sprintf(buffer, "%lu\n", points);
> + return sprintf(buffer, "%lu\n", ratio);
> }
> 
> struct limit_names {
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 5e3aa83..0f5b588 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -40,7 +40,8 @@ enum oom_constraint {
> CONSTRAINT_MEMCG,
> };
> 
> -extern unsigned int oom_badness(struct task_struct *p, struct
> mem_cgroup *mem,
> +/* The badness from the OOM killer */
> +extern unsigned long oom_badness(struct task_struct *p, struct
> mem_cgroup *mem,
> const nodemask_t *nodemask, unsigned long totalpages);
> extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t
> gfp_flags);
> extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t
> gfp_flags);
> @@ -62,10 +63,6 @@ static inline void oom_killer_enable(void)
> oom_killer_disabled = false;
> }
> 
> -/* The badness from the OOM killer */
> -extern unsigned long badness(struct task_struct *p, struct mem_cgroup
> *mem,
> - const nodemask_t *nodemask, unsigned long uptime);
> -
> extern struct task_struct *find_lock_task_mm(struct task_struct *p);
> 
> /* sysctls */
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index e6a6c6f..8bbc3df 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -132,10 +132,12 @@ static bool oom_unkillable_task(struct
> task_struct *p,
> * predictable as possible. The goal is to return the highest value for
> the
> * task consuming the most memory to avoid subsequent oom failures.
> */
> -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup
> *mem,
> +unsigned long oom_badness(struct task_struct *p, struct mem_cgroup
> *mem,
> const nodemask_t *nodemask, unsigned long totalpages)
> {
> - int points;
> + unsigned long points;
> + unsigned long score_adj = 0;
> +
> 
> if (oom_unkillable_task(p, mem, nodemask))
> return 0;
> @@ -160,7 +162,7 @@ unsigned int oom_badness(struct task_struct *p,
> struct mem_cgroup *mem,
> */
> if (p->flags & PF_OOM_ORIGIN) {
> task_unlock(p);
> - return 1000;
> + return ULONG_MAX;
> }
This part failed to apply to the latest git tree so unable to test those
patches this time. Can you fix that?

Thanks,
CAI Qian
> /*
> @@ -176,33 +178,49 @@ unsigned int oom_badness(struct task_struct *p,
> struct mem_cgroup *mem,
> */
> points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> points += get_mm_counter(p->mm, MM_SWAPENTS);
> -
> - points *= 1000;
> - points /= totalpages;
> task_unlock(p);
> 
> /*
> * Root processes get 3% bonus, just like the __vm_enough_memory()
> * implementation used by LSMs.
> + *
> + * XXX: Too large bonus, example, if the system have tera-bytes
> memory..
> */
> - if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> - points -= 30;
> + if (has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> + if (points >= totalpages / 32)
> + points -= totalpages / 32;
> + else
> + points = 0;
> + }
> 
> /*
> * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
> * either completely disable oom killing or always prefer a certain
> * task.
> */
> - points += p->signal->oom_score_adj;
> + if (p->signal->oom_score_adj >= 0) {
> + score_adj = p->signal->oom_score_adj * (totalpages / 1000);
> + if (ULONG_MAX - points >= score_adj)
> + points += score_adj;
> + else
> + points = ULONG_MAX;
> + } else {
> + score_adj = -p->signal->oom_score_adj * (totalpages / 1000);
> + if (points >= score_adj)
> + points -= score_adj;
> + else
> + points = 0;
> + }
> 
> /*
> * Never return 0 for an eligible task that may be killed since it's
> * possible that no single user task uses more than 0.1% of memory and
> * no single admin tasks uses more than 3.0%.
> */
> - if (points <= 0)
> - return 1;
> - return (points < 1000) ? points : 1000;
> + if (!points)
> + points = 1;
> +
> + return points;
> }
> 
> /*
> @@ -274,7 +292,7 @@ static enum oom_constraint
> constrained_alloc(struct zonelist *zonelist,
> *
> * (not docbooked, we don't want this one cluttering up the manual)
> */
> -static struct task_struct *select_bad_process(unsigned int *ppoints,
> +static struct task_struct *select_bad_process(unsigned long *ppoints,
> unsigned long totalpages, struct mem_cgroup *mem,
> const nodemask_t *nodemask)
> {
> @@ -283,7 +301,7 @@ static struct task_struct
> *select_bad_process(unsigned int *ppoints,
> *ppoints = 0;
> 
> do_each_thread_reverse(g, p) {
> - unsigned int points;
> + unsigned long points;
> 
> if (!p->mm)
> continue;
> @@ -314,7 +332,7 @@ static struct task_struct
> *select_bad_process(unsigned int *ppoints,
> */
> if (p == current) {
> chosen = p;
> - *ppoints = 1000;
> + *ppoints = ULONG_MAX;
> } else {
> /*
> * If this task is not being ptraced on exit,
> @@ -445,14 +463,14 @@ static int oom_kill_task(struct task_struct *p,
> struct mem_cgroup *mem)
> #undef K
> 
> static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> - unsigned int points, unsigned long totalpages,
> + unsigned long points, unsigned long totalpages,
> struct mem_cgroup *mem, nodemask_t *nodemask,
> const char *message)
> {
> struct task_struct *victim = p;
> struct task_struct *child;
> struct task_struct *t = p;
> - unsigned int victim_points = 0;
> + unsigned long victim_points = 0;
> 
> if (printk_ratelimit())
> dump_header(p, gfp_mask, order, mem, nodemask);
> @@ -467,7 +485,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> }
> 
> task_lock(p);
> - pr_err("%s: Kill process %d (%s) score %d or sacrifice child\n",
> + pr_err("%s: Kill process %d (%s) points %lu or sacrifice child\n",
> message, task_pid_nr(p), p->comm, points);
> task_unlock(p);
> 
> @@ -479,7 +497,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> */
> do {
> list_for_each_entry(child, &t->children, sibling) {
> - unsigned int child_points;
> + unsigned long child_points;
> 
> if (child->mm == p->mm)
> continue;
> @@ -526,7 +544,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
> void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
> {
> unsigned long limit;
> - unsigned int points = 0;
> + unsigned long points = 0;
> struct task_struct *p;
> 
> /*
> @@ -675,7 +693,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> struct task_struct *p;
> unsigned long totalpages;
> unsigned long freed = 0;
> - unsigned int points;
> + unsigned long points;
> enum oom_constraint constraint = CONSTRAINT_NONE;
> int killed = 0;
> 
> --
> 1.7.3.1
> 
> 
> 
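The oom_badness() hunks quoted above reduce to the following standalone C
sketch. It mirrors the patch's arithmetic but is illustrative only; the
badness() helper, the is_root flag and the plain unsigned long inputs are
stand-ins for the kernel-side values, not kernel API:

#include <limits.h>

/* points on entry: rss + page-table pages + swap entries, all in pages */
static unsigned long badness(unsigned long points, unsigned long totalpages,
			     int oom_score_adj, int is_root)
{
	/* root bonus: discount totalpages/32 (about 3% of RAM), clamped at 0 */
	if (is_root)
		points = (points >= totalpages / 32) ?
			 points - totalpages / 32 : 0;

	/* each oom_score_adj unit is worth 0.1% of RAM, saturating both ways */
	if (oom_score_adj >= 0) {
		unsigned long adj = (unsigned long)oom_score_adj * (totalpages / 1000);
		points = (ULONG_MAX - points >= adj) ? points + adj : ULONG_MAX;
	} else {
		unsigned long adj = (unsigned long)(-oom_score_adj) * (totalpages / 1000);
		points = (points >= adj) ? points - adj : 0;
	}

	/* never return 0 for an eligible task */
	return points ? points : 1;
}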


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-26  9:34     ` CAI Qian
@ 2011-05-26  9:56       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-26  9:56 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

>> @@ -160,7 +162,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>> */
>> if (p->flags & PF_OOM_ORIGIN) {
>> task_unlock(p);
>> - return 1000;
>> + return ULONG_MAX;
>> }
> This part failed to apply to the latest git tree, so I was unable to test
> these patches this time. Can you fix that?

Please apply it on top of mmotm-0512.

Thanks.



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally
  2011-05-26  7:08         ` CAI Qian
@ 2011-05-27 19:12           ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-27 19:12 UTC (permalink / raw)
  To: CAI Qian
  Cc: linux-mm, linux-kernel, Andrew Morton, hughd, kamezawa hiroyu,
	minchan kim, oleg, KOSAKI Motohiro

On Thu, 26 May 2011, CAI Qian wrote:

> Here are the results of the testing. Running the reproducer as a non-root
> user, the results look good, as the OOM killer just killed, in turn, each
> python process that the reproducer forked. However, when running it as the
> root user, sshd and other random processes were killed.
> 

Thanks for testing!  The patch that I proposed for you was a little more 
conservative in the bonus it provides to root processes that aren't using 
more than a certain threshold of memory.  My latest proposal was to give root 
processes only a 1% bonus for every 10% of memory they consume, so it 
would be impossible for them to have an oom score of 1 as reported in your 
logs.

I believe that KOSAKI-san is refreshing his series of patches, so let's 
look at how your workload behaves on the next iteration.  Thanks CAI!
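For illustration, one plausible reading of that proportional bonus is the
sketch below; root_discount() and its parameters are hypothetical names for
this note, not code from any posted patch:

static unsigned long root_discount(unsigned long points, unsigned long totalpages)
{
	/* one discount step per full 10% of total memory the task uses */
	unsigned long tenth = totalpages / 10;
	unsigned long steps = tenth ? points / tenth : 0;
	unsigned long discount = steps * (totalpages / 100);	/* 1% each */

	return (points >= discount) ? points - discount : 0;
}

Under this reading the discount stays proportional to the task's own usage
(roughly a tenth of its score), so a mostly idle root daemon can no longer
hide behind a flat 3%-of-RAM bonus.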

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-25 23:50                 ` David Rientjes
@ 2011-05-30  1:17                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-30  1:17 UTC (permalink / raw)
  To: rientjes
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

> I'm afraid that a second time through the tasklist in select_bad_process() 
> is simply a non-starter for _any_ case; it significantly increases the 
> amount of time that tasklist_lock is held and causes problems elsewhere on 
> large systems -- such as some of ours -- since irqs are disabled while 
> waiting for the writeside of the lock.  I think it would be better to use 
> a proportional privilege for root processes based on the amount of memory 
> they are using (discounting 1% of memory per 10% of memory used, as 
> proposed earlier, seems sane) so we can always protect root when necessary 
> and never iterate through the list again.
> 
> Please look into the earlier review comments on the other patches, refresh 
> the series, and post it again.  Thanks!

Never mind.

You will never see tasklist_lock hold times increase, because you will never
see the case where all processes have root privileges.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-20  8:00 ` KOSAKI Motohiro
@ 2011-05-31  1:33   ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-31  1:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa hiroyu,
	minchan kim, oleg

Hello,

I have tested those patches from KOSAKI rebased onto the latest mainline.
They still killed random processes, and I received a panic at the end when
running as the root user. The full oom output can be found here.
http://people.redhat.com/qcai/oom

Cheers,
CAI Qian

----- Original Message -----
> CAI Qian reported current oom logic doesn't work at all on his 16GB
> RAM
> machine. oom killer killed all system daemon at first and his system
> stopped responding.
> 
> The brief log is below.
> 
> > Out of memory: Kill process 1175 (dhclient) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1247 (rsyslogd) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1284 (irqbalance) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1303 (rpcbind) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1321 (rpc.statd) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1333 (mdadm) score 1 or sacrifice child
> > Out of memory: Kill process 1365 (rpc.idmapd) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1403 (dbus-daemon) score 1 or sacrifice
> > child
> > Out of memory: Kill process 1438 (acpid) score 1 or sacrifice child
> > Out of memory: Kill process 1447 (hald) score 1 or sacrifice child
> > Out of memory: Kill process 1447 (hald) score 1 or sacrifice child
> > Out of memory: Kill process 1487 (hald-addon-inpu) score 1 or
> > sacrifice child
> > Out of memory: Kill process 1488 (hald-addon-acpi) score 1 or
> > sacrifice child
> > Out of memory: Kill process 1507 (automount) score 1 or sacrifice
> > child
> 
> 
> The problems are three.
> 
> 1) if two processes have the same oom score, we should kill younger
> process.
> but current logic kill older. Typically oldest processes are system
> daemons.
> 2) Current logic use 'unsigned int' for internal score calculation.
> (exactly says,
> it only use 0-1000 value). its very low precision calculation makes a
> lot of
> same oom score and kill an ineligible process.
> 3) Current logic give 3% of SystemRAM to root processes. It obviously
> too big
> if you have plenty memory. Now, your fork-bomb processes have 500MB
> OOM immune
> bonus. then your fork-bomb never ever be killed.
> 
> 
> KOSAKI Motohiro (5):
> oom: improve dump_tasks() show items
> oom: kill younger process first
> oom: oom-killer don't use proportion of system-ram internally
> oom: don't kill random process
> oom: merge oom_kill_process() with oom_kill_task()
> 
> fs/proc/base.c | 13 ++-
> include/linux/oom.h | 10 +--
> include/linux/sched.h | 11 +++
> mm/oom_kill.c | 201 +++++++++++++++++++++++++++----------------------
> 4 files changed, 135 insertions(+), 100 deletions(-)
> 
> --
> 1.7.3.1
> 
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  1:33   ` CAI Qian
@ 2011-05-31  4:10     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  4:10 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/31 10:33), CAI Qian wrote:
> Hello,
> 
> I have tested those patches from KOSAKI rebased onto the latest mainline.
> They still killed random processes, and I received a panic at the end when
> running as the root user. The full oom output can be found here.
> http://people.redhat.com/qcai/oom

You ran the fork-bomb as root, therefore unprivileged processes were killed first.
That is not random; it is intentional and desirable. I mean:

- If you run the same program as non-root, python will be killed first,
  because it consumes far more memory than the daemons.
- If you run the same program as root, non-root processes and processes that
  explicitly drop privileges (e.g. irqbalance) will be killed first.


Look, your log shows that the process with the highest oom score was killed first.

Out of memory: Kill process 5462 (abrtd) points:393 total-vm:262300kB, anon-rss:1024kB, file-rss:0kB
Out of memory: Kill process 5277 (hald) points:303 total-vm:25444kB, anon-rss:1116kB, file-rss:0kB
Out of memory: Kill process 5720 (sshd) points:258 total-vm:97684kB, anon-rss:824kB, file-rss:0kB
Out of memory: Kill process 5457 (pickup) points:236 total-vm:78672kB, anon-rss:768kB, file-rss:0kB
Out of memory: Kill process 5451 (master) points:235 total-vm:78592kB, anon-rss:796kB, file-rss:0kB
Out of memory: Kill process 5458 (qmgr) points:233 total-vm:78740kB, anon-rss:764kB, file-rss:0kB
Out of memory: Kill process 5353 (sshd) points:189 total-vm:63992kB, anon-rss:620kB, file-rss:0kB
Out of memory: Kill process 1626 (dhclient) points:129 total-vm:9148kB, anon-rss:484kB, file-rss:0kB



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  4:10     ` KOSAKI Motohiro
@ 2011-05-31  4:14       ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-31  4:14 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa hiroyu,
	minchan kim, oleg



----- Original Message -----
> (2011/05/31 10:33), CAI Qian wrote:
> > Hello,
> >
> > I have tested those patches from KOSAKI rebased onto the latest mainline.
> > They still killed random processes, and I received a panic at the end
> > when running as the root user. The full oom output can be found here.
> > http://people.redhat.com/qcai/oom
> 
> You ran the fork-bomb as root, therefore unprivileged processes were killed
> first.
> That is not random; it is intentional and desirable. I mean:
> 
> - If you run the same program as non-root, python will be killed first,
>   because it consumes far more memory than the daemons.
> - If you run the same program as root, non-root processes and processes
>   that explicitly drop privileges (e.g. irqbalance) will be killed first.
> 
> 
> Look, your log shows that the process with the highest oom score was killed
> first.
> 
> Out of memory: Kill process 5462 (abrtd) points:393 total-vm:262300kB, anon-rss:1024kB, file-rss:0kB
> Out of memory: Kill process 5277 (hald) points:303 total-vm:25444kB, anon-rss:1116kB, file-rss:0kB
> Out of memory: Kill process 5720 (sshd) points:258 total-vm:97684kB, anon-rss:824kB, file-rss:0kB
> Out of memory: Kill process 5457 (pickup) points:236 total-vm:78672kB, anon-rss:768kB, file-rss:0kB
> Out of memory: Kill process 5451 (master) points:235 total-vm:78592kB, anon-rss:796kB, file-rss:0kB
> Out of memory: Kill process 5458 (qmgr) points:233 total-vm:78740kB, anon-rss:764kB, file-rss:0kB
> Out of memory: Kill process 5353 (sshd) points:189 total-vm:63992kB, anon-rss:620kB, file-rss:0kB
> Out of memory: Kill process 1626 (dhclient) points:129 total-vm:9148kB, anon-rss:484kB, file-rss:0kB
OK, there was also a panic at the end. Is that expected?

BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
IP: [<ffffffff811227d4>] get_mm_counter+0x14/0x30
PGD 0 
Oops: 0000 [#1] SMP 
CPU 7 
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_mirror dm_region_hash dm_log microcode serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: scsi_wait_scan]

Pid: 5232, comm: dbus-daemon Not tainted 3.0.0-rc1+ #3 IBM System x3550 M3 -[7944I21]-/69Y4438     
RIP: 0010:[<ffffffff811227d4>]  [<ffffffff811227d4>] get_mm_counter+0x14/0x30
RSP: 0000:ffff88027116b828  EFLAGS: 00010286
RAX: 00000000000002a0 RBX: ffff880470cd8a80 RCX: 0000000000000003
RDX: 000000000000000e RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffff88027116b828 R08: 0000000000000000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000007 R12: ffff88027116b880
R13: 0000000000000000 R14: 0000000000000000 R15: ffff880270df2100
FS:  00007f78a3837700(0000) GS:ffff88047fc60000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002a8 CR3: 000000047238f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dbus-daemon (pid: 5232, threadinfo ffff88027116a000, task ffff880270df2100)
Stack:
 ffff88027116b8b8 ffffffff81104c60 0000000000000000 0000000000000000
 ffff8802704c4680 0000000000000000 ffff8802705161c0 0000000000000000
 0000000000000000 0000000000000000 0000000000000286 ffff880470cd8e98
Call Trace:
 [<ffffffff81104c60>] dump_tasks+0xa0/0x160
 [<ffffffff81104dd5>] dump_header+0xb5/0xd0
 [<ffffffff81104f15>] oom_kill_process+0xa5/0x1c0
 [<ffffffff811055ef>] out_of_memory+0xff/0x220
 [<ffffffff8110a962>] __alloc_pages_slowpath+0x632/0x6b0
 [<ffffffff8110ab84>] __alloc_pages_nodemask+0x1a4/0x1f0
 [<ffffffff81147d52>] kmem_getpages+0x62/0x170
 [<ffffffff8114886a>] fallback_alloc+0x1ba/0x270
 [<ffffffff811482e3>] ? cache_grow+0x2c3/0x2f0
 [<ffffffff811485f5>] ____cache_alloc_node+0x95/0x150
 [<ffffffff8114901d>] kmem_cache_alloc+0xfd/0x190
 [<ffffffff810d20ed>] taskstats_exit+0x1cd/0x240
 [<ffffffff81066667>] do_exit+0x177/0x430
 [<ffffffff81066971>] do_group_exit+0x51/0xc0
 [<ffffffff81078583>] get_signal_to_deliver+0x203/0x470
 [<ffffffff8100b939>] do_signal+0x69/0x190
 [<ffffffff8100bac5>] do_notify_resume+0x65/0x80
 [<ffffffff814db6d0>] int_signal+0x12/0x17
Code: 48 8b 00 c9 48 d1 e8 83 e0 01 c3 0f 1f 40 00 31 c0 c9 c3 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 48 63 f6 48 8d 84 f7 90 02 00 00 
 8b 50 08 31 c0 c9 48 85 d2 48 0f 49 c2 c3 66 66 66 66 2e 0f 
RIP  [<ffffffff811227d4>] get_mm_counter+0x14/0x30
 RSP <ffff88027116b828>
CR2: 00000000000002a8
---[ end trace 742b26ee0c4fab73 ]---
Fixing recursive fault but reboot is needed!
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
Pid: 4, comm: kworker/0:0 Tainted: G      D     3.0.0-rc1+ #3
Call Trace:
 <NMI>  [<ffffffff814d062f>] panic+0x91/0x1a8
 [<ffffffff810c76e1>] watchdog_overflow_callback+0xb1/0xc0
 [<ffffffff810fbbdd>] __perf_event_overflow+0x9d/0x250
 [<ffffffff810fc1c4>] perf_event_overflow+0x14/0x20
 [<ffffffff8101df36>] intel_pmu_handle_irq+0x326/0x530
 [<ffffffff814d4ba9>] perf_event_nmi_handler+0x29/0xa0
 [<ffffffff814d6f65>] notifier_call_chain+0x55/0x80
 [<ffffffff814d6fca>] atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff814d6ffe>] notify_die+0x2e/0x30
 [<ffffffff814d4199>] default_do_nmi+0x39/0x1f0
 [<ffffffff814d43d0>] do_nmi+0x80/0xa0
 [<ffffffff814d3b90>] nmi+0x20/0x30
 [<ffffffff8123f379>] ? __write_lock_failed+0x9/0x20
 <<EOE>>  [<ffffffff814d32de>] ? _raw_write_lock_irq+0x1e/0x20
 [<ffffffff81065cec>] forget_original_parent+0x3c/0x330
 [<ffffffff81065ffb>] exit_notify+0x1b/0x190
 [<ffffffff810666ed>] do_exit+0x1fd/0x430
 [<ffffffff8107fae0>] ? manage_workers+0x120/0x120
 [<ffffffff810846ce>] kthread+0x8e/0xa0
 [<ffffffff814dc544>] kernel_thread_helper+0x4/0x10
 [<ffffffff81084640>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff814dc540>] ? gs_change+0x13/0x13

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  4:10     ` KOSAKI Motohiro
@ 2011-05-31  4:32       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  4:32 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/31 13:10), KOSAKI Motohiro wrote:
> (2011/05/31 10:33), CAI Qian wrote:
>> Hello,
>>
>> I have tested those patches from KOSAKI rebased onto the latest mainline.
>> They still killed random processes, and I received a panic at the end when
>> running as the root user. The full oom output can be found here.
>> http://people.redhat.com/qcai/oom
> 
> You ran the fork-bomb as root, therefore unprivileged processes were killed first.
> That is not random; it is intentional and desirable. I mean:
> 
> - If you run the same program as non-root, python will be killed first,
>   because it consumes far more memory than the daemons.
> - If you run the same program as root, non-root processes and processes that
>   explicitly drop privileges (e.g. irqbalance) will be killed first.

I mean, in this case the oom-killer starts killing python only after killing
all unprivileged processes. Please wait and watch for a while after that
sequence.




^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  4:14       ` CAI Qian
@ 2011-05-31  4:34         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  4:34 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

> OK, there was also a panic at the end. Is that expected?

Definitely not.
At least, I can't reproduce it. Can you reproduce it?


> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
> IP: [<ffffffff811227d4>] get_mm_counter+0x14/0x30
> PGD 0 
> Oops: 0000 [#1] SMP 
> CPU 7 
> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_mirror dm_region_hash dm_log microcode serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: scsi_wait_scan]


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-30  1:17                   ` KOSAKI Motohiro
@ 2011-05-31  4:48                     ` David Rientjes
  -1 siblings, 0 replies; 118+ messages in thread
From: David Rientjes @ 2011-05-31  4:48 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

On Mon, 30 May 2011, KOSAKI Motohiro wrote:

> Never mind.
> 
> You will never see tasklist_lock hold times increase, because you will never
> see the case where all processes have root privileges.
> 

I don't really understand what you're trying to say, sorry.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  4:14       ` CAI Qian
@ 2011-05-31  4:49         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  4:49 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

> OK, there was also a panic at the end. Is that expected?
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
> IP: [<ffffffff811227d4>] get_mm_counter+0x14/0x30
> PGD 0 
> Oops: 0000 [#1] SMP 
> CPU 7 
> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_mirror dm_region_hash dm_log microcode serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: scsi_wait_scan]

My fault, my [1/5] has a bug. Please apply the following incremental patch.


index 9c7f149..f0e34d4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -448,8 +448,8 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *no
                        task_tgid_nr(task), task_tgid_nr(task->real_parent),
                        task_uid(task),
                        task->mm->total_vm,
-                       get_mm_rss(task->mm) + p->mm->nr_ptes,
-                       get_mm_counter(p->mm, MM_SWAPENTS),
+                       get_mm_rss(task->mm) + task->mm->nr_ptes,
+                       get_mm_counter(task->mm, MM_SWAPENTS),
                        task->signal->oom_score_adj,
                        task->comm);
                task_unlock(task);
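(This matches the oops above: dump_tasks() walks every task, but the pre-fix
code read p->mm, the outer loop variable's mm, which can already be NULL
while task->mm is still valid, hence the NULL-pointer dereference at a small
offset inside get_mm_counter().)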



^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  4:10     ` KOSAKI Motohiro
@ 2011-05-31  4:52       ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-31  4:52 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa hiroyu,
	minchan kim, oleg



----- Original Message -----
> (2011/05/31 10:33), CAI Qian wrote:
> > Hello,
> >
> > I have tested those patches from KOSAKI rebased onto the latest mainline.
> > They still killed random processes, and I received a panic at the end
> > when running as the root user. The full oom output can be found here.
> > http://people.redhat.com/qcai/oom
> 
> You ran the fork-bomb as root, therefore unprivileged processes were killed
> first.
> That is not random; it is intentional and desirable. I mean:
> 
> - If you run the same program as non-root, python will be killed first,
>   because it consumes far more memory than the daemons.
> - If you run the same program as root, non-root processes and processes
>   that explicitly drop privileges (e.g. irqbalance) will be killed first.
Hmm, at least some of the processes that were killed first were root
processes.
[   pid]   ppid   uid total_vm      rss     swap score_adj name
[  5720]   5353     0    24421      257        0         0 sshd
[  5353]      1     0    15998      189        0         0 sshd
[  5451]      1     0    19648      235        0         0 master
[  1626]      1     0     2287      129        0         0 dhclient
> 
> Look, your log shows that the process with the highest oom score was killed
> first.
> 
> Out of memory: Kill process 5462 (abrtd) points:393 total-vm:262300kB, anon-rss:1024kB, file-rss:0kB
> Out of memory: Kill process 5277 (hald) points:303 total-vm:25444kB, anon-rss:1116kB, file-rss:0kB
> Out of memory: Kill process 5720 (sshd) points:258 total-vm:97684kB, anon-rss:824kB, file-rss:0kB
> Out of memory: Kill process 5457 (pickup) points:236 total-vm:78672kB, anon-rss:768kB, file-rss:0kB
> Out of memory: Kill process 5451 (master) points:235 total-vm:78592kB, anon-rss:796kB, file-rss:0kB
> Out of memory: Kill process 5458 (qmgr) points:233 total-vm:78740kB, anon-rss:764kB, file-rss:0kB
> Out of memory: Kill process 5353 (sshd) points:189 total-vm:63992kB, anon-rss:620kB, file-rss:0kB
> Out of memory: Kill process 1626 (dhclient) points:129 total-vm:9148kB, anon-rss:484kB, file-rss:0kB
> 
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH 4/5] oom: don't kill random process
  2011-05-31  4:48                     ` David Rientjes
@ 2011-05-31  4:54                       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  4:54 UTC (permalink / raw)
  To: rientjes
  Cc: linux-mm, linux-kernel, akpm, caiqian, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/31 13:48), David Rientjes wrote:
> On Mon, 30 May 2011, KOSAKI Motohiro wrote:
> 
>> Never mind.
>>
>> You will never see tasklist_lock hold times increase, because you will never
>> see the case where all processes have root privileges.
> 
> I don't really understand what you're trying to say, sorry.

I mean, that case does not occur for a job server workload.



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  4:52       ` CAI Qian
@ 2011-05-31  7:04         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  7:04 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

>> - If you run the same program as root, non-root processes and processes
>>   that explicitly drop privileges (e.g. irqbalance) will be killed first.
> Hmm, at least some of the processes that were killed first were root
> processes.
> [   pid]   ppid   uid total_vm      rss     swap score_adj name
> [  5720]   5353     0    24421      257        0         0 sshd
> [  5353]      1     0    15998      189        0         0 sshd
> [  5451]      1     0    19648      235        0         0 master
> [  1626]      1     0     2287      129        0         0 dhclient

Hi

I can't reproduce this either. Are you sure these processes have full root
privileges? I've made a new debugging patch. After applying the following
patch, do these processes show cap=1?



index f0e34d4..fe788df 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -429,7 +429,7 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *no
        struct task_struct *p;
        struct task_struct *task;

-       pr_info("[   pid]   ppid   uid total_vm      rss     swap score_adj name\n");
+       pr_info("[   pid]   ppid   uid cap total_vm      rss     swap score_adj name\n");
        for_each_process(p) {
                if (oom_unkillable_task(p, mem, nodemask))
                        continue;
@@ -444,9 +444,9 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *no
                        continue;
                }

-               pr_info("[%6d] %6d %5d %8lu %8lu %8lu %9d %s\n",
+               pr_info("[%6d] %6d %5d %3d %8lu %8lu %8lu %9d %s\n",
                        task_tgid_nr(task), task_tgid_nr(task->real_parent),
-                       task_uid(task),
+                       task_uid(task), has_capability_noaudit(task, CAP_SYS_ADMIN),
                        task->mm->total_vm,
                        get_mm_rss(task->mm) + task->mm->nr_ptes,
                        get_mm_counter(task->mm, MM_SWAPENTS),
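With this applied, each line of the dump_tasks() output gains a cap column
between uid and total_vm: 1 if has_capability_noaudit(task, CAP_SYS_ADMIN)
still holds for the task, 0 otherwise, which shows at a glance whether a
uid-0 process actually retains full root privileges.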


^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  7:04         ` KOSAKI Motohiro
@ 2011-05-31  7:50           ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-31  7:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg



----- Original Message -----
> >> - If you run the same program as root, non root process and
> >> privilege
> >> explicit
> >> dropping processes (e.g. irqbalance) will be killed at first.
> > Hmm, at least there were some programs were root processes but were
> > killed
> > first.
> > [ pid] ppid uid total_vm rss swap score_adj name
> > [ 5720] 5353 0 24421 257 0 0 sshd
> > [ 5353] 1 0 15998 189 0 0 sshd
> > [ 5451] 1 0 19648 235 0 0 master
> > [ 1626] 1 0 2287 129 0 0 dhclient
> 
> Hi
> 
> I can't reproduce this either. Are you sure these processes have full
> root privileges?
> I've written a new debugging patch. After applying the patch below, do
> these processes show
> cap=1?
No, all of them had cap=0. I wonder why something like sshd hasn't been
given cap=1 to avoid an early oom kill.
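
For context, a daemon that drops privileges at startup typically does something
like this minimal sketch (assuming libcap-ng, which comes up later in this
thread); after it runs, CapEff reads all zeroes even though the euid is still 0:

#include <stdio.h>
#include <cap-ng.h>	/* link with -lcap-ng */

static void drop_all_capabilities(void)
{
	/* Clear every capability set, including the bounding set... */
	capng_clear(CAPNG_SELECT_BOTH);
	/* ...and apply the change to the current task. */
	if (capng_apply(CAPNG_SELECT_BOTH) < 0)
		perror("capng_apply");
}
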
> 
> 
> 
> index f0e34d4..fe788df 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -429,7 +429,7 @@ static void dump_tasks(const struct mem_cgroup
> *mem, const nodemask_t *no
> struct task_struct *p;
> struct task_struct *task;
> 
> - pr_info("[ pid] ppid uid total_vm rss swap score_adj name\n");
> + pr_info("[ pid] ppid uid cap total_vm rss swap score_adj name\n");
> for_each_process(p) {
> if (oom_unkillable_task(p, mem, nodemask))
> continue;
> @@ -444,9 +444,9 @@ static void dump_tasks(const struct mem_cgroup
> *mem, const nodemask_t *no
> continue;
> }
> 
> - pr_info("[%6d] %6d %5d %8lu %8lu %8lu %9d %s\n",
> + pr_info("[%6d] %6d %5d %3d %8lu %8lu %8lu %9d %s\n",
> task_tgid_nr(task), task_tgid_nr(task->real_parent),
> - task_uid(task),
> + task_uid(task), has_capability_noaudit(task, CAP_SYS_ADMIN),
> task->mm->total_vm,
> get_mm_rss(task->mm) + task->mm->nr_ptes,
> get_mm_counter(task->mm, MM_SWAPENTS),

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  7:50           ` CAI Qian
@ 2011-05-31  7:56             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  7:56 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/31 16:50), CAI Qian wrote:
> 
> 
> ----- Original Message -----
>>>> - If you run the same program as root, non root process and
>>>> privilege
>>>> explicit
>>>> dropping processes (e.g. irqbalance) will be killed at first.
>>> Hmm, at least there were some programs were root processes but were
>>> killed
>>> first.
>>> [ pid] ppid uid total_vm rss swap score_adj name
>>> [ 5720] 5353 0 24421 257 0 0 sshd
>>> [ 5353] 1 0 15998 189 0 0 sshd
>>> [ 5451] 1 0 19648 235 0 0 master
>>> [ 1626] 1 0 2287 129 0 0 dhclient
>>
>> Hi
>>
>> I can't reproduce this either. Are you sure these processes have full
>> root privileges?
>> I've written a new debugging patch. After applying the patch below, do
>> these processes show
>> cap=1?
> No, all of them had cap=0. I wonder why something like sshd hasn't been
> given cap=1 to avoid an early oom kill.

Then I believe your distro applies a distro-specific patch to ssh.
Which distro are you using now?

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  7:56             ` KOSAKI Motohiro
@ 2011-05-31  7:59               ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-05-31  7:59 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg



----- Original Message -----
> (2011/05/31 16:50), CAI Qian wrote:
> >
> >
> > ----- Original Message -----
> >>>> - If you run the same program as root, non root process and
> >>>> privilege
> >>>> explicit
> >>>> dropping processes (e.g. irqbalance) will be killed at first.
> >>> Hmm, at least there were some programs were root processes but
> >>> were
> >>> killed
> >>> first.
> >>> [ pid] ppid uid total_vm rss swap score_adj name
> >>> [ 5720] 5353 0 24421 257 0 0 sshd
> >>> [ 5353] 1 0 15998 189 0 0 sshd
> >>> [ 5451] 1 0 19648 235 0 0 master
> >>> [ 1626] 1 0 2287 129 0 0 dhclient
> >>
> >> Hi
> >>
> >> I can't reproduce this either. Are you sure these processes have
> >> full
> >> root privileges?
> >> I've written a new debugging patch. After applying the patch below, do
> >> these processes show
> >> cap=1?
> > No, all of them had cap=0. I wonder why something like sshd hasn't
> > been
> > given cap=1 to avoid an early oom kill.
> 
> Then I believe your distro applies a distro-specific patch to ssh.
> Which distro are you using now?
It is a Fedora-like distro.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  7:59               ` CAI Qian
@ 2011-05-31  8:11                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31  8:11 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

>> Then I believe your distro applies a distro-specific patch to ssh.
>> Which distro are you using now?
> It is a Fedora-like distro.

Hmm.
Actually, I'm using Fedora 14 and I don't see this phenomenon.
I'll try upgrading to Fedora 15 in the near future.

Thanks.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31  8:11                 ` KOSAKI Motohiro
@ 2011-05-31 10:01                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-05-31 10:01 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg

(2011/05/31 17:11), KOSAKI Motohiro wrote:
>>> Then I believe your distro applies a distro-specific patch to ssh.
>>> Which distro are you using now?
>> It is a Fedora-like distro.

So, does this make sense?



From e47fedaa546499fa3d4196753194db0609cfa2e5 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 31 May 2011 18:28:30 +0900
Subject: [PATCH] oom: use euid instead of CAP_SYS_ADMIN for protecting root processes

Recently, many userland daemons prefer to use libcap-ng and drop
all privileges just after startup, because (1) most privileges
are necessary only for opening special files, not for reading and
writing them, and (2) in general, dropping privileges brings better
protection from exploits when bugs are found in the daemon.

But it makes the oom-killer behave suboptimally. CAI Qian reported
that the oom killer killed some important daemons first on his
Fedora-like distro, because they had dropped CAP_SYS_ADMIN.

Of course, we recommend dropping privileges as far as possible
instead of keeping them, so the oom killer shouldn't check any
capability. Checking one implicitly rewards the wrong programming
style.

This patch changes the root process check from CAP_SYS_ADMIN to
just euid==0.

Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 59eda6e..4e1e8a5 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -203,7 +203,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 * Root processes get 3% bonus, just like the __vm_enough_memory()
 	 * implementation used by LSMs.
 	 */
-	if (protect_root && has_capability_noaudit(p, CAP_SYS_ADMIN)) {
+	if (protect_root && (task_euid(p) == 0)) {
 		if (points >= totalpages / 32)
 			points -= totalpages / 32;
 		else
@@ -429,7 +429,7 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 	struct task_struct *p;
 	struct task_struct *task;

-	pr_info("[   pid]   ppid   uid cap total_vm      rss     swap score_adj name\n");
+	pr_info("[   pid]   ppid   uid  euid total_vm      rss     swap score_adj name\n");
 	for_each_process(p) {
 		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
@@ -444,9 +444,9 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 			continue;
 		}

-		pr_info("[%6d] %6d %5d %3d %8lu %8lu %8lu %9d %s\n",
+		pr_info("[%6d] %6d %5d %5d %8lu %8lu %8lu %9d %s\n",
 			task_tgid_nr(task), task_tgid_nr(task->real_parent),
-			task_uid(task),	has_capability_noaudit(task, CAP_SYS_ADMIN),
+			task_uid(task),	task_euid(task),
 			task->mm->total_vm,
 			get_mm_rss(task->mm) + task->mm->nr_ptes,
 			get_mm_counter(task->mm, MM_SWAPENTS),
-- 
1.7.3.1

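For scale, the euid bonus above (points -= totalpages / 32, i.e. ~3%) works out
as follows on a 16GB machine like the one in the original report (a
back-of-the-envelope sketch assuming 4KiB pages; not part of the patch):

	unsigned long long totalpages = (16ULL << 30) / 4096;	/* 4194304 pages */
	unsigned long long bonus = totalpages / 32;		/* 131072 pages == 512MiB */

which squares with the ~500MB "OOM immune" headroom for root-owned fork bombs
discussed earlier in this thread.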

^ permalink raw reply related	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31 10:01                   ` KOSAKI Motohiro
@ 2011-06-01  1:17                     ` CAI Qian
  -1 siblings, 0 replies; 118+ messages in thread
From: CAI Qian @ 2011-06-01  1:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, linux-kernel, akpm, rientjes, hughd, kamezawa.hiroyu,
	minchan.kim, oleg



----- Original Message -----
> (2011/05/31 17:11), KOSAKI Motohiro wrote:
> >>> Then I believe your distro applies a distro-specific patch to ssh.
> >>> Which distro are you using now?
> >> It is a Fedora-like distro.
> 
> So, does this make sense?
Looks like it; at least now sshd survives the oom killer.
> 
> 
> 
> From e47fedaa546499fa3d4196753194db0609cfa2e5 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Tue, 31 May 2011 18:28:30 +0900
> Subject: [PATCH] oom: use euid instead of CAP_SYS_ADMIN for protecting
> root processes
> 
> Recently, many userland daemons prefer to use libcap-ng and drop
> all privileges just after startup, because (1) most privileges
> are necessary only for opening special files, not for reading and
> writing them, and (2) in general, dropping privileges brings better
> protection from exploits when bugs are found in the daemon.
> 
> But it makes the oom-killer behave suboptimally. CAI Qian reported
> that the oom killer killed some important daemons first on his
> Fedora-like distro, because they had dropped CAP_SYS_ADMIN.
> 
> Of course, we recommend dropping privileges as far as possible
> instead of keeping them, so the oom killer shouldn't check any
> capability. Checking one implicitly rewards the wrong programming
> style.
> 
> This patch changes the root process check from CAP_SYS_ADMIN to
> just euid==0.
> 
> Reported-by: CAI Qian <caiqian@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
> mm/oom_kill.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 59eda6e..4e1e8a5 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -203,7 +203,7 @@ unsigned long oom_badness(struct task_struct *p,
> struct mem_cgroup *mem,
> * Root processes get 3% bonus, just like the __vm_enough_memory()
> * implementation used by LSMs.
> */
> - if (protect_root && has_capability_noaudit(p, CAP_SYS_ADMIN)) {
> + if (protect_root && (task_euid(p) == 0)) {
> if (points >= totalpages / 32)
> points -= totalpages / 32;
> else
> @@ -429,7 +429,7 @@ static void dump_tasks(const struct mem_cgroup
> *mem, const nodemask_t *nodemask)
> struct task_struct *p;
> struct task_struct *task;
> 
> - pr_info("[ pid] ppid uid cap total_vm rss swap score_adj name\n");
> + pr_info("[ pid] ppid uid euid total_vm rss swap score_adj name\n");
> for_each_process(p) {
> if (oom_unkillable_task(p, mem, nodemask))
> continue;
> @@ -444,9 +444,9 @@ static void dump_tasks(const struct mem_cgroup
> *mem, const nodemask_t *nodemask)
> continue;
> }
> 
> - pr_info("[%6d] %6d %5d %3d %8lu %8lu %8lu %9d %s\n",
> + pr_info("[%6d] %6d %5d %5d %8lu %8lu %8lu %9d %s\n",
> task_tgid_nr(task), task_tgid_nr(task->real_parent),
> - task_uid(task), has_capability_noaudit(task, CAP_SYS_ADMIN),
> + task_uid(task), task_euid(task),
> task->mm->total_vm,
> get_mm_rss(task->mm) + task->mm->nr_ptes,
> get_mm_counter(task->mm, MM_SWAPENTS),
> --
> 1.7.3.1

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-05-31 10:01                   ` KOSAKI Motohiro
@ 2011-06-01  3:32                     ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-06-01  3:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: caiqian, linux-mm, linux-kernel, akpm, rientjes, hughd,
	kamezawa.hiroyu, oleg

Hi KOSAKI,

On Tue, May 31, 2011 at 07:01:08PM +0900, KOSAKI Motohiro wrote:
> (2011/05/31 17:11), KOSAKI Motohiro wrote:
> >>> Then I believe your distro applies a distro-specific patch to ssh.
> >>> Which distro are you using now?
> >> It is a Fedora-like distro.
> 
> So, does this make sense?
> 
> 
> 
> From e47fedaa546499fa3d4196753194db0609cfa2e5 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Tue, 31 May 2011 18:28:30 +0900
> Subject: [PATCH] oom: use euid instead of CAP_SYS_ADMIN for protecting root processes
> 
> Recently, many userland daemons prefer to use libcap-ng and drop
> all privileges just after startup, because (1) most privileges
> are necessary only for opening special files, not for reading and
> writing them, and (2) in general, dropping privileges brings better
> protection from exploits when bugs are found in the daemon.
> 
> But it makes the oom-killer behave suboptimally. CAI Qian reported
> that the oom killer killed some important daemons first on his
> Fedora-like distro, because they had dropped CAP_SYS_ADMIN.
> 
> Of course, we recommend dropping privileges as far as possible
> instead of keeping them, so the oom killer shouldn't check any
> capability. Checking one implicitly rewards the wrong programming
> style.
> 
> This patch changes the root process check from CAP_SYS_ADMIN to
> just euid==0.

I like this, but I have some comments.
First, it doesn't depend on your series, so I think it could
be merged first.
Before that, I would like to clarify my concern.
Looking at the comment below, does the 3% bonus depend on __vm_enough_memory's logic?
If it doesn't, we can remove the comment; that would be another patch.
If it does, could we change __vm_enough_memory to use euid instead of the cap, too?

        * Root processes get 3% bonus, just like the __vm_enough_memory()
	* implementation used by LSMs.
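
For reference, the allocator-side logic that comment refers to looked roughly
like this in mm/mmap.c of this era (a paraphrased sketch, not a verbatim copy):

	/* in __vm_enough_memory(..., int cap_sys_admin): */
	allowed = (totalram_pages - hugetlb_total_pages())
		* sysctl_overcommit_ratio / 100;
	/* Leave the last 3% for root */
	if (!cap_sys_admin)
		allowed -= allowed / 32;

i.e. the allocation-side 3% reserve is also keyed on CAP_SYS_ADMIN, via the
cap_sys_admin flag its callers pass in.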

-- 
Kind regards
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-06-01  3:32                     ` Minchan Kim
@ 2011-06-06  3:07                       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 118+ messages in thread
From: KOSAKI Motohiro @ 2011-06-06  3:07 UTC (permalink / raw)
  To: minchan.kim
  Cc: caiqian, linux-mm, linux-kernel, akpm, rientjes, hughd,
	kamezawa.hiroyu, oleg

>> Of course, we recommend dropping privileges as far as possible
>> instead of keeping them, so the oom killer shouldn't check any
>> capability. Checking one implicitly rewards the wrong programming
>> style.
>>
>> This patch changes the root process check from CAP_SYS_ADMIN to
>> just euid==0.
> 
> I like this, but I have some comments.
> First, it doesn't depend on your series, so I think it could
> be merged first.

I agree.

> Before that, I would like to clarify my concern.
> Looking at the comment below, does the 3% bonus depend on __vm_enough_memory's logic?

No, completely independent.

vm_enough_memory() checks whether the task _can_ allocate more memory; IOW, the task
is the subject. The oom-killer checks whether the task should be protected from the
oom-killer; IOW, the task is the object.


> If it doesn't, we can remove the comment; that would be another patch.
> If it does, could we change __vm_enough_memory to use euid instead of the cap, too?
> 
>         * Root processes get 3% bonus, just like the __vm_enough_memory()
> 	* implementation used by LSMs.

vm_enough_memory() is completely correct. I don't see any reason to change it.



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory  (aka CAI founded issue)
  2011-06-06  3:07                       ` KOSAKI Motohiro
@ 2011-06-06 14:44                         ` Minchan Kim
  -1 siblings, 0 replies; 118+ messages in thread
From: Minchan Kim @ 2011-06-06 14:44 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: caiqian, linux-mm, linux-kernel, akpm, rientjes, hughd,
	kamezawa.hiroyu, oleg

On Mon, Jun 06, 2011 at 12:07:15PM +0900, KOSAKI Motohiro wrote:
> >> Of course, we recommend dropping privileges as far as possible
> >> instead of keeping them, so the oom killer shouldn't check any
> >> capability. Checking one implicitly rewards the wrong programming
> >> style.
> >>
> >> This patch changes the root process check from CAP_SYS_ADMIN to
> >> just euid==0.
> > 
> > I like this, but I have some comments.
> > First, it doesn't depend on your series, so I think it could
> > be merged first.
> 
> I agree.
> 
> > Before that, I would like to clarify my concern.
> > Looking at the comment below, does the 3% bonus depend on __vm_enough_memory's logic?
> 
> No, completely independent.
> 
> vm_enough_memory() checks whether the task _can_ allocate more memory; IOW, the task
> is the subject. The oom-killer checks whether the task should be protected from the
> oom-killer; IOW, the task is the object.
> 

Hmm, maybe I don't understand your point.
My thought was as below.

Assumption)
1. The root allocation bonus is 10%.
2. OOM gives no bonus to root processes.

Scenario)
1.
The system has 101 free pages and 10 normal tasks.
Ideally, the 10 tasks allocate the free memory fairly, so each task will have 10 pages.
So the OOM killer can select a victim fairly when a new task which requires 10 pages forks.

2.
The system has 101 free pages and 10 tasks (9 normal tasks, 1 root task).
10 * 9 + 11 pages will be consumed, so each normal task will have 10 pages
but the root task will have 11 pages.
So the OOM killer will always select the root process as the victim (we assumed
OOM doesn't give a bonus to root processes).

Conclusion)
To solve the above problem, we have to give the bonus that was given
at allocation time to the OOM score, too. That's fair.
So I think there is a dependency.
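
To put the same arithmetic in code (a toy sketch; the 10% bonus is the
hypothetical figure from the assumption above, not the kernel's actual 3%):

	unsigned long pages_root = 11, pages_normal = 10;

	/* With no OOM-side bonus, the root task always has the top score. */
	/* Mirroring the 10% allocation-side bonus when scoring: */
	unsigned long root_score = pages_root - pages_root / 10;  /* 11 - 1 == 10 */

	/* root_score == pages_normal, so root is no longer the automatic victim. */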

-- 
Kind regards
Minchan Kim

^ permalink raw reply	[flat|nested] 118+ messages in thread

end of thread, other threads:[~2011-06-06 14:45 UTC | newest]

Thread overview: 118+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-20  8:00 [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory (aka CAI founded issue) KOSAKI Motohiro
2011-05-20  8:01 ` [PATCH 1/5] oom: improve dump_tasks() show items KOSAKI Motohiro
2011-05-23 22:16   ` David Rientjes
2011-05-20  8:02 ` [PATCH 2/5] oom: kill younger process first KOSAKI Motohiro
2011-05-23  2:37   ` Minchan Kim
2011-05-23 22:20   ` David Rientjes
2011-05-20  8:03 ` [PATCH 3/5] oom: oom-killer don't use proportion of system-ram internally KOSAKI Motohiro
2011-05-23  3:59   ` Minchan Kim
2011-05-24  1:14     ` KOSAKI Motohiro
2011-05-24  1:32       ` Minchan Kim
2011-05-23  4:02   ` Minchan Kim
2011-05-24  1:44     ` KOSAKI Motohiro
2011-05-24  3:11       ` KOSAKI Motohiro
2011-05-23 22:28   ` David Rientjes
2011-05-23 22:48     ` David Rientjes
2011-05-24  1:21       ` KOSAKI Motohiro
2011-05-24  8:32       ` CAI Qian
2011-05-26  7:08       ` CAI Qian
2011-05-27 19:12         ` David Rientjes
2011-05-24  2:07     ` KOSAKI Motohiro
2011-05-26  9:34   ` CAI Qian
2011-05-26  9:56     ` KOSAKI Motohiro
2011-05-20  8:04 ` [PATCH 4/5] oom: don't kill random process KOSAKI Motohiro
2011-05-23  4:31   ` Minchan Kim
2011-05-24  1:53     ` KOSAKI Motohiro
2011-05-24  8:46       ` Minchan Kim
2011-05-24  8:49         ` KOSAKI Motohiro
2011-05-24  9:04           ` Minchan Kim
2011-05-24  9:09             ` KOSAKI Motohiro
2011-05-24  9:20               ` Minchan Kim
2011-05-24  9:38                 ` KOSAKI Motohiro
2011-05-23 22:32   ` David Rientjes
2011-05-24  1:35     ` KOSAKI Motohiro
2011-05-24  1:39       ` David Rientjes
2011-05-24  1:55         ` KOSAKI Motohiro
2011-05-24  1:58           ` David Rientjes
2011-05-24  2:03             ` KOSAKI Motohiro
2011-05-25 23:50               ` David Rientjes
2011-05-30  1:17                 ` KOSAKI Motohiro
2011-05-31  4:48                   ` David Rientjes
2011-05-31  4:54                     ` KOSAKI Motohiro
2011-05-20  8:05 ` [PATCH 5/5] oom: merge oom_kill_process() with oom_kill_task() KOSAKI Motohiro
2011-05-31  1:33 ` [PATCH v2 0/5] Fix oom killer doesn't work at all if system have > gigabytes memory (aka CAI founded issue) CAI Qian
2011-05-31  4:10   ` KOSAKI Motohiro
2011-05-31  4:14     ` CAI Qian
2011-05-31  4:34       ` KOSAKI Motohiro
2011-05-31  4:49       ` KOSAKI Motohiro
2011-05-31  4:32     ` KOSAKI Motohiro
2011-05-31  4:52     ` CAI Qian
2011-05-31  7:04       ` KOSAKI Motohiro
2011-05-31  7:50         ` CAI Qian
2011-05-31  7:56           ` KOSAKI Motohiro
2011-05-31  7:59             ` CAI Qian
2011-05-31  8:11               ` KOSAKI Motohiro
2011-05-31 10:01                 ` KOSAKI Motohiro
2011-06-01  1:17                   ` CAI Qian
2011-06-01  3:32                   ` Minchan Kim
2011-06-06  3:07                     ` KOSAKI Motohiro
2011-06-06 14:44                       ` Minchan Kim
