* [patch] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
@ 2011-04-13 18:33 David Rientjes
  2011-04-14  0:03 ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2011-04-13 18:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Izik Eidus, KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-mm

There's a kernel-wide shortage of per-process flags, so it's always 
helpful to trim one when possible without incurring a significant 
penalty.  It's even more important when you're planning on adding a per-
process flag yourself, which I plan to do shortly for transparent 
hugepages.

PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a 
tendency to allocate large amounts of memory and should be preferred for 
killing over other tasks.  We'd rather immediately kill the task making 
the errant syscall rather than penalizing an innocent task.

This patch removes PF_OOM_ORIGIN since its behavior is equivalent to 
setting the process's oom_score_adj to OOM_SCORE_ADJ_MIN.

The process's old oom_score_adj is stored and then set to 
OOM_SCORE_ADJ_MIN during the time it used to have PF_OOM_ORIGIN.  The old 
value is then reinstated when the process should no longer be considered 
a high priority for oom killing.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/oom.h   |    2 ++
 include/linux/sched.h |    1 -
 mm/ksm.c              |    7 +++++--
 mm/oom_kill.c         |   28 +++++++++++++++++++---------
 mm/swapfile.c         |    6 ++++--
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -40,6 +40,8 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
+extern int test_set_oom_score_adj(int new_val);
+
 extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
diff --git a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1738,7 +1738,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
-#define PF_OOM_ORIGIN	0x00080000	/* Allocating much memory to others */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
diff --git a/mm/ksm.c b/mm/ksm.c
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -35,6 +35,7 @@
 #include <linux/ksm.h>
 #include <linux/hash.h>
 #include <linux/freezer.h>
+#include <linux/oom.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -1894,9 +1895,11 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
 	if (ksm_run != flags) {
 		ksm_run = flags;
 		if (flags & KSM_RUN_UNMERGE) {
-			current->flags |= PF_OOM_ORIGIN;
+			int oom_score_adj;
+
+			oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MIN);
 			err = unmerge_and_remove_all_rmap_items();
-			current->flags &= ~PF_OOM_ORIGIN;
+			test_set_oom_score_adj(oom_score_adj);
 			if (err) {
 				ksm_run = KSM_RUN_STOP;
 				count = err;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -38,6 +38,25 @@ int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
 
+int test_set_oom_score_adj(int new_val)
+{
+	struct sighand_struct *sighand = current->sighand;
+	int old_val;
+
+	spin_lock(&sighand->siglock);
+	old_val = current->signal->oom_score_adj;
+	if (new_val != old_val) {
+		if (new_val == OOM_SCORE_ADJ_MIN)
+			atomic_inc(&current->mm->oom_disable_count);
+		else if (old_val == OOM_SCORE_ADJ_MIN)
+			atomic_dec(&current->mm->oom_disable_count);
+		current->signal->oom_score_adj = new_val;
+	}
+	spin_unlock(&sighand->siglock);
+
+	return old_val;
+}
+
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
@@ -173,15 +192,6 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	}
 
 	/*
-	 * When the PF_OOM_ORIGIN bit is set, it indicates the task should have
-	 * priority for oom killing.
-	 */
-	if (p->flags & PF_OOM_ORIGIN) {
-		task_unlock(p);
-		return 1000;
-	}
-
-	/*
 	 * The memory controller may have a limit of 0 bytes, so avoid a divide
 	 * by zero, if necessary.
 	 */
diff --git a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -31,6 +31,7 @@
 #include <linux/syscalls.h>
 #include <linux/memcontrol.h>
 #include <linux/poll.h>
+#include <linux/oom.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -1555,6 +1556,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct address_space *mapping;
 	struct inode *inode;
 	char *pathname;
+	int oom_score_adj;
 	int i, type, prev;
 	int err;
 
@@ -1613,9 +1615,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&swap_lock);
 
-	current->flags |= PF_OOM_ORIGIN;
+	oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MIN);
 	err = try_to_unuse(type);
-	current->flags &= ~PF_OOM_ORIGIN;
+	test_set_oom_score_adj(oom_score_adj);
 
 	if (err) {
 		/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-13 18:33 [patch] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj David Rientjes
@ 2011-04-14  0:03 ` KOSAKI Motohiro
  2011-04-14  0:41   ` [patch v2] " David Rientjes
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2011-04-14  0:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andrew Morton, Hugh Dickins, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm

> There's a kernel-wide shortage of per-process flags, so it's always 
> helpful to trim one when possible without incurring a significant 
> penalty.  It's even more important when you're planning on adding a per-
> process flag yourself, which I plan to do shortly for transparent 
> hugepages.
> 
> PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a 
> tendency to allocate large amounts of memory and should be preferred for 
> killing over other tasks.  We'd rather immediately kill the task making 
> the errant syscall rather than penalizing an innocent task.
> 
> This patch removes PF_OOM_ORIGIN since its behavior is equivalent to 
> setting the process's oom_score_adj to OOM_SCORE_ADJ_MIN.

s/OOM_SCORE_ADJ_MIN/OOM_SCORE_ADJ_MAX/ ?

OOM_SCORE_ADJ_MIN == -1000, so
	points += OOM_SCORE_ADJ_MIN
yields a very small value (usually 1).
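
The arithmetic behind this objection can be sketched in a few lines of C.
This is a simplified userspace model, not the kernel's oom_badness(): the
helper name sketch_badness() and its base_points parameter (the task's
memory share on a 0..1000 scale) are assumptions made for illustration.
Only the final step, adding the adjustment and clamping to 1..1000, is
modelled.

```c
/* -1000 as stated in the thread; +1000 is its counterpart. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/*
 * Hypothetical helper: add the adjustment to a base score and clamp the
 * result to 1..1000.  Everything before this step is left out.
 */
unsigned int sketch_badness(long base_points, int oom_score_adj)
{
	long points = base_points + oom_score_adj;

	if (points <= 0)
		return 1;	/* lowest score in this model */
	return points < 1000 ? (unsigned int)points : 1000;
}
```

Under these assumptions, sketch_badness(300, OOM_SCORE_ADJ_MIN) gives 1,
whereas sketch_badness(300, OOM_SCORE_ADJ_MAX) gives 1000, which is the
value the removed PF_OOM_ORIGIN check returned.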





* [patch v2] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  0:03 ` KOSAKI Motohiro
@ 2011-04-14  0:41   ` David Rientjes
  2011-04-14  0:46     ` KOSAKI Motohiro
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Rientjes @ 2011-04-14  0:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Hugh Dickins, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm

There's a kernel-wide shortage of per-process flags, so it's always 
helpful to trim one when possible without incurring a significant 
penalty.  It's even more important when you're planning on adding a per-
process flag yourself, which I plan to do shortly for transparent 
hugepages.

PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a 
tendency to allocate large amounts of memory and should be preferred for 
killing over other tasks.  We'd rather immediately kill the task making 
the errant syscall rather than penalizing an innocent task.

This patch removes PF_OOM_ORIGIN since its behavior is equivalent to 
setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.

The process's old oom_score_adj is stored and then set to 
OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old 
value is then reinstated when the process should no longer be considered 
a high priority for oom killing.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 v2: s/OOM_SCORE_ADJ_MIN/OOM_SCORE_ADJ_MAX/ as pointed out by
     KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

 include/linux/oom.h   |    2 ++
 include/linux/sched.h |    1 -
 mm/ksm.c              |    7 +++++--
 mm/oom_kill.c         |   28 +++++++++++++++++++---------
 mm/swapfile.c         |    6 ++++--
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -40,6 +40,8 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
+extern int test_set_oom_score_adj(int new_val);
+
 extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
diff --git a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1738,7 +1738,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
-#define PF_OOM_ORIGIN	0x00080000	/* Allocating much memory to others */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
diff --git a/mm/ksm.c b/mm/ksm.c
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -35,6 +35,7 @@
 #include <linux/ksm.h>
 #include <linux/hash.h>
 #include <linux/freezer.h>
+#include <linux/oom.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -1894,9 +1895,11 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
 	if (ksm_run != flags) {
 		ksm_run = flags;
 		if (flags & KSM_RUN_UNMERGE) {
-			current->flags |= PF_OOM_ORIGIN;
+			int oom_score_adj;
+
+			oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
 			err = unmerge_and_remove_all_rmap_items();
-			current->flags &= ~PF_OOM_ORIGIN;
+			test_set_oom_score_adj(oom_score_adj);
 			if (err) {
 				ksm_run = KSM_RUN_STOP;
 				count = err;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -38,6 +38,25 @@ int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
 
+int test_set_oom_score_adj(int new_val)
+{
+	struct sighand_struct *sighand = current->sighand;
+	int old_val;
+
+	spin_lock(&sighand->siglock);
+	old_val = current->signal->oom_score_adj;
+	if (new_val != old_val) {
+		if (new_val == OOM_SCORE_ADJ_MIN)
+			atomic_inc(&current->mm->oom_disable_count);
+		else if (old_val == OOM_SCORE_ADJ_MIN)
+			atomic_dec(&current->mm->oom_disable_count);
+		current->signal->oom_score_adj = new_val;
+	}
+	spin_unlock(&sighand->siglock);
+
+	return old_val;
+}
+
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
@@ -173,15 +192,6 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	}
 
 	/*
-	 * When the PF_OOM_ORIGIN bit is set, it indicates the task should have
-	 * priority for oom killing.
-	 */
-	if (p->flags & PF_OOM_ORIGIN) {
-		task_unlock(p);
-		return 1000;
-	}
-
-	/*
 	 * The memory controller may have a limit of 0 bytes, so avoid a divide
 	 * by zero, if necessary.
 	 */
diff --git a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -31,6 +31,7 @@
 #include <linux/syscalls.h>
 #include <linux/memcontrol.h>
 #include <linux/poll.h>
+#include <linux/oom.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -1555,6 +1556,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct address_space *mapping;
 	struct inode *inode;
 	char *pathname;
+	int oom_score_adj;
 	int i, type, prev;
 	int err;
 
@@ -1613,9 +1615,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&swap_lock);
 
-	current->flags |= PF_OOM_ORIGIN;
+	oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
 	err = try_to_unuse(type);
-	current->flags &= ~PF_OOM_ORIGIN;
+	test_set_oom_score_adj(oom_score_adj);
 
 	if (err) {
 		/*


* Re: [patch v2] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  0:41   ` [patch v2] " David Rientjes
@ 2011-04-14  0:46     ` KOSAKI Motohiro
  2011-04-14  1:09     ` Minchan Kim
  2011-04-14 20:18     ` [patch v3] " David Rientjes
  2 siblings, 0 replies; 12+ messages in thread
From: KOSAKI Motohiro @ 2011-04-14  0:46 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Andrew Morton, Hugh Dickins, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm

> There's a kernel-wide shortage of per-process flags, so it's always 
> helpful to trim one when possible without incurring a significant 
> penalty.  It's even more important when you're planning on adding a per-
> process flag yourself, which I plan to do shortly for transparent 
> hugepages.
> 
> PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a 
> tendency to allocate large amounts of memory and should be preferred for 
> killing over other tasks.  We'd rather immediately kill the task making 
> the errant syscall rather than penalizing an innocent task.
> 
> This patch removes PF_OOM_ORIGIN since its behavior is equivalent to 
> setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
> 
> The process's old oom_score_adj is stored and then set to 
> OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old 
> value is then reinstated when the process should no longer be considered 
> a high priority for oom killing.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  v2: s/OOM_SCORE_ADJ_MIN/OOM_SCORE_ADJ_MAX/ as pointed out by
>      KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Good patch.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>





* Re: [patch v2] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  0:41   ` [patch v2] " David Rientjes
  2011-04-14  0:46     ` KOSAKI Motohiro
@ 2011-04-14  1:09     ` Minchan Kim
  2011-04-14  1:12       ` David Rientjes
  2011-04-14 20:18     ` [patch v3] " David Rientjes
  2 siblings, 1 reply; 12+ messages in thread
From: Minchan Kim @ 2011-04-14  1:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, Hugh Dickins, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm

On Thu, Apr 14, 2011 at 9:41 AM, David Rientjes <rientjes@google.com> wrote:
> There's a kernel-wide shortage of per-process flags, so it's always
> helpful to trim one when possible without incurring a significant
> penalty.  It's even more important when you're planning on adding a per-
> process flag yourself, which I plan to do shortly for transparent
> hugepages.
>
> PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a
> tendency to allocate large amounts of memory and should be preferred for
> killing over other tasks.  We'd rather immediately kill the task making
> the errant syscall rather than penalizing an innocent task.
>
> This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
> setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
>
> The process's old oom_score_adj is stored and then set to
> OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old
> value is then reinstated when the process should no longer be considered
> a high priority for oom killing.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

This seems reasonable, and the code doesn't have a problem.
But couldn't we make the function more general (e.g., pass in a task_struct)
and use it wherever we change oom_score_adj (e.g., oom_score_adj_write)?

-- 
Kind regards,
Minchan Kim


* Re: [patch v2] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  1:09     ` Minchan Kim
@ 2011-04-14  1:12       ` David Rientjes
  2011-04-14  1:21         ` Minchan Kim
  0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2011-04-14  1:12 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, Hugh Dickins, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm

On Thu, 14 Apr 2011, Minchan Kim wrote:

> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 

Thanks!

> This seems reasonable, and the code doesn't have a problem.
> But couldn't we make the function more general (e.g., pass in a task_struct)
> and use it wherever we change oom_score_adj (e.g., oom_score_adj_write)?
> 

I thought about doing that, but oom_score_adj_write doesn't operate on 
current, so it needs to lock p->sighand differently and also does a test 
to ensure that the new value is only less than the current value for 
CAP_SYS_RESOURCE.  That test is required to take place under the lock as 
well.
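
The constraint described above, that the permission test has to happen
under the same lock as the update, can be illustrated with a small
userspace model.  This is only a sketch: struct sketch_signal,
sketch_adj_write(), the privileged flag (standing in for a
CAP_SYS_RESOURCE check), the pthread mutex (standing in for siglock) and
the -EACCES return value are all assumptions, not the code in
oom_score_adj_write().

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-process state guarded by siglock. */
struct sketch_signal {
	pthread_mutex_t lock;
	int oom_score_adj;
};

/*
 * Lowering the value requires privilege; raising it does not.  The
 * comparison reads oom_score_adj, so it is made while the lock is held
 * to keep it consistent with the store that follows.
 */
int sketch_adj_write(struct sketch_signal *sig, int new_val, bool privileged)
{
	int err = 0;

	pthread_mutex_lock(&sig->lock);
	if (new_val < sig->oom_score_adj && !privileged)
		err = -EACCES;	/* unprivileged attempt to lower the value */
	else
		sig->oom_score_adj = new_val;
	pthread_mutex_unlock(&sig->lock);

	return err;
}
```

In this model a helper that only ever raises current's value and later
restores it has no need for the comparison, which is one way to read the
argument for keeping the two paths separate.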


* Re: [patch v2] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  1:12       ` David Rientjes
@ 2011-04-14  1:21         ` Minchan Kim
  2011-04-14  7:55           ` Matt Fleming
  0 siblings, 1 reply; 12+ messages in thread
From: Minchan Kim @ 2011-04-14  1:21 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, Hugh Dickins, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm

On Thu, Apr 14, 2011 at 10:12 AM, David Rientjes <rientjes@google.com> wrote:
> On Thu, 14 Apr 2011, Minchan Kim wrote:
>
>> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
>>
>
> Thanks!
>
>> This seems reasonable, and the code doesn't have a problem.
>> But couldn't we make the function more general (e.g., pass in a task_struct)
>> and use it wherever we change oom_score_adj (e.g., oom_score_adj_write)?
>>
>
> I thought about doing that, but oom_score_adj_write doesn't operate on
> current, so it needs to lock p->sighand differently and also does a test
> to ensure that the new value is only less than the current value for
> CAP_SYS_RESOURCE.  That test is required to take place under the lock as
> well.
>

Yes. We already have facilities for that (e.g., task_lock, lock_task_sighand).
And I don't think a CAP_SYS_RESOURCE check in a general function is a problem.

Of course, it adds slight unnecessary overhead, but it's not a hot
path.  What's stopping you from going ahead?


-- 
Kind regards,
Minchan Kim


* Re: [patch v2] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  1:21         ` Minchan Kim
@ 2011-04-14  7:55           ` Matt Fleming
  0 siblings, 0 replies; 12+ messages in thread
From: Matt Fleming @ 2011-04-14  7:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Rientjes, Andrew Morton, KOSAKI Motohiro, Hugh Dickins,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm

On Thu, 14 Apr 2011 10:21:56 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
 
> Yes. We already have facilities for that (e.g., task_lock, lock_task_sighand).
> And I don't think a CAP_SYS_RESOURCE check in a general function is a problem.
> 
> Of course, it adds slight unnecessary overhead, but it's not a hot
> path.  What's stopping you from going ahead?

Also, lock_task_sighand() would disable interrupts when acquiring
sighand->siglock, which this patch doesn't do, but should.

-- 
Matt Fleming, Intel Open Source Technology Center


* [patch v3] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14  0:41   ` [patch v2] " David Rientjes
  2011-04-14  0:46     ` KOSAKI Motohiro
  2011-04-14  1:09     ` Minchan Kim
@ 2011-04-14 20:18     ` David Rientjes
  2011-04-15 22:35       ` Hugh Dickins
  2 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2011-04-14 20:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Hugh Dickins, Izik Eidus, KAMEZAWA Hiroyuki,
	Matt Fleming, linux-mm

There's a kernel-wide shortage of per-process flags, so it's always 
helpful to trim one when possible without incurring a significant 
penalty.  It's even more important when you're planning on adding a per-
process flag yourself, which I plan to do shortly for transparent 
hugepages.

PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a 
tendency to allocate large amounts of memory and should be preferred for 
killing over other tasks.  We'd rather immediately kill the task making 
the errant syscall rather than penalizing an innocent task.

This patch removes PF_OOM_ORIGIN since its behavior is equivalent to 
setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.

The process's old oom_score_adj is stored and then set to 
OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old 
value is then reinstated when the process should no longer be considered 
a high priority for oom killing.

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 v3: add comment for test_set_oom_score_adj()
     disable irqs when taking siglock, thanks to Matt Fleming

 include/linux/oom.h   |    2 ++
 include/linux/sched.h |    1 -
 mm/ksm.c              |    7 +++++--
 mm/oom_kill.c         |   36 +++++++++++++++++++++++++++---------
 mm/swapfile.c         |    6 ++++--
 5 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -40,6 +40,8 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
+extern int test_set_oom_score_adj(int new_val);
+
 extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
diff --git a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1738,7 +1738,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
-#define PF_OOM_ORIGIN	0x00080000	/* Allocating much memory to others */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
diff --git a/mm/ksm.c b/mm/ksm.c
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -35,6 +35,7 @@
 #include <linux/ksm.h>
 #include <linux/hash.h>
 #include <linux/freezer.h>
+#include <linux/oom.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -1894,9 +1895,11 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
 	if (ksm_run != flags) {
 		ksm_run = flags;
 		if (flags & KSM_RUN_UNMERGE) {
-			current->flags |= PF_OOM_ORIGIN;
+			int oom_score_adj;
+
+			oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
 			err = unmerge_and_remove_all_rmap_items();
-			current->flags &= ~PF_OOM_ORIGIN;
+			test_set_oom_score_adj(oom_score_adj);
 			if (err) {
 				ksm_run = KSM_RUN_STOP;
 				count = err;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -38,6 +38,33 @@ int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
 
+/**
+ * test_set_oom_score_adj() - set current's oom_score_adj and return old value
+ * @new_val: new oom_score_adj value
+ *
+ * Sets the oom_score_adj value for current to @new_val with proper
+ * synchronization and returns the old value.  Usually used to temporarily
+ * set a value, save the old value in the caller, and then reinstate it later.
+ */
+int test_set_oom_score_adj(int new_val)
+{
+	struct sighand_struct *sighand = current->sighand;
+	int old_val;
+
+	spin_lock_irq(&sighand->siglock);
+	old_val = current->signal->oom_score_adj;
+	if (new_val != old_val) {
+		if (new_val == OOM_SCORE_ADJ_MIN)
+			atomic_inc(&current->mm->oom_disable_count);
+		else if (old_val == OOM_SCORE_ADJ_MIN)
+			atomic_dec(&current->mm->oom_disable_count);
+		current->signal->oom_score_adj = new_val;
+	}
+	spin_unlock_irq(&sighand->siglock);
+
+	return old_val;
+}
+
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
@@ -173,15 +200,6 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	}
 
 	/*
-	 * When the PF_OOM_ORIGIN bit is set, it indicates the task should have
-	 * priority for oom killing.
-	 */
-	if (p->flags & PF_OOM_ORIGIN) {
-		task_unlock(p);
-		return 1000;
-	}
-
-	/*
 	 * The memory controller may have a limit of 0 bytes, so avoid a divide
 	 * by zero, if necessary.
 	 */
diff --git a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -31,6 +31,7 @@
 #include <linux/syscalls.h>
 #include <linux/memcontrol.h>
 #include <linux/poll.h>
+#include <linux/oom.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -1555,6 +1556,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct address_space *mapping;
 	struct inode *inode;
 	char *pathname;
+	int oom_score_adj;
 	int i, type, prev;
 	int err;
 
@@ -1613,9 +1615,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&swap_lock);
 
-	current->flags |= PF_OOM_ORIGIN;
+	oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
 	err = try_to_unuse(type);
-	current->flags &= ~PF_OOM_ORIGIN;
+	test_set_oom_score_adj(oom_score_adj);
 
 	if (err) {
 		/*
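
The save-and-restore behaviour of test_set_oom_score_adj() in the patch
above can be exercised outside the kernel with a rough model.  The sketch
below is an assumption-laden stand-in: struct sketch_task replaces
current->signal and current->mm, a pthread mutex replaces
spin_lock_irq() on siglock, and a plain int replaces the atomic
oom_disable_count.  It only mirrors the bookkeeping, not the kernel's
locking or interrupt semantics.

```c
#include <pthread.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Hypothetical userspace stand-in for current->signal and current->mm. */
struct sketch_task {
	pthread_mutex_t lock;	/* plays the role of sighand->siglock */
	int oom_score_adj;
	int oom_disable_count;	/* adjusted when entering/leaving MIN */
};

/* Swap in new_val under the lock and hand back the previous value. */
int sketch_test_set_adj(struct sketch_task *t, int new_val)
{
	int old_val;

	pthread_mutex_lock(&t->lock);
	old_val = t->oom_score_adj;
	if (new_val != old_val) {
		if (new_val == OOM_SCORE_ADJ_MIN)
			t->oom_disable_count++;
		else if (old_val == OOM_SCORE_ADJ_MIN)
			t->oom_disable_count--;
		t->oom_score_adj = new_val;
	}
	pthread_mutex_unlock(&t->lock);

	return old_val;
}
```

A caller would follow the same shape as the ksm and swapoff hunks: keep
the value returned by the first call, do the memory-hungry work, then
pass the saved value back in a second call.  In this model a task that
starts at OOM_SCORE_ADJ_MIN has its counter decremented for the duration
and incremented again on restore.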


* Re: [patch v3] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-14 20:18     ` [patch v3] " David Rientjes
@ 2011-04-15 22:35       ` Hugh Dickins
  2011-04-15 23:03         ` David Rientjes
  0 siblings, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2011-04-15 22:35 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, KAMEZAWA Hiroyuki, Matt Fleming,
	linux-mm

On Thu, 14 Apr 2011, David Rientjes wrote:

> There's a kernel-wide shortage of per-process flags, so it's always 
> helpful to trim one when possible without incurring a significant 
> penalty.  It's even more important when you're planning on adding a per-
> process flag yourself, which I plan to do shortly for transparent 
> hugepages.
> 
> PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a 
> tendency to allocate large amounts of memory and should be preferred for 
> killing over other tasks.  We'd rather immediately kill the task making 
> the errant syscall rather than penalizing an innocent task.
> 
> This patch removes PF_OOM_ORIGIN since its behavior is equivalent to 
> setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
> 
> The process's old oom_score_adj is stored and then set to 
> OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN.  The old 
> value is then reinstated when the process should no longer be considered 
> a high priority for oom killing.
> 
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: David Rientjes <rientjes@google.com>

Sorry, I'm trailing a long way behind as usual.

This makes good sense (now you're using MAX instead of MIN!),
but may I belatedly ask you to change the name test_set_oom_score_adj()
to replace_oom_score_adj()?  test_set means a bitflag operation to me.

Otherwise,
Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  v3: add comment for test_set_oom_score_adj()
>      disable irqs when taking siglock, thanks to Matt Fleming
> 
>  include/linux/oom.h   |    2 ++
>  include/linux/sched.h |    1 -
>  mm/ksm.c              |    7 +++++--
>  mm/oom_kill.c         |   36 +++++++++++++++++++++++++++---------
>  mm/swapfile.c         |    6 ++++--
>  5 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -40,6 +40,8 @@ enum oom_constraint {
>  	CONSTRAINT_MEMCG,
>  };
>  
> +extern int test_set_oom_score_adj(int new_val);
> +
>  extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  			const nodemask_t *nodemask, unsigned long totalpages);
>  extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1738,7 +1738,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>  #define PF_FROZEN	0x00010000	/* frozen for system suspend */
>  #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
>  #define PF_KSWAPD	0x00040000	/* I am kswapd */
> -#define PF_OOM_ORIGIN	0x00080000	/* Allocating much memory to others */
>  #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
>  #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
> diff --git a/mm/ksm.c b/mm/ksm.c
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -35,6 +35,7 @@
>  #include <linux/ksm.h>
>  #include <linux/hash.h>
>  #include <linux/freezer.h>
> +#include <linux/oom.h>
>  
>  #include <asm/tlbflush.h>
>  #include "internal.h"
> @@ -1894,9 +1895,11 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
>  	if (ksm_run != flags) {
>  		ksm_run = flags;
>  		if (flags & KSM_RUN_UNMERGE) {
> -			current->flags |= PF_OOM_ORIGIN;
> +			int oom_score_adj;
> +
> +			oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
>  			err = unmerge_and_remove_all_rmap_items();
> -			current->flags &= ~PF_OOM_ORIGIN;
> +			test_set_oom_score_adj(oom_score_adj);
>  			if (err) {
>  				ksm_run = KSM_RUN_STOP;
>  				count = err;
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -38,6 +38,33 @@ int sysctl_oom_kill_allocating_task;
>  int sysctl_oom_dump_tasks = 1;
>  static DEFINE_SPINLOCK(zone_scan_lock);
>  
> +/**
> + * test_set_oom_score_adj() - set current's oom_score_adj and return old value
> + * @new_val: new oom_score_adj value
> + *
> + * Sets the oom_score_adj value for current to @new_val with proper
> + * synchronization and returns the old value.  Usually used to temporarily
> + * set a value, save the old value in the caller, and then reinstate it later.
> + */
> +int test_set_oom_score_adj(int new_val)
> +{
> +	struct sighand_struct *sighand = current->sighand;
> +	int old_val;
> +
> +	spin_lock_irq(&sighand->siglock);
> +	old_val = current->signal->oom_score_adj;
> +	if (new_val != old_val) {
> +		if (new_val == OOM_SCORE_ADJ_MIN)
> +			atomic_inc(&current->mm->oom_disable_count);
> +		else if (old_val == OOM_SCORE_ADJ_MIN)
> +			atomic_dec(&current->mm->oom_disable_count);
> +		current->signal->oom_score_adj = new_val;
> +	}
> +	spin_unlock_irq(&sighand->siglock);
> +
> +	return old_val;
> +}
> +
>  #ifdef CONFIG_NUMA
>  /**
>   * has_intersects_mems_allowed() - check task eligiblity for kill
> @@ -173,15 +200,6 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  	}
>  
>  	/*
> -	 * When the PF_OOM_ORIGIN bit is set, it indicates the task should have
> -	 * priority for oom killing.
> -	 */
> -	if (p->flags & PF_OOM_ORIGIN) {
> -		task_unlock(p);
> -		return 1000;
> -	}
> -
> -	/*
>  	 * The memory controller may have a limit of 0 bytes, so avoid a divide
>  	 * by zero, if necessary.
>  	 */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -31,6 +31,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/memcontrol.h>
>  #include <linux/poll.h>
> +#include <linux/oom.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
> @@ -1555,6 +1556,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  	struct address_space *mapping;
>  	struct inode *inode;
>  	char *pathname;
> +	int oom_score_adj;
>  	int i, type, prev;
>  	int err;
>  
> @@ -1613,9 +1615,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  	p->flags &= ~SWP_WRITEOK;
>  	spin_unlock(&swap_lock);
>  
> -	current->flags |= PF_OOM_ORIGIN;
> +	oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
>  	err = try_to_unuse(type);
> -	current->flags &= ~PF_OOM_ORIGIN;
> +	test_set_oom_score_adj(oom_score_adj);
>  
>  	if (err) {
>  		/*
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch v3] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-15 22:35       ` Hugh Dickins
@ 2011-04-15 23:03         ` David Rientjes
  2011-04-16  1:48           ` Hugh Dickins
  0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2011-04-15 23:03 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, KOSAKI Motohiro, KAMEZAWA Hiroyuki, Matt Fleming,
	linux-mm

On Fri, 15 Apr 2011, Hugh Dickins wrote:

> This makes good sense (now you're using MAX instead of MIN!),
> but may I belatedly ask you to change the name test_set_oom_score_adj()
> to replace_oom_score_adj()?  test_set means a bitflag operation to me.
> 

Does replace_oom_score_adj() imply that it will be returning the old value 
of oom_score_adj like test_set_oom_score_adj() does?
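
The reason the return value matters to the callers can be shown with a
minimal userspace sketch. It is not the kernel code: the globals and the
pthread mutex are hypothetical stand-ins for current->signal->oom_score_adj,
current->mm->oom_disable_count and the siglock, and with_raised_oom_score_adj()
and report_adj() are made-up helpers playing the role of the ksm and swapoff
call sites. The function is named replace_oom_score_adj() here only to try
out the proposed name.

```c
#include <assert.h>
#include <pthread.h>

#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

/* Hypothetical userspace stand-ins for the per-process kernel state. */
static int oom_score_adj;
static int oom_disable_count;
static pthread_mutex_t adj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Store new_val unconditionally and return the value it replaced. */
static int replace_oom_score_adj(int new_val)
{
	int old_val;

	pthread_mutex_lock(&adj_lock);
	old_val = oom_score_adj;
	if (new_val != old_val) {
		/* Keep the count of "oom disabled" users in step. */
		if (new_val == OOM_SCORE_ADJ_MIN)
			oom_disable_count++;
		else if (old_val == OOM_SCORE_ADJ_MIN)
			oom_disable_count--;
		oom_score_adj = new_val;
	}
	pthread_mutex_unlock(&adj_lock);

	return old_val;
}

/*
 * Caller pattern: raise to MAX around the memory-hungry work, then put
 * back whatever was there before.  Without the returned old value the
 * caller would have nothing to restore.
 */
static int with_raised_oom_score_adj(int (*work)(void))
{
	int saved, ret;

	saved = replace_oom_score_adj(OOM_SCORE_ADJ_MAX);
	ret = work();
	replace_oom_score_adj(saved);
	/* Single-threaded sketch, so an unlocked read is fine here. */
	assert(oom_score_adj == saved);

	return ret;
}

/* Hypothetical "work" that just reports the value it ran under. */
static int report_adj(void)
{
	return oom_score_adj;
}
```

Calling with_raised_oom_score_adj(report_adj) from a main() returns
OOM_SCORE_ADJ_MAX and leaves oom_score_adj at its earlier value.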


* Re: [patch v3] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj
  2011-04-15 23:03         ` David Rientjes
@ 2011-04-16  1:48           ` Hugh Dickins
  0 siblings, 0 replies; 12+ messages in thread
From: Hugh Dickins @ 2011-04-16  1:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, KAMEZAWA Hiroyuki, Matt Fleming,
	linux-mm

On Fri, Apr 15, 2011 at 4:03 PM, David Rientjes <rientjes@google.com> wrote:
> On Fri, 15 Apr 2011, Hugh Dickins wrote:
>
>> This makes good sense (now you're using MAX instead of MIN!),
>> but may I belatedly ask you to change the name test_set_oom_score_adj()
>> to replace_oom_score_adj()?  test_set means a bitflag operation to me.
>>
>
> Does replace_oom_score_adj() imply that it will be returning the old value
> of oom_score_adj like test_set_oom_score_adj() does?

I can easily imagine an implementation of "replace_oom_score_adj"
which does not return the old value: so no, that name does not imply
that it will be returning the old value.  But since it does return
something, it's quite reasonable that what it returns is the old
value.

Whereas "test_set_oom_score_adj" tends to imply that it will set the
oom_score_adj only if it's currently zero.
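
As a rough illustration of the two readings, here is a userspace sketch
using C11 atomics as stand-ins for the kernel's test_and_set_bit() and
xchg(). The names busy, claim(), score_adj, replace_score_adj() and
naming_demo() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* "test and set": a single flag, set only if it was clear. */
static atomic_flag busy = ATOMIC_FLAG_INIT;

static bool claim(void)
{
	/* True when this caller flipped the flag from clear to set. */
	return !atomic_flag_test_and_set(&busy);
}

/* "replace": any value is stored unconditionally, old value returned. */
static atomic_int score_adj = 0;

static int replace_score_adj(int new_val)
{
	return atomic_exchange(&score_adj, new_val);
}

static void naming_demo(void)
{
	assert(claim());	/* first caller sets the flag */
	assert(!claim());	/* second caller finds it already set */

	assert(replace_score_adj(1000) == 0);	/* old value was 0 */
	assert(replace_score_adj(-17) == 1000);	/* old value was 1000 */
}
```

The first helper can only report whether the flag was already set, while
the second hands back an arbitrary previous value, which is what the ksm
and swapoff callers rely on.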

Hugh


end of thread, other threads:[~2011-04-16  1:48 UTC | newest]

Thread overview: 12+ messages
2011-04-13 18:33 [patch] oom: replace PF_OOM_ORIGIN with toggling oom_score_adj David Rientjes
2011-04-14  0:03 ` KOSAKI Motohiro
2011-04-14  0:41   ` [patch v2] " David Rientjes
2011-04-14  0:46     ` KOSAKI Motohiro
2011-04-14  1:09     ` Minchan Kim
2011-04-14  1:12       ` David Rientjes
2011-04-14  1:21         ` Minchan Kim
2011-04-14  7:55           ` Matt Fleming
2011-04-14 20:18     ` [patch v3] " David Rientjes
2011-04-15 22:35       ` Hugh Dickins
2011-04-15 23:03         ` David Rientjes
2011-04-16  1:48           ` Hugh Dickins
