linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
@ 2015-10-22  6:49 Hongjie Fang (方洪杰)
  2015-10-22  9:47 ` Michal Hocko
  2015-10-26 21:42 ` David Rientjes
  0 siblings, 2 replies; 14+ messages in thread
From: Hongjie Fang (方洪杰) @ 2015-10-22  6:49 UTC (permalink / raw)
  To: David Rientjes, Michal Hocko, Eric W. Biederman; +Cc: linux-kernel


The oom_adj has been replaced by oom_score_adj in kernel,
but the /proc/pid/oom_adj is provided for legacy purposes.
When write/read a value into/from /proc/pid/oom_adj,
there is a transformation between oom_adj and oom_score_adj.

After writing a new value into /proc/pid/oom_adj, then read it.
The return value is a different value than you wrote.
Fix this by adding a adjustment factor.

Signed-off-by: Hongjie Fang <hongjie.fang@spreadtrum.com>
---
Encountered the problem when I changed a task's oom_adj on
an Android smart phone. As follows,
1. # cat /proc/1450/oom_adj
   15
2. # echo 10 > /proc/1450/oom_adj
3. # cat /proc/1450/oom_adj
   9


diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4..2312e43 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1019,15 +1019,19 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
 	int oom_adj = OOM_ADJUST_MIN;
 	size_t len;
 	unsigned long flags;
+	int adjust;
 
 	if (!task)
 		return -ESRCH;
 	if (lock_task_sighand(task, &flags)) {
-		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX)
+		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX) {
 			oom_adj = OOM_ADJUST_MAX;
-		else
-			oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE) /
+		} else {
+			adjust = task->signal->oom_score_adj > 0 ?
+				 (OOM_SCORE_ADJ_MAX-1) : -(OOM_SCORE_ADJ_MAX-1);
+			oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE + adjust) /
 				  OOM_SCORE_ADJ_MAX;
+		}
 		unlock_task_sighand(task, &flags);
 	}
 	put_task_struct(task);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-22  6:49 [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj Hongjie Fang (方洪杰)
@ 2015-10-22  9:47 ` Michal Hocko
  2015-10-26 21:42 ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-10-22  9:47 UTC (permalink / raw)
  To: Hongjie Fang (方洪杰)
  Cc: David Rientjes, Eric W. Biederman, linux-kernel

On Thu 22-10-15 06:49:01, Hongjie Fang (方洪杰) wrote:
> 
> The oom_adj has been replaced by oom_score_adj in kernel,
> but the /proc/pid/oom_adj is provided for legacy purposes.
> When write/read a value into/from /proc/pid/oom_adj,
> there is a transformation between oom_adj and oom_score_adj.
> 
> After writing a new value into /proc/pid/oom_adj, then read it.
> The return value is a different value than you wrote.
> Fix this by adding a adjustment factor.

... when printing the value.

Your previous patch has changed the stored value while this one only
changes the presented value. The previous one was more correct IMO
but in reality the difference (+-1) in oom_score_adj should be hardly
noticeable and nobody has complained about the current scaling for years
so this approach is probably more conservative.

> Signed-off-by: Hongjie Fang <hongjie.fang@spreadtrum.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> Encountered the problem when I changed a task's oom_adj on
> an Android smart phone. As follows,
> 1. # cat /proc/1450/oom_adj
>    15
> 2. # echo 10 > /proc/1450/oom_adj
> 3. # cat /proc/1450/oom_adj
>    9
> 
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index b25eee4..2312e43 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1019,15 +1019,19 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
>  	int oom_adj = OOM_ADJUST_MIN;
>  	size_t len;
>  	unsigned long flags;
> +	int adjust;
>  
>  	if (!task)
>  		return -ESRCH;
>  	if (lock_task_sighand(task, &flags)) {
> -		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX)
> +		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX) {
>  			oom_adj = OOM_ADJUST_MAX;
> -		else
> -			oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE) /
> +		} else {
> +			adjust = task->signal->oom_score_adj > 0 ?
> +				 (OOM_SCORE_ADJ_MAX-1) : -(OOM_SCORE_ADJ_MAX-1);
> +			oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE + adjust) /
>  				  OOM_SCORE_ADJ_MAX;
> +		}
>  		unlock_task_sighand(task, &flags);
>  	}
>  	put_task_struct(task);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-22  6:49 [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj Hongjie Fang (方洪杰)
  2015-10-22  9:47 ` Michal Hocko
@ 2015-10-26 21:42 ` David Rientjes
  2015-10-27 12:37   ` Michal Hocko
       [not found]   ` <5eff586de266418090f792077fcff993@SHMBX01.spreadtrum.com>
  1 sibling, 2 replies; 14+ messages in thread
From: David Rientjes @ 2015-10-26 21:42 UTC (permalink / raw)
  To: Hongjie Fang (方洪杰)
  Cc: Michal Hocko, Eric W. Biederman, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 936 bytes --]

On Thu, 22 Oct 2015, Hongjie Fang (方洪杰) wrote:

> 
> The oom_adj has been replaced by oom_score_adj in kernel,
> but the /proc/pid/oom_adj is provided for legacy purposes.
> When write/read a value into/from /proc/pid/oom_adj,
> there is a transformation between oom_adj and oom_score_adj.
> 
> After writing a new value into /proc/pid/oom_adj, then read it.
> The return value is a different value than you wrote.
> Fix this by adding a adjustment factor.
> 

You're only looking at the output and seeing that it disagrees with what 
was written and ignoring _why_ it disagrees.

It's because, as I already stated, oom_score_adj is the effective tunable 
for oom kill process prioritization and the legacy oom_adj had a different 
scale where a 1:1 mapping is not possible.

All throughout the kernel, we report the effective value.  We accept 
writes and the reads report the effective value.  This is no different.

Nack again.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-26 21:42 ` David Rientjes
@ 2015-10-27 12:37   ` Michal Hocko
       [not found]   ` <5eff586de266418090f792077fcff993@SHMBX01.spreadtrum.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-10-27 12:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hongjie Fang (方洪杰), Eric W. Biederman, linux-kernel

On Mon 26-10-15 14:42:57, David Rientjes wrote:
> On Thu, 22 Oct 2015, Hongjie Fang (方洪杰) wrote:
> 
> > 
> > The oom_adj has been replaced by oom_score_adj in kernel,
> > but the /proc/pid/oom_adj is provided for legacy purposes.
> > When write/read a value into/from /proc/pid/oom_adj,
> > there is a transformation between oom_adj and oom_score_adj.
> > 
> > After writing a new value into /proc/pid/oom_adj, then read it.
> > The return value is a different value than you wrote.
> > Fix this by adding a adjustment factor.
> > 
> 
> You're only looking at the output and seeing that it disagrees with what 
> was written and ignoring _why_ it disagrees.
> 
> It's because, as I already stated, oom_score_adj is the effective tunable 
> for oom kill process prioritization and the legacy oom_adj had a different 
> scale where a 1:1 mapping is not possible.
> 
> All throughout the kernel, we report the effective value.  We accept 
> writes and the reads report the effective value.  This is no different.
> 
> Nack again.

I really fail to understand your reasoning. The patch basically fixes up
the presented value of oom_adj after rounding imprecision. It doesn't
change the way how the oom_adj->oom_score_aj mapping is done at all. All
it does is that it presents oom_adj1 -> oom_score_adj -> oom_adj2 and
oom_adj1 = oom_adj2

How can this be any harmful? And more importantly why do you want to
expose the imprecision in the mapping to the user space in the first
place?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
       [not found]   ` <5eff586de266418090f792077fcff993@SHMBX01.spreadtrum.com>
@ 2015-10-28 23:54     ` David Rientjes
  2015-10-29  3:47       ` Eric W. Biederman
  2015-10-29 17:04       ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: David Rientjes @ 2015-10-28 23:54 UTC (permalink / raw)
  To: Hongjie Fang (方洪杰)
  Cc: Michal Hocko, Eric W. Biederman, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 741 bytes --]

On Wed, 28 Oct 2015, Hongjie Fang (方洪杰) wrote:

> Under a userspace perspective, get a different value than he wrote, 
> it must be confusing.
> 

It's confusing, but with purpose: it shows there is no direct mapping 
between /proc/pid/oom_adj and /proc/pid/oom_score_adj.  
/proc/pid/oom_score_adj is the effective policy and has been for years.  
The value returned by /proc/pid/oom_adj demonstrates reality vs what is 
perceived and is a side-effect of integer division truncating the result 
in C.

It's a bad situation, I agree, and we anticipated the complete removal of 
/proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
one day we can convince Linus that is possible, but until then we're stuck 
with it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-28 23:54     ` 答复: " David Rientjes
@ 2015-10-29  3:47       ` Eric W. Biederman
  2015-11-01  4:30         ` David Rientjes
  2015-10-29 17:04       ` Michal Hocko
  1 sibling, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2015-10-29  3:47 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hongjie Fang (方洪杰), Michal Hocko, linux-kernel

David Rientjes <rientjes@google.com> writes:

> On Wed, 28 Oct 2015, Hongjie Fang (方洪杰) wrote:
>
>> Under a userspace perspective, get a different value than he wrote, 
>> it must be confusing.
>> 
>
> It's confusing, but with purpose: it shows there is no direct mapping 
> between /proc/pid/oom_adj and /proc/pid/oom_score_adj.  
> /proc/pid/oom_score_adj is the effective policy and has been for years.  
> The value returned by /proc/pid/oom_adj demonstrates reality vs what is 
> perceived and is a side-effect of integer division truncating the result 
> in C.
>
> It's a bad situation, I agree, and we anticipated the complete removal of 
> /proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
> one day we can convince Linus that is possible, but until then we're stuck 
> with it.

If you really want to remove /proc/pid/oom_adj start by placing it in a
Kconfig so people can make it go away.

Eric


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-28 23:54     ` 答复: " David Rientjes
  2015-10-29  3:47       ` Eric W. Biederman
@ 2015-10-29 17:04       ` Michal Hocko
  2015-10-30 12:59         ` Michal Hocko
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2015-10-29 17:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hongjie Fang (方洪杰), Eric W. Biederman, linux-kernel

On Wed 28-10-15 16:54:04, David Rientjes wrote:
[...]
> It's a bad situation, I agree, and we anticipated the complete removal of 
> /proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
> one day we can convince Linus that is possible, but until then we're stuck 
> with it.

Let's do it then.
---
>From 71be967d15b3298f3fad7e49ee51f852761b9632 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 29 Oct 2015 17:42:22 +0100
Subject: [PATCH] oom: get rid of oom_adj

oom_adj has been marked as deprecated by 51b1bd2ace15 ("oom: deprecate
oom_adj tunable") which is quite some time ago. The knob was even
removed by 01dc52ebdf47 ("oom: remove deprecated oom_adj") but
then reintroduced back by fa0cbbf145aa ("mm, oom: reintroduce
/proc/pid/oom_adj") with a scaling mechanism to map oom_adj to
oom_score_adj. The mapping is not ideal and quite confusing as noted by
Hongjie Fang:
$ echo 10 > /proc/1450/oom_adj
$ cat /proc/1450/oom_adj
9

This could be fixed but it seems that the knob has been deprecated for
long enough to finally get rid of it altogether. Most of the reports
happened back in 2012 resp. 2013 and programs were fixed to either
use oom_score_adj or have a fallback mechanism to use oom_adj on older
kernels.

Time has come to finally get rid of this duality and potential source of
confusion. Let's keep OOM_ADJUST_* and OOM_DISABLE in user headers to
not break existing code compilation.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.txt |  11 +---
 fs/proc/base.c                     | 108 -------------------------------------
 2 files changed, 2 insertions(+), 117 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d6f259eaa5ef..762bd3f410e9 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -33,8 +33,7 @@ Table of Contents
   2	Modifying System Parameters
 
   3	Per-Process Parameters
-  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
-								score
+  3.1	/proc/<pid>/oom_score_adj - Adjust the oom-killer score
   3.2	/proc/<pid>/oom_score - Display current oom-killer score
   3.3	/proc/<pid>/io - Display the IO accounting fields
   3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
@@ -1436,7 +1435,7 @@ of the kernel.
 CHAPTER 3: PER-PROCESS PARAMETERS
 ------------------------------------------------------------------------------
 
-3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
+3.1 /proc/<pid>/oom_score_adj- Adjust the oom-killer score
 --------------------------------------------------------------------------------
 
 These file can be used to adjust the badness heuristic used to select which
@@ -1477,12 +1476,6 @@ same system, cpuset, mempolicy, or memory controller resources to use at least
 equivalent to discounting 50% of the task's allowed memory from being considered
 as scoring against the task.
 
-For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
-be used to tune the badness score.  Its acceptable values range from -16
-(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
-(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
-scaled linearly with /proc/<pid>/oom_score_adj.
-
 The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
 value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
 requires CAP_SYS_RESOURCE.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 60c71b10eaee..f34e43dd8727 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1011,112 +1011,6 @@ static const struct file_operations proc_environ_operations = {
 	.release	= mem_release,
 };
 
-static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
-			    loff_t *ppos)
-{
-	struct task_struct *task = get_proc_task(file_inode(file));
-	char buffer[PROC_NUMBUF];
-	int oom_adj = OOM_ADJUST_MIN;
-	size_t len;
-	unsigned long flags;
-
-	if (!task)
-		return -ESRCH;
-	if (lock_task_sighand(task, &flags)) {
-		if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX)
-			oom_adj = OOM_ADJUST_MAX;
-		else
-			oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE) /
-				  OOM_SCORE_ADJ_MAX;
-		unlock_task_sighand(task, &flags);
-	}
-	put_task_struct(task);
-	len = snprintf(buffer, sizeof(buffer), "%d\n", oom_adj);
-	return simple_read_from_buffer(buf, count, ppos, buffer, len);
-}
-
-static ssize_t oom_adj_write(struct file *file, const char __user *buf,
-			     size_t count, loff_t *ppos)
-{
-	struct task_struct *task;
-	char buffer[PROC_NUMBUF];
-	int oom_adj;
-	unsigned long flags;
-	int err;
-
-	memset(buffer, 0, sizeof(buffer));
-	if (count > sizeof(buffer) - 1)
-		count = sizeof(buffer) - 1;
-	if (copy_from_user(buffer, buf, count)) {
-		err = -EFAULT;
-		goto out;
-	}
-
-	err = kstrtoint(strstrip(buffer), 0, &oom_adj);
-	if (err)
-		goto out;
-	if ((oom_adj < OOM_ADJUST_MIN || oom_adj > OOM_ADJUST_MAX) &&
-	     oom_adj != OOM_DISABLE) {
-		err = -EINVAL;
-		goto out;
-	}
-
-	task = get_proc_task(file_inode(file));
-	if (!task) {
-		err = -ESRCH;
-		goto out;
-	}
-
-	task_lock(task);
-	if (!task->mm) {
-		err = -EINVAL;
-		goto err_task_lock;
-	}
-
-	if (!lock_task_sighand(task, &flags)) {
-		err = -ESRCH;
-		goto err_task_lock;
-	}
-
-	/*
-	 * Scale /proc/pid/oom_score_adj appropriately ensuring that a maximum
-	 * value is always attainable.
-	 */
-	if (oom_adj == OOM_ADJUST_MAX)
-		oom_adj = OOM_SCORE_ADJ_MAX;
-	else
-		oom_adj = (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
-
-	if (oom_adj < task->signal->oom_score_adj &&
-	    !capable(CAP_SYS_RESOURCE)) {
-		err = -EACCES;
-		goto err_sighand;
-	}
-
-	/*
-	 * /proc/pid/oom_adj is provided for legacy purposes, ask users to use
-	 * /proc/pid/oom_score_adj instead.
-	 */
-	pr_warn_once("%s (%d): /proc/%d/oom_adj is deprecated, please use /proc/%d/oom_score_adj instead.\n",
-		  current->comm, task_pid_nr(current), task_pid_nr(task),
-		  task_pid_nr(task));
-
-	task->signal->oom_score_adj = oom_adj;
-	trace_oom_score_adj_update(task);
-err_sighand:
-	unlock_task_sighand(task, &flags);
-err_task_lock:
-	task_unlock(task);
-	put_task_struct(task);
-out:
-	return err < 0 ? err : count;
-}
-
-static const struct file_operations proc_oom_adj_operations = {
-	.read		= oom_adj_read,
-	.write		= oom_adj_write,
-	.llseek		= generic_file_llseek,
-};
 
 static ssize_t oom_score_adj_read(struct file *file, char __user *buf,
 					size_t count, loff_t *ppos)
@@ -2813,7 +2707,6 @@ static const struct pid_entry tgid_base_stuff[] = {
 	ONE("cgroup",  S_IRUGO, proc_cgroup_show),
 #endif
 	ONE("oom_score",  S_IRUGO, proc_oom_score),
-	REG("oom_adj",    S_IRUGO|S_IWUSR, proc_oom_adj_operations),
 	REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
 #ifdef CONFIG_AUDITSYSCALL
 	REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
@@ -3161,7 +3054,6 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("cgroup",  S_IRUGO, proc_cgroup_show),
 #endif
 	ONE("oom_score", S_IRUGO, proc_oom_score),
-	REG("oom_adj",   S_IRUGO|S_IWUSR, proc_oom_adj_operations),
 	REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
 #ifdef CONFIG_AUDITSYSCALL
 	REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
-- 
2.6.1



-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-29 17:04       ` Michal Hocko
@ 2015-10-30 12:59         ` Michal Hocko
  2015-10-30 14:46           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2015-10-30 12:59 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hongjie Fang (方洪杰), Eric W. Biederman, linux-kernel

On Thu 29-10-15 18:04:22, Michal Hocko wrote:
> On Wed 28-10-15 16:54:04, David Rientjes wrote:
> [...]
> > It's a bad situation, I agree, and we anticipated the complete removal of 
> > /proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
> > one day we can convince Linus that is possible, but until then we're stuck 
> > with it.
> 
> Let's do it then.

I've just checked debian code search and it seems that procps still
relies on oom_adj. I have sent a patch but that sounds like we are not
there yet. I will hunt for other projects still using the deprecated
file exclusively. Hopefully there won't be too many of them.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-30 12:59         ` Michal Hocko
@ 2015-10-30 14:46           ` Michal Hocko
  2015-11-01  4:38             ` David Rientjes
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2015-10-30 14:46 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hongjie Fang (方洪杰), Eric W. Biederman, linux-kernel

On Fri 30-10-15 13:59:03, Michal Hocko wrote:
> On Thu 29-10-15 18:04:22, Michal Hocko wrote:
> > On Wed 28-10-15 16:54:04, David Rientjes wrote:
> > [...]
> > > It's a bad situation, I agree, and we anticipated the complete removal of 
> > > /proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
> > > one day we can convince Linus that is possible, but until then we're stuck 
> > > with it.
> > 
> > Let's do it then.
> 
> I've just checked debian code search and it seems that procps still
> relies on oom_adj. I have sent a patch but that sounds like we are not
> there yet. I will hunt for other projects still using the deprecated
> file exclusively. Hopefully there won't be too many of them.

It doesn't look that bad afterall:
$ curl -s http://codesearch.debian.net/results/7223e657af3f2ad0/packages.json
{"Packages":["tgt","ggobi","hurd","linux","condor","wine-gecko-2.21","android-platform-frameworks-native","nautilus","procps","wireshark","intel-gpu-tools","iceweasel","icedove","ardour","linux-tools","kde4libs","nss-pam-ldapd","chromium-browser","passenger","archipel-agent-virtualmachine-oomkiller","bleachbit","tilestache","slurm-llnl","ns3","nbd","open-iscsi","mhwaveedit","nilfs-tools","stress-ng","lvm2","gradm2","audit","postgresql-common","zfs-fuse","ocfs2-tools","gimp","advene","lldpad","reniced","pitivi","trinity","petri-foo","rtai","postgresql-9.4","procenv","multipath-tools","percona-toolkit","apparmor","upstart","watchdog","boinc","fusil","util-vserver","booth","geeqie","openssh","oar","android-platform-system-core","kinit","xournal","player","gimp-gap","android-tools"]}

Of those
* android-tools need a trivial patch - not sure who is upstream here
  so pushed through Debian bugzilla - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803485
* boinc-client.init need a trivial patch - Debian specific it seesm
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803484
* ocfs2_controld - posted to debian as I wasn't sure about the upstream
  status - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803486
* procps needs a trivial patch - sent upstream already

This two are nasty because they consume only oom_adj scale so we have
to rescale explicitly :/
* archipel-agent-virtualmachine-oomkiller oom_adj is stored in the DB
* reniced - this is one is nasty as well because it consumes oom_adj
  from user

I will have a look at them early next week.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-29  3:47       ` Eric W. Biederman
@ 2015-11-01  4:30         ` David Rientjes
  0 siblings, 0 replies; 14+ messages in thread
From: David Rientjes @ 2015-11-01  4:30 UTC (permalink / raw)
  To: Andrew Morton, Eric W. Biederman
  Cc: Hongjie Fang (方洪杰), Michal Hocko, linux-kernel

On Wed, 28 Oct 2015, Eric W. Biederman wrote:

> > It's confusing, but with purpose: it shows there is no direct mapping 
> > between /proc/pid/oom_adj and /proc/pid/oom_score_adj.  
> > /proc/pid/oom_score_adj is the effective policy and has been for years.  
> > The value returned by /proc/pid/oom_adj demonstrates reality vs what is 
> > perceived and is a side-effect of integer division truncating the result 
> > in C.
> >
> > It's a bad situation, I agree, and we anticipated the complete removal of 
> > /proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
> > one day we can convince Linus that is possible, but until then we're stuck 
> > with it.
> 
> If you really want to remove /proc/pid/oom_adj start by placing it in a
> Kconfig so people can make it go away.
> 

[+akpm]

We've tried to remove it over the period of a couple of years after 
oom_score_adj was introduced.  I believe we had a WARN_ON() for the first 
two years and then removed it all together.  A few months later, there was 
a bug report because the file didn't exist and Linus insisted it be 
readded because we don't break userspace.

I would absolutely love to remove oom_adj, but I'm afraid that it may not 
be possible.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-10-30 14:46           ` Michal Hocko
@ 2015-11-01  4:38             ` David Rientjes
  2015-11-02 13:08               ` Michal Hocko
  2015-11-02 17:18               ` Dave Jones
  0 siblings, 2 replies; 14+ messages in thread
From: David Rientjes @ 2015-11-01  4:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Hongjie Fang (方洪杰), Eric W. Biederman, linux-kernel

On Fri, 30 Oct 2015, Michal Hocko wrote:

> On Fri 30-10-15 13:59:03, Michal Hocko wrote:
> > On Thu 29-10-15 18:04:22, Michal Hocko wrote:
> > > On Wed 28-10-15 16:54:04, David Rientjes wrote:
> > > [...]
> > > > It's a bad situation, I agree, and we anticipated the complete removal of 
> > > > /proc/pid/oom_adj years ago since it has been deprecated for years.  Maybe 
> > > > one day we can convince Linus that is possible, but until then we're stuck 
> > > > with it.
> > > 
> > > Let's do it then.
> > 
> > I've just checked debian code search and it seems that procps still
> > relies on oom_adj. I have sent a patch but that sounds like we are not
> > there yet. I will hunt for other projects still using the deprecated
> > file exclusively. Hopefully there won't be too many of them.
> 
> It doesn't look that bad afterall:
> $ curl -s http://codesearch.debian.net/results/7223e657af3f2ad0/packages.json
> {"Packages":["tgt","ggobi","hurd","linux","condor","wine-gecko-2.21","android-platform-frameworks-native","nautilus","procps","wireshark","intel-gpu-tools","iceweasel","icedove","ardour","linux-tools","kde4libs","nss-pam-ldapd","chromium-browser","passenger","archipel-agent-virtualmachine-oomkiller","bleachbit","tilestache","slurm-llnl","ns3","nbd","open-iscsi","mhwaveedit","nilfs-tools","stress-ng","lvm2","gradm2","audit","postgresql-common","zfs-fuse","ocfs2-tools","gimp","advene","lldpad","reniced","pitivi","trinity","petri-foo","rtai","postgresql-9.4","procenv","multipath-tools","percona-toolkit","apparmor","upstart","watchdog","boinc","fusil","util-vserver","booth","geeqie","openssh","oar","android-platform-system-core","kinit","xournal","player","gimp-gap","android-tools"]}
> 
> Of those
> * android-tools need a trivial patch - not sure who is upstream here
>   so pushed through Debian bugzilla - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803485
> * boinc-client.init need a trivial patch - Debian specific it seesm
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803484
> * ocfs2_controld - posted to debian as I wasn't sure about the upstream
>   status - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803486
> * procps needs a trivial patch - sent upstream already
> 
> This two are nasty because they consume only oom_adj scale so we have
> to rescale explicitly :/
> * archipel-agent-virtualmachine-oomkiller oom_adj is stored in the DB
> * reniced - this is one is nasty as well because it consumes oom_adj
>   from user
> 
> I will have a look at them early next week.

Reintroducing oom_adj came from the thread at 
https://lkml.org/lkml/2012/11/12/281.  I believe kde has long supported 
oom_score_adj, but the objection Linus raised wasn't about a specific 
binary, rather "all userspace".  

I did the same type of searching and fixing back then for things like 
udev, chromium, openssh, etc to convert them to oom_score_adj.  It was 
mostly trivial because people either set it to -17 to disable from the oom 
killer or to 0 to nullify an oom-disabled process.

I'd love to be able to remove oom_adj.  I'm not sure if we can get to that 
point if the instance is that "all userspace" must not write to it and it 
would require users to rebuild their binaries.  If we could show that all 
the major open source users of oom_adj (there can't be _that_ many that 
would be significantly impacted since you needed CAP_SYS_RESOURCE to 
reduce it) were converted, maybe Linus would accept it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-11-01  4:38             ` David Rientjes
@ 2015-11-02 13:08               ` Michal Hocko
  2015-11-02 17:18               ` Dave Jones
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-11-02 13:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: Hongjie Fang (方洪杰), Eric W. Biederman, linux-kernel

On Sat 31-10-15 21:38:31, David Rientjes wrote:
[...]
> I'd love to be able to remove oom_adj.  I'm not sure if we can get to that 
> point if the instance is that "all userspace" must not write to it and it 
> would require users to rebuild their binaries. 

Considering there are still some which need fixing, we are definitely
not there yet. It will take some time until all the distributor take the
fixes. So we have to wait some more. I will make sure to push the fixes
to the remaining ones.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-11-01  4:38             ` David Rientjes
  2015-11-02 13:08               ` Michal Hocko
@ 2015-11-02 17:18               ` Dave Jones
  2015-11-02 20:24                 ` Michal Hocko
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Jones @ 2015-11-02 17:18 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Hongjie Fang (方洪杰),
	Eric W. Biederman, linux-kernel

On Sat, Oct 31, 2015 at 09:38:31PM -0700, David Rientjes wrote:
 > On Fri, 30 Oct 2015, Michal Hocko wrote:
 > 
 > > > I will hunt for other projects still using the deprecated
 > > > file exclusively. Hopefully there won't be too many of them.
 > > 
 > > It doesn't look that bad afterall:
 > > $ curl -s http://codesearch.debian.net/results/7223e657af3f2ad0/packages.json
 > > {"Packages":["tgt","ggobi","hurd","linux","condor","wine-gecko-2.21","android-platform-frameworks-native","nautilus","procps","wireshark","intel-gpu-tools","iceweasel","icedove","ardour","linux-tools","kde4libs","nss-pam-ldapd","chromium-browser","passenger","archipel-agent-virtualmachine-oomkiller","bleachbit","tilestache","slurm-llnl","ns3","nbd","open-iscsi","mhwaveedit","nilfs-tools","stress-ng","lvm2","gradm2","audit","postgresql-common","zfs-fuse","ocfs2-tools","gimp","advene","lldpad","reniced","pitivi","trinity","petri-foo","rtai","postgresql-9.4","procenv","multipath-tools","percona-toolkit","apparmor","upstart","watchdog","boinc","fusil","util-vserver","booth","geeqie","openssh","oar","android-platform-system-core","kinit","xournal","player","gimp-gap","android-tools"]}
>> 
 > I did the same type of searching and fixing back then for things like 
 > udev, chromium, openssh, etc to convert them to oom_score_adj.  It was 
 > mostly trivial because people either set it to -17 to disable from the oom 
 > killer or to 0 to nullify an oom-disabled process.
 > 
 > I'd love to be able to remove oom_adj.  I'm not sure if we can get to that 
 > point if the instance is that "all userspace" must not write to it and it 
 > would require users to rebuild their binaries.  If we could show that all 
 > the major open source users of oom_adj (there can't be _that_ many that 
 > would be significantly impacted since you needed CAP_SYS_RESOURCE to 
 > reduce it) were converted, maybe Linus would accept it.

FWIW, you can probably remove trinity from that list.
It only uses oom_adj as "don't write to this file if it exists".
It's absense would be silently ignored.  The child processes have
been using oom_score_adj for a long time.

	Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 答复: [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj
  2015-11-02 17:18               ` Dave Jones
@ 2015-11-02 20:24                 ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2015-11-02 20:24 UTC (permalink / raw)
  To: Dave Jones
  Cc: David Rientjes, Hongjie Fang (方洪杰),
	Eric W. Biederman, linux-kernel

On Mon 02-11-15 12:18:50, Dave Jones wrote:
> On Sat, Oct 31, 2015 at 09:38:31PM -0700, David Rientjes wrote:
>  > On Fri, 30 Oct 2015, Michal Hocko wrote:
>  > 
>  > > > I will hunt for other projects still using the deprecated
>  > > > file exclusively. Hopefully there won't be too many of them.
>  > > 
>  > > It doesn't look that bad afterall:
>  > > $ curl -s http://codesearch.debian.net/results/7223e657af3f2ad0/packages.json
>  > > {"Packages":["tgt","ggobi","hurd","linux","condor","wine-gecko-2.21","android-platform-frameworks-native","nautilus","procps","wireshark","intel-gpu-tools","iceweasel","icedove","ardour","linux-tools","kde4libs","nss-pam-ldapd","chromium-browser","passenger","archipel-agent-virtualmachine-oomkiller","bleachbit","tilestache","slurm-llnl","ns3","nbd","open-iscsi","mhwaveedit","nilfs-tools","stress-ng","lvm2","gradm2","audit","postgresql-common","zfs-fuse","ocfs2-tools","gimp","advene","lldpad","reniced","pitivi","trinity","petri-foo","rtai","postgresql-9.4","procenv","multipath-tools","percona-toolkit","apparmor","upstart","watchdog","boinc","fusil","util-vserver","booth","geeqie","openssh","oar","android-platform-system-core","kinit","xournal","player","gimp-gap","android-tools"]}
> >> 
>  > I did the same type of searching and fixing back then for things like 
>  > udev, chromium, openssh, etc to convert them to oom_score_adj.  It was 
>  > mostly trivial because people either set it to -17 to disable from the oom 
>  > killer or to 0 to nullify an oom-disabled process.
>  > 
>  > I'd love to be able to remove oom_adj.  I'm not sure if we can get to that 
>  > point if the instance is that "all userspace" must not write to it and it 
>  > would require users to rebuild their binaries.  If we could show that all 
>  > the major open source users of oom_adj (there can't be _that_ many that 
>  > would be significantly impacted since you needed CAP_SYS_RESOURCE to 
>  > reduce it) were converted, maybe Linus would accept it.
> 
> FWIW, you can probably remove trinity from that list.
> It only uses oom_adj as "don't write to this file if it exists".

Yeah, I have filtered it like many others for exactly this reason
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-11-02 20:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-22  6:49 [PATCHv2 4.3-rc6] proc: fix convert from oom_score_adj to oom_adj Hongjie Fang (方洪杰)
2015-10-22  9:47 ` Michal Hocko
2015-10-26 21:42 ` David Rientjes
2015-10-27 12:37   ` Michal Hocko
     [not found]   ` <5eff586de266418090f792077fcff993@SHMBX01.spreadtrum.com>
2015-10-28 23:54     ` 答复: " David Rientjes
2015-10-29  3:47       ` Eric W. Biederman
2015-11-01  4:30         ` David Rientjes
2015-10-29 17:04       ` Michal Hocko
2015-10-30 12:59         ` Michal Hocko
2015-10-30 14:46           ` Michal Hocko
2015-11-01  4:38             ` David Rientjes
2015-11-02 13:08               ` Michal Hocko
2015-11-02 17:18               ` Dave Jones
2015-11-02 20:24                 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).