linux-kernel.vger.kernel.org archive mirror
* [patch] mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
@ 2018-03-07 23:52 David Rientjes
From: David Rientjes @ 2018-03-07 23:52 UTC
  To: Gaurav Kohli, Andrew Morton
  Cc: mhocko, kirill.shutemov, Andrea Arcangeli, linux-mm,
	linux-kernel, linux-arm-msm

Since the 2.6 kernel, the oom killer has slightly biased away from
CAP_SYS_ADMIN processes by discounting 3% of their memory usage in
comparison to other processes.
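The arithmetic being removed is small enough to model outside the kernel.
The sketch below is a simplified, hypothetical userspace model
(badness_model() is not a kernel function): it assumes "points" already
holds the task's summed page count and applies the 3% discount and the
oom_score_adj normalization as they appear in the oom_badness() hunk of
the diff. The real function's eligibility checks are omitted, and the
final clamp to a minimum of 1 is a modelling assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the tail of oom_badness() before this
 * patch. "points" is assumed to already hold rss + swap + page-table pages.
 */
static long badness_model(long points, long adj, long totalpages,
			  bool cap_sys_admin)
{
	/* The heuristic removed by this patch: a 3% discount for root. */
	if (cap_sys_admin)
		points -= (points * 3) / 100;

	/* Normalize to oom_score_adj units, as in the diff. */
	adj *= totalpages / 1000;
	points += adj;

	/* Modelling assumption: never report less than 1 point. */
	return points > 0 ? points : 1;
}
```

With 100000 pages on a 1000000-page system and oom_score_adj of 0, the
model scores a CAP_SYS_ADMIN task at 97000 and an otherwise identical
unprivileged task at 100000.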

This has always been implicit, and nothing really relies on the behavior.

Gaurav noticed that __task_cred() can dereference a potentially freed
pointer if the task under consideration is exiting, because no reference
to its task_struct is held.

Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.

If a CAP_SYS_ADMIN process would like to be biased against selection, it
can always adjust /proc/pid/oom_score_adj.
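To get a feel for how small the removed bias was, one can ask which
oom_score_adj value would subtract roughly the same number of pages. The
helper below is a hypothetical sketch based only on the two relevant lines
of the diff: the bonus was points * 3 / 100 pages, and each oom_score_adj
step is worth totalpages / 1000 pages. Integer division truncates toward
zero, so the result is only an approximation.

```c
/*
 * Hypothetical helper (not a kernel function): the oom_score_adj that
 * subtracts about as many pages as the removed 3% root bonus did.
 */
static long equivalent_oom_score_adj(long points, long totalpages)
{
	long bonus = (points * 3) / 100;	/* pages the old heuristic discounted */
	long unit = totalpages / 1000;		/* pages per oom_score_adj step */
	long adj;

	if (unit <= 0)				/* tiny systems: no whole step */
		return 0;

	adj = -(bonus / unit);			/* truncates toward zero */
	if (adj < -1000)			/* stay within oom_score_adj's range */
		adj = -1000;
	return adj;
}
```

For a task using 10% of a 1000000-page system (100000 pages), the old
bonus corresponds to an oom_score_adj of -3; for a task using 1% (10000
pages), it truncates to 0.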

Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -224,13 +224,6 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
 	task_unlock(p);
 
-	/*
-	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
-	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= (points * 3) / 100;
-
 	/* Normalize to oom_score_adj units */
 	adj *= totalpages / 1000;
 	points += adj;

* Re: [patch] mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
From: Michal Hocko @ 2018-03-13 13:39 UTC
  To: David Rientjes
  Cc: Gaurav Kohli, Andrew Morton, kirill.shutemov, Andrea Arcangeli,
	linux-mm, linux-kernel, linux-arm-msm

On Wed 07-03-18 15:52:15, David Rientjes wrote:
> Since the 2.6 kernel, the oom killer has slightly biased away from
> CAP_SYS_ADMIN processes by discounting 3% of their memory usage in
> comparison to other processes.
> 
> This has always been implicit and nothing exactly relies on the behavior.
> 
> Gaurav noticed that __task_cred() can dereference a potentially freed
> pointer if the task under consideration is exiting, because no reference
> to its task_struct is held.
> 
> Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.
> 
> If a CAP_SYS_ADMIN process would like to be biased against selection, it
> can always adjust /proc/pid/oom_score_adj.
> 
> Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
> Signed-off-by: David Rientjes <rientjes@google.com>

This is simpler than playing reference-counting tricks and whatnot.
Moreover, I do agree that this heuristic is questionable on its own. The
bias is basically random and invisible to userspace. We already have
a way to tune the same thing via oom_score_adj.

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs
