linux-kernel.vger.kernel.org archive mirror
* [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage
@ 2021-07-01 12:54 minyard
  2021-07-16  5:19 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: minyard @ 2021-07-01 12:54 UTC
  To: Andrew Morton, linux-mm; +Cc: linux-kernel, Corey Minyard

From: Corey Minyard <cminyard@mvista.com>

If you have a process with less than 1000 totalpages, the calculation:

  adj = (long)p->signal->oom_score_adj;
  ...
  adj *= totalpages / 1000;

will always result in adj being zero no matter what oom_score_adj is,
which could result in the wrong process being picked for killing.

Fix by adding 1000 to totalpages before dividing.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---
I ran across this trying to diagnose another problem where I set up a
cgroup with a small amount of memory and couldn't get a test program to
work right.

I'm not sure this is quite right.  To stay closer to the current
behavior you could do:

	if (totalpages >= 1000)
		adj *= totalpages / 1000;

but that would map 0-1999 to the same value.  But this at least shows
the issue.  I can provide a test program that shows the issue, but I
think it's pretty obvious from the code.

 mm/oom_kill.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index eefd3f5fde46..4ae0b6193bcd 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -233,8 +233,11 @@ long oom_badness(struct task_struct *p, unsigned long totalpages)
 		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
 	task_unlock(p);
 
-	/* Normalize to oom_score_adj units */
-	adj *= totalpages / 1000;
+	/*
+	 * Normalize to oom_score_adj units.  You should never
+	 * multiply by zero here, or oom_score_adj will not work.
+	 */
+	adj *= (totalpages + 1000) / 1000;
 	points += adj;
 
 	return points;
-- 
2.25.1



* Re: [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage
  2021-07-01 12:54 [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage minyard
@ 2021-07-16  5:19 ` Michal Hocko
  2021-07-16 12:25   ` Corey Minyard
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2021-07-16  5:19 UTC
  To: minyard; +Cc: Andrew Morton, linux-mm, linux-kernel, Corey Minyard

On Thu 01-07-21 07:54:30, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> If you have a process with less than 1000 totalpages, the calculation:
> 
>   adj = (long)p->signal->oom_score_adj;
>   ...
>   adj *= totalpages / 1000;
> 
> will always result in adj being zero no matter what oom_score_adj is,
> which could result in the wrong process being picked for killing.
> 
> Fix by adding 1000 to totalpages before dividing.

Yes, this is a known limitation of the oom_score_adj and its scale.
Is this a practical problem to be solved though? I mean 0-1000 pages is
not really that much different from imprecision at a larger scale where
tasks are effectively considered equal.

I have to say I do not really like the proposed workaround. It doesn't
really solve the problem yet it adds another special case.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage
  2021-07-16  5:19 ` Michal Hocko
@ 2021-07-16 12:25   ` Corey Minyard
  2021-09-02 19:55     ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Corey Minyard @ 2021-07-16 12:25 UTC
  To: Michal Hocko; +Cc: minyard, Andrew Morton, linux-mm, linux-kernel

On Fri, Jul 16, 2021 at 07:19:24AM +0200, Michal Hocko wrote:
> On Thu 01-07-21 07:54:30, minyard@acm.org wrote:
> > From: Corey Minyard <cminyard@mvista.com>
> > 
> > If you have a process with less than 1000 totalpages, the calculation:
> > 
> >   adj = (long)p->signal->oom_score_adj;
> >   ...
> >   adj *= totalpages / 1000;
> > 
> > will always result in adj being zero no matter what oom_score_adj is,
> > which could result in the wrong process being picked for killing.
> > 
> > Fix by adding 1000 to totalpages before dividing.
> 
> Yes, this is a known limitation of the oom_score_adj and its scale.
> Is this a practical problem to be solved though? I mean 0-1000 pages is
> not really that much different from imprecision at a larger scale where
> tasks are effectively considered equal.

Known limitation?  Is this documented?  I couldn't find anything that
said "oom_score_adj doesn't work at all with programs with <1000 pages
besides setting the value to -1000".

> 
> I have to say I do not really like the proposed workaround. It doesn't
> really solve the problem yet it adds another special case.

The problem is that if you have a small program, there is no way to
set its priority besides completely disabling the OOM killer for
it.

I don't understand the special case comment.  How is this adding a
special case?  This patch removes a special case.  Small programs
working differently from big programs is a special case.  Making them
all work the same is removing an element of surprise from someone
expecting things to work as documented.

-corey


* Re: [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage
  2021-07-16 12:25   ` Corey Minyard
@ 2021-09-02 19:55     ` Andrew Morton
  2021-09-03  1:52       ` Corey Minyard
  2021-09-03  7:49       ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2021-09-02 19:55 UTC
  To: cminyard; +Cc: Michal Hocko, minyard, linux-mm, linux-kernel

On Fri, 16 Jul 2021 07:25:47 -0500 Corey Minyard <cminyard@mvista.com> wrote:

> On Fri, Jul 16, 2021 at 07:19:24AM +0200, Michal Hocko wrote:
> > On Thu 01-07-21 07:54:30, minyard@acm.org wrote:
> > > From: Corey Minyard <cminyard@mvista.com>
> > > 
> > > If you have a process with less than 1000 totalpages, the calculation:
> > > 
> > >   adj = (long)p->signal->oom_score_adj;
> > >   ...
> > >   adj *= totalpages / 1000;
> > > 
> > > will always result in adj being zero no matter what oom_score_adj is,
> > > which could result in the wrong process being picked for killing.
> > > 
> > > Fix by adding 1000 to totalpages before dividing.
> > 
> > Yes, this is a known limitation of the oom_score_adj and its scale.
> > Is this a practical problem to be solved though? I mean 0-1000 pages is
> > not really that much different from imprecision at a larger scale where
> > tasks are effectively considered equal.
> 
> Known limitation?  Is this documented?  I couldn't find anything that
> said "oom_score_adj doesn't work at all with programs with <1000 pages
> besides setting the value to -1000".
> 
> > 
> > I have to say I do not really like the proposed workaround. It doesn't
> > really solve the problem yet it adds another special case.
> 
> The problem is that if you have a small program, there is no way to
> set its priority besides completely disabling the OOM killer for
> it.
> 
> I don't understand the special case comment.  How is this adding a
> special case?  This patch removes a special case.  Small programs
> working differently from big programs is a special case.  Making them
> all work the same is removing an element of surprise from someone
> expecting things to work as documented.
> 

Can we please get this resolved one way or the other?


* Re: [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage
  2021-09-02 19:55     ` Andrew Morton
@ 2021-09-03  1:52       ` Corey Minyard
  2021-09-03  7:49       ` Michal Hocko
  1 sibling, 0 replies; 6+ messages in thread
From: Corey Minyard @ 2021-09-03  1:52 UTC
  To: Andrew Morton; +Cc: cminyard, Michal Hocko, linux-mm, linux-kernel

On Thu, Sep 02, 2021 at 12:55:01PM -0700, Andrew Morton wrote:
> On Fri, 16 Jul 2021 07:25:47 -0500 Corey Minyard <cminyard@mvista.com> wrote:
> 
> > On Fri, Jul 16, 2021 at 07:19:24AM +0200, Michal Hocko wrote:
> > > On Thu 01-07-21 07:54:30, minyard@acm.org wrote:
> > > > From: Corey Minyard <cminyard@mvista.com>
> > > > 
> > > > If you have a process with less than 1000 totalpages, the calculation:
> > > > 
> > > >   adj = (long)p->signal->oom_score_adj;
> > > >   ...
> > > >   adj *= totalpages / 1000;
> > > > 
> > > > will always result in adj being zero no matter what oom_score_adj is,
> > > > which could result in the wrong process being picked for killing.
> > > > 
> > > > Fix by adding 1000 to totalpages before dividing.
> > > 
> > > Yes, this is a known limitation of the oom_score_adj and its scale.
> > > Is this a practical problem to be solved though? I mean 0-1000 pages is
> > > not really that much different from imprecision at a larger scale where
> > > tasks are effectively considered equal.
> > 
> > Known limitation?  Is this documented?  I couldn't find anything that
> > said "oom_score_adj doesn't work at all with programs with <1000 pages
> > besides setting the value to -1000".
> > 
> > > 
> > > I have to say I do not really like the proposed workaround. It doesn't
> > > really solve the problem yet it adds another special case.
> > 
> > The problem is that if you have a small program, there is no way to
> > set its priority besides completely disabling the OOM killer for
> > it.
> > 
> > I don't understand the special case comment.  How is this adding a
> > special case?  This patch removes a special case.  Small programs
> > working differently from big programs is a special case.  Making them
> > all work the same is removing an element of surprise from someone
> > expecting things to work as documented.
> > 
> 
> Can we please get this resolved one way or the other?

My goal in submitting this is to avoid someone having to go through what
I went through.  I know it now, so it's not going to affect me again.

We could document this, but to me it seems silly when something can just
be made consistent to avoid having to document it.  I got no response to
my questions above, so I don't know what to make of it.

Thanks Andrew,

-corey


* Re: [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage
  2021-09-02 19:55     ` Andrew Morton
  2021-09-03  1:52       ` Corey Minyard
@ 2021-09-03  7:49       ` Michal Hocko
  1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2021-09-03  7:49 UTC
  To: Andrew Morton; +Cc: cminyard, minyard, linux-mm, linux-kernel

On Thu 02-09-21 12:55:01, Andrew Morton wrote:
> On Fri, 16 Jul 2021 07:25:47 -0500 Corey Minyard <cminyard@mvista.com> wrote:
> 
> > On Fri, Jul 16, 2021 at 07:19:24AM +0200, Michal Hocko wrote:
> > > On Thu 01-07-21 07:54:30, minyard@acm.org wrote:
> > > > From: Corey Minyard <cminyard@mvista.com>
> > > > 
> > > > If you have a process with less than 1000 totalpages, the calculation:
> > > > 
> > > >   adj = (long)p->signal->oom_score_adj;
> > > >   ...
> > > >   adj *= totalpages / 1000;
> > > > 
> > > > will always result in adj being zero no matter what oom_score_adj is,
> > > > which could result in the wrong process being picked for killing.
> > > > 
> > > > Fix by adding 1000 to totalpages before dividing.
> > > 
> > > Yes, this is a known limitation of the oom_score_adj and its scale.
> > > Is this a practical problem to be solved though? I mean 0-1000 pages is
> > > not really that much different from imprecision at a larger scale where
> > > tasks are effectively considered equal.
> > 
> > Known limitation?  Is this documented?  I couldn't find anything that
> > said "oom_score_adj doesn't work at all with programs with <1000 pages
> > besides setting the value to -1000".
> > 
> > > 
> > > I have to say I do not really like the proposed workaround. It doesn't
> > > really solve the problem yet it adds another special case.
> > 
> > The problem is that if you have a small program, there is no way to
> > set its priority besides completely disabling the OOM killer for
> > it.
> > 
> > I don't understand the special case comment.  How is this adding a
> > special case?  This patch removes a special case.  Small programs
> > working differently from big programs is a special case.  Making them
> > all work the same is removing an element of surprise from someone
> > expecting things to work as documented.
> > 
> 
> Can we please get this resolved one way or the other?

As I've already said, I do not see this as a practical enough problem to
warrant special treatment. Do we really care about controlling the oom
behavior for tasks with <4MB of memory?

I fully agree that the current situation is not ideal. The whole
oom_score* API sucks but here we are with a user API that is
effectively impossible to fix properly.

-- 
Michal Hocko
SUSE Labs


Thread overview: 6+ messages
2021-07-01 12:54 [PATCH] oom_kill: oom_score_adj broken for processes with small memory usage minyard
2021-07-16  5:19 ` Michal Hocko
2021-07-16 12:25   ` Corey Minyard
2021-09-02 19:55     ` Andrew Morton
2021-09-03  1:52       ` Corey Minyard
2021-09-03  7:49       ` Michal Hocko
