All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-17 14:31 ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-02-17 14:31 UTC (permalink / raw)
  To: mhocko, akpm
  Cc: rientjes, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm,
	linux-kernel, Tetsuo Handa

oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
thread which returns oom_task_origin() == true. But it is possible
that such thread is marked as OOM-unkillable. In that case, the OOM
killer must not select such process.

Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
process because subsequent oom_badness() call will return 0, this
patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
if that process is marked as OOM-unkillable (regardless of
oom_task_origin()).

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Suggested-by: Michal Hocko <mhocko@kernel.org>
---
 mm/oom_kill.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7653055..cf87153 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 		if (!is_sysrq_oom(oc))
 			return OOM_SCAN_ABORT;
 	}
-	if (!task->mm)
+	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
 		return OOM_SCAN_CONTINUE;
 
 	/*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-17 14:31 ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-02-17 14:31 UTC (permalink / raw)
  To: mhocko, akpm
  Cc: rientjes, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm,
	linux-kernel, Tetsuo Handa

oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
thread which returns oom_task_origin() == true. But it is possible
that such thread is marked as OOM-unkillable. In that case, the OOM
killer must not select such process.

Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
process because subsequent oom_badness() call will return 0, this
patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
if that process is marked as OOM-unkillable (regardless of
oom_task_origin()).

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Suggested-by: Michal Hocko <mhocko@kernel.org>
---
 mm/oom_kill.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7653055..cf87153 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 		if (!is_sysrq_oom(oc))
 			return OOM_SCAN_ABORT;
 	}
-	if (!task->mm)
+	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
 		return OOM_SCAN_CONTINUE;
 
 	/*
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-17 14:31 ` Tetsuo Handa
@ 2016-02-17 14:48   ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-17 14:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, rientjes, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed 17-02-16 23:31:00, Tetsuo Handa wrote:
> oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> thread which returns oom_task_origin() == true. But it is possible
> that such thread is marked as OOM-unkillable. In that case, the OOM
> killer must not select such process.

As already pointed out swapoff or ksm run_store are the only users of
OOM_FLAG_ORIGIN and it would be insane to run them from an oom disabled
context. So I wouldn't care much about this part that much and consider
the patch to be more of a cleanup rather than a bug fix.

> Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> process because subsequent oom_badness() call will return 0, this
> patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> if that process is marked as OOM-unkillable (regardless of
> oom_task_origin()).
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Suggested-by: Michal Hocko <mhocko@kernel.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/oom_kill.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7653055..cf87153 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  		if (!is_sysrq_oom(oc))
>  			return OOM_SCAN_ABORT;
>  	}
> -	if (!task->mm)
> +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
>  		return OOM_SCAN_CONTINUE;
>  
>  	/*
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-17 14:48   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-17 14:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: akpm, rientjes, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed 17-02-16 23:31:00, Tetsuo Handa wrote:
> oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> thread which returns oom_task_origin() == true. But it is possible
> that such thread is marked as OOM-unkillable. In that case, the OOM
> killer must not select such process.

As already pointed out swapoff or ksm run_store are the only users of
OOM_FLAG_ORIGIN and it would be insane to run them from an oom disabled
context. So I wouldn't care much about this part that much and consider
the patch to be more of a cleanup rather than a bug fix.

> Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> process because subsequent oom_badness() call will return 0, this
> patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> if that process is marked as OOM-unkillable (regardless of
> oom_task_origin()).
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Suggested-by: Michal Hocko <mhocko@kernel.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/oom_kill.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7653055..cf87153 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  		if (!is_sysrq_oom(oc))
>  			return OOM_SCAN_ABORT;
>  	}
> -	if (!task->mm)
> +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
>  		return OOM_SCAN_CONTINUE;
>  
>  	/*
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-17 14:31 ` Tetsuo Handa
@ 2016-02-17 22:31   ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-17 22:31 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed, 17 Feb 2016, Tetsuo Handa wrote:

> oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> thread which returns oom_task_origin() == true. But it is possible
> that such thread is marked as OOM-unkillable. In that case, the OOM
> killer must not select such process.
> 
> Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> process because subsequent oom_badness() call will return 0, this
> patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> if that process is marked as OOM-unkillable (regardless of
> oom_task_origin()).
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> ---
>  mm/oom_kill.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7653055..cf87153 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  		if (!is_sysrq_oom(oc))
>  			return OOM_SCAN_ABORT;
>  	}
> -	if (!task->mm)
> +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
>  		return OOM_SCAN_CONTINUE;
>  
>  	/*

I'm getting multiple emails from you with the identical patch, something 
is definitely wacky in your toolchain.

Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
task_lock() for threads with memory attached, that's the purpose of 
finding the correct thread in oom_badness() and taking task_lock().  We 
aren't going to duplicate logic in several functions that all do the same 
thing.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-17 22:31   ` David Rientjes
  0 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-17 22:31 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed, 17 Feb 2016, Tetsuo Handa wrote:

> oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> thread which returns oom_task_origin() == true. But it is possible
> that such thread is marked as OOM-unkillable. In that case, the OOM
> killer must not select such process.
> 
> Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> process because subsequent oom_badness() call will return 0, this
> patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> if that process is marked as OOM-unkillable (regardless of
> oom_task_origin()).
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> ---
>  mm/oom_kill.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7653055..cf87153 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  		if (!is_sysrq_oom(oc))
>  			return OOM_SCAN_ABORT;
>  	}
> -	if (!task->mm)
> +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
>  		return OOM_SCAN_CONTINUE;
>  
>  	/*

I'm getting multiple emails from you with the identical patch, something 
is definitely wacky in your toolchain.

Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
task_lock() for threads with memory attached, that's the purpose of 
finding the correct thread in oom_badness() and taking task_lock().  We 
aren't going to duplicate logic in several functions that all do the same 
thing.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-17 22:31   ` David Rientjes
@ 2016-02-18  8:09     ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-18  8:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed 17-02-16 14:31:54, David Rientjes wrote:
> On Wed, 17 Feb 2016, Tetsuo Handa wrote:
> 
> > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> > thread which returns oom_task_origin() == true. But it is possible
> > that such thread is marked as OOM-unkillable. In that case, the OOM
> > killer must not select such process.
> > 
> > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> > process because subsequent oom_badness() call will return 0, this
> > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> > if that process is marked as OOM-unkillable (regardless of
> > oom_task_origin()).
> > 
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > ---
> >  mm/oom_kill.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 7653055..cf87153 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
> >  		if (!is_sysrq_oom(oc))
> >  			return OOM_SCAN_ABORT;
> >  	}
> > -	if (!task->mm)
> > +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> >  		return OOM_SCAN_CONTINUE;
> >  
> >  	/*
> 
> I'm getting multiple emails from you with the identical patch, something 
> is definitely wacky in your toolchain.
> 
> Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> task_lock() for threads with memory attached, that's the purpose of 
> finding the correct thread in oom_badness() and taking task_lock().  We 
> aren't going to duplicate logic in several functions that all do the same 
> thing.

Is the task_lock really necessary, though? E.g. oom_task_origin()
doesn't seem to depend on it for task->signal safety. If you are
referring to races with changing oom_score_adj does such a race matter
at all?

To me this looks like a reasonable cleanup because we _know_ that
OOM_SCORE_ADJ_MIN means OOM_SCAN_CONTINUE and do not really have to go
down to oom_badness to find that out. Or what am I missing?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-18  8:09     ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-18  8:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed 17-02-16 14:31:54, David Rientjes wrote:
> On Wed, 17 Feb 2016, Tetsuo Handa wrote:
> 
> > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> > thread which returns oom_task_origin() == true. But it is possible
> > that such thread is marked as OOM-unkillable. In that case, the OOM
> > killer must not select such process.
> > 
> > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> > process because subsequent oom_badness() call will return 0, this
> > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> > if that process is marked as OOM-unkillable (regardless of
> > oom_task_origin()).
> > 
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > ---
> >  mm/oom_kill.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 7653055..cf87153 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
> >  		if (!is_sysrq_oom(oc))
> >  			return OOM_SCAN_ABORT;
> >  	}
> > -	if (!task->mm)
> > +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> >  		return OOM_SCAN_CONTINUE;
> >  
> >  	/*
> 
> I'm getting multiple emails from you with the identical patch, something 
> is definitely wacky in your toolchain.
> 
> Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> task_lock() for threads with memory attached, that's the purpose of 
> finding the correct thread in oom_badness() and taking task_lock().  We 
> aren't going to duplicate logic in several functions that all do the same 
> thing.

Is the task_lock really necessary, though? E.g. oom_task_origin()
doesn't seem to depend on it for task->signal safety. If you are
referring to races with changing oom_score_adj does such a race matter
at all?

To me this looks like a reasonable cleanup because we _know_ that
OOM_SCORE_ADJ_MIN means OOM_SCAN_CONTINUE and do not really have to go
down to oom_badness to find that out. Or what am I missing?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-18  8:09     ` Michal Hocko
@ 2016-02-18 10:30       ` Tetsuo Handa
  -1 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-02-18 10:30 UTC (permalink / raw)
  To: mhocko, rientjes
  Cc: akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm,
	linux-kernel

Michal Hocko wrote:
> On Wed 17-02-16 14:31:54, David Rientjes wrote:
> > On Wed, 17 Feb 2016, Tetsuo Handa wrote:
> > 
> > > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> > > thread which returns oom_task_origin() == true. But it is possible
> > > that such thread is marked as OOM-unkillable. In that case, the OOM
> > > killer must not select such process.
> > > 
> > > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> > > process because subsequent oom_badness() call will return 0, this
> > > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> > > if that process is marked as OOM-unkillable (regardless of
> > > oom_task_origin()).
> > > 
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > > ---
> > >  mm/oom_kill.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 7653055..cf87153 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
> > >  		if (!is_sysrq_oom(oc))
> > >  			return OOM_SCAN_ABORT;
> > >  	}
> > > -	if (!task->mm)
> > > +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> > >  		return OOM_SCAN_CONTINUE;
> > >  
> > >  	/*
> > 
> > I'm getting multiple emails from you with the identical patch, something 
> > is definitely wacky in your toolchain.

Sorry, my operation error. I didn't know I can't do like below.

  git send-email --to="rcpt1 rcpt2" --cc="rcpt3 rcpt4 rcpt5" file.patch

Thus, I resent the identical patch. Not a toolchain problem.

> > 
> > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> > task_lock() for threads with memory attached, that's the purpose of 
> > finding the correct thread in oom_badness() and taking task_lock().  We 
> > aren't going to duplicate logic in several functions that all do the same 
> > thing.
> 
> Is the task_lock really necessary, though? E.g. oom_task_origin()
> doesn't seem to depend on it for task->signal safety. If you are
> referring to races with changing oom_score_adj does such a race matter
> at all?
> 
> To me this looks like a reasonable cleanup because we _know_ that
> OOM_SCORE_ADJ_MIN means OOM_SCAN_CONTINUE and do not really have to go
> down to oom_badness to find that out. Or what am I missing?
> 

That NACK will not matter if a draft patch shown bottom is acceptable.

I got a question about commit 9cbb78bb314360a8 "mm, memcg: introduce own
oom handler to iterate only over its own threads" while trying to kill
duplicated oom_unkillable_task() checks by merging oom_scan_process_thread()
into oom_badness().

Currently, oom_scan_process_thread() is doing

  if (oom_unkillable_task(p, NULL, oc->nodemask))
  	return OOM_SCAN_CONTINUE;

and oom_badness() is doing

  if (oom_unkillable_task(p, memcg, nodemask))
  	return 0;

.

For normal OOM case, out_of_memory() is calling

  oom_scan_process_thread(oc, p, totalpages)
  oom_badness(p, NULL, oc->nodemask, totalpages)

which is translated to

  if (oom_unkillable_task(p, NULL, oc->nodemask))
      return OOM_SCAN_CONTINUE;
  if (oom_unkillable_task(p, NULL, oc->nodemask))
      return 0;

.

But for memcg OOM case, mem_cgroup_out_of_memory() is calling

  oom_scan_process_thread(oc, p, totalpages)
  oom_badness(p, memcg, NULL, totalpages)

which is translated to

  if (oom_unkillable_task(p, NULL, NULL))
      return OOM_SCAN_CONTINUE;
  if (oom_unkillable_task(p, memcg, NULL))
      return 0;

.

Commit 9cbb78bb314360a8 changed oom_scan_process_thread() to
always pass memcg == NULL by removing memcg argument from
oom_scan_process_thread(). As a result, after that commit,
we are doing test_tsk_thread_flag(p, TIF_MEMDIE) check and
oom_task_origin(p) check between two oom_unkillable_task()
calls of memcg OOM case. Why don't we skip these checks by
passing memcg != NULL to first oom_unkillable_task() call?
Was this change by error?

----------
>From 36f79bc270858e93be75de2adae1afe757d91b94 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 18 Feb 2016 19:21:26 +0900
Subject: [PATCH] mm,oom: kill duplicated oom_unkillable_task() checks.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 fs/proc/base.c      |  2 +-
 include/linux/oom.h | 16 +++-------
 mm/memcontrol.c     | 23 ++++----------
 mm/oom_kill.c       | 91 +++++++++++++++++++++++------------------------------
 4 files changed, 51 insertions(+), 81 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d7c51ca..3020aa2 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -581,7 +581,7 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 
 	read_lock(&tasklist_lock);
 	if (pid_alive(task))
-		points = oom_badness(task, NULL, NULL, totalpages) *
+		points = oom_badness(task, NULL, NULL, totalpages, false) *
 						1000 / totalpages;
 	read_unlock(&tasklist_lock);
 	seq_printf(m, "%lu\n", points);
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 45993b8..b31467e 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -43,13 +43,6 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
-enum oom_scan_t {
-	OOM_SCAN_OK,		/* scan thread and find its badness */
-	OOM_SCAN_CONTINUE,	/* do not consider thread for oom kill */
-	OOM_SCAN_ABORT,		/* abort the iteration and return */
-	OOM_SCAN_SELECT,	/* always select this thread first */
-};
-
 /* Thread is the potential origin of an oom condition; kill first on oom */
 #define OOM_FLAG_ORIGIN		((__force oom_flags_t)0x1)
 
@@ -73,8 +66,10 @@ static inline bool oom_task_origin(const struct task_struct *p)
 extern void mark_oom_victim(struct task_struct *tsk);
 
 extern unsigned long oom_badness(struct task_struct *p,
-		struct mem_cgroup *memcg, const nodemask_t *nodemask,
-		unsigned long totalpages);
+				 struct mem_cgroup *memcg,
+				 struct oom_control *oc,
+				 unsigned long totalpages,
+				 bool check_exceptions);
 
 extern int oom_kills_count(void);
 extern void note_oom_kill(void);
@@ -86,9 +81,6 @@ extern void check_panic_on_oom(struct oom_control *oc,
 			       enum oom_constraint constraint,
 			       struct mem_cgroup *memcg);
 
-extern enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
-		struct task_struct *task, unsigned long totalpages);
-
 extern bool out_of_memory(struct oom_control *oc);
 
 extern void exit_oom_victim(struct task_struct *tsk);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae8b81c..0d422bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1248,7 +1248,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct mem_cgroup *iter;
 	unsigned long chosen_points = 0;
 	unsigned long totalpages;
-	unsigned int points = 0;
+	unsigned long points = 0;
 	struct task_struct *chosen = NULL;
 
 	mutex_lock(&oom_lock);
@@ -1271,28 +1271,17 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 		css_task_iter_start(&iter->css, &it);
 		while ((task = css_task_iter_next(&it))) {
-			switch (oom_scan_process_thread(&oc, task, totalpages)) {
-			case OOM_SCAN_SELECT:
-				if (chosen)
-					put_task_struct(chosen);
-				chosen = task;
-				chosen_points = ULONG_MAX;
-				get_task_struct(chosen);
-				/* fall through */
-			case OOM_SCAN_CONTINUE:
+			points = oom_badness(task, memcg, &oc, totalpages,
+					     true);
+			if (!points || points < chosen_points)
 				continue;
-			case OOM_SCAN_ABORT:
+			if (points == ULONG_MAX) {
 				css_task_iter_end(&it);
 				mem_cgroup_iter_break(memcg, iter);
 				if (chosen)
 					put_task_struct(chosen);
 				goto unlock;
-			case OOM_SCAN_OK:
-				break;
-			};
-			points = oom_badness(task, memcg, NULL, totalpages);
-			if (!points || points < chosen_points)
-				continue;
+			}
 			/* Prefer thread group leaders for display purposes */
 			if (points == chosen_points &&
 			    thread_group_leader(chosen))
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d7bb9c1..f426ce8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -155,21 +155,40 @@ static bool oom_unkillable_task(struct task_struct *p,
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
+ * @memcg: memory cgroup. NULL unless memcg OOM case.
+ * @oc: oom_control struct. NULL if /proc/pid/oom_score_adj case.
  * @totalpages: total present RAM allowed for page allocation
+ * @check_exceptions: whether to check for TIF_MEMDIE and oom_task_origin().
+ *
+ * Returns ULONG_MAX if @p is marked as OOM-victim.
+ * Returns ULONG_MAX - 1 if @p is marked as oom_task_origin().
+ * Returns 0 if @p is marked as OOM-unkillable.
+ * Returns integer between 1 and ULONG_MAX - 2 otherwise.
  *
  * The heuristic for determining which task to kill is made to be as simple and
  * predictable as possible.  The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
 unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
-			  const nodemask_t *nodemask, unsigned long totalpages)
+			  struct oom_control *oc, unsigned long totalpages,
+			  bool check_exceptions)
 {
 	long points;
 	long adj;
 
-	if (oom_unkillable_task(p, memcg, nodemask))
+	if (oom_unkillable_task(p, memcg, oc ? oc->nodemask : NULL))
 		return 0;
 
+	/*
+	 * This task already has access to memory reserves and is being killed.
+	 * Don't allow any other task to have access to the reserves.
+	 */
+	if (check_exceptions)
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			if (!is_sysrq_oom(oc))
+				return ULONG_MAX;
+		}
+
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 0;
@@ -181,6 +200,16 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	}
 
 	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then select it.
+	 */
+	if (check_exceptions)
+		if (oom_task_origin(p)) {
+			task_unlock(p);
+			return ULONG_MAX - 1;
+		}
+
+	/*
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
@@ -268,36 +297,6 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc,
 }
 #endif
 
-enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
-			struct task_struct *task, unsigned long totalpages)
-{
-	if (oom_unkillable_task(task, NULL, oc->nodemask))
-		return OOM_SCAN_CONTINUE;
-
-	/*
-	 * This task already has access to memory reserves and is being killed.
-	 * Don't allow any other task to have access to the reserves.
-	 */
-	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
-			return OOM_SCAN_ABORT;
-	}
-	if (!task->mm)
-		return OOM_SCAN_CONTINUE;
-
-	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
-	 */
-	if (oom_task_origin(task))
-		return OOM_SCAN_SELECT;
-
-	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
-		return OOM_SCAN_ABORT;
-
-	return OOM_SCAN_OK;
-}
-
 /*
  * Simple selection loop. We chose the process with the highest
  * number of 'points'.  Returns -1 on scan abort.
@@ -311,24 +310,14 @@ static struct task_struct *select_bad_process(struct oom_control *oc,
 
 	rcu_read_lock();
 	for_each_process_thread(g, p) {
-		unsigned int points;
-
-		switch (oom_scan_process_thread(oc, p, totalpages)) {
-		case OOM_SCAN_SELECT:
-			chosen = p;
-			chosen_points = ULONG_MAX;
-			/* fall through */
-		case OOM_SCAN_CONTINUE:
+		const unsigned long points = oom_badness(p, NULL, oc,
+							 totalpages, true);
+		if (!points || points < chosen_points)
 			continue;
-		case OOM_SCAN_ABORT:
+		if (points == ULONG_MAX) {
 			rcu_read_unlock();
 			return (struct task_struct *)(-1UL);
-		case OOM_SCAN_OK:
-			break;
-		};
-		points = oom_badness(p, NULL, oc->nodemask, totalpages);
-		if (!points || points < chosen_points)
-			continue;
+		}
 		/* Prefer thread group leaders for display purposes */
 		if (points == chosen_points && thread_group_leader(chosen))
 			continue;
@@ -679,7 +668,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	struct task_struct *child;
 	struct task_struct *t;
 	struct mm_struct *mm;
-	unsigned int victim_points = 0;
+	unsigned long victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
 	bool can_oom_reap = true;
@@ -712,15 +701,15 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	read_lock(&tasklist_lock);
 	for_each_thread(p, t) {
 		list_for_each_entry(child, &t->children, sibling) {
-			unsigned int child_points;
+			unsigned long child_points;
 
 			if (process_shares_mm(child, p->mm))
 				continue;
 			/*
 			 * oom_badness() returns 0 if the thread is unkillable
 			 */
-			child_points = oom_badness(child, memcg, oc->nodemask,
-								totalpages);
+			child_points = oom_badness(child, memcg, oc,
+						   totalpages, false);
 			if (child_points > victim_points) {
 				put_task_struct(victim);
 				victim = child;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-18 10:30       ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-02-18 10:30 UTC (permalink / raw)
  To: mhocko, rientjes
  Cc: akpm, mgorman, oleg, torvalds, hughd, andrea, riel, linux-mm,
	linux-kernel

Michal Hocko wrote:
> On Wed 17-02-16 14:31:54, David Rientjes wrote:
> > On Wed, 17 Feb 2016, Tetsuo Handa wrote:
> > 
> > > oom_scan_process_thread() returns OOM_SCAN_SELECT when there is a
> > > thread which returns oom_task_origin() == true. But it is possible
> > > that such thread is marked as OOM-unkillable. In that case, the OOM
> > > killer must not select such process.
> > > 
> > > Since it is meaningless to return OOM_SCAN_OK for OOM-unkillable
> > > process because subsequent oom_badness() call will return 0, this
> > > patch changes oom_scan_process_thread to return OOM_SCAN_CONTINUE
> > > if that process is marked as OOM-unkillable (regardless of
> > > oom_task_origin()).
> > > 
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Suggested-by: Michal Hocko <mhocko@kernel.org>
> > > ---
> > >  mm/oom_kill.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 7653055..cf87153 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -282,7 +282,7 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
> > >  		if (!is_sysrq_oom(oc))
> > >  			return OOM_SCAN_ABORT;
> > >  	}
> > > -	if (!task->mm)
> > > +	if (!task->mm || task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> > >  		return OOM_SCAN_CONTINUE;
> > >  
> > >  	/*
> > 
> > I'm getting multiple emails from you with the identical patch, something 
> > is definitely wacky in your toolchain.

Sorry, my operation error. I didn't know I can't do like below.

  git send-email --to="rcpt1 rcpt2" --cc="rcpt3 rcpt4 rcpt5" file.patch

Thus, I resent the identical patch. Not a toolchain problem.

> > 
> > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> > task_lock() for threads with memory attached, that's the purpose of 
> > finding the correct thread in oom_badness() and taking task_lock().  We 
> > aren't going to duplicate logic in several functions that all do the same 
> > thing.
> 
> Is the task_lock really necessary, though? E.g. oom_task_origin()
> doesn't seem to depend on it for task->signal safety. If you are
> referring to races with changing oom_score_adj does such a race matter
> at all?
> 
> To me this looks like a reasonable cleanup because we _know_ that
> OOM_SCORE_ADJ_MIN means OOM_SCAN_CONTINUE and do not really have to go
> down to oom_badness to find that out. Or what am I missing?
> 

That NACK will not matter if a draft patch shown bottom is acceptable.

I got a question about commit 9cbb78bb314360a8 "mm, memcg: introduce own
oom handler to iterate only over its own threads" while trying to kill
duplicated oom_unkillable_task() checks by merging oom_scan_process_thread()
into oom_badness().

Currently, oom_scan_process_thread() is doing

  if (oom_unkillable_task(p, NULL, oc->nodemask))
  	return OOM_SCAN_CONTINUE;

and oom_badness() is doing

  if (oom_unkillable_task(p, memcg, nodemask))
  	return 0;

.

For normal OOM case, out_of_memory() is calling

  oom_scan_process_thread(oc, p, totalpages)
  oom_badness(p, NULL, oc->nodemask, totalpages)

which is translated to

  if (oom_unkillable_task(p, NULL, oc->nodemask))
      return OOM_SCAN_CONTINUE;
  if (oom_unkillable_task(p, NULL, oc->nodemask))
      return 0;

.

But for memcg OOM case, mem_cgroup_out_of_memory() is calling

  oom_scan_process_thread(oc, p, totalpages)
  oom_badness(p, memcg, NULL, totalpages)

which is translated to

  if (oom_unkillable_task(p, NULL, NULL))
      return OOM_SCAN_CONTINUE;
  if (oom_unkillable_task(p, memcg, NULL))
      return 0;

.

Commit 9cbb78bb314360a8 changed oom_scan_process_thread() to
always pass memcg == NULL by removing memcg argument from
oom_scan_process_thread(). As a result, after that commit,
we are doing test_tsk_thread_flag(p, TIF_MEMDIE) check and
oom_task_origin(p) check between two oom_unkillable_task()
calls of memcg OOM case. Why don't we skip these checks by
passing memcg != NULL to first oom_unkillable_task() call?
Was this change by error?

----------
>From 36f79bc270858e93be75de2adae1afe757d91b94 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 18 Feb 2016 19:21:26 +0900
Subject: [PATCH] mm,oom: kill duplicated oom_unkillable_task() checks.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 fs/proc/base.c      |  2 +-
 include/linux/oom.h | 16 +++-------
 mm/memcontrol.c     | 23 ++++----------
 mm/oom_kill.c       | 91 +++++++++++++++++++++++------------------------------
 4 files changed, 51 insertions(+), 81 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d7c51ca..3020aa2 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -581,7 +581,7 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 
 	read_lock(&tasklist_lock);
 	if (pid_alive(task))
-		points = oom_badness(task, NULL, NULL, totalpages) *
+		points = oom_badness(task, NULL, NULL, totalpages, false) *
 						1000 / totalpages;
 	read_unlock(&tasklist_lock);
 	seq_printf(m, "%lu\n", points);
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 45993b8..b31467e 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -43,13 +43,6 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
-enum oom_scan_t {
-	OOM_SCAN_OK,		/* scan thread and find its badness */
-	OOM_SCAN_CONTINUE,	/* do not consider thread for oom kill */
-	OOM_SCAN_ABORT,		/* abort the iteration and return */
-	OOM_SCAN_SELECT,	/* always select this thread first */
-};
-
 /* Thread is the potential origin of an oom condition; kill first on oom */
 #define OOM_FLAG_ORIGIN		((__force oom_flags_t)0x1)
 
@@ -73,8 +66,10 @@ static inline bool oom_task_origin(const struct task_struct *p)
 extern void mark_oom_victim(struct task_struct *tsk);
 
 extern unsigned long oom_badness(struct task_struct *p,
-		struct mem_cgroup *memcg, const nodemask_t *nodemask,
-		unsigned long totalpages);
+				 struct mem_cgroup *memcg,
+				 struct oom_control *oc,
+				 unsigned long totalpages,
+				 bool check_exceptions);
 
 extern int oom_kills_count(void);
 extern void note_oom_kill(void);
@@ -86,9 +81,6 @@ extern void check_panic_on_oom(struct oom_control *oc,
 			       enum oom_constraint constraint,
 			       struct mem_cgroup *memcg);
 
-extern enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
-		struct task_struct *task, unsigned long totalpages);
-
 extern bool out_of_memory(struct oom_control *oc);
 
 extern void exit_oom_victim(struct task_struct *tsk);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae8b81c..0d422bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1248,7 +1248,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct mem_cgroup *iter;
 	unsigned long chosen_points = 0;
 	unsigned long totalpages;
-	unsigned int points = 0;
+	unsigned long points = 0;
 	struct task_struct *chosen = NULL;
 
 	mutex_lock(&oom_lock);
@@ -1271,28 +1271,17 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 		css_task_iter_start(&iter->css, &it);
 		while ((task = css_task_iter_next(&it))) {
-			switch (oom_scan_process_thread(&oc, task, totalpages)) {
-			case OOM_SCAN_SELECT:
-				if (chosen)
-					put_task_struct(chosen);
-				chosen = task;
-				chosen_points = ULONG_MAX;
-				get_task_struct(chosen);
-				/* fall through */
-			case OOM_SCAN_CONTINUE:
+			points = oom_badness(task, memcg, &oc, totalpages,
+					     true);
+			if (!points || points < chosen_points)
 				continue;
-			case OOM_SCAN_ABORT:
+			if (points == ULONG_MAX) {
 				css_task_iter_end(&it);
 				mem_cgroup_iter_break(memcg, iter);
 				if (chosen)
 					put_task_struct(chosen);
 				goto unlock;
-			case OOM_SCAN_OK:
-				break;
-			};
-			points = oom_badness(task, memcg, NULL, totalpages);
-			if (!points || points < chosen_points)
-				continue;
+			}
 			/* Prefer thread group leaders for display purposes */
 			if (points == chosen_points &&
 			    thread_group_leader(chosen))
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d7bb9c1..f426ce8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -155,21 +155,40 @@ static bool oom_unkillable_task(struct task_struct *p,
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
+ * @memcg: memory cgroup. NULL unless memcg OOM case.
+ * @oc: oom_control struct. NULL if /proc/pid/oom_score_adj case.
  * @totalpages: total present RAM allowed for page allocation
+ * @check_exceptions: whether to check for TIF_MEMDIE and oom_task_origin().
+ *
+ * Returns ULONG_MAX if @p is marked as OOM-victim.
+ * Returns ULONG_MAX - 1 if @p is marked as oom_task_origin().
+ * Returns 0 if @p is marked as OOM-unkillable.
+ * Returns integer between 1 and ULONG_MAX - 2 otherwise.
  *
  * The heuristic for determining which task to kill is made to be as simple and
  * predictable as possible.  The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
 unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
-			  const nodemask_t *nodemask, unsigned long totalpages)
+			  struct oom_control *oc, unsigned long totalpages,
+			  bool check_exceptions)
 {
 	long points;
 	long adj;
 
-	if (oom_unkillable_task(p, memcg, nodemask))
+	if (oom_unkillable_task(p, memcg, oc ? oc->nodemask : NULL))
 		return 0;
 
+	/*
+	 * This task already has access to memory reserves and is being killed.
+	 * Don't allow any other task to have access to the reserves.
+	 */
+	if (check_exceptions)
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			if (!is_sysrq_oom(oc))
+				return ULONG_MAX;
+		}
+
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 0;
@@ -181,6 +200,16 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	}
 
 	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then select it.
+	 */
+	if (check_exceptions)
+		if (oom_task_origin(p)) {
+			task_unlock(p);
+			return ULONG_MAX - 1;
+		}
+
+	/*
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
@@ -268,36 +297,6 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc,
 }
 #endif
 
-enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
-			struct task_struct *task, unsigned long totalpages)
-{
-	if (oom_unkillable_task(task, NULL, oc->nodemask))
-		return OOM_SCAN_CONTINUE;
-
-	/*
-	 * This task already has access to memory reserves and is being killed.
-	 * Don't allow any other task to have access to the reserves.
-	 */
-	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
-			return OOM_SCAN_ABORT;
-	}
-	if (!task->mm)
-		return OOM_SCAN_CONTINUE;
-
-	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
-	 */
-	if (oom_task_origin(task))
-		return OOM_SCAN_SELECT;
-
-	if (task_will_free_mem(task) && !is_sysrq_oom(oc))
-		return OOM_SCAN_ABORT;
-
-	return OOM_SCAN_OK;
-}
-
 /*
  * Simple selection loop. We chose the process with the highest
  * number of 'points'.  Returns -1 on scan abort.
@@ -311,24 +310,14 @@ static struct task_struct *select_bad_process(struct oom_control *oc,
 
 	rcu_read_lock();
 	for_each_process_thread(g, p) {
-		unsigned int points;
-
-		switch (oom_scan_process_thread(oc, p, totalpages)) {
-		case OOM_SCAN_SELECT:
-			chosen = p;
-			chosen_points = ULONG_MAX;
-			/* fall through */
-		case OOM_SCAN_CONTINUE:
+		const unsigned long points = oom_badness(p, NULL, oc,
+							 totalpages, true);
+		if (!points || points < chosen_points)
 			continue;
-		case OOM_SCAN_ABORT:
+		if (points == ULONG_MAX) {
 			rcu_read_unlock();
 			return (struct task_struct *)(-1UL);
-		case OOM_SCAN_OK:
-			break;
-		};
-		points = oom_badness(p, NULL, oc->nodemask, totalpages);
-		if (!points || points < chosen_points)
-			continue;
+		}
 		/* Prefer thread group leaders for display purposes */
 		if (points == chosen_points && thread_group_leader(chosen))
 			continue;
@@ -679,7 +668,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	struct task_struct *child;
 	struct task_struct *t;
 	struct mm_struct *mm;
-	unsigned int victim_points = 0;
+	unsigned long victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
 	bool can_oom_reap = true;
@@ -712,15 +701,15 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	read_lock(&tasklist_lock);
 	for_each_thread(p, t) {
 		list_for_each_entry(child, &t->children, sibling) {
-			unsigned int child_points;
+			unsigned long child_points;
 
 			if (process_shares_mm(child, p->mm))
 				continue;
 			/*
 			 * oom_badness() returns 0 if the thread is unkillable
 			 */
-			child_points = oom_badness(child, memcg, oc->nodemask,
-								totalpages);
+			child_points = oom_badness(child, memcg, oc,
+						   totalpages, false);
 			if (child_points > victim_points) {
 				put_task_struct(victim);
 				victim = child;
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-18 10:30       ` Tetsuo Handa
@ 2016-02-18 12:08         ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-18 12:08 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Thu 18-02-16 19:30:12, Tetsuo Handa wrote:
[...]
> Commit 9cbb78bb314360a8 changed oom_scan_process_thread() to
> always pass memcg == NULL by removing memcg argument from
> oom_scan_process_thread(). As a result, after that commit,
> we are doing test_tsk_thread_flag(p, TIF_MEMDIE) check and
> oom_task_origin(p) check between two oom_unkillable_task()
> calls of memcg OOM case. Why don't we skip these checks by
> passing memcg != NULL to first oom_unkillable_task() call?
> Was this change by error?

I am not really sure I understand your question.  The point is
that mem_cgroup_out_of_memory does for_each_mem_cgroup_tree which
means that only tasks from the given memcg hierarchy is checked and
oom_unkillable_task cares about memcg only for

        /* When mem_cgroup_out_of_memory() and p is not member of the group */
        if (memcg && !task_in_mem_cgroup(p, memcg))
                return true;

which is never true by definition. I guess we can safely remove the memcg
argument from oom_badness and oom_unkillable_task. At least from a quick
glance...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-18 12:08         ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-18 12:08 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Thu 18-02-16 19:30:12, Tetsuo Handa wrote:
[...]
> Commit 9cbb78bb314360a8 changed oom_scan_process_thread() to
> always pass memcg == NULL by removing memcg argument from
> oom_scan_process_thread(). As a result, after that commit,
> we are doing test_tsk_thread_flag(p, TIF_MEMDIE) check and
> oom_task_origin(p) check between two oom_unkillable_task()
> calls of memcg OOM case. Why don't we skip these checks by
> passing memcg != NULL to first oom_unkillable_task() call?
> Was this change by error?

I am not really sure I understand your question.  The point is
that mem_cgroup_out_of_memory does for_each_mem_cgroup_tree which
means that only tasks from the given memcg hierarchy is checked and
oom_unkillable_task cares about memcg only for

        /* When mem_cgroup_out_of_memory() and p is not member of the group */
        if (memcg && !task_in_mem_cgroup(p, memcg))
                return true;

which is never true by definition. I guess we can safely remove the memcg
argument from oom_badness and oom_unkillable_task. At least from a quick
glance...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-18 12:08         ` Michal Hocko
@ 2016-02-18 12:13           ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-18 12:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Thu 18-02-16 13:08:49, Michal Hocko wrote:
> I guess we can safely remove the memcg
> argument from oom_badness and oom_unkillable_task. At least from a quick
> glance...

No we cannot actually. oom_kill_process could select a child which is in
a different memcg in that case...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-18 12:13           ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-18 12:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Thu 18-02-16 13:08:49, Michal Hocko wrote:
> I guess we can safely remove the memcg
> argument from oom_badness and oom_unkillable_task. At least from a quick
> glance...

No we cannot actually. oom_kill_process could select a child which is in
a different memcg in that case...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-18 12:13           ` Michal Hocko
@ 2016-02-19 15:07             ` Tetsuo Handa
  -1 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-02-19 15:07 UTC (permalink / raw)
  To: mhocko
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

Michal Hocko wrote:
> On Thu 18-02-16 13:08:49, Michal Hocko wrote:
> > I guess we can safely remove the memcg
> > argument from oom_badness and oom_unkillable_task. At least from a quick
> > glance...
> 
> No we cannot actually. oom_kill_process could select a child which is in
> a different memcg in that case...

Then, don't we need to check whether processes sharing victim->mm in other
thread groups are in the same memcg when we walk the process list?

----------
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f426ce8..d05db31 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -762,8 +762,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			continue;
 		if (same_thread_group(p, victim))
 			continue;
-		if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p) ||
-		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+		if (oom_badness(p, memcg, oc, totalpages, false) == 0) {
 			/*
 			 * We cannot use oom_reaper for the mm shared by this
 			 * process because it wouldn't get killed and so the
----------

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-19 15:07             ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2016-02-19 15:07 UTC (permalink / raw)
  To: mhocko
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

Michal Hocko wrote:
> On Thu 18-02-16 13:08:49, Michal Hocko wrote:
> > I guess we can safely remove the memcg
> > argument from oom_badness and oom_unkillable_task. At least from a quick
> > glance...
> 
> No we cannot actually. oom_kill_process could select a child which is in
> a different memcg in that case...

Then, don't we need to check whether processes sharing victim->mm in other
thread groups are in the same memcg when we walk the process list?

----------
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f426ce8..d05db31 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -762,8 +762,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			continue;
 		if (same_thread_group(p, victim))
 			continue;
-		if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p) ||
-		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+		if (oom_badness(p, memcg, oc, totalpages, false) == 0) {
 			/*
 			 * We cannot use oom_reaper for the mm shared by this
 			 * process because it wouldn't get killed and so the
----------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-19 15:07             ` Tetsuo Handa
@ 2016-02-19 15:13               ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-19 15:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Sat 20-02-16 00:07:05, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 18-02-16 13:08:49, Michal Hocko wrote:
> > > I guess we can safely remove the memcg
> > > argument from oom_badness and oom_unkillable_task. At least from a quick
> > > glance...
> > 
> > No we cannot actually. oom_kill_process could select a child which is in
> > a different memcg in that case...
> 
> Then, don't we need to check whether processes sharing victim->mm in other
> thread groups are in the same memcg when we walk the process list?

memcg is bound to the mm not to the task. So all processes sharing the
mm after in the same memcg (from the memcg POV). See tast_struct::owner.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-19 15:13               ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-19 15:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: rientjes, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Sat 20-02-16 00:07:05, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 18-02-16 13:08:49, Michal Hocko wrote:
> > > I guess we can safely remove the memcg
> > > argument from oom_badness and oom_unkillable_task. At least from a quick
> > > glance...
> > 
> > No we cannot actually. oom_kill_process could select a child which is in
> > a different memcg in that case...
> 
> Then, don't we need to check whether processes sharing victim->mm in other
> thread groups are in the same memcg when we walk the process list?

memcg is bound to the mm not to the task. So all processes sharing the
mm after in the same memcg (from the memcg POV). See tast_struct::owner.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-18  8:09     ` Michal Hocko
@ 2016-02-23  1:06       ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-23  1:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Thu, 18 Feb 2016, Michal Hocko wrote:

> > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> > task_lock() for threads with memory attached, that's the purpose of 
> > finding the correct thread in oom_badness() and taking task_lock().  We 
> > aren't going to duplicate logic in several functions that all do the same 
> > thing.
> 
> Is the task_lock really necessary, though? E.g. oom_task_origin()
> doesn't seem to depend on it for task->signal safety. If you are
> referring to races with changing oom_score_adj does such a race matter
> at all?
> 

oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
factors in the setting of /proc/self/oom_score_adj to change that value.  
That is where OOM_SCORE_ADJ_MIN is enforced.  It is also needed in 
oom_badness() to determine whether a child process should be sacrificed 
for its parent.  We don't add duplicate logic everywhere if you want the 
code to be maintainable; the only exception would be for performance 
critical code which the oom killer most certainly is not.

I'm simply not entertaining any patch to the oom killer that duplicates 
code everywhere, increases its complexity, makes it grow in text size, and 
makes it more difficult to maintain.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-23  1:06       ` David Rientjes
  0 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-23  1:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Thu, 18 Feb 2016, Michal Hocko wrote:

> > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> > task_lock() for threads with memory attached, that's the purpose of 
> > finding the correct thread in oom_badness() and taking task_lock().  We 
> > aren't going to duplicate logic in several functions that all do the same 
> > thing.
> 
> Is the task_lock really necessary, though? E.g. oom_task_origin()
> doesn't seem to depend on it for task->signal safety. If you are
> referring to races with changing oom_score_adj does such a race matter
> at all?
> 

oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
factors in the setting of /proc/self/oom_score_adj to change that value.  
That is where OOM_SCORE_ADJ_MIN is enforced.  It is also needed in 
oom_badness() to determine whether a child process should be sacrificed 
for its parent.  We don't add duplicate logic everywhere if you want the 
code to be maintainable; the only exception would be for performance 
critical code which the oom killer most certainly is not.

I'm simply not entertaining any patch to the oom killer that duplicates 
code everywhere, increases its complexity, makes it grow in text size, and 
makes it more difficult to maintain.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-23  1:06       ` David Rientjes
@ 2016-02-23 12:34         ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-23 12:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Mon 22-02-16 17:06:29, David Rientjes wrote:
> On Thu, 18 Feb 2016, Michal Hocko wrote:
> 
> > > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> > > task_lock() for threads with memory attached, that's the purpose of 
> > > finding the correct thread in oom_badness() and taking task_lock().  We 
> > > aren't going to duplicate logic in several functions that all do the same 
> > > thing.
> > 
> > Is the task_lock really necessary, though? E.g. oom_task_origin()
> > doesn't seem to depend on it for task->signal safety. If you are
> > referring to races with changing oom_score_adj does such a race matter
> > at all?
> > 
> 
> oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
> factors in the setting of /proc/self/oom_score_adj to change that value.  
> That is where OOM_SCORE_ADJ_MIN is enforced. 

The question is whether the current placement of OOM_SCORE_ADJ_MIN
is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task
instead? Sure, checking oom_score_adj under task_lock inside oom_badness will
prevent from races but the question I raised previously was whether we
actually care about those races? When would it matter? Is it really
likely that the update happen during the oom killing? And if yes what
prevents from the update happening _after_ the check?

If for nothing else oom_unkillable_task would be complete that way. E.g.
sysctl_oom_kill_allocating_task has to check for OOM_SCORE_ADJ_MIN
because it doesn't rely on oom_badness and that alone would suggest
that the check is misplaced.

That being said I do not really care that much. I would just find it
neater to have oom_unkillable_task that would really consider all the
cases where the OOM should ignore a task.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-23 12:34         ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-23 12:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Mon 22-02-16 17:06:29, David Rientjes wrote:
> On Thu, 18 Feb 2016, Michal Hocko wrote:
> 
> > > Anyway, this is NACK'd since task->signal->oom_score_adj is checked under 
> > > task_lock() for threads with memory attached, that's the purpose of 
> > > finding the correct thread in oom_badness() and taking task_lock().  We 
> > > aren't going to duplicate logic in several functions that all do the same 
> > > thing.
> > 
> > Is the task_lock really necessary, though? E.g. oom_task_origin()
> > doesn't seem to depend on it for task->signal safety. If you are
> > referring to races with changing oom_score_adj does such a race matter
> > at all?
> > 
> 
> oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
> factors in the setting of /proc/self/oom_score_adj to change that value.  
> That is where OOM_SCORE_ADJ_MIN is enforced. 

The question is whether the current placement of OOM_SCORE_ADJ_MIN
is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task
instead? Sure, checking oom_score_adj under task_lock inside oom_badness will
prevent from races but the question I raised previously was whether we
actually care about those races? When would it matter? Is it really
likely that the update happen during the oom killing? And if yes what
prevents from the update happening _after_ the check?

If for nothing else oom_unkillable_task would be complete that way. E.g.
sysctl_oom_kill_allocating_task has to check for OOM_SCORE_ADJ_MIN
because it doesn't rely on oom_badness and that alone would suggest
that the check is misplaced.

That being said I do not really care that much. I would just find it
neater to have oom_unkillable_task that would really consider all the
cases where the OOM should ignore a task.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-23 12:34         ` Michal Hocko
@ 2016-02-23 22:33           ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-23 22:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Tue, 23 Feb 2016, Michal Hocko wrote:

> > oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
> > factors in the setting of /proc/self/oom_score_adj to change that value.  
> > That is where OOM_SCORE_ADJ_MIN is enforced. 
> 
> The question is whether the current placement of OOM_SCORE_ADJ_MIN
> is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task
> instead?

oom_unkillable_task() deals with the type of task it is (init or kthread) 
or being ineligible due to the memcg and cpuset placement.  We want to 
exclude them from consideration and also suppress them from the task dump 
in the kernel log.  We don't want to suppress oom disabled processes, we 
really want to know their rss, for example.  It could be renamed 
is_ineligible_task().

> Sure, checking oom_score_adj under task_lock inside oom_badness will
> prevent from races but the question I raised previously was whether we
> actually care about those races? When would it matter? Is it really
> likely that the update happen during the oom killing? And if yes what
> prevents from the update happening _after_ the check?
> 

It's not necessarily to take task_lock(), but find_lock_task_mm() is the 
means we have to iterate threads to find any with memory attached.  We 
need that logic in oom_badness() to avoid racing with threads that have 
entered exit_mm().  It's possible for a thread to have a non-NULL ->mm in 
oom_scan_process_thread(), the thread enters exit_mm() without kill, and 
oom_badness() can still find it to be eligible because other threads have 
not exited.  We still want to issue a kill to this process and task_lock() 
protects the setting of task->mm to NULL: don't consider it to be a race 
in setting oom_score_adj, consider it to be a race in unmapping (but not 
freeing) memory in th exit path.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-23 22:33           ` David Rientjes
  0 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-23 22:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Tue, 23 Feb 2016, Michal Hocko wrote:

> > oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
> > factors in the setting of /proc/self/oom_score_adj to change that value.  
> > That is where OOM_SCORE_ADJ_MIN is enforced. 
> 
> The question is whether the current placement of OOM_SCORE_ADJ_MIN
> is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task
> instead?

oom_unkillable_task() deals with the type of task it is (init or kthread) 
or being ineligible due to the memcg and cpuset placement.  We want to 
exclude them from consideration and also suppress them from the task dump 
in the kernel log.  We don't want to suppress oom disabled processes, we 
really want to know their rss, for example.  It could be renamed 
is_ineligible_task().

> Sure, checking oom_score_adj under task_lock inside oom_badness will
> prevent from races but the question I raised previously was whether we
> actually care about those races? When would it matter? Is it really
> likely that the update happen during the oom killing? And if yes what
> prevents from the update happening _after_ the check?
> 

It's not necessarily to take task_lock(), but find_lock_task_mm() is the 
means we have to iterate threads to find any with memory attached.  We 
need that logic in oom_badness() to avoid racing with threads that have 
entered exit_mm().  It's possible for a thread to have a non-NULL ->mm in 
oom_scan_process_thread(), the thread enters exit_mm() without kill, and 
oom_badness() can still find it to be eligible because other threads have 
not exited.  We still want to issue a kill to this process and task_lock() 
protects the setting of task->mm to NULL: don't consider it to be a race 
in setting oom_score_adj, consider it to be a race in unmapping (but not 
freeing) memory in th exit path.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-23 22:33           ` David Rientjes
@ 2016-02-24 10:05             ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-24 10:05 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Tue 23-02-16 14:33:01, David Rientjes wrote:
> On Tue, 23 Feb 2016, Michal Hocko wrote:
> 
> > > oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
> > > factors in the setting of /proc/self/oom_score_adj to change that value.  
> > > That is where OOM_SCORE_ADJ_MIN is enforced. 
> > 
> > The question is whether the current placement of OOM_SCORE_ADJ_MIN
> > is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task
> > instead?
> 
> oom_unkillable_task() deals with the type of task it is (init or kthread) 
> or being ineligible due to the memcg and cpuset placement.

Yes and OOM disabled is yet another condition.

> We want to 
> exclude them from consideration and also suppress them from the task dump 
> in the kernel log.  We don't want to suppress oom disabled processes, we 
> really want to know their rss, for example.

Hmm, is it really helpful though? What would you deduce from seeing a
large rss an OOM_SCORE_ADJ_MIN task? Misconfigured system? There must
have been a reason to mark the task that way in the first place so you
can hardly do anything about it. Moreover you can deduce the same from
the available information.

I would even argue that displaying OOM_SCORE_ADJ_MIN might be a bit
counterproductive because you have to filter them out when looking at
the listing.

> It could be renamed is_ineligible_task().

That wouldn't really help imho because OOM_SCORE_ADJ_MIN is an
uneligible task.

> > Sure, checking oom_score_adj under task_lock inside oom_badness will
> > prevent from races but the question I raised previously was whether we
> > actually care about those races? When would it matter? Is it really
> > likely that the update happen during the oom killing? And if yes what
> > prevents from the update happening _after_ the check?
> > 
> 
> It's not necessarily to take task_lock(), but find_lock_task_mm() is the 
> means we have to iterate threads to find any with memory attached.  We 
> need that logic in oom_badness() to avoid racing with threads that have 
> entered exit_mm().  It's possible for a thread to have a non-NULL ->mm in 
> oom_scan_process_thread(), the thread enters exit_mm() without kill, and 
> oom_badness() can still find it to be eligible because other threads have 
> not exited.  We still want to issue a kill to this process and task_lock() 
> protects the setting of task->mm to NULL: don't consider it to be a race 
> in setting oom_score_adj, consider it to be a race in unmapping (but not 
> freeing) memory in th exit path.

I am confused now. This all is true but it is independent on OOM_SCORE_ADJ_MIN
check? The check is per signal_struct so checking all the threads will
not change anything.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-24 10:05             ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2016-02-24 10:05 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Tue 23-02-16 14:33:01, David Rientjes wrote:
> On Tue, 23 Feb 2016, Michal Hocko wrote:
> 
> > > oom_badness() ranges from 0 (don't kill) to 1000 (please kill).  It 
> > > factors in the setting of /proc/self/oom_score_adj to change that value.  
> > > That is where OOM_SCORE_ADJ_MIN is enforced. 
> > 
> > The question is whether the current placement of OOM_SCORE_ADJ_MIN
> > is appropriate. Wouldn't it make more sense to check it in oom_unkillable_task
> > instead?
> 
> oom_unkillable_task() deals with the type of task it is (init or kthread) 
> or being ineligible due to the memcg and cpuset placement.

Yes and OOM disabled is yet another condition.

> We want to 
> exclude them from consideration and also suppress them from the task dump 
> in the kernel log.  We don't want to suppress oom disabled processes, we 
> really want to know their rss, for example.

Hmm, is it really helpful though? What would you deduce from seeing a
large rss an OOM_SCORE_ADJ_MIN task? Misconfigured system? There must
have been a reason to mark the task that way in the first place so you
can hardly do anything about it. Moreover you can deduce the same from
the available information.

I would even argue that displaying OOM_SCORE_ADJ_MIN might be a bit
counterproductive because you have to filter them out when looking at
the listing.

> It could be renamed is_ineligible_task().

That wouldn't really help imho because OOM_SCORE_ADJ_MIN is an
uneligible task.

> > Sure, checking oom_score_adj under task_lock inside oom_badness will
> > prevent from races but the question I raised previously was whether we
> > actually care about those races? When would it matter? Is it really
> > likely that the update happen during the oom killing? And if yes what
> > prevents from the update happening _after_ the check?
> > 
> 
> It's not necessarily to take task_lock(), but find_lock_task_mm() is the 
> means we have to iterate threads to find any with memory attached.  We 
> need that logic in oom_badness() to avoid racing with threads that have 
> entered exit_mm().  It's possible for a thread to have a non-NULL ->mm in 
> oom_scan_process_thread(), the thread enters exit_mm() without kill, and 
> oom_badness() can still find it to be eligible because other threads have 
> not exited.  We still want to issue a kill to this process and task_lock() 
> protects the setting of task->mm to NULL: don't consider it to be a race 
> in setting oom_score_adj, consider it to be a race in unmapping (but not 
> freeing) memory in th exit path.

I am confused now. This all is true but it is independent on OOM_SCORE_ADJ_MIN
check? The check is per signal_struct so checking all the threads will
not change anything.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
  2016-02-24 10:05             ` Michal Hocko
@ 2016-02-24 21:36               ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-24 21:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed, 24 Feb 2016, Michal Hocko wrote:

> Hmm, is it really helpful though? What would you deduce from seeing a
> large rss an OOM_SCORE_ADJ_MIN task? Misconfigured system? There must
> have been a reason to mark the task that way in the first place so you
> can hardly do anything about it. Moreover you can deduce the same from
> the available information.
> 

Users run processes that are vital to the machine with OOM_SCORE_ADJ_MIN.  
This does not make them immune to having memory leaks that caused the oom 
condition, and identifying that has triaged many bugs in the past.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable.
@ 2016-02-24 21:36               ` David Rientjes
  0 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2016-02-24 21:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, akpm, mgorman, oleg, torvalds, hughd, andrea, riel,
	linux-mm, linux-kernel

On Wed, 24 Feb 2016, Michal Hocko wrote:

> Hmm, is it really helpful though? What would you deduce from seeing a
> large rss an OOM_SCORE_ADJ_MIN task? Misconfigured system? There must
> have been a reason to mark the task that way in the first place so you
> can hardly do anything about it. Moreover you can deduce the same from
> the available information.
> 

Users run processes that are vital to the machine with OOM_SCORE_ADJ_MIN.  
This does not make them immune to having memory leaks that caused the oom 
condition, and identifying that has triaged many bugs in the past.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-02-24 21:36 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-17 14:31 [PATCH v2] mm,oom: exclude oom_task_origin processes if they are OOM-unkillable Tetsuo Handa
2016-02-17 14:31 ` Tetsuo Handa
2016-02-17 14:48 ` Michal Hocko
2016-02-17 14:48   ` Michal Hocko
2016-02-17 22:31 ` David Rientjes
2016-02-17 22:31   ` David Rientjes
2016-02-18  8:09   ` Michal Hocko
2016-02-18  8:09     ` Michal Hocko
2016-02-18 10:30     ` Tetsuo Handa
2016-02-18 10:30       ` Tetsuo Handa
2016-02-18 12:08       ` Michal Hocko
2016-02-18 12:08         ` Michal Hocko
2016-02-18 12:13         ` Michal Hocko
2016-02-18 12:13           ` Michal Hocko
2016-02-19 15:07           ` Tetsuo Handa
2016-02-19 15:07             ` Tetsuo Handa
2016-02-19 15:13             ` Michal Hocko
2016-02-19 15:13               ` Michal Hocko
2016-02-23  1:06     ` David Rientjes
2016-02-23  1:06       ` David Rientjes
2016-02-23 12:34       ` Michal Hocko
2016-02-23 12:34         ` Michal Hocko
2016-02-23 22:33         ` David Rientjes
2016-02-23 22:33           ` David Rientjes
2016-02-24 10:05           ` Michal Hocko
2016-02-24 10:05             ` Michal Hocko
2016-02-24 21:36             ` David Rientjes
2016-02-24 21:36               ` David Rientjes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.