* [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
@ 2016-04-12  9:19 ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-04-12  9:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Oleg Nesterov, David Rientjes, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

task_will_free_mem is a misnomer for what is really a more complex
PF_EXITING test used for an early break-out from the oom killer: it is
believed that such a task will release its memory shortly, so we do not
have to select an oom victim and perform a disruptive action.

Currently we make sure that the given task is not participating in the
core dumping because it might get blocked for a long time - see
d003f371b270 ("oom: don't assume that a coredumping thread will exit
soon").

The check can still do better though. We shouldn't consider the task
unless the whole thread group is going down. This is rather unlikely
but not impossible. A single exiting thread would leave the whole
address space behind to the rest of the thread group rather than free
it. If we are really unlucky, it might also get stuck on the exit path,
keep its TIF_MEMDIE, and so block the oom killer.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
I hope I got it right, but I would really appreciate it if Oleg found
some time to double-check after me. The fix is more cosmetic than
anything else, but I guess it is worth it.

Thanks!

 include/linux/oom.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 628a43242a34..b09c7dc523ff 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -102,13 +102,24 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
 
 static inline bool task_will_free_mem(struct task_struct *task)
 {
+	struct signal_struct *sig = task->signal;
+
 	/*
 	 * A coredumping process may sleep for an extended period in exit_mm(),
 	 * so the oom killer cannot assume that the process will promptly exit
 	 * and release memory.
 	 */
-	return (task->flags & PF_EXITING) &&
-		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
+	if (sig->flags & SIGNAL_GROUP_COREDUMP)
+		return false;
+
+	if (!(task->flags & PF_EXITING))
+		return false;
+
+	/* Make sure that the whole thread group is going down */
+	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
+		return false;
+
+	return true;
 }
 
 /* sysctls */
-- 
2.8.0.rc3



* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-12  9:19 ` Michal Hocko
@ 2016-04-13 11:04   ` Tetsuo Handa
  -1 siblings, 0 replies; 24+ messages in thread
From: Tetsuo Handa @ 2016-04-13 11:04 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Oleg Nesterov, David Rientjes, linux-mm, LKML, Michal Hocko

On 2016/04/12 18:19, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> task_will_free_mem is a misnomer for a more complex PF_EXITING test
> for early break out from the oom killer because it is believed that
> such a task would release its memory shortly and so we do not have
> to select an oom victim and perform a disruptive action.
> 
> Currently we make sure that the given task is not participating in the
> core dumping because it might get blocked for a long time - see
> d003f371b270 ("oom: don't assume that a coredumping thread will exit
> soon").
> 
> The check can still do better though. We shouldn't consider the task
> unless the whole thread group is going down. This is rather unlikely
> but not impossible. A single exiting thread would surely leave all the
> address space behind. If we are really unlucky it might get stuck on the
> exit path and keep its TIF_MEMDIE and so block the oom killer.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> I hope I got it right but I would really appreciate if Oleg found some
> time and double checked after me. The fix is more cosmetic than anything
> else but I guess it is worth it.

I don't know what

    fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
    out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.

in commit d003f371b270 is saying (how SIGNAL_GROUP_COREDUMP can make
fatal_signal_pending() true when fatal_signal_pending() is defined as

  static inline int __fatal_signal_pending(struct task_struct *p)
  {
  	return unlikely(sigismember(&p->pending.signal, SIGKILL));
  }

  static inline int fatal_signal_pending(struct task_struct *p)
  {
  	return signal_pending(p) && __fatal_signal_pending(p);
  }

which does not check SIGNAL_GROUP_COREDUMP). But I think that playing
with racy conditions such as the setting of TIF_MEMDIE is a bad direction.

The most disruptive action is not to select an OOM victim when we need to
select an OOM victim (which is known as the OOM livelock). Do you agree?

The least disruptive action is not to select an OOM victim when we don't
need to select an OOM victim (which is known as disabling the OOM killer).
Do you agree?

If you can agree on both, we have a chance to make the OOM killer less
disruptive by using bounded waiting.

Since commit 6a618957ad17 ("mm: oom_kill: don't ignore oom score on exiting
tasks") was merged before your OOM detection rework, the explanation

    We've tried direct reclaim at least 15 times by the time we decide
    the system is OOM

in that commit has become puzzling. But the reason I proposed that change
is that we would hit the OOM livelock if we waited unconditionally
( http://lkml.kernel.org/r/20160217143917.GP29196@dhcp22.suse.cz ).
Had we accepted bounded waiting, we would not have needed that change.

Also, if you think deleting the shortcuts would be more disruptive, we
do not need to delete them once we accept bounded waiting.

I believe that the preferred way (from the point of view of avoiding
disruptive action where possible) is to wait for a bounded amount of time
for TIF_MEMDIE threads to release their mm, rather than play with racy
conditions around setting TIF_MEMDIE.

> 
> Thanks!
> 
>  include/linux/oom.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 628a43242a34..b09c7dc523ff 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -102,13 +102,24 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>  
>  static inline bool task_will_free_mem(struct task_struct *task)
>  {
> +	struct signal_struct *sig = task->signal;
> +
>  	/*
>  	 * A coredumping process may sleep for an extended period in exit_mm(),
>  	 * so the oom killer cannot assume that the process will promptly exit
>  	 * and release memory.
>  	 */
> -	return (task->flags & PF_EXITING) &&
> -		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
> +	if (sig->flags & SIGNAL_GROUP_COREDUMP)
> +		return false;
> +
> +	if (!(task->flags & PF_EXITING))
> +		return false;
> +
> +	/* Make sure that the whole thread group is going down */
> +	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> +		return false;

The fact that the whole thread group is going down does not guarantee
that we will send SIGKILL to other thread groups sharing the same memory,
one of which may be holding mmap_sem for write, does it?

> +
> +	return true;
>  }
>  
>  /* sysctls */
> 



* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-13 11:04   ` Tetsuo Handa
@ 2016-04-13 13:08     ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-04-13 13:08 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Oleg Nesterov, David Rientjes, linux-mm, LKML

On Wed 13-04-16 20:04:54, Tetsuo Handa wrote:
> On 2016/04/12 18:19, Michal Hocko wrote:
[...]
> > Hi,
> > I hope I got it right but I would really appreciate if Oleg found some
> > time and double checked after me. The fix is more cosmetic than anything
> > else but I guess it is worth it.
> 
> I don't know what
> 
>     fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
>     out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
> 
> in commit d003f371b270 is saying (how SIGNAL_GROUP_COREDUMP can make
> fatal_signal_pending() true when fatal_signal_pending() is defined as

I guess this is about zap_process(), but Oleg would be the right person
to clarify. Anyway, I fail to see how this is related to this particular
patch.

[...]

> > diff --git a/include/linux/oom.h b/include/linux/oom.h
> > index 628a43242a34..b09c7dc523ff 100644
> > --- a/include/linux/oom.h
> > +++ b/include/linux/oom.h
> > @@ -102,13 +102,24 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
> >  
> >  static inline bool task_will_free_mem(struct task_struct *task)
> >  {
> > +	struct signal_struct *sig = task->signal;
> > +
> >  	/*
> >  	 * A coredumping process may sleep for an extended period in exit_mm(),
> >  	 * so the oom killer cannot assume that the process will promptly exit
> >  	 * and release memory.
> >  	 */
> > -	return (task->flags & PF_EXITING) &&
> > -		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
> > +	if (sig->flags & SIGNAL_GROUP_COREDUMP)
> > +		return false;
> > +
> > +	if (!(task->flags & PF_EXITING))
> > +		return false;
> > +
> > +	/* Make sure that the whole thread group is going down */
> > +	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> > +		return false;
> 
> The whole thread group is going down does not mean we make sure that
> we will send SIGKILL to other thread groups sharing the same memory which
> is possibly holding mmap_sem for write, does it?

And the patch description doesn't say anything about processes sharing
mm. This is supposed to be a minor fix of an obviously suboptimal
behavior of task_will_free_mem. Can we stick to the proposed patch,
please?

If we really do care about processes sharing an mm _that_much_ then it
should be handled in a separate patch.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-13 13:08     ` Michal Hocko
@ 2016-04-13 13:27       ` Tetsuo Handa
  -1 siblings, 0 replies; 24+ messages in thread
From: Tetsuo Handa @ 2016-04-13 13:27 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, oleg, rientjes, linux-mm, linux-kernel

Michal Hocko wrote:
> > The whole thread group is going down does not mean we make sure that
> > we will send SIGKILL to other thread groups sharing the same memory which
> > is possibly holding mmap_sem for write, does it?
> 
> And the patch description doesn't say anything about processes sharing
> mm. This is supposed to be a minor fix of an obviously suboptimal
> behavior of task_will_free_mem. Can we stick to the proposed patch,
> please?
> 
> If we really do care about processes sharing mm _that_much_ then it
> should be handled in the separate patch.

I do care. The OOM reaper cannot work unless SIGKILL is sent to a thread
which is holding mmap_sem for write. Thus, sending SIGKILL to all thread
groups sharing the mm is needed by your down_write_killable(&mm->mmap_sem)
changes. Like I wrote at
http://lkml.kernel.org/r/201604092300.BDI39040.FFSQLJOMHOOVtF@I-love.SAKURA.ne.jp ,
we cannot fix that problem unless you accept the slowpath.

I don't like that you don't explain your approach for handling the
slowpath. If you explain it and I agree with it, I will also agree with
the proposed patches.



* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-13 13:27       ` Tetsuo Handa
@ 2016-04-13 13:45         ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-04-13 13:45 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, oleg, rientjes, linux-mm, linux-kernel

On Wed 13-04-16 22:27:52, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > The whole thread group is going down does not mean we make sure that
> > > we will send SIGKILL to other thread groups sharing the same memory which
> > > is possibly holding mmap_sem for write, does it?
> > 
> > And the patch description doesn't say anything about processes sharing
> > mm. This is supposed to be a minor fix of an obviously suboptimal
> > behavior of task_will_free_mem. Can we stick to the proposed patch,
> > please?
> > 
> > If we really do care about processes sharing mm _that_much_ then it
> > should be handled in the separate patch.
> 
> I do care.

Then feel free to post a patch. I believe such a change should be
handled in a separate patch. I have intentionally laid out the code
in a way that allows further checks to be added easily.

Separate processes sharing the same mm have a lower priority for me
because I do not know of any recent userspace which uses this strange
threading model - or do you have anything specific in mind that would
make it more realistic? We will get to this eventually.

> The OOM reaper cannot work unless SIGKILL is sent to a thread
> which is holding mmap_sem for write. Thus, sending SIGKILL to all thread
> groups sharing the mm is needed by your down_write_killable(&mm->mmap_sem)
> changes. Like I wrote at
> http://lkml.kernel.org/r/201604092300.BDI39040.FFSQLJOMHOOVtF@I-love.SAKURA.ne.jp ,
> we cannot fix that problem unless you accept the slowpath.
> 
> I don't like you don't explain your approach for handling the slowpath.
> If you explain your approach for handling the slowpath and I agree on
> your approach, I will also agree on the proposed patches.

I would much appreciate it if you _stopped_ conflating different things.
This just generates a lot of noise and slows the overall progress.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-12  9:19 ` Michal Hocko
@ 2016-04-26 13:57   ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-04-26 13:57 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov; +Cc: David Rientjes, linux-mm, LKML

On Tue 12-04-16 11:19:16, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> task_will_free_mem is a misnomer for a more complex PF_EXITING test
> for early break out from the oom killer because it is believed that
> such a task would release its memory shortly and so we do not have
> to select an oom victim and perform a disruptive action.
> 
> Currently we make sure that the given task is not participating in the
> core dumping because it might get blocked for a long time - see
> d003f371b270 ("oom: don't assume that a coredumping thread will exit
> soon").
> 
> The check can still do better though. We shouldn't consider the task
> unless the whole thread group is going down. This is rather unlikely
> but not impossible. A single exiting thread would surely leave all the
> address space behind. If we are really unlucky it might get stuck on the
> exit path and keep its TIF_MEMDIE and so block the oom killer.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> I hope I got it right but I would really appreciate if Oleg found some
> time and double checked after me. The fix is more cosmetic than anything
> else but I guess it is worth it.

ping...

> 
> Thanks!
> 
>  include/linux/oom.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 628a43242a34..b09c7dc523ff 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -102,13 +102,24 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>  
>  static inline bool task_will_free_mem(struct task_struct *task)
>  {
> +	struct signal_struct *sig = task->signal;
> +
>  	/*
>  	 * A coredumping process may sleep for an extended period in exit_mm(),
>  	 * so the oom killer cannot assume that the process will promptly exit
>  	 * and release memory.
>  	 */
> -	return (task->flags & PF_EXITING) &&
> -		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
> +	if (sig->flags & SIGNAL_GROUP_COREDUMP)
> +		return false;
> +
> +	if (!(task->flags & PF_EXITING))
> +		return false;
> +
> +	/* Make sure that the whole thread group is going down */
> +	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> +		return false;
> +
> +	return true;
>  }
>  
>  /* sysctls */
> -- 
> 2.8.0.rc3
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-13 13:08     ` Michal Hocko
@ 2016-05-17 18:06       ` Oleg Nesterov
  -1 siblings, 0 replies; 24+ messages in thread
From: Oleg Nesterov @ 2016-05-17 18:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, Andrew Morton, David Rientjes, linux-mm, LKML

On 04/13, Michal Hocko wrote:
>
> On Wed 13-04-16 20:04:54, Tetsuo Handa wrote:
> > On 2016/04/12 18:19, Michal Hocko wrote:
> [...]
> > > Hi,
> > > I hope I got it right but I would really appreciate if Oleg found some
> > > time and double checked after me. The fix is more cosmetic than anything
> > > else but I guess it is worth it.
> >
> > I don't know what
> >
> >     fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
> >     out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
> >
> > in commit d003f371b270 is saying (how SIGNAL_GROUP_COREDUMP can make
> > fatal_signal_pending() true when fatal_signal_pending() is defined as
>
> I guess this is about zap_process() but Oleg would be more appropriate
> to clarify.


Yes, exactly, the dumper sends SIGKILL to other CLONE_THREAD and/or CLONE_VM
threads.

so I think the patch is fine, but let me write another email...

Oleg.

^ permalink raw reply	[flat|nested] 24+ messages in thread


* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-12  9:19 ` Michal Hocko
@ 2016-05-17 18:42   ` Oleg Nesterov
  -1 siblings, 0 replies; 24+ messages in thread
From: Oleg Nesterov @ 2016-05-17 18:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, David Rientjes, linux-mm, LKML, Michal Hocko

On 04/12, Michal Hocko wrote:
>
> We shouldn't consider the task
> unless the whole thread group is going down.

Yes, agreed. I'd even say that oom-killer should never look at individual
task/threads, it should work with mm's. And one of the big mistakes (imo)
was the s/for_each_process/for_each_thread/ change in select_bad_process()
a while ago.

Michal, I won't even try to actually review this patch, I lost any hope
of understanding the OOM killer long ago ;) But I do agree with this
change, we obviously should not rely on PF_EXITING.

>  static inline bool task_will_free_mem(struct task_struct *task)
>  {
> +	struct signal_struct *sig = task->signal;
> +
>  	/*
>  	 * A coredumping process may sleep for an extended period in exit_mm(),
>  	 * so the oom killer cannot assume that the process will promptly exit
>  	 * and release memory.
>  	 */
> -	return (task->flags & PF_EXITING) &&
> -		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
> +	if (sig->flags & SIGNAL_GROUP_COREDUMP)
> +		return false;
> +
> +	if (!(task->flags & PF_EXITING))
> +		return false;
> +
> +	/* Make sure that the whole thread group is going down */
> +	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> +		return false;
> +
> +	return true;

So this looks certainly better to me, but perhaps it should do

	if (SIGNAL_GROUP_COREDUMP)
		return false;

	if (SIGNAL_GROUP_EXIT)
		return true;

	if (thread_group_empty() && PF_EXITING)
		return true;

	return false;

?

I won't insist; I do not even know whether this would be better or not.
But if SIGNAL_GROUP_EXIT is set, all sub-threads should go away even if
PF_EXITING is not set yet, because this task has not dequeued SIGKILL yet.

Up to you in any case.

Oleg.

^ permalink raw reply	[flat|nested] 24+ messages in thread


* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-05-17 18:42   ` Oleg Nesterov
@ 2016-05-17 20:25     ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-05-17 20:25 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, David Rientjes, linux-mm, LKML

On Tue 17-05-16 20:42:25, Oleg Nesterov wrote:
> On 04/12, Michal Hocko wrote:
> >
> > We shouldn't consider the task
> > unless the whole thread group is going down.
> 
> Yes, agreed. I'd even say that oom-killer should never look at individual
> task/threads, it should work with mm's. And one of the big mistakes (imo)
> was the s/for_each_process/for_each_thread/ change in select_bad_process()
> a while ago.
> 
> Michal, I won't even try to actually review this patch, I lost any hope
> to understand OOM-killer a long ago ;) But I do agree with this change,
> we obviously should not rely on PF_EXITING.
> 
> >  static inline bool task_will_free_mem(struct task_struct *task)
> >  {
> > +	struct signal_struct *sig = task->signal;
> > +
> >  	/*
> >  	 * A coredumping process may sleep for an extended period in exit_mm(),
> >  	 * so the oom killer cannot assume that the process will promptly exit
> >  	 * and release memory.
> >  	 */
> > -	return (task->flags & PF_EXITING) &&
> > -		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
> > +	if (sig->flags & SIGNAL_GROUP_COREDUMP)
> > +		return false;
> > +
> > +	if (!(task->flags & PF_EXITING))
> > +		return false;
> > +
> > +	/* Make sure that the whole thread group is going down */
> > +	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> > +		return false;
> > +
> > +	return true;
> 
> So this looks certainly better to me, but perhaps it should do
> 
> 	if (SIGNAL_GROUP_COREDUMP)
> 		return false;
> 
> 	if (SIGNAL_GROUP_EXIT)
> 		return true;
> 
> 	if (thread_group_empty() && PF_EXITING)
> 		return true;
> 
> 	return false;
> 
> ?
> 
> I won't insist; I do not even know whether this would be better or not.
> But if SIGNAL_GROUP_EXIT is set, all sub-threads should go away even if
> PF_EXITING is not set yet, because this task has not dequeued SIGKILL yet.
> 
> Up to you in any case.

I have structured the checks this way because I would like all early
break-outs to return false. This will help when we want to extend them
with further, more specific checks, e.g. if the task is sharing the mm
with another thread group.

Anyway thanks for the review Oleg!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread


* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-04-26 13:57   ` Michal Hocko
@ 2016-05-17 20:28     ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-05-17 20:28 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov; +Cc: David Rientjes, linux-mm, LKML

On Tue 26-04-16 15:57:52, Michal Hocko wrote:
> On Tue 12-04-16 11:19:16, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > task_will_free_mem is a misnomer for a more complex PF_EXITING test
> > for early break out from the oom killer because it is believed that
> > such a task would release its memory shortly and so we do not have
> > to select an oom victim and perform a disruptive action.
> > 
> > Currently we make sure that the given task is not participating in the
> > core dumping because it might get blocked for a long time - see
> > d003f371b270 ("oom: don't assume that a coredumping thread will exit
> > soon").
> > 
> > The check can still do better though. We shouldn't consider the task
> > unless the whole thread group is going down. This is rather unlikely
> > but not impossible. A single exiting thread would surely leave all the
> > address space behind. If we are really unlucky it might get stuck on the
> > exit path and keep its TIF_MEMDIE and so block the oom killer.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> > 
> > Hi,
> > I hope I got it right but I would really appreciate if Oleg found some
> > time and double checked after me. The fix is more cosmetic than anything
> > else but I guess it is worth it.
> 
> ping...

Andrew, this is not in the mmotm tree now because I didn't feel really
comfortable with the patch without Oleg seeing it. But it seems Oleg is
ok [1] with it so could you push it to Linus along with the rest of oom
pile please?

[1] http://lkml.kernel.org/r/20160517184225.GB32068@redhat.com

> > 
> > Thanks!
> > 
> >  include/linux/oom.h | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/oom.h b/include/linux/oom.h
> > index 628a43242a34..b09c7dc523ff 100644
> > --- a/include/linux/oom.h
> > +++ b/include/linux/oom.h
> > @@ -102,13 +102,24 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p);
> >  
> >  static inline bool task_will_free_mem(struct task_struct *task)
> >  {
> > +	struct signal_struct *sig = task->signal;
> > +
> >  	/*
> >  	 * A coredumping process may sleep for an extended period in exit_mm(),
> >  	 * so the oom killer cannot assume that the process will promptly exit
> >  	 * and release memory.
> >  	 */
> > -	return (task->flags & PF_EXITING) &&
> > -		!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
> > +	if (sig->flags & SIGNAL_GROUP_COREDUMP)
> > +		return false;
> > +
> > +	if (!(task->flags & PF_EXITING))
> > +		return false;
> > +
> > +	/* Make sure that the whole thread group is going down */
> > +	if (!thread_group_empty(task) && !(sig->flags & SIGNAL_GROUP_EXIT))
> > +		return false;
> > +
> > +	return true;
> >  }
> >  
> >  /* sysctls */
> > -- 
> > 2.8.0.rc3
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread


* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-05-17 20:28     ` Michal Hocko
@ 2016-05-17 22:21       ` Andrew Morton
  -1 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2016-05-17 22:21 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Oleg Nesterov, David Rientjes, linux-mm, LKML, Joonsoo Kim

On Tue, 17 May 2016 22:28:56 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> Andrew, this is not in the mmotm tree now because I didn't feel really
> comfortable with the patch without Oleg seeing it. But it seems Oleg is
> ok [1] with it so could you push it to Linus along with the rest of oom
> pile please?

Reluctant.  The CONFIG_COMPACTION=n regression which Joonsoo identified
is quite severe.  Before patch: 10000 forks succeed.  After patch: 500
forks fail.  Ouch.

How can we merge such a thing?

^ permalink raw reply	[flat|nested] 24+ messages in thread


* Re: [PATCH] oom: consider multi-threaded tasks in task_will_free_mem
  2016-05-17 22:21       ` Andrew Morton
@ 2016-05-18  7:16         ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-05-18  7:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Oleg Nesterov, David Rientjes, linux-mm, LKML, Joonsoo Kim

On Tue 17-05-16 15:21:39, Andrew Morton wrote:
> On Tue, 17 May 2016 22:28:56 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > Andrew, this is not in the mmotm tree now because I didn't feel really
> > comfortable with the patch without Oleg seeing it. But it seems Oleg is
> > ok [1] with it so could you push it to Linus along with the rest of oom
> > pile please?
> 
> Reluctant.  The CONFIG_COMPACTION=n regression which Joonsoo identified
> is quite severe.  Before patch: 10000 forks succeed.  After patch: 500
> forks fail.  Ouch.
> 
> How can we merge such a thing?

That regression has been fixed by
http://lkml.kernel.org/r/1463051677-29418-3-git-send-email-mhocko@kernel.org

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread


end of thread, other threads:[~2016-05-18  7:16 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-12  9:19 [PATCH] oom: consider multi-threaded tasks in task_will_free_mem Michal Hocko
2016-04-12  9:19 ` Michal Hocko
2016-04-13 11:04 ` Tetsuo Handa
2016-04-13 11:04   ` Tetsuo Handa
2016-04-13 13:08   ` Michal Hocko
2016-04-13 13:08     ` Michal Hocko
2016-04-13 13:27     ` Tetsuo Handa
2016-04-13 13:27       ` Tetsuo Handa
2016-04-13 13:45       ` Michal Hocko
2016-04-13 13:45         ` Michal Hocko
2016-05-17 18:06     ` Oleg Nesterov
2016-05-17 18:06       ` Oleg Nesterov
2016-04-26 13:57 ` Michal Hocko
2016-04-26 13:57   ` Michal Hocko
2016-05-17 20:28   ` Michal Hocko
2016-05-17 20:28     ` Michal Hocko
2016-05-17 22:21     ` Andrew Morton
2016-05-17 22:21       ` Andrew Morton
2016-05-18  7:16       ` Michal Hocko
2016-05-18  7:16         ` Michal Hocko
2016-05-17 18:42 ` Oleg Nesterov
2016-05-17 18:42   ` Oleg Nesterov
2016-05-17 20:25   ` Michal Hocko
2016-05-17 20:25     ` Michal Hocko
