* [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
@ 2016-07-16  5:30 Tetsuo Handa
  2016-07-18  7:18 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2016-07-16  5:30 UTC
  To: linux-mm, akpm
  Cc: Tetsuo Handa, Michal Hocko, Oleg Nesterov, Vladimir Davydov,
	David Rientjes

Patch "mm, oom: hide mm which is shared with kthread or global init" tried
to guarantee forward progress for the OOM killer even when the selected
victim is sharing memory with a kernel thread or global init, but a race
scenario still remains because it did not add a call to exit_oom_victim()
in oom_kill_process(), in order to avoid a problem which is already worked
around by commit 74070542099c66d8 ("oom, suspend: fix oom_reaper vs.
oom_killer_disable race").

The race scenario is that a !can_oom_reap TIF_MEMDIE thread becomes
the only user of its mm (i.e. mm->mm_users drops to 1) and is later
blocked for an unbounded period at __mmput() from mmput() from
exit_mm() from do_exit(), for example by hitting the following
sequence:

  (1) First round of OOM killer invocation starts.
  (2) select_bad_process() chooses P1 as an OOM victim because
      oom_scan_process_thread() does not find existing victims.
  (3) oom_kill_process() sets TIF_MEMDIE on P1, but does not put P1 under
      the OOM reaper's supervision due to (p->flags & PF_KTHREAD) being
      true, and instead sets MMF_OOM_REAPED on P1's mm.
  (4) First round of OOM killer invocation finishes.
  (5) P1 is unable to reach do_exit() because it is blocked at an
      unkillable wait for somebody else's memory allocation.
  (6) Second round of OOM killer invocation starts.
  (7) select_bad_process() chooses P2 as an OOM victim because
      oom_scan_process_thread() finds P1's mm with MMF_OOM_REAPED set.
  (8) oom_kill_process() sets TIF_MEMDIE on P2 via mark_oom_victim(),
      and puts P2 under the OOM reaper's supervision due to
      (p->flags & PF_KTHREAD) being false.
  (9) Second round of OOM killer invocation finishes.
  (10) The OOM reaper reaps P2's mm, sets MMF_OOM_REAPED on
       P2's mm, and clears TIF_MEMDIE from P2.
  (11) Regarding P1's mm, (p->flags & PF_KTHREAD) becomes false because
       somebody else's memory allocation succeeds and unuse_mm(P1->mm)
       is called. At this point P1 becomes the only user of P1->mm.
  (12) P1 reaches do_exit() because it is no longer blocked at the
       unkillable wait for somebody else's memory allocation.
  (13) P1 reaches the P1->mm = NULL line in exit_mm() from do_exit().
  (14) P1 is blocked at __mmput().
  (15) Third round of OOM killer invocation starts.
  (16) select_bad_process() does not choose a new OOM victim because
       oom_scan_process_thread() fails to find P1's mm while
       P1->signal->oom_victims > 0.
  (17) Third round of OOM killer invocation finishes.
  (18) OOM livelock happens because nobody will clear TIF_MEMDIE from
       P1 (and decrement P1->signal->oom_victims) while P1 is blocked
       at __mmput().

The patch "mm, oom: hide mm which is shared with kthread or global init"
fails to return OOM_SCAN_CONTINUE when we hit
atomic_read(&task->signal->oom_victims) != 0 &&
find_lock_task_mm(task) == NULL in oom_scan_process_thread().
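To make step (16) concrete, here is a minimal user-space sketch of the existing-victim check as described above. It is a hypothetical model, not kernel code: struct fake_task, struct fake_mm and scan_existing_victim() are illustrative names, and a NULL mm pointer stands in for find_lock_task_mm() returning NULL. Only the OOM_SCAN_* result names (a subset of the kernel's enum) and the MMF_OOM_REAPED idea come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the kernel's scan results, reused here for a user-space model. */
enum oom_scan_t { OOM_SCAN_OK, OOM_SCAN_CONTINUE, OOM_SCAN_ABORT };

/* Hypothetical stand-in for the mm state the scan inspects. */
struct fake_mm {
	bool oom_reaped;	/* models MMF_OOM_REAPED in mm->flags */
};

/* Hypothetical stand-in for a task as seen by the scan. */
struct fake_task {
	int oom_victims;	/* models task->signal->oom_victims */
	struct fake_mm *mm;	/* NULL models find_lock_task_mm() == NULL */
};

/*
 * Models the check added by "mm, oom: hide mm which is shared with
 * kthread or global init": an existing victim blocks new selections
 * (ABORT) unless its mm can be found and is marked as reaped.
 */
static enum oom_scan_t scan_existing_victim(const struct fake_task *t)
{
	if (t->oom_victims == 0)
		return OOM_SCAN_OK;		/* not a victim: eligible */
	if (t->mm != NULL && t->mm->oom_reaped)
		return OOM_SCAN_CONTINUE;	/* step (7): skip, pick another */
	return OOM_SCAN_ABORT;			/* wait for this victim to exit */
}
```

In step (7) P1 still has its mm with MMF_OOM_REAPED set, so the model continues; in step (16) P1 has oom_victims > 0 but no mm, so every round aborts, which is the livelock.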

In the long term, we are planning to change oom_scan_process_thread() not
to depend on atomic_read(&task->signal->oom_victims) != 0 &&
find_lock_task_mm(task) != NULL, and to remove exit_oom_victim() from
oom_kill_process() and oom_reap_task() along with signal->oom_victims
and commit 74070542099c66d8. But since we did not complete such changes
in time for the 4.8 merge window, let's rely on commit 74070542099c66d8
for now in order to guarantee forward progress for the OOM killer.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7d0a275..041373e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -922,6 +922,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			 */
 			can_oom_reap = false;
 			set_bit(MMF_OOM_REAPED, &mm->flags);
+			exit_oom_victim(victim);
 			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
 					task_pid_nr(victim), victim->comm,
 					task_pid_nr(p), p->comm);
-- 
1.8.3.1
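The effect of the one-line patch can be sketched in the same style. This assumes, as the changelog states, that exit_oom_victim() clears TIF_MEMDIE and decrements signal->oom_victims; the model_* helpers and fake types are hypothetical user-space stand-ins, and the "only act once" behaviour is a simplification of the flag test-and-set/test-and-clear rather than a copy of the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel state involved. */
struct fake_mm {
	bool oom_reaped;	/* models MMF_OOM_REAPED in mm->flags */
};

struct fake_task {
	bool tif_memdie;	/* models the TIF_MEMDIE thread flag */
	int oom_victims;	/* models task->signal->oom_victims */
	struct fake_mm *mm;
};

/* Models mark_oom_victim(): only the first call sets the flag and counts. */
static void model_mark_oom_victim(struct fake_task *t)
{
	if (t->tif_memdie)
		return;
	t->tif_memdie = true;
	t->oom_victims++;
}

/* Models exit_oom_victim(): only a flagged task is un-counted. */
static void model_exit_oom_victim(struct fake_task *t)
{
	if (!t->tif_memdie)
		return;
	t->tif_memdie = false;
	t->oom_victims--;
}

/*
 * Models the !can_oom_reap branch of oom_kill_process() with the patch
 * applied: hide the mm from later scans and drop the victim accounting
 * right away, so no later scan waits for this task to reach exit_mm().
 */
static void model_kill_unreapable(struct fake_task *victim)
{
	model_mark_oom_victim(victim);
	victim->mm->oom_reaped = true;	/* set_bit(MMF_OOM_REAPED, &mm->flags) */
	model_exit_oom_victim(victim);	/* the line added by the patch */
}
```

Once oom_victims is back to zero, a later scan treats P1 like any other task, so P1 losing its mm in step (13) no longer blocks selection in this model.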

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-16  5:30 [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init Tetsuo Handa
@ 2016-07-18  7:18 ` Michal Hocko
  2016-07-18 21:30   ` [PATCH] mm, oom: fix for hiding mm which is shared with kthread or " Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-18  7:18 UTC
  To: Tetsuo Handa
  Cc: linux-mm, akpm, Oleg Nesterov, Vladimir Davydov, David Rientjes

On Sat 16-07-16 14:30:04, Tetsuo Handa wrote:
> Patch "mm, oom: hide mm which is shared with kthread or global init" tried
> to guarantee a forward progress for the OOM killer even when the selected
> victim is sharing memory with a kernel thread or global init, but a race
> scenario still remains because it did not add a call to exit_oom_victim()
> in oom_kill_process() in order to avoid a problem which is already worked
> around by commit 74070542099c66d8 ("oom, suspend: fix oom_reaper vs.
> oom_killer_disable race").
> 
> The race scenario is that a !can_oom_reap TIF_MEMDIE thread becomes
> the only user of that mm (i.e. mm->mm_users drops to 1) and is later
> blocked for unbounded period at __mmput() from mmput() from
> exit_mm() from do_exit() by hitting e.g.
> 
>   (1) First round of OOM killer invocation starts.
>   (2) select_bad_process() chooses P1 as an OOM victim because
>       oom_scan_process_thread() does not find existing victims.
>   (3) oom_kill_process() sets TIF_MEMDIE on P1, but does not put P1 under
>       the OOM reaper's supervision due to (p->flags & PF_KTHREAD) being
>       true, and instead sets MMF_OOM_REAPED on the P1's mm.
>   (4) First round of OOM killer invocation finishes.
>   (5) P1 is unable to arrive at do_exit() due to being blocked at
>       unkillable event waiting for somebody else's memory allocation.
>   (6) Second round of OOM killer invocation starts.
>   (7) select_bad_process() chooses P2 as an OOM victim because
>       oom_scan_process_thread() finds P1's mm with MMF_OOM_REAPED set.
>   (8) oom_kill_process() sets TIF_MEMDIE on P2 via mark_oom_victim(),
>       and puts P2 under the OOM reaper's supervision due to
>       (p->flags & PF_KTHREAD) being false.
>   (9) Second round of OOM killer invocation finishes.
>   (10) The OOM reaper reaps P2's mm, and sets MMF_OOM_REAPED to
>        P2's mm, and clears TIF_MEMDIE from P2.
>   (11) Regarding P1's mm, (p->flags & PF_KTHREAD) becomes false because
>        somebody else's memory allocation succeeds and unuse_mm(P1->mm)
>        is called. At this point P1 becomes the only user of P1->mm.
>   (12) P1 arrives at do_exit() due to no longer being blocked at
>        unkillable event waiting for somebody else's memory allocation.
>   (13) P1 reaches P1->mm = NULL line in exit_mm() from do_exit().
>   (14) P1 is blocked at __mmput().
>   (15) Third round of OOM killer invocation starts.
>   (16) select_bad_process() does not choose new OOM victim because
>        oom_scan_process_thread() fails to find P1's mm while
>        P1->signal->oom_victims > 0.
>   (17) Third round of OOM killer invocation finishes.
>   (18) OOM livelock happens because nobody will clear TIF_MEMDIE from
>        P1 (and decrement P1->signal->oom_victims) while P1 is blocked
>        at __mmput().
> 
> sequence, but the patch "mm, oom: hide mm which is shared with kthread
> or global init" is failing to return OOM_SCAN_CONTINUE when we hit
> atomic_read(&task->signal->oom_victims) != 0 &&
> find_lock_task_mm(task) == NULL in oom_scan_process_thread().
> 
> Long term we are planning to change oom_scan_process_thread() not to
> depend on atomic_read(&task->signal->oom_victims) != 0 &&
> find_lock_task_mm(task) != NULL, and remove exit_oom_victim() from
> oom_kill_process() and oom_reap_task() along with signal->oom_victims
> and commit 74070542099c66d8. But since we did not complete such changes
> in time for 4.8 merge window, let's rely on commit 74070542099c66d8
> for now in order to guarantee a forward progress for the OOM killer.

I really do not think that this unlikely case has to be handled now.
We are very likely going to move to a different model of OOM victim
detection soon, so let's not add new hacks. exit_oom_victim() from
oom_kill_process() just looks like sand in the eyes.
 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7d0a275..041373e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -922,6 +922,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  			 */
>  			can_oom_reap = false;
>  			set_bit(MMF_OOM_REAPED, &mm->flags);
> +			exit_oom_victim(victim);
>  			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
>  					task_pid_nr(victim), victim->comm,
>  					task_pid_nr(p), p->comm);
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-18  7:18 ` Michal Hocko
@ 2016-07-18 21:30   ` Tetsuo Handa
  2016-07-19  6:40     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2016-07-18 21:30 UTC
  To: mhocko; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

Michal Hocko wrote:
> I really do not think that this unlikely case really has to be handled
> now. We are very likely going to move to a different model of oom victim
> detection soon. So let's do not add new hacks. exit_oom_victim from
> oom_kill_process just looks like sand in eyes.

Then, please revert "mm, oom: hide mm which is shared with kthread or global init"
( http://lkml.kernel.org/r/1466426628-15074-11-git-send-email-mhocko@kernel.org ).
I don't like that patch because it does a pointless find_lock_task_mm() test
and tells a lie, because it does not guarantee that we won't hit an OOM livelock.
Merging patches with a known lie is sand in the eyes.

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-18 21:30   ` [PATCH] mm, oom: fix for hiding mm which is shared with kthread or " Tetsuo Handa
@ 2016-07-19  6:40     ` Michal Hocko
  2016-07-19  9:37       ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-19  6:40 UTC
  To: Tetsuo Handa; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

On Tue 19-07-16 06:30:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > I really do not think that this unlikely case really has to be handled
> > now. We are very likely going to move to a different model of oom victim
> > detection soon. So let's do not add new hacks. exit_oom_victim from
> > oom_kill_process just looks like sand in eyes.
> 
> Then, please revert "mm, oom: hide mm which is shared with kthread or global init"
> ( http://lkml.kernel.org/r/1466426628-15074-11-git-send-email-mhocko@kernel.org ).
> I don't like that patch because it is doing pointless find_lock_task_mm() test
> and is telling a lie because it does not guarantee that we won't hit OOM livelock.

The above patch doesn't make the situation worse wrt livelock. I
consider it an improvement. It adds find_lock_task_mm() into
oom_scan_process_thread(), but that can hardly be worse than just the
task->signal->oom_victims check, because we can catch MMF_OOM_REAPED. If
we have already lost the mm, which is a less likely case, then we behave
the same as with the previous implementation.
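The claim that the new check "can hardly be worse" can be checked exhaustively on a small hypothetical model: old_check() reacts only to oom_victims, while new_check() additionally skips a victim whose mm is found with MMF_OOM_REAPED set. Both functions and new_never_worse() are illustrative simplifications, not the kernel's oom_scan_process_thread().

```c
#include <assert.h>
#include <stdbool.h>

enum oom_scan_t { OOM_SCAN_OK, OOM_SCAN_CONTINUE, OOM_SCAN_ABORT };

/* Old behaviour (hypothetical model): any live victim aborts the scan. */
static enum oom_scan_t old_check(int oom_victims, bool has_mm, bool reaped)
{
	(void)has_mm;
	(void)reaped;
	return oom_victims ? OOM_SCAN_ABORT : OOM_SCAN_OK;
}

/* New behaviour: a victim whose mm is found and marked reaped is skipped. */
static enum oom_scan_t new_check(int oom_victims, bool has_mm, bool reaped)
{
	if (!oom_victims)
		return OOM_SCAN_OK;
	return (has_mm && reaped) ? OOM_SCAN_CONTINUE : OOM_SCAN_ABORT;
}

/*
 * Returns true if, for every combination of inputs, the new check aborts
 * only where the old one also aborted, i.e. it is never "worse".
 */
static bool new_never_worse(void)
{
	for (int v = 0; v <= 1; v++)
		for (int m = 0; m <= 1; m++)
			for (int r = 0; r <= 1; r++) {
				bool old_abort = old_check(v, m, r) == OOM_SCAN_ABORT;
				bool new_abort = new_check(v, m, r) == OOM_SCAN_ABORT;

				if (new_abort && !old_abort)
					return false;
			}
	return true;
}
```

In the mm-lost case (has_mm false) both functions return OOM_SCAN_ABORT, which matches "we behave the same as with the previous implementation"; it is also exactly the case the race scenario hits.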

So I do not really see a reason to revert that patch for now.

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-19  6:40     ` Michal Hocko
@ 2016-07-19  9:37       ` Michal Hocko
  2016-07-19 10:36         ` [PATCH] mm, oom: fix for hiding mm which is shared with kthread or " Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-19  9:37 UTC
  To: Tetsuo Handa; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

On Tue 19-07-16 08:40:48, Michal Hocko wrote:
> On Tue 19-07-16 06:30:42, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > I really do not think that this unlikely case really has to be handled
> > > now. We are very likely going to move to a different model of oom victim
> > > detection soon. So let's do not add new hacks. exit_oom_victim from
> > > oom_kill_process just looks like sand in eyes.
> > 
> > Then, please revert "mm, oom: hide mm which is shared with kthread or global init"
> > ( http://lkml.kernel.org/r/1466426628-15074-11-git-send-email-mhocko@kernel.org ).
> > I don't like that patch because it is doing pointless find_lock_task_mm() test
> > and is telling a lie because it does not guarantee that we won't hit OOM livelock.
> 
> The above patch doesn't make the situation worse wrt livelock. I
> consider it an improvement. It adds find_lock_task_mm into
> oom_scan_process_thread but that can hardly be worse than just the
> task->signal->oom_victims check because we can catch MMF_OOM_REAPED. If
> we are mm loss, which is a less likely case, then we behave the same as
> with the previous implementation.
> 
> So I do not really see a reason to revert that patch for now.

That being said, if you strongly disagree with the wording, then what
about the following:
"
    In order to help forward progress for the OOM killer, make sure that
    these really rare cases will not get in the way and hide the mm from the
    oom killer by setting the MMF_OOM_REAPED flag for it.  oom_scan_process_thread
    will ignore any TIF_MEMDIE task if it has the MMF_OOM_REAPED flag set, to catch
    these oom victims.
    
    After this patch we should guarantee forward progress for the OOM killer
    even when the selected victim is sharing memory with a kernel thread or
    global init, as long as the victim's mm is still alive.
"
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-19  9:37       ` Michal Hocko
@ 2016-07-19 10:36         ` Tetsuo Handa
  2016-07-19 10:54           ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2016-07-19 10:36 UTC
  To: mhocko; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

Michal Hocko wrote:
> On Tue 19-07-16 08:40:48, Michal Hocko wrote:
> > On Tue 19-07-16 06:30:42, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > I really do not think that this unlikely case really has to be handled
> > > > now. We are very likely going to move to a different model of oom victim
> > > > detection soon. So let's do not add new hacks. exit_oom_victim from
> > > > oom_kill_process just looks like sand in eyes.
> > > 
> > > Then, please revert "mm, oom: hide mm which is shared with kthread or global init"
> > > ( http://lkml.kernel.org/r/1466426628-15074-11-git-send-email-mhocko@kernel.org ).
> > > I don't like that patch because it is doing pointless find_lock_task_mm() test
> > > and is telling a lie because it does not guarantee that we won't hit OOM livelock.
> > 
> > The above patch doesn't make the situation worse wrt livelock. I
> > consider it an improvement. It adds find_lock_task_mm into
> > oom_scan_process_thread but that can hardly be worse than just the
> > task->signal->oom_victims check because we can catch MMF_OOM_REAPED. If
> > we are mm loss, which is a less likely case, then we behave the same as
> > with the previous implementation.
> > 
> > So I do not really see a reason to revert that patch for now.
> 
> And that being said. If you strongly disagree with the wording then what
> about the following:
> "
>     In order to help a forward progress for the OOM killer, make sure that
>     this really rare cases will not get into the way and hide the mm from the
>     oom killer by setting MMF_OOM_REAPED flag for it.  oom_scan_process_thread
>     will ignore any TIF_MEMDIE task if it has MMF_OOM_REAPED flag set to catch
>     these oom victims.
>     
>     After this patch we should guarantee a forward progress for the OOM killer
>     even when the selected victim is sharing memory with a kernel thread or
>     global init as long as the victims mm is still alive.
> "

No, I don't like the "as long as the victim's mm is still alive" exception.

If you don't like exit_oom_victim() from oom_kill_process(), what about
the alternative shown below?

 	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims)) {
 		struct task_struct *p = find_lock_task_mm(task);
 		enum oom_scan_t ret = OOM_SCAN_ABORT;
 
 		if (p) {
 			if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
 				ret = OOM_SCAN_CONTINUE;
 			task_unlock(p);
+#ifdef CONFIG_MMU
+		} else {
+			/*
+			 * MMF_OOM_REAPED was set at oom_kill_process() without
+			 * waking up the OOM reaper, but this thread group lost
+			 * its mm. Therefore, pretend as if the OOM reaper lost
+			 * its mm (i.e. select next OOM victim).
+			 * But be sure to prevent CONFIG_MMU=n from acting
+			 * as if exit_oom_victim() in exit_mm() has moved from
+			 * after mmput() to before mmput().
+			 */
+			ret = OOM_SCAN_CONTINUE;
+#endif
 		}
 		return ret;
 	}

By using this alternative, we can really guarantee forward progress for
the OOM killer even when the selected victim is sharing memory with a kernel
thread or global init. No "as long as the victim's mm is still alive" exception.
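The proposed alternative can be modelled the same hypothetical way. alt_check() is an illustrative name, the is_sysrq_oom() test and task locking are omitted, and the config_mmu parameter stands in for the #ifdef CONFIG_MMU block so that both builds can be exercised at run time.

```c
#include <assert.h>
#include <stdbool.h>

enum oom_scan_t { OOM_SCAN_OK, OOM_SCAN_CONTINUE, OOM_SCAN_ABORT };

/*
 * Hypothetical model of the proposed check.  has_mm models
 * find_lock_task_mm() returning a thread, reaped models MMF_OOM_REAPED,
 * and config_mmu models whether the #ifdef CONFIG_MMU branch is built.
 */
static enum oom_scan_t alt_check(int oom_victims, bool has_mm, bool reaped,
				 bool config_mmu)
{
	if (oom_victims == 0)
		return OOM_SCAN_OK;		/* not an existing victim */
	if (has_mm)
		return reaped ? OOM_SCAN_CONTINUE : OOM_SCAN_ABORT;
	/* mm already gone: with CONFIG_MMU, move on to the next victim */
	return config_mmu ? OOM_SCAN_CONTINUE : OOM_SCAN_ABORT;
}
```

With config_mmu true, a victim that has lost its mm no longer aborts the scan, which is the "no exception" property claimed above; with config_mmu false the model keeps the previous behaviour.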

Also, this alternative (when combined with removal of MMF_OOM_NOT_REAPABLE) has
a bonus: we no longer need to call exit_oom_victim() from the OOM reaper,
because the OOM killer can move on to the next OOM victim after the OOM reaper
sets MMF_OOM_REAPED on that mm. That is, we can immediately disallow
exit_oom_victim() on a remote thread, apply the oom_killer_disable() timeout
patch, and revert "oom, suspend: fix oom_reaper vs. oom_killer_disable race".

If we remember the victim's mm via your "oom: keep mm of the killed task available"
or my "mm,oom: Use list of mm_struct used by OOM victims.", we can force the
OOM reaper to try to reap before the regular __mmput() from mmput() from
exit_mm() runs, by purposely taking a reference on mm->mm_users. Then we can
always try to reclaim some memory using the OOM reaper before risking a stall
in exit_aio() from __mmput() from mmput() from exit_mm(), since with your or my
patch we can keep the OOM killer waiting until MMF_OOM_REAPED is set.
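The idea of pinning the victim's mm with an extra mm_users reference can be illustrated with a user-space reference count. This is a sketch under assumptions: struct fake_mm and the model_* helpers are hypothetical, C11 atomics stand in for the kernel's atomic_t, and model_teardown() stands in for __mmput(). It shows only the ordering argument: whoever drops the last reference runs the teardown, so a reference held on behalf of the reaper defers it until after reaping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an mm_struct. */
struct fake_mm {
	atomic_int mm_users;	/* models mm->mm_users */
	bool torn_down;		/* set when the model's __mmput() runs */
	bool reaped;		/* models MMF_OOM_REAPED */
};

/* Stands in for __mmput(): the expensive, possibly blocking teardown. */
static void model_teardown(struct fake_mm *mm)
{
	mm->torn_down = true;
}

/* Takes an extra user reference, as the pinning idea proposes. */
static void model_mmget(struct fake_mm *mm)
{
	atomic_fetch_add(&mm->mm_users, 1);
}

/* Models mmput(): the caller that drops the last user runs the teardown. */
static void model_mmput(struct fake_mm *mm)
{
	if (atomic_fetch_sub(&mm->mm_users, 1) == 1)
		model_teardown(mm);
}

/* Hypothetical reaper step: reap while still pinned, then drop the pin. */
static void model_reap_and_unpin(struct fake_mm *mm)
{
	mm->reaped = true;
	model_mmput(mm);
}
```

In this model the victim's own mmput() from exit_mm() no longer reaches the teardown; the reaper reaps first and then performs the final put. The final put could still stall, but only after the reaper has had its chance to reclaim memory.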

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-19 10:36         ` [PATCH] mm, oom: fix for hiding mm which is shared with kthread or " Tetsuo Handa
@ 2016-07-19 10:54           ` Michal Hocko
  2016-07-19 11:43             ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-07-19 10:54 UTC
  To: Tetsuo Handa; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

On Tue 19-07-16 19:36:40, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 19-07-16 08:40:48, Michal Hocko wrote:
> > > On Tue 19-07-16 06:30:42, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > I really do not think that this unlikely case really has to be handled
> > > > > now. We are very likely going to move to a different model of oom victim
> > > > > detection soon. So let's do not add new hacks. exit_oom_victim from
> > > > > oom_kill_process just looks like sand in eyes.
> > > > 
> > > > Then, please revert "mm, oom: hide mm which is shared with kthread or global init"
> > > > ( http://lkml.kernel.org/r/1466426628-15074-11-git-send-email-mhocko@kernel.org ).
> > > > I don't like that patch because it is doing pointless find_lock_task_mm() test
> > > > and is telling a lie because it does not guarantee that we won't hit OOM livelock.
> > > 
> > > The above patch doesn't make the situation worse wrt livelock. I
> > > consider it an improvement. It adds find_lock_task_mm into
> > > oom_scan_process_thread but that can hardly be worse than just the
> > > task->signal->oom_victims check because we can catch MMF_OOM_REAPED. If
> > > we are mm loss, which is a less likely case, then we behave the same as
> > > with the previous implementation.
> > > 
> > > So I do not really see a reason to revert that patch for now.
> > 
> > And that being said. If you strongly disagree with the wording then what
> > about the following:
> > "
> >     In order to help a forward progress for the OOM killer, make sure that
> >     this really rare cases will not get into the way and hide the mm from the
> >     oom killer by setting MMF_OOM_REAPED flag for it.  oom_scan_process_thread
> >     will ignore any TIF_MEMDIE task if it has MMF_OOM_REAPED flag set to catch
> >     these oom victims.
> >     
> >     After this patch we should guarantee a forward progress for the OOM killer
> >     even when the selected victim is sharing memory with a kernel thread or
> >     global init as long as the victims mm is still alive.
> > "
> 
> No, I don't like "as long as the victims mm is still alive" exception.

Why? Because of the wording or in principle?

> If you don't like exit_oom_victim() from oom_kill_process(), what about
> alternative shown below?
> 
>  	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims)) {
>  		struct task_struct *p = find_lock_task_mm(task);
>  		enum oom_scan_t ret = OOM_SCAN_ABORT;
>  
>  		if (p) {
>  			if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
>  				ret = OOM_SCAN_CONTINUE;
>  			task_unlock(p);
> +#ifdef CONFIG_MMU
> +		} else {
> +			/*
> +			 * MMF_OOM_REAPED was set at oom_kill_process() without
> +			 * waking up the OOM reaper, but this thread group lost
> +			 * its mm. Therefore, pretend as if the OOM reaper lost
> +			 * its mm (i.e. select next OOM victim).
> +			 * But be sure to prevent CONFIG_MMU=n from acting
> +			 * as if exit_oom_victim() in exit_mm() has moved from
> +			 * after mmput() to before mmput().
> +			 */
> +			ret = OOM_SCAN_CONTINUE;
> +#endif
>  		}
>  		return ret;
>  	}
> 
> By using this alternative, we can really guarantee a forward progress for
> the OOM killer even when the selected victim is sharing memory with a kernel
> thread or global init. No "as long as the victims mm is still alive" exception.

I wouldn't complicate the pile which is waiting for the merge window and
risk introducing some last minute bugs.
 
> Also, this alternative (when combined with removal of MMF_OOM_NOT_REAPABLE) has
> a bonus that we no longer need to call exit_oom_victim() from the OOM reaper
> because the OOM killer can move on to next OOM victim after the OOM reaper
> set MMF_OOM_REAPED to that mm. That is, we can immediately disallow
> exit_oom_victim() on remote thread and apply oom_killer_disable() timeout
> patch and revert "oom, suspend: fix oom_reaper vs. oom_killer_disable race".
> 
> If we remember victim's mm via your "oom: keep mm of the killed task available"
> or my "mm,oom: Use list of mm_struct used by OOM victims.", we can force the
> OOM reaper to try to reap by intervening to regular __mmput() from mmput() from
> exit_mm() by purposely taking a reference on mm->mm_users. Then, we can always
> try to reclaim some memory using the OOM reaper before risking exit_aio() from
> __mmput() from mmput() from exit_mm() to stall, for we can keep the OOM killer
> waiting until MMF_OOM_REAPED is set using your or my patch.

Let's discuss these things after the merge window, along with the other
changes.

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-19 10:54           ` Michal Hocko
@ 2016-07-19 11:43             ` Tetsuo Handa
  2016-07-19 11:58               ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2016-07-19 11:43 UTC
  To: mhocko; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

Michal Hocko wrote:
> On Tue 19-07-16 19:36:40, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 19-07-16 08:40:48, Michal Hocko wrote:
> > > > On Tue 19-07-16 06:30:42, Tetsuo Handa wrote:
> > > > > Michal Hocko wrote:
> > > > > > I really do not think that this unlikely case really has to be handled
> > > > > > now. We are very likely going to move to a different model of oom victim
> > > > > > detection soon. So let's do not add new hacks. exit_oom_victim from
> > > > > > oom_kill_process just looks like sand in eyes.
> > > > > 
> > > > > Then, please revert "mm, oom: hide mm which is shared with kthread or global init"
> > > > > ( http://lkml.kernel.org/r/1466426628-15074-11-git-send-email-mhocko@kernel.org ).
> > > > > I don't like that patch because it is doing pointless find_lock_task_mm() test
> > > > > and is telling a lie because it does not guarantee that we won't hit OOM livelock.
> > > > 
> > > > The above patch doesn't make the situation worse wrt livelock. I
> > > > consider it an improvement. It adds find_lock_task_mm into
> > > > oom_scan_process_thread but that can hardly be worse than just the
> > > > task->signal->oom_victims check because we can catch MMF_OOM_REAPED. If
> > > > we are mm loss, which is a less likely case, then we behave the same as
> > > > with the previous implementation.
> > > > 
> > > > So I do not really see a reason to revert that patch for now.
> > > 
> > > And that being said. If you strongly disagree with the wording then what
> > > about the following:
> > > "
> > >     In order to help a forward progress for the OOM killer, make sure that
> > >     this really rare cases will not get into the way and hide the mm from the
> > >     oom killer by setting MMF_OOM_REAPED flag for it.  oom_scan_process_thread
> > >     will ignore any TIF_MEMDIE task if it has MMF_OOM_REAPED flag set to catch
> > >     these oom victims.
> > >     
> > >     After this patch we should guarantee a forward progress for the OOM killer
> > >     even when the selected victim is sharing memory with a kernel thread or
> > >     global init as long as the victims mm is still alive.
> > > "
> > 
> > No, I don't like "as long as the victims mm is still alive" exception.
> 
> Why? Because of the wording or in principle?

Making a _guarantee without exceptions now_ can allow other OOM livelock handling
(e.g. http://lkml.kernel.org/r/20160719074935.GC9486@dhcp22.suse.cz ) to rely on
the OOM reaper. We can improve the OOM reaper after we have made a guarantee
without exceptions now.

* Re: [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init
  2016-07-19 11:43             ` Tetsuo Handa
@ 2016-07-19 11:58               ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2016-07-19 11:58 UTC
  To: Tetsuo Handa; +Cc: linux-mm, akpm, oleg, vdavydov, rientjes

On Tue 19-07-16 20:43:32, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 19-07-16 19:36:40, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
[...]
> > > > And that being said. If you strongly disagree with the wording then what
> > > > about the following:
> > > > "
> > > >     In order to help a forward progress for the OOM killer, make sure that
> > > >     this really rare cases will not get into the way and hide the mm from the
> > > >     oom killer by setting MMF_OOM_REAPED flag for it.  oom_scan_process_thread
> > > >     will ignore any TIF_MEMDIE task if it has MMF_OOM_REAPED flag set to catch
> > > >     these oom victims.
> > > >     
> > > >     After this patch we should guarantee a forward progress for the OOM killer
> > > >     even when the selected victim is sharing memory with a kernel thread or
> > > >     global init as long as the victims mm is still alive.
> > > > "
> > > 
> > > No, I don't like "as long as the victims mm is still alive" exception.
> > 
> > Why? Because of the wording or in principle?
> 
> Making a _guarantee without exceptions now_ can allow other OOM livelock handlings

I am not convinced this particular thing would be the last piece in the
puzzle... And as I already said before, can we wait until after the merge
window with the next changes, please? I really do not want to end up in a
situation where we would have too many OOM fixes in flight again. There is
no reason to hurry.
-- 
Michal Hocko
SUSE Labs

Thread overview: 9+ messages
2016-07-16  5:30 [PATCH] mm, oom: fix for hiding mm which is shared with kthread or global init Tetsuo Handa
2016-07-18  7:18 ` Michal Hocko
2016-07-18 21:30   ` [PATCH] mm, oom: fix for hiding mm which is shared with kthread or " Tetsuo Handa
2016-07-19  6:40     ` Michal Hocko
2016-07-19  9:37       ` Michal Hocko
2016-07-19 10:36         ` [PATCH] mm, oom: fix for hiding mm which is shared with kthread or " Tetsuo Handa
2016-07-19 10:54           ` Michal Hocko
2016-07-19 11:43             ` Tetsuo Handa
2016-07-19 11:58               ` Michal Hocko
