All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group.
@ 2016-05-28 10:53 Tetsuo Handa
  2016-05-28 16:25 ` [PATCH v2] " Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2016-05-28 10:53 UTC (permalink / raw)
  To: akpm, mhocko; +Cc: linux-mm, Tetsuo Handa, David Rientjes

There has been two problems about SysRq-f (manual invocation of the OOM
killer). One is that moom_callback() is not called by moom_work under OOM
livelock situation because it does not have a dedicated WQ like vmstat_wq.
The other is that select_bad_process() selects a thread group which
already has a TIF_MEMDIE thread because oom_scan_process_thread() was
scanning all threads of all thread groups and find_lock_task_mm() can
return a TIF_MEMDIE thread.

Since commit f44666b04605d1c7 ("mm,oom: speed up select_bad_process()
loop") changed oom_scan_process_group() to use task->signal->oom_victims,
the OOM killer will no longer select a thread group which already has a
TIF_MEMDIE thread. But SysRq-f will select such thread group due to
returning OOM_SCAN_OK.

Although we will change oom_badness() to return 0 after the OOM reaper
gave up reaping the OOM victim's mm, currently there is possibility that
the OOM reaper is not called (due to "the OOM victim's mm is shared by
unkillable threads" or "the OOM reaper thread is not available due to
kthread_run() failure or CONFIG_MMU=n"). Therefore, we need to make sure
that SysRq-f will skip oom_badness() if such thread group has a TIF_MEMDIE
thread.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1685890..c16331c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -283,8 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves.
 	 */
-	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
-		return OOM_SCAN_ABORT;
+	if (atomic_read(&task->signal->oom_victims))
+		return !is_sysrq_oom(oc) ? OOM_SCAN_ABORT : OOM_SCAN_CONTINUE;
 
 	/*
 	 * If task is allocating a lot of memory and has been marked to be
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group.
  2016-05-28 10:53 [PATCH] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group Tetsuo Handa
@ 2016-05-28 16:25 ` Tetsuo Handa
  2016-05-31 13:11   ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2016-05-28 16:25 UTC (permalink / raw)
  To: akpm, mhocko; +Cc: linux-mm, Tetsuo Handa, David Rientjes

There has been three problems about SysRq-f (manual invocation of the OOM
killer) case. To make description simple, this patch assumes situation
where the OOM reaper is not called (because the OOM victim's mm is shared
by unkillable threads) or not available (due to kthread_run() failure or
CONFIG_MMU=n).

First is that moom_callback() is not called by moom_work under OOM
livelock situation because it does not have a dedicated WQ like vmstat_wq.
This problem is not fixed yet.

Second is that select_bad_process() chooses a thread group which already
has a TIF_MEMDIE thread. Since commit f44666b04605d1c7 ("mm,oom: speed up
select_bad_process() loop") changed oom_scan_process_group() to use
task->signal->oom_victims, non SysRq-f case will no longer select a
thread group which already has a TIF_MEMDIE thread. But SysRq-f case will
select such thread group due to returning OOM_SCAN_OK. This patch makes
sure that oom_badness() is skipped by making oom_scan_process_group() to
return OOM_SCAN_CONTINUE for SysRq-f case.

Third is that oom_kill_process() chooses a thread group which already
has a TIF_MEMDIE thread when the candidate select_bad_process() chose
has children because oom_badness() does not take TIF_MEMDIE into account.
This patch checks child->signal->oom_victims before calling oom_badness()
if oom_kill_process() was called by SysRq-f case. This resembles making
sure that oom_badness() is skipped by returning OOM_SCAN_CONTINUE.

If we don't limit child->signal->oom_victims check to SysRq-f case, we
will break sysctl_oom_kill_allocating_task case by immediately killing
all children of the candidate when killing some child did not immediately
solve the OOM situation because oom_scan_process_thread() is not called.
This will be something we need to mark such child as unkillable after
some reasonable period or make sysctl_oom_kill_allocating_task literally
kill allocating task. Anyway, this patch addresses only SysRq-f case.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1685890..159063e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -283,8 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves.
 	 */
-	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
-		return OOM_SCAN_ABORT;
+	if (atomic_read(&task->signal->oom_victims))
+		return !is_sysrq_oom(oc) ? OOM_SCAN_ABORT : OOM_SCAN_CONTINUE;
 
 	/*
 	 * If task is allocating a lot of memory and has been marked to be
@@ -793,6 +793,14 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			if (process_shares_mm(child, p->mm))
 				continue;
 			/*
+			 * Don't select TIF_MEMDIE child by SysRq-f case, or
+			 * we will get stuck by selecting the same TIF_MEMDIE
+			 * child forever.
+			 */
+			if (is_sysrq_oom(oc) &&
+			    atomic_read(&child->signal->oom_victims))
+				continue;
+			/*
 			 * oom_badness() returns 0 if the thread is unkillable
 			 */
 			child_points = oom_badness(child, memcg, oc->nodemask,
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group.
  2016-05-28 16:25 ` [PATCH v2] " Tetsuo Handa
@ 2016-05-31 13:11   ` Michal Hocko
  2016-05-31 21:35     ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2016-05-31 13:11 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, linux-mm, David Rientjes

On Sun 29-05-16 01:25:14, Tetsuo Handa wrote:
> There has been three problems about SysRq-f (manual invocation of the OOM
> killer) case. To make description simple, this patch assumes situation
> where the OOM reaper is not called (because the OOM victim's mm is shared
> by unkillable threads) or not available (due to kthread_run() failure or
> CONFIG_MMU=n).
> 
> First is that moom_callback() is not called by moom_work under OOM
> livelock situation because it does not have a dedicated WQ like vmstat_wq.
> This problem is not fixed yet.

Why do you mention it in the changelog when it is not related to the
patch then?

Btw. you can (ab)use oom_reaper for that purpose. The patch would be
quite trivial.

> Second is that select_bad_process() chooses a thread group which already
> has a TIF_MEMDIE thread. Since commit f44666b04605d1c7 ("mm,oom: speed up
> select_bad_process() loop") changed oom_scan_process_group() to use
> task->signal->oom_victims, non SysRq-f case will no longer select a
> thread group which already has a TIF_MEMDIE thread.

I am not sure the reference to the commit is really helpful. The
behavior you are describing below was there before this commit, the only
thing that has changed is the scope of the TIF_MEMDIE check.

> But SysRq-f case will
> select such thread group due to returning OOM_SCAN_OK. This patch makes
> sure that oom_badness() is skipped by making oom_scan_process_group() to
> return OOM_SCAN_CONTINUE for SysRq-f case.

I am OK with this part. I was suggesting something similar except I
wanted to skip over tasks which have fatal_signal_pending and that part
got nacked by David AFAIR. Could you make this a separate patch, please?

> Third is that oom_kill_process() chooses a thread group which already
> has a TIF_MEMDIE thread when the candidate select_bad_process() chose
> has children because oom_badness() does not take TIF_MEMDIE into account.
> This patch checks child->signal->oom_victims before calling oom_badness()
> if oom_kill_process() was called by SysRq-f case. This resembles making
> sure that oom_badness() is skipped by returning OOM_SCAN_CONTINUE.

This makes sense to me as well but why should be limit this to sysrq case?
Does it make any sense to select a child which already got killed for
normal OOM killer? Anyway I think it would be better to split this into
its own patch as well.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group.
  2016-05-31 13:11   ` Michal Hocko
@ 2016-05-31 21:35     ` Tetsuo Handa
  2016-06-01  7:34       ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2016-05-31 21:35 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, linux-mm, rientjes

Michal Hocko wrote:
> On Sun 29-05-16 01:25:14, Tetsuo Handa wrote:
> > There has been three problems about SysRq-f (manual invocation of the OOM
> > killer) case. To make description simple, this patch assumes situation
> > where the OOM reaper is not called (because the OOM victim's mm is shared
> > by unkillable threads) or not available (due to kthread_run() failure or
> > CONFIG_MMU=n).
> > 
> > First is that moom_callback() is not called by moom_work under OOM
> > livelock situation because it does not have a dedicated WQ like vmstat_wq.
> > This problem is not fixed yet.
> 
> Why do you mention it in the changelog when it is not related to the
> patch then?

Just we won't forget about it.

> 
> Btw. you can (ab)use oom_reaper for that purpose. The patch would be
> quite trivial.

How do you handle CONFIG_MMU=n case?
Are we going to provide oom_reaper for CONFIG_MMU=n case?

> 
> > Second is that select_bad_process() chooses a thread group which already
> > has a TIF_MEMDIE thread. Since commit f44666b04605d1c7 ("mm,oom: speed up
> > select_bad_process() loop") changed oom_scan_process_group() to use
> > task->signal->oom_victims, non SysRq-f case will no longer select a
> > thread group which already has a TIF_MEMDIE thread.
> 
> I am not sure the reference to the commit is really helpful. The
> behavior you are describing below was there before this commit, the only
> thing that has changed is the scope of the TIF_MEMDIE check.

Indeed. Traversing on all threads can always find a thread group which
already has a TIF_MEMDIE thread when there is already a TIF_MEMDIE thread.

> 
> > But SysRq-f case will
> > select such thread group due to returning OOM_SCAN_OK. This patch makes
> > sure that oom_badness() is skipped by making oom_scan_process_group() to
> > return OOM_SCAN_CONTINUE for SysRq-f case.
> 
> I am OK with this part. I was suggesting something similar except I
> wanted to skip over tasks which have fatal_signal_pending and that part
> got nacked by David AFAIR. Could you make this a separate patch, please?

I think it is better to change both part with this patch.

> 
> > Third is that oom_kill_process() chooses a thread group which already
> > has a TIF_MEMDIE thread when the candidate select_bad_process() chose
> > has children because oom_badness() does not take TIF_MEMDIE into account.
> > This patch checks child->signal->oom_victims before calling oom_badness()
> > if oom_kill_process() was called by SysRq-f case. This resembles making
> > sure that oom_badness() is skipped by returning OOM_SCAN_CONTINUE.
> 
> This makes sense to me as well but why should be limit this to sysrq case?
> Does it make any sense to select a child which already got killed for
> normal OOM killer? Anyway I think it would be better to split this into
> its own patch as well.

The reason is described in next paragraph.
Do we prefer immediately killing all children of the allocating task?
If yes, I think it should be a separate patch on top of this patch because
somebody might complain such behavior as a regression.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group.
  2016-05-31 21:35     ` Tetsuo Handa
@ 2016-06-01  7:34       ` Michal Hocko
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2016-06-01  7:34 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, linux-mm, rientjes

On Wed 01-06-16 06:35:30, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sun 29-05-16 01:25:14, Tetsuo Handa wrote:
> > > There has been three problems about SysRq-f (manual invocation of the OOM
> > > killer) case. To make description simple, this patch assumes situation
> > > where the OOM reaper is not called (because the OOM victim's mm is shared
> > > by unkillable threads) or not available (due to kthread_run() failure or
> > > CONFIG_MMU=n).
> > > 
> > > First is that moom_callback() is not called by moom_work under OOM
> > > livelock situation because it does not have a dedicated WQ like vmstat_wq.
> > > This problem is not fixed yet.
> > 
> > Why do you mention it in the changelog when it is not related to the
> > patch then?
> 
> Just we won't forget about it.

OK, then this belongs to a cover letter. Discussing unrelated things in
the patch description might end up being just confusing.
 
> > Btw. you can (ab)use oom_reaper for that purpose. The patch would be
> > quite trivial.
> 
> How do you handle CONFIG_MMU=n case?

void schedule_sysrq_oom(void)
{
	if (IS_ENABLED(CONFIG_MMU) && oom_reaper_th)
		kick_oom_reaper()
	else
		schedule_work(&moom_work);
}

[...]
> > > But SysRq-f case will
> > > select such thread group due to returning OOM_SCAN_OK. This patch makes
> > > sure that oom_badness() is skipped by making oom_scan_process_group() to
> > > return OOM_SCAN_CONTINUE for SysRq-f case.
> > 
> > I am OK with this part. I was suggesting something similar except I
> > wanted to skip over tasks which have fatal_signal_pending and that part
> > got nacked by David AFAIR. Could you make this a separate patch, please?
> 
> I think it is better to change both part with this patch.

They are semantically different (one is sysrq specific while the other
is not) so I would prefer to split them up.
 
> > > Third is that oom_kill_process() chooses a thread group which already
> > > has a TIF_MEMDIE thread when the candidate select_bad_process() chose
> > > has children because oom_badness() does not take TIF_MEMDIE into account.
> > > This patch checks child->signal->oom_victims before calling oom_badness()
> > > if oom_kill_process() was called by SysRq-f case. This resembles making
> > > sure that oom_badness() is skipped by returning OOM_SCAN_CONTINUE.
> > 
> > This makes sense to me as well but why should be limit this to sysrq case?
> > Does it make any sense to select a child which already got killed for
> > normal OOM killer? Anyway I think it would be better to split this into
> > its own patch as well.
> 
> The reason is described in next paragraph.
> Do we prefer immediately killing all children of the allocating task?

I do not think we want to select them _all_. We haven't been doing that
and I do not see a reason we should start now. But it surely doesn't
make any sense to select a task which has already TIF_MEMDIE set.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-06-01  7:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-28 10:53 [PATCH] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group Tetsuo Handa
2016-05-28 16:25 ` [PATCH v2] " Tetsuo Handa
2016-05-31 13:11   ` Michal Hocko
2016-05-31 21:35     ` Tetsuo Handa
2016-06-01  7:34       ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.