linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org, linux-mm@kvack.org
Cc: rientjes@google.com, oleg@redhat.com, vdavydov@parallels.com,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/10 -v3] Handle oom bypass more gracefully
Date: Fri, 3 Jun 2016 21:00:31 +0900	[thread overview]
Message-ID: <201606032100.AIH12958.HMOOOFLJSFQtVF@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <1464945404-30157-1-git-send-email-mhocko@kernel.org>

Michal Hocko wrote:
> Patch 8 is new in this version and it addresses an issue pointed out
> by 0-day OOM report where an oom victim was reaped several times.

I believe we need below once-you-nacked patch as well.

It would be possible to clear victim->signal->oom_flag_origin when
that victim gets TIF_MEMDIE, but I think that moving oom_task_origin()
test to oom_badness() will allow oom_scan_process_thread() which calls
oom_unkillable_task() only for testing task->signal->oom_victims to be
removed by also moving task->signal->oom_victims test to oom_badness().
Thus, I prefer this way.
----------------------------------------
>From 91f167dc3a216894e124f976c3ffdcdf6fd802fd Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 3 Jun 2016 20:47:58 +0900
Subject: [PATCH] mm, oom: oom_task_origin should skip oom_reaped tasks

While "mm, oom: task_will_free_mem should skip oom_reaped tasks" meant to
make sure that task_will_free_mem(current) shortcut shall not select
MMF_OOM_REAPED current task after once the OOM reaper reaped current->mm
(or gave up reaping it), there is an unhandled exception.

Currently, oom_scan_process_thread() returns OOM_SCAN_SELECT if
oom_task_origin() returned true. But this might cause OOM livelock
because the OOM killer does not call oom_badness() in order to skip
MMF_OOM_REAPED task while it is possible that try_to_unuse() from swapoff
path or unmerge_and_remove_all_rmap_items() from ksm's run_store path
gets stuck at unkillable waits. We can't afford (at least for now)
replacing mmput() with mmput_async(), lock_page() with
lock_page_killable(), wait_on_page_bit() with wait_on_page_bit_killable(),
mutex_lock() with mutex_lock_killable(), down_read() with
down_read_killable() and so on which are used inside these paths.

Once the OOM reaper reaped that task's memory (or gave up reaping it),
the OOM killer must not select that task again when oom_task_origin(task)
returned true. We need to select different victims until that task can
call clear_current_oom_origin().

Since oom_badness() is a function which returns score of the given thread
group with eligibility/livelock test, it is more natural and safer to let
oom_badness() return highest score when oom_task_origin(task) == true.

This patch moves oom_task_origin() test from oom_scan_process_thread() to
after MMF_OOM_REAPED test inside oom_badness(), changes the callers to
receive the score using "unsigned long" variable, and eliminates
OOM_SCAN_SELECT path in the callers.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/linux/oom.h |  1 -
 mm/memcontrol.c     |  9 +--------
 mm/oom_kill.c       | 22 ++++++++++------------
 3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 5bc0457..d6e4f2a 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -50,7 +50,6 @@ enum oom_scan_t {
 	OOM_SCAN_OK,		/* scan thread and find its badness */
 	OOM_SCAN_CONTINUE,	/* do not consider thread for oom kill */
 	OOM_SCAN_ABORT,		/* abort the iteration and return */
-	OOM_SCAN_SELECT,	/* always select this thread first */
 };
 
 extern struct mutex oom_lock;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f161fe8..c325336 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1266,7 +1266,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct mem_cgroup *iter;
 	unsigned long chosen_points = 0;
 	unsigned long totalpages;
-	unsigned int points = 0;
+	unsigned long points = 0;
 	struct task_struct *chosen = NULL;
 
 	mutex_lock(&oom_lock);
@@ -1291,13 +1291,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		css_task_iter_start(&iter->css, &it);
 		while ((task = css_task_iter_next(&it))) {
 			switch (oom_scan_process_thread(&oc, task)) {
-			case OOM_SCAN_SELECT:
-				if (chosen)
-					put_task_struct(chosen);
-				chosen = task;
-				chosen_points = ULONG_MAX;
-				get_task_struct(chosen);
-				/* fall through */
 			case OOM_SCAN_CONTINUE:
 				continue;
 			case OOM_SCAN_ABORT:
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 340ea11..a9af021 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -188,6 +188,15 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	}
 
 	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then select it.
+	 */
+	if (oom_task_origin(p)) {
+		task_unlock(p);
+		return ULONG_MAX;
+	}
+
+	/*
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
@@ -297,13 +306,6 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 		return OOM_SCAN_OK;
 	}
 
-	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
-	 */
-	if (oom_task_origin(task))
-		return OOM_SCAN_SELECT;
-
 	return OOM_SCAN_OK;
 }
 
@@ -320,13 +322,9 @@ static struct task_struct *select_bad_process(struct oom_control *oc,
 
 	rcu_read_lock();
 	for_each_process(p) {
-		unsigned int points;
+		unsigned long points;
 
 		switch (oom_scan_process_thread(oc, p)) {
-		case OOM_SCAN_SELECT:
-			chosen = p;
-			chosen_points = ULONG_MAX;
-			/* fall through */
 		case OOM_SCAN_CONTINUE:
 			continue;
 		case OOM_SCAN_ABORT:
-- 
1.8.3.1

  parent reply	other threads:[~2016-06-03 12:00 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-03  9:16 [PATCH 0/10 -v3] Handle oom bypass more gracefully Michal Hocko
2016-06-03  9:16 ` [PATCH 01/10] proc, oom: drop bogus task_lock and mm check Michal Hocko
2016-06-03  9:16 ` [PATCH 02/10] proc, oom: drop bogus sighand lock Michal Hocko
2016-06-03  9:16 ` [PATCH 03/10] proc, oom_adj: extract oom_score_adj setting into a helper Michal Hocko
2016-06-03  9:16 ` [PATCH 04/10] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-06-03  9:16 ` [PATCH 05/10] mm, oom: skip vforked tasks from being selected Michal Hocko
2016-06-03  9:16 ` [PATCH 06/10] mm, oom: kill all tasks sharing the mm Michal Hocko
2016-06-06 22:27   ` David Rientjes
2016-06-06 23:20     ` Oleg Nesterov
2016-06-07  6:37       ` Michal Hocko
2016-06-07 22:15       ` David Rientjes
2016-06-08  6:22         ` Michal Hocko
2016-06-08 22:51           ` David Rientjes
2016-06-09  6:46             ` Michal Hocko
2016-06-03  9:16 ` [PATCH 07/10] mm, oom: fortify task_will_free_mem Michal Hocko
2016-06-03  9:16 ` [PATCH 08/10] mm, oom: task_will_free_mem should skip oom_reaped tasks Michal Hocko
2016-06-03  9:16 ` [RFC PATCH 09/10] mm, oom_reaper: do not attempt to reap a task more than twice Michal Hocko
2016-06-03  9:16 ` [RFC PATCH 10/10] mm, oom: hide mm which is shared with kthread or global init Michal Hocko
2016-06-03 15:16   ` Tetsuo Handa
2016-06-06  8:15     ` Michal Hocko
2016-06-06 13:26     ` Michal Hocko
2016-06-07  6:26       ` Michal Hocko
2016-06-03 12:00 ` Tetsuo Handa [this message]
2016-06-03 12:20   ` [PATCH 0/10 -v3] Handle oom bypass more gracefully Michal Hocko
2016-06-03 12:22     ` Michal Hocko
2016-06-04 10:57       ` Tetsuo Handa
2016-06-06  8:39         ` Michal Hocko
2016-06-03 15:17     ` Tetsuo Handa
2016-06-06  8:36       ` Michal Hocko
2016-06-07 14:30         ` Tetsuo Handa
2016-06-07 15:05           ` Michal Hocko
2016-06-07 21:49             ` Tetsuo Handa
2016-06-08  7:27               ` Michal Hocko
2016-06-08 14:55                 ` Tetsuo Handa
2016-06-08 16:05                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201606032100.AIH12958.HMOOOFLJSFQtVF@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=rientjes@google.com \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).