linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: <linux-mm@kvack.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	David Rientjes <rientjes@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Vladimir Davydov <vdavydov@parallels.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj
Date: Thu, 26 May 2016 14:40:12 +0200
Message-ID: <1464266415-15558-4-git-send-email-mhocko@kernel.org>
In-Reply-To: <1464266415-15558-1-git-send-email-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

oom_score_adj is shared by a thread group (via struct signal_struct), but
this is not sufficient to cover processes sharing an mm (CLONE_VM without
CLONE_THREAD, whether or not CLONE_SIGHAND is set), so we can easily end
up in a situation where some processes update their oom_score_adj and
confuse the OOM killer. In the worst case some of those processes might
hide from the OOM killer altogether via OOM_SCORE_ADJ_MIN while others
are eligible. The OOM killer would then pick one of the eligible
processes but would not be allowed to kill the others sharing the same
mm, so the mm, and with it the memory, would never be released.
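The failure mode can be illustrated with a small user-space model. This
is a sketch, not kernel code: `struct model_mm`, `struct model_signal`,
`struct model_proc` and `mm_can_be_released()` are hypothetical names,
and only the value of `OOM_SCORE_ADJ_MIN` (-1000) is taken from the
kernel. Two processes share one mm but each has its own signal struct,
so each carries its own oom_score_adj.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)	/* same value as the kernel constant */

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct model_mm { int id; };
struct model_signal { short oom_score_adj; };	/* one per thread group */
struct model_proc {
	struct model_mm *mm;		/* shared when cloned with CLONE_VM */
	struct model_signal *signal;	/* shared only with CLONE_THREAD */
};

/*
 * The memory behind @mm is freed only once every process using it is
 * gone. If any user of @mm hides behind OOM_SCORE_ADJ_MIN, killing the
 * eligible users alone cannot release it.
 */
static bool mm_can_be_released(const struct model_proc *procs, size_t n,
			       const struct model_mm *mm)
{
	assert(procs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (procs[i].mm == mm &&
		    procs[i].signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
			return false;
	return true;
}
```

With one sharer at 0 and the other at OOM_SCORE_ADJ_MIN, the first one is
an OOM victim candidate, yet killing it frees nothing because the second
one keeps the mm alive.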

It would be ideal to keep oom_score_adj per mm_struct, because that is
the natural entity the OOM killer considers. But this will not work
because some programs do
	vfork()
	set_oom_adj()
	exec()
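A user-space sketch of that vfork()/set_oom_adj()/exec() sequence,
assuming Linux with /proc mounted. `read_oom_score_adj()`,
`vfork_adjust_then_exec()` and `ADJ_UNREADABLE` are hypothetical names,
and `/bin/true` is only an example target. POSIX allows nothing but
_exit() or an exec function after vfork(), so the open()/write() in the
child relies on Linux/glibc behaviour, as such programs do in practice.

```c
#define _GNU_SOURCE	/* for vfork() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ADJ_UNREADABLE (-10000)	/* outside the valid -1000..1000 range */

/* Read an oom_score_adj file, or return ADJ_UNREADABLE on any error. */
static int read_oom_score_adj(const char *path)
{
	int val;
	FILE *f = fopen(path, "r");

	if (!f)
		return ADJ_UNREADABLE;
	if (fscanf(f, "%d", &val) != 1)
		val = ADJ_UNREADABLE;
	fclose(f);
	return val;
}

/*
 * vfork(); set_oom_adj(); exec(): the child borrows the parent's mm until
 * exec(), but should change only its own oom_score_adj. Returns 1 if the
 * parent's value is unchanged afterwards, 0 if it changed, -1 on error.
 */
static int vfork_adjust_then_exec(const char *adj, const char *prog)
{
	int before, after, status;
	pid_t pid;

	assert(adj != NULL && prog != NULL);
	before = read_oom_score_adj("/proc/self/oom_score_adj");

	pid = vfork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Keep the child to plain system calls until exec()/_exit(). */
		int fd = open("/proc/self/oom_score_adj", O_WRONLY);

		if (fd >= 0) {
			ssize_t ret = write(fd, adj, strlen(adj));

			(void)ret;	/* a failed write is fine for this demo */
			close(fd);
		}
		execl(prog, prog, (char *)NULL);
		_exit(127);	/* exec failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	after = read_oom_score_adj("/proc/self/oom_score_adj");
	return before == after;
}
```

Calling `vfork_adjust_then_exec("500", "/bin/true")` is expected to
return 1: the vfork child has its own signal struct, and with this patch
applied its `vfork_done` is set until exec(), so the write is not
propagated to the parent even though both share the mm at that moment.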

We can achieve the same effect, though. The oom_score_adj write handler
can set oom_score_adj for all processes sharing the same mm if the task
is not in the middle of a vfork. As a result, all those processes will
share the same oom_score_adj.
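The propagation rule can be modeled as follows. This is a simplified
sketch with hypothetical names (`struct model_proc`,
`model_set_oom_adj()`); `in_vfork` stands in for `task->vfork_done`,
the `mm_users > nr_threads` test mirrors the patch's
`atomic_read(&task->mm->mm_users) > get_nr_threads(task)`, and the final
loop plays the role of `for_each_process()` with `process_shares_mm()`.
Locking and the `oom_score_adj_min`/CAP_SYS_RESOURCE handling are left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mm { int mm_users; };	/* one count per task using the mm */
struct model_signal { short oom_score_adj; };
struct model_proc {
	struct model_mm *mm;
	struct model_signal *signal;
	int nr_threads;		/* threads in this thread group */
	bool in_vfork;		/* stands in for task->vfork_done != NULL */
};

/*
 * Update the target; if it is not a vfork child and the mm has more
 * users than the target's own threads, update every other non-vfork
 * process sharing that mm as well.
 */
static void model_set_oom_adj(struct model_proc *procs, size_t n,
			      size_t target, short adj)
{
	struct model_proc *t;
	struct model_mm *shared = NULL;

	assert(target < n);
	t = &procs[target];
	if (!t->in_vfork && t->mm->mm_users > t->nr_threads)
		shared = t->mm;

	t->signal->oom_score_adj = adj;

	if (!shared)
		return;
	for (size_t i = 0; i < n; i++)
		if (!procs[i].in_vfork && procs[i].mm == shared)
			procs[i].signal->oom_score_adj = adj;
}
```

Writing 300 through one of two ordinary sharers updates both, while a
vfork child sharing the same mm keeps its own value; a write made by the
vfork child changes only that child.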

Note that we have to serialize all the oom_score_adj writers now to
guarantee they do not interleave and generate inconsistent results.
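Why a single lock is needed can be sketched in user space with pthreads.
This is an analogy, not kernel code; `scores`, `writer()` and
`run_two_writers()` are hypothetical, and `oom_adj_mutex` merely plays
the role of the function-local `DEFINE_MUTEX(oom_adj_mutex)`. Each
writer updates every process sharing the mm. Without the lock, one
writer could overwrite part of the set while the other is still walking
it, leaving the sharers with mixed values. Holding the mutex across the
whole walk means the last writer to finish decides the value for all of
them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARERS 4

/* Hypothetical: one score per process sharing the same mm. */
static short scores[NR_SHARERS];

static pthread_mutex_t oom_adj_mutex = PTHREAD_MUTEX_INITIALIZER;

/* One writer: update every sharer, holding the mutex for the whole walk. */
static void *writer(void *arg)
{
	short adj;

	assert(arg != NULL);
	adj = *(const short *)arg;
	for (int round = 0; round < 1000; round++) {
		pthread_mutex_lock(&oom_adj_mutex);
		for (size_t i = 0; i < NR_SHARERS; i++)
			scores[i] = adj;
		pthread_mutex_unlock(&oom_adj_mutex);
	}
	return NULL;
}

/* Run two concurrent writers; 1 if all sharers agree, 0 if not, -1 on error. */
static int run_two_writers(short first, short second)
{
	pthread_t t1, t2;

	if (pthread_create(&t1, NULL, writer, &first) != 0)
		return -1;
	if (pthread_create(&t2, NULL, writer, &second) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	for (size_t i = 1; i < NR_SHARERS; i++)
		if (scores[i] != scores[0])
			return 0;
	return 1;
}
```

Older glibc versions need `-pthread` when linking.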

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/proc/base.c     | 35 +++++++++++++++++++++++++++++++++++
 include/linux/mm.h |  2 ++
 mm/oom_kill.c      |  2 +-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 23679673bf5a..e3ee4fb1930c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1043,10 +1043,13 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
 
 static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
 {
+	static DEFINE_MUTEX(oom_adj_mutex);
+	struct mm_struct *mm = NULL;
 	struct task_struct *task;
 	unsigned long flags;
 	int err = 0;
 
+	mutex_lock(&oom_adj_mutex);
 	task = get_proc_task(file_inode(file));
 	if (!task) {
 		err = -ESRCH;
@@ -1085,6 +1088,20 @@ static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
 		}
 	}
 
+	/*
+	 * If we are not in a vfork and share the mm with other processes, we
+	 * have to propagate the score, otherwise we would have schizophrenic
+	 * requirements for the same mm. We can use a racy check because we
+	 * only risk the slow path.
+	 */
+	if (!task->vfork_done &&
+			atomic_read(&task->mm->mm_users) > get_nr_threads(task)) {
+		mm = task->mm;
+
+		/* pin the mm so it doesn't go away and get reused */
+		atomic_inc(&mm->mm_count);
+	}
+
 	task->signal->oom_score_adj = oom_adj;
 	if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE))
 		task->signal->oom_score_adj_min = (short)oom_adj;
@@ -1094,7 +1111,25 @@ static int __set_oom_adj(struct file *file, int oom_adj, bool legacy)
 err_task_lock:
 	task_unlock(task);
 	put_task_struct(task);
+
+	if (mm) {
+		struct task_struct *p;
+
+		rcu_read_lock();
+		for_each_process(p) {
+			task_lock(p);
+			if (!p->vfork_done && process_shares_mm(p, mm)) {
+				p->signal->oom_score_adj = oom_adj;
+				if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE))
+					p->signal->oom_score_adj_min = (short)oom_adj;
+			}
+			task_unlock(p);
+		}
+		rcu_read_unlock();
+		mmdrop(mm);
+	}
 out:
+	mutex_unlock(&oom_adj_mutex);
 	return err;
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 05102822912c..b44d3d792a00 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2248,6 +2248,8 @@ static inline int in_gate_area(struct mm_struct *mm, unsigned long addr)
 }
 #endif	/* __HAVE_ARCH_GATE_AREA */
 
+extern bool process_shares_mm(struct task_struct *p, struct mm_struct *mm);
+
 #ifdef CONFIG_SYSCTL
 extern int sysctl_drop_caches;
 int drop_caches_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e33e912f7e4..eeccb4d7e7f5 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -416,7 +416,7 @@ bool oom_killer_disabled __read_mostly;
  * task's threads: if one of those is using this mm then this task was also
  * using it.
  */
-static bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
+bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 {
 	struct task_struct *t;
 
-- 
2.8.1
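The `atomic_inc(&mm->mm_count)` / `mmdrop(mm)` pair in the
`fs/proc/base.c` hunk pins only the `struct mm_struct` itself, not the
address space: the memory can still be torn down, but the struct cannot
be freed and reused for another mm while the `for_each_process()` loop
compares pointers against it. A minimal C11 model of the two counters
follows; the `model_mm_*` names and `freed_count` are hypothetical, and
the real `mmput()` does far more teardown work than shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two reference counts in struct mm_struct. */
struct model_mm {
	atomic_int mm_users;	/* tasks using the address space */
	atomic_int mm_count;	/* refs to the struct; all users share one */
};

static int freed_count;	/* lets the demo observe when the struct is freed */

static struct model_mm *model_mm_alloc(void)
{
	struct model_mm *mm = malloc(sizeof(*mm));

	if (!mm)
		return NULL;
	atomic_init(&mm->mm_users, 1);
	atomic_init(&mm->mm_count, 1);
	return mm;
}

/* Like atomic_inc(&mm->mm_count) in the patch: pins the struct only. */
static void model_mm_pin(struct model_mm *mm)
{
	assert(atomic_load(&mm->mm_count) > 0);
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Like mmdrop(): free the struct when its last reference goes away. */
static void model_mmdrop(struct model_mm *mm)
{
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1) {
		free(mm);
		freed_count++;
	}
}

/*
 * Like mmput(), minus the real teardown: the last user drops the single
 * mm_count reference held on behalf of all users.
 */
static void model_mmput(struct model_mm *mm)
{
	if (atomic_fetch_sub(&mm->mm_users, 1) == 1)
		model_mmdrop(mm);
}
```

If the last task using the mm exits while the pin is held, the struct
survives until the final `model_mmdrop()`, which is the guarantee the
pointer comparisons in the loop rely on.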

Thread overview: 38+ messages
2016-05-26 12:40 [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
2016-05-26 12:40 ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external tasks sharing mm Michal Hocko
2016-05-26 14:30   ` Tetsuo Handa
2016-05-26 14:59     ` Michal Hocko
2016-05-26 15:25       ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are noexternal " Tetsuo Handa
2016-05-26 15:35         ` Michal Hocko
2016-05-26 16:14           ` [PATCH 1/6] mm, oom: do not loop over all tasks if there are no external " Tetsuo Handa
2016-05-27  6:45             ` Michal Hocko
2016-05-27  7:15               ` Michal Hocko
2016-05-27  8:03                 ` Michal Hocko
2016-05-26 12:40 ` [PATCH 2/6] proc, oom_adj: extract oom_score_adj setting into a helper Michal Hocko
2016-05-26 12:40 ` Michal Hocko [this message]
2016-05-27 11:18   ` [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-05-27 16:18     ` Vladimir Davydov
2016-05-30  7:07       ` Michal Hocko
2016-05-30  8:47         ` Vladimir Davydov
2016-05-30  9:39           ` Michal Hocko
2016-05-30 10:26             ` Vladimir Davydov
2016-05-30 11:11               ` Michal Hocko
2016-05-30 12:19                 ` Vladimir Davydov
2016-05-30 12:28                   ` Michal Hocko
2016-05-26 12:40 ` [PATCH 4/6] mm, oom: skip over vforked tasks Michal Hocko
2016-05-27 16:48   ` Vladimir Davydov
2016-05-30  7:13     ` Michal Hocko
2016-05-30  9:52       ` Michal Hocko
2016-05-30 10:40         ` Vladimir Davydov
2016-05-30 10:53           ` Michal Hocko
2016-05-30 12:03   ` Michal Hocko
2016-05-26 12:40 ` [PATCH 5/6] mm, oom: kill all tasks sharing the mm Michal Hocko
2016-05-26 12:40 ` [PATCH 6/6] mm, oom: fortify task_will_free_mem Michal Hocko
2016-05-26 14:11   ` Tetsuo Handa
2016-05-26 14:23     ` Michal Hocko
2016-05-26 14:41       ` Tetsuo Handa
2016-05-26 14:56         ` Michal Hocko
2016-05-27 11:07   ` Michal Hocko
2016-05-27 16:00 ` [PATCH 0/5] Handle oom bypass more gracefully Michal Hocko
2016-05-30 13:05 [PATCH 0/6 -v2] " Michal Hocko
2016-05-30 13:05 ` [PATCH 3/6] mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj Michal Hocko
2016-05-31  7:41   ` Michal Hocko
