linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>, Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [patch -mm v2 3/3] mm, memcg: add hierarchical usage oom policy
Date: Thu, 25 Jan 2018 15:53:50 -0800 (PST)	[thread overview]
Message-ID: <alpine.DEB.2.10.1801251553210.161808@chino.kir.corp.google.com> (raw)
In-Reply-To: <alpine.DEB.2.10.1801251552320.161808@chino.kir.corp.google.com>

One of the three significant concerns brought up about the cgroup aware
oom killer is that its decisionmaking is completely evaded by creating
subcontainers and attaching processes such that the ancestor's usage does
not exceed another cgroup on the system.

In this regard, users who do not distribute their processes over a set of
subcontainers for mem cgroup control, statistics, or other controllers
are unfairly penalized.

This adds an oom policy, "tree", that accounts for hierarchical usage
when comparing cgroups and the cgroup aware oom killer is enabled by an
ancestor.  This allows administrators, for example, to require users in
their own top-level mem cgroup subtree to be accounted for with
hierarchical usage.  In other words, they can longer evade the oom killer
by using other controllers or subcontainers.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 Documentation/cgroup-v2.txt | 12 ++++++++++--
 include/linux/memcontrol.h  |  5 +++++
 mm/memcontrol.c             | 12 +++++++++---
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1078,6 +1078,11 @@ PAGE_SIZE multiple when read back.
 	memory consumers; that is, they will compare mem cgroup usage rather
 	than process memory footprint.  See the "OOM Killer" section.
 
+	If "tree", the OOM killer will compare mem cgroups and its subtree
+	as indivisible memory consumers when selecting a hierarchy.  This
+	policy cannot be set on the root mem cgroup.  See the "OOM Killer"
+	section.
+
   memory.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined.  Unless specified
@@ -1301,6 +1306,9 @@ There are currently two available oom policies:
    subtree as an OOM victim and kill at least one process, depending on
    memory.oom_group, from it.
 
+ - "tree": choose the cgroup with the largest memory footprint considering
+   itself and its subtree and kill at least one process.
+
 When selecting a cgroup as a victim, the OOM killer will kill the process
 with the largest memory footprint.  A user can control this behavior by
 enabling the per-cgroup memory.oom_group option.  If set, it causes the
@@ -1314,8 +1322,8 @@ Please, note that memory charges are not migrating if tasks
 are moved between different memory cgroups. Moving tasks with
 significant memory footprint may affect OOM victim selection logic.
 If it's a case, please, consider creating a common ancestor for
-the source and destination memory cgroups and enabling oom_group
-on ancestor layer.
+the source and destination memory cgroups and setting a policy of "tree"
+and enabling oom_group on an ancestor layer.
 
 
 IO
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -69,6 +69,11 @@ enum memcg_oom_policy {
 	 * mem cgroup as an indivisible consumer
 	 */
 	MEMCG_OOM_POLICY_CGROUP,
+	/*
+	 * Tree cgroup usage for all descendant memcg groups, treating each mem
+	 * cgroup and its subtree as an indivisible consumer
+	 */
+	MEMCG_OOM_POLICY_TREE,
 };
 
 struct mem_cgroup_reclaim_cookie {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2728,7 +2728,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 	/*
 	 * The oom_score is calculated for leaf memory cgroups (including
 	 * the root memcg).
-	 * Non-leaf oom_group cgroups accumulating score of descendant
+	 * Cgroups with oom policy of "tree" accumulate the score of descendant
 	 * leaf memory cgroups.
 	 */
 	rcu_read_lock();
@@ -2737,10 +2737,11 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 
 		/*
 		 * We don't consider non-leaf non-oom_group memory cgroups
-		 * as OOM victims.
+		 * without the oom policy of "tree" as OOM victims.
 		 */
 		if (memcg_has_children(iter) && iter != root_mem_cgroup &&
-		    !mem_cgroup_oom_group(iter))
+		    !mem_cgroup_oom_group(iter) &&
+		    iter->oom_policy != MEMCG_OOM_POLICY_TREE)
 			continue;
 
 		/*
@@ -5538,6 +5539,9 @@ static int memory_oom_policy_show(struct seq_file *m, void *v)
 	case MEMCG_OOM_POLICY_CGROUP:
 		seq_puts(m, "cgroup\n");
 		break;
+	case MEMCG_OOM_POLICY_TREE:
+		seq_puts(m, "tree\n");
+		break;
 	case MEMCG_OOM_POLICY_NONE:
 	default:
 		seq_puts(m, "none\n");
@@ -5556,6 +5560,8 @@ static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
 		memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
 	else if (!memcmp("cgroup", buf, min(sizeof("cgroup")-1, nbytes)))
 		memcg->oom_policy = MEMCG_OOM_POLICY_CGROUP;
+	else if (!memcmp("tree", buf, min(sizeof("tree")-1, nbytes)))
+		memcg->oom_policy = MEMCG_OOM_POLICY_TREE;
 	else
 		ret = -EINVAL;
 

      parent reply	other threads:[~2018-01-25 23:54 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-17  2:14 [patch -mm 0/4] mm, memcg: introduce oom policies David Rientjes
2018-01-17  2:15 ` [patch -mm 1/4] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-01-17  2:15 ` [patch -mm 2/4] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-01-17  2:15 ` [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable David Rientjes
2018-01-17 15:41   ` Tejun Heo
2018-01-17 16:00     ` Michal Hocko
2018-01-17 22:18       ` David Rientjes
2018-01-23 15:13         ` Michal Hocko
2018-01-17 22:14     ` David Rientjes
2018-01-19 20:53       ` David Rientjes
2018-01-20 12:32         ` Tejun Heo
2018-01-22 22:34           ` David Rientjes
2018-01-23 15:53             ` Michal Hocko
2018-01-23 22:22               ` David Rientjes
2018-01-24  8:20                 ` Michal Hocko
2018-01-24 21:44                   ` David Rientjes
2018-01-24 22:08                     ` Andrew Morton
2018-01-24 22:18                       ` Tejun Heo
2018-01-25  8:11                       ` Michal Hocko
2018-01-25  8:05                     ` Michal Hocko
2018-01-25 23:27                       ` David Rientjes
2018-01-26 10:07                         ` Michal Hocko
2018-01-26 22:33                           ` David Rientjes
2018-01-17  2:15 ` [patch -mm 4/4] mm, memcg: add hierarchical usage oom policy David Rientjes
2018-01-17 11:46 ` [patch -mm 0/4] mm, memcg: introduce oom policies Roman Gushchin
2018-01-17 22:31   ` David Rientjes
2018-01-25 23:53 ` [patch -mm v2 0/3] " David Rientjes
2018-01-25 23:53   ` [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-01-26 17:15     ` Michal Hocko
2018-01-29 22:38       ` David Rientjes
2018-01-30  8:50         ` Michal Hocko
2018-01-30 22:38           ` David Rientjes
2018-01-31  9:47             ` Michal Hocko
2018-02-01 10:11               ` David Rientjes
2018-01-25 23:53   ` [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-01-26  0:00     ` Andrew Morton
2018-01-26 22:20       ` David Rientjes
2018-01-26 22:39         ` Andrew Morton
2018-01-26 22:52           ` David Rientjes
2018-01-27  0:17             ` Andrew Morton
2018-01-29 10:46               ` Michal Hocko
2018-01-29 19:11                 ` Tejun Heo
2018-01-30  8:54                   ` Michal Hocko
2018-01-30 11:58                     ` Roman Gushchin
2018-01-30 12:08                       ` Michal Hocko
2018-01-30 12:13                         ` Roman Gushchin
2018-01-30 12:20                           ` Michal Hocko
2018-01-30 15:15                             ` Tejun Heo
2018-01-30 17:30                             ` Johannes Weiner
2018-01-30 19:39                             ` Andrew Morton
2018-01-29 22:16               ` David Rientjes
2018-01-25 23:53   ` David Rientjes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.10.1801251553210.161808@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).