From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>, Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: [patch -mm 2/4] mm, memcg: replace cgroup aware oom killer mount option with tunable
Date: Tue, 16 Jan 2018 18:15:05 -0800 (PST) [thread overview]
Message-ID: <alpine.DEB.2.10.1801161814010.28198@chino.kir.corp.google.com> (raw)
In-Reply-To: <alpine.DEB.2.10.1801161812550.28198@chino.kir.corp.google.com>
Now that each mem cgroup on the system has a memory.oom_policy tunable to
specify oom kill selection behavior, remove the needless "groupoom" mount
option that requires (1) the entire system to be forced, perhaps
unnecessarily, perhaps unexpectedly, into a single oom policy that
differs from the traditional per process selection, and (2) a remount to
change.
Instead of enabling the cgroup aware oom killer with the "groupoom" mount
option, set the mem cgroup subtree's memory.oom_policy to "cgroup".
Signed-off-by: David Rientjes <rientjes@google.com>
---
Documentation/cgroup-v2.txt | 43 +++++++++++++++++++++----------------------
include/linux/cgroup-defs.h | 5 -----
include/linux/memcontrol.h | 5 +++++
kernel/cgroup/cgroup.c | 13 +------------
mm/memcontrol.c | 17 ++++++++---------
5 files changed, 35 insertions(+), 48 deletions(-)
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1069,6 +1069,10 @@ PAGE_SIZE multiple when read back.
victim; that is, it will choose the single process with the largest
memory footprint.
+ If "cgroup", the OOM killer will compare mem cgroups as indivisible
+ memory consumers; that is, they will compare mem cgroup usage rather
+ than process memory footprint. See the "OOM Killer" section.
+
memory.events
A read-only flat-keyed file which exists on non-root cgroups.
The following entries are defined. Unless specified
@@ -1275,37 +1279,32 @@ belonging to the affected files to ensure correct memory ownership.
OOM Killer
~~~~~~~~~~
-Cgroup v2 memory controller implements a cgroup-aware OOM killer.
-It means that it treats cgroups as first class OOM entities.
+Cgroup v2 memory controller implements an optional cgroup-aware out of
+memory killer, which treats cgroups as indivisible OOM entities.
-Cgroup-aware OOM logic is turned off by default and requires
-passing the "groupoom" option on mounting cgroupfs. It can also
-by remounting cgroupfs with the following command::
+This policy is controlled by memory.oom_policy. When a memory cgroup is
+out of memory, its memory.oom_policy will dictate how the OOM killer will
+select a process, or cgroup, to kill. Likewise, when the system is OOM,
+the policy is dictated by the root mem cgroup.
- # mount -o remount,groupoom $MOUNT_POINT
+There are currently two available oom policies:
-Under OOM conditions the memory controller tries to make the best
-choice of a victim, looking for a memory cgroup with the largest
-memory footprint, considering leaf cgroups and cgroups with the
-memory.oom_group option set, which are considered to be an indivisible
-memory consumers.
+ - "none": default, choose the largest single memory hogging process to
+ oom kill, as traditionally the OOM killer has always done.
-By default, OOM killer will kill the biggest task in the selected
-memory cgroup. A user can change this behavior by enabling
-the per-cgroup memory.oom_group option. If set, it causes
-the OOM killer to kill all processes attached to the cgroup,
-except processes with oom_score_adj set to -1000.
+ - "cgroup": choose the cgroup with the largest memory footprint from the
+ subtree as an OOM victim and kill at least one process, depending on
+ memory.oom_group, from it.
-This affects both system- and cgroup-wide OOMs. For a cgroup-wide OOM
-the memory controller considers only cgroups belonging to the sub-tree
-of the OOM'ing cgroup.
+When selecting a cgroup as a victim, the OOM killer will kill the process
+with the largest memory footprint. A user can control this behavior by
+enabling the per-cgroup memory.oom_group option. If set, it causes the
+OOM killer to kill all processes attached to the cgroup, except processes
+with /proc/pid/oom_score_adj set to -1000 (oom disabled).
The root cgroup is treated as a leaf memory cgroup, so it's compared
with other leaf memory cgroups and cgroups with oom_group option set.
-If there are no cgroups with the enabled memory controller,
-the OOM killer is using the "traditional" process-based approach.
-
Please, note that memory charges are not migrating if tasks
are moved between different memory cgroups. Moving tasks with
significant memory footprint may affect OOM victim selection logic.
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -81,11 +81,6 @@ enum {
* Enable cpuset controller in v1 cgroup to use v2 behavior.
*/
CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
-
- /*
- * Enable cgroup-aware OOM killer.
- */
- CGRP_GROUP_OOM = (1 << 5),
};
/* cftype->flags */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -64,6 +64,11 @@ enum memcg_oom_policy {
* oom_badness()
*/
MEMCG_OOM_POLICY_NONE,
+ /*
+ * Local cgroup usage is used to select a target cgroup, treating each
+ * mem cgroup as an indivisible consumer
+ */
+ MEMCG_OOM_POLICY_CGROUP,
};
struct mem_cgroup_reclaim_cookie {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1732,9 +1732,6 @@ static int parse_cgroup_root_flags(char *data, unsigned int *root_flags)
if (!strcmp(token, "nsdelegate")) {
*root_flags |= CGRP_ROOT_NS_DELEGATE;
continue;
- } else if (!strcmp(token, "groupoom")) {
- *root_flags |= CGRP_GROUP_OOM;
- continue;
}
pr_err("cgroup2: unknown option \"%s\"\n", token);
@@ -1751,11 +1748,6 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
else
cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
-
- if (root_flags & CGRP_GROUP_OOM)
- cgrp_dfl_root.flags |= CGRP_GROUP_OOM;
- else
- cgrp_dfl_root.flags &= ~CGRP_GROUP_OOM;
}
}
@@ -1763,8 +1755,6 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
{
if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
seq_puts(seq, ",nsdelegate");
- if (cgrp_dfl_root.flags & CGRP_GROUP_OOM)
- seq_puts(seq, ",groupoom");
return 0;
}
@@ -5921,8 +5911,7 @@ static struct kobj_attribute cgroup_delegate_attr = __ATTR_RO(delegate);
static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
- return snprintf(buf, PAGE_SIZE, "nsdelegate\n"
- "groupoom\n");
+ return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
}
static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2798,14 +2798,14 @@ bool mem_cgroup_select_oom_victim(struct oom_control *oc)
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return false;
- if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
- return false;
-
if (oc->memcg)
root = oc->memcg;
else
root = root_mem_cgroup;
+ if (root->oom_policy != MEMCG_OOM_POLICY_CGROUP)
+ return false;
+
select_victim_memcg(root, oc);
return oc->chosen_memcg;
@@ -5412,9 +5412,6 @@ static int memory_oom_group_show(struct seq_file *m, void *v)
struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
bool oom_group = memcg->oom_group;
- if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
- return -ENOTSUPP;
-
seq_printf(m, "%d\n", oom_group);
return 0;
@@ -5428,9 +5425,6 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
int oom_group;
int err;
- if (!(cgrp_dfl_root.flags & CGRP_GROUP_OOM))
- return -ENOTSUPP;
-
err = kstrtoint(strstrip(buf), 0, &oom_group);
if (err)
return err;
@@ -5541,6 +5535,9 @@ static int memory_oom_policy_show(struct seq_file *m, void *v)
enum memcg_oom_policy policy = READ_ONCE(memcg->oom_policy);
switch (policy) {
+ case MEMCG_OOM_POLICY_CGROUP:
+ seq_puts(m, "cgroup\n");
+ break;
case MEMCG_OOM_POLICY_NONE:
default:
seq_puts(m, "none\n");
@@ -5557,6 +5554,8 @@ static ssize_t memory_oom_policy_write(struct kernfs_open_file *of,
buf = strstrip(buf);
if (!memcmp("none", buf, min(sizeof("none")-1, nbytes)))
memcg->oom_policy = MEMCG_OOM_POLICY_NONE;
+ else if (!memcmp("cgroup", buf, min(sizeof("cgroup")-1, nbytes)))
+ memcg->oom_policy = MEMCG_OOM_POLICY_CGROUP;
else
ret = -EINVAL;
next prev parent reply other threads:[~2018-01-17 2:15 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-17 2:14 [patch -mm 0/4] mm, memcg: introduce oom policies David Rientjes
2018-01-17 2:15 ` [patch -mm 1/4] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-01-17 2:15 ` David Rientjes [this message]
2018-01-17 2:15 ` [patch -mm 3/4] mm, memcg: replace memory.oom_group with " David Rientjes
2018-01-17 15:41 ` Tejun Heo
2018-01-17 16:00 ` Michal Hocko
2018-01-17 22:18 ` David Rientjes
2018-01-23 15:13 ` Michal Hocko
2018-01-17 22:14 ` David Rientjes
2018-01-19 20:53 ` David Rientjes
2018-01-20 12:32 ` Tejun Heo
2018-01-22 22:34 ` David Rientjes
2018-01-23 15:53 ` Michal Hocko
2018-01-23 22:22 ` David Rientjes
2018-01-24 8:20 ` Michal Hocko
2018-01-24 21:44 ` David Rientjes
2018-01-24 22:08 ` Andrew Morton
2018-01-24 22:18 ` Tejun Heo
2018-01-25 8:11 ` Michal Hocko
2018-01-25 8:05 ` Michal Hocko
2018-01-25 23:27 ` David Rientjes
2018-01-26 10:07 ` Michal Hocko
2018-01-26 22:33 ` David Rientjes
2018-01-17 2:15 ` [patch -mm 4/4] mm, memcg: add hierarchical usage oom policy David Rientjes
2018-01-17 11:46 ` [patch -mm 0/4] mm, memcg: introduce oom policies Roman Gushchin
2018-01-17 22:31 ` David Rientjes
2018-01-25 23:53 ` [patch -mm v2 0/3] " David Rientjes
2018-01-25 23:53 ` [patch -mm v2 1/3] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-01-26 17:15 ` Michal Hocko
2018-01-29 22:38 ` David Rientjes
2018-01-30 8:50 ` Michal Hocko
2018-01-30 22:38 ` David Rientjes
2018-01-31 9:47 ` Michal Hocko
2018-02-01 10:11 ` David Rientjes
2018-01-25 23:53 ` [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-01-26 0:00 ` Andrew Morton
2018-01-26 22:20 ` David Rientjes
2018-01-26 22:39 ` Andrew Morton
2018-01-26 22:52 ` David Rientjes
2018-01-27 0:17 ` Andrew Morton
2018-01-29 10:46 ` Michal Hocko
2018-01-29 19:11 ` Tejun Heo
2018-01-30 8:54 ` Michal Hocko
2018-01-30 11:58 ` Roman Gushchin
2018-01-30 12:08 ` Michal Hocko
2018-01-30 12:13 ` Roman Gushchin
2018-01-30 12:20 ` Michal Hocko
2018-01-30 15:15 ` Tejun Heo
2018-01-30 17:30 ` Johannes Weiner
2018-01-30 19:39 ` Andrew Morton
2018-01-29 22:16 ` David Rientjes
2018-01-25 23:53 ` [patch -mm v2 3/3] mm, memcg: add hierarchical usage oom policy David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.10.1801161814010.28198@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=tj@kernel.org \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).