From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>, Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [patch -mm] mm, memcg: disregard mempolicies for cgroup-aware oom killer
Date: Thu, 15 Mar 2018 13:51:01 -0700 (PDT)
Message-ID: <alpine.DEB.2.20.1803151350420.55261@chino.kir.corp.google.com>
In-Reply-To: <alpine.DEB.2.20.1803131720470.247949@chino.kir.corp.google.com>

The cgroup-aware oom killer currently considers the set of allowed nodes
for the allocation that triggers the oom killer and discounts usage from
disallowed nodes when comparing cgroups.

If a cgroup has both the cpuset and memory controllers enabled, its
allocations may be restricted to a subset of nodes.  Some latency-sensitive
users use cpusets to allocate only local memory, almost to the point of
oom, even though there is an abundance of free memory on other nodes.

The same is true for processes that mbind(2) their memory to a set of
allowed nodes.

This yields very inconsistent results because the usage of each mem cgroup
(and perhaps its subtree) is counted only on the set of nodes allowed by
the allocation's mempolicy.  Allocating a single page for a vma that is
bound with mbind(2) to a now-oom node can cause a cgroup that is restricted
to that node by its cpuset controller to be oom killed when other cgroups
may have much higher overall usage.

The cgroup-aware oom killer is described as killing the largest memory
consuming cgroup (or subtree) without mentioning the mempolicy of the
allocation.  For now, disregard it.  If NUMA awareness turns out to be
generally useful, an additional oom policy for it could be added later
through the extensible interface.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 Based on top of oom policy patch series at
 https://marc.info/?t=152090280800001 and follow-up patch at
 https://marc.info/?l=linux-kernel&m=152098687824112

 mm/memcontrol.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2608,19 +2608,15 @@ static inline bool memcg_has_children(struct mem_cgroup *memcg)
 	return ret;
 }
 
-static long memcg_oom_badness(struct mem_cgroup *memcg,
-			      const nodemask_t *nodemask)
+static long memcg_oom_badness(struct mem_cgroup *memcg)
 {
 	const bool is_root_memcg = memcg == root_mem_cgroup;
 	long points = 0;
 	int nid;
-	pg_data_t *pgdat;
 
 	for_each_node_state(nid, N_MEMORY) {
-		if (nodemask && !node_isset(nid, *nodemask))
-			continue;
+		pg_data_t *pgdat = NODE_DATA(nid);
 
-		pgdat = NODE_DATA(nid);
 		if (is_root_memcg) {
 			points += node_page_state(pgdat, NR_ACTIVE_ANON) +
 				  node_page_state(pgdat, NR_INACTIVE_ANON);
@@ -2656,8 +2652,7 @@ static long memcg_oom_badness(struct mem_cgroup *memcg,
  *   >0: memcg is eligible, and the returned value is an estimation
  *       of the memory footprint
  */
-static long oom_evaluate_memcg(struct mem_cgroup *memcg,
-			       const nodemask_t *nodemask)
+static long oom_evaluate_memcg(struct mem_cgroup *memcg)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -2691,7 +2686,7 @@ static long oom_evaluate_memcg(struct mem_cgroup *memcg,
 	if (eligible <= 0)
 		return eligible;
 
-	return memcg_oom_badness(memcg, nodemask);
+	return memcg_oom_badness(memcg);
 }
 
 static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
@@ -2751,7 +2746,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 		if (memcg_has_children(iter))
 			continue;
 
-		score = oom_evaluate_memcg(iter, oc->nodemask);
+		score = oom_evaluate_memcg(iter);
 
 		/*
 		 * Ignore empty and non-eligible memory cgroups.
@@ -2780,8 +2775,7 @@ static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
 
 	if (oc->chosen_memcg != INFLIGHT_VICTIM) {
 		if (root == root_mem_cgroup) {
-			group_score = oom_evaluate_memcg(root_mem_cgroup,
-							 oc->nodemask);
+			group_score = oom_evaluate_memcg(root_mem_cgroup);
 			if (group_score > leaf_score) {
 				/*
 				 * Discount the sum of all leaf scores to find

Thread overview: 45+ messages
2018-03-13  0:57 [patch -mm v3 0/3] mm, memcg: introduce oom policies David Rientjes
2018-03-13  0:57 ` [patch -mm v3 1/3] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-03-14 12:38   ` Roman Gushchin
2018-03-14 20:58     ` David Rientjes
2018-03-15 17:10       ` Roman Gushchin
2018-03-15 20:16         ` David Rientjes
2018-03-13  0:57 ` [patch -mm v3 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-03-13  0:57 ` [patch -mm v3 3/3] mm, memcg: add hierarchical usage oom policy David Rientjes
2018-03-14  0:21 ` [patch -mm] mm, memcg: evaluate root and leaf memcgs fairly on oom David Rientjes
2018-03-14 12:17   ` Roman Gushchin
2018-03-14 20:41     ` David Rientjes
2018-03-15 16:46       ` Roman Gushchin
2018-03-15 20:01         ` David Rientjes
2018-03-15 20:34   ` [patch -mm] mm, memcg: separate oom_group from selection criteria David Rientjes
2018-03-15 20:51   ` David Rientjes [this message]
2018-03-15 20:54 ` [patch -mm v3 0/3] mm, memcg: introduce oom policies David Rientjes
2018-03-16 21:08   ` [patch -mm 0/6] rewrite cgroup aware oom killer for general use David Rientjes
2018-03-16 21:08     ` [patch -mm 1/6] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-03-16 21:08     ` [patch -mm 2/6] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-03-16 21:08     ` [patch -mm 3/6] mm, memcg: add hierarchical usage oom policy David Rientjes
2018-03-16 21:08     ` [patch -mm 4/6] mm, memcg: evaluate root and leaf memcgs fairly on oom David Rientjes
2018-03-18 15:00       ` kbuild test robot
2018-03-18 20:14         ` [patch -mm 4/6 updated] " David Rientjes
2018-03-18 18:18       ` [patch -mm 4/6] " kbuild test robot
2018-03-16 21:08     ` [patch -mm 5/6] mm, memcg: separate oom_group from selection criteria David Rientjes
2018-03-16 21:08     ` [patch -mm 6/6] mm, memcg: disregard mempolicies for cgroup-aware oom killer David Rientjes
2018-03-22 21:53     ` [patch v2 -mm 0/6] rewrite cgroup aware oom killer for general use David Rientjes
2018-03-22 21:53       ` [patch v2 -mm 1/6] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-03-22 21:53       ` [patch v2 -mm 2/6] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-03-22 21:53       ` [patch v2 -mm 3/6] mm, memcg: add hierarchical usage oom policy David Rientjes
2018-03-22 21:53       ` [patch v2 -mm 4/6] mm, memcg: evaluate root and leaf memcgs fairly on oom David Rientjes
2018-03-22 21:53       ` [patch v2 -mm 5/6] mm, memcg: separate oom_group from selection criteria David Rientjes
2018-03-22 21:53       ` [patch v2 -mm 6/6] mm, memcg: disregard mempolicies for cgroup-aware oom killer David Rientjes
2018-07-13 23:07       ` [patch v3 -mm 0/6] rewrite cgroup aware oom killer for general use David Rientjes
2018-07-13 23:07         ` [patch v3 -mm 1/6] mm, memcg: introduce per-memcg oom policy tunable David Rientjes
2018-07-13 23:07         ` [patch v3 -mm 2/6] mm, memcg: replace cgroup aware oom killer mount option with tunable David Rientjes
2018-07-13 23:07         ` [patch v3 -mm 3/6] mm, memcg: add hierarchical usage oom policy David Rientjes
2018-07-16 18:16           ` Roman Gushchin
2018-07-17  4:06             ` David Rientjes
2018-07-23 20:33               ` David Rientjes
2018-07-23 21:28                 ` Roman Gushchin
2018-07-23 23:22                   ` David Rientjes
2018-07-13 23:07         ` [patch v3 -mm 4/6] mm, memcg: evaluate root and leaf memcgs fairly on oom David Rientjes
2018-07-13 23:07         ` [patch v3 -mm 5/6] mm, memcg: separate oom_group from selection criteria David Rientjes
2018-07-13 23:07         ` [patch v3 -mm 6/6] mm, memcg: disregard mempolicies for cgroup-aware oom killer David Rientjes
