All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Christian Brauner <christian@brauner.io>
Cc: linux-kernel@vger.kernel.org, Feng Tang <feng.tang@intel.com>
Subject: [RFC PATCH] mm/oom: detect and kill task which has allocation forbidden by cpuset limit
Date: Tue, 31 Aug 2021 16:38:05 +0800	[thread overview]
Message-ID: <1630399085-70431-1-git-send-email-feng.tang@intel.com> (raw)

There was report that starting an Ubuntu in docker while using cpuset
to bind it to movlabe nodes (a node only has movable zone, like a node
for hotplug or a PMEM node in normal usage) will fail due to memory
allocation failure, and then OOM is involved and many other innocent
processes got killed. It can be reproduced with command:
$docker run -it --rm  --cpuset-mems 4 ubuntu:latest bash -c
"grep Mems_allowed /proc/self/status" (node 4 is a movable node)

The reason is, in the case, the target cpuset nodes only have movable
zone, while the creation of an OS in docker sometimes needs to allocate
memory in non-movable zones (dma/dma32/normal) like GFP_HIGHUSER, and
the cpuset limit forbids the allocation, then out-of-memory killing is
involved even when normal nodes and movable nodes both have many free
memory.

We've posted patches to LKML trying to make the usage working by
loosening the check, which is not agreed as the cpuset binding should
be respected, and should not be bypassed [1]

But still there is another problem, that when the usage fails as it's an
mission impossible due to the cpuset limit, the allocating should just
be killed first, before any other innocent processes get killed.

Add detection for cases like this, and kill the allocating task only.

[1].https://lore.kernel.org/lkml/1604470210-124827-1-git-send-email-feng.tang@intel.com/

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 include/linux/oom.h |  1 +
 mm/oom_kill.c       | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 2db9a1432511..bf470d8cc421 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -18,6 +18,7 @@ struct task_struct;
 enum oom_constraint {
 	CONSTRAINT_NONE,
 	CONSTRAINT_CPUSET,
+	CONSTRAINT_CPUSET_NONE,	/* no available zone from cpuset's mem nodes */
 	CONSTRAINT_MEMORY_POLICY,
 	CONSTRAINT_MEMCG,
 };
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 431d38c3bba8..021ec8954279 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -247,6 +247,7 @@ long oom_badness(struct task_struct *p, unsigned long totalpages)
 static const char * const oom_constraint_text[] = {
 	[CONSTRAINT_NONE] = "CONSTRAINT_NONE",
 	[CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
+	[CONSTRAINT_CPUSET_NONE] = "CONSTRAINT_CPUSET_NONE",
 	[CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
 	[CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
 };
@@ -275,6 +276,14 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 
 	if (!oc->zonelist)
 		return CONSTRAINT_NONE;
+
+	if (cpusets_enabled() && (oc->gfp_mask & __GFP_HARDWALL)) {
+		z = first_zones_zonelist(oc->zonelist,
+			highest_zoneidx, &cpuset_current_mems_allowed);
+		if (!z->zone)
+			return CONSTRAINT_CPUSET_NONE;
+	}
+
 	/*
 	 * Reach here only when __GFP_NOFAIL is used. So, we should avoid
 	 * to kill current.We have to random task kill in this case.
@@ -1093,7 +1102,9 @@ bool out_of_memory(struct oom_control *oc)
 		oc->nodemask = NULL;
 	check_panic_on_oom(oc);
 
-	if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
+	if (!is_memcg_oom(oc) &&
+	    (sysctl_oom_kill_allocating_task ||
+	       oc->constraint == CONSTRAINT_CPUSET_NONE) &&
 	    current->mm && !oom_unkillable_task(current) &&
 	    oom_cpuset_eligible(current, oc) &&
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
-- 
2.14.1


             reply	other threads:[~2021-08-31  8:39 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-31  8:38 Feng Tang [this message]
2021-08-31 15:57 ` [RFC PATCH] mm/oom: detect and kill task which has allocation forbidden by cpuset limit Michal Hocko
2021-09-01  1:06   ` David Rientjes
2021-09-01  1:06     ` David Rientjes
2021-09-01  2:44     ` Feng Tang
2021-09-01 13:42       ` Feng Tang
2021-09-01 14:05         ` Michal Hocko
2021-09-02  7:34           ` Feng Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1630399085-70431-1-git-send-email-feng.tang@intel.com \
    --to=feng.tang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian@brauner.io \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.