From: Michal Hocko <mhocko@suse.com>
To: Feng Tang <feng.tang@intel.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <christian@brauner.io>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm/oom: detect and kill task which has allocation forbidden by cpuset limit
Date: Tue, 31 Aug 2021 17:57:02 +0200	[thread overview]
Message-ID: <YS5RTiVgydjszmjn@dhcp22.suse.cz>
In-Reply-To: <1630399085-70431-1-git-send-email-feng.tang@intel.com>

On Tue 31-08-21 16:38:05, Feng Tang wrote:
> There was a report that starting an Ubuntu container in docker, while
> using cpuset to bind it to movable nodes (nodes that only have a movable
> zone, e.g. a node reserved for hotplug or a PMEM node in common setups),
> fails due to memory allocation failure; the OOM killer then gets
> involved and many other innocent processes get killed. It can be
> reproduced with this command (node 4 is a movable node):
> $ docker run -it --rm --cpuset-mems 4 ubuntu:latest bash -c \
>     "grep Mems_allowed /proc/self/status"

Is there any valid use case for allowing cpusets to be configured with
only movable nodes? Wouldn't it be better to simply disallow such a
setup? I do understand that we usually allow people to shoot themselves
in the foot, but this one has wider consequences.
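
A minimal, untested sketch of that alternative, assuming a check in
cpuset's update_nodemask() (the call site is my guess, not something
proposed in this thread):

	/*
	 * Reject a mems_allowed setting that contains no node with
	 * usable non-movable memory.  N_NORMAL_MEMORY excludes
	 * movable-only nodes, unlike N_MEMORY.
	 */
	if (!nodes_intersects(trialcs->mems_allowed,
			      node_states[N_NORMAL_MEMORY]))
		return -EINVAL;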

> The reason is that in this case the target cpuset nodes only have a
> movable zone, while booting an OS inside docker sometimes needs to
> allocate memory from non-movable zones (DMA/DMA32/Normal), e.g. with
> GFP_HIGHUSER. The cpuset limit forbids such allocations, so the
> out-of-memory killer is invoked even when both the normal and movable
> nodes have plenty of free memory.
> 
> We've posted patches to LKML trying to make this usage work by
> loosening the check, but that was rejected on the grounds that the
> cpuset binding should be respected and not be bypassed [1].
> 
> But there is still another problem: when the allocation is doomed to
> fail because of the cpuset limit, the allocating task should be killed
> first, before any other innocent processes get killed.
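
As background, the zone ordering that makes this a hard failure,
simplified from include/linux/mmzone.h (illustration only, not part of
the patch):

	enum zone_type {
		ZONE_DMA,
		ZONE_DMA32,
		ZONE_NORMAL,
		ZONE_HIGHMEM,	/* 32-bit with CONFIG_HIGHMEM only */
		ZONE_MOVABLE,	/* the only zone on a movable-only node */
	};

	/*
	 * An allocation may only be served from zones at or below
	 * gfp_zone(gfp_mask).  GFP_HIGHUSER lacks __GFP_MOVABLE, so it
	 * resolves below ZONE_MOVABLE, and a cpuset confined to
	 * movable-only nodes has no zone that can satisfy it.
	 */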

I do not like this solution TBH. We know that it is impossible to
satisfy the allocation at the page allocator level, so dealing with it
at the OOM killer level is just bad layering and a lot of wasted cycles
to reach that point. Why can't we simply fail the allocation if cpuset
filtering leads to an empty zone intersection?
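
Something along these lines, an untested sketch where the helper name
and its slowpath call site are my assumptions rather than an actual
patch:

	static bool cpuset_zones_intersect(gfp_t gfp_mask,
					   nodemask_t *nodemask)
	{
		enum zone_type highest_zoneidx = gfp_zone(gfp_mask);
		struct zoneref *z;
		struct zone *zone;

		/*
		 * Walk the zones the nodemask allows at or below the
		 * highest zone this request may use; one populated
		 * zone is enough to proceed as usual.
		 */
		for_each_zone_zonelist_nodemask(zone, z,
				node_zonelist(numa_node_id(), gfp_mask),
				highest_zoneidx, nodemask)
			if (populated_zone(zone))
				return true;

		/* e.g. GFP_HIGHUSER restricted to a movable-only node */
		return false;
	}

	/*
	 * In the allocator slowpath, before invoking the OOM killer:
	 *
	 *	if (cpusets_enabled() &&
	 *	    !cpuset_zones_intersect(gfp_mask, nodemask))
	 *		goto nopage;
	 */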
-- 
Michal Hocko
SUSE Labs

Thread overview: 8+ messages
2021-08-31  8:38 [RFC PATCH] mm/oom: detect and kill task which has allocation forbidden by cpuset limit Feng Tang
2021-08-31 15:57 ` Michal Hocko [this message]
2021-09-01  1:06   ` David Rientjes
2021-09-01  2:44     ` Feng Tang
2021-09-01 13:42       ` Feng Tang
2021-09-01 14:05         ` Michal Hocko
2021-09-02  7:34           ` Feng Tang
