From: Michal Hocko <mhocko@suse.com>
To: Feng Tang <feng.tang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_alloc: detect allocation forbidden by cpuset and bail out early
Date: Tue, 7 Sep 2021 10:44:32 +0200
Message-ID: <YTcmcEUmtO++WeBk@dhcp22.suse.cz>
In-Reply-To: <1631003150-96935-1-git-send-email-feng.tang@intel.com>

On Tue 07-09-21 16:25:50, Feng Tang wrote:
> There was a report that starting Ubuntu in docker while using cpuset
> to bind it to movlabe nodes (a node only has movable zone, like a node

s@movlabe@movable@

> for hotplug or a Persistent Memory node in normal usage) will fail
> due to memory allocation failure, and then the OOM killer is invoked and
> many other innocent processes get killed. It can be reproduced with the command:
> $docker run -it --rm  --cpuset-mems 4 ubuntu:latest bash -c
> "grep Mems_allowed /proc/self/status" (node 4 is a movable node)
> 
> The reason is that, in this case, the target cpuset nodes only have a
> movable zone, while creating an OS in docker sometimes needs to allocate
> memory in non-movable zones (dma/dma32/normal), e.g. with GFP_HIGHUSER.
> The cpuset limit forbids that allocation, and the OOM killer is then
> invoked even though both the normal nodes and the movable nodes have
> plenty of free memory.

It would be great to add an OOM report here as an example.

> The failure is reasonable, but there is still one problem: when the
> allocation fails because it is impossible under the cpuset limit, it
> should not trigger reclaim/compaction and, more importantly, should not
> get any innocent process OOM-killed.

I would reformulate to something like:
"
The OOM killer cannot help to resolve the situation as there is no
usable memory for the request in the cpuset scope. The only reasonable
measure to take is to fail the allocation right away and have the caller
deal with it.
"
 
> So add detection for cases like this in the slowpath of allocation,
> and bail out early returning NULL for the allocation.
> 
> We've run some cases of malloc/mmap/page_fault/lru-shm/swap from
> will-it-scale and vm-scalability and saw no obvious performance
> change (all within +/- 1%). The test boxes are 2-socket Cascade Lake
> and Icelake servers.
> 
> [thanks to Michal Hocko and David Rientjes for suggesting not handling
>  it inside OOM code]

While this is a good fix from the functionality POV, I believe you can go
a step further. Please add detection to the cpuset code and complain
to the kernel log if somebody tries to configure a movable-only cpuset.
Once you have that in place, you can easily create a static branch for
cpuset_insane_setup() and have zero overhead for all reasonable
configurations. There shouldn't be any reason to pay a single CPU cycle
to check for something that almost nobody does.

What do you think?
-- 
Michal Hocko
SUSE Labs


Thread overview: 11+ messages
2021-09-07  8:25 [PATCH] mm/page_alloc: detect allocation forbidden by cpuset and bail out early Feng Tang
2021-09-07  8:44 ` Michal Hocko [this message]
2021-09-08  1:50   ` Feng Tang
2021-09-08  7:06     ` Michal Hocko
2021-09-08  8:12       ` Feng Tang
2021-09-10  7:44       ` Feng Tang
2021-09-10  8:35         ` Michal Hocko
2021-09-10  9:21           ` Feng Tang
2021-09-10 10:35             ` Michal Hocko
2021-09-10 11:29               ` Feng Tang
2021-09-10 11:43                 ` Michal Hocko
