From: Michal Hocko <mhocko@suse.cz>
To: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
Date: Mon, 2 Dec 2013 14:22:01 +0100
Message-ID: <20131202132201.GC18838@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.02.1311291543400.22413@chino.kir.corp.google.com>

On Fri 29-11-13 15:46:16, David Rientjes wrote:
> On Thu, 28 Nov 2013, Michal Hocko wrote:
> 
> > > Ok, so let's forget about GFP_KERNEL | __GFP_NOFAIL since anything doing 
> > > __GFP_FS should not be holding such locks, we have some of those in the 
> > > drivers code and that makes sense that they are doing GFP_KERNEL.
> > > 
> > > Focusing on the GFP_NOFS | __GFP_NOFAIL allocations in the filesystem 
> > > code, the kernel oom killer independent of memcg never gets called because 
> > > !__GFP_FS and they'll simply loop around the page allocator forever.
> > > 
> > > In the past, Andrew has expressed the desire to get rid of __GFP_NOFAIL 
> > > entirely since it's flawed when combined with GFP_NOFS (and GFP_KERNEL | 
> > > __GFP_NOFAIL could simply be reimplemented in the caller) because of the 
> > > reason you point out in addition to making it very difficult in the page 
> > > allocator to free memory independent of memcg.
> > > 
> > > So I'm wondering if we should just disable the oom killer in memcg for 
> > > __GFP_NOFAIL as you've done here, but not bypass to the root memcg and 
> > > just allow them to spin?  I think we should be focused on fixing the 
> > > callers rather than breaking memcg isolation.
> > 
> > What if the callers simply cannot deal with the allocation failure?
> > 84235de394d97 (fs: buffer: move allocation failure loop into the
> > allocator) describes one such case when __getblk_slow tries desperately
> > to grow buffers relying on the reclaim to free something. As there might
> > be no reclaim going on we are screwed.
> > 
> 
> My suggestion is to spin, not return NULL. 

Spin on which level? The whole point of this change was to not spin
forever, because the caller might be holding other locks, which could
prevent somebody else from dying even though it has been killed.

> Bypassing to the root memcg 
> can lead to a system oom condition whereas if memcg weren't involved at 
> all the page allocator would just spin (because of !__GFP_FS).

I am confused now. The page allocation has already happened at the time
we are doing the charge. So the global OOM would have happened already.

> > That being said, while I do agree with you that we should strive for
> > isolation as much as possible there are certain cases when this is
> > impossible to achieve without seeing much worse consequences. For now,
> > we hope that __GFP_NOFAIL is used very scarcely.
> 
> If that's true, why not bypass the per-zone min watermarks in the page 
> allocator as well to allow these allocations to succeed?

Allocations are already done. We simply cannot charge that allocation
because we have reached the hard limit. And said allocation might
prevent the OOM action from proceeding because of held locks.
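
To make the ordering concrete, here is a toy userspace model of the charge
decision discussed above. It is only a sketch under stated assumptions, not
the code in mm/memcontrol.c: struct toy_memcg, toy_charge() and
TOY_GFP_NOFAIL are hypothetical names invented for illustration, reclaim is
left out entirely, and the root group is modelled as having no limit. The
pages are assumed to exist already; only the accounting can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this toy model; NOT the kernel's gfp_t value. */
#define TOY_GFP_NOFAIL 0x1u

/* Toy stand-in for a memory cgroup: usage and hard limit, in pages. */
struct toy_memcg {
	size_t usage;
	size_t limit;
};

enum toy_charge_result {
	TOY_CHARGED,	/* accounted to the group itself */
	TOY_BYPASSED,	/* accounted to the root group instead */
	TOY_OOM,	/* caller would enter the memcg OOM path */
};

/*
 * Charge nr_pages that have ALREADY been allocated. Reclaim is omitted
 * (assumption): hitting the hard limit is treated as final.
 */
static enum toy_charge_result
toy_charge(struct toy_memcg *memcg, struct toy_memcg *root,
	   unsigned int gfp, size_t nr_pages)
{
	/* Invariant of the model: a group never exceeds its hard limit. */
	assert(memcg->usage <= memcg->limit);

	/* Written as a subtraction so the comparison cannot overflow. */
	if (nr_pages <= memcg->limit - memcg->usage) {
		memcg->usage += nr_pages;
		return TOY_CHARGED;
	}

	/*
	 * Over the hard limit. A NOFAIL caller may hold locks that an
	 * OOM victim needs, so neither declare OOM nor spin here:
	 * account the pages to the root group and let the caller go on.
	 */
	if (gfp & TOY_GFP_NOFAIL) {
		root->usage += nr_pages;
		return TOY_BYPASSED;
	}

	return TOY_OOM;
}
```

In this model, a group at usage 10 with limit 10 that charges one more page
with TOY_GFP_NOFAIL gets TOY_BYPASSED and its own usage stays at 10; the
same charge without the flag gets TOY_OOM.
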
-- 
Michal Hocko
SUSE Labs

Thread overview: 17+ messages
2013-11-22 17:17 [patch] mm: memcg: do not declare OOM from __GFP_NOFAIL allocations Johannes Weiner
2013-11-27  1:01 ` David Rientjes
2013-11-27  3:33   ` David Rientjes
2013-11-27 16:39     ` Johannes Weiner
2013-11-27 21:38       ` David Rientjes
2013-11-27 22:53         ` Johannes Weiner
2013-11-27 23:34           ` David Rientjes
2013-11-28 10:20             ` Michal Hocko
2013-11-29 23:46               ` David Rientjes
2013-12-02 13:22                 ` Michal Hocko [this message]
2013-12-02 23:02                   ` David Rientjes
2013-12-03 22:25                     ` Johannes Weiner
2013-12-03 23:40                       ` David Rientjes
2013-12-04  3:01                         ` Johannes Weiner
2013-12-04  4:34                           ` Dave Chinner
2013-12-04  5:25                             ` Johannes Weiner
2013-12-04  6:10                               ` David Rientjes
