linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Wed, 25 Oct 2017 21:00:57 +0200	[thread overview]
Message-ID: <20171025190057.mqmnprhce7kvsfz7@dhcp22.suse.cz> (raw)
In-Reply-To: <20171025181106.GA14967@cmpxchg.org>

On Wed 25-10-17 14:11:06, Johannes Weiner wrote:
> On Wed, Oct 25, 2017 at 07:29:24PM +0200, Michal Hocko wrote:
> > On Wed 25-10-17 12:44:02, Johannes Weiner wrote:
> > > On Wed, Oct 25, 2017 at 04:12:21PM +0200, Michal Hocko wrote:
> > > > So how about we start with a BIG FAT WARNING for the failure case?
> > > > Something resembling warn_alloc for the failure case.
> > > >
> > > > ---
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 5d9323028870..3ba62c73eee5 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -1547,9 +1547,14 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> > > >  	 * victim and then we have rely on mem_cgroup_oom_synchronize otherwise
> > > >  	 * we would fall back to the global oom killer in pagefault_out_of_memory
> > > >  	 */
> > > > -	if (!memcg->oom_kill_disable &&
> > > > -			mem_cgroup_out_of_memory(memcg, mask, order))
> > > > -		return true;
> > > > +	if (!memcg->oom_kill_disable) {
> > > > +		if (mem_cgroup_out_of_memory(memcg, mask, order))
> > > > +			return true;
> > > > +
> > > > +		WARN(!current->memcg_may_oom,
> > > > +				"Memory cgroup charge failed because of no reclaimable memory! "
> > > > +				"This looks like a misconfiguration or a kernel bug.");
> > > > +	}
> > > 
> > > That's crazy!
> > > 
> > > We shouldn't create interfaces that make it possible to accidentally
> > > livelock the kernel. Then warn about it and let it crash. That is a
> > > DOS-level lack of OS abstraction.
> > > 
> > > In such a situation, we should ignore oom_score_adj or ignore the hard
> > > limit. Even panic() would be better from a machine management point of
> > > view than leaving random tasks inside infinite loops.
> > > 
> > > Why is OOM-disabling a thing? Why isn't this simply a "kill everything
> > > else before you kill me"? It's crashing the kernel in trying to
> > > protect a userspace application. How is that not insane?
> > 
> > I really do not follow. What kind of livelock or crash are you talking
> > about. All this code does is that the charge request (which is not
> > explicitly GFP_NOFAIL) fails with ENOMEM if the oom killer is not able
> > to make a forward progress. That sounds like a safer option than failing
> > with ENOMEM unconditionally which is what we do currently.
> 
> I pointed out multiple times now that consistent -ENOMEM is better
> than a rare one because it's easier to verify application behavior
> with test runs. And I don't understand what your counter-argument is.

My counter argument is that running inside the memcg shouldn't be too
much different than running outside. And that means that if we try to
reduce chances of ENOMEM in the global case then we should do the same
in the memcg case as well. Failing overly eagerly is more harmful than
what the determinism gives you for testing. You have other means to test
failure paths like fault injections.
 
> "Safe" is a vague term, and it doesn't make much sense to me in this
> situation. The OOM behavior should be predictable and consistent.
> 
> Yes, global might in the rarest cases also return -ENOMEM. Maybe. We
> don't have to do that in memcg because we're not physically limited.

OK, so here seems to be the biggest disconnect. Being physically or
artificially constrained shouldn't make much difference IMHO. In both
cases the resource is simply limited for the consumer. And once all the
attempts to fit within the limit fail then the request for the resource
has to fail.
 
> > So the only change I am really proposing is to keep retrying as long
> > as the oom killer makes a forward progress and ENOMEM otherwise.
> 
> That's the behavior change I'm against.

So just to make it clear you would be OK with the retry on successful
OOM killer invocation and force charge on oom failure, right?

> > I am also not trying to protect an userspace application. Quite
> > contrary, I would like the application gets ENOMEM when it should run
> > away from the constraint it runs within. I am protecting everything
> > outside of the hard limited memcg actually.
> > 
> > So what is that I am missing?
> 
> The page fault path.
> 
> Maybe that's not what you set out to fix, but it will now not only
> enter an infinite loop, it will also WARN() on each iteration.

It will not warn! The WARN is explicitly for non-PF paths unless I
have screwed something there because admittedly I even didn't try to
compile that code - it was merely for an illustration. Please note that
the diff is on top of the previous one.

And we already do have an endless loop if the memcg PF oom path doesn't
make a forward progress. So there shouldn't be any difference in that
regards.

> It would make sense to step back and think about the comprehensive
> user-visible behavior of an out-of-memory situation, and not just the
> one syscall aspect.

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2017-10-25 19:01 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-05 22:21 [PATCH] fs, mm: account filp and names caches to kmemcg Shakeel Butt
2017-10-06  7:59 ` Michal Hocko
2017-10-06 19:33   ` Shakeel Butt
2017-10-09  6:24     ` Michal Hocko
2017-10-09 17:52       ` Greg Thelen
2017-10-09 18:04         ` Michal Hocko
2017-10-09 18:17           ` Michal Hocko
2017-10-10  9:10             ` Michal Hocko
2017-10-10 22:21               ` Shakeel Butt
2017-10-11  9:09                 ` Michal Hocko
2017-10-09 20:26         ` Johannes Weiner
2017-10-10  9:14           ` Michal Hocko
2017-10-10 14:17             ` Johannes Weiner
2017-10-10 14:24               ` Michal Hocko
2017-10-12 19:03                 ` Johannes Weiner
2017-10-12 23:57                   ` Greg Thelen
2017-10-13  6:51                     ` Michal Hocko
2017-10-13  6:35                   ` Michal Hocko
2017-10-13  7:00                     ` Michal Hocko
2017-10-13 15:24                       ` Michal Hocko
2017-10-24 12:18                         ` Michal Hocko
2017-10-24 17:54                           ` Johannes Weiner
2017-10-24 16:06                         ` Johannes Weiner
2017-10-24 16:22                           ` Michal Hocko
2017-10-24 17:23                             ` Johannes Weiner
2017-10-24 17:55                               ` Michal Hocko
2017-10-24 18:58                                 ` Johannes Weiner
2017-10-24 20:15                                   ` Michal Hocko
2017-10-25  6:51                                     ` Greg Thelen
2017-10-25  7:15                                       ` Michal Hocko
2017-10-25 13:11                                         ` Johannes Weiner
2017-10-25 14:12                                           ` Michal Hocko
2017-10-25 16:44                                             ` Johannes Weiner
2017-10-25 17:29                                               ` Michal Hocko
2017-10-25 18:11                                                 ` Johannes Weiner
2017-10-25 19:00                                                   ` Michal Hocko [this message]
2017-10-25 21:13                                                     ` Johannes Weiner
2017-10-25 22:49                                                       ` Greg Thelen
2017-10-26  7:49                                                         ` Michal Hocko
2017-10-26 12:45                                                           ` Tetsuo Handa
2017-10-26 14:31                                                         ` Johannes Weiner
2017-10-26 19:56                                                           ` Greg Thelen
2017-10-27  8:20                                                             ` Michal Hocko
2017-10-27 20:50                                               ` Shakeel Butt
2017-10-30  8:29                                                 ` Michal Hocko
2017-10-30 19:28                                                   ` Shakeel Butt
2017-10-31  8:00                                                     ` Michal Hocko
2017-10-31 16:49                                                       ` Johannes Weiner
2017-10-31 18:50                                                         ` Michal Hocko
2017-10-24 15:45                     ` Johannes Weiner
2017-10-24 16:30                       ` Michal Hocko
2017-10-10 23:32 ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171025190057.mqmnprhce7kvsfz7@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shakeelb@google.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).