From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Wed, 25 Oct 2017 09:11:51 -0400
Message-ID: <20171025131151.GA8210@cmpxchg.org>
In-Reply-To: <20171025071522.xyw4lsvdv4xsbhbo@dhcp22.suse.cz>

On Wed, Oct 25, 2017 at 09:15:22AM +0200, Michal Hocko wrote:
> On Tue 24-10-17 23:51:30, Greg Thelen wrote:
> > Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > I am definitely not pushing that thing right now. It is good to discuss
> > > it, though. The more kernel allocations we will track the more careful we
> > > will have to be. So maybe we will have to reconsider the current
> > > approach. I am not sure we need it _right now_ but I feel we will
> > > eventually have to reconsider it.
> > 
> > The kernel already attempts to charge radix_tree_nodes.  If they fail
> > then we fallback to unaccounted memory. 
> 
> I am not sure which code path you have in mind. All I can see is that we
> drop __GFP_ACCOUNT when preloading radix tree nodes. Anyway...
> 
> > So the memcg limit already
> > isn't an air tight constraint.

I fully agree with this. Socket buffers overcharge too. There are
plenty of memory allocations that aren't even tracked.

The point is, it's a hard limit in the sense that breaching it will
trigger the OOM killer. It's not a hard limit in the sense that the
kernel will deadlock to avoid crossing it.

> ... we shouldn't make it more loose though.

Then we can end this discussion right now. I pointed out right from
the start that the only way to replace -ENOMEM with OOM killing in the
syscall is to force charges. If we don't, we either deadlock or still
return -ENOMEM occasionally. Nobody has refuted that this is the case.
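
As a rough illustration of that trade-off, here is a userspace toy model
in C. It is not kernel code: struct toy_memcg, toy_try_charge() and
toy_force_charge() are hypothetical names made up for this sketch and do
not correspond to the real memcg functions. It only contrasts a charge
that fails with -ENOMEM at the limit against a forced charge that always
succeeds, may overshoot the limit, and leaves OOM handling pending.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a memory cgroup counter (hypothetical, not kernel code). */
struct toy_memcg {
	long usage;        /* pages currently charged */
	long limit;        /* configured hard limit, in pages */
	bool oom_pending;  /* set once a forced charge breached the limit */
};

/*
 * Status quo in this model: refuse a charge that would cross the limit.
 * The caller sees -ENOMEM and usage is left unchanged.
 */
int toy_try_charge(struct toy_memcg *cg, long nr_pages)
{
	if (cg->usage + nr_pages > cg->limit)
		return -ENOMEM;
	cg->usage += nr_pages;
	return 0;
}

/*
 * Forced charge in this model: never fails, so the caller never sees
 * -ENOMEM, but usage may temporarily exceed the limit. The overshoot is
 * recorded so that reclaim or an OOM kill can reconcile it later.
 */
void toy_force_charge(struct toy_memcg *cg, long nr_pages)
{
	cg->usage += nr_pages;
	if (cg->usage > cg->limit)
		cg->oom_pending = true;
}
```

With usage at 9 pages and a limit of 10, toy_try_charge(&cg, 2) returns
-ENOMEM and leaves usage at 9, while toy_force_charge(&cg, 2) raises
usage to 11 and sets oom_pending. The model has no third variant that
neither fails nor overshoots, short of blocking the caller.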

> > The current thread can loop in syscall exit until
> > usage is reconciled (either via reclaim or kill).  This seems consistent
> > with pagefault oom handling and compatible with overcommit use case.
> 
> But we do not really want to make the syscall exit path any more complex
> or more expensive than it is. The point is that we shouldn't be afraid
> of triggering the oom killer from the charge path because we do have
> async OOM killer. This is the very same as in the standard allocator path.
> So why should memcg be any different?

I have nothing against triggering the OOM killer from the allocation
path. I am dead-set against making the -ENOMEM return from syscalls
rare and unpredictable. They're a challenge as it is.

The only sane options are to stick with the status quo, or make sure
the task never returns before the allocation succeeds. Making things
in this path more speculative is a downgrade, not an improvement.
