From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Tue, 10 Oct 2017 10:17:33 -0400
Message-ID: <20171010141733.GB16710@cmpxchg.org>
In-Reply-To: <20171010091430.giflzlayvjblx5bu@dhcp22.suse.cz>

On Tue, Oct 10, 2017 at 11:14:30AM +0200, Michal Hocko wrote:
> On Mon 09-10-17 16:26:13, Johannes Weiner wrote:
> > It's consistent in the sense that only page faults enable the memcg
> > OOM killer. It's not the type of memory that decides, it's whether the
> > allocation context has a channel to communicate an error to userspace.
> > 
> > Whether userspace is able to handle -ENOMEM from syscalls was a voiced
> > concern at the time this patch was merged, although there haven't been
> > any reports so far,
> 
> Well, I remember reports about MAP_POPULATE breaking or at least having
> an unexpected behavior.

Hm, that slipped past me. Did we do something about these? Or did they
fix userspace?

> Well, we should be able to do that with the oom_reaper. At least for v2
> which doesn't have synchronous userspace oom killing.

I don't see how the OOM reaper is a guarantee as long as we have this:

	if (!down_read_trylock(&mm->mmap_sem)) {
		ret = false;
		trace_skip_task_reaping(tsk->pid);
		goto unlock_oom;
	}

What do you mean by 'v2'?

> > > c) Overcharge kmem to oom memcg and queue an async memcg limit checker,
> > >    which will oom kill if needed.
> > 
> > This makes the most sense to me. Architecturally, I imagine this would
> > look like b), with an OOM handler at the point of return to userspace,
> > except that we'd overcharge instead of retrying the syscall.
> 
> I do not think we should break the hard limit semantic if possible. We
> can currently allow that for allocations which are very short term (oom
> victims) or too important to fail but allowing that for kmem charges in
> general sounds like too easy to runaway.

I'm not sure there is a convenient way out of this.

If we want to respect the hard limit AND guarantee allocation success,
the OOM killer has to free memory reliably - which it doesn't. But if
it did, we could also break the limit temporarily and have the OOM
killer replenish the pool before that userspace app can continue. The
allocation wouldn't have to be short-lived, since memory is fungible.

Until the OOM killer is 100% reliable, we have the choice between
sometimes deadlocking the cgroup tasks and everything that interacts
with them, returning -ENOMEM for syscalls, or breaking the hard limit
guarantee during memcg OOM.

It seems breaking the limit temporarily in order to reclaim memory is
the best option. There is kernel memory we don't account to the memcg
already because we think it's probably not going to be significant, so
the isolation isn't 100% watertight in the first place. And I'd rather
have the worst-case effect of a cgroup OOMing be spilling over its
hard limit than deadlocking things inside and outside the cgroup.
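
The overcharge idea above can be sketched as a toy userspace model. This
is not kernel code: struct toy_memcg, toy_charge_overcommit(),
toy_handle_oom_on_return() and the reclaim callback are hypothetical
names made up for illustration. The sketch only assumes that a charge is
allowed to exceed the limit and that the excess is dealt with before the
task continues in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memcg: a usage counter and a hard limit, in pages. */
struct toy_memcg {
	size_t usage;
	size_t limit;
	bool oom_pending;	/* a charge pushed usage over the limit */
};

/* Hypothetical reclaim hook: returns the number of pages it freed. */
typedef size_t (*toy_reclaim_fn)(struct toy_memcg *cg, size_t want);

/* Charge unconditionally and remember if the limit was breached. */
static void toy_charge_overcommit(struct toy_memcg *cg, size_t nr_pages)
{
	cg->usage += nr_pages;
	if (cg->usage > cg->limit)
		cg->oom_pending = true;
}

/*
 * Meant to run at the point of return to userspace: reclaim until the
 * group is back under its limit. Returns false if reclaim made no
 * progress, in which case the group stays over its limit and the
 * pending flag remains set.
 */
static bool toy_handle_oom_on_return(struct toy_memcg *cg,
				     toy_reclaim_fn reclaim)
{
	if (!cg->oom_pending)
		return true;

	while (cg->usage > cg->limit) {
		size_t freed = reclaim(cg, cg->usage - cg->limit);

		if (freed == 0)
			return false;
		assert(freed <= cg->usage);
		cg->usage -= freed;
	}
	cg->oom_pending = false;
	return true;
}

/* Example hook: pretends an OOM kill frees exactly what was asked for. */
static size_t toy_kill_frees_all(struct toy_memcg *cg, size_t want)
{
	(void)cg;
	return want;
}
```

In this model the allocation itself never fails or blocks. An unreliable
reclaim hook (one that returns 0) leaves the group over its limit rather
than stalling the charging path, which is the trade-off argued for above.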
