From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>, <linux-mm@kvack.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, <kernel-team@fb.com>,
	<cgroups@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Thu, 14 Sep 2017 09:05:48 -0700
Message-ID: <20170914160548.GA30441@castle>
In-Reply-To: <20170914134014.wqemev2kgychv7m5@dhcp22.suse.cz>

On Thu, Sep 14, 2017 at 03:40:14PM +0200, Michal Hocko wrote:
> On Wed 13-09-17 14:56:07, Roman Gushchin wrote:
> > On Wed, Sep 13, 2017 at 02:29:14PM +0200, Michal Hocko wrote:
> [...]
> > > I strongly believe that comparing only leaf memcgs
> > > is more straightforward and it doesn't lead to unexpected results as
> > > mentioned before (kill a small memcg which is a part of the larger
> > > sub-hierarchy).
> > 
> > One of two main goals of this patchset is to introduce cgroup-level
> > fairness: bigger cgroups should be affected more than smaller,
> > despite the size of tasks inside. I believe the same principle
> > should be used for cgroups.
> 
> Yes bigger cgroups should be preferred but I fail to see why bigger
> hierarchies should be considered as well if they are not kill-all. And
> whether non-leaf memcgs should allow kill-all is not entirely clear to
> me. What would be the usecase?

We definitely want to support kill-all for non-leaf cgroups.
A workload can consist of several cgroups, and we want to clean up
the whole thing on OOM. I don't see any reason to limit
this functionality to leaf cgroups only.

Hierarchies are memory consumers: we account their usage, and we
apply limits and guarantees to them. The same goes for OOM victim
selection: we reclaim memory from the biggest consumer. The kill-all
knob only defines _how_ we do that: by killing one process or all of
them.
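
To make the selection concrete, here is a minimal sketch in Python. It
is not the patch code: the Cgroup class, the kill_all field and the
usage numbers are made up for illustration, and tasks sitting directly
in the root cgroup are ignored. It only shows the idea of following the
biggest consumer level by level and stopping at a leaf or at the first
cgroup with kill-all set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    name: str
    own_usage: int = 0        # memory charged directly here, arbitrary units
    kill_all: bool = False    # hypothetical stand-in for the kill-all knob
    children: List["Cgroup"] = field(default_factory=list)

    def usage(self) -> int:
        # Hierarchical usage: own charge plus everything below.
        return self.own_usage + sum(c.usage() for c in self.children)


def select_victim(root: Cgroup) -> Cgroup:
    # Walk down from the root, following the biggest consumer at each
    # level. Stop at a leaf, or at the first cgroup with kill_all set.
    node = root
    while node.children:
        node = max(node.children, key=Cgroup.usage)
        if node.kill_all:
            break
    return node


def build_tree(kill_all_on_a: bool) -> Cgroup:
    # Made-up example: workload A is split into three cgroups (90 in
    # total), workload B is a single cgroup (50).
    a = Cgroup("A", kill_all=kill_all_on_a, children=[
        Cgroup("A/1", own_usage=30),
        Cgroup("A/2", own_usage=40),
        Cgroup("A/3", own_usage=20),
    ])
    b = Cgroup("B", own_usage=50)
    return Cgroup("root", children=[a, b])


print(select_victim(build_tree(False)).name)  # A/2: one process is killed there
print(select_victim(build_tree(True)).name)   # A: the whole sub-tree is killed
```

In both cases the memory is reclaimed from A, the biggest top-level
consumer. The knob only changes whether a single process in A/2 or
everything under A is killed.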

Just for example, we might want to take memory.low into account at
some point: prefer cgroups which are above their guarantees, and avoid
killing those which fit within them. That would be hard if we were
comparing cgroups from different hierarchies. The same applies to
introducing oom_priorities, which is a much more needed feature.
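
A rough sketch of what such a memory.low-aware comparison among
siblings could look like. Nothing like this is in the patchset; the
Sibling type, the function name and the tie-breaking by total usage
are assumptions made only to illustrate the idea.

```python
from typing import Dict, NamedTuple


class Sibling(NamedTuple):
    usage: int  # hierarchical usage, arbitrary units
    low: int    # memory.low guarantee, same units


def pick_among_siblings(siblings: Dict[str, Sibling]) -> str:
    # Prefer cgroups that are above their guarantee. Fall back to all
    # siblings only if every one of them fits within its guarantee.
    over = {name: s for name, s in siblings.items() if s.usage > s.low}
    pool = over or siblings
    # Assumption: within the chosen pool, the biggest consumer wins.
    return max(pool, key=lambda name: pool[name].usage)


# "db" is bigger but within its guarantee, so "web" is picked.
print(pick_among_siblings({"db": Sibling(80, 100), "web": Sibling(60, 20)}))
```

The comparison is well-defined here because all candidates share one
parent, so their guarantees are enforced at the same level.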

> Consider that it might be not your choice (as a user) how deep is your
> leaf memcg. I can already see how people complain that their memcg has
> been killed just because it was one level deeper in the hierarchy...

The kill-all functionality is enforced by the parent, which follows
the overall memcg design: the parent cgroup enforces the memory
limit, the memory low limit, etc.

I don't know why OOM control should be different.

> 
> I would really start simple and only allow kill-all on leaf memcgs and
> only compare leaf memcgs & root. If we ever need to kill whole
> hierarchies then allow kill-all on intermediate memcgs as well and then
> consider cumulative consumptions only on those that have kill-all
> enabled.

This sounds hacky to me: the whole thing depends on cgroup v2 and
is, additionally, explicitly opt-in.

Why do we need to introduce such incomplete functionality first,
and then suffer trying to extend it and provide backward compatibility?

Also, I think we should compare the root cgroup with top-level cgroups,
rather than leaf cgroups. A process in the root cgroup is definitely
a system-level entity, and we should compare it with other top-level
entities (other containerized workloads), rather than some random
leaf cgroup deep inside the tree. If we decided that we're not comparing
random tasks from different cgroups, why should we do this for leaf
cgroups? It sounds like making only one step in the right direction,
while we can do more.

> 
> Or do I miss any reasonable usecase that would suffer from such a
> semantic?

Kill-all for sub-trees is definitely required.
Enforcing oom_priorities for sub-trees is something that I would expect
to be very useful too. Comparing leaf cgroups system-wide instead of
processes doesn't sound good to me: we would lack hierarchical fairness,
which is one of the two goals of this patchset.
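
A small illustration of the fairness point, in Python, with made-up
paths and numbers (none of this comes from the patchset). A workload
split into several small leaf cgroups escapes a system-wide leaf
comparison even though it is the biggest consumer overall.

```python
from typing import Dict

# Usage per leaf cgroup, keyed by path (hypothetical, arbitrary units).
leaf_usage: Dict[str, int] = {
    "workload_a/job1": 30,
    "workload_a/job2": 30,
    "workload_a/job3": 30,
    "workload_b": 50,
}


def biggest_leaf(usage: Dict[str, int]) -> str:
    # Leaf-only comparison: every leaf competes system-wide.
    return max(usage, key=usage.get)


def top_level_totals(usage: Dict[str, int]) -> Dict[str, int]:
    # Sum leaf usage under each top-level cgroup.
    totals: Dict[str, int] = {}
    for path, value in usage.items():
        top = path.split("/", 1)[0]
        totals[top] = totals.get(top, 0) + value
    return totals


def biggest_top_level(usage: Dict[str, int]) -> str:
    # Hierarchical comparison: top-level entities compete first.
    totals = top_level_totals(usage)
    return max(totals, key=totals.get)


print(biggest_leaf(leaf_usage))       # workload_b (50 beats each 30)
print(biggest_top_level(leaf_usage))  # workload_a (90 beats 50)
```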

Thanks!
