linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, <kernel-team@fb.com>,
	<cgroups@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Wed, 20 Sep 2017 15:24:03 -0700	[thread overview]
Message-ID: <20170920222403.GA4729@castle> (raw)
In-Reply-To: <alpine.DEB.2.10.1709191351330.7458@chino.kir.corp.google.com>

On Tue, Sep 19, 2017 at 01:54:48PM -0700, David Rientjes wrote:
> On Fri, 15 Sep 2017, Roman Gushchin wrote:
> 
> > > > > But then you just enforce a structural restriction on your configuration
> > > > > because
> > > > > 	root
> > > > >         /  \
> > > > >        A    D
> > > > >       /\   
> > > > >      B  C
> > > > > 
> > > > > is a different thing than
> > > > > 	root
> > > > >         / | \
> > > > >        B  C  D
> > > > >
> > > > 
> > > > I actually don't have a strong argument against an approach to select
> > > > largest leaf or kill-all-set memcg. I think, in practice there will be
> > > > no much difference.
> > > > 
> > > > The only real concern I have is that then we have to do the same with
> > > > oom_priorities (select largest priority tree-wide), and this will limit
> > > > an ability to enforce the priority by parent cgroup.
> > > > 
> > > 
> > > Yes, oom_priority cannot select the largest priority tree-wide for exactly 
> > > that reason.  We need the ability to control from which subtree the kill 
> > > occurs in ancestor cgroups.  If multiple jobs are allocated their own 
> > > cgroups and they can own memory.oom_priority for their own subcontainers, 
> > > this becomes quite powerful so they can define their own oom priorities.   
> > > Otherwise, they can easily override the oom priorities of other cgroups.
> > 
> > I believe, it's a solvable problem: we can require CAP_SYS_RESOURCE to set
> > the oom_priority below parent's value, or something like this.
> > 
> > But it looks more complex, and I'm not sure there are real examples,
> > when we have to compare memcgs, which are on different levels
> > (or in different subtrees).
> > 
> 
> It's actually much more complex because in our environment we'd need an 
> "activity manager" with CAP_SYS_RESOURCE to control oom priorities of user 
> subcontainers when today it need only be concerned with top-level memory 
> cgroups.  Users can create their own hierarchies with their own oom 
> priorities at will, it doesn't alter the selection heuristic for another 
> other user running on the same system and gives them full control over the 
> selection in their own subtree.  We shouldn't need to have a system-wide 
> daemon with CAP_SYS_RESOURCE be required to manage subcontainers when 
> nothing else requires it.  I believe it's also much easier to document: 
> oom_priority is considered for all sibling cgroups at each level of the 
> hierarchy and the cgroup with the lowest priority value gets iterated.

I do agree actually. System-wide OOM priorities make no sense.

Always compare sibling cgroups, either by priority or size, seems to be
simple, clear and powerful enough for all reasonable use cases. Am I right,
that it's exactly what you've used internally? This is a perfect confirmation,
I believe.

Thanks!

  reply	other threads:[~2017-09-20 22:24 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-11 13:17 [v8 0/4] cgroup-aware OOM killer Roman Gushchin
2017-09-11 13:17 ` [v8 1/4] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-09-11 20:51   ` David Rientjes
2017-09-14 13:42   ` Michal Hocko
2017-09-11 13:17 ` [v8 2/4] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-09-13 20:46   ` David Rientjes
2017-09-13 21:59     ` Roman Gushchin
2017-09-11 13:17 ` [v8 3/4] mm, oom: add cgroup v2 mount option for " Roman Gushchin
2017-09-11 20:48   ` David Rientjes
2017-09-12 20:01     ` Roman Gushchin
2017-09-12 20:23       ` David Rientjes
2017-09-13 12:23       ` Michal Hocko
2017-09-11 13:17 ` [v8 4/4] mm, oom, docs: describe the " Roman Gushchin
2017-09-11 20:44 ` [v8 0/4] " David Rientjes
2017-09-13 12:29   ` Michal Hocko
2017-09-13 20:46     ` David Rientjes
2017-09-14 13:34       ` Michal Hocko
2017-09-14 20:07         ` David Rientjes
2017-09-13 21:56     ` Roman Gushchin
2017-09-14 13:40       ` Michal Hocko
2017-09-14 16:05         ` Roman Gushchin
2017-09-15 10:58           ` Michal Hocko
2017-09-15 15:23             ` Roman Gushchin
2017-09-15 19:55               ` David Rientjes
2017-09-15 21:08                 ` Roman Gushchin
2017-09-18  6:20                   ` Michal Hocko
2017-09-18 15:02                     ` Roman Gushchin
2017-09-21  8:30                       ` David Rientjes
2017-09-19 20:54                   ` David Rientjes
2017-09-20 22:24                     ` Roman Gushchin [this message]
2017-09-21  8:27                       ` David Rientjes
2017-09-18  6:16                 ` Michal Hocko
2017-09-19 20:51                   ` David Rientjes
2017-09-18  6:14               ` Michal Hocko
2017-09-20 21:53                 ` Roman Gushchin
2017-09-25 12:24                   ` Michal Hocko
2017-09-25 17:00                     ` Johannes Weiner
2017-09-25 18:15                       ` Roman Gushchin
2017-09-25 20:25                         ` Michal Hocko
2017-09-26 10:59                           ` Roman Gushchin
2017-09-26 11:21                             ` Michal Hocko
2017-09-26 12:13                               ` Roman Gushchin
2017-09-26 13:30                                 ` Michal Hocko
2017-09-26 17:26                                   ` Johannes Weiner
2017-09-27  3:37                                     ` Tim Hockin
2017-09-27  7:43                                       ` Michal Hocko
2017-09-27 10:19                                         ` Roman Gushchin
2017-09-27 15:35                                         ` Tim Hockin
2017-09-27 16:23                                           ` Roman Gushchin
2017-09-27 18:11                                             ` Tim Hockin
2017-10-01 23:29                                               ` Shakeel Butt
2017-10-02 11:56                                                 ` Tetsuo Handa
2017-10-02 12:24                                                 ` Michal Hocko
2017-10-02 12:47                                                   ` Roman Gushchin
2017-10-02 14:29                                                     ` Michal Hocko
2017-10-02 19:00                                                   ` Shakeel Butt
2017-10-02 19:28                                                     ` Michal Hocko
2017-10-02 19:45                                                       ` Shakeel Butt
2017-10-02 19:56                                                         ` Michal Hocko
2017-10-02 20:00                                                           ` Tim Hockin
2017-10-02 20:08                                                             ` Michal Hocko
2017-10-02 20:20                                                             ` Shakeel Butt
2017-10-02 20:24                                                           ` Shakeel Butt
2017-10-02 20:34                                                             ` Johannes Weiner
2017-10-02 20:55                                                             ` Michal Hocko
2017-09-25 22:21                       ` David Rientjes
2017-09-26  8:46                         ` Michal Hocko
2017-09-26 21:04                           ` David Rientjes
2017-09-27  7:37                             ` Michal Hocko
2017-09-27  9:57                               ` Roman Gushchin
2017-09-21 14:21   ` Johannes Weiner
2017-09-21 21:17     ` David Rientjes
2017-09-21 21:51       ` Johannes Weiner
2017-09-22 20:53         ` David Rientjes
2017-09-22 15:44       ` Tejun Heo
2017-09-22 20:39         ` David Rientjes
2017-09-22 21:05           ` Tejun Heo
2017-09-23  8:16             ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170920222403.GA4729@castle \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=rientjes@google.com \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).