All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Tue, 19 Sep 2017 13:54:48 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.10.1709191351330.7458@chino.kir.corp.google.com> (raw)
In-Reply-To: <20170915210807.GA5238@castle>

On Fri, 15 Sep 2017, Roman Gushchin wrote:

> > > > But then you just enforce a structural restriction on your configuration
> > > > because
> > > > 	root
> > > >         /  \
> > > >        A    D
> > > >       /\   
> > > >      B  C
> > > > 
> > > > is a different thing than
> > > > 	root
> > > >         / | \
> > > >        B  C  D
> > > >
> > > 
> > > I actually don't have a strong argument against an approach to select
> > > largest leaf or kill-all-set memcg. I think, in practice there will be
> > > no much difference.
> > > 
> > > The only real concern I have is that then we have to do the same with
> > > oom_priorities (select largest priority tree-wide), and this will limit
> > > an ability to enforce the priority by parent cgroup.
> > > 
> > 
> > Yes, oom_priority cannot select the largest priority tree-wide for exactly 
> > that reason.  We need the ability to control from which subtree the kill 
> > occurs in ancestor cgroups.  If multiple jobs are allocated their own 
> > cgroups and they can own memory.oom_priority for their own subcontainers, 
> > this becomes quite powerful so they can define their own oom priorities.   
> > Otherwise, they can easily override the oom priorities of other cgroups.
> 
> I believe, it's a solvable problem: we can require CAP_SYS_RESOURCE to set
> the oom_priority below parent's value, or something like this.
> 
> But it looks more complex, and I'm not sure there are real examples,
> when we have to compare memcgs, which are on different levels
> (or in different subtrees).
> 

It's actually much more complex because in our environment we'd need an 
"activity manager" with CAP_SYS_RESOURCE to control oom priorities of user 
subcontainers when today it need only be concerned with top-level memory 
cgroups.  Users can create their own hierarchies with their own oom 
priorities at will, it doesn't alter the selection heuristic for another 
other user running on the same system and gives them full control over the 
selection in their own subtree.  We shouldn't need to have a system-wide 
daemon with CAP_SYS_RESOURCE be required to manage subcontainers when 
nothing else requires it.  I believe it's also much easier to document: 
oom_priority is considered for all sibling cgroups at each level of the 
hierarchy and the cgroup with the lowest priority value gets iterated.

WARNING: multiple messages have this Message-ID (diff)
From: David Rientjes <rientjes@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, Vladimir Davydov <vdavydov.dev@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Tue, 19 Sep 2017 13:54:48 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.10.1709191351330.7458@chino.kir.corp.google.com> (raw)
In-Reply-To: <20170915210807.GA5238@castle>

On Fri, 15 Sep 2017, Roman Gushchin wrote:

> > > > But then you just enforce a structural restriction on your configuration
> > > > because
> > > > 	root
> > > >         /  \
> > > >        A    D
> > > >       /\   
> > > >      B  C
> > > > 
> > > > is a different thing than
> > > > 	root
> > > >         / | \
> > > >        B  C  D
> > > >
> > > 
> > > I actually don't have a strong argument against an approach to select
> > > largest leaf or kill-all-set memcg. I think, in practice there will be
> > > no much difference.
> > > 
> > > The only real concern I have is that then we have to do the same with
> > > oom_priorities (select largest priority tree-wide), and this will limit
> > > an ability to enforce the priority by parent cgroup.
> > > 
> > 
> > Yes, oom_priority cannot select the largest priority tree-wide for exactly 
> > that reason.  We need the ability to control from which subtree the kill 
> > occurs in ancestor cgroups.  If multiple jobs are allocated their own 
> > cgroups and they can own memory.oom_priority for their own subcontainers, 
> > this becomes quite powerful so they can define their own oom priorities.   
> > Otherwise, they can easily override the oom priorities of other cgroups.
> 
> I believe, it's a solvable problem: we can require CAP_SYS_RESOURCE to set
> the oom_priority below parent's value, or something like this.
> 
> But it looks more complex, and I'm not sure there are real examples,
> when we have to compare memcgs, which are on different levels
> (or in different subtrees).
> 

It's actually much more complex because in our environment we'd need an 
"activity manager" with CAP_SYS_RESOURCE to control oom priorities of user 
subcontainers when today it need only be concerned with top-level memory 
cgroups.  Users can create their own hierarchies with their own oom 
priorities at will, it doesn't alter the selection heuristic for another 
other user running on the same system and gives them full control over the 
selection in their own subtree.  We shouldn't need to have a system-wide 
daemon with CAP_SYS_RESOURCE be required to manage subcontainers when 
nothing else requires it.  I believe it's also much easier to document: 
oom_priority is considered for all sibling cgroups at each level of the 
hierarchy and the cgroup with the lowest priority value gets iterated.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2017-09-19 20:54 UTC|newest]

Thread overview: 168+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-11 13:17 [v8 0/4] cgroup-aware OOM killer Roman Gushchin
2017-09-11 13:17 ` Roman Gushchin
2017-09-11 13:17 ` [v8 1/4] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-09-11 13:17   ` Roman Gushchin
2017-09-11 20:51   ` David Rientjes
2017-09-11 20:51     ` David Rientjes
2017-09-14 13:42   ` Michal Hocko
2017-09-14 13:42     ` Michal Hocko
2017-09-11 13:17 ` [v8 2/4] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-09-11 13:17   ` Roman Gushchin
2017-09-13 20:46   ` David Rientjes
2017-09-13 20:46     ` David Rientjes
2017-09-13 21:59     ` Roman Gushchin
2017-09-13 21:59       ` Roman Gushchin
2017-09-13 21:59       ` Roman Gushchin
2017-09-11 13:17 ` [v8 3/4] mm, oom: add cgroup v2 mount option for " Roman Gushchin
2017-09-11 13:17   ` Roman Gushchin
2017-09-11 13:17   ` Roman Gushchin
2017-09-11 20:48   ` David Rientjes
2017-09-11 20:48     ` David Rientjes
2017-09-12 20:01     ` Roman Gushchin
2017-09-12 20:01       ` Roman Gushchin
2017-09-12 20:23       ` David Rientjes
2017-09-12 20:23         ` David Rientjes
2017-09-13 12:23       ` Michal Hocko
2017-09-13 12:23         ` Michal Hocko
2017-09-11 13:17 ` [v8 4/4] mm, oom, docs: describe the " Roman Gushchin
2017-09-11 13:17   ` Roman Gushchin
2017-09-11 20:44 ` [v8 0/4] " David Rientjes
2017-09-11 20:44   ` David Rientjes
2017-09-13 12:29   ` Michal Hocko
2017-09-13 12:29     ` Michal Hocko
2017-09-13 20:46     ` David Rientjes
2017-09-13 20:46       ` David Rientjes
2017-09-14 13:34       ` Michal Hocko
2017-09-14 13:34         ` Michal Hocko
2017-09-14 20:07         ` David Rientjes
2017-09-14 20:07           ` David Rientjes
2017-09-13 21:56     ` Roman Gushchin
2017-09-13 21:56       ` Roman Gushchin
2017-09-14 13:40       ` Michal Hocko
2017-09-14 13:40         ` Michal Hocko
2017-09-14 16:05         ` Roman Gushchin
2017-09-14 16:05           ` Roman Gushchin
2017-09-15 10:58           ` Michal Hocko
2017-09-15 10:58             ` Michal Hocko
2017-09-15 15:23             ` Roman Gushchin
2017-09-15 15:23               ` Roman Gushchin
2017-09-15 19:55               ` David Rientjes
2017-09-15 19:55                 ` David Rientjes
2017-09-15 21:08                 ` Roman Gushchin
2017-09-15 21:08                   ` Roman Gushchin
2017-09-18  6:20                   ` Michal Hocko
2017-09-18  6:20                     ` Michal Hocko
2017-09-18 15:02                     ` Roman Gushchin
2017-09-18 15:02                       ` Roman Gushchin
2017-09-18 15:02                       ` Roman Gushchin
2017-09-21  8:30                       ` David Rientjes
2017-09-21  8:30                         ` David Rientjes
2017-09-19 20:54                   ` David Rientjes [this message]
2017-09-19 20:54                     ` David Rientjes
2017-09-20 22:24                     ` Roman Gushchin
2017-09-20 22:24                       ` Roman Gushchin
2017-09-21  8:27                       ` David Rientjes
2017-09-21  8:27                         ` David Rientjes
2017-09-18  6:16                 ` Michal Hocko
2017-09-18  6:16                   ` Michal Hocko
2017-09-19 20:51                   ` David Rientjes
2017-09-19 20:51                     ` David Rientjes
2017-09-18  6:14               ` Michal Hocko
2017-09-18  6:14                 ` Michal Hocko
2017-09-20 21:53                 ` Roman Gushchin
2017-09-20 21:53                   ` Roman Gushchin
2017-09-20 21:53                   ` Roman Gushchin
2017-09-25 12:24                   ` Michal Hocko
2017-09-25 12:24                     ` Michal Hocko
2017-09-25 17:00                     ` Johannes Weiner
2017-09-25 17:00                       ` Johannes Weiner
2017-09-25 18:15                       ` Roman Gushchin
2017-09-25 18:15                         ` Roman Gushchin
2017-09-25 20:25                         ` Michal Hocko
2017-09-25 20:25                           ` Michal Hocko
2017-09-25 20:25                           ` Michal Hocko
2017-09-26 10:59                           ` Roman Gushchin
2017-09-26 10:59                             ` Roman Gushchin
2017-09-26 11:21                             ` Michal Hocko
2017-09-26 11:21                               ` Michal Hocko
2017-09-26 12:13                               ` Roman Gushchin
2017-09-26 12:13                                 ` Roman Gushchin
2017-09-26 12:13                                 ` Roman Gushchin
2017-09-26 13:30                                 ` Michal Hocko
2017-09-26 13:30                                   ` Michal Hocko
2017-09-26 17:26                                   ` Johannes Weiner
2017-09-26 17:26                                     ` Johannes Weiner
2017-09-27  3:37                                     ` Tim Hockin
2017-09-27  3:37                                       ` Tim Hockin
2017-09-27  7:43                                       ` Michal Hocko
2017-09-27  7:43                                         ` Michal Hocko
2017-09-27 10:19                                         ` Roman Gushchin
2017-09-27 10:19                                           ` Roman Gushchin
2017-09-27 10:19                                           ` Roman Gushchin
2017-09-27 15:35                                         ` Tim Hockin
2017-09-27 15:35                                           ` Tim Hockin
2017-09-27 16:23                                           ` Roman Gushchin
2017-09-27 16:23                                             ` Roman Gushchin
2017-09-27 18:11                                             ` Tim Hockin
2017-09-27 18:11                                               ` Tim Hockin
2017-10-01 23:29                                               ` Shakeel Butt
2017-10-01 23:29                                                 ` Shakeel Butt
2017-10-02 11:56                                                 ` Tetsuo Handa
2017-10-02 11:56                                                   ` Tetsuo Handa
2017-10-02 12:24                                                 ` Michal Hocko
2017-10-02 12:24                                                   ` Michal Hocko
2017-10-02 12:47                                                   ` Roman Gushchin
2017-10-02 12:47                                                     ` Roman Gushchin
2017-10-02 14:29                                                     ` Michal Hocko
2017-10-02 14:29                                                       ` Michal Hocko
2017-10-02 14:29                                                       ` Michal Hocko
2017-10-02 19:00                                                   ` Shakeel Butt
2017-10-02 19:00                                                     ` Shakeel Butt
2017-10-02 19:28                                                     ` Michal Hocko
2017-10-02 19:28                                                       ` Michal Hocko
2017-10-02 19:45                                                       ` Shakeel Butt
2017-10-02 19:45                                                         ` Shakeel Butt
2017-10-02 19:56                                                         ` Michal Hocko
2017-10-02 19:56                                                           ` Michal Hocko
2017-10-02 20:00                                                           ` Tim Hockin
2017-10-02 20:00                                                             ` Tim Hockin
2017-10-02 20:08                                                             ` Michal Hocko
2017-10-02 20:08                                                               ` Michal Hocko
2017-10-02 20:09                                                             ` Shakeel Butt
2017-10-02 20:20                                                             ` Shakeel Butt
2017-10-02 20:20                                                               ` Shakeel Butt
2017-10-02 20:24                                                           ` Shakeel Butt
2017-10-02 20:24                                                             ` Shakeel Butt
2017-10-02 20:34                                                             ` Johannes Weiner
2017-10-02 20:34                                                               ` Johannes Weiner
2017-10-02 20:55                                                             ` Michal Hocko
2017-10-02 20:55                                                               ` Michal Hocko
2017-09-25 22:21                       ` David Rientjes
2017-09-25 22:21                         ` David Rientjes
2017-09-26  8:46                         ` Michal Hocko
2017-09-26  8:46                           ` Michal Hocko
2017-09-26 21:04                           ` David Rientjes
2017-09-26 21:04                             ` David Rientjes
2017-09-27  7:37                             ` Michal Hocko
2017-09-27  7:37                               ` Michal Hocko
2017-09-27  9:57                               ` Roman Gushchin
2017-09-27  9:57                                 ` Roman Gushchin
2017-09-21 14:21   ` Johannes Weiner
2017-09-21 14:21     ` Johannes Weiner
2017-09-21 21:17     ` David Rientjes
2017-09-21 21:17       ` David Rientjes
2017-09-21 21:17       ` David Rientjes
2017-09-21 21:51       ` Johannes Weiner
2017-09-21 21:51         ` Johannes Weiner
2017-09-22 20:53         ` David Rientjes
2017-09-22 20:53           ` David Rientjes
2017-09-22 15:44       ` Tejun Heo
2017-09-22 15:44         ` Tejun Heo
2017-09-22 15:44         ` Tejun Heo
2017-09-22 20:39         ` David Rientjes
2017-09-22 20:39           ` David Rientjes
2017-09-22 20:39           ` David Rientjes
2017-09-22 21:05           ` Tejun Heo
2017-09-22 21:05             ` Tejun Heo
2017-09-23  8:16             ` David Rientjes
2017-09-23  8:16               ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.10.1709191351330.7458@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.