linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: <linux-mm@kvack.org>, Vladimir Davydov <vdavydov.dev@gmail.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, <kernel-team@fb.com>,
	<cgroups@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [v7 2/5] mm, oom: cgroup-aware OOM killer
Date: Wed, 6 Sep 2017 13:57:50 +0100	[thread overview]
Message-ID: <20170906125750.GB12904@castle> (raw)
In-Reply-To: <20170906083158.gvqx6pekrsy2ya47@dhcp22.suse.cz>

On Wed, Sep 06, 2017 at 10:31:58AM +0200, Michal Hocko wrote:
> On Tue 05-09-17 21:23:57, Roman Gushchin wrote:
> > On Tue, Sep 05, 2017 at 04:57:00PM +0200, Michal Hocko wrote:
> [...]
> > > Hmm. The changelog says "By default, it will look for the biggest leaf
> > > cgroup, and kill the largest task inside." But you are accumulating
> > > oom_score up the hierarchy and so parents will have higher score than
> > > the layer of their children and the larger the sub-hierarchy the more
> > > biased it will become. Say you have
> > > 	root
> > >          /\
> > >         /  \
> > >        A    D
> > >       / \
> > >      B   C
> > > 
> > > B (5), C(15) thus A(20) and D(20). Unless I am missing something we are
> > > going to go down A path and then chose C even though D is the largest
> > > leaf group, right?
> > 
> > You're right, changelog is not accurate, I'll fix it.
> > The behavior is correct, IMO.
> 
> Please explain why. This is really a non-intuitive semantic. Why should
> larger hierarchies be punished more than shallow ones? I would
> completely agree if the whole hierarchy would be a killable entity (aka
> A would be kill-all).

I think it's a reasonable and clear policy: we're looking for a memcg
with the smallest oom_priority and largest memory footprint recursively.
Then we reclaim some memory from it (by killing the biggest process
or all processes, depending on memcg preferences).

In general, if there are two memcgs of equal importance (which is defined
by setting the oom_priority), we're choosing the largest, because there
are more chances that it contain a leaking process. The same is true
right now for processes.

I agree, that for size-based comparison we could use a different policy:
comparing leaf cgroups despite their level. But I don't see a clever
way to apply oom_priorities in this case. Comparing oom_priority
on each level is a simple and powerful policy, and it works well
for delegation.

>  
> [...]
> > > I do not understand why do we have to handle root cgroup specially here.
> > > select_victim_memcg already iterates all memcgs in the oom hierarchy
> > > (including root) so if the root memcg is the largest one then we
> > > should simply consider it no?
> > 
> > We don't have necessary stats for the root cgroup, so we can't calculate
> > it's oom_score.
> 
> We used to charge pages to the root memcg as well so we might resurrect
> that idea. In any case this is something that could be hidden in
> memcg_oom_badness rather then special cased somewhere else.

In theory I agree, but I do not see a good way to calculate root memcg
oom_score.

> 
> > > You are skipping root there because of
> > > memcg_has_children but I suspect this and the whole accumulate up the
> > > hierarchy approach just makes the whole thing more complex than necessary. With
> > > "tasks only in leafs" cgroup policy we should only see any pages on LRUs
> > > on the global root memcg and leaf cgroups. The same applies to memcg
> > > stats. So why cannot we simply do the tree walk, calculate
> > > badness/check the priority and select the largest memcg in one go?
> > 
> > We have to traverse from top to bottom to make priority-based decision,
> > but size-based oom_score is calculated as sum of descending leaf cgroup scores.
> > 
> > For example:
> >  	root
> >           /\
> >          /  \
> >         A    D
> >        / \
> >       B   C
> > A and D have same priorities, B has larger priority than C.
> > 
> > In this case we need to calculate size-based score for A, which requires
> > summing oom_score of the sub-tree (B an C), despite we don't need it
> > for choosing between B and C.
> > 
> > Maybe I don't see it, but I don't know how to implement it more optimal.
> 
> I have to think about the priority based oom killing some more to be
> honest. Do we really want to allow setting a priority to non-leaf
> memcgs? How are you going to manage the whole tree consistency? Say your
> above example have prio(A) < prio(D) && prio(C) > prio(D). Your current
> implementation would kill D, right?

Right.

> Isn't that counter intuitive
> behavior again. If anything we should prio(A) = max(tree_prio(A)). Again
> I could understand comparing priorities only on killable entities.

Answered above.
Also, I don't think any per-memcg knobs should have global meaning,
despite parent memcg settings. It will break delegation model.

Thanks!

Roman

  reply	other threads:[~2017-09-06 12:58 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-04 14:21 [v7 0/5] cgroup-aware OOM killer Roman Gushchin
2017-09-04 14:21 ` [v7 1/5] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-09-05 13:34   ` Michal Hocko
2017-09-04 14:21 ` [v7 2/5] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-09-05 14:57   ` Michal Hocko
2017-09-05 20:23     ` Roman Gushchin
2017-09-06  8:31       ` Michal Hocko
2017-09-06 12:57         ` Roman Gushchin [this message]
2017-09-06 13:22           ` Michal Hocko
2017-09-06 13:41             ` Roman Gushchin
2017-09-06 14:10               ` Michal Hocko
2017-09-06  8:34       ` Michal Hocko
2017-09-06 12:33         ` Roman Gushchin
2017-09-07 16:18   ` Christopher Lameter
2017-09-11  8:49     ` Michal Hocko
2017-09-04 14:21 ` [v7 3/5] mm, oom: introduce oom_priority for memory cgroups Roman Gushchin
2017-09-04 14:21 ` [v7 4/5] mm, oom, docs: describe the cgroup-aware OOM killer Roman Gushchin
2017-09-04 14:21 ` [v7 5/5] mm, oom: cgroup v2 mount option to disable " Roman Gushchin
2017-09-04 17:32   ` Shakeel Butt
2017-09-04 17:51     ` Roman Gushchin
2017-09-05 13:44   ` Michal Hocko
2017-09-05 14:30     ` Roman Gushchin
2017-09-05 15:12       ` Michal Hocko
2017-09-05 19:16         ` Roman Gushchin
2017-09-06  8:42           ` Michal Hocko
2017-09-06 17:40             ` Roman Gushchin
2017-09-06 17:59               ` Michal Hocko
2017-09-06 20:59               ` David Rientjes
2017-09-07 14:43                 ` Christopher Lameter
2017-09-07 14:52                   ` Roman Gushchin
2017-09-07 15:03                     ` Christopher Lameter
2017-09-07 16:42                       ` Roman Gushchin
2017-09-07 17:03                         ` Christopher Lameter
2017-09-07 21:55                   ` David Rientjes
2017-09-07 16:21         ` Christopher Lameter
2017-09-05 21:53     ` Johannes Weiner
2017-09-06  8:28       ` Michal Hocko
2017-09-07 16:14         ` Johannes Weiner
2017-09-11  9:05           ` Michal Hocko
2017-09-11 12:50             ` Roman Gushchin
2017-09-07 16:27         ` Christopher Lameter
2017-09-07 22:03           ` David Rientjes
2017-09-08 21:07             ` Christopher Lameter
2017-09-09  8:45               ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170906125750.GB12904@castle \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).