From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>, Linux MM <linux-mm@kvack.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, Cgroups <cgroups@vger.kernel.org>,
linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer
Date: Tue, 31 Oct 2017 10:50:43 -0700
Message-ID: <CALvZod5tVoX20Lir=4jnWMXzsEGhh1qCbi73j5vs_n6ViR80yw@mail.gmail.com>
In-Reply-To: <20171031164008.GA32246@cmpxchg.org>
On Tue, Oct 31, 2017 at 9:40 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Oct 31, 2017 at 08:04:19AM -0700, Shakeel Butt wrote:
>> > +
>> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
>> > +{
>> > + struct mem_cgroup *iter;
>> > +
>> > + oc->chosen_memcg = NULL;
>> > + oc->chosen_points = 0;
>> > +
>> > + /*
>> > + * The oom_score is calculated for leaf memory cgroups (including
>> > + * the root memcg).
>> > + */
>> > + rcu_read_lock();
>> > + for_each_mem_cgroup_tree(iter, root) {
>> > + long score;
>> > +
>> > + if (memcg_has_children(iter) && iter != root_mem_cgroup)
>> > + continue;
>> > +
>>
>> Cgroup v2 does not support charge migration between memcgs, so an
>> intermediate node can hold the bulk of the charges of processes that
>> now live in its leaf descendants. Skipping such intermediate nodes
>> effectively protects those processes from the OOM killer (they rank
>> lower on the list of victims). Is it OK not to handle this scenario?
>> If so, shouldn't we document it?
>
> Tasks cannot be in intermediate nodes, so the only way you can end up
> in a situation like this is to start tasks fully, let them fault in
> their full workingset, then create child groups and move them there.
>
> That has attribution problems much wider than the OOM killer: any
> local limits you would set on a leaf cgroup like this ALSO won't
> control the memory of its tasks - as it's all sitting in the parent.
>
> We created the "no internal competition" rule exactly to prevent this
> situation.

Rather than the "no internal competition" restriction, wouldn't "charge
migration" have resolved that situation? Also, the "no internal
competition" restriction (I am assuming 'no internal competition' means
no tasks in internal nodes; please correct me if I am wrong) has made
"charge migration" hard to implement, and thus it was not added to
cgroup v2. I know this is a parallel discussion, so excuse my ignorance:
what are the other reasons behind "no internal competition",
specifically for the memory controller?

> To be consistent with that rule, we might want to disallow
> the creation of child groups once a cgroup has local memory charges.
>
> It's trivial to change the setup sequence to create the leaf cgroup
> first, then launch the workload from within.
>

Only if the cgroup hierarchy is centrally controlled and each task's
whole hierarchy is known in advance.

> Either way, this is nothing specific about the OOM killer.