From: David Rientjes <rientjes@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <guro@fb.com>,
linux-mm@vger.kernel.org, Michal Hocko <mhocko@suse.com>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v13 0/7] cgroup-aware OOM killer
Date: Tue, 16 Jan 2018 13:21:50 -0800 (PST)
Message-ID: <alpine.DEB.2.10.1801161306250.242486@chino.kir.corp.google.com>
In-Reply-To: <20180115162500.GA26120@cmpxchg.org>
On Mon, 15 Jan 2018, Johannes Weiner wrote:
> > It's quite trivial to allow the root mem cgroup to be compared exactly the
> > same as another cgroup. Please see
> > https://marc.info/?l=linux-kernel&m=151579459920305.
>
> This only says "that will be fixed" and doesn't address why I care.
>
Because the design of the cgroup-aware oom killer requires the entire
system to be fully containerized to be useful and to select the
correct/anticipated victim. If anything is left behind in the root mem
cgroup, or on systems that use mem cgroups in ways that you do not, it
returns wildly unpredictable and downright wrong results; it factors
oom_score_adj into the selection criteria only for processes attached to
the root mem cgroup and ignores it otherwise. If we treated root and leaf
cgroups as comparable, this problem wouldn't arise.
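The asymmetry can be sketched with a toy model (function names and sizes
are hypothetical; the real logic is kernel C in oom_badness() and the
patchset's memcg scoring): a task attached to the root mem cgroup keeps its
oom_score_adj protection, while the identical task moved into a child
cgroup is judged purely on the cgroup's aggregate usage:

```python
# Toy model of the v13 selection asymmetry; illustrative only, not
# kernel code.

def root_task_score(rss_pages, oom_score_adj, totalpages):
    # oom_badness()-style: the adj is scaled by total memory and honored.
    return max(rss_pages + oom_score_adj * totalpages // 1000, 0)

def leaf_cgroup_score(task_rss_pages):
    # Leaf cgroups are charged their aggregate usage; adj plays no role.
    return sum(task_rss_pages)

totalpages = 4 << 20      # a 16GB machine, in 4K pages
worker = 1 << 18          # a 1GB worker, in pages

# Attached to root with oom_score_adj = -998, the worker is untouchable:
print(root_task_score(worker, -998, totalpages))   # 0

# Moved into a child cgroup, the same worker loses that protection:
print(leaf_cgroup_score([worker]))                 # 262144
```

The same workload scores completely differently depending only on where
it happens to be attached, which is the unpredictability being described.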
> > > This assumes you even need one. Right now, the OOM killer picks the
> > > biggest MM, so you can evade selection by forking your MM. This patch
> > > allows picking the biggest cgroup, so you can evade by forking groups.
> >
> > It's quite trivial to prevent any cgroup from evading the oom killer by
> > either forking their mm or attaching all their processes to subcontainers.
> > Please see https://marc.info/?l=linux-kernel&m=151579459920305.
>
> This doesn't address anything I wrote.
>
It prevents both problems if you are attached to a mem cgroup subtree.
You can't fork the mm and you can't fork groups to evade the selection
logic. If the root mem cgroup and leaf cgroups were truly comparable, it
would also prevent both problems regardless of which cgroup the processes
are attached to.
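The evasion being prevented can be sketched with a toy model (cgroup names
are hypothetical, and this is a model of the comparison, not the proposed
patch): under flat leaf-only comparison, a job that shards 6GB across
twelve subcontainers hides behind a monolithic 1GB job, while charging each
top-level subtree for its whole usage restores the expected victim:

```python
GB = 1 << 30

def leaf_scores(tree):
    # Flat comparison: every leaf cgroup competes on its own usage.
    return {leaf: usage
            for subtree in tree.values()
            for leaf, usage in subtree.items()}

def subtree_scores(tree):
    # Hierarchical comparison: a top-level cgroup is charged for its
    # entire subtree, so sharding into subcontainers gains nothing.
    return {name: sum(subtree.values()) for name, subtree in tree.items()}

def pick_victim(scores):
    return max(scores, key=scores.get)

tree = {
    "evader": {f"shard{i}": 512 * 2**20 for i in range(12)},  # 6GB total
    "honest": {"worker": 1 * GB},
}

print(pick_victim(leaf_scores(tree)))     # worker (1GB beats each 512MB shard)
print(pick_victim(subtree_scores(tree)))  # evader (6GB subtree beats 1GB)
```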
> > > It's not a new vector, and clearly nobody cares. This has never been
> > > brought up against the current design that I know of.
> >
> > As cgroup v2 becomes more popular, people will organize their cgroup
> > hierarchies for all controllers they need to use. We do this today, for
> > example, by attaching some individual consumers to child mem cgroups
> > purely for the rich statistics and vmscan stats that mem cgroup provides
> > without any limitation on those cgroups.
>
> There is no connecting tissue between what I wrote and what you wrote.
>
You're completely ignoring usecases other than your own highly specialized
one. You may attach every user process on the system to
leaf cgroups and you may not have any users who have control over a
subtree. Other people do. And this patchset, as implemented, returns
very inaccurate results or allows users to intentionally or
unintentionally evade the oom killer just because they want to use
subcontainers.
Creating and attaching a subset of processes to either top-level
containers or subcontainers, whether for limitation by mem cgroup or for
statistics, is a valid usecase, and one that is used in practice. Your
suggestion of avoiding that problem to work around the shortcomings of
this patchset by limiting how many subcontainers can be created by the
user is utterly ridiculous.
> We have a patch set that was developed, iterated and improved over 10+
> revisions, based on evaluating and weighing trade-offs. It's reached a
> state where the memcg maintainers are happy with it and don't have any
> concern about future extendabilty to cover more specialized setups.
>
It's also obviously untested in those 10+ revisions, since it uses
oom_badness() for the root mem cgroup but not for leaf mem cgroups, which
is why it breaks any system where user processes are attached to the root
mem cgroup. See my example where a 1GB worker attached to the root mem
cgroup
is preferred over a cgroup with 6GB workers. It may have been tested with
your own highly specialized usecase, but not with anything else, and the
author obviously admits its shortcomings.
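One plausible way the inversion in that example comes about (assuming the
1GB worker carries a positive oom_score_adj, on a hypothetical 16GB machine
with 4K pages) is simple arithmetic: the adj term is scaled by total memory
for the root-attached task, while the 6GB cgroup is scored on raw usage
alone:

```python
totalpages = 4 << 20          # 16GB machine, in 4K pages
gb = 1 << 18                  # 1GB, in pages

# Root-attached 1GB worker scored oom_badness()-style with adj = +500:
root_score = gb + 500 * totalpages // 1000    # 262144 + 2097152 = 2359296

# Child cgroup holding six 1GB workers, scored by aggregate usage only:
cgroup_score = 6 * gb                         # 1572864

# The 1GB root worker outscores the 6GB cgroup.
print(root_score > cgroup_score)              # True
```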
> You've had nine months to contribute and shape this patch series
> productively, and instead resorted to a cavalcade of polemics,
> evasion, and repetition of truisms and refuted points. A ten paragraph
> proposal of vague ideas at this point is simply not a valid argument.
>
The patch series has gone through massive changes over the past nine
months, and I've brought up three very specific concerns with its current
form that make it non-extensible. I know the patchset has very
significant changes or rewrites in its history, but I'm only concerned
with its present form because it's not something that can easily be
extended later. We don't need useless files polluting the cgroup v2
filesystem for things that don't matter with other oom policies, and we
don't need the mount option either.
> You can send patches to replace or improve on Roman's code and make
> the case for them.
>
I volunteered in the email thread where I proposed an alternative to
replace it, and I'm actively seeking any feedback on that proposal to
address
anybody else's concerns early on. Your participation in that would be
very useful.
Thanks.