From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>,
linux-mm@kvack.org, Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer
Date: Wed, 6 Sep 2017 10:28:59 +0200
Message-ID: <20170906082859.qlqenftxuib64j35@dhcp22.suse.cz>
In-Reply-To: <20170905215344.GA27427@cmpxchg.org>

On Tue 05-09-17 17:53:44, Johannes Weiner wrote:
> On Tue, Sep 05, 2017 at 03:44:12PM +0200, Michal Hocko wrote:
> > Why is this an opt out rather than opt-in? IMHO the original oom logic
> > should be preserved by default and specific workloads should opt in for
> > the cgroup aware logic. Changing the global behavior depending on
> > whether cgroup v2 interface is in use is more than unexpected and IMHO
> > wrong approach to take. I think we should instead go with
> > oom_strategy=[alloc_task,biggest_task,cgroup]
> >
> > we currently have alloc_task (via sysctl_oom_kill_allocating_task) and
> > biggest_task which is the default. You are adding cgroup and the more I
> > think about it the more I agree that it doesn't really make sense to try to
> > fit the new semantic into the existing one (compare tasks to kill-all
> > memcgs). Just introduce a new strategy and define a new semantic from
> > scratch. Memcg priority and kill-all are a natural extension of this new
> > strategy. This will make life easier and the behavior easier for users
> > to understand.
>
> oom_kill_allocating_task is actually a really good example of why
> cgroup-awareness *should* be the new default.
>
> Before we had the oom killer victim selection, we simply killed the
> faulting/allocating task. While a valid answer to the problem, it's
> not very fair or representative of what the user wants or intends.
>
> Then we added code to kill the biggest offender instead, which should
> have been the case from the start and was hence made the new default.
> The oom_kill_allocating_task was added on the off-chance that there
> might be setups who, for historical reasons, rely on the old behavior.
> But our default was chosen based on what behavior is fair, expected,
> and most reflective of the user's intentions.

I am not sure this is how things actually evolved. This is way before
my time, so my reading of the git log might be imprecise. We have had the
oom_badness heuristic ever since out_of_memory was introduced, and
oom_kill_allocating_task was introduced much later because of large
boxes with zillions of tasks (SGI, I suspect) which took too long to
select a victim, so David added this heuristic.

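For readers who want to check which of the two existing behaviors a machine
currently uses, here is a minimal sketch that reads the
vm.oom_kill_allocating_task sysctl from procfs. The labels "alloc_task" and
"biggest_task" are borrowed from the oom_strategy= proposal quoted above; they
are not values the kernel reports, and the helper name current_oom_strategy is
made up for illustration.

```python
from pathlib import Path

# procfs location of the vm.oom_kill_allocating_task sysctl.
SYSCTL_PATH = "/proc/sys/vm/oom_kill_allocating_task"


def current_oom_strategy(path=SYSCTL_PATH):
    """Map the sysctl value to a label from the proposal in this thread.

    Returns "alloc_task" if the sysctl is non-zero, "biggest_task" if it
    is zero, and None if the file cannot be read or parsed (for example
    on a non-Linux system).
    """
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
    return "alloc_task" if value != 0 else "biggest_task"


if __name__ == "__main__":
    print(current_oom_strategy())
```

The path argument exists only so the helper can be pointed at a scratch file
when experimenting.
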
> The cgroup-awareness in the OOM killer is exactly the same thing. It
> should have been the default from the beginning, because the user
> configures a group of tasks to be an interdependent, terminal unit of
> memory consumption, and it's undesirable for the OOM killer to ignore
> this intention and compare members across these boundaries.

I would agree if that were true in general. I can completely see how
cgroup awareness is useful in e.g. containerized environments (especially
with kill-all enabled), but memcgs are used in a large variety of
use cases and I cannot really say that all of them demand the new
semantic. Say I have a workload which doesn't want to see reclaim
interference from others on the same machine. Why should I kill a
process from that particular memcg just because it is the largest one
when there is a memory hog/leak outside of this memcg?

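The scenario above can be sketched as a toy model. This is not the kernel's
oom_badness() calculation or the code from the patch series; it only compares
"pick the largest task" against "pick the largest memcg, then the largest task
inside it". The task names, cgroup names, and sizes are invented for
illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    cgroup: str
    rss_mb: int


def pick_biggest_task(tasks):
    """Toy version of the current default: the largest single task loses."""
    return max(tasks, key=lambda t: t.rss_mb)


def pick_cgroup_aware(tasks):
    """Toy cgroup-aware selection: the largest cgroup by summed usage is
    chosen first, then the largest task inside that cgroup."""
    usage = defaultdict(int)
    for t in tasks:
        usage[t.cgroup] += t.rss_mb
    biggest_cgroup = max(usage, key=usage.get)
    members = [t for t in tasks if t.cgroup == biggest_cgroup]
    return max(members, key=lambda t: t.rss_mb)


# A protected workload made of several mid-sized tasks, plus one leaking
# task that lives outside of that memcg (all numbers are hypothetical).
tasks = [
    Task("db-worker-0", "protected", 3000),
    Task("db-worker-1", "protected", 3000),
    Task("db-worker-2", "protected", 3000),
    Task("db-worker-3", "protected", 3000),
    Task("leaky-daemon", "misc", 8000),
]

print("biggest_task picks:", pick_biggest_task(tasks).name)
print("cgroup-aware picks:", pick_cgroup_aware(tasks).name)
```

In this model the per-task comparison selects the leaking task, while the
cgroup-aware comparison selects a task from the protected memcg because that
memcg is the largest in total (12000 MB versus 8000 MB).
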
From my point of view the safest (in the sense of least surprise)
way to go is to make the new heuristic opt-in. I am pretty sure all who
would benefit from the new behavior will enable it, while others will not
regress in an unexpected way.

We can talk about _how_ to control these oom strategies, of
course. But I would be really reluctant to change the default, which has
been in use for years and which people have gotten used to.

--
Michal Hocko
SUSE Labs