From: Tim Hockin <thockin@hockin.org>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
kernel-team@fb.com, David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Andrew Morton <akpm@linux-foundation.org>,
Cgroups <cgroups@vger.kernel.org>,
linux-doc@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Wed, 27 Sep 2017 11:11:42 -0700
Message-ID: <CAAAKZwtApj-FgRc2V77nEb3BUd97Rwhgf-b-k0zhf1u+Y4fqxA@mail.gmail.com>
In-Reply-To: <20170927162300.GA5623@castle.DHCP.thefacebook.com>
On Wed, Sep 27, 2017 at 9:23 AM, Roman Gushchin <guro@fb.com> wrote:
> On Wed, Sep 27, 2017 at 08:35:50AM -0700, Tim Hockin wrote:
>> On Wed, Sep 27, 2017 at 12:43 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Tue 26-09-17 20:37:37, Tim Hockin wrote:
>> > [...]
>> >> I feel like David has offered examples here, and many of us at Google
>> >> have offered examples as long ago as 2013 (if I recall) of cases where
>> >> the proposed heuristic is EXACTLY WRONG.
>> >
>> > I do not think we have discussed anything resembling the current
>> > approach. And I would really appreciate some more examples where
>> > decisions based on leaf nodes would be EXACTLY WRONG.
>> >
>> >> We need OOM behavior to kill in a deterministic order configured by
>> >> policy.
>> >
>> > And nobody is objecting to this use case. I think we can build a priority
>> > policy on top of leaf-based decisions as well. The main point we are
>> > trying to sort out here is reasonable semantics that would work for
>> > most workloads. Sibling-based selection will simply not work for those
>> > that have to use deeper hierarchies for organizational purposes. I
>> > haven't heard a counterargument to that example yet.
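A toy model may help make the leaf-based versus sibling-based distinction concrete. This is a Python sketch under stated assumptions, not the patchset's code: cgroups are nested dicts with hypothetical names and sizes, memory is charged only to leaves, and "sibling-based" is modeled as descending from the root into the child with the largest subtree total at each level.

```python
from typing import Dict, List, Tuple, Union

# Toy model (assumption): a cgroup is either a leaf holding its usage in MiB
# or a dict of named child cgroups. All names and sizes are hypothetical.
Tree = Union[int, Dict[str, "Tree"]]


def total(node: Tree) -> int:
    """Memory charged to a subtree: the sum of all leaf usages below it."""
    if isinstance(node, int):
        return node
    return sum(total(child) for child in node.values())


def sibling_based(tree: Dict[str, Tree]) -> str:
    """Walk down from the root, at each level entering the child whose
    subtree total is largest, until a leaf is reached."""
    path: List[str] = []
    node: Tree = tree
    while isinstance(node, dict):
        children = node
        name = max(children, key=lambda n: total(children[n]))
        path.append(name)
        node = children[name]
    return "/".join(path)


def leaves(node: Tree, prefix: str = "") -> List[Tuple[str, int]]:
    """Flatten the hierarchy into (path, usage) pairs, one per leaf cgroup."""
    if isinstance(node, int):
        return [(prefix, node)]
    out: List[Tuple[str, int]] = []
    for name, child in node.items():
        out.extend(leaves(child, f"{prefix}/{name}" if prefix else name))
    return out


def leaf_based(tree: Dict[str, Tree]) -> str:
    """Compare every leaf cgroup across the whole tree and pick the largest."""
    return max(leaves(tree), key=lambda pair: pair[1])[0]


# org_a splits its work into many small jobs for organizational reasons;
# org_b runs a single large job.
hierarchy: Dict[str, Tree] = {
    "org_a": {"job1": 300, "job2": 300, "job3": 300, "job4": 300},
    "org_b": {"big_job": 1000},
}

print(sibling_based(hierarchy))  # org_a/job1 (org_a totals 1200 > 1000)
print(leaf_based(hierarchy))     # org_b/big_job
```

With these made-up numbers, sibling-based descent ends in a 300 MiB job because its parent subtree is the largest, while leaf-based comparison picks the single 1000 MiB job.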
>>
>
> Hi, Tim!
>
>> We have a priority-based, multi-user cluster. That cluster runs a
>> variety of work, including critical things like search and gmail, as
>> well as non-critical things like batch work. We try to offer our
>> users an SLA around how often they will be killed by factors outside
>> themselves, but we also want to get higher utilization. We know for a
>> fact (data, lots of data) that most jobs have spare memory capacity,
>> set aside for spikes or simply because accurate sizing is hard. We
>> can sell "guaranteed" resources to critical jobs, with a high SLA. We
>> can sell "best effort" resources to non-critical jobs with a low SLA.
>> We achieve much better overall utilization this way.
>
> This is well understood.
>
>>
>> I need to represent the priority of these tasks in a way that gives me
>> a very strong promise that, in case of system OOM, the non-critical
>> jobs will be chosen before the critical jobs. Regardless of size.
>> Regardless of how many non-critical jobs have to die. I'd rather kill
>> *all* of the non-critical jobs than a single critical job. Size of
>> the process or cgroup is simply not a factor, and honestly given 2
>> options of equal priority I'd say age matters more than size.
>>
>> So concretely I have 2 first-level cgroups, one for "guaranteed" and
>> one for "best effort" classes. I always want to kill from "best
>> effort", even if that means killing 100 small cgroups, before touching
>> "guaranteed".
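The policy described above amounts to an ordering in which the priority class dominates and size plays no role. Below is a hypothetical Python model, not a kernel interface: the `Cgroup` fields, the two class constants, and the choice to kill the youngest cgroup first among equal priorities are all assumptions (the text only says that age matters more than size).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical priority classes; a lower value is killed first.
BEST_EFFORT = 0
GUARANTEED = 1


@dataclass
class Cgroup:
    path: str
    priority: int    # BEST_EFFORT or GUARANTEED (assumed encoding)
    start_time: int  # seconds since boot when the job started; larger = younger
    usage_mib: int   # current memory usage; ignored by the priority policy


def size_based_victim(groups: List[Cgroup]) -> Cgroup:
    """Pick the largest cgroup, whatever its class."""
    return max(groups, key=lambda g: g.usage_mib)


def kill_key(g: Cgroup) -> Tuple[int, int]:
    """Priority class first; within a class, youngest first (an assumption)."""
    return (g.priority, -g.start_time)


def priority_based_victim(groups: List[Cgroup]) -> Cgroup:
    """Pick from the lowest priority class, never looking at size."""
    return min(groups, key=kill_key)


def kill_order(groups: List[Cgroup]) -> List[str]:
    """Order in which repeated OOMs would kill the groups under this policy."""
    return [g.path for g in sorted(groups, key=kill_key)]


groups = [
    Cgroup("guaranteed/search", GUARANTEED, start_time=100, usage_mib=8000),
    Cgroup("best-effort/batch-1", BEST_EFFORT, start_time=500, usage_mib=200),
    Cgroup("best-effort/batch-2", BEST_EFFORT, start_time=900, usage_mib=300),
]

print(size_based_victim(groups).path)      # guaranteed/search
print(priority_based_victim(groups).path)  # best-effort/batch-2
print(kill_order(groups))
# ['best-effort/batch-2', 'best-effort/batch-1', 'guaranteed/search']
```

Following `kill_order` drains every best-effort cgroup before the guaranteed one is touched, whereas the size-based choice goes straight for the 8000 MiB guaranteed job.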
>>
>> I apologize if this is not as thorough as the rest of the thread - I
>> am somewhat out of touch with the guts of it all these days. I just
>> feel compelled to indicate that, as a historical user (via Google
>> systems) and current user (via Kubernetes), some of the assertions
>> being made here do not ring true for our very real use cases. I
>> desperately want cgroup-aware OOM handling, but it has to be
>> policy-based or it is just not useful to us.
>
> A policy-based approach was suggested by Michal at the very beginning of
> this discussion. Although nobody had any strong objections to it,
> we've agreed that it is out of scope for this patchset.
>
> The idea of this patchset is to introduce the ability to select a memcg
> as an OOM victim, optionally followed by killing all tasks that belong to it.
> I believe it's absolutely mandatory for _any_ further development
> of the OOM killer that wants to deal with memory cgroups as OOM entities.
>
> If you think that it makes it impossible to support some use cases in the future,
> let's discuss it. Otherwise, I'd prefer to finish this part of the work
> and then build further improvements on top of it.
>
> Thank you!
I am 100% in favor of killing whole groups. We want that too. I just
needed to express disagreement with statements that size-based
decisions could not produce bad results. They can and do.
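Killing whole groups can be sketched as treating the memcg, rather than a single process, as the OOM victim. This toy Python model assumes a memcg's score is simply the sum of its tasks' usage and uses a hypothetical `kill_all` flag for the optional whole-group kill; it does not reproduce the patchset's actual scoring or interface.

```python
from typing import Dict, List, Tuple

# Hypothetical memcgs, each mapping task name -> memory usage in MiB.
memcgs: Dict[str, Dict[str, int]] = {
    "web": {"worker1": 400, "worker2": 400, "worker3": 400},
    "db": {"postgres": 900},
}


def process_victim(groups: Dict[str, Dict[str, int]]) -> Tuple[str, str]:
    """Per-process selection: the single largest task, wherever it lives."""
    candidates = [(cg, task) for cg, tasks in groups.items() for task in tasks]
    return max(candidates, key=lambda pair: groups[pair[0]][pair[1]])


def memcg_victims(groups: Dict[str, Dict[str, int]], kill_all: bool) -> List[str]:
    """Memcg-level selection: pick the memcg with the largest total usage
    (assumed score), then kill every task in it or only its largest task,
    depending on the hypothetical kill_all flag."""
    victim = max(groups, key=lambda cg: sum(groups[cg].values()))
    tasks = groups[victim]
    if kill_all:
        return [f"{victim}/{task}" for task in tasks]
    largest = max(tasks, key=lambda task: tasks[task])
    return [f"{victim}/{largest}"]


print(process_victim(memcgs))                 # ('db', 'postgres')
print(memcg_victims(memcgs, kill_all=True))   # ['web/worker1', 'web/worker2', 'web/worker3']
print(memcg_victims(memcgs, kill_all=False))  # ['web/worker1']
```

Here per-process selection would kill the 900 MiB `postgres` task, while the memcg-level choice targets `web`, whose three 400 MiB workers add up to more.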