From: David Rientjes <rientjes@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, akpm@linux-foundation.org,
hannes@cmpxchg.org, tj@kernel.org, gthelen@google.com
Subject: Re: cgroup-aware OOM killer, how to move forward
Date: Tue, 17 Jul 2018 13:41:33 -0700 (PDT)
Message-ID: <alpine.DEB.2.21.1807171329200.12251@chino.kir.corp.google.com>
In-Reply-To: <20180717200641.GB18762@castle.DHCP.thefacebook.com>
On Tue, 17 Jul 2018, Roman Gushchin wrote:
> > > Let me show my proposal on examples. Let's say we have the following hierarchy,
> > > and the biggest process (or the process with highest oom_score_adj) is in D.
> > >
> > > /
> > > |
> > > A
> > > |
> > > B
> > > / \
> > > C D
> > >
> > > Let's look at different examples and intended behavior:
> > > 1) system-wide OOM
> > > - default settings: the biggest process is killed
> > > - D/memory.group_oom=1: all processes in D are killed
> > > - A/memory.group_oom=1: all processes in A are killed
> > > 2) memcg oom in B
> > > - default settings: the biggest process is killed
> > > - A/memory.group_oom=1: the biggest process is killed
> >
> > Huh? Why would you even consider A here when the oom is below it?
> > /me confused
>
> I do not.
> This is exactly a counter-example: A's memory.group_oom
> is not considered at all in this case,
> because A is above the ooming cgroup.
>
I think the confusion is that this says A/memory.group_oom=1 and then the
biggest process is killed, which doesn't seem like it matches the
description you want to give memory.group_oom.
> > > - B/memory.group_oom=1: all processes in B are killed
> > > - B/memory.group_oom=0 &&
> > >   D/memory.group_oom=1: all processes in D are killed
> >
> > What about?
> > - B/memory.group_oom=1 && D/memory.group_oom=0
>
> All tasks in B are killed.
>
> Group_oom set to 1 means that the workload can't tolerate
> killing of a random process, so in this case it's better
> to guarantee consistency for B.
>
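The rule Roman describes here could be sketched as follows. This is a toy Python model, not kernel code; all structures and names are invented for illustration. The reading is: walk from the victim's memcg up to the memcg that hit its limit, and group-kill at the highest ancestor with memory.group_oom=1; if none is set, kill only the victim.

```python
# Toy model of the group_oom consistency rule described above.
class Memcg:
    def __init__(self, name, group_oom=0, parent=None):
        self.name, self.group_oom, self.parent = name, group_oom, parent
        self.procs, self.children = {}, []   # procs: pid -> memory usage

def subtree_pids(cg):
    pids = list(cg.procs)
    for child in cg.children:
        pids += subtree_pids(child)
    return sorted(pids)

def oom_victims(victim_pid, victim_cg, oom_cg):
    group, cg = None, victim_cg
    while True:                      # walk from the victim up to the OOM scope
        if cg.group_oom:
            group = cg               # the highest group_oom=1 memcg wins
        if cg is oom_cg:
            break
        cg = cg.parent
    return subtree_pids(group) if group else [victim_pid]

# B/memory.group_oom=1 && D/memory.group_oom=0, biggest process (201) in D:
B = Memcg("B", group_oom=1)
C, D = Memcg("C", parent=B), Memcg("D", parent=B)
B.children = [C, D]
C.procs, D.procs = {101: 40}, {201: 80, 202: 5}
print(oom_victims(201, D, B))   # -> [101, 201, 202]: all tasks in B
```

With B's bit clear and D's set, the same walk would stop at D and kill only D's tasks, matching the examples quoted above.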
This example is missing the use case I was referring to: killing all
processes attached to a subtree because the limit on a common ancestor
has been reached.
In your example, I would think that the memory.group_oom setting of /A and
/A/B are meaningless because there are no processes attached to them.
IIUC, your proposal is to select the victim by whatever means, check the
memory.group_oom setting of that victim's mem cgroup, and then, depending
on the setting, kill either the victim alone or all processes attached to
that local mem cgroup.
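That proposal, as restated here, could be sketched like this. Again a toy Python model with invented structures, only to make the two steps explicit: victim selection first, then the victim's local memory.group_oom deciding the kill scope.

```python
# Toy model: pick the victim first, then let the victim's own memcg's
# memory.group_oom decide whether to kill one process or the whole memcg.
class Memcg:
    def __init__(self, name, group_oom=0):
        self.name, self.group_oom = name, group_oom
        self.procs = {}   # pid -> memory usage

def oom_kill(memcgs_in_scope):
    # 1) select the victim "by whatever means": here, the biggest process
    victim_cg, victim_pid = max(
        ((cg, pid) for cg in memcgs_in_scope for pid in cg.procs),
        key=lambda t: t[0].procs[t[1]])
    # 2) the victim's local memory.group_oom decides the kill scope
    if victim_cg.group_oom:
        return sorted(victim_cg.procs)   # every process in the victim's memcg
    return [victim_pid]                  # just the victim

C, D = Memcg("C"), Memcg("D", group_oom=1)
C.procs = {101: 50}
D.procs = {201: 80, 202: 10}
print(oom_kill([C, D]))   # -> [201, 202]
```

Note that in this model the settings of ancestors like /A and /A/B never enter the decision, which is exactly the gap discussed next.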
However, if C and D here are only limited by a common ancestor, /A or
/A/B, there is no way to specify that the subtree itself should be oom
killed. That is where I thought a tristate value would be helpful: it
would let you specify that all processes attached to the subtree should be
oom killed when a mem cgroup reaches memory.max.
I was purposely overloading memory.group_oom because, under your semantics,
its actual value is irrelevant for /A and /A/B. I think an additional
memory.group_oom_tree, or whatever it would be called, would lead to
unnecessary confusion, because then one tunable's meaning would depend on
the value of the other.
Given the no-internal-process constraint of cgroup v2, my suggestion was a
value, "tree", that could specify that a mem cgroup reaching its limit
causes all processes attached to its subtree to be killed. This is needed
only because of the single unified hierarchy of cgroup v2: we may want a
subset of processes to be controlled separately by another controller, yet
still want all processes oom killed when the limit of a common ancestor is
reached.
Thus, the semantic would be: if the oom mem cgroup's setting is "tree",
kill all processes in its subtree; otherwise, the setting can be "cgroup"
or "process" to determine what is oom killed, depending on the victim
selection.
Having the "tree" behavior could certainly be implemented as a separate
tunable; but then the values of /A/memory.group_oom and
/A/B/memory.group_oom would be irrelevant, which, to me, seems more
confusing.