linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, tj@kernel.org, gthelen@google.com
Subject: Re: cgroup-aware OOM killer, how to move forward
Date: Tue, 17 Jul 2018 13:41:33 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1807171329200.12251@chino.kir.corp.google.com> (raw)
In-Reply-To: <20180717200641.GB18762@castle.DHCP.thefacebook.com>

On Tue, 17 Jul 2018, Roman Gushchin wrote:

> > > Let me show my proposal on examples. Let's say we have the following hierarchy,
> > > and the biggest process (or the process with highest oom_score_adj) is in D.
> > > 
> > >   /
> > >   |
> > >   A
> > >   |
> > >   B
> > >  / \
> > > C   D
> > > 
> > > Let's look at different examples and intended behavior:
> > > 1) system-wide OOM
> > >   - default settings: the biggest process is killed
> > >   - D/memory.group_oom=1: all processes in D are killed
> > >   - A/memory.group_oom=1: all processes in A are killed
> > > 2) memcg oom in B
> > >   - default settings: the biggest process is killed
> > >   - A/memory.group_oom=1: the biggest process is killed
> > 
> > Huh? Why would you even consider A here when the oom is below it?
> > /me confused
> 
> I do not.
> This is exactly a counter-example: A's memory.group_oom
> is not considered at all in this case,
> because A is above ooming cgroup.
> 

I think the confusion is that this says A/memory.group_oom=1 and then the 
biggest process is killed, which doesn't seem to match the description you 
want to give memory.group_oom.
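Roman's clarification above can be modeled in a few lines. The sketch below 
is purely illustrative Python, not kernel code, and none of these names 
exist in the kernel: on a memcg OOM in B, only settings at or below B are 
consulted, so A's memory.group_oom never matters.

```python
# Toy model of "group_oom above the ooming cgroup is ignored".
# All names are hypothetical; this only illustrates the scoping rule.

def is_descendant(cg, ancestor, parent):
    """Walk the parent links from cg upward, looking for ancestor."""
    while cg in parent:
        cg = parent[cg]
        if cg == ancestor:
            return True
    return False

def relevant_group_oom(ooming, settings, parent):
    """Return only the group_oom flags that can influence this OOM:
    those of the ooming cgroup and of its descendants."""
    below = {cg for cg in settings if is_descendant(cg, ooming, parent)}
    below.add(ooming)
    return {cg: settings[cg] for cg in below if cg in settings}

# Hierarchy from the example: A -> B -> {C, D}
parent = {"B": "A", "C": "B", "D": "B"}
settings = {"A": 1, "D": 1}
print(relevant_group_oom("B", settings, parent))  # -> {'D': 1}: A's setting is ignored
```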

> > >   - B/memory.group_oom=1: all processes in B are killed
> > 
> >     - B/memory.group_oom=0 &&
> > >   - D/memory.group_oom=1: all processes in D are killed
> > 
> > What about?
> >     - B/memory.group_oom=1 && D/memory.group_oom=0
> 
> All tasks in B are killed.
> 
> Group_oom set to 1 means that the workload can't tolerate
> killing of a random process, so in this case it's better
> to guarantee consistency for B.
> 

This example is missing the use case I was referring to, i.e. killing 
all processes attached to a subtree because the limit on a common ancestor 
has been reached.

In your example, I would think that the memory.group_oom setting of /A and 
/A/B are meaningless because there are no processes attached to them.

IIUC, your proposal is to select the victim by whatever means, check the 
memory.group_oom setting of that victim, and then either kill the victim 
or all processes attached to that local mem cgroup depending on the 
setting.
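That reading of the proposal can be sketched as toy Python (illustrative 
names only, nothing here is kernel code): select a victim by whatever 
policy, then let the victim's own cgroup's group_oom flag decide whether 
one process or the whole local group dies.

```python
# Toy model of the proposed semantics as described above: pick the
# biggest process, then consult its own (local) cgroup's group_oom
# flag to decide what is killed. Names and structure are illustrative.

def select_victims(cgroups, procs):
    """cgroups: {name: {"group_oom": bool}};
    procs: list of (pid, cgroup_name, size)."""
    victim = max(procs, key=lambda p: p[2])   # biggest process wins
    pid, cg, _ = victim
    if cgroups[cg]["group_oom"]:
        # group_oom=1 on the victim's cgroup: kill the whole group
        return [p[0] for p in procs if p[1] == cg]
    return [pid]                              # otherwise kill only the victim

cgroups = {"C": {"group_oom": False}, "D": {"group_oom": True}}
procs = [(1, "C", 100), (2, "D", 300), (3, "D", 50)]
print(select_victims(cgroups, procs))  # -> [2, 3]: D's flag kills all of D
```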

However, if C and D here are only limited by a common ancestor, /A or 
/A/B, there is no way to specify that the subtree itself should be oom 
killed.  That was where I thought a tristate value would be helpful, such 
that you can specify that all processes attached to the subtree should be 
oom killed when a mem cgroup has reached memory.max.

I was purposefully overloading memory.group_oom because the actual value 
of memory.group_oom given your semantics here is not relevant for /A or 
/A/B.  I think an additional memory.group_oom_tree or whatever it would be 
called would lead to unnecessary confusion because then we have a model 
where one tunable means something based on the value of the other.

Given the no-internal-process constraint of cgroup v2, my suggestion was a 
value, "tree", that could specify that a mem cgroup reaching its limit 
causes all processes attached to its subtree to be killed.  This is 
required only because of the single unified hierarchy of cgroup v2: we may 
want to bind a subset of processes to be controlled separately by another 
controller but still want all processes oom killed when the limit of a 
common ancestor is reached.

Thus, the semantic would be: if oom mem cgroup is "tree", kill all 
processes in subtree; otherwise, it can be "cgroup" or "process" to 
determine what is oom killed depending on the victim selection.
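To make the tristate concrete, here is a toy Python sketch of that 
semantic. The mode names, helpers, and data shapes are all hypothetical, 
and real victim selection is stood in for by a trivial max():

```python
# Hypothetical tristate ("process" / "cgroup" / "tree") sketch. The
# mode is read from the memcg that hit its limit: "tree" kills every
# process anywhere in its subtree; otherwise a victim is selected and
# its placement decides what dies. Illustrative only, not kernel code.

def oom_kill(limit_cg, mode, tree, procs):
    """tree: {cgroup: [children]}; procs: {cgroup: [pids]}."""
    def subtree(cg):
        out = [cg]
        for child in tree.get(cg, []):
            out += subtree(child)
        return out

    if mode == "tree":
        # kill everything attached anywhere under the ooming memcg
        return [pid for cg in subtree(limit_cg) for pid in procs.get(cg, [])]
    # "cgroup" / "process": select a victim within the subtree first
    candidates = [(pid, cg) for cg in subtree(limit_cg) for pid in procs.get(cg, [])]
    pid, cg = max(candidates)        # stand-in for real victim selection
    return procs[cg] if mode == "cgroup" else [pid]

tree = {"B": ["C", "D"]}
procs = {"C": [1], "D": [2, 3]}
print(oom_kill("B", "tree", tree, procs))     # -> [1, 2, 3]
print(oom_kill("B", "process", tree, procs))  # -> [3]
```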

Having the "tree" behavior could definitely be implemented as a separate 
tunable; but then the values of /A/memory.group_oom and 
/A/B/memory.group_oom are irrelevant and, to me, that seems like it would 
be more confusing.

