Linux-mm Archive on lore.kernel.org
From: Michal Hocko <mhocko@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>,
	hannes@cmpxchg.org, David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	gthelen@google.com
Subject: Re: cgroup-aware OOM killer, how to move forward
Date: Mon, 30 Jul 2018 10:03:57 +0200
Message-ID: <20180730080357.GA24267@dhcp22.suse.cz>
In-Reply-To: <20180724132640.GL28386@dhcp22.suse.cz>

On Tue 24-07-18 15:26:40, Michal Hocko wrote:
> On Tue 24-07-18 06:08:36, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, Jul 24, 2018 at 09:32:30AM +0200, Michal Hocko wrote:
> [...]
> > > > There's no reason to put any
> > > > restrictions on what each cgroup can configure.  The only thing which
> > > > matters is that the effective behavior is what the highest in the
> > > > ancestry configures, and, at the system level, it'd conceptually map
> > > > to panic_on_oom.
> > > 
> > > Hmm, so do we inherit group_oom? If not, how do we prevent from
> > > unexpected behavior?
> > 
> > Hmm... I guess we're debating two options here.  Please consider the
> > following hierarchy.
> > 
> >       R
> >       |
> >       A (group oom == 1)
> >      / \
> >     B   C
> >     |
> >     D
> > 
> > 1. No matter what B, C or D sets, as long as A sets group oom, any oom
> >    kill inside A's subtree kills the entire subtree.
> > 
> > 2. A's group oom policy applies iff the source of the OOM is either at
> >    or above A - ie. iff the OOM is system-wide or caused by memory.max
> >    of A.
> > 
> > In #1, it doesn't matter what B, C or D sets, so it's kinda moot to
> > discuss whether they inherit A's setting or not.  A's is, if set,
> > always overriding.  In #2, what B, C or D sets matters if they also
> > set their own memory.max, so there's no reason for them to inherit
> > anything.
> > 
> > I'm actually okay with either option.  #2 is more flexible than #1 but
> > given that this is a cgroup owned property which is likely to be set
> > on per-application basis, #1 is likely good enough.
> > 
> > IIRC, we did #2 in the original implementation and the simplified one
> > is doing #1, right?
> 
> No, we've been discussing #2 unless I have misunderstood something.
> I find it rather non-intuitive that a property outside of the oom domain
> controls the behavior inside the domain. I will keep thinking about that
> though.

So the more I think about it, the more I am convinced that #1 is simply
wrong for the reason I've mentioned above. Consulting a property outside
of the oom hierarchy is tricky and non-intuitive.

So the implementation should be
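	/* root: the memcg where the OOM happened (root of the oom domain) */
	oom_group = NULL;
	memcg = mem_cgroup_from_task(oom_victim);
	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
		/* remember the topmost group_oom memcg within the oom domain */
		if (memcg->group_oom)
			oom_group = memcg;
		/* never consult anything above the oom domain */
		if (memcg == root)
			break;
	}
	/* oom_group == NULL means only oom_victim itself is killed */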

And the documented semantic:
oom.group - denotes a memory cgroup (or subhierarchy) which represents
an indivisible workload. Should any process be selected as an oom
victim due to an OOM event (at this cgroup level or above), then all
processes belonging to the group/hierarchy are killed together.

Please be careful when defining different oom.group policies within the
same hierarchy because OOM events at different hierarchy levels can
have surprising effects. Example:
        R
        |
        A (oom.group = 1)
       / \
      B   C (oom.group = 0)

The oom victim lives in B or C, respectively.

An OOM event at R (e.g. a global OOM) or at A will kill all tasks in
A's subtree. An OOM event at B or C will only kill a single process
from that memcg.
-- 
Michal Hocko
SUSE Labs
