linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Vladimir Davydov <vdavydov.dev@gmail.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Tejun Heo <tj@kernel.org>,
	kernel-team@fb.com, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RESEND v12 0/6] cgroup-aware OOM killer
Date: Wed, 1 Nov 2017 13:42:37 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.10.1711011329000.56485@chino.kir.corp.google.com> (raw)
In-Reply-To: <20171101073758.femijh7clfbwmqeg@dhcp22.suse.cz>

On Wed, 1 Nov 2017, Michal Hocko wrote:

> > memory.oom_score_adj would never need to be permanently tuned, just as 
> > /proc/pid/oom_score_adj need never be permanently tuned.  My response was 
> > an answer to Roman's concern that "v8 has it's own limitations," but I 
> > haven't seen a concrete example where the oom killer is forced to kill 
> > from the non-preferred cgroup while the user has power of biasing against 
> > certain cgroups with memory.oom_score_adj.  Do you have such a concrete 
> > example that we can work with?
> 
> Yes, the one with structural requirements due to other controllers or
> due to general organizational purposes where hierarchical (sibling
> oriented) comparison just doesn't work.

You mean where an admin or user does

	for i in $(cat cgroup.procs); do mkdir $i; echo $i > $i/cgroup.procs; done

to place constraints on processes with other controllers but unknowingly 
completely circumvented oom kill selection?  That's one of my concerns 
that hasn't been addressed and I believe only can be done with 
hierarchical accounting.

> Take the students, teachers,
> admins example. You definitely do not want to kill from students
> subgroups by default just because this is the largest entity type.
> Tuning memory.oom_score_adj doesn't work for that usecase as soon as
> new subgroups come and go.
> 

With hierarchical accounting, that solves all three concerns that I have 
enumerated, the top-level organizer knows the limits imposed.  It is 
therefore *trivial* to prefer /students by biasing against it with 
memory.oom_score_adj over other top-level mem cgroups and still kill 
from /students if going over a certain threshold of memory.  With 
hierarchical accounting and memory.oom_score_adj, it could even be used 
for userspace to determine which student to kill from.  If /admins is 
using more memory than expected, it gets biased against with the same 
memory.oom_score_adj.  The point is that the top-level organizer that is 
structing the mem cgroup tree knows the limits imposed and the preference 
of /admins over /students, or vice versa.  It also doesn't allow /students 
to circumvent victim selection by creating child mem cgroups.

If this is missing your point, please draw the hierarchy you are 
suggesting and show which mem cgroup is preferred by the admin and does 
not allow the user to circumvent that priority.

> > We simply cannot determine if improvements can be implemented in the 
> > future without user-visible changes if those improvements are unknown or 
> > undecided at this time.
> 
> Come on. There have been at least two examples on how this could be
> achieved. One priority based which would use cumulative memory
> consumption if set on intermediate nodes which would allow you to
> compare siblings. And another one was to add a new knob which would make
> an intermediate node an aggregate for accounting purposes.
> 

We don't need a memory.oom_group, memory.oom_hierarchical_accounting, 
memory.oom_priority, and memory.oom_score_adj.  I believe 
memory.oom_score_adj suffices.  I don't think it is in our best interest 
or the users best interest to maintain many different combinations of how 
an oom victim is selected.  I believe all the power needed is by providing 
a memory.oom_score_adj since cgroup memory usage is being considered, just 
as /proc/pid/oom_score_adj exists when process memory usage is being 
considered.  It seems very intuitive.

In the interest of not polluting the namespace, not allowing users to 
circumvent the decisions of the oom killer, and to allow userspace to be 
able to influence oom victim selection, we need this to be implemented now 
rather than later since it may affect the heuristic under consideration 
and we should have a complete patchset without being backed into a corner 
based on decisions made earlier with the rationale that it can be figured 
out later, let's merge this now.

> And I am pretty sure we have already agreed that something like this is
> useful for some usecases and nobody objected this would get merged in
> future. All we are saying now is that this is not in scope of _this_
> patchseries because the vast majority of usecases simply do not care
> about influencing the oom selection. They only do care about having per
> cgroup behavior and/or kill all semantic. I really do not understand
> what is hard about that.
> 

I honestly do not believe what hurry we're in or what is so hard to 
understand about implementing the ability of userspace to influence the 
decisionmaking that works well together with the heuristic being 
introduced.

We can stop wasting time arguing about whether its appropriate to merge an 
incomplete patchset or not and actually fix the three fundamental flaws 
that have been outlined with this approach, or I can fork the patchset and 
introduce it myself as proposed if that is preferred.

  reply	other threads:[~2017-11-01 20:42 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-19 18:52 [RESEND v12 0/6] cgroup-aware OOM killer Roman Gushchin
2017-10-19 18:52 ` [RESEND v12 1/6] mm, oom: refactor the oom_kill_process() function Roman Gushchin
2017-10-19 18:52 ` [RESEND v12 2/6] mm: implement mem_cgroup_scan_tasks() for the root memory cgroup Roman Gushchin
2017-10-19 18:52 ` [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-10-19 19:30   ` Michal Hocko
2017-10-31 15:04   ` Shakeel Butt
2017-10-31 15:29     ` Michal Hocko
2017-10-31 19:06       ` Michal Hocko
2017-10-31 19:13         ` Michal Hocko
2017-10-31 16:40     ` Johannes Weiner
2017-10-31 17:50       ` Shakeel Butt
2017-10-31 18:44         ` Johannes Weiner
2017-10-19 18:52 ` [RESEND v12 4/6] mm, oom: introduce memory.oom_group Roman Gushchin
2017-10-19 18:52 ` [RESEND v12 5/6] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer Roman Gushchin
2017-10-19 18:52 ` [RESEND v12 6/6] mm, oom, docs: describe the " Roman Gushchin
2017-10-19 19:45 ` [RESEND v12 0/6] " Johannes Weiner
2017-10-19 21:09   ` Michal Hocko
2017-10-23  0:24   ` David Rientjes
2017-10-23 11:49     ` Michal Hocko
2017-10-25 20:12       ` David Rientjes
2017-10-26 14:24     ` Johannes Weiner
2017-10-26 21:03       ` David Rientjes
2017-10-27  9:31         ` Roman Gushchin
2017-10-30 21:36           ` David Rientjes
2017-10-31  7:54             ` Michal Hocko
2017-10-31 22:21               ` David Rientjes
2017-11-01  7:37                 ` Michal Hocko
2017-11-01 20:42                   ` David Rientjes [this message]
2017-10-27 20:05         ` Johannes Weiner
2017-10-31 14:17           ` peter enderborg
2017-10-31 14:34             ` Michal Hocko
2017-10-31 15:07               ` peter enderborg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.10.1711011329000.56485@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).