From: Roman Gushchin <guro@fb.com>
To: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Tejun Heo <tj@kernel.org>, Cgroups <cgroups@vger.kernel.org>,
	<kernel-team@fb.com>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/2] memory.low,min reclaim
Date: Tue, 24 Apr 2018 11:09:33 +0100
Message-ID: <20180424100926.GA23745@castle.DHCP.thefacebook.com>
In-Reply-To: <CAHH2K0bDXrs+J3jWB1X7wphRMoLgjVUTAAFNLGFarDeAfRhA7Q@mail.gmail.com>

On Tue, Apr 24, 2018 at 12:56:09AM +0000, Greg Thelen wrote:
> On Mon, Apr 23, 2018 at 3:38 AM Roman Gushchin <guro@fb.com> wrote:
> 
> > Hi, Greg!
> 
> > On Sun, Apr 22, 2018 at 01:26:10PM -0700, Greg Thelen wrote:
> > > Roman's previously posted memory.low,min patches add per memcg effective
> > > low limit to detect overcommitment of parental limits.  But if we flip
> > > low,min reclaim to bail if usage<{low,min} at any level, then we don't
> > > need an effective low limit, which makes the code simpler.  When parent
> > > limits are overcommitted memory.min will oom kill, which is more drastic but
> > > makes the memory.low a simpler concept.  If memcg a/b wants oom kill before
> > > reclaim, then give it to them.  It seems a bit strange for a/b/memory.low's
> > > behaviour to depend on a/c/memory.low (i.e. a/b.low is strong unless
> > > a/b.low+a/c.low exceed a.low).
> 
> > It's actually not strange: a/b and a/c are sharing a common resource:
> > a/memory.low.
> 
> > Exactly as a/b/memory.max and a/c/memory.max are sharing a/memory.max.
> > If there are sibling cgroups which are consuming memory, a cgroup can't
> > exceed parent's memory.max, even if its memory.max is greater.
> 
> > >
> > > I think there might be a simpler way (albeit it doesn't yet include
> > > Documentation):
> > > - memcg: fix memory.low
> > > - memcg: add memory.min
> > >  3 files changed, 75 insertions(+), 6 deletions(-)
> > >
> > > The idea of this alternate approach is for memory.low,min to avoid
> > > reclaim if any portion of under-consideration memcg ancestry is under
> > > respective limit.
> 
> > This approach has a significant downside: it breaks hierarchical
> > constraints for memory.low/min. There are two important outcomes:
> 
> > 1) Any leaf's memory.low/min value is respected, even if parent's value
> >    is lower or even 0. It's not possible anymore to limit the amount of
> >    protected memory for a sub-tree.
> >    This is especially bad in case of delegation.
> 
> As someone who has been using something like memory.min for a while, I have
> cases where it needs to be a strong protection.  Such jobs prefer oom kill
> to reclaim.  These jobs know they need X MB of memory.  But I guess it's on
> me to avoid configuring machines which overcommit memory.min at such cgroup
> levels all the way to the root.

Absolutely.

> 
> > 2) If a cgroup has an ancestor with the usage under its memory.low/min,
> >    it becomes protected, even if its memory.low/min is 0. So it becomes
> >    impossible to have unprotected cgroups in protected sub-tree.
> 
> Fair point.
> 
> One use case is a nontrivial job which has several memory accounting
> subcontainers.  Is there a way to only set memory.low at the top and have
> that offer protection to the job?
> The case I'm thinking of is:
> % cd /cgroup
> % echo +memory > cgroup.subtree_control
> % mkdir top
> % echo +memory > top/cgroup.subtree_control
> % mkdir top/part1 top/part2
> % echo 1G > top/memory.min
> % (echo $BASHPID > top/part1/cgroup.procs && part1)
> % (echo $BASHPID > top/part2/cgroup.procs && part2)
> 
> Empirically it's been measured that the entire workload (/top) needs 1GB to
> perform well.  But we don't care how the memory is distributed between
> part1,part2.  Is the strategy for that to set /top, /top/part1.min, and
> /top/part2.min to 1GB?

The problem is that right now we don't have an "undefined" value for
memory.min/low. The default value is 0, which means "no protection".
So there is no way for a user to express "whatever the parent cgroup wants".
It might be useful to introduce such a value, as other controllers
may benefit too. But that's a separate topic to discuss.

In your example, it's possible to achieve the requested behavior by setting
top.min to 1G and part1.min and part2.min to "max".
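
As a rough sketch of that configuration (not taken from the patches), the
commands from your example could be adjusted as below. It assumes a kernel
that provides memory.min and a cgroup v2 hierarchy mounted at the directory
passed as the first argument; setup_top is a hypothetical helper name used
only for illustration.

```shell
# Sketch only: setup_top is a hypothetical helper, not an existing tool.
# $1 is assumed to be the mount point of a cgroup v2 hierarchy.
setup_top() {
    root=$1
    # Enable the memory controller for children of the root and of top.
    echo +memory > "$root/cgroup.subtree_control" &&
    mkdir -p "$root/top" &&
    echo +memory > "$root/top/cgroup.subtree_control" &&
    mkdir -p "$root/top/part1" "$root/top/part2" &&
    # The parent bounds the total protection for the whole sub-tree.
    echo 1G > "$root/top/memory.min" &&
    # "max" on the children means they are limited only by the parent,
    # so the 1G is distributed between part1 and part2 as needed.
    echo max > "$root/top/part1/memory.min" &&
    echo max > "$root/top/part2/memory.min"
}
```

It would be called as, for example, "setup_top /cgroup" by a user with write
access to the hierarchy, before moving the part1 and part2 processes into
their cgroups.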

> 
> What do you think about exposing emin and elow to user space?  I think that
> would reduce admin/user confusion in situations where memory.min is
> internally discounted.

They might be useful in some cases (e.g. a cgroup wants to know how much
actual protection it can get), but at the same time these values are
intentionally racy and don't have clear semantics.
So, maybe we can show them in memory.stat, but I doubt that they deserve
a separate interface file.

> 
> (tangent) Delegation in v2 isn't something I've been able to fully
> internalize yet.
> The "no interior processes" rule challenges my notion of subdelegation.
> My current model is where a system controller creates a container C with
> C.min and then starts client manager process M in C.  Then M can choose
> to further divide C's resources (e.g. C/S).  This doesn't seem possible
> because v2 doesn't allow for interior processes.  So the system manager
> would need to create C, set C.low, create C/sub_manager, create
> C/sub_resources, set C/sub_manager.low, set C/sub_resources.low, then start
> M in C/sub_manager.  Then sub_manager can create and manage
> C/sub_resources/S.

And this is a good example of a case where some cgroups in the tree
need protection to work properly (for example, C/sub_manager/memory.low = 128M),
while the actual workload might not (C/sub_resources/memory.low = 0).
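
A possible sketch of the layout you describe, with those example values,
is below. The helper name setup_delegation and the way C.low is passed in
are assumptions for illustration; $1 is assumed to be a cgroup v2 mount
point and $2 the value chosen for C/memory.low.

```shell
# Sketch only: setup_delegation is a hypothetical helper.
# $1: cgroup v2 mount point (assumed), $2: value for C/memory.low.
setup_delegation() {
    root=$1
    c_low=$2
    echo +memory > "$root/cgroup.subtree_control" &&
    mkdir -p "$root/C" &&
    echo "$c_low" > "$root/C/memory.low" &&
    # C gets children, so M cannot run in C itself
    # ("no interior processes"); it goes to C/sub_manager instead.
    echo +memory > "$root/C/cgroup.subtree_control" &&
    mkdir -p "$root/C/sub_manager" "$root/C/sub_resources" &&
    # The manager is protected so that it keeps working under pressure.
    echo 128M > "$root/C/sub_manager/memory.low" &&
    # The actual workload sub-tree is left unprotected.
    echo 0 > "$root/C/sub_resources/memory.low" &&
    # Let sub_manager create memory-controlled children such as S.
    echo +memory > "$root/C/sub_resources/cgroup.subtree_control"
}
```

With hierarchical memory.low this works as expected: sub_manager is
protected, and anything created under C/sub_resources stays reclaimable.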

Thanks!

Thread overview: 13+ messages
2018-03-20 22:33 [RFC] mm: memory.low heirarchical behavior Roman Gushchin
2018-03-21 18:23 ` Johannes Weiner
2018-03-21 19:08   ` Roman Gushchin
2018-04-04 17:07     ` Johannes Weiner
2018-04-05 13:54       ` Roman Gushchin
2018-04-05 15:00         ` Johannes Weiner
2018-03-23 16:37   ` [PATCH v2] mm: memory.low hierarchical behavior Roman Gushchin
2018-04-22 20:26 ` [RFC PATCH 0/2] memory.low,min reclaim Greg Thelen
2018-04-22 20:26   ` [RFC PATCH 1/2] memcg: fix memory.low Greg Thelen
2018-04-22 20:26   ` [RFC PATCH 2/2] memcg: add memory.min Greg Thelen
2018-04-23 10:38   ` [RFC PATCH 0/2] memory.low,min reclaim Roman Gushchin
2018-04-24  0:56     ` Greg Thelen
2018-04-24 10:09       ` Roman Gushchin [this message]
