linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Tejun Heo <tj@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH 1/3] mm: memcontrol: fix memory.low proportional distribution
Date: Fri, 13 Dec 2019 20:40:31 +0000	[thread overview]
Message-ID: <20191213204026.GA6830@localhost.localdomain> (raw)
In-Reply-To: <20191213192158.188939-2-hannes@cmpxchg.org>

On Fri, Dec 13, 2019 at 02:21:56PM -0500, Johannes Weiner wrote:
> When memory.low is overcommitted - i.e. the children claim more
> protection than their shared ancestor grants them - the allowance is
> distributed in proportion to each siblings's utilized protection:
> 
> 	low_usage = min(low, usage)
> 	elow = parent_elow * (low_usage / siblings_low_usage)
> 
> However, siblings_low_usage is not the sum of all low_usages. It sums
> up the usages of *only those cgroups that are within their memory.low*
> That means that low_usage can be *bigger* than siblings_low_usage, and
> consequently the total protection afforded to the children can be
> bigger than what the ancestor grants the subtree.
> 
> Consider three groups where two are in excess of their protection:
> 
>   A/memory.low = 10G
>   A/A1/memory.low = 10G, A/memory.current = 20G
>   A/A2/memory.low = 10G, B/memory.current = 20G
>   A/A3/memory.low = 10G, C/memory.current =  8G
> 
>   siblings_low_usage = 8G (only A3 contributes)
>   A1/elow = parent_elow(10G) * low_usage(20G) / siblings_low_usage(8G) = 25G
> 
> The 25G are then capped to A1's own memory.low setting, i.e. 10G. The
> same is true for A2. And A3 would also receive 10G. The combined
> protection of A1, A2 and A3 is 30G, when A limits the tree to 10G.
> 
> What does this mean in practice? A1 and A2 would still be in excess of
> their 10G allowance and would be reclaimed, whereas A3 would not. As
> they eventually drop below their protection setting, they would be
> counted in siblings_low_usage again and the error would right itself.
> 
> When reclaim is applied in a binary fashion - cgroup is reclaimed when
> it's above its protection, otherwise it's skipped - this could work
> actually work out just fine - although it's not quite clear to me why
> we'd introduce this error in the first place.

This complication is not simple an error, it protects cgroups under
their low limits if there is unprotected memory.

So, here is an example:

      A      A/memory.low = 2G, A/memory.current = 4G
     / \
    B   C    B/memory.low = 3G  B/memory.current = 2G
             C/memory.low = 1G  C/memory.current = 2G

as now:

B/elow = 2G * 2G / 2G = 2G == B/memory.current
C/elow = 2G * 1G / 2G = 1G < C/memory.current

with this fix:

B/elow = 2G * 2G / 3G = 4/3 G < B/memory.current
C/elow = 2G * 1G / 3G = 2/3 G < C/memory.current

So in other words, currently B won't be scanned at all, because
there is 1G of unprotected memory in C. With your patch both B and C
will be scanned.

> However, since
> 1bc63fb1272b ("mm, memcg: make scan aggression always exclude
> protection"), reclaim pressure is scaled to how much a cgroup is above
> its protection. As a result this calculation error unduly skews
> pressure away from A1 and A2 toward the rest of the system.

It could be that with 1bc63fb1272b the target memory distribution
will be fine. However the patch will change the memory pressure in B and C
(in the example above). Maybe it's ok, but at least it should be discussed
and documented.

> 
> Fix this by by making siblings_low_usage the sum of all protected
> memory among siblings, including those that are in excess of their
> protection.

So I'm afraid the fix need to be more complex. I need to think a bit more about
it.

Thanks!

  reply	other threads:[~2019-12-13 20:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-13 19:21 [PATCH 0/3] mm: memcontrol: recursive memory protection Johannes Weiner
2019-12-13 19:21 ` [PATCH 1/3] mm: memcontrol: fix memory.low proportional distribution Johannes Weiner
2019-12-13 20:40   ` Roman Gushchin [this message]
2019-12-16 18:25     ` Johannes Weiner
2019-12-16 19:11       ` Roman Gushchin
2019-12-13 19:21 ` [PATCH 2/3] mm: memcontrol: clean up and document effective low/min calculations Johannes Weiner
2019-12-13 19:21 ` [PATCH 3/3] mm: memcontrol: recursive memory.low protection Johannes Weiner
2019-12-13 20:05   ` Johannes Weiner
2020-02-27 19:56 [PATCH 0/3] " Johannes Weiner
2020-02-27 19:56 ` [PATCH 1/3] mm: memcontrol: fix memory.low proportional distribution Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191213204026.GA6830@localhost.localdomain \
    --to=guro@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).