From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yafang Shao <laoar.shao@gmail.com>,
	akpm@linux-foundation.org, vdavydov.dev@gmail.com,
	linux-mm@kvack.org, Chris Down <chris@chrisdown.name>,
	Roman Gushchin <guro@fb.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm, memcg: fix wrong mem cgroup protection
Date: Mon, 27 Apr 2020 10:25:24 +0200
Message-ID: <20200427082524.GC28637@dhcp22.suse.cz>
In-Reply-To: <20200424165103.GA575707@cmpxchg.org>

On Fri 24-04-20 12:51:03, Johannes Weiner wrote:
> On Fri, Apr 24, 2020 at 06:21:03PM +0200, Michal Hocko wrote:
> > On Fri 24-04-20 11:10:13, Johannes Weiner wrote:
> > > On Fri, Apr 24, 2020 at 04:29:58PM +0200, Michal Hocko wrote:
> > > > On Fri 24-04-20 09:14:50, Johannes Weiner wrote:
> > > > > On Thu, Apr 23, 2020 at 02:16:29AM -0400, Yafang Shao wrote:
> > > > > > This patch is an improvement of a previous version[1], as the previous
> > > > > > version is not easy to understand.
> > > > > > This issue persists in the newest kernel, I have to resend the fix. As
> > > > > > the implementation is changed, I drop Roman's ack from the previous
> > > > > > version.
> > > > > 
> > > > > Now that I understand the problem, I much prefer the previous version.
> > > > > 
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index 745697906ce3..2bf91ae1e640 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -6332,8 +6332,19 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> > > > >  
> > > > >  	if (!root)
> > > > >  		root = root_mem_cgroup;
> > > > > -	if (memcg == root)
> > > > > +	if (memcg == root) {
> > > > > +		/*
> > > > > +		 * The cgroup is the reclaim root in this reclaim
> > > > > +		 * cycle, and therefore not protected. But it may have
> > > > > +		 * stale effective protection values from previous
> > > > > +		 * cycles in which it was not the reclaim root - for
> > > > > +		 * example, global reclaim followed by limit reclaim.
> > > > > +		 * Reset these values for mem_cgroup_protection().
> > > > > +		 */
> > > > > +		memcg->memory.emin = 0;
> > > > > +		memcg->memory.elow = 0;
> > > > >  		return MEMCG_PROT_NONE;
> > > > > +	}
> > > > 
> > > > Could you be more specific why you prefer this over the
> > > > mem_cgroup_protection which doesn't change the effective value?
> > > > Isn't it easier to simply ignore effective value for the reclaim roots?
> > > 
> > > Because now both mem_cgroup_protection() and mem_cgroup_protected()
> > > have to know about the reclaim root semantics, instead of just the one
> > > central place.
> > 
> > Yes this is true but it is also potentially overwriting the state with
> > a parallel reclaim which can lead to surprising results
> 
> Checking in mem_cgroup_protection() doesn't avoid the fundamental race:
> 
>   root
>      `- A (low=2G, elow=2G, max=3G)
>         `- A1 (low=2G, elow=2G)
> 
> If A does limit reclaim while global reclaim races, the memcg == root
> check in mem_cgroup_protection() will reliably calculate the "right"
> scan value for A, which has no pages, and the wrong scan value for A1
> where the memory actually is.

I am sorry but I do not see how A1 would get a wrong scan value.
- Global reclaim
  - A.elow = 2G
  - A1.elow = min(A1.low, A1.usage) ; if (A.children_low_usage < A.elow)

- A reclaim.
  - A.elow = stale/undefined
  - A1.elow = A1.low

if mem_cgroup_protection returns 0 for A's reclaim targeting A (assuming
the check is there) then not a big deal as there are no pages there as
you say.

Let's compare GR (global reclaim) and AR (A reclaim).
GR(A1.elow) <= AR(A1.elow) by definition, right? For A1.low
overcommitted we have
min(A1.low, A1.usage) * A.elow / A.children_low_usage <= min(A1.low, A1.usage)
because A.elow <= A.children_low_usage

so in both cases we have GR(A1.elow) <= AR(A1.elow) which means that
racing reclaims will behave sanely because the protection against
external pressure is not violated. A is going to reclaim A1
less than the global reclaim but that should be OK.
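
To make the comparison concrete, here is a minimal userspace sketch of
the two calculations above. It is not kernel code: the helper names
(min_u64, elow_global, elow_target) are made up for illustration, the
values are assumed to be in MiB so the multiplication cannot overflow
for realistic sizes, and the formulas are only the ones written out in
this mail, not the full logic of mem_cgroup_protected().

```c
#include <stdint.h>

/* Hypothetical helper, not a kernel function. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * A1's effective low when an ancestor of A is the reclaim root
 * (global reclaim). All values in MiB (assumption of this sketch).
 *
 * If the children's protected usage fits within A's effective low,
 * A1 keeps min(low, usage). Otherwise (A1.low overcommitted) that
 * value is scaled down by A.elow / A.children_low_usage.
 */
static uint64_t elow_global(uint64_t low, uint64_t usage,
			    uint64_t parent_elow,
			    uint64_t children_low_usage)
{
	uint64_t low_usage = min_u64(low, usage);

	/* Also covers children_low_usage == 0, so no division by zero. */
	if (children_low_usage <= parent_elow)
		return low_usage;

	return low_usage * parent_elow / children_low_usage;
}

/*
 * A1's effective low when A itself is the reclaim root: the parent is
 * the root of this reclaim cycle, so A1.elow is simply A1.low.
 */
static uint64_t elow_target(uint64_t low)
{
	return low;
}
```

With A1.low = 2048 and A.elow = 2048, calling elow_global() with
usage = 3072 and children_low_usage = 4096 gives 1024, while
elow_target() gives 2048, so GR(A1.elow) <= AR(A1.elow) holds in that
overcommitted case as well.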

Or what am I missing?

> I'm okay with fixing the case where a really old left-over value is
> used by target reclaim.
> 
> I don't see a point in special casing this one instance of a
> fundamental race condition at the expense of less robust code.

I am definitely not calling to fragment the code. I do agree that having
a special case in mem_cgroup_protection is quite non-intuitive.
The existing code is quite hard to reason about in its current form
as we can see. If we can fix all that in mem_cgroup_protected then no
objections from me at all.

-- 
Michal Hocko
SUSE Labs
