From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20200211175507.178100-1-hannes@cmpxchg.org> <CALOAHbC3Bx3E7fwt35zuiHfuC8YyhVWA1tDh2KP+gQJoMtED3w@mail.gmail.com>
 <20200212164235.GB180867@cmpxchg.org> <CALOAHbCiBqdZzZVC7_c3Um_vDUu9ECsDYUebOL4+=MP9owA_Og@mail.gmail.com>
 <20200213134627.GB208501@cmpxchg.org>
In-Reply-To: <20200213134627.GB208501@cmpxchg.org>
From: Yafang Shao <laoar.shao@gmail.com>
Date: Fri, 14 Feb 2020 10:02:04 +0800
Message-ID: <CALOAHbD3FQWMN1q-O0Va+hk3Uo2gHnB1-OF870rCpiKPEk8otQ@mail.gmail.com>
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-fsdevel@vger.kernel.org, Linux MM <linux-mm@kvack.org>, 
	LKML <linux-kernel@vger.kernel.org>, Dave Chinner <david@fromorbit.com>, 
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>, Andrew Morton <akpm@linux-foundation.org>, 
	Linus Torvalds <torvalds@linux-foundation.org>, Al Viro <viro@zeniv.linux.org.uk>, 
	Kernel Team <kernel-team@fb.com>
Content-Type: text/plain; charset="UTF-8"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Thu, Feb 13, 2020 at 9:46 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Feb 13, 2020 at 09:47:29AM +0800, Yafang Shao wrote:
> > On Thu, Feb 13, 2020 at 12:42 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Wed, Feb 12, 2020 at 08:25:45PM +0800, Yafang Shao wrote:
> > > > On Wed, Feb 12, 2020 at 1:55 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > Another variant of this problem was recently observed, where the
> > > > > kernel violates cgroups' memory.low protection settings and reclaims
> > > > > page cache way beyond the configured thresholds. It was followed by a
> > > > > proposal of a modified form of the reverted commit above, that
> > > > > implements memory.low-sensitive shrinker skipping over populated
> > > > > inodes on the LRU [1]. However, this proposal continues to run the
> > > > > risk of attracting disproportionate reclaim pressure to a pool of
> > > > > still-used inodes,
> > > >
> > > > Hi Johannes,
> > > >
> > > > If you really think that is a risk, what about the additional patch
> > > > below to fix this risk?
> > > >
> > > > diff --git a/fs/inode.c b/fs/inode.c
> > > > index 80dddbc..61862d9 100644
> > > > --- a/fs/inode.c
> > > > +++ b/fs/inode.c
> > > > @@ -760,7 +760,7 @@ static bool memcg_can_reclaim_inode(struct inode *inode,
> > > >                 goto out;
> > > >
> > > >         cgroup_size = mem_cgroup_size(memcg);
> > > > -       if (inode->i_data.nrpages + protection >= cgroup_size)
> > > > +       if (inode->i_data.nrpages)
> > > >                 reclaimable = false;
> > > >
> > > >  out:
> > > >
> > > > With this additional patch, we skip all inodes in this memcg until all
> > > > its page cache pages are reclaimed.
> > >
> > > Well that's something we've tried and had to revert because it caused
> > > issues in slab reclaim. See the History part of my changelog.
> >
> > You misunderstood it.
> > The reverted patch skips all inodes in the system, while this patch
> > only works when you turn on memcg.{min, low} protection.
> > IOW, that is not the default behavior; it only works when you want
> > it and only affects your targeted memcg rather than the whole system.
>
> I understand perfectly well.
>
> Keeping unreclaimable inodes on the shrinker LRU causes the shrinker
> to build up excessive pressure on all VFS objects. This is a
> bug. Making it cgroup-specific doesn't make it less of a bug, it just
> means you only hit the bug when you use cgroup memory protection.
>

What I mean to fix is really a cgroup-specific issue, but it may be
different from the one you mean to fix (I will explain below).
Consider the excessive pressure that the protected inodes may put on
the shrinker: the protected page cache pages put much more pressure on
the reclaimer. If you mean to remove the protected inodes from the
shrinker LRU, why not remove the protected page cache pages from the
page cache LRU as well? What I really mean is that this is how memcg
protection works.

> > > > > while not addressing the more generic reclaim
> > > > > inversion problem outside of a very specific cgroup application.
> > > > >
> > > >
> > > > But I have a different understanding.  This method works like a
> > > > knob. If you really care about your workingset (data), you should
> > > > turn it on (i.e. by using memcg protection to protect them), while
> > > > if you don't care about your workingset (data) then you'd better
> > > > turn it off. That would be more flexible.  Regarding your case in the
> > > > commit log, why not protect your linux git tree with memcg
> > > > protection ?
> > >
> > > I can't imagine a scenario where I *wouldn't* care about my
> > > workingset, though. Why should it be opt-in, not the default?
> >
> > Because the default behavior has caused the XFS performance hit.
>
> That means that with your proposal you cannot use cgroup memory
> protection for workloads that run on xfs.
>

Well, if you set memory.min to protect your workload inside a specific
memcg, it means that you already know this memory can't be used by
your workloads outside the memcg. That means the performance of the
workloads outside the memcg may not be as good as before. Then you
should adjust your SLA, migrate the protected memcg to another host, or
just kill the protected memcg.
IOW, the result is *expected*.

> (And if I remember the bug report correctly, this wasn't just xfs. It
> also caused metadata caches on other filesystems to get trashed. xfs
> was just more pronounced because it does sync inode flushing from the
> shrinker, adding write stalls to the mix of metadata cache misses.)
>
> What I'm proposing is an implementation that protects hot page cache
> without causing excessive shrinker pressure and rotations.

That's the difference between your issue and my issue.
You're trying to fix an issue around hot page cache, but what I want
to fix may involve cold page cache, and it really is an issue specific
to memcg protection.
That is because memcg protection can protect all page cache pages, even
if the page cache pages are cold and the inodes are cold (at the tail
of the list LRU) as well. That is one of the reasons why memcg
protection exists. (I know you are the author of memcg protection, but
I have to clarify what memcg protection is.)
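
To make the two skip conditions from the diff quoted earlier in this
mail concrete, here is a minimal user-space sketch. It is only a model:
struct inode_model and both function names are hypothetical and not
kernel API, all values are assumed to be in pages, and the rest of
memcg_can_reclaim_inode() (which the diff does not show) is not modeled.
Only the two comparisons are taken from the '-' and '+' lines.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the values the quoted check reads. */
struct inode_model {
	unsigned long nrpages;     /* models inode->i_data.nrpages */
	unsigned long protection;  /* models the memcg protection value */
	unsigned long cgroup_size; /* models mem_cgroup_size(memcg) */
};

/*
 * Condition on the '-' line of the diff: the inode is skipped only
 * when its page cache plus the protection reaches the cgroup size.
 */
static bool skip_original(const struct inode_model *m)
{
	return m->nrpages + m->protection >= m->cgroup_size;
}

/*
 * Condition on the '+' line of the diff: the inode is skipped as
 * long as it has any page cache pages at all.
 */
static bool skip_proposed(const struct inode_model *m)
{
	return m->nrpages != 0;
}
```

With nrpages = 10, protection = 100 and cgroup_size = 1000, the first
condition does not skip the inode while the second one does; once
nrpages drops to 0, the second condition no longer skips it either.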

Regarding your issue around hot page cache pages, I have another
question. If the page cache pages are hot, why are the inodes of these
page cache pages cold (at the tail of the list LRU)? Per my
understanding, if the page cache pages are hot, their inodes should be
hot (not at the tail of the list LRU) as well. That should be how the
LRU works.

Well, that doesn't mean I object to your patch.  What I really want to
clarify is that our issues are really different.

Thanks
Yafang