Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
Date: Thu, 13 Feb 2020 08:46:27 -0500
Message-ID: <20200213134627.GB208501@cmpxchg.org> (raw)
In-Reply-To: <CALOAHbCiBqdZzZVC7_c3Um_vDUu9ECsDYUebOL4+=MP9owA_Og@mail.gmail.com>

On Thu, Feb 13, 2020 at 09:47:29AM +0800, Yafang Shao wrote:
> On Thu, Feb 13, 2020 at 12:42 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Feb 12, 2020 at 08:25:45PM +0800, Yafang Shao wrote:
> > > On Wed, Feb 12, 2020 at 1:55 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > Another variant of this problem was recently observed, where the
> > > > kernel violates cgroups' memory.low protection settings and reclaims
> > > > page cache way beyond the configured thresholds. It was followed by a
> > > > proposal of a modified form of the reverted commit above, that
> > > > implements memory.low-sensitive shrinker skipping over populated
> > > > inodes on the LRU [1]. However, this proposal continues to run the
> > > > risk of attracting disproportionate reclaim pressure to a pool of
> > > > still-used inodes,
> > >
> > > Hi Johannes,
> > >
> > > If you really think that is a risk, what about bellow additional patch
> > > to fix this risk ?
> > >
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 80dddbc..61862d9 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -760,7 +760,7 @@ static bool memcg_can_reclaim_inode(struct inode *inode,
> > >                 goto out;
> > >
> > >         cgroup_size = mem_cgroup_size(memcg);
> > > -       if (inode->i_data.nrpages + protection >= cgroup_size)
> > > +       if (inode->i_data.nrpages)
> > >                 reclaimable = false;
> > >
> > >  out:
> > >
> > > With this additional patch, we skip all inodes in this memcg until all
> > > its page cache pages are reclaimed.
> >
> > Well that's something we've tried and had to revert because it caused
> > issues in slab reclaim. See the History part of my changelog.
> 
> You misuderstood it.
> The reverted patch skips all inodes in the system, while this patch
> only works when you turn on memcg.{min, low} protection.
> IOW, that is not a default behavior, while it only works when you want
> it and only effect your targeted memcg rather than the whole system.

I understand perfectly well.

Keeping unreclaimable inodes on the shrinker LRU causes the shrinker
to build up excessive pressure on all VFS objects. This is a
bug. Making it cgroup-specific doesn't make it less of a bug, it just
means you only hit the bug when you use cgroup memory protection.

> > > > while not addressing the more generic reclaim
> > > > inversion problem outside of a very specific cgroup application.
> > > >
> > >
> > > But I have a different understanding.  This method works like a
> > > knob. If you really care about your workingset (data), you should
> > > turn it on (i.e. by using memcg protection to protect them), while
> > > if you don't care about your workingset (data) then you'd better
> > > turn it off. That would be more flexible.  Regaring your case in the
> > > commit log, why not protect your linux git tree with memcg
> > > protection ?
> >
> > I can't imagine a scenario where I *wouldn't* care about my
> > workingset, though. Why should it be opt-in, not the default?
> 
> Because the default behavior has caused the XFS performace hit.

That means that with your proposal you cannot use cgroup memory
protection for workloads that run on xfs.

(And if I remember the bug report correctly, this wasn't just xfs. It
also caused metadata caches on other filesystems to get trashed. xfs
was just more pronounced because it does sync inode flushing from the
shrinker, adding write stalls to the mix of metadata cache misses.)

What I'm proposing is an implementation that protects hot page cache
without causing excessive shrinker pressure and rotations.

  reply index

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-11 17:55 Johannes Weiner
2020-02-11 18:20 ` Johannes Weiner
2020-02-11 19:05 ` Rik van Riel
2020-02-11 19:31   ` Johannes Weiner
2020-02-11 23:44     ` Andrew Morton
2020-02-12  0:28       ` Linus Torvalds
2020-02-12  0:47         ` Andrew Morton
2020-02-12  1:03           ` Linus Torvalds
2020-02-12  8:50             ` Russell King - ARM Linux admin
2020-02-13  9:50               ` Lucas Stach
2020-02-13 16:52               ` Arnd Bergmann
2020-02-15 11:25                 ` Geert Uytterhoeven
2020-02-15 16:59                   ` Arnd Bergmann
2020-02-16  9:44                     ` Geert Uytterhoeven
2020-02-16 19:54                       ` Chris Paterson
2020-02-16 20:38                         ` Arnd Bergmann
2020-02-20 14:35                           ` Chris Paterson
2020-02-26 18:04                 ` santosh.shilimkar
2020-02-26 21:01                   ` Arnd Bergmann
2020-02-26 21:11                     ` santosh.shilimkar
2020-03-06 20:34                       ` Nishanth Menon
2020-03-07  1:08                         ` santosh.shilimkar
2020-03-08 10:58                         ` Arnd Bergmann
2020-03-08 14:19                           ` Russell King - ARM Linux admin
2020-03-09 13:33                             ` Arnd Bergmann
2020-03-09 14:04                               ` Russell King - ARM Linux admin
2020-03-09 15:04                                 ` Arnd Bergmann
2020-03-10  9:16                                   ` Michal Hocko
2020-03-09 15:59                           ` Catalin Marinas
2020-03-09 16:09                             ` Russell King - ARM Linux admin
2020-03-09 16:57                               ` Catalin Marinas
2020-03-09 19:46                               ` Arnd Bergmann
2020-03-11 14:29                                 ` Catalin Marinas
2020-03-11 16:59                                   ` Arnd Bergmann
2020-03-11 17:26                                     ` Catalin Marinas
2020-03-11 22:21                                       ` Arnd Bergmann
2020-02-12  3:58         ` Matthew Wilcox
2020-02-12  8:09         ` Michal Hocko
2020-02-17 13:31         ` Pavel Machek
2020-02-12 16:35       ` Johannes Weiner
2020-02-12 18:26         ` Andrew Morton
2020-02-12 18:52           ` Johannes Weiner
2020-02-12 12:25 ` Yafang Shao
2020-02-12 16:42   ` Johannes Weiner
2020-02-13  1:47     ` Yafang Shao
2020-02-13 13:46       ` Johannes Weiner [this message]
2020-02-14  2:02         ` Yafang Shao
2020-02-13 18:34 ` [PATCH v2] " Johannes Weiner
2020-02-14 16:53 ` [PATCH] " kbuild test robot
2020-02-14 21:30 ` kbuild test robot
2020-02-14 21:30 ` [PATCH] vfs: fix boolreturn.cocci warnings kbuild test robot
2020-05-12 21:29 ` [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU Johannes Weiner
2020-05-13  1:32   ` Yafang Shao
2020-05-13 13:00     ` Johannes Weiner
2020-05-13 21:15   ` Andrew Morton
2020-05-14 11:27     ` Johannes Weiner
2020-05-14  2:24   ` Andrew Morton
2020-05-14 10:37     ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213134627.GB208501@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=guro@fb.com \
    --cc=kernel-team@fb.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git