Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Rik van Riel <riel@surriel.com>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	kernel-team@fb.com
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
Date: Tue, 11 Feb 2020 14:31:01 -0500
Message-ID: <20200211193101.GA178975@cmpxchg.org> (raw)
In-Reply-To: <29b6e848ff4ad69b55201751c9880921266ec7f4.camel@surriel.com>

On Tue, Feb 11, 2020 at 02:05:38PM -0500, Rik van Riel wrote:
> On Tue, 2020-02-11 at 12:55 -0500, Johannes Weiner wrote:
> > The VFS inode shrinker is currently allowed to reclaim inodes with
> > populated page cache. As a result it can drop gigabytes of hot and
> > active page cache on the floor without consulting the VM (recorded as
> > "inodesteal" events in /proc/vmstat).
> > 
> > This causes real problems in practice. Consider for example how the
> > VM
> > would cache a source tree, such as the Linux git tree. As large parts
> > of the checked out files and the object database are accessed
> > repeatedly, the page cache holding this data gets moved to the active
> > list, where it's fully (and indefinitely) insulated from one-off
> > cache
> > moving through the inactive list.
> 
> > This behavior of invalidating page cache from the inode shrinker goes
> > back to even before the git import of the kernel tree. It may have
> > been less noticeable when the VM itself didn't have real workingset
> > protection, and floods of one-off cache would push out any active
> > cache over time anyway. But the VM has come a long way since then and
> > the inode shrinker is now actively subverting its caching strategy.
> 
> Two things come to mind when looking at this:
> - highmem
> - NUMA
> 
> IIRC one of the reasons reclaim is done in this way is
> because a page cache page in one area of memory (highmem,
> or a NUMA node) can end up pinning inode slab memory in
> another memory area (normal zone, other NUMA node).

That's a good point, highmem does ring a bell now that you mention it.

If we still care, I think this could be solved by doing something
similar to what we do with buffer_heads_over_limit: allow a lowmem
allocation to reclaim page cache inside the highmem zone if the bhs
(or inodes in this case) have accumulated excessively.

AFAICS, we haven't done anything similar for NUMA, so it might not be
much of a problem there. I could imagine this is in part because NUMA
nodes tend to be more balanced in size, and the ratio between cache
memory and inode/bh memory means that these objects won't turn into a
significant externality. Whereas with extreme highmem:lowmem ratios,
they can.

  reply index

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-11 17:55 Johannes Weiner
2020-02-11 18:20 ` Johannes Weiner
2020-02-11 19:05 ` Rik van Riel
2020-02-11 19:31   ` Johannes Weiner [this message]
2020-02-11 23:44     ` Andrew Morton
2020-02-12  0:28       ` Linus Torvalds
2020-02-12  0:47         ` Andrew Morton
2020-02-12  1:03           ` Linus Torvalds
2020-02-12  8:50             ` Russell King - ARM Linux admin
2020-02-13  9:50               ` Lucas Stach
2020-02-13 16:52               ` Arnd Bergmann
2020-02-15 11:25                 ` Geert Uytterhoeven
2020-02-15 16:59                   ` Arnd Bergmann
2020-02-16  9:44                     ` Geert Uytterhoeven
2020-02-16 19:54                       ` Chris Paterson
2020-02-16 20:38                         ` Arnd Bergmann
2020-02-20 14:35                           ` Chris Paterson
2020-02-26 18:04                 ` santosh.shilimkar
2020-02-26 21:01                   ` Arnd Bergmann
2020-02-26 21:11                     ` santosh.shilimkar
2020-03-06 20:34                       ` Nishanth Menon
2020-03-07  1:08                         ` santosh.shilimkar
2020-03-08 10:58                         ` Arnd Bergmann
2020-03-08 14:19                           ` Russell King - ARM Linux admin
2020-03-09 13:33                             ` Arnd Bergmann
2020-03-09 14:04                               ` Russell King - ARM Linux admin
2020-03-09 15:04                                 ` Arnd Bergmann
2020-03-10  9:16                                   ` Michal Hocko
2020-03-09 15:59                           ` Catalin Marinas
2020-03-09 16:09                             ` Russell King - ARM Linux admin
2020-03-09 16:57                               ` Catalin Marinas
2020-03-09 19:46                               ` Arnd Bergmann
2020-03-11 14:29                                 ` Catalin Marinas
2020-03-11 16:59                                   ` Arnd Bergmann
2020-03-11 17:26                                     ` Catalin Marinas
2020-03-11 22:21                                       ` Arnd Bergmann
2020-02-12  3:58         ` Matthew Wilcox
2020-02-12  8:09         ` Michal Hocko
2020-02-17 13:31         ` Pavel Machek
2020-02-12 16:35       ` Johannes Weiner
2020-02-12 18:26         ` Andrew Morton
2020-02-12 18:52           ` Johannes Weiner
2020-02-12 12:25 ` Yafang Shao
2020-02-12 16:42   ` Johannes Weiner
2020-02-13  1:47     ` Yafang Shao
2020-02-13 13:46       ` Johannes Weiner
2020-02-14  2:02         ` Yafang Shao
2020-02-13 18:34 ` [PATCH v2] " Johannes Weiner
2020-02-14 16:53 ` [PATCH] " kbuild test robot
2020-02-14 21:30 ` kbuild test robot
2020-02-14 21:30 ` [PATCH] vfs: fix boolreturn.cocci warnings kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200211193101.GA178975@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=guro@fb.com \
    --cc=kernel-team@fb.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=riel@surriel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git