From: Andrew Morton <firstname.lastname@example.org>
To: Johannes Weiner <email@example.com>
Cc: Rik van Riel <firstname.lastname@example.org>, email@example.com, firstname.lastname@example.org, email@example.com, Dave Chinner <firstname.lastname@example.org>, Yafang Shao <email@example.com>, Michal Hocko <firstname.lastname@example.org>, Roman Gushchin <email@example.com>, Linus Torvalds <firstname.lastname@example.org>, Al Viro <email@example.com>, firstname.lastname@example.org
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
Date: Tue, 11 Feb 2020 15:44:38 -0800
Message-ID: <email@example.com> (raw)
In-Reply-To: <20200211193101.GA178975@cmpxchg.org>

On Tue, 11 Feb 2020 14:31:01 -0500 Johannes Weiner <firstname.lastname@example.org> wrote:

> On Tue, Feb 11, 2020 at 02:05:38PM -0500, Rik van Riel wrote:
> > On Tue, 2020-02-11 at 12:55 -0500, Johannes Weiner wrote:
> > > The VFS inode shrinker is currently allowed to reclaim inodes with
> > > populated page cache. As a result it can drop gigabytes of hot and
> > > active page cache on the floor without consulting the VM (recorded
> > > as "inodesteal" events in /proc/vmstat).
> > >
> > > This causes real problems in practice. Consider for example how the
> > > VM would cache a source tree, such as the Linux git tree. As large
> > > parts of the checked out files and the object database are accessed
> > > repeatedly, the page cache holding this data gets moved to the
> > > active list, where it's fully (and indefinitely) insulated from
> > > one-off cache moving through the inactive list.
> > >
> > > This behavior of invalidating page cache from the inode shrinker
> > > goes back to even before the git import of the kernel tree. It may
> > > have been less noticeable when the VM itself didn't have real
> > > workingset protection, and floods of one-off cache would push out
> > > any active cache over time anyway. But the VM has come a long way
> > > since then and the inode shrinker is now actively subverting its
> > > caching strategy.
> >
> > Two things come to mind when looking at this:
> > - highmem
> > - NUMA
> >
> > IIRC one of the reasons reclaim is done in this way is
> > because a page cache page in one area of memory (highmem,
> > or a NUMA node) can end up pinning inode slab memory in
> > another memory area (normal zone, other NUMA node).
>
> That's a good point, highmem does ring a bell now that you mention it.

Yup, that's why this mechanism exists. Here:

https://marc.info/?l=git-commits-head&m=103646757213266&w=2

> If we still care, I think this could be solved by doing something
> similar to what we do with buffer_heads_over_limit: allow a lowmem
> allocation to reclaim page cache inside the highmem zone if the bhs
> (or inodes in this case) have accumulated excessively.

Well, reclaiming highmem pagecache at random would be a painful way to
reclaim lowmem inodes. Better to pick an inode then shoot down all its
pagecache. Perhaps we could take its pagecache's aging into account.

Testing this will be a challenge, but the issue was real - a 7GB
highmem machine isn't crazy and I expect the inode has become larger
since those days.

> AFAICS, we haven't done anything similar for NUMA, so it might not be
> much of a problem there. I could imagine this is in part because NUMA
> nodes tend to be more balanced in size, and the ratio between cache
> memory and inode/bh memory means that these objects won't turn into a
> significant externality. Whereas with extreme highmem:lowmem ratios,
> they can.