From: Arnd Bergmann <arnd@arndb.de>
To: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Nishanth Menon <nm@ti.com>,
	Santosh Shilimkar <santosh.shilimkar@oracle.com>,
	Tero Kristo <t-kristo@ti.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Santosh Shilimkar <ssantosh@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Yafang Shao <laoar.shao@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	kernel-team@fb.com, Kishon Vijay Abraham I <kishon@ti.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
Date: Mon, 9 Mar 2020 16:04:54 +0100	[thread overview]
Message-ID: <CAK8P3a1HEhwie1uUObQMJyGcs_WSwz4Gj81tAWXZX4d2ff77XA@mail.gmail.com> (raw)
In-Reply-To: <20200309140439.GL25745@shell.armlinux.org.uk>

On Mon, Mar 9, 2020 at 3:05 PM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Mar 09, 2020 at 02:33:26PM +0100, Arnd Bergmann wrote:
> > On Sun, Mar 8, 2020 at 3:20 PM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > > On Sun, Mar 08, 2020 at 11:58:52AM +0100, Arnd Bergmann wrote:
> > > > On Fri, Mar 6, 2020 at 9:36 PM Nishanth Menon <nm@ti.com> wrote:
> > > > > On 13:11-20200226, santosh.shilimkar@oracle.com wrote:
> > >
> > > > - extend zswap to use all the available high memory for swap space
> > > >   when highmem is disabled.
> > >
> > > I don't think that's a good idea.  Running debian stable kernels on my
> > > 8GB laptop, I run into problems when leaving firefox running, long
> > > before even half of the 16GB of swap gets consumed - the entire machine
> > > slows down very quickly once it starts swapping more than about 2GB.
> > > It seems the kernel has become quite bad at selecting pages to
> > > evict.
> > >
> > > It gets to the point where any git operation has a battle to fight
> > > for RAM, despite not touching anything else other than git.
> > >
> > > The behaviour is much like firefox is locking memory into core, but
> > > that doesn't seem to be what's actually going on.  I've never really
> > > got to the bottom of it though.
> > >
> > > This is with 64-bit kernel and userspace.
> >
> > I agree there is something going wrong on your machine, but I
> > don't really see how that relates to my suggestion.
>
> You are suggesting for a 4GB machine to use 2GB of RAM for normal
> usage via an optimised virtual space layout, and 2GB of RAM for
> swap using ZSWAP, rather than having 4GB of RAM available via the
> present highmem / lowmem system.

No, that would not be good. The cases where I would hope
to get improvements out of zswap are:

- 1GB of RAM with VMSPLIT_3G, when VMSPLIT_3G_OPT
  and VMSPLIT_2G don't work because of user address space
  requirements

- 2GB of RAM with VMSPLIT_2G

- 4GB of RAM if we add VMSPLIT_4G_4G
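
For reference, the user/kernel split each of these selects (going from
memory of arch/arm/Kconfig, so the exact PAGE_OFFSET values are worth
double-checking there) is roughly:

    VMSPLIT_3G       PAGE_OFFSET 0xC0000000   3GB user / 1GB kernel
    VMSPLIT_3G_OPT   PAGE_OFFSET 0xB0000000   ~2.75GB user, fits a full
                                              1GB of RAM as lowmem
    VMSPLIT_2G       PAGE_OFFSET 0x80000000   2GB user / 2GB kernel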

> > - A lot of embedded systems are configured to have no swap at all,
> >   which can be for good or not-so-good reasons. Having some
> >   swap space available often improves things, even if it comes
> >   out of RAM.
>
> How do you come up with that assertion?  What is the evidence behind
> it?

The idea of zswap is that it's faster to compress/decompress
data than to actually access a slow disk. So if you already have
a swap space, it gives you another performance tier in between
direct-mapped pages and the slow swap.

If you don't have a physical swap space, then reserving a little
bit of RAM for compressed swap means that rarely used pages
take up less space and you end up with more RAM available
for the workload you want to run.
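
To illustrate what I mean (a rough sketch, not something I have
benchmarked, and the sizes are made-up numbers): with an existing swap
device, zswap only needs to be enabled and given a pool limit, e.g.
on the kernel command line

    zswap.enabled=1 zswap.compressor=lzo zswap.max_pool_percent=20

or at runtime via /sys/module/zswap/parameters/. For the no-disk case,
a zram block device already gives you compressed, RAM-backed swap
without any physical swap space behind it:

    modprobe zram
    echo 256M > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon /dev/zram0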

> This is kind of the crux of my point in the previous email: Linux
> with swap performs way worse for me - if I had 16GB of RAM in my
> laptop, I bet it would perform better than my current 8GB with a
> 16GB swap file - where, when the swap file gets around 8GB full,
> the system as a whole starts to struggle.
>
> That's about a 50/50 split of VM space between RAM and swap.

As I said above, I agree that very few workloads would behave
better using 1.75GB of RAM plus 2.25GB of zswap (storing
maybe 6GB of data) than with highmem. To deal with 4GB
systems, we probably need either highmem or VMSPLIT_4G_4G.
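
(Rough arithmetic behind that: with VMSPLIT_2G and highmem disabled,
only something like 1.75GB of a 4GB machine ends up as usable lowmem
once vmalloc and I/O mappings come out of the 2GB kernel half. If the
remaining 2.25GB were handed to compressed swap at a typical 2.5x-3x
compression ratio, it could hold very roughly 6GB of cold pages, but
every access to those pages still pays for a compress/decompress
cycle, which is why highmem or VMSPLIT_4G_4G looks like the better
answer for 4GB systems.)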

> > - A particularly important case to optimize for is 2GB of RAM with
> >   LPAE enabled. With CONFIG_VMSPLIT_2G and highmem, this
> >   leads to paradoxical -ENOMEM failures when 256MB of highmem are
> >   full while plenty of lowmem is available. With highmem disabled,
> >   you avoid that at the cost of losing 12% of RAM.
>
> What happened to requests for memory from highmem being able to be
> sourced from lowmem if highmem wasn't available?  That used to be
> standard kernel behaviour.

AFAICT this is how it's supposed to work, but for some reason it
doesn't always. I don't know the details, but have heard of recent
complaints about it. I don't think it's the actual get_free_pages
failing, but rather some heuristic looking at the number of free pages.

> > - With 4GB+ of RAM and CONFIG_VMSPLIT_2G or
> >   CONFIG_VMSPLIT_3G, using gigabytes of RAM for swap
> >   space would usually be worse than highmem, but once
> >   we have VMSPLIT_4G_4G, it's the same situation as above
> >   with 6% of RAM used for zswap instead of highmem.
>
> I think the chances of that happening are very slim - I doubt there
> is the will to invest the energy amongst what is left of the 32-bit
> ARM community.

Right. But I think it makes sense to discuss what it would take
to do it anyway, and to see who would be interested in funding or
implementing VMSPLIT_4G_4G. Whether it happens or not comes
down to another tradeoff: Without it, we have to keep highmem
around for a much longer time to support systems with 4GB of
RAM along with systems that need both 2GB of physical RAM
and 3GB of user address space, while adding VMSPLIT_4G_4G
soon means we can probably kill off highmem after everybody
with 8GB of RAM or more has stopped upgrading kernels.
Unlike the 2GB case, this is something we can realistically
plan for.

What is going to be much harder, I fear, is finding someone to
implement it on MIPS32, which seems to be a decade ahead
of 32-bit ARM in its decline and also has only a small number of
users with 4GB or more; architecturally it also seems harder to
implement, or outright impossible, depending on the type of MIPS
MMU.

        Arnd

Thread overview: 58+ messages
2020-02-11 17:55 [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU Johannes Weiner
2020-02-11 18:20 ` Johannes Weiner
2020-02-11 19:05 ` Rik van Riel
2020-02-11 19:31   ` Johannes Weiner
2020-02-11 23:44     ` Andrew Morton
2020-02-12  0:28       ` Linus Torvalds
2020-02-12  0:47         ` Andrew Morton
2020-02-12  1:03           ` Linus Torvalds
2020-02-12  8:50             ` Russell King - ARM Linux admin
2020-02-13  9:50               ` Lucas Stach
2020-02-13 16:52               ` Arnd Bergmann
2020-02-15 11:25                 ` Geert Uytterhoeven
2020-02-15 16:59                   ` Arnd Bergmann
2020-02-16  9:44                     ` Geert Uytterhoeven
2020-02-16 19:54                       ` Chris Paterson
2020-02-16 20:38                         ` Arnd Bergmann
2020-02-20 14:35                           ` Chris Paterson
2020-02-26 18:04                 ` santosh.shilimkar
2020-02-26 21:01                   ` Arnd Bergmann
2020-02-26 21:11                     ` santosh.shilimkar
2020-03-06 20:34                       ` Nishanth Menon
2020-03-07  1:08                         ` santosh.shilimkar
2020-03-08 10:58                         ` Arnd Bergmann
2020-03-08 14:19                           ` Russell King - ARM Linux admin
2020-03-09 13:33                             ` Arnd Bergmann
2020-03-09 14:04                               ` Russell King - ARM Linux admin
2020-03-09 15:04                                 ` Arnd Bergmann [this message]
2020-03-10  9:16                                   ` Michal Hocko
2020-03-09 15:59                           ` Catalin Marinas
2020-03-09 16:09                             ` Russell King - ARM Linux admin
2020-03-09 16:57                               ` Catalin Marinas
2020-03-09 19:46                               ` Arnd Bergmann
2020-03-11 14:29                                 ` Catalin Marinas
2020-03-11 16:59                                   ` Arnd Bergmann
2020-03-11 17:26                                     ` Catalin Marinas
2020-03-11 22:21                                       ` Arnd Bergmann
2020-02-12  3:58         ` Matthew Wilcox
2020-02-12  8:09         ` Michal Hocko
2020-02-17 13:31         ` Pavel Machek
2020-02-12 16:35       ` Johannes Weiner
2020-02-12 18:26         ` Andrew Morton
2020-02-12 18:52           ` Johannes Weiner
2020-02-12 12:25 ` Yafang Shao
2020-02-12 16:42   ` Johannes Weiner
2020-02-13  1:47     ` Yafang Shao
2020-02-13 13:46       ` Johannes Weiner
2020-02-14  2:02         ` Yafang Shao
2020-02-13 18:34 ` [PATCH v2] " Johannes Weiner
2020-02-14 16:53 ` [PATCH] " kbuild test robot
2020-02-14 21:30 ` kbuild test robot
2020-02-14 21:30 ` [PATCH] vfs: fix boolreturn.cocci warnings kbuild test robot
2020-05-12 21:29 ` [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU Johannes Weiner
2020-05-13  1:32   ` Yafang Shao
2020-05-13 13:00     ` Johannes Weiner
2020-05-13 21:15   ` Andrew Morton
2020-05-14 11:27     ` Johannes Weiner
2020-05-14  2:24   ` Andrew Morton
2020-05-14 10:37     ` Johannes Weiner
