From: David Rientjes <rientjes@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: SeongJae Park <sj@kernel.org>,
	"T.J. Alumbaugh" <talumbau@google.com>,
	 lsf-pc@lists.linux-foundation.org,
	 "Sudarshan Rajagopalan (QUIC)" <quic_sudaraja@quicinc.com>,
	hch@lst.de,  kai.huang@intel.com, jon@nutanix.com,
	Yuanchu Xie <yuanchu@google.com>,  linux-mm <linux-mm@kvack.org>,
	damon@lists.linux.dev
Subject: Re: [LSF/MM/BPF TOPIC] VM Memory Overcommit
Date: Wed, 1 Mar 2023 19:26:53 -0800 (PST)
Message-ID: <c57f3f06-5079-6e28-5238-c5731ee02a6e@google.com>
In-Reply-To: <5751ca20-9848-af42-bd1d-c7671b5796db@redhat.com>

On Tue, 28 Feb 2023, David Hildenbrand wrote:

> On 28.02.23 23:38, SeongJae Park wrote:
> > On Tue, 28 Feb 2023 10:20:57 +0100 David Hildenbrand <david@redhat.com>
> > wrote:
> > 
> > > On 23.02.23 00:59, T.J. Alumbaugh wrote:
> > > > Hi,
> > > > 
> > > > This topic proposal would be to present and discuss multiple MM
> > > > features to improve host memory overcommit while running VMs. There
> > > > are two general cases:
> > > > 
> > > > 1. The host and its guests operate independently,
> > > > 
> > > > 2. The host and its guests cooperate through techniques like ballooning.
> > > > 
> > > > In the first case, we would discuss some new techniques, e.g., fast
> > > > access bit harvesting in the KVM MMU, and some difficulties, e.g.,
> > > > double zswapping.
> > > > 
> > > > In the second case, we would like to discuss a novel working set size
> > > > (WSS) notifier framework and some improvements to the ballooning
> > > > policy. The WSS notifier, when available, can report WSS to its
> > > > listeners. VM Memory Overcommit is one of its use cases: the
> > > > virtio-balloon driver can register for WSS notifications and relay WSS
> > > > to the host. The host can leverage the WSS notifications and improve
> > > > the ballooning policy.
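> > > > 
> > > > To make the driver side concrete, a rough sketch of the shape we have
> > > > in mind follows. Everything below is illustrative only: the notifier
> > > > API, the report structure, and the relay helper are hypothetical,
> > > > since the framework itself is what we are proposing.
> > > > 
> > > >   #include <linux/notifier.h>
> > > > 
> > > >   /* Hypothetical payload delivered by the proposed WSS framework. */
> > > >   struct wss_report {
> > > >           unsigned long anon_bytes;  /* working set in anon memory */
> > > >           unsigned long file_bytes;  /* working set in page cache */
> > > >   };
> > > > 
> > > >   static int balloon_wss_notify(struct notifier_block *nb,
> > > >                                 unsigned long action, void *data)
> > > >   {
> > > >           struct wss_report *report = data;
> > > > 
> > > >           /* Relay the report to the host, e.g. over a virtqueue. */
> > > >           virtio_balloon_report_wss(report);  /* hypothetical */
> > > >           return NOTIFY_OK;
> > > >   }
> > > > 
> > > >   static struct notifier_block balloon_wss_nb = {
> > > >           .notifier_call = balloon_wss_notify,
> > > >   };
> > > > 
> > > >   /* At probe time (hypothetical registration call):
> > > >    *   wss_notifier_register(&balloon_wss_nb);
> > > >    */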
> > > > 
> > > > This topic should be of interest to a wide audience, as it is
> > > > relevant to e.g. phones, laptops, and servers.
> > > > Co-presented with Yuanchu Xie.
> > > 
> > > In general, having the WSS available to the hypervisor might be
> > > beneficial. I recall that there was an idea to leverage MGLRU and to
> > > communicate MGLRU statistics to the hypervisor, such that the hypervisor
> > > can make decisions using these statistics.
> > > 
> > > But note that I don't think the future will be traditional memory
> > > balloon inflation/deflation. I think it might be useful in related
> > > contexts, though.
> > > 
> > > What we actually might want is a way to tell the OS running inside the
> > > VM "please try not to use more than XXX MiB of physical memory", but
> > > treat it as a soft limit. So in case we mess up, or there is a sudden
> > > peak in memory consumption due to a workload, we won't harm the guest
> > > OS/workload and don't have to act immediately to avoid trouble. One can
> > > think of it as an evolution of memory ballooning: instead of creating
> > > artificial memory pressure by inflating the balloon, which is fairly
> > > event-driven and requires explicit deflation, we teach the OS to do it
> > > natively and pair it with free page reporting.
> > > 
> > > All free physical memory inside the VM can be reported to the hypervisor
> > > using free page reporting, and the OS will try sticking to the requested
> > > "logical" VM size unless there is real demand for more memory.
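> > > 
> > > For reference, the in-kernel half of this already exists: the page
> > > reporting interface that virtio-balloon registers with. Roughly, and
> > > heavily simplified (the transport callback below is a placeholder;
> > > error handling and the virtqueue plumbing are elided):
> > > 
> > >   #include <linux/page_reporting.h>
> > >   #include <linux/scatterlist.h>
> > > 
> > >   static int report_free_pages(struct page_reporting_dev_info *prdev,
> > >                                struct scatterlist *sgl,
> > >                                unsigned int nents)
> > >   {
> > >           /*
> > >            * Hand the batch of free pages to the hypervisor;
> > >            * virtio-balloon does this over a dedicated virtqueue.
> > >            * Once we return, the pages go back to the free lists
> > >            * marked as reported.
> > >            */
> > >           return hand_to_hypervisor(sgl, nents);  /* placeholder */
> > >   }
> > > 
> > >   static struct page_reporting_dev_info prdev = {
> > >           .report = report_free_pages,
> > >   };
> > > 
> > >   /* At device initialization: page_reporting_register(&prdev); */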
> > 
> > I think use of DAMON_RECLAIM[1] inside the VM, together with free page
> > reporting, could be an option.  Some users have tried that manually and
> > reported positive results.  I'm trying to find a good way to give the
> > hypervisor some control over the in-VM DAMON_RECLAIM utilization.
> > 
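> > For reference, the manual usage mentioned above boils down to the module
> > parameters under /sys/module/damon_reclaim/parameters/.  A minimal
> > user-space sketch of enabling it inside the guest follows; the parameter
> > names are as in recent kernels and the values are only examples, so
> > please check the DAMON_RECLAIM documentation for your kernel:
> > 
> >   #include <stdio.h>
> > 
> >   static int write_param(const char *name, const char *val)
> >   {
> >           char path[256];
> >           FILE *f;
> > 
> >           snprintf(path, sizeof(path),
> >                    "/sys/module/damon_reclaim/parameters/%s", name);
> >           f = fopen(path, "w");
> >           if (!f)
> >                   return -1;
> >           fputs(val, f);
> >           return fclose(f);
> >   }
> > 
> >   int main(void)
> >   {
> >           /* Reclaim regions that stayed cold for at least 30 seconds. */
> >           write_param("min_age", "30000000");
> >           /* Stay inactive while more than 50% of memory is free... */
> >           write_param("wmarks_high", "500");
> >           /* ...and back off below 20% free, leaving it to kswapd. */
> >           write_param("wmarks_low", "200");
> >           write_param("enabled", "Y");
> >           return 0;
> >   }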
> 
> I think we might want to go one step further and not only reclaim
> (pro)actively, but also limit the growth of caches, such as the
> pagecache, so that they are aware of a soft limit as well. That said, I
> still have to learn more about DAMON reclaim :)
> 

I'm curious: is this limit possible to impose with memcg today, or are you 
specifically looking to provide a cap on page cache, dentries, inodes, 
etc., without requiring memcg?
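
To make the question concrete: with cgroup v2, memory.high already gives a 
soft cap under which page cache, dentries, inodes, etc. are reclaimed 
along with everything else the group has charged, rather than capping the 
caches specifically. A minimal sketch, assuming a pre-created cgroup named 
"vm-workload" (the name and the 4 GiB value are only examples):

  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/sys/fs/cgroup/vm-workload/memory.high", "w");

          if (!f)
                  return 1;
          /* Throttle and reclaim the group once it exceeds 4 GiB. */
          fputs("4294967296", f);
          return fclose(f);
  }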
