Re: [PATCH v10 06/14] mm: multi-gen LRU: minimal implementation

From: Barry Song <21cnbao@gmail.com>
To: Yu Zhao <yuzhao@google.com>
Cc: "Stephen Rothwell" <sfr@rothwell.id.au>,
	Linux-MM <linux-mm@kvack.org>, "Andi Kleen" <ak@linux.intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Jesse Barnes" <jsbarnes@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	"Ying Huang" <ying.huang@intel.com>,
	LAK <linux-arm-kernel@lists.infradead.org>,
	"Linux Doc Mailing List" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Kernel Page Reclaim v2" <page-reclaim@google.com>,
	x86 <x86@kernel.org>, "Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>
Subject: Re: [PATCH v10 06/14] mm: multi-gen LRU: minimal implementation
Date: Mon, 18 Apr 2022 21:58:26 +1200
Message-ID: <CAGsJ_4x2wmR60GQO-jjd5UAvOMWMSi+kFpUa2DBm4e8KocH7jQ@mail.gmail.com>
In-Reply-To: <20220407031525.2368067-7-yuzhao@google.com>

On Thu, Apr 7, 2022 at 3:16 PM Yu Zhao <yuzhao@google.com> wrote:
>
> To avoid confusion, the terms "promotion" and "demotion" will be
> applied to the multi-gen LRU, as a new convention; the terms
> "activation" and "deactivation" will be applied to the active/inactive
> LRU, as usual.
>
> The aging produces young generations. Given an lruvec, it increments
> max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging
> promotes hot pages to the youngest generation when it finds them
> accessed through page tables; the demotion of cold pages happens
> consequently when it increments max_seq. The aging has the complexity
> O(nr_hot_pages), since it is only interested in hot pages. Promotion
> in the aging path does not require any LRU list operations, only the
> updates of the gen counter and lrugen->nr_pages[]; demotion, unless as
> the result of the increment of max_seq, requires LRU list operations,
> e.g., lru_deactivate_fn().
>
> The eviction consumes old generations. Given an lruvec, it increments
> min_seq when the lists indexed by min_seq%MAX_NR_GENS become empty. A
> feedback loop modeled after the PID controller monitors refaults over
> anon and file types and decides which type to evict when both types
> are available from the same generation.
>
> Each generation is divided into multiple tiers. Tiers represent
> different ranges of numbers of accesses through file descriptors. A
> page accessed N times through file descriptors is in tier
> order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
> bits in folio->flags. In contrast to moving across generations, which
> requires the LRU lock, moving across tiers only involves operations on
> folio->flags. The feedback loop also monitors refaults over all tiers
> and decides when to protect pages in which tiers (N>1), using the
> first tier (N=0,1) as a baseline. The first tier contains single-use
> unmapped clean pages, which are most likely the best choices. The
> eviction moves a page to the next generation, i.e., min_seq+1, if the
> feedback loop decides so. This approach has the following advantages:
> 1. It removes the cost of activation in the buffered access path by
>    inferring whether pages accessed multiple times through file
>    descriptors are statistically hot and thus worth protecting in the
>    eviction path.
> 2. It takes pages accessed through page tables into account and avoids
>    overprotecting pages accessed multiple times through file
>    descriptors. (Pages accessed through page tables are in the first
>    tier, since N=0.)
> 3. More tiers provide better protection for pages accessed more than
>    twice through file descriptors, when under heavy buffered I/O
>    workloads.
>

Hi Yu,
As I mentioned before, I tried changing the current LRU (not MGLRU) so that an
unmapped file page is promoted only to the head of the inactive list, rather
than to the head of the active list, on its second access:
https://lore.kernel.org/lkml/CAGsJ_4y=TkCGoWWtWSAptW4RDFUEBeYXwfwu=fUFvV4Sa4VA4A@mail.gmail.com/
In testing, I have already seen very good results: the CPU consumption of
kswapd and direct reclaim decreased.

In MGLRU, it seems "twice" isn't a concern at all: an unmapped file page
accessed twice is not much different from one accessed once, since you only
begin to increase refs from the third access:

+static void folio_inc_refs(struct folio *folio)
+{
+       unsigned long refs;
+       unsigned long old_flags, new_flags;
+
+       if (folio_test_unevictable(folio))
+               return;
+
+       /* see the comment on MAX_NR_TIERS */
+       do {
+               new_flags = old_flags = READ_ONCE(folio->flags);
+
+               if (!(new_flags & BIT(PG_referenced))) {
+                       new_flags |= BIT(PG_referenced);
+                       continue;
+               }
+
+               if (!(new_flags & BIT(PG_workingset))) {
+                       new_flags |= BIT(PG_workingset);
+                       continue;
+               }
+
+               refs = new_flags & LRU_REFS_MASK;
+               refs = min(refs + BIT(LRU_REFS_PGOFF), LRU_REFS_MASK);
+
+               new_flags &= ~LRU_REFS_MASK;
+               new_flags |= refs;
+       } while (new_flags != old_flags &&
+                cmpxchg(&folio->flags, old_flags, new_flags) != old_flags);
+}

So my question is: what makes you confident that a second access doesn't need
any special treatment, while the vanilla kernel moves this kind of page to the
head of the active list instead? I am asking because I am considering
reclaiming unmapped file pages that have been accessed only twice when they
reach the tail of the inactive list.

Thanks
Barry