linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Yu Zhao <yuzhao@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Barry Song" <21cnbao@gmail.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Jesse Barnes" <jsbarnes@google.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	"Ying Huang" <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>
Subject: Re: [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation
Date: Tue, 8 Feb 2022 11:50:09 -0500	[thread overview]
Message-ID: <YgKfQVKcBebL7pY3@cmpxchg.org> (raw)
In-Reply-To: <20220208081902.3550911-6-yuzhao@google.com>

Hi Yu,

Thanks for restructuring this from the last version. It's easier to
learn the new model when you start out with the bare bones, then let
optimizations and self-contained features follow later.

On Tue, Feb 08, 2022 at 01:18:55AM -0700, Yu Zhao wrote:
> To avoid confusions, the terms "promotion" and "demotion" will be
> applied to the multigenerational LRU, as a new convention; the terms
> "activation" and "deactivation" will be applied to the active/inactive
> LRU, as usual.
> 
> The aging produces young generations. Given an lruvec, it increments
> max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging
> promotes hot pages to the youngest generation when it finds them
> accessed through page tables; the demotion of cold pages happens
> consequently when it increments max_seq. Since the aging is only
> interested in hot pages, its complexity is O(nr_hot_pages). Promotion
> in the aging path doesn't require any LRU list operations, only the
> updates of the gen counter and lrugen->nr_pages[]; demotion, unless
> as the result of the increment of max_seq, requires LRU list
> operations, e.g., lru_deactivate_fn().

I'm having trouble with this changelog. It opens with a footnote and
summarizes certain aspects of the implementation whose importance to
the reader aren't entirely clear at this time.

It would be better to start with a high-level overview of the problem
and how this algorithm solves it. How the reclaim algorithm needs to
find the page that is most suitable for eviction and to signal when
it's time to give up and OOM. Then explain how grouping pages into
multiple generations accomplishes that - in particular compared to the
current two use-once/use-many lists.

Explain the problem of MMU vs syscall references, and how tiering
addresses this.

Explain the significance of refaults and how the algorithm responds to
them. Not in terms of which running averages are updated, but in terms
of user-visible behavior ("will start swapping (more)" etc.)

Express *intent*, how it's supposed to behave wrt workloads and memory
pressure. The code itself will explain the how, its complexity etc.

Most reviewers will understand the fundamental challenges of page
reclaim. The difficulty is matching individual aspects of the problem
space to your individual components and design choices you have made.

Let us in on that thinking, please ;)

> @@ -892,6 +892,50 @@ config ANON_VMA_NAME
>  	  area from being merged with adjacent virtual memory areas due to the
>  	  difference in their name.
>  
> +# multigenerational LRU {
> +config LRU_GEN
> +	bool "Multigenerational LRU"
> +	depends on MMU
> +	# the following options can use up the spare bits in page flags
> +	depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP)
> +	help
> +	  A high performance LRU implementation for memory overcommit. See
> +	  Documentation/admin-guide/mm/multigen_lru.rst and
> +	  Documentation/vm/multigen_lru.rst for details.

These files don't exist at this time, please introduce them before or
when referencing them. If they document things introduced later in the
patchset, please start with a minimal version of the file and update
it as you extend the algorithm and add optimizations etc.

It's really important to only reference previous patches, not later
ones. This allows reviewers to read the patches linearly.  Having to
search for missing pieces in patches you haven't looked at yet is bad.

> +config NR_LRU_GENS
> +	int "Max number of generations"
> +	depends on LRU_GEN
> +	range 4 31
> +	default 4
> +	help
> +	  Do not increase this value unless you plan to use working set
> +	  estimation and proactive reclaim to optimize job scheduling in data
> +	  centers.
> +
> +	  This option uses order_base_2(N+1) bits in page flags.
> +
> +config TIERS_PER_GEN
> +	int "Number of tiers per generation"
> +	depends on LRU_GEN
> +	range 2 4
> +	default 4
> +	help
> +	  Do not decrease this value unless you run out of spare bits in page
> +	  flags, i.e., you see the "Not enough bits in page flags" build error.
> +
> +	  This option uses N-2 bits in page flags.

Linus had pointed out that we shouldn't ask these questions of the
user. How do you pick numbers here? I'm familiar with workingset
estimation and proactive reclaim usecases but I wouldn't know.

Even if we removed the config option and hardcoded the number, this is
a question for kernel developers: What does "4" mean? How would
behavior differ if it were 3 or 5 instead? Presumably there is some
sort of behavior gradient. "As you increase the number of
generations/tiers, the user-visible behavior of the kernel will..."
This should really be documented.

I'd also reiterate Mel's point: Distribution kernels need to support
the full spectrum of applications and production environments. Unless
using non-defaults it's an extremely niche usecase (like compiling out
BUG() calls) compile-time options are not the right choice. If we do
need a tunable, it could make more sense to have a compile time upper
limit (to determine page flag space) combined with a runtime knob?

Thanks!


  parent reply	other threads:[~2022-02-08 16:50 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08  8:18 [PATCH v7 00/12] Multigenerational LRU Framework Yu Zhao
2022-02-08  8:18 ` [PATCH v7 01/12] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2022-02-08  8:24   ` Yu Zhao
2022-02-08 10:33   ` Will Deacon
2022-02-08  8:18 ` [PATCH v7 02/12] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2022-02-08  8:27   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 03/12] mm/vmscan.c: refactor shrink_node() Yu Zhao
2022-02-08  8:18 ` [PATCH v7 04/12] mm: multigenerational LRU: groundwork Yu Zhao
2022-02-08  8:28   ` Yu Zhao
2022-02-10 20:41   ` Johannes Weiner
2022-02-15  9:43     ` Yu Zhao
2022-02-15 21:53       ` Johannes Weiner
2022-02-21  8:14         ` Yu Zhao
2022-02-23 21:18           ` Yu Zhao
2022-02-25 16:34             ` Minchan Kim
2022-03-03 15:29           ` Johannes Weiner
2022-03-03 19:26             ` Yu Zhao
2022-03-03 21:43               ` Johannes Weiner
2022-03-11 10:16       ` Barry Song
2022-03-11 23:45         ` Yu Zhao
2022-03-12 10:37           ` Barry Song
2022-03-12 21:11             ` Yu Zhao
2022-03-13  4:57               ` Barry Song
2022-03-14 11:11                 ` Barry Song
2022-03-14 16:45                   ` Yu Zhao
2022-03-14 23:38                     ` Barry Song
     [not found]                       ` <CAOUHufa9eY44QadfGTzsxa2=hEvqwahXd7Canck5Gt-N6c4UKA@mail.gmail.com>
     [not found]                         ` <CAGsJ_4zvj5rmz7DkW-kJx+jmUT9G8muLJ9De--NZma9ey0Oavw@mail.gmail.com>
2022-03-15 10:29                           ` Barry Song
2022-03-16  2:46                             ` Yu Zhao
2022-03-16  4:37                               ` Barry Song
2022-03-16  5:44                                 ` Yu Zhao
2022-03-16  6:06                                   ` Barry Song
2022-03-16 21:37                                     ` Yu Zhao
2022-02-10 21:37   ` Matthew Wilcox
2022-02-13 21:16     ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation Yu Zhao
2022-02-08  8:33   ` Yu Zhao
2022-02-08 16:50   ` Johannes Weiner [this message]
2022-02-10  2:53     ` Yu Zhao
2022-02-13 10:04   ` Hillf Danton
2022-02-17  0:13     ` Yu Zhao
2022-02-23  8:27   ` Huang, Ying
2022-02-23  9:36     ` Yu Zhao
2022-02-24  0:59       ` Huang, Ying
2022-02-24  1:34         ` Yu Zhao
2022-02-24  3:31           ` Huang, Ying
2022-02-24  4:09             ` Yu Zhao
2022-02-24  5:27               ` Huang, Ying
2022-02-24  5:35                 ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 06/12] mm: multigenerational LRU: exploit locality in rmap Yu Zhao
2022-02-08  8:40   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 07/12] mm: multigenerational LRU: support page table walks Yu Zhao
2022-02-08  8:39   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 08/12] mm: multigenerational LRU: optimize multiple memcgs Yu Zhao
2022-02-08  8:18 ` [PATCH v7 09/12] mm: multigenerational LRU: runtime switch Yu Zhao
2022-02-08  8:42   ` Yu Zhao
2022-02-08  8:19 ` [PATCH v7 10/12] mm: multigenerational LRU: thrashing prevention Yu Zhao
2022-02-08  8:43   ` Yu Zhao
2022-02-08  8:19 ` [PATCH v7 11/12] mm: multigenerational LRU: debugfs interface Yu Zhao
2022-02-18 18:56   ` [page-reclaim] " David Rientjes
2022-02-08  8:19 ` [PATCH v7 12/12] mm: multigenerational LRU: documentation Yu Zhao
2022-02-08  8:44   ` Yu Zhao
2022-02-14 10:28   ` Mike Rapoport
2022-02-16  3:22     ` Yu Zhao
2022-02-21  9:01       ` Mike Rapoport
2022-02-22  1:47         ` Yu Zhao
2022-02-23 10:58           ` Mike Rapoport
2022-02-23 21:20             ` Yu Zhao
2022-02-08 10:11 ` [PATCH v7 00/12] Multigenerational LRU Framework Oleksandr Natalenko
2022-02-08 11:14   ` Michal Hocko
2022-02-08 11:23     ` Oleksandr Natalenko
2022-02-11 20:12 ` Alexey Avramov
2022-02-12 21:01   ` Yu Zhao
2022-03-03  6:06 ` Vaibhav Jain
2022-03-03  6:47   ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgKfQVKcBebL7pY3@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=21cnbao@gmail.com \
    --cc=Hi-Angel@yandex.ru \
    --cc=Michael@michaellarabel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=bgeffon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=d@chaos-reins.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=djbyrne@mtu.edu \
    --cc=hdanton@sina.com \
    --cc=heftig@archlinux.org \
    --cc=holger@applied-asynchrony.com \
    --cc=jsbarnes@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=oleksandr@natalenko.name \
    --cc=page-reclaim@google.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=sofia.trinh@edi.works \
    --cc=steven@liquorix.net \
    --cc=suleiman@google.com \
    --cc=szhai2@cs.rochester.edu \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).