linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: linux-mm@kvack.org
Cc: Alex Shi <alexs@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Dave Chinner <david@fromorbit.com>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Donald Carr <sirspudd@gmail.com>,
	 Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Jonathan Corbet <corbet@lwn.net>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 Konstantin Kharlamov <hi-angel@yandex.ru>,
	Marcus Seyfarth <m.seyfarth@gmail.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Miaohe Lin <linmiaohe@huawei.com>,
	 Michael Larabel <michael@michaellarabel.com>,
	Michal Hocko <mhocko@suse.com>,
	 Michel Lespinasse <michel@lespinasse.org>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,  Yang Shi <shy828301@gmail.com>,
	Ying Huang <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
	 linux-kernel@vger.kernel.org, lkp@lists.01.org,
	page-reclaim@google.com,  Yu Zhao <yuzhao@google.com>,
	Konstantin Kharlamov <Hi-Angel@yandex.ru>
Subject: [PATCH v3 14/14] mm: multigenerational lru: documentation
Date: Thu, 20 May 2021 00:53:55 -0600	[thread overview]
Message-ID: <20210520065355.2736558-15-yuzhao@google.com> (raw)
In-Reply-To: <20210520065355.2736558-1-yuzhao@google.com>

Add Documentation/vm/multigen_lru.rst.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
---
 Documentation/vm/index.rst        |   1 +
 Documentation/vm/multigen_lru.rst | 143 ++++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+)
 create mode 100644 Documentation/vm/multigen_lru.rst

diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index eff5fbd492d0..c353b3f55924 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -17,6 +17,7 @@ various features of the Linux memory management
 
    swap_numa
    zswap
+   multigen_lru
 
 Kernel developers MM documentation
 ==================================
diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
new file mode 100644
index 000000000000..a18416ed7e92
--- /dev/null
+++ b/Documentation/vm/multigen_lru.rst
@@ -0,0 +1,143 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Multigenerational LRU
+=====================
+
+Quick Start
+===========
+Build Options
+-------------
+:Required: Set ``CONFIG_LRU_GEN=y``.
+
+:Optional: Set ``CONFIG_LRU_GEN_ENABLED=y`` to turn the feature on by
+ default.
+
+:Optional: Change ``CONFIG_NR_LRU_GENS`` to a number ``X`` to support
+ a maximum of ``X`` generations.
+
+:Optional: Change ``CONFIG_TIERS_PER_GEN`` to a number ``Y`` to
+ support a maximum of ``Y`` tiers per generation.
+
+Runtime Options
+---------------
+:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
+ feature was not turned on by default.
+
+:Optional: Change ``/sys/kernel/mm/lru_gen/spread`` to a number ``N``
+ to spread pages out across ``N+1`` generations. ``N`` should be less
+ than ``X``. Larger values make the background aging more aggressive.
+
+:Optional: Read ``/sys/kernel/debug/lru_gen`` to verify the feature.
+ This file has the following output:
+
+::
+
+  memcg  memcg_id  memcg_path
+    node  node_id
+      min_gen  birth_time  anon_size  file_size
+      ...
+      max_gen  birth_time  anon_size  file_size
+
+Given a memcg and a node, ``min_gen`` is the oldest generation
+(number) and ``max_gen`` is the youngest. Birth time is in
+milliseconds. The sizes of anon and file types are in pages.
+
+Recipes
+-------
+:Android on ARMv8.1+: ``X=4``, ``Y=3`` and ``N=0``.
+
+:Android on pre-ARMv8.1 CPUs: Not recommended due to the lack of
+ ``ARM64_HW_AFDBM``.
+
+:Laptops and workstations running Chrome on x86_64: Use the default
+ values.
+
+:Working set estimation: Write ``+ memcg_id node_id gen [swappiness]``
+ to ``/sys/kernel/debug/lru_gen`` to account referenced pages to
+ generation ``max_gen`` and create the next generation ``max_gen+1``.
+ ``gen`` should be equal to ``max_gen``. A swap file and a non-zero
+ ``swappiness`` are required to scan anon type. If swapping is not
+ desired, set ``vm.swappiness`` to ``0``.
+
+:Proactive reclaim: Write ``- memcg_id node_id gen [swappiness]
+ [nr_to_reclaim]`` to ``/sys/kernel/debug/lru_gen`` to evict
+ generations less than or equal to ``gen``. ``gen`` should be less
+ than ``max_gen-1`` as ``max_gen`` and ``max_gen-1`` are active
+ generations and therefore protected from the eviction. Use
+ ``nr_to_reclaim`` to limit the number of pages to evict. Multiple
+ command lines are supported, so does concatenation with delimiters
+ ``,`` and ``;``.
+
+Framework
+=========
+For each ``lruvec``, evictable pages are divided into multiple
+generations. The youngest generation number is stored in ``max_seq``
+for both anon and file types as they are aged on an equal footing. The
+oldest generation numbers are stored in ``min_seq[2]`` separately for
+anon and file types as clean file pages can be evicted regardless of
+swap and write-back constraints. These three variables are
+monotonically increasing. Generation numbers are truncated into
+``order_base_2(CONFIG_NR_LRU_GENS+1)`` bits in order to fit into
+``page->flags``. The sliding window technique is used to prevent
+truncated generation numbers from overlapping. Each truncated
+generation number is an index to an array of per-type and per-zone
+lists. Evictable pages are added to the per-zone lists indexed by
+``max_seq`` or ``min_seq[2]`` (modulo ``CONFIG_NR_LRU_GENS``),
+depending on their types.
+
+Each generation is then divided into multiple tiers. Tiers represent
+levels of usage from file descriptors only. Pages accessed N times via
+file descriptors belong to tier order_base_2(N). Each generation
+contains at most CONFIG_TIERS_PER_GEN tiers, and they require
+additional CONFIG_TIERS_PER_GEN-2 bits in page->flags. In contrast to
+moving across generations which requires the lru lock for the list
+operations, moving across tiers only involves an atomic operation on
+``page->flags`` and therefore has a negligible cost. A feedback loop
+modeled after the PID controller monitors the refault rates across all
+tiers and decides when to activate pages from which tiers in the
+reclaim path.
+
+The framework comprises two conceptually independent components: the
+aging and the eviction, which can be invoked separately from user
+space for the purpose of working set estimation and proactive reclaim.
+
+Aging
+-----
+The aging produces young generations. Given an ``lruvec``, the aging
+scans page tables for referenced pages of this ``lruvec``. Upon
+finding one, the aging updates its generation number to ``max_seq``.
+After each round of scan, the aging increments ``max_seq``.
+
+The aging maintains either a system-wide ``mm_struct`` list or
+per-memcg ``mm_struct`` lists, and it only scans page tables of
+processes that have been scheduled since the last scan.
+
+The aging is due when both of ``min_seq[2]`` reaches ``max_seq-1``,
+assuming both anon and file types are reclaimable.
+
+Eviction
+--------
+The eviction consumes old generations. Given an ``lruvec``, the
+eviction scans the pages on the per-zone lists indexed by either of
+``min_seq[2]``. It first tries to select a type based on the values of
+``min_seq[2]``. When anon and file types are both available from the
+same generation, it selects the one that has a lower refault rate.
+
+During a scan, the eviction sorts pages according to their new
+generation numbers, if the aging has found them referenced. It also
+moves pages from the tiers that have higher refault rates than tier 0
+to the next generation.
+
+When it finds all the per-zone lists of a selected type are empty, the
+eviction increments ``min_seq[2]`` indexed by this selected type.
+
+To-do List
+==========
+KVM Optimization
+----------------
+Support shadow page table scanning.
+
+NUMA Optimization
+-----------------
+Optimize page table scan for NUMA.
-- 
2.31.1.751.gd2f1c929bd-goog



  parent reply	other threads:[~2021-05-20  6:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-20  6:53 [PATCH v3 00/14] Multigenerational LRU Framework Yu Zhao
2021-05-20  6:53 ` [PATCH v3 01/14] include/linux/memcontrol.h: do not warn in page_memcg_rcu() if !CONFIG_MEMCG Yu Zhao
2021-05-20  6:53 ` [PATCH v3 02/14] include/linux/nodemask.h: define next_memory_node() if !CONFIG_NUMA Yu Zhao
2021-05-20  6:53 ` [PATCH v3 03/14] include/linux/cgroup.h: export cgroup_mutex Yu Zhao
2021-05-20  6:53 ` [PATCH v3 04/14] mm, x86: support the access bit on non-leaf PMD entries Yu Zhao
2021-05-20  6:53 ` [PATCH v3 05/14] mm/vmscan.c: refactor shrink_node() Yu Zhao
2021-05-20  6:53 ` [PATCH v3 06/14] mm/workingset.c: refactor pack_shadow() and unpack_shadow() Yu Zhao
2021-05-20  6:53 ` [PATCH v3 07/14] mm: multigenerational lru: groundwork Yu Zhao
2021-05-20  6:53 ` [PATCH v3 08/14] mm: multigenerational lru: activation Yu Zhao
2021-05-20  6:53 ` [PATCH v3 09/14] mm: multigenerational lru: mm_struct list Yu Zhao
2021-05-20  6:53 ` [PATCH v3 10/14] mm: multigenerational lru: aging Yu Zhao
2021-05-20  6:53 ` [PATCH v3 11/14] mm: multigenerational lru: eviction Yu Zhao
2021-05-20  6:53 ` [PATCH v3 12/14] mm: multigenerational lru: user interface Yu Zhao
2021-05-20  6:53 ` [PATCH v3 13/14] mm: multigenerational lru: Kconfig Yu Zhao
2021-05-20  6:53 ` Yu Zhao [this message]
2021-07-28  1:59 ` [PATCH v3 00/14] Multigenerational LRU Framework Hillf Danton
2021-08-01 17:21   ` Yu Zhao
     [not found]   ` <20210807150459.294f8c03@mail.inbox.lv>
2021-08-07  7:51     ` Hillf Danton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210520065355.2736558-15-yuzhao@google.com \
    --to=yuzhao@google.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexs@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hi-angel@yandex.ru \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@lists.01.org \
    --cc=m.seyfarth@gmail.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=michael@michaellarabel.com \
    --cc=michel@lespinasse.org \
    --cc=page-reclaim@google.com \
    --cc=riel@surriel.com \
    --cc=shy828301@gmail.com \
    --cc=sirspudd@gmail.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).