All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@kernel.org>,
	Yu Zhao <yuzhao@google.com>
Cc: Andi Kleen <ak@linux.intel.com>,
	Aneesh Kumar <aneesh.kumar@linux.ibm.com>,
	Barry Song <21cnbao@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
	Jesse Barnes <jsbarnes@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michael Larabel <Michael@michaellarabel.com>,
	Mike Rapoport <rppt@kernel.org>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>,
	Ying Huang <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	Yu Zhao <yuzhao@google.com>
Subject: Re: [PATCH v7 00/12] Multigenerational LRU Framework
Date: Tue, 08 Feb 2022 11:11:02 +0100	[thread overview]
Message-ID: <4714886.31r3eYUQgx@natalenko.name> (raw)
In-Reply-To: <20220208081902.3550911-1-yuzhao@google.com>

Hello.

On úterý 8. února 2022 9:18:50 CET Yu Zhao wrote:
> What's new
> ==========
> 1) Addressed all the comments received on the mailing list and in the
>    meeting with the stakeholders (will note on individual patches).
> 2) Measured the performance improvements for each patch between 5-8
>    (reported in the commit messages).
> 
> TLDR
> ====
> The current page reclaim is too expensive in terms of CPU usage and it
> often makes poor choices about what to evict. This patchset offers an
> alternative solution that is performant, versatile and straightforward.
> 
> Patchset overview
> =================
> The design and implementation overview was moved to patch 12 so that
> people can finish reading this cover letter.
> 
> 1. mm: x86, arm64: add arch_has_hw_pte_young()
> 2. mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
> Using hardware optimizations when trying to clear the accessed bit in
> many PTEs.
> 
> 3. mm/vmscan.c: refactor shrink_node()
> A minor refactor.
> 
> 4. mm: multigenerational LRU: groundwork
> Adding the basic data structure and the functions that insert/remove
> pages to/from the multigenerational LRU (MGLRU) lists.
> 
> 5. mm: multigenerational LRU: minimal implementation
> A minimal (functional) implementation without any optimizations.
> 
> 6. mm: multigenerational LRU: exploit locality in rmap
> Improving the efficiency when using the rmap.
> 
> 7. mm: multigenerational LRU: support page table walks
> Adding the (optional) page table scanning.
> 
> 8. mm: multigenerational LRU: optimize multiple memcgs
> Optimizing the overall performance for multiple memcgs running mixed
> types of workloads.
> 
> 9. mm: multigenerational LRU: runtime switch
> Adding a runtime switch to enable or disable MGLRU.
> 
> 10. mm: multigenerational LRU: thrashing prevention
> 11. mm: multigenerational LRU: debugfs interface
> Providing userspace with additional features like thrashing prevention,
> working set estimation and proactive reclaim.
> 
> 12. mm: multigenerational LRU: documentation
> Adding a design doc and an admin guide.
> 
> Benchmark results
> =================
> Independent lab results
> -----------------------
> Based on the popularity of searches [01] and the memory usage in
> Google's public cloud, the most popular open-source memory-hungry
> applications, in alphabetical order, are:
>       Apache Cassandra      Memcached
>       Apache Hadoop         MongoDB
>       Apache Spark          PostgreSQL
>       MariaDB (MySQL)       Redis
> 
> An independent lab evaluated MGLRU with the most widely used benchmark
> suites for the above applications. They posted 960 data points along
> with kernel metrics and perf profiles collected over more than 500
> hours of total benchmark time. Their final reports show that, with 95%
> confidence intervals (CIs), the above applications all performed
> significantly better for at least part of their benchmark matrices.
> 
> On 5.14:
> 1. Apache Spark [02] took 95% CIs [9.28, 11.19]% and [12.20, 14.93]%
>    less wall time to sort three billion random integers, respectively,
>    under the medium- and the high-concurrency conditions, when
>    overcommitting memory. There were no statistically significant
>    changes in wall time for the rest of the benchmark matrix.
> 2. MariaDB [03] achieved 95% CIs [5.24, 10.71]% and [20.22, 25.97]%
>    more transactions per minute (TPM), respectively, under the medium-
>    and the high-concurrency conditions, when overcommitting memory.
>    There were no statistically significant changes in TPM for the rest
>    of the benchmark matrix.
> 3. Memcached [04] achieved 95% CIs [23.54, 32.25]%, [20.76, 41.61]%
>    and [21.59, 30.02]% more operations per second (OPS), respectively,
>    for sequential access, random access and Gaussian (distribution)
>    access, when THP=always; 95% CIs [13.85, 15.97]% and
>    [23.94, 29.92]% more OPS, respectively, for random access and
>    Gaussian access, when THP=never. There were no statistically
>    significant changes in OPS for the rest of the benchmark matrix.
> 4. MongoDB [05] achieved 95% CIs [2.23, 3.44]%, [6.97, 9.73]% and
>    [2.16, 3.55]% more operations per second (OPS), respectively, for
>    exponential (distribution) access, random access and Zipfian
>    (distribution) access, when underutilizing memory; 95% CIs
>    [8.83, 10.03]%, [21.12, 23.14]% and [5.53, 6.46]% more OPS,
>    respectively, for exponential access, random access and Zipfian
>    access, when overcommitting memory.
> 
> On 5.15:
> 5. Apache Cassandra [06] achieved 95% CIs [1.06, 4.10]%, [1.94, 5.43]%
>    and [4.11, 7.50]% more operations per second (OPS), respectively,
>    for exponential (distribution) access, random access and Zipfian
>    (distribution) access, when swap was off; 95% CIs [0.50, 2.60]%,
>    [6.51, 8.77]% and [3.29, 6.75]% more OPS, respectively, for
>    exponential access, random access and Zipfian access, when swap was
>    on.
> 6. Apache Hadoop [07] took 95% CIs [5.31, 9.69]% and [2.02, 7.86]%
>    less average wall time to finish twelve parallel TeraSort jobs,
>    respectively, under the medium- and the high-concurrency
>    conditions, when swap was on. There were no statistically
>    significant changes in average wall time for the rest of the
>    benchmark matrix.
> 7. PostgreSQL [08] achieved 95% CI [1.75, 6.42]% more transactions per
>    minute (TPM) under the high-concurrency condition, when swap was
>    off; 95% CIs [12.82, 18.69]% and [22.70, 46.86]% more TPM,
>    respectively, under the medium- and the high-concurrency
>    conditions, when swap was on. There were no statistically
>    significant changes in TPM for the rest of the benchmark matrix.
> 8. Redis [09] achieved 95% CIs [0.58, 5.94]%, [6.55, 14.58]% and
>    [11.47, 19.36]% more total operations per second (OPS),
>    respectively, for sequential access, random access and Gaussian
>    (distribution) access, when THP=always; 95% CIs [1.27, 3.54]%,
>    [10.11, 14.81]% and [8.75, 13.64]% more total OPS, respectively,
>    for sequential access, random access and Gaussian access, when
>    THP=never.
> 
> Our lab results
> ---------------
> To supplement the above results, we ran the following benchmark suites
> on 5.16-rc7 and found no regressions [10]. (These synthetic benchmarks
> are popular among MM developers, but we prefer large-scale A/B
> experiments to validate improvements.)
>       fs_fio_bench_hdd_mq      pft
>       fs_lmbench               pgsql-hammerdb
>       fs_parallelio            redis
>       fs_postmark              stream
>       hackbench                sysbenchthread
>       kernbench                tpcc_spark
>       memcached                unixbench
>       multichase               vm-scalability
>       mutilate                 will-it-scale
>       nginx
> 
> [01] https://trends.google.com
> [02] https://lore.kernel.org/lkml/20211102002002.92051-1-bot@edi.works/
> [03] https://lore.kernel.org/lkml/20211009054315.47073-1-bot@edi.works/
> [04] https://lore.kernel.org/lkml/20211021194103.65648-1-bot@edi.works/
> [05] https://lore.kernel.org/lkml/20211109021346.50266-1-bot@edi.works/
> [06] https://lore.kernel.org/lkml/20211202062806.80365-1-bot@edi.works/
> [07] https://lore.kernel.org/lkml/20211209072416.33606-1-bot@edi.works/
> [08] https://lore.kernel.org/lkml/20211218071041.24077-1-bot@edi.works/
> [09] https://lore.kernel.org/lkml/20211122053248.57311-1-bot@edi.works/
> [10] https://lore.kernel.org/lkml/20220104202247.2903702-1-yuzhao@google.com/
> 
> Read-world applications
> =======================
> Third-party testimonials
> ------------------------
> Konstantin wrote [11]:
>    I have Archlinux with 8G RAM + zswap + swap. While developing, I
>    have lots of apps opened such as multiple LSP-servers for different
>    langs, chats, two browsers, etc... Usually, my system gets quickly
>    to a point of SWAP-storms, where I have to kill LSP-servers,
>    restart browsers to free memory, etc, otherwise the system lags
>    heavily and is barely usable.
>    
>    1.5 day ago I migrated from 5.11.15 kernel to 5.12 + the LRU
>    patchset, and I started up by opening lots of apps to create memory
>    pressure, and worked for a day like this. Till now I had *not a
>    single SWAP-storm*, and mind you I got 3.4G in SWAP. I was never
>    getting to the point of 3G in SWAP before without a single
>    SWAP-storm.
> 
> An anonymous user wrote [12]:
>    Using that v5 for some time and confirm that difference under heavy
>    load and memory pressure is significant.
> 
> Shuang wrote [13]:
>    With the MGLRU, fio achieved 95% CIs [38.95, 40.26]%, [4.12, 6.64]%
>    and [9.26, 10.36]% higher throughput, respectively, for random
>    access, Zipfian (distribution) access and Gaussian (distribution)
>    access, when the average number of jobs per CPU is 1; 95% CIs
>    [42.32, 49.15]%, [9.44, 9.89]% and [20.99, 22.86]% higher throughput,
>    respectively, for random access, Zipfian access and Gaussian access,
>    when the average number of jobs per CPU is 2.
> 
> Daniel wrote [14]:
>    With memcached allocating ~100GB of byte-addressable Optante,
>    performance improvement in terms of throughput (measured as queries
>    per second) was about 10% for a series of workloads.
> 
> Large-scale deployments
> -----------------------
> The downstream kernels that have been using MGLRU include:
> 1. Android ARCVM [15]
> 2. Arch Linux Zen [16]
> 3. Chrome OS [17]
> 4. Liquorix [18]
> 5. post-factum [19]
> 6. XanMod [20]
> 
> We've rolled out MGLRU to tens of millions of Chrome OS users and
> about a million Android users. Google's fleetwide profiling [21] shows
> an overall 40% decrease in kswapd CPU usage, in addition to
> improvements in other UX metrics, e.g., an 85% decrease in the number
> of low-memory kills at the 75th percentile and an 18% decrease in
> rendering latency at the 50th percentile.
> 
> [11] https://lore.kernel.org/lkml/140226722f2032c86301fbd326d91baefe3d7d23.camel@yandex.ru/
> [12] https://phoronix.com/forums/forum/software/general-linux-open-source/1301258-mglru-is-a-very-enticing-enhancement-for-linux-in-2022?p=1301275#post1301275
> [13] https://lore.kernel.org/lkml/20220105024423.26409-1-szhai2@cs.rochester.edu/
> [14] https://lore.kernel.org/linux-mm/CA+4-3vksGvKd18FgRinxhqHetBS1hQekJE2gwco8Ja-bJWKtFw@mail.gmail.com/
> [15] https://chromium.googlesource.com/chromiumos/third_party/kernel
> [16] https://archlinux.org
> [17] https://chromium.org
> [18] https://liquorix.net
> [19] https://gitlab.com/post-factum/pf-kernel
> [20] https://xanmod.org
> [21] https://research.google/pubs/pub44271/
> 
> Summery
> =======
> The facts are:
> 1. The independent lab results and the real-world applications
>    indicate substantial improvements; there are no known regressions.
> 2. Thrashing prevention, working set estimation and proactive reclaim
>    work out of the box; there are no equivalent solutions.
> 3. There is a lot of new code; nobody has demonstrated smaller changes
>    with similar effects.
> 
> Our options, accordingly, are:
> 1. Given the amount of evidence, the reported improvements will likely
>    materialize for a wide range of workloads.
> 2. Gauging the interest from the past discussions [22][23][24], the
>    new features will likely be put to use for both personal computers
>    and data centers.
> 3. Based on Google's track record, the new code will likely be well
>    maintained in the long term. It'd be more difficult if not
>    impossible to achieve similar effects on top of the existing
>    design.
> 
> [22] https://lore.kernel.org/lkml/20201005081313.732745-1-andrea.righi@canonical.com/
> [23] https://lore.kernel.org/lkml/20210716081449.22187-1-sj38.park@gmail.com/
> [24] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/
> 
> Yu Zhao (12):
>   mm: x86, arm64: add arch_has_hw_pte_young()
>   mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
>   mm/vmscan.c: refactor shrink_node()
>   mm: multigenerational LRU: groundwork
>   mm: multigenerational LRU: minimal implementation
>   mm: multigenerational LRU: exploit locality in rmap
>   mm: multigenerational LRU: support page table walks
>   mm: multigenerational LRU: optimize multiple memcgs
>   mm: multigenerational LRU: runtime switch
>   mm: multigenerational LRU: thrashing prevention
>   mm: multigenerational LRU: debugfs interface
>   mm: multigenerational LRU: documentation
> 
>  Documentation/admin-guide/mm/index.rst        |    1 +
>  Documentation/admin-guide/mm/multigen_lru.rst |  121 +
>  Documentation/vm/index.rst                    |    1 +
>  Documentation/vm/multigen_lru.rst             |  152 +
>  arch/Kconfig                                  |    9 +
>  arch/arm64/include/asm/pgtable.h              |   14 +-
>  arch/x86/Kconfig                              |    1 +
>  arch/x86/include/asm/pgtable.h                |    9 +-
>  arch/x86/mm/pgtable.c                         |    5 +-
>  fs/exec.c                                     |    2 +
>  fs/fuse/dev.c                                 |    3 +-
>  include/linux/cgroup.h                        |   15 +-
>  include/linux/memcontrol.h                    |   36 +
>  include/linux/mm.h                            |    8 +
>  include/linux/mm_inline.h                     |  214 ++
>  include/linux/mm_types.h                      |   78 +
>  include/linux/mmzone.h                        |  182 ++
>  include/linux/nodemask.h                      |    1 +
>  include/linux/page-flags-layout.h             |   19 +-
>  include/linux/page-flags.h                    |    4 +-
>  include/linux/pgtable.h                       |   17 +-
>  include/linux/sched.h                         |    4 +
>  include/linux/swap.h                          |    5 +
>  kernel/bounds.c                               |    3 +
>  kernel/cgroup/cgroup-internal.h               |    1 -
>  kernel/exit.c                                 |    1 +
>  kernel/fork.c                                 |    9 +
>  kernel/sched/core.c                           |    1 +
>  mm/Kconfig                                    |   50 +
>  mm/huge_memory.c                              |    3 +-
>  mm/memcontrol.c                               |   27 +
>  mm/memory.c                                   |   39 +-
>  mm/mm_init.c                                  |    6 +-
>  mm/page_alloc.c                               |    1 +
>  mm/rmap.c                                     |    7 +
>  mm/swap.c                                     |   55 +-
>  mm/vmscan.c                                   | 2831 ++++++++++++++++-
>  mm/workingset.c                               |  119 +-
>  38 files changed, 3908 insertions(+), 146 deletions(-)
>  create mode 100644 Documentation/admin-guide/mm/multigen_lru.rst
>  create mode 100644 Documentation/vm/multigen_lru.rst

Thanks for the new spin.

Is the patch submission broken for everyone, or for me only? I see raw emails cluttered with some garbage like =2D, and hence I cannot apply those neither from my email client nor from lore.

Probably, you've got a git repo where things can be pulled from so that we do not depend on mailing systems and/or tools breaking plaintext?

Thanks.

-- 
Oleksandr Natalenko (post-factum)



WARNING: multiple messages have this Message-ID (diff)
From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@kernel.org>,
	Yu Zhao <yuzhao@google.com>
Cc: Andi Kleen <ak@linux.intel.com>,
	Aneesh Kumar <aneesh.kumar@linux.ibm.com>,
	Barry Song <21cnbao@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
	Jesse Barnes <jsbarnes@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michael Larabel <Michael@michaellarabel.com>,
	Mike Rapoport <rppt@kernel.org>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>,
	Ying Huang <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	Yu Zhao <yuzhao@google.com>
Subject: Re: [PATCH v7 00/12] Multigenerational LRU Framework
Date: Tue, 08 Feb 2022 11:11:02 +0100	[thread overview]
Message-ID: <4714886.31r3eYUQgx@natalenko.name> (raw)
In-Reply-To: <20220208081902.3550911-1-yuzhao@google.com>

Hello.

On úterý 8. února 2022 9:18:50 CET Yu Zhao wrote:
> What's new
> ==========
> 1) Addressed all the comments received on the mailing list and in the
>    meeting with the stakeholders (will note on individual patches).
> 2) Measured the performance improvements for each patch between 5-8
>    (reported in the commit messages).
> 
> TLDR
> ====
> The current page reclaim is too expensive in terms of CPU usage and it
> often makes poor choices about what to evict. This patchset offers an
> alternative solution that is performant, versatile and straightforward.
> 
> Patchset overview
> =================
> The design and implementation overview was moved to patch 12 so that
> people can finish reading this cover letter.
> 
> 1. mm: x86, arm64: add arch_has_hw_pte_young()
> 2. mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
> Using hardware optimizations when trying to clear the accessed bit in
> many PTEs.
> 
> 3. mm/vmscan.c: refactor shrink_node()
> A minor refactor.
> 
> 4. mm: multigenerational LRU: groundwork
> Adding the basic data structure and the functions that insert/remove
> pages to/from the multigenerational LRU (MGLRU) lists.
> 
> 5. mm: multigenerational LRU: minimal implementation
> A minimal (functional) implementation without any optimizations.
> 
> 6. mm: multigenerational LRU: exploit locality in rmap
> Improving the efficiency when using the rmap.
> 
> 7. mm: multigenerational LRU: support page table walks
> Adding the (optional) page table scanning.
> 
> 8. mm: multigenerational LRU: optimize multiple memcgs
> Optimizing the overall performance for multiple memcgs running mixed
> types of workloads.
> 
> 9. mm: multigenerational LRU: runtime switch
> Adding a runtime switch to enable or disable MGLRU.
> 
> 10. mm: multigenerational LRU: thrashing prevention
> 11. mm: multigenerational LRU: debugfs interface
> Providing userspace with additional features like thrashing prevention,
> working set estimation and proactive reclaim.
> 
> 12. mm: multigenerational LRU: documentation
> Adding a design doc and an admin guide.
> 
> Benchmark results
> =================
> Independent lab results
> -----------------------
> Based on the popularity of searches [01] and the memory usage in
> Google's public cloud, the most popular open-source memory-hungry
> applications, in alphabetical order, are:
>       Apache Cassandra      Memcached
>       Apache Hadoop         MongoDB
>       Apache Spark          PostgreSQL
>       MariaDB (MySQL)       Redis
> 
> An independent lab evaluated MGLRU with the most widely used benchmark
> suites for the above applications. They posted 960 data points along
> with kernel metrics and perf profiles collected over more than 500
> hours of total benchmark time. Their final reports show that, with 95%
> confidence intervals (CIs), the above applications all performed
> significantly better for at least part of their benchmark matrices.
> 
> On 5.14:
> 1. Apache Spark [02] took 95% CIs [9.28, 11.19]% and [12.20, 14.93]%
>    less wall time to sort three billion random integers, respectively,
>    under the medium- and the high-concurrency conditions, when
>    overcommitting memory. There were no statistically significant
>    changes in wall time for the rest of the benchmark matrix.
> 2. MariaDB [03] achieved 95% CIs [5.24, 10.71]% and [20.22, 25.97]%
>    more transactions per minute (TPM), respectively, under the medium-
>    and the high-concurrency conditions, when overcommitting memory.
>    There were no statistically significant changes in TPM for the rest
>    of the benchmark matrix.
> 3. Memcached [04] achieved 95% CIs [23.54, 32.25]%, [20.76, 41.61]%
>    and [21.59, 30.02]% more operations per second (OPS), respectively,
>    for sequential access, random access and Gaussian (distribution)
>    access, when THP=always; 95% CIs [13.85, 15.97]% and
>    [23.94, 29.92]% more OPS, respectively, for random access and
>    Gaussian access, when THP=never. There were no statistically
>    significant changes in OPS for the rest of the benchmark matrix.
> 4. MongoDB [05] achieved 95% CIs [2.23, 3.44]%, [6.97, 9.73]% and
>    [2.16, 3.55]% more operations per second (OPS), respectively, for
>    exponential (distribution) access, random access and Zipfian
>    (distribution) access, when underutilizing memory; 95% CIs
>    [8.83, 10.03]%, [21.12, 23.14]% and [5.53, 6.46]% more OPS,
>    respectively, for exponential access, random access and Zipfian
>    access, when overcommitting memory.
> 
> On 5.15:
> 5. Apache Cassandra [06] achieved 95% CIs [1.06, 4.10]%, [1.94, 5.43]%
>    and [4.11, 7.50]% more operations per second (OPS), respectively,
>    for exponential (distribution) access, random access and Zipfian
>    (distribution) access, when swap was off; 95% CIs [0.50, 2.60]%,
>    [6.51, 8.77]% and [3.29, 6.75]% more OPS, respectively, for
>    exponential access, random access and Zipfian access, when swap was
>    on.
> 6. Apache Hadoop [07] took 95% CIs [5.31, 9.69]% and [2.02, 7.86]%
>    less average wall time to finish twelve parallel TeraSort jobs,
>    respectively, under the medium- and the high-concurrency
>    conditions, when swap was on. There were no statistically
>    significant changes in average wall time for the rest of the
>    benchmark matrix.
> 7. PostgreSQL [08] achieved 95% CI [1.75, 6.42]% more transactions per
>    minute (TPM) under the high-concurrency condition, when swap was
>    off; 95% CIs [12.82, 18.69]% and [22.70, 46.86]% more TPM,
>    respectively, under the medium- and the high-concurrency
>    conditions, when swap was on. There were no statistically
>    significant changes in TPM for the rest of the benchmark matrix.
> 8. Redis [09] achieved 95% CIs [0.58, 5.94]%, [6.55, 14.58]% and
>    [11.47, 19.36]% more total operations per second (OPS),
>    respectively, for sequential access, random access and Gaussian
>    (distribution) access, when THP=always; 95% CIs [1.27, 3.54]%,
>    [10.11, 14.81]% and [8.75, 13.64]% more total OPS, respectively,
>    for sequential access, random access and Gaussian access, when
>    THP=never.
> 
> Our lab results
> ---------------
> To supplement the above results, we ran the following benchmark suites
> on 5.16-rc7 and found no regressions [10]. (These synthetic benchmarks
> are popular among MM developers, but we prefer large-scale A/B
> experiments to validate improvements.)
>       fs_fio_bench_hdd_mq      pft
>       fs_lmbench               pgsql-hammerdb
>       fs_parallelio            redis
>       fs_postmark              stream
>       hackbench                sysbenchthread
>       kernbench                tpcc_spark
>       memcached                unixbench
>       multichase               vm-scalability
>       mutilate                 will-it-scale
>       nginx
> 
> [01] https://trends.google.com
> [02] https://lore.kernel.org/lkml/20211102002002.92051-1-bot@edi.works/
> [03] https://lore.kernel.org/lkml/20211009054315.47073-1-bot@edi.works/
> [04] https://lore.kernel.org/lkml/20211021194103.65648-1-bot@edi.works/
> [05] https://lore.kernel.org/lkml/20211109021346.50266-1-bot@edi.works/
> [06] https://lore.kernel.org/lkml/20211202062806.80365-1-bot@edi.works/
> [07] https://lore.kernel.org/lkml/20211209072416.33606-1-bot@edi.works/
> [08] https://lore.kernel.org/lkml/20211218071041.24077-1-bot@edi.works/
> [09] https://lore.kernel.org/lkml/20211122053248.57311-1-bot@edi.works/
> [10] https://lore.kernel.org/lkml/20220104202247.2903702-1-yuzhao@google.com/
> 
> Read-world applications
> =======================
> Third-party testimonials
> ------------------------
> Konstantin wrote [11]:
>    I have Archlinux with 8G RAM + zswap + swap. While developing, I
>    have lots of apps opened such as multiple LSP-servers for different
>    langs, chats, two browsers, etc... Usually, my system gets quickly
>    to a point of SWAP-storms, where I have to kill LSP-servers,
>    restart browsers to free memory, etc, otherwise the system lags
>    heavily and is barely usable.
>    
>    1.5 day ago I migrated from 5.11.15 kernel to 5.12 + the LRU
>    patchset, and I started up by opening lots of apps to create memory
>    pressure, and worked for a day like this. Till now I had *not a
>    single SWAP-storm*, and mind you I got 3.4G in SWAP. I was never
>    getting to the point of 3G in SWAP before without a single
>    SWAP-storm.
> 
> An anonymous user wrote [12]:
>    Using that v5 for some time and confirm that difference under heavy
>    load and memory pressure is significant.
> 
> Shuang wrote [13]:
>    With the MGLRU, fio achieved 95% CIs [38.95, 40.26]%, [4.12, 6.64]%
>    and [9.26, 10.36]% higher throughput, respectively, for random
>    access, Zipfian (distribution) access and Gaussian (distribution)
>    access, when the average number of jobs per CPU is 1; 95% CIs
>    [42.32, 49.15]%, [9.44, 9.89]% and [20.99, 22.86]% higher throughput,
>    respectively, for random access, Zipfian access and Gaussian access,
>    when the average number of jobs per CPU is 2.
> 
> Daniel wrote [14]:
>    With memcached allocating ~100GB of byte-addressable Optante,
>    performance improvement in terms of throughput (measured as queries
>    per second) was about 10% for a series of workloads.
> 
> Large-scale deployments
> -----------------------
> The downstream kernels that have been using MGLRU include:
> 1. Android ARCVM [15]
> 2. Arch Linux Zen [16]
> 3. Chrome OS [17]
> 4. Liquorix [18]
> 5. post-factum [19]
> 6. XanMod [20]
> 
> We've rolled out MGLRU to tens of millions of Chrome OS users and
> about a million Android users. Google's fleetwide profiling [21] shows
> an overall 40% decrease in kswapd CPU usage, in addition to
> improvements in other UX metrics, e.g., an 85% decrease in the number
> of low-memory kills at the 75th percentile and an 18% decrease in
> rendering latency at the 50th percentile.
> 
> [11] https://lore.kernel.org/lkml/140226722f2032c86301fbd326d91baefe3d7d23.camel@yandex.ru/
> [12] https://phoronix.com/forums/forum/software/general-linux-open-source/1301258-mglru-is-a-very-enticing-enhancement-for-linux-in-2022?p=1301275#post1301275
> [13] https://lore.kernel.org/lkml/20220105024423.26409-1-szhai2@cs.rochester.edu/
> [14] https://lore.kernel.org/linux-mm/CA+4-3vksGvKd18FgRinxhqHetBS1hQekJE2gwco8Ja-bJWKtFw@mail.gmail.com/
> [15] https://chromium.googlesource.com/chromiumos/third_party/kernel
> [16] https://archlinux.org
> [17] https://chromium.org
> [18] https://liquorix.net
> [19] https://gitlab.com/post-factum/pf-kernel
> [20] https://xanmod.org
> [21] https://research.google/pubs/pub44271/
> 
> Summery
> =======
> The facts are:
> 1. The independent lab results and the real-world applications
>    indicate substantial improvements; there are no known regressions.
> 2. Thrashing prevention, working set estimation and proactive reclaim
>    work out of the box; there are no equivalent solutions.
> 3. There is a lot of new code; nobody has demonstrated smaller changes
>    with similar effects.
> 
> Our options, accordingly, are:
> 1. Given the amount of evidence, the reported improvements will likely
>    materialize for a wide range of workloads.
> 2. Gauging the interest from the past discussions [22][23][24], the
>    new features will likely be put to use for both personal computers
>    and data centers.
> 3. Based on Google's track record, the new code will likely be well
>    maintained in the long term. It'd be more difficult if not
>    impossible to achieve similar effects on top of the existing
>    design.
> 
> [22] https://lore.kernel.org/lkml/20201005081313.732745-1-andrea.righi@canonical.com/
> [23] https://lore.kernel.org/lkml/20210716081449.22187-1-sj38.park@gmail.com/
> [24] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/
> 
> Yu Zhao (12):
>   mm: x86, arm64: add arch_has_hw_pte_young()
>   mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
>   mm/vmscan.c: refactor shrink_node()
>   mm: multigenerational LRU: groundwork
>   mm: multigenerational LRU: minimal implementation
>   mm: multigenerational LRU: exploit locality in rmap
>   mm: multigenerational LRU: support page table walks
>   mm: multigenerational LRU: optimize multiple memcgs
>   mm: multigenerational LRU: runtime switch
>   mm: multigenerational LRU: thrashing prevention
>   mm: multigenerational LRU: debugfs interface
>   mm: multigenerational LRU: documentation
> 
>  Documentation/admin-guide/mm/index.rst        |    1 +
>  Documentation/admin-guide/mm/multigen_lru.rst |  121 +
>  Documentation/vm/index.rst                    |    1 +
>  Documentation/vm/multigen_lru.rst             |  152 +
>  arch/Kconfig                                  |    9 +
>  arch/arm64/include/asm/pgtable.h              |   14 +-
>  arch/x86/Kconfig                              |    1 +
>  arch/x86/include/asm/pgtable.h                |    9 +-
>  arch/x86/mm/pgtable.c                         |    5 +-
>  fs/exec.c                                     |    2 +
>  fs/fuse/dev.c                                 |    3 +-
>  include/linux/cgroup.h                        |   15 +-
>  include/linux/memcontrol.h                    |   36 +
>  include/linux/mm.h                            |    8 +
>  include/linux/mm_inline.h                     |  214 ++
>  include/linux/mm_types.h                      |   78 +
>  include/linux/mmzone.h                        |  182 ++
>  include/linux/nodemask.h                      |    1 +
>  include/linux/page-flags-layout.h             |   19 +-
>  include/linux/page-flags.h                    |    4 +-
>  include/linux/pgtable.h                       |   17 +-
>  include/linux/sched.h                         |    4 +
>  include/linux/swap.h                          |    5 +
>  kernel/bounds.c                               |    3 +
>  kernel/cgroup/cgroup-internal.h               |    1 -
>  kernel/exit.c                                 |    1 +
>  kernel/fork.c                                 |    9 +
>  kernel/sched/core.c                           |    1 +
>  mm/Kconfig                                    |   50 +
>  mm/huge_memory.c                              |    3 +-
>  mm/memcontrol.c                               |   27 +
>  mm/memory.c                                   |   39 +-
>  mm/mm_init.c                                  |    6 +-
>  mm/page_alloc.c                               |    1 +
>  mm/rmap.c                                     |    7 +
>  mm/swap.c                                     |   55 +-
>  mm/vmscan.c                                   | 2831 ++++++++++++++++-
>  mm/workingset.c                               |  119 +-
>  38 files changed, 3908 insertions(+), 146 deletions(-)
>  create mode 100644 Documentation/admin-guide/mm/multigen_lru.rst
>  create mode 100644 Documentation/vm/multigen_lru.rst

Thanks for the new spin.

Is the patch submission broken for everyone, or for me only? I see raw emails cluttered with some garbage like =2D, and hence I cannot apply those neither from my email client nor from lore.

Probably, you've got a git repo where things can be pulled from so that we do not depend on mailing systems and/or tools breaking plaintext?

Thanks.

-- 
Oleksandr Natalenko (post-factum)



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2022-02-08 11:26 UTC|newest]

Thread overview: 150+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08  8:18 [PATCH v7 00/12] Multigenerational LRU Framework Yu Zhao
2022-02-08  8:18 ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 01/12] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:24   ` Yu Zhao
2022-02-08  8:24     ` Yu Zhao
2022-02-08 10:33   ` Will Deacon
2022-02-08 10:33     ` Will Deacon
2022-02-08  8:18 ` [PATCH v7 02/12] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:27   ` Yu Zhao
2022-02-08  8:27     ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 03/12] mm/vmscan.c: refactor shrink_node() Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 04/12] mm: multigenerational LRU: groundwork Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:28   ` Yu Zhao
2022-02-08  8:28     ` Yu Zhao
2022-02-10 20:41   ` Johannes Weiner
2022-02-10 20:41     ` Johannes Weiner
2022-02-15  9:43     ` Yu Zhao
2022-02-15  9:43       ` Yu Zhao
2022-02-15 21:53       ` Johannes Weiner
2022-02-15 21:53         ` Johannes Weiner
2022-02-21  8:14         ` Yu Zhao
2022-02-21  8:14           ` Yu Zhao
2022-02-23 21:18           ` Yu Zhao
2022-02-23 21:18             ` Yu Zhao
2022-02-25 16:34             ` Minchan Kim
2022-02-25 16:34               ` Minchan Kim
2022-03-03 15:29           ` Johannes Weiner
2022-03-03 15:29             ` Johannes Weiner
2022-03-03 19:26             ` Yu Zhao
2022-03-03 19:26               ` Yu Zhao
2022-03-03 21:43               ` Johannes Weiner
2022-03-03 21:43                 ` Johannes Weiner
2022-03-11 10:16       ` Barry Song
2022-03-11 10:16         ` Barry Song
2022-03-11 23:45         ` Yu Zhao
2022-03-11 23:45           ` Yu Zhao
2022-03-12 10:37           ` Barry Song
2022-03-12 10:37             ` Barry Song
2022-03-12 21:11             ` Yu Zhao
2022-03-12 21:11               ` Yu Zhao
2022-03-13  4:57               ` Barry Song
2022-03-13  4:57                 ` Barry Song
2022-03-14 11:11                 ` Barry Song
2022-03-14 11:11                   ` Barry Song
2022-03-14 16:45                   ` Yu Zhao
2022-03-14 16:45                     ` Yu Zhao
2022-03-14 23:38                     ` Barry Song
2022-03-14 23:38                       ` Barry Song
2022-03-15  5:18                       ` Yu Zhao
2022-03-15  5:18                         ` Yu Zhao
2022-03-15  9:27                         ` Barry Song
2022-03-15  9:27                           ` Barry Song
2022-03-15 10:29                           ` Barry Song
2022-03-15 10:29                             ` Barry Song
2022-03-16  2:46                             ` Yu Zhao
2022-03-16  2:46                               ` Yu Zhao
2022-03-16  4:37                               ` Barry Song
2022-03-16  4:37                                 ` Barry Song
2022-03-16  5:44                                 ` Yu Zhao
2022-03-16  5:44                                   ` Yu Zhao
2022-03-16  6:06                                   ` Barry Song
2022-03-16  6:06                                     ` Barry Song
2022-03-16 21:37                                     ` Yu Zhao
2022-03-16 21:37                                       ` Yu Zhao
2022-02-10 21:37   ` Matthew Wilcox
2022-02-10 21:37     ` Matthew Wilcox
2022-02-13 21:16     ` Yu Zhao
2022-02-13 21:16       ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:33   ` Yu Zhao
2022-02-08  8:33     ` Yu Zhao
2022-02-08 16:50   ` Johannes Weiner
2022-02-08 16:50     ` Johannes Weiner
2022-02-10  2:53     ` Yu Zhao
2022-02-10  2:53       ` Yu Zhao
2022-02-13 10:04   ` Hillf Danton
2022-02-17  0:13     ` Yu Zhao
2022-02-23  8:27   ` Huang, Ying
2022-02-23  8:27     ` Huang, Ying
2022-02-23  9:36     ` Yu Zhao
2022-02-23  9:36       ` Yu Zhao
2022-02-24  0:59       ` Huang, Ying
2022-02-24  0:59         ` Huang, Ying
2022-02-24  1:34         ` Yu Zhao
2022-02-24  1:34           ` Yu Zhao
2022-02-24  3:31           ` Huang, Ying
2022-02-24  3:31             ` Huang, Ying
2022-02-24  4:09             ` Yu Zhao
2022-02-24  4:09               ` Yu Zhao
2022-02-24  5:27               ` Huang, Ying
2022-02-24  5:27                 ` Huang, Ying
2022-02-24  5:35                 ` Yu Zhao
2022-02-24  5:35                   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 06/12] mm: multigenerational LRU: exploit locality in rmap Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:40   ` Yu Zhao
2022-02-08  8:40     ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 07/12] mm: multigenerational LRU: support page table walks Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:39   ` Yu Zhao
2022-02-08  8:39     ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 08/12] mm: multigenerational LRU: optimize multiple memcgs Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 09/12] mm: multigenerational LRU: runtime switch Yu Zhao
2022-02-08  8:18   ` Yu Zhao
2022-02-08  8:42   ` Yu Zhao
2022-02-08  8:42     ` Yu Zhao
2022-02-08  8:19 ` [PATCH v7 10/12] mm: multigenerational LRU: thrashing prevention Yu Zhao
2022-02-08  8:19   ` Yu Zhao
2022-02-08  8:43   ` Yu Zhao
2022-02-08  8:43     ` Yu Zhao
2022-02-08  8:19 ` [PATCH v7 11/12] mm: multigenerational LRU: debugfs interface Yu Zhao
2022-02-08  8:19   ` Yu Zhao
2022-02-18 18:56   ` [page-reclaim] " David Rientjes
2022-02-18 18:56     ` David Rientjes
2022-02-08  8:19 ` [PATCH v7 12/12] mm: multigenerational LRU: documentation Yu Zhao
2022-02-08  8:19   ` Yu Zhao
2022-02-08  8:44   ` Yu Zhao
2022-02-08  8:44     ` Yu Zhao
2022-02-14 10:28   ` Mike Rapoport
2022-02-14 10:28     ` Mike Rapoport
2022-02-16  3:22     ` Yu Zhao
2022-02-16  3:22       ` Yu Zhao
2022-02-21  9:01       ` Mike Rapoport
2022-02-21  9:01         ` Mike Rapoport
2022-02-22  1:47         ` Yu Zhao
2022-02-22  1:47           ` Yu Zhao
2022-02-23 10:58           ` Mike Rapoport
2022-02-23 10:58             ` Mike Rapoport
2022-02-23 21:20             ` Yu Zhao
2022-02-23 21:20               ` Yu Zhao
2022-02-08 10:11 ` Oleksandr Natalenko [this message]
2022-02-08 10:11   ` [PATCH v7 00/12] Multigenerational LRU Framework Oleksandr Natalenko
2022-02-08 11:14   ` Michal Hocko
2022-02-08 11:14     ` Michal Hocko
2022-02-08 11:23     ` Oleksandr Natalenko
2022-02-08 11:23       ` Oleksandr Natalenko
2022-02-11 20:12 ` Alexey Avramov
2022-02-11 20:12   ` Alexey Avramov
2022-02-12 21:01   ` Yu Zhao
2022-02-12 21:01     ` Yu Zhao
2022-03-03  6:06 ` Vaibhav Jain
2022-03-03  6:06   ` Vaibhav Jain
2022-03-03  6:47   ` Yu Zhao
2022-03-03  6:47     ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4714886.31r3eYUQgx@natalenko.name \
    --to=oleksandr@natalenko.name \
    --cc=21cnbao@gmail.com \
    --cc=Michael@michaellarabel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=jsbarnes@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=page-reclaim@google.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.