From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
Michael Larabel <michael@michaellarabel.com>,
kvmarm@lists.linux.dev, kvm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
linux-mm@google.com, Yu Zhao <yuzhao@google.com>
Subject: [PATCH mm-unstable v1 0/5] mm/kvm: lockless accessed bit harvest
Date: Thu, 16 Feb 2023 21:12:25 -0700
Message-ID: <20230217041230.2417228-1-yuzhao@google.com>
TLDR
====
This patchset RCU-protects KVM page tables and compare-and-exchanges
KVM PTEs with the accessed bit set by hardware. It significantly
improves the performance of guests when the host is under heavy
memory pressure.
ChromeOS has been using a similar approach [1] since mid 2021, and it
has proven successful on tens of millions of devices.
[1] https://crrev.com/c/2987928
Overview
========
The goal of this patchset is to optimize the performance of guests
when the host memory is overcommitted. It focuses on the vast
majority of VMs that are not nested and run on hardware that sets the
accessed bit in KVM page tables.
Note that nested VMs and hardware that does not support the accessed
bit are both out of scope.
This patchset relies on two techniques, RCU and cmpxchg, to safely
test and clear the accessed bit without taking kvm->mmu_lock. The
former protects KVM page tables from being freed while the latter
clears the accessed bit atomically against both hardware and other
software page table walkers.
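As a rough userspace illustration of the cmpxchg half of this, the
sketch below tests and clears an accessed bit with C11 atomics. It is
not the code from this series: PTE_ACCESSED, its bit position and
test_clear_young() are hypothetical names chosen for the example, and
the RCU half (keeping the page table page alive during the walk) is
not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position; real PTE layouts are architecture-specific. */
#define PTE_ACCESSED (UINT64_C(1) << 5)

/*
 * Test and optionally clear the accessed bit of one PTE without a lock.
 *
 * The compare-and-exchange only succeeds if the PTE still holds the
 * value read above. If hardware or another walker changed it in the
 * meantime, the single attempt fails and the PTE is left untouched.
 * The page is reported young either way, since the bit was seen set.
 */
static bool test_clear_young(_Atomic uint64_t *ptep, bool clear)
{
	uint64_t old;

	assert(ptep != NULL);
	old = atomic_load(ptep);

	if (!(old & PTE_ACCESSED))
		return false;
	if (clear)
		atomic_compare_exchange_strong(ptep, &old,
					       old & ~PTE_ACCESSED);
	return true;
}
```

Calling test_clear_young() twice with clear set on a PTE whose
accessed bit is initially set returns true and then false, provided
nothing sets the bit again in between.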
A new MMU notifier API, mmu_notifier_test_clear_young(), is
introduced. It follows two design patterns: fallback and batching.
For any unsupported cases, it can optionally fall back to
mmu_notifier_ops->clear_young(). For a range of KVM PTEs, it can test
or test and clear their accessed bits according to a bitmap provided
by the caller.
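The batching pattern can be pictured with the userspace sketch below.
It is an assumption-laden illustration, not the interface added by
this series: the function name, its signature and the exact bitmap
semantics (report young PTEs when only testing, select which PTEs to
clear when clearing) are made up for the example, and the fallback
path is not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position; real PTE layouts are architecture-specific. */
#define PTE_ACCESSED  (UINT64_C(1) << 5)
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static bool bitmap_test(const unsigned long *map, size_t i)
{
	return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

static void bitmap_set(unsigned long *map, size_t i)
{
	map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/*
 * Walk nr PTEs in a single pass (assumed semantics, see above):
 *   clear == false: set bit i of the bitmap for every young PTE i.
 *   clear == true:  clear the accessed bit of young PTE i only if
 *                   the caller set bit i of the bitmap.
 * Returns the number of PTEs found young during the walk.
 */
static size_t test_clear_young_range(_Atomic uint64_t *ptes, size_t nr,
				     bool clear, unsigned long *bitmap)
{
	size_t young = 0;

	assert(ptes != NULL && bitmap != NULL);

	for (size_t i = 0; i < nr; i++) {
		uint64_t old = atomic_load(&ptes[i]);

		if (!(old & PTE_ACCESSED))
			continue;
		young++;
		if (!clear)
			bitmap_set(bitmap, i);
		else if (bitmap_test(bitmap, i))
			atomic_compare_exchange_strong(&ptes[i], &old,
						       old & ~PTE_ACCESSED);
	}
	return young;
}
```

The point of the design is that one walk serves a whole range, so the
per-PTE cost of entering the notifier is paid once per batch rather
than once per page.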
This patchset only applies mmu_notifier_test_clear_young() to MGLRU.
A follow-up patchset will apply it to /proc/PID/pagemap and
/proc/PID/clear_refs.
Evaluation
==========
An existing selftest can quickly demonstrate the effectiveness of
this patchset. On a generic workstation equipped with 64 CPUs and
256GB DRAM:
$ sudo max_guest_memory_test -c 64 -m 256 -s 256
  MGLRU      run2
  ---------------
  Before    ~600s
  After      ~50s
  Off       ~250s
kswapd (MGLRU before)
    100.00%  balance_pgdat
      100.00%  shrink_node
        100.00%  shrink_one
          99.97%  try_to_shrink_lruvec
            99.06%  evict_folios
              97.41%  shrink_folio_list
                31.33%  folio_referenced
                  31.06%  rmap_walk_file
                    30.89%  folio_referenced_one
                      20.83%  __mmu_notifier_clear_flush_young
                        20.54%  kvm_mmu_notifier_clear_flush_young
  =>                      19.34%  _raw_write_lock
kswapd (MGLRU after)
    100.00%  balance_pgdat
      100.00%  shrink_node
        100.00%  shrink_one
          99.97%  try_to_shrink_lruvec
            99.51%  evict_folios
              71.70%  shrink_folio_list
                7.08%  folio_referenced
                  6.78%  rmap_walk_file
                    6.72%  folio_referenced_one
                      5.60%  lru_gen_look_around
  =>                    1.53%  __mmu_notifier_test_clear_young
kswapd (MGLRU off)
    100.00%  balance_pgdat
      100.00%  shrink_node
        99.92%  shrink_lruvec
          69.95%  shrink_folio_list
            19.35%  folio_referenced
              18.37%  rmap_walk_file
                17.88%  folio_referenced_one
                  13.20%  __mmu_notifier_clear_flush_young
                    11.64%  kvm_mmu_notifier_clear_flush_young
  =>                  9.93%  _raw_write_lock
          26.23%  shrink_active_list
            25.50%  folio_referenced
              25.35%  rmap_walk_file
                25.28%  folio_referenced_one
                  23.87%  __mmu_notifier_clear_flush_young
                    23.69%  kvm_mmu_notifier_clear_flush_young
  =>                  18.98%  _raw_write_lock
Comprehensive benchmarks are coming soon.
Yu Zhao (5):
mm/kvm: add mmu_notifier_test_clear_young()
kvm/x86: add kvm_arch_test_clear_young()
kvm/arm64: add kvm_arch_test_clear_young()
kvm/powerpc: add kvm_arch_test_clear_young()
mm: multi-gen LRU: use mmu_notifier_test_clear_young()
arch/arm64/include/asm/kvm_host.h | 7 ++
arch/arm64/include/asm/kvm_pgtable.h | 8 ++
arch/arm64/include/asm/stage2_pgtable.h | 43 ++++++++
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/hyp/pgtable.c | 51 ++--------
arch/arm64/kvm/mmu.c | 77 +++++++++++++-
arch/powerpc/include/asm/kvm_host.h | 18 ++++
arch/powerpc/include/asm/kvm_ppc.h | 14 +--
arch/powerpc/kvm/book3s.c | 7 ++
arch/powerpc/kvm/book3s.h | 2 +
arch/powerpc/kvm/book3s_64_mmu_radix.c | 78 ++++++++++++++-
arch/powerpc/kvm/book3s_hv.c | 10 +-
arch/x86/include/asm/kvm_host.h | 27 +++++
arch/x86/kvm/mmu/spte.h | 12 ---
arch/x86/kvm/mmu/tdp_mmu.c | 41 ++++++++
include/linux/kvm_host.h | 29 ++++++
include/linux/mmu_notifier.h | 40 ++++++++
include/linux/mmzone.h | 6 +-
mm/mmu_notifier.c | 26 +++++
mm/rmap.c | 8 +-
mm/vmscan.c | 127 +++++++++++++++++++++---
virt/kvm/kvm_main.c | 58 +++++++++++
22 files changed, 593 insertions(+), 97 deletions(-)
--
2.39.2.637.g21b0678d19-goog