KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ben Gardon <bgardon@google.com>
Subject: [PATCH v2 08/10] KVM: Take mmu_lock when handling MMU notifier iff the hva hits a memslot
Date: Thu,  1 Apr 2021 17:56:56 -0700
Message-ID: <20210402005658.3024832-9-seanjc@google.com> (raw)
In-Reply-To: <20210402005658.3024832-1-seanjc@google.com>

Defer acquiring mmu_lock in the MMU notifier paths until a "hit" has been
detected in the memslots, i.e. don't take the lock for notifications that
don't affect the guest.

For small VMs, spurious locking is a minor annoyance.  And for "volatile"
setups where the majority of notifications _are_ relevant, this barely
qualifies as an optimization.

But, for large VMs (hundreds of threads) with static setups, e.g. no
page migration, no swapping, etc..., the vast majority of MMU notifier
callbacks will be unrelated to the guest, e.g. will often be in response
to the userspace VMM adjusting its own virtual address space.  In such
large VMs, acquiring mmu_lock can be painful as it blocks vCPUs from
handling page faults.  In some scenarios it can even be "fatal" in the
sense that it causes unacceptable brownouts, e.g. when rebuilding huge
pages after live migration, a significant percentage of vCPUs will be
attempting to handle page faults.

x86's TDP MMU implementation is especially susceptible to spurious
locking due it taking mmu_lock for read when handling page faults.
Because rwlock is fair, a single writer will stall future readers, while
the writer is itself stalled waiting for in-progress readers to complete.
This is exacerbated by the MMU notifiers often firing multiple times in
quick succession, e.g. moving a page will (always?) invoke three separate
notifiers: .invalidate_range_start(), invalidate_range_end(), and
.change_pte().  Unnecessarily taking mmu_lock each time means even a
single spurious sequence can be problematic.

Note, this optimizes only the unpaired callbacks.  Optimizing the
.invalidate_range_{start,end}() pairs is more complex and will be done in
a future patch.

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
 virt/kvm/kvm_main.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25ecb5235e17..f6697ad741ed 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -482,10 +482,10 @@ static void kvm_null_fn(void)
 static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 						  const struct kvm_hva_range *range)
+	bool ret = false, locked = false;
 	struct kvm_gfn_range gfn_range;
 	struct kvm_memory_slot *slot;
 	struct kvm_memslots *slots;
-	bool ret = false;
 	int i, idx;
 	/* A null handler is allowed if and only if on_lock() is provided. */
@@ -493,11 +493,13 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 		return 0;
-	KVM_MMU_LOCK(kvm);
 	idx = srcu_read_lock(&kvm->srcu);
+	/* The on_lock() path does not yet support lock elision. */
 	if (!IS_KVM_NULL_FN(range->on_lock)) {
+		locked = true;
+		KVM_MMU_LOCK(kvm);
 		range->on_lock(kvm, range->start, range->end);
 		if (IS_KVM_NULL_FN(range->handler))
@@ -532,6 +534,10 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			if (!locked) {
+				locked = true;
+				KVM_MMU_LOCK(kvm);
+			}
 			ret |= range->handler(kvm, &gfn_range);
@@ -540,7 +546,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
+	if (locked)
 	srcu_read_unlock(&kvm->srcu, idx);

  parent reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02  0:56 [PATCH v2 00/10] KVM: Consolidate and optimize MMU notifiers Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 01/10] KVM: Assert that notifier count is elevated in .change_pte() Sean Christopherson
2021-04-02 11:08   ` Paolo Bonzini
2021-04-02  0:56 ` [PATCH v2 02/10] KVM: Move x86's MMU notifier memslot walkers to generic code Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 03/10] KVM: arm64: Convert to the gfn-based MMU notifier callbacks Sean Christopherson
2021-04-12 10:12   ` Marc Zyngier
2021-04-02  0:56 ` [PATCH v2 04/10] KVM: MIPS/MMU: " Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 05/10] KVM: PPC: " Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 06/10] KVM: Kill off the old hva-based " Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 07/10] KVM: Move MMU notifier's mmu_lock acquisition into common helper Sean Christopherson
2021-04-02  9:35   ` Paolo Bonzini
2021-04-02 14:59     ` Sean Christopherson
2021-04-02  0:56 ` Sean Christopherson [this message]
2021-04-02  0:56 ` [PATCH v2 09/10] KVM: Don't take mmu_lock for range invalidation unless necessary Sean Christopherson
2021-04-02  9:34   ` Paolo Bonzini
2021-04-02 14:59     ` Sean Christopherson
2021-04-19  8:49   ` Wanpeng Li
2021-04-19 13:50     ` Paolo Bonzini
2021-04-19 15:09       ` Sean Christopherson
2021-04-19 22:09         ` Paolo Bonzini
2021-04-20  1:17           ` Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 10/10] KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible Sean Christopherson
2021-04-02 12:17 ` [PATCH v2 00/10] KVM: Consolidate and optimize MMU notifiers Paolo Bonzini
2021-04-12 10:27   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210402005658.3024832-9-seanjc@google.com \
    --to=seanjc@google.com \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=bgardon@google.com \
    --cc=chenhuacai@kernel.org \
    --cc=james.morse@arm.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=paulus@ozlabs.org \
    --cc=pbonzini@redhat.com \
    --cc=suzuki.poulose@arm.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git