* [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow
@ 2019-09-13  2:46 Sean Christopherson
  2019-09-13  2:46 ` [PATCH 01/11] KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot Sean Christopherson
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Restore the fast invalidate flow for zapping shadow pages and use it
whenever vCPUs can be active in the VM.  This fixes (in theory, not yet
confirmed) a regression reported by James Harvey where KVM can livelock
in kvm_mmu_zap_all() when it's invoked in response to a memslot update.

The fast invalidate flow was removed as it was deemed to be unnecessary
after its primary user, memslot flushing, was reworked to zap only the
memslot in question instead of all shadow pages.  Unfortunately, zapping
only the memslot being (re)moved during a memslot update introduced a
regression for VMs with assigned devices.  Because we could not discern
why zapping only the relevant memslot broke device assignment, or if the
regression extended beyond device assignment, we reverted to zapping all
shadow pages when a memslot is (re)moved.

The revert to "zap all" failed to account for subsequent changes that
have been made to kvm_mmu_zap_all() between then and now.  Specifically,
kvm_mmu_zap_all() now conditionally reschedules and drops mmu_lock if a
reschedule is needed or if the lock is contended.  Dropping the lock
allows other vCPUs to add shadow pages, and, with enough vCPUs, can cause
kvm_mmu_zap_all() to get stuck in an infinite loop as it can never zap all
pages before observing lock contention or the need to reschedule.
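
For illustration only, the arithmetic of the livelock can be modeled with a
tiny userspace toy (this is not KVM code and the constants are invented):
if vCPUs can repopulate active_mmu_pages faster than the zapper makes
progress between lock drops, the walk never terminates.

  /*
   * Toy model of the livelock -- illustration only, not KVM code.
   * Constants are made up; the point is that when the repopulation rate
   * exceeds the zap rate per lock drop, "active" never reaches zero.
   */
  #include <stdio.h>

  #define ZAPPED_PER_YIELD   4   /* pages zapped before yielding mmu_lock */
  #define FAULTED_PER_YIELD  6   /* pages other vCPUs add while it's dropped */

  int main(void)
  {
          long active = 1000;    /* shadow pages on active_mmu_pages */
          long yields = 0;

          while (active > 0 && yields < 1000000) {
                  active -= ZAPPED_PER_YIELD;
                  active += FAULTED_PER_YIELD;
                  yields++;
          }
          printf("%s after %ld yields, %ld pages remain\n",
                 active <= 0 ? "finished" : "gave up", yields, active);
          return 0;
  }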

The reasoning behind having kvm_mmu_zap_all() conditionally reschedule was
that it would only be used when the VM is inaccessible, e.g. when its
mm_struct is dying or when the VM itself is being destroyed.  In that case,
playing nice with the rest of the kernel instead of hogging cycles to free
unused shadow pages made sense.

Since it's unlikely we'll root cause the device assignment regression any
time soon, and since simply removing the conditional rescheduling isn't
guaranteed to return us to a known good state, restore the fast invalidate
flow for zapping on memslot updates, including mmio generation wraparound.
Opportunistically tack on a bug fix and a couple of enhancements.

Alex and James, it probably goes without saying... please test, especially
patch 01/11 as a standalone patch as that'll likely need to be applied to
stable branches, assuming it works.  Thanks!

Sean Christopherson (11):
  KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
  KVM: x86/mmu: Treat invalid shadow pages as obsolete
  KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes
  KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow
    page related tracepoints""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for
    kvm_mmu_invalidate_all_pages""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap
    all pages""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete
    page first""
  KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
  KVM: x86/mmu: Explicitly track only a single invalid mmu generation
  KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero

 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/mmu.c              | 154 ++++++++++++++++++++++++++++----
 arch/x86/kvm/mmutrace.h         |  42 +++++++--
 arch/x86/kvm/x86.c              |   1 +
 4 files changed, 173 insertions(+), 28 deletions(-)

-- 
2.22.0



* [PATCH 01/11] KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 02/11] KVM: x86/mmu: Treat invalid shadow pages as obsolete Sean Christopherson
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Reintroduce the fast invalidate mechanism and use it when zapping shadow
pages in response to a memslot being deleted/moved.  Using the fast
mechanism fixes a livelock reported by James Harvey that was introduced
by commit d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"").

The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule.

The equivalent of kvm_mmu_zap_all() that was in use at the time of
the reverted commit (4e103134b8623, "KVM: x86/mmu: Zap only the relevant
pages when removing a memslot") employed a fast invalidate mechanism and
was not susceptible to the above livelock: shadow pages created after the
generation bump carry the new generation, so the set of obsolete pages
cannot grow while it is being zapped and the walk is guaranteed to
terminate.  Restore the fast invalidate code and use it when flushing a
memslot.

Reverting the revert (commit d012a06ab1d23) is not a viable option as
the revert is needed to fix a regression that occurs when the guest has
one or more assigned devices.

Alternatively, the livelock could be eliminated by removing the
conditional reschedule from kvm_mmu_zap_all().  However, although
removing the reschedule would be a smaller code change, it's less safe
in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
the wild for flushing memslots since the fast invalidate mechanism was
introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to
invalidate all pages"), back in 2013.

For all intents and purposes, this is a revert of commit ea145aacf4ae8
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86:
use the fast way to invalidate all pages") respectively.

Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/mmu.c              | 101 +++++++++++++++++++++++++++++++-
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 44a5ce57a905..fc279b513446 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -335,6 +335,7 @@ struct kvm_mmu_page {
 	int root_count;          /* Currently serving as active root */
 	unsigned int unsync_children;
 	struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
+	unsigned long mmu_valid_gen;
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
 
 #ifdef CONFIG_X86_32
@@ -856,6 +857,7 @@ struct kvm_arch {
 	unsigned long n_requested_mmu_pages;
 	unsigned long n_max_mmu_pages;
 	unsigned int indirect_shadow_pages;
+	unsigned long mmu_valid_gen;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	/*
 	 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4c45ff0cfbd0..5ac5e3f50f92 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2097,6 +2097,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 	if (!direct)
 		sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
+
+	/*
+	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
+	 * depends on valid pages being added to the head of the list.  See
+	 * comments in kvm_zap_obsolete_pages().
+	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
@@ -2246,7 +2252,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 #define for_each_valid_sp(_kvm, _sp, _gfn)				\
 	hlist_for_each_entry(_sp,					\
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
-		if ((_sp)->role.invalid) {    \
+		if (is_obsolete_sp((_kvm), (_sp)) || (_sp)->role.invalid) {    \
 		} else
 
 #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn)			\
@@ -2303,6 +2309,11 @@ static void kvm_mmu_audit(struct kvm_vcpu *vcpu, int point) { }
 static void mmu_audit_disable(void) { }
 #endif
 
+static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+}
+
 static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			 struct list_head *invalid_list)
 {
@@ -2527,6 +2538,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
 			flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
 	}
+	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	clear_page(sp->spt);
 	trace_kvm_mmu_get_page(sp, true);
 
@@ -4236,6 +4248,13 @@ static bool fast_cr3_switch(struct kvm_vcpu *vcpu, gpa_t new_cr3,
 			return false;
 
 		if (cached_root_available(vcpu, new_cr3, new_role)) {
+			/*
+			 * It is possible that the cached previous root page is
+			 * obsolete because of a change in the MMU generation
+			 * number. However, changing the generation number is
+			 * accompanied by KVM_REQ_MMU_RELOAD, which will free
+			 * the root set here and allocate a new one.
+			 */
 			kvm_make_request(KVM_REQ_LOAD_CR3, vcpu);
 			if (!skip_tlb_flush) {
 				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
@@ -5652,11 +5671,89 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	return alloc_mmu_pages(vcpu);
 }
 
+
+static void kvm_zap_obsolete_pages(struct kvm *kvm)
+{
+	struct kvm_mmu_page *sp, *node;
+	LIST_HEAD(invalid_list);
+	int ign;
+
+restart:
+	list_for_each_entry_safe_reverse(sp, node,
+	      &kvm->arch.active_mmu_pages, link) {
+		/*
+		 * No obsolete valid page exists before a newly created page
+		 * since active_mmu_pages is a FIFO list.
+		 */
+		if (!is_obsolete_sp(kvm, sp))
+			break;
+
+		/*
+		 * Do not repeatedly zap a root page to avoid unnecessary
+		 * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
+		 * progress:
+		 *    vcpu 0                        vcpu 1
+		 *                         call vcpu_enter_guest():
+		 *                            1): handle KVM_REQ_MMU_RELOAD
+		 *                                and require mmu-lock to
+		 *                                load mmu
+		 * repeat:
+		 *    1): zap root page and
+		 *        send KVM_REQ_MMU_RELOAD
+		 *
+		 *    2): if (cond_resched_lock(mmu-lock))
+		 *
+		 *                            2): hold mmu-lock and load mmu
+		 *
+		 *                            3): see KVM_REQ_MMU_RELOAD bit
+		 *                                on vcpu->requests is set
+		 *                                then return 1 to call
+		 *                                vcpu_enter_guest() again.
+		 *            goto repeat;
+		 *
+		 * Since we are reversely walking the list and the invalid
+		 * list will be moved to the head, skip the invalid page
+		 * can help us to avoid the infinity list walking.
+		 */
+		if (sp->role.invalid)
+			continue;
+
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+			kvm_mmu_commit_zap_page(kvm, &invalid_list);
+			cond_resched_lock(&kvm->mmu_lock);
+			goto restart;
+		}
+
+		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
+			goto restart;
+	}
+
+	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+}
+
+/*
+ * Fast invalidate all shadow pages and use lock-break technique
+ * to zap obsolete pages.
+ *
+ * It's required when memslot is being deleted or VM is being
+ * destroyed, in these cases, we should ensure that KVM MMU does
+ * not use any resource of the being-deleted slot or all slots
+ * after calling the function.
+ */
+static void kvm_mmu_zap_all_fast(struct kvm *kvm)
+{
+	spin_lock(&kvm->mmu_lock);
+	kvm->arch.mmu_valid_gen++;
+
+	kvm_zap_obsolete_pages(kvm);
+	spin_unlock(&kvm->mmu_lock);
+}
+
 static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
 			struct kvm_memory_slot *slot,
 			struct kvm_page_track_notifier_node *node)
 {
-	kvm_mmu_zap_all(kvm);
+	kvm_mmu_zap_all_fast(kvm);
 }
 
 void kvm_mmu_init_vm(struct kvm *kvm)
-- 
2.22.0



* [PATCH 02/11] KVM: x86/mmu: Treat invalid shadow pages as obsolete
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
  2019-09-13  2:46 ` [PATCH 01/11] KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 03/11] KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes Sean Christopherson
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Treat invalid shadow pages as obsolete to fix a bug where an obsolete
and invalid page with a non-zero root count could become non-obsolete
due to mmu_valid_gen wrapping.  The bug is largely theoretical with the
current code base, as an unsigned long will effectively never wrap on
64-bit KVM, and userspace would have to deliberately stall a vCPU in
order to keep an obsolete invalid page on the active list while
simultaneously modifying memslots billions of times to trigger a wrap.
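
To make the wrap concrete, here is a standalone illustration (not KVM code;
a deliberately narrow 8-bit counter stands in for mmu_valid_gen so the wrap
happens quickly): once the VM-wide generation wraps back to the value
recorded in a stale, still-listed page, a check based purely on generation
mismatch reports the page as current, whereas treating role.invalid pages
as obsolete is immune to the wrap.

  /* Illustration only, not KVM code. */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint8_t vm_gen = 0;        /* VM-wide generation (8-bit for brevity) */
          uint8_t page_gen = 0;      /* generation recorded in a stale page    */
          bool page_invalid = true;  /* page kept alive by a root reference    */
          int i;

          for (i = 0; i < 256; i++)  /* 256 invalidations later... */
                  vm_gen++;

          printf("gen-only check says obsolete: %s\n",
                 page_gen != vm_gen ? "yes" : "no (wrapped)");
          printf("invalid-implies-obsolete check says obsolete: %s\n",
                 (page_invalid || page_gen != vm_gen) ? "yes" : "no");
          return 0;
  }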

The obvious alternative is to use a 64-bit value for mmu_valid_gen,
but it's actually desirable to go in the opposite direction, i.e. using
a smaller 8-bit value to reduce KVM's memory footprint by 8 bytes per
shadow page, and relying on proper treatment of invalid pages instead of
preventing the generation from wrapping.

Note, "Fixes" points at a commit that was at one point reverted, but has
since been restored.

Fixes: 5304b8d37c2a5 ("KVM: MMU: fast invalidate all pages")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5ac5e3f50f92..373e6f052f9f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2252,7 +2252,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 #define for_each_valid_sp(_kvm, _sp, _gfn)				\
 	hlist_for_each_entry(_sp,					\
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
-		if (is_obsolete_sp((_kvm), (_sp)) || (_sp)->role.invalid) {    \
+		if (is_obsolete_sp((_kvm), (_sp))) {			\
 		} else
 
 #define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn)			\
@@ -2311,7 +2311,8 @@ static void mmu_audit_disable(void) { }
 
 static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+	return sp->role.invalid ||
+	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
 }
 
 static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-- 
2.22.0



* [PATCH 03/11] KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
  2019-09-13  2:46 ` [PATCH 01/11] KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot Sean Christopherson
  2019-09-13  2:46 ` [PATCH 02/11] KVM: x86/mmu: Treat invalid shadow pages as obsolete Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 04/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints"" Sean Christopherson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Use the fast invalidate mechanism to zap MMIO sptes on an MMIO generation
wrap.  The fast invalidate flow was reintroduced to fix a livelock bug
in kvm_mmu_zap_all() that can occur if kvm_mmu_zap_all() is invoked when
the guest has live vCPUs.  I.e. using kvm_mmu_zap_all() to handle the
MMIO generation wrap is theoretically susceptible to the livelock bug.

This effectively reverts commit 4771450c345dc ("Revert "KVM: MMU: drop
kvm_mmu_zap_mmio_sptes""), i.e. restores the behavior of commit
a8eca9dcc656a ("KVM: MMU: drop kvm_mmu_zap_mmio_sptes").

Note, this actually fixes commit 571c5af06e303 ("KVM: x86/mmu:
Voluntarily reschedule as needed when zapping MMIO sptes"), but there
is no need to incrementally revert back to using fast invalidate, e.g.
doing so doesn't provide any bisection or stability benefits.

Fixes: 571c5af06e303 ("KVM: x86/mmu: Voluntarily reschedule as needed when zapping MMIO sptes")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.c              | 17 +++--------------
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fc279b513446..ef378abac00f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -320,7 +320,6 @@ struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
 	bool unsync;
-	bool mmio_cached;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 373e6f052f9f..8d3fbc48d1be 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -403,8 +403,6 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 	mask |= (gpa & shadow_nonpresent_or_rsvd_mask)
 		<< shadow_nonpresent_or_rsvd_mask_len;
 
-	page_header(__pa(sptep))->mmio_cached = true;
-
 	trace_mark_mmio_spte(sptep, gfn, access, gen);
 	mmu_spte_set(sptep, mask);
 }
@@ -5947,7 +5945,7 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
 
-static void __kvm_mmu_zap_all(struct kvm *kvm, bool mmio_only)
+void kvm_mmu_zap_all(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
@@ -5956,14 +5954,10 @@ static void __kvm_mmu_zap_all(struct kvm *kvm, bool mmio_only)
 	spin_lock(&kvm->mmu_lock);
 restart:
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
-		if (mmio_only && !sp->mmio_cached)
-			continue;
 		if (sp->role.invalid && sp->root_count)
 			continue;
-		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign)) {
-			WARN_ON_ONCE(mmio_only);
+		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
 			goto restart;
-		}
 		if (cond_resched_lock(&kvm->mmu_lock))
 			goto restart;
 	}
@@ -5972,11 +5966,6 @@ static void __kvm_mmu_zap_all(struct kvm *kvm, bool mmio_only)
 	spin_unlock(&kvm->mmu_lock);
 }
 
-void kvm_mmu_zap_all(struct kvm *kvm)
-{
-	return __kvm_mmu_zap_all(kvm, false);
-}
-
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 {
 	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
@@ -5998,7 +5987,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 	 */
 	if (unlikely(gen == 0)) {
 		kvm_debug_ratelimited("kvm: zapping shadow pages for mmio generation wraparound\n");
-		__kvm_mmu_zap_all(kvm, true);
+		kvm_mmu_zap_all_fast(kvm);
 	}
 }
 
-- 
2.22.0



* [PATCH 04/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints""
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (2 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 03/11] KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 05/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages"" Sean Christopherson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Now that the fast invalidate mechanism has been reintroduced, restore
tracing of the generation number in shadow page tracepoints.

This reverts commit b59c4830ca185ba0e9f9e046fb1cd10a4a92627a.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmutrace.h | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index d8001b4bca05..e9832b5ec53c 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -8,16 +8,18 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvmmmu
 
-#define KVM_MMU_PAGE_FIELDS \
-	__field(__u64, gfn) \
-	__field(__u32, role) \
-	__field(__u32, root_count) \
+#define KVM_MMU_PAGE_FIELDS			\
+	__field(unsigned long, mmu_valid_gen)	\
+	__field(__u64, gfn)			\
+	__field(__u32, role)			\
+	__field(__u32, root_count)		\
 	__field(bool, unsync)
 
-#define KVM_MMU_PAGE_ASSIGN(sp)			     \
-	__entry->gfn = sp->gfn;			     \
-	__entry->role = sp->role.word;		     \
-	__entry->root_count = sp->root_count;        \
+#define KVM_MMU_PAGE_ASSIGN(sp)				\
+	__entry->mmu_valid_gen = sp->mmu_valid_gen;	\
+	__entry->gfn = sp->gfn;				\
+	__entry->role = sp->role.word;			\
+	__entry->root_count = sp->root_count;		\
 	__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({				        \
@@ -29,8 +31,9 @@
 								        \
 	role.word = __entry->role;					\
 									\
-	trace_seq_printf(p, "sp gfn %llx l%u %u-byte q%u%s %s%s"	\
+	trace_seq_printf(p, "sp gen %lx gfn %llx l%u %u-byte q%u%s %s%s"\
 			 " %snxe %sad root %u %s%c",			\
+			 __entry->mmu_valid_gen,			\
 			 __entry->gfn, role.level,			\
 			 role.gpte_is_8_bytes ? 8 : 4,			\
 			 role.quadrant,					\
-- 
2.22.0



* [PATCH 05/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages""
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (3 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 04/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints"" Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 06/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch"" Sean Christopherson
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Now that the fast invalidate mechanism has been reintroduced, restore
the tracepoint associated with said mechanism.

Note, the name of the tracepoint deviates from the original so as to
match KVM's current nomenclature.

This reverts commit 42560fb1f3c6c7f730897b7fa7a478bc37e0be50.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu.c      |  1 +
 arch/x86/kvm/mmutrace.h | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8d3fbc48d1be..0bf20afc3e73 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5742,6 +5742,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 {
 	spin_lock(&kvm->mmu_lock);
+	trace_kvm_mmu_zap_all_fast(kvm);
 	kvm->arch.mmu_valid_gen++;
 
 	kvm_zap_obsolete_pages(kvm);
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index e9832b5ec53c..1a063ba76281 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -282,6 +282,27 @@ TRACE_EVENT(
 	)
 );
 
+TRACE_EVENT(
+	kvm_mmu_zap_all_fast,
+	TP_PROTO(struct kvm *kvm),
+	TP_ARGS(kvm),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, mmu_valid_gen)
+		__field(unsigned int, mmu_used_pages)
+	),
+
+	TP_fast_assign(
+		__entry->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+		__entry->mmu_used_pages = kvm->arch.n_used_mmu_pages;
+	),
+
+	TP_printk("kvm-mmu-valid-gen %lx used_pages %x",
+		  __entry->mmu_valid_gen, __entry->mmu_used_pages
+	)
+);
+
+
 TRACE_EVENT(
 	check_mmio_spte,
 	TP_PROTO(u64 spte, unsigned int kvm_gen, unsigned int spte_gen),
-- 
2.22.0



* [PATCH 06/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (4 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 05/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages"" Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 07/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages"" Sean Christopherson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.

Paraphrasing the original changelog:

  Zap at least 10 shadow pages before releasing mmu_lock to reduce the
  overhead associated with re-acquiring the lock.

  Note: "10" is an arbitrary number, speculated to be high enough so
  that a vCPU isn't stuck zapping obsolete pages for an extended period,
  but small enough so that other vCPUs aren't starved waiting for
  mmu_lock.

This reverts commit 43d2b14b105fb00b8864c7b0ee7043cc1cc4a969.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu.c | 35 +++++++++--------------------------
 1 file changed, 9 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0bf20afc3e73..827414b12dbd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5670,12 +5670,12 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	return alloc_mmu_pages(vcpu);
 }
 
-
+#define BATCH_ZAP_PAGES	10
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
-	int ign;
+	int nr_zapped, batch = 0;
 
 restart:
 	list_for_each_entry_safe_reverse(sp, node,
@@ -5688,28 +5688,6 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 			break;
 
 		/*
-		 * Do not repeatedly zap a root page to avoid unnecessary
-		 * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
-		 * progress:
-		 *    vcpu 0                        vcpu 1
-		 *                         call vcpu_enter_guest():
-		 *                            1): handle KVM_REQ_MMU_RELOAD
-		 *                                and require mmu-lock to
-		 *                                load mmu
-		 * repeat:
-		 *    1): zap root page and
-		 *        send KVM_REQ_MMU_RELOAD
-		 *
-		 *    2): if (cond_resched_lock(mmu-lock))
-		 *
-		 *                            2): hold mmu-lock and load mmu
-		 *
-		 *                            3): see KVM_REQ_MMU_RELOAD bit
-		 *                                on vcpu->requests is set
-		 *                                then return 1 to call
-		 *                                vcpu_enter_guest() again.
-		 *            goto repeat;
-		 *
 		 * Since we are reversely walking the list and the invalid
 		 * list will be moved to the head, skip the invalid page
 		 * can help us to avoid the infinity list walking.
@@ -5717,14 +5695,19 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 		if (sp->role.invalid)
 			continue;
 
-		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+		if (batch >= BATCH_ZAP_PAGES &&
+		    (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
+			batch = 0;
 			kvm_mmu_commit_zap_page(kvm, &invalid_list);
 			cond_resched_lock(&kvm->mmu_lock);
 			goto restart;
 		}
 
-		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
+		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
+					       &nr_zapped)) {
+			batch += nr_zapped;
 			goto restart;
+		}
 	}
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-- 
2.22.0



* [PATCH 07/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages""
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (5 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 06/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch"" Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 08/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first"" Sean Christopherson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.

Paraphrasing the original changelog:

  Reload the mmu on all vCPUs after updating the generation number so
  that obsolete pages are not used by any vCPUs.  This allows collapsing
  all TLB flushes during obsolete page zapping into a single flush, as
  there is no need to flush when dropping mmu_lock (to reschedule).

  Note: a remote TLB flush is still needed before freeing the pages as
  other vCPUs may be doing a lockless shadow page walk.

Opportunistically improve the comments restored by the revert (the
code itself is a true revert).

This reverts commit f34d251d66ba263c077ed9d2bbd1874339a4c887.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 827414b12dbd..8c0648bbc7c1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5695,11 +5695,15 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 		if (sp->role.invalid)
 			continue;
 
+		/*
+		 * No need to flush the TLB since we're only zapping shadow
+		 * pages with an obsolete generation number and all vCPUS have
+		 * loaded a new root, i.e. the shadow pages being zapped cannot
+		 * be in active use by the guest.
+		 */
 		if (batch >= BATCH_ZAP_PAGES &&
-		    (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
+		    cond_resched_lock(&kvm->mmu_lock)) {
 			batch = 0;
-			kvm_mmu_commit_zap_page(kvm, &invalid_list);
-			cond_resched_lock(&kvm->mmu_lock);
 			goto restart;
 		}
 
@@ -5710,6 +5714,11 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 		}
 	}
 
+	/*
+	 * Trigger a remote TLB flush before freeing the page tables to ensure
+	 * KVM is not in the middle of a lockless shadow page table walk, which
+	 * may reference the pages.
+	 */
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 }
 
@@ -5728,6 +5737,16 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	trace_kvm_mmu_zap_all_fast(kvm);
 	kvm->arch.mmu_valid_gen++;
 
+	/*
+	 * Notify all vcpus to reload its shadow page table and flush TLB.
+	 * Then all vcpus will switch to new shadow page table with the new
+	 * mmu_valid_gen.
+	 *
+	 * Note: we need to do this under the protection of mmu_lock,
+	 * otherwise, vcpu would purge shadow page but miss tlb flush.
+	 */
+	kvm_reload_remote_mmus(kvm);
+
 	kvm_zap_obsolete_pages(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
-- 
2.22.0



* [PATCH 08/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first""
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (6 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 07/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages"" Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 09/11] KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call" Sean Christopherson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.

Paraphrasing the original changelog:

  Introduce a per-VM list to track obsolete shadow pages, i.e. pages
  which have been deleted from the mmu cache but haven't yet been freed.
  When page reclaiming is needed, zap/free the deleted pages first.

This reverts commit 52d5dedc79bdcbac2976159a172069618cf31be5.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c              | 22 +++++++++++++++++-----
 arch/x86/kvm/x86.c              |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ef378abac00f..6e4fa75351fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -862,6 +862,7 @@ struct kvm_arch {
 	 * Hash table of struct kvm_mmu_page.
 	 */
 	struct list_head active_mmu_pages;
+	struct list_head zapped_obsolete_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8c0648bbc7c1..84d916674529 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5674,7 +5674,6 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
-	LIST_HEAD(invalid_list);
 	int nr_zapped, batch = 0;
 
 restart:
@@ -5707,8 +5706,8 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 			goto restart;
 		}
 
-		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
-					       &nr_zapped)) {
+		if (__kvm_mmu_prepare_zap_page(kvm, sp,
+				&kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
 			batch += nr_zapped;
 			goto restart;
 		}
@@ -5719,7 +5718,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 	 * KVM is not in the middle of a lockless shadow page table walk, which
 	 * may reference the pages.
 	 */
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
 }
 
 /*
@@ -5751,6 +5750,11 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	spin_unlock(&kvm->mmu_lock);
 }
 
+static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
+{
+	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
+}
+
 static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
 			struct kvm_memory_slot *slot,
 			struct kvm_page_track_notifier_node *node)
@@ -6021,16 +6025,24 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		 * want to shrink a VM that only started to populate its MMU
 		 * anyway.
 		 */
-		if (!kvm->arch.n_used_mmu_pages)
+		if (!kvm->arch.n_used_mmu_pages &&
+		    !kvm_has_zapped_obsolete_pages(kvm))
 			continue;
 
 		idx = srcu_read_lock(&kvm->srcu);
 		spin_lock(&kvm->mmu_lock);
 
+		if (kvm_has_zapped_obsolete_pages(kvm)) {
+			kvm_mmu_commit_zap_page(kvm,
+			      &kvm->arch.zapped_obsolete_pages);
+			goto unlock;
+		}
+
 		if (prepare_zap_oldest_mmu_page(kvm, &invalid_list))
 			freed++;
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
 
+unlock:
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4cfd786d0b6..3d092b0f6bcb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9306,6 +9306,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
 
-- 
2.22.0



* [PATCH 09/11] KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (7 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 08/11] KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first"" Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 10/11] KVM: x86/mmu: Explicitly track only a single invalid mmu generation Sean Christopherson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Now that the fast invalidate mechanism has been reintroduced, restore
the performance tweaks for fast invalidation that existed prior to its
removal.

Paraphrasing the original changelog (commit 5ff0568374ed2 was itself a
partial revert):

  Don't force reloading the remote mmu when zapping an obsolete page, as
  an MMU_RELOAD request has already been issued by kvm_mmu_zap_all_fast()
  immediately after incrementing mmu_valid_gen, i.e. after marking pages
  obsolete.

This reverts commit 5ff0568374ed2e585376a3832857ade5daccd381.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 84d916674529..bce19918ca5a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2752,7 +2752,12 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	} else {
 		list_move(&sp->link, &kvm->arch.active_mmu_pages);
 
-		if (!sp->role.invalid)
+		/*
+		 * Obsolete pages cannot be used on any vCPUs, see the comment
+		 * in kvm_mmu_zap_all_fast().  Note, is_obsolete_sp() also
+		 * treats invalid shadow pages as being obsolete.
+		 */
+		if (!is_obsolete_sp(kvm, sp))
 			kvm_reload_remote_mmus(kvm);
 	}
 
-- 
2.22.0



* [PATCH 10/11] KVM: x86/mmu: Explicitly track only a single invalid mmu generation
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (8 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 09/11] KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call" Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13  2:46 ` [PATCH 11/11] KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero Sean Christopherson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Toggle mmu_valid_gen between '0' and '1' instead of blindly incrementing
the generation.  Because slots_lock is held for the entire duration of
zapping obsolete pages, it's impossible for there to be multiple invalid
generations associated with shadow pages at any given time.

Toggling between the two generations (valid vs. invalid) allows changing
mmu_valid_gen from an unsigned long to a u8, which reduces the size of
struct kvm_mmu_page from 160 to 152 bytes on 64-bit KVM, i.e. reduces
KVM's memory footprint by 8 bytes per shadow page.
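
As an aside on where the full 8 bytes come from, the sketch below uses a
simplified, hypothetical layout (not the real struct kvm_mmu_page) to show
that dropping the 8-byte field and tucking a u8 into the padding that
already follows a bool shrinks the struct by 8 bytes on x86-64.

  /* Simplified layout sketch; the field set is hypothetical. */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  struct list_stub { void *next, *prev; };

  struct sp_before {
          bool unsync;
          struct list_stub parent_ptes;
          unsigned long mmu_valid_gen;    /* occupies its own 8-byte slot */
  };

  struct sp_after {
          bool unsync;
          uint8_t mmu_valid_gen;          /* fits in the padding after 'unsync' */
          struct list_stub parent_ptes;
  };

  int main(void)
  {
          printf("before: %zu bytes, after: %zu bytes\n",
                 sizeof(struct sp_before), sizeof(struct sp_after));
          return 0;
  }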

Set sp->mmu_valid_gen before it is added to active_mmu_pages.
Functionally this has no effect as kvm_mmu_alloc_page() has a single
caller that sets sp->mmu_valid_gen soon thereafter, but visually it is
jarring to see a shadow page being added to the list without its
mmu_valid_gen first being set.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu.c              | 14 ++++++++++++--
 arch/x86/kvm/mmutrace.h         | 16 ++++++++--------
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e4fa75351fd..8912b04d4ae1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -320,6 +320,7 @@ struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
 	bool unsync;
+	u8 mmu_valid_gen;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -334,7 +335,6 @@ struct kvm_mmu_page {
 	int root_count;          /* Currently serving as active root */
 	unsigned int unsync_children;
 	struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
-	unsigned long mmu_valid_gen;
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
 
 #ifdef CONFIG_X86_32
@@ -856,7 +856,7 @@ struct kvm_arch {
 	unsigned long n_requested_mmu_pages;
 	unsigned long n_max_mmu_pages;
 	unsigned int indirect_shadow_pages;
-	unsigned long mmu_valid_gen;
+	u8 mmu_valid_gen;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	/*
 	 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bce19918ca5a..a7b14750cde9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2101,6 +2101,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 	 * depends on valid pages being added to the head of the list.  See
 	 * comments in kvm_zap_obsolete_pages().
 	 */
+	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
@@ -2537,7 +2538,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
 			flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
 	}
-	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	clear_page(sp->spt);
 	trace_kvm_mmu_get_page(sp, true);
 
@@ -5737,9 +5737,19 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
  */
 static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 {
+	lockdep_assert_held(&kvm->slots_lock);
+
 	spin_lock(&kvm->mmu_lock);
 	trace_kvm_mmu_zap_all_fast(kvm);
-	kvm->arch.mmu_valid_gen++;
+
+	/*
+	 * Toggle mmu_valid_gen between '0' and '1'.  Because slots_lock is
+	 * held for the entire duration of zapping obsolete pages, it's
+	 * impossible for there to be multiple invalid generations associated
+	 * with *valid* shadow pages at any given time, i.e. there is exactly
+	 * one valid generation and (at most) one invalid generation.
+	 */
+	kvm->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen ? 0 : 1;
 
 	/*
 	 * Notify all vcpus to reload its shadow page table and flush TLB.
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 1a063ba76281..7ca8831c7d1a 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -8,11 +8,11 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvmmmu
 
-#define KVM_MMU_PAGE_FIELDS			\
-	__field(unsigned long, mmu_valid_gen)	\
-	__field(__u64, gfn)			\
-	__field(__u32, role)			\
-	__field(__u32, root_count)		\
+#define KVM_MMU_PAGE_FIELDS		\
+	__field(__u8, mmu_valid_gen)	\
+	__field(__u64, gfn)		\
+	__field(__u32, role)		\
+	__field(__u32, root_count)	\
 	__field(bool, unsync)
 
 #define KVM_MMU_PAGE_ASSIGN(sp)				\
@@ -31,7 +31,7 @@
 								        \
 	role.word = __entry->role;					\
 									\
-	trace_seq_printf(p, "sp gen %lx gfn %llx l%u %u-byte q%u%s %s%s"\
+	trace_seq_printf(p, "sp gen %u gfn %llx l%u %u-byte q%u%s %s%s"	\
 			 " %snxe %sad root %u %s%c",			\
 			 __entry->mmu_valid_gen,			\
 			 __entry->gfn, role.level,			\
@@ -288,7 +288,7 @@ TRACE_EVENT(
 	TP_ARGS(kvm),
 
 	TP_STRUCT__entry(
-		__field(unsigned long, mmu_valid_gen)
+		__field(__u8, mmu_valid_gen)
 		__field(unsigned int, mmu_used_pages)
 	),
 
@@ -297,7 +297,7 @@ TRACE_EVENT(
 		__entry->mmu_used_pages = kvm->arch.n_used_mmu_pages;
 	),
 
-	TP_printk("kvm-mmu-valid-gen %lx used_pages %x",
+	TP_printk("kvm-mmu-valid-gen %u used_pages %x",
 		  __entry->mmu_valid_gen, __entry->mmu_used_pages
 	)
 );
-- 
2.22.0



* [PATCH 11/11] KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (9 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 10/11] KVM: x86/mmu: Explicitly track only a single invalid mmu generation Sean Christopherson
@ 2019-09-13  2:46 ` Sean Christopherson
  2019-09-13 22:11 ` [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Paolo Bonzini
  2019-09-16 17:07 ` Alex Williamson
  12 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2019-09-13  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, James Harvey, Alex Williamson

Do not skip invalid shadow pages when zapping obsolete pages if the
page's root_count has reached zero, in which case the page can be
immediately zapped and freed.

Update the comment accordingly.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a7b14750cde9..5e41b1f77a6d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5692,11 +5692,12 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 			break;
 
 		/*
-		 * Since we are reversely walking the list and the invalid
-		 * list will be moved to the head, skip the invalid page
-		 * can help us to avoid the infinity list walking.
+		 * Skip invalid pages with a non-zero root count, zapping pages
+		 * with a non-zero root count will never succeed, i.e. the page
+		 * will get thrown back on active_mmu_pages and we'll get stuck
+		 * in an infinite loop.
 		 */
-		if (sp->role.invalid)
+		if (sp->role.invalid && sp->root_count)
 			continue;
 
 		/*
-- 
2.22.0



* Re: [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (10 preceding siblings ...)
  2019-09-13  2:46 ` [PATCH 11/11] KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero Sean Christopherson
@ 2019-09-13 22:11 ` Paolo Bonzini
  2019-09-16 17:07 ` Alex Williamson
  12 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2019-09-13 22:11 UTC (permalink / raw)
  To: Sean Christopherson, Radim Krčmář
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, James Harvey, Alex Williamson

On 13/09/19 04:46, Sean Christopherson wrote:
> Restore the fast invalidate flow for zapping shadow pages and use it
> whenever vCPUs can be active in the VM.  This fixes (in theory, not yet
> confirmed) a regression reported by James Harvey where KVM can livelock
> in kvm_mmu_zap_all() when it's invoked in response to a memslot update.
> 
> The fast invalidate flow was removed as it was deemed to be unnecessary
> after its primary user, memslot flushing, was reworked to zap only the
> memslot in question instead of all shadow pages.  Unfortunately, zapping
> only the memslot being (re)moved during a memslot update introduced a
> regression for VMs with assigned devices.  Because we could not discern
> why zapping only the relevant memslot broke device assignment, or if the
> regression extended beyond device assignment, we reverted to zapping all
> shadow pages when a memslot is (re)moved.
> 
> The revert to "zap all" failed to account for subsequent changes that
> have been made to kvm_mmu_zap_all() between then and now.  Specifically,
> kvm_mmu_zap_all() now conditionally reschedules and drops mmu_lock if a
> reschedule is needed or if the lock is contended.  Dropping the lock
> allows other vCPUs to add shadow pages, and, with enough vCPUs, can cause
> kvm_mmu_zap_all() to get stuck in an infinite loop as it can never zap all
> pages before observing lock contention or the need to reschedule.
> 
> The reasoning behind having kvm_mmu_zap_all() conditionally reschedule was
> that it would only be used when the VM is inaccessible, e.g. when its
> mm_struct is dying or when the VM itself is being destroyed.  In that case,
> playing nice with the rest of the kernel instead of hogging cycles to free
> unused shadow pages made sense.
> 
> Since it's unlikely we'll root cause the device assignment regression any
> time soon, and since simply removing the conditional rescheduling isn't
> guaranteed to return us to a known good state, restore the fast invalidate
> flow for zapping on memslot updates, including mmio generation wraparound.
> Opportunistically tack on a bug fix and a couple of enhancements.
> 
> Alex and James, it probably goes without saying... please test, especially
> patch 01/11 as a standalone patch as that'll likely need to be applied to
> stable branches, assuming it works.  Thanks!
> 
> Sean Christopherson (11):
>   KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
>   KVM: x86/mmu: Treat invalid shadow pages as obsolete
>   KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow
>     page related tracepoints""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for
>     kvm_mmu_invalidate_all_pages""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap
>     all pages""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete
>     page first""
>   KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
>   KVM: x86/mmu: Explicitly track only a single invalid mmu generation
>   KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero
> 
>  arch/x86/include/asm/kvm_host.h |   4 +-
>  arch/x86/kvm/mmu.c              | 154 ++++++++++++++++++++++++++++----
>  arch/x86/kvm/mmutrace.h         |  42 +++++++--
>  arch/x86/kvm/x86.c              |   1 +
>  4 files changed, 173 insertions(+), 28 deletions(-)
> 

Thanks, I'm testing patch 1 and should send a pull request to Linus
tomorrow morning as soon as I get the results.

Paolo


* Re: [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow
  2019-09-13  2:46 [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Sean Christopherson
                   ` (11 preceding siblings ...)
  2019-09-13 22:11 ` [PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow Paolo Bonzini
@ 2019-09-16 17:07 ` Alex Williamson
  12 siblings, 0 replies; 14+ messages in thread
From: Alex Williamson @ 2019-09-16 17:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, James Harvey

On Thu, 12 Sep 2019 19:46:01 -0700
Sean Christopherson <sean.j.christopherson@intel.com> wrote:

> Restore the fast invalidate flow for zapping shadow pages and use it
> whenever vCPUs can be active in the VM.  This fixes (in theory, not yet
> confirmed) a regression reported by James Harvey where KVM can livelock
> in kvm_mmu_zap_all() when it's invoked in response to a memslot update.
> 
> The fast invalidate flow was removed as it was deemed to be unnecessary
> after its primary user, memslot flushing, was reworked to zap only the
> memslot in question instead of all shadow pages.  Unfortunately, zapping
> only the memslot being (re)moved during a memslot update introduced a
> regression for VMs with assigned devices.  Because we could not discern
> why zapping only the relevant memslot broke device assignment, or if the
> regression extended beyond device assignment, we reverted to zapping all
> shadow pages when a memslot is (re)moved.
> 
> The revert to "zap all" failed to account for subsequent changes that
> have been made to kvm_mmu_zap_all() between then and now.  Specifically,
> kvm_mmu_zap_all() now conditionally reschedules and drops mmu_lock if a
> reschedule is needed or if the lock is contended.  Dropping the lock
> allows other vCPUs to add shadow pages, and, with enough vCPUs, can cause
> kvm_mmu_zap_all() to get stuck in an infinite loop as it can never zap all
> pages before observing lock contention or the need to reschedule.
> 
> The reasoning behind having kvm_mmu_zap_all() conditionally reschedule was
> that it would only be used when the VM is inaccessible, e.g. when its
> mm_struct is dying or when the VM itself is being destroyed.  In that case,
> playing nice with the rest of the kernel instead of hogging cycles to free
> unused shadow pages made sense.
> 
> Since it's unlikely we'll root cause the device assignment regression any
> time soon, and since simply removing the conditional rescheduling isn't
> guaranteed to return us to a known good state, restore the fast invalidate
> flow for zapping on memslot updates, including mmio generation wraparound.
> Opportunistically tack on a bug fix and a couple of enhancements.
> 
> Alex and James, it probably goes without saying... please test, especially
> patch 01/11 as a standalone patch as that'll likely need to be applied to
> stable branches, assuming it works.  Thanks!

It looks like Paolo already included patch 01/11 in v5.3; I tested that
and it behaves ok for the GPU assignment Windows issue.  I applied the
remaining 10 patches on v5.3 and tested those separately.  They also
behave well for this test case.

Tested-by: Alex Williamson <alex.williamson@redhat.com>

Thanks,
Alex 

> 
> Sean Christopherson (11):
>   KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
>   KVM: x86/mmu: Treat invalid shadow pages as obsolete
>   KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow
>     page related tracepoints""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for
>     kvm_mmu_invalidate_all_pages""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap
>     all pages""
>   KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete
>     page first""
>   KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
>   KVM: x86/mmu: Explicitly track only a single invalid mmu generation
>   KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero
> 
>  arch/x86/include/asm/kvm_host.h |   4 +-
>  arch/x86/kvm/mmu.c              | 154 ++++++++++++++++++++++++++++----
>  arch/x86/kvm/mmutrace.h         |  42 +++++++--
>  arch/x86/kvm/x86.c              |   1 +
>  4 files changed, 173 insertions(+), 28 deletions(-)
> 


