* [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups
@ 2021-09-18  0:56 Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 01/10] KVM: X86: Fix missed remote tlb flush in rmap_write_protect() Lai Jiangshan
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

The first two patches fix two possible long-standing bugs, and the
others are just cleanups.

Changes from [V1]:
	The two fixes were reworked as Sean suggested, which
	triggered a different set of cleanups (patches 3-6).
	Patch 7 (V1's patch 3) was also updated as Sean suggested.
	Patches 8-10, which are unrelated to the fixes, are unchanged.
	V1's patch 7 was dropped.

[V1]: https://lore.kernel.org/lkml/20210824075524.3354-1-jiangshanlai@gmail.com/

Lai Jiangshan (10):
  KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
  KVM: X86: Synchronize the shadow pagetable before link it
  KVM: Remove tlbs_dirty
  KVM: X86: Don't flush current tlb on shadow page modification
  KVM: X86: Remove kvm_mmu_flush_or_zap()
  KVM: X86: Change kvm_sync_page() to return true when remote flush is
    needed
  KVM: X86: Zap the invalid list after remote tlb flushing
  KVM: X86: Remove FNAME(update_pte)
  KVM: X86: Don't unsync pagetables when speculative
  KVM: X86: Don't check unsync if the original spte is writible

 arch/x86/kvm/mmu/mmu.c          | 61 ++++++++++++---------------
 arch/x86/kvm/mmu/mmu_internal.h |  3 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 74 +++++++++++++++------------------
 arch/x86/kvm/mmu/spte.c         |  6 +--
 arch/x86/kvm/mmu/tdp_mmu.c      |  1 -
 include/linux/kvm_host.h        |  1 -
 virt/kvm/kvm_main.c             |  9 +---
 7 files changed, 66 insertions(+), 89 deletions(-)

-- 
2.19.1.6.gb485710b



* [PATCH V2 01/10] KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 02/10] KVM: X86: Synchronize the shadow pagetable before link it Lai Jiangshan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Xiao Guangrong, Marcelo Tosatti, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

When kvm->tlbs_dirty > 0, some rmaps might have been deleted
without flushing the TLB remotely after kvm_sync_page().  If @gfn
was writable before and its rmap was deleted in kvm_sync_page(),
and if the TLB entry is still present in a remote running vCPU,
the @gfn is not safely write-protected.

To fix the problem, make kvm_sync_page() do the remote flush
immediately when it is needed.
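
As an illustration, the sync path now accumulates the flush
requirement and flushes before returning, instead of deferring it via
tlbs_dirty (simplified sketch of FNAME(sync_page), not the exact code):

	int set_spte_ret = 0;

	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
		...
		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
			/* an spte was dropped: remote TLBs must be flushed */
			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
			continue;
		}
		...
	}

	/* flush before returning so no vCPU keeps a stale writable entry */
	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
		kvm_flush_remote_tlbs(vcpu->kvm);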

Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
Changed from V1:
	force remote flush timely instead of increasing tlbs_dirty.

 arch/x86/kvm/mmu/paging_tmpl.h | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 72f358613786..5962d4f8a72e 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1038,14 +1038,6 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr,
  * Using the cached information from sp->gfns is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
- *
- * Note:
- *   We should flush all tlbs if spte is dropped even though guest is
- *   responsible for it. Since if we don't, kvm_mmu_notifier_invalidate_page
- *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
- *   used by guest then tlbs are not flushed, so guest is allowed to access the
- *   freed pages.
- *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -1098,13 +1090,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			return 0;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-			/*
-			 * Update spte before increasing tlbs_dirty to make
-			 * sure no tlb flush is lost after spte is zapped; see
-			 * the comments in kvm_flush_remote_tlbs().
-			 */
-			smp_wmb();
-			vcpu->kvm->tlbs_dirty++;
+			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 			continue;
 		}
 
@@ -1119,12 +1105,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		if (gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
-			/*
-			 * The same as above where we are doing
-			 * prefetch_invalid_gpte().
-			 */
-			smp_wmb();
-			vcpu->kvm->tlbs_dirty++;
+			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 			continue;
 		}
 
-- 
2.19.1.6.gb485710b



* [PATCH V2 02/10] KVM: X86: Synchronize the shadow pagetable before link it
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 01/10] KVM: X86: Fix missed remote tlb flush in rmap_write_protect() Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-23 14:36   ` Paolo Bonzini
  2021-09-18  0:56 ` [PATCH V2 03/10] KVM: Remove tlbs_dirty Lai Jiangshan
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Marcelo Tosatti, Avi Kivity, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

Per the SDM, when a gpte is changed from non-present to present, the
guest doesn't need to flush the TLB.  So the host must synchronize an
unsync sp before linking it.  Otherwise the guest could use a stale
mapping.

For example: the guest first modifies a level-1 pagetable, and then
links its parent into a new location where the original gpte was
non-present.  The guest can then access the remapped area without
flushing the TLB.  This behavior is allowed by the SDM, but the host
KVM MMU gets it wrong.
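
Illustrative pseudocode of the guest sequence (the names are made up
for the example):

	pte_l1[i] = new_mapping;	 /* 1. modify a level-1 pagetable  */
	pde[j] = __pa(pte_l1) | PRESENT; /* 2. link it where the old gpte
					       was non-present            */
	x = *remapped_va;		 /* 3. use the mapping at once; the
					       SDM requires no TLB flush for
					       a non-present -> present
					       transition                 */

If KVM links the shadow page for pte_l1 at step 2 without
synchronizing it first, step 3 can observe the stale pre-step-1
translation.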

Fixes: 4731d4c7a077 ("KVM: MMU: out of sync shadow core")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
Changed from V1:
	Don't loop, but just return when it needs to break.

 arch/x86/kvm/mmu/mmu.c         | 15 ++++++++-------
 arch/x86/kvm/mmu/paging_tmpl.h | 22 ++++++++++++++++++++++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 26f6bd238a77..3c1b069a7bcf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2024,8 +2024,8 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 	} while (!sp->unsync_children);
 }
 
-static void mmu_sync_children(struct kvm_vcpu *vcpu,
-			      struct kvm_mmu_page *parent)
+static int mmu_sync_children(struct kvm_vcpu *vcpu,
+			     struct kvm_mmu_page *parent, bool can_yield)
 {
 	int i;
 	struct kvm_mmu_page *sp;
@@ -2052,12 +2052,16 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 		}
 		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
 			kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+			if (!can_yield)
+				return -EINTR;
+
 			cond_resched_rwlock_write(&vcpu->kvm->mmu_lock);
 			flush = false;
 		}
 	}
 
 	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+	return 0;
 }
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
@@ -2143,9 +2147,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 
-		if (sp->unsync_children)
-			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-
 		__clear_sp_write_flooding_count(sp);
 
 trace_get_page:
@@ -3642,7 +3643,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		write_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
 
-		mmu_sync_children(vcpu, sp);
+		mmu_sync_children(vcpu, sp, true);
 
 		kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
 		write_unlock(&vcpu->kvm->mmu_lock);
@@ -3658,7 +3659,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		if (IS_VALID_PAE_ROOT(root)) {
 			root &= PT64_BASE_ADDR_MASK;
 			sp = to_shadow_page(root);
-			mmu_sync_children(vcpu, sp);
+			mmu_sync_children(vcpu, sp, true);
 		}
 	}
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5962d4f8a72e..87374cfd82be 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -704,6 +704,28 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			access = gw->pt_access[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
 					      it.level-1, false, access);
+			/*
+			 * We must synchronize the pagetable before linking it
+			 * because the guest doesn't need to flush tlb when
+			 * the gpte is changed from non-present to present.
+			 * Otherwise, the guest may use a wrong mapping.
+			 *
+			 * For PG_LEVEL_4K, kvm_mmu_get_page() has already
+			 * synchronized it transiently via kvm_sync_page().
+			 *
+			 * For a higher level pagetable, we synchronize it
+			 * via the slower mmu_sync_children().  If it needs to
+			 * break, it returns RET_PF_RETRY and will retry on
+			 * the next #PF; some progress has already been made.
+			 *
+			 * It also makes a KVM_REQ_MMU_SYNC request if @sp is
+			 * linked at a different addr, to expedite the sync.
+			 */
+			if (sp->unsync_children &&
+			    mmu_sync_children(vcpu, sp, false)) {
+				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+				return RET_PF_RETRY;
+			}
 		}
 
 		/*
-- 
2.19.1.6.gb485710b



* [PATCH V2 03/10] KVM: Remove tlbs_dirty
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 01/10] KVM: X86: Fix missed remote tlb flush in rmap_write_protect() Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 02/10] KVM: X86: Synchronize the shadow pagetable before link it Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-23 15:23   ` Paolo Bonzini
  2021-09-18  0:56 ` [PATCH V2 04/10] KVM: X86: Don't flush current tlb on shadow page modification Lai Jiangshan
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Lai Jiangshan, Paolo Bonzini, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

There is no user of tlbs_dirty.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/kvm_host.h | 1 -
 virt/kvm/kvm_main.c      | 9 +--------
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e4d712e9f760..3b7846cd0637 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -608,7 +608,6 @@ struct kvm {
 	unsigned long mmu_notifier_range_start;
 	unsigned long mmu_notifier_range_end;
 #endif
-	long tlbs_dirty;
 	struct list_head devices;
 	u64 manual_dirty_log_protect;
 	struct dentry *debugfs_dentry;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3e67c93ca403..6d6be42ec78d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -312,12 +312,6 @@ EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
 #ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	/*
-	 * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
-	 * kvm_make_all_cpus_request.
-	 */
-	long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
-
 	/*
 	 * We want to publish modifications to the page tables before reading
 	 * mode. Pairs with a memory barrier in arch-specific code.
@@ -332,7 +326,6 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 	if (!kvm_arch_flush_remote_tlb(kvm)
 	    || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.generic.remote_tlb_flush;
-	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 #endif
@@ -537,7 +530,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 		}
 	}
 
-	if (range->flush_on_ret && (ret || kvm->tlbs_dirty))
+	if (range->flush_on_ret && ret)
 		kvm_flush_remote_tlbs(kvm);
 
 	if (locked)
-- 
2.19.1.6.gb485710b



* [PATCH V2 04/10] KVM: X86: Don't flush current tlb on shadow page modification
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (2 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 03/10] KVM: Remove tlbs_dirty Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 05/10] KVM: X86: Remove kvm_mmu_flush_or_zap() Lai Jiangshan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

After any shadow page modification, flushing the TLB only on the
current vCPU makes little sense, since other vCPUs' TLBs might still
be stale.

In other words, if any TLB flush is mandatory after a shadow page
modification, SET_SPTE_NEED_REMOTE_TLB_FLUSH or remote_flush should
be set and the TLBs of all vCPUs should be flushed.  There is no
point in flushing only the current TLB, except when the request stems
from the vCPU's or pCPU's own activities.

If there is a bug where a mandatory TLB flush is required but
SET_SPTE_NEED_REMOTE_TLB_FLUSH/remote_flush fails to be set, this
patch would expose the bug in a more destructive way.  The related
code paths have been checked and no missing
SET_SPTE_NEED_REMOTE_TLB_FLUSH has been found so far.

Now that the sync-page code has been changed to flush the TLB timely,
there is no optional TLB flushing left after shadow page
modification, so the local flushing code can simply be removed.
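
With the local flushes gone, the tail of mmu_set_spte() is roughly
the following (simplified sketch):

	if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
		if (write_fault)
			ret = RET_PF_EMULATE;
	}

	/* a remote flush is the only TLB flush left on this path */
	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
						   KVM_PAGES_PER_HPAGE(level));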

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c     | 5 -----
 arch/x86/kvm/mmu/tdp_mmu.c | 1 -
 2 files changed, 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c1b069a7bcf..f40087ee2704 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1934,9 +1934,6 @@ static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
 {
 	if (kvm_mmu_remote_flush_or_zap(vcpu->kvm, invalid_list, remote_flush))
 		return;
-
-	if (local_flush)
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 #ifdef CONFIG_KVM_MMU_AUDIT
@@ -2144,7 +2141,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 				break;
 
 			WARN_ON(!list_empty(&invalid_list));
-			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 		}
 
 		__clear_sp_write_flooding_count(sp);
@@ -2751,7 +2747,6 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
 		if (write_fault)
 			ret = RET_PF_EMULATE;
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	}
 
 	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 548393c2bfe9..d5339fee6f2d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -958,7 +958,6 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
 		if (fault->write)
 			ret = RET_PF_EMULATE;
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	}
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
-- 
2.19.1.6.gb485710b



* [PATCH V2 05/10] KVM: X86: Remove kvm_mmu_flush_or_zap()
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (3 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 04/10] KVM: X86: Don't flush current tlb on shadow page modification Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 06/10] KVM: X86: Change kvm_sync_page() to return true when remote flush is needed Lai Jiangshan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

Since local_flush no longer has any effect, kvm_mmu_flush_or_zap() can
be removed and kvm_mmu_remote_flush_or_zap() used instead.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c | 26 ++++++--------------------
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f40087ee2704..9aba5d93a747 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1928,14 +1928,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 	return true;
 }
 
-static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
-				 struct list_head *invalid_list,
-				 bool remote_flush, bool local_flush)
-{
-	if (kvm_mmu_remote_flush_or_zap(vcpu->kvm, invalid_list, remote_flush))
-		return;
-}
-
 #ifdef CONFIG_KVM_MMU_AUDIT
 #include "mmu_audit.c"
 #else
@@ -2029,7 +2021,6 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 	struct mmu_page_path parents;
 	struct kvm_mmu_pages pages;
 	LIST_HEAD(invalid_list);
-	bool flush = false;
 
 	while (mmu_unsync_walk(parent, &pages)) {
 		bool protected = false;
@@ -2039,25 +2030,23 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 
 		if (protected) {
 			kvm_flush_remote_tlbs(vcpu->kvm);
-			flush = false;
 		}
 
 		for_each_sp(pages, sp, parents, i) {
 			kvm_unlink_unsync_page(vcpu->kvm, sp);
-			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
+			kvm_sync_page(vcpu, sp, &invalid_list);
 			mmu_pages_clear_parents(&parents);
 		}
 		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
-			kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, false);
 			if (!can_yield)
 				return -EINTR;
 
 			cond_resched_rwlock_write(&vcpu->kvm->mmu_lock);
-			flush = false;
 		}
 	}
 
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, false);
 	return 0;
 }
 
@@ -5146,7 +5135,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	LIST_HEAD(invalid_list);
 	u64 entry, gentry, *spte;
 	int npte;
-	bool remote_flush, local_flush;
+	bool flush = false;
 
 	/*
 	 * If we don't have indirect shadow pages, it means no page is
@@ -5155,8 +5144,6 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
 		return;
 
-	remote_flush = local_flush = false;
-
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
 	/*
@@ -5185,18 +5172,17 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		if (!spte)
 			continue;
 
-		local_flush = true;
 		while (npte--) {
 			entry = *spte;
 			mmu_page_zap_pte(vcpu->kvm, sp, spte, NULL);
 			if (gentry && sp->role.level != PG_LEVEL_4K)
 				++vcpu->kvm->stat.mmu_pde_zapped;
 			if (need_remote_flush(entry, *spte))
-				remote_flush = true;
+				flush = true;
 			++spte;
 		}
 	}
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, remote_flush, local_flush);
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
-- 
2.19.1.6.gb485710b



* [PATCH V2 06/10] KVM: X86: Change kvm_sync_page() to return true when remote flush is needed
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (4 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 05/10] KVM: X86: Remove kvm_mmu_flush_or_zap() Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 07/10] KVM: X86: Zap the invalid list after remote tlb flushing Lai Jiangshan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

Currently kvm_sync_page() returns true when there is any present spte,
but the return value is ignored by all callers.

Change kvm_sync_page() to return true when a remote flush is needed,
and change mmu->sync_page() to no longer flush directly, so that
callers can combine and reduce remote flush requests.
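
The intended caller pattern then becomes (sketch, mirroring
mmu_sync_children() after this patch):

	bool flush = false;

	for_each_sp(pages, sp, parents, i) {
		kvm_unlink_unsync_page(vcpu->kvm, sp);
		/* kvm_sync_page() no longer flushes by itself */
		flush |= kvm_sync_page(vcpu, sp, &invalid_list);
		mmu_pages_clear_parents(&parents);
	}

	/* one combined zap-and-flush instead of one flush per synced sp */
	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);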

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c         | 21 +++++++++++++--------
 arch/x86/kvm/mmu/paging_tmpl.h | 21 ++++++++++-----------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9aba5d93a747..2f3f47dc96b0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1792,7 +1792,7 @@ static void mark_unsync(u64 *spte)
 static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
 			       struct kvm_mmu_page *sp)
 {
-	return 0;
+	return -1;
 }
 
 #define KVM_PAGE_ARRAY_NR 16
@@ -1906,12 +1906,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			 struct list_head *invalid_list)
 {
-	if (vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
+	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+
+	if (ret < 0) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 		return false;
 	}
 
-	return true;
+	return !!ret;
 }
 
 static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
@@ -2021,6 +2023,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 	struct mmu_page_path parents;
 	struct kvm_mmu_pages pages;
 	LIST_HEAD(invalid_list);
+	bool flush = false;
 
 	while (mmu_unsync_walk(parent, &pages)) {
 		bool protected = false;
@@ -2030,23 +2033,25 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 
 		if (protected) {
 			kvm_flush_remote_tlbs(vcpu->kvm);
+			flush = false;
 		}
 
 		for_each_sp(pages, sp, parents, i) {
 			kvm_unlink_unsync_page(vcpu->kvm, sp);
-			kvm_sync_page(vcpu, sp, &invalid_list);
+			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
 			mmu_pages_clear_parents(&parents);
 		}
 		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
-			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, false);
+			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
 			if (!can_yield)
 				return -EINTR;
 
 			cond_resched_rwlock_write(&vcpu->kvm->mmu_lock);
+			flush = false;
 		}
 	}
 
-	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, false);
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
 	return 0;
 }
 
@@ -2130,6 +2135,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 				break;
 
 			WARN_ON(!list_empty(&invalid_list));
+			kvm_flush_remote_tlbs(vcpu->kvm);
 		}
 
 		__clear_sp_write_flooding_count(sp);
@@ -4128,7 +4134,7 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu)
 }
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
-			   unsigned int access, int *nr_present)
+			   unsigned int access)
 {
 	if (unlikely(is_mmio_spte(*sptep))) {
 		if (gfn != get_mmio_spte_gfn(*sptep)) {
@@ -4136,7 +4142,6 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			return true;
 		}
 
-		(*nr_present)++;
 		mark_mmio_spte(vcpu, sptep, gfn, access);
 		return true;
 	}
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 87374cfd82be..c3edbc0f06b3 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1060,11 +1060,16 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr,
  * Using the cached information from sp->gfns is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
+ *
+ * Returns
+ * < 0: the sp should be zapped
+ *   0: the sp is synced and no tlb flushing is required
+ * > 0: the sp is synced and tlb flushing is required
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
-	int i, nr_present = 0;
+	int i;
 	bool host_writable;
 	gpa_t first_pte_gpa;
 	int set_spte_ret = 0;
@@ -1092,7 +1097,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 */
 	if (WARN_ON_ONCE(sp->role.direct ||
 			 (sp->role.word ^ mmu_role.word) & ~sync_role_ign.word))
-		return 0;
+		return -1;
 
 	first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
 
@@ -1109,7 +1114,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
 					       sizeof(pt_element_t)))
-			return 0;
+			return -1;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
 			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
@@ -1121,8 +1126,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		pte_access &= FNAME(gpte_access)(gpte);
 		FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
-		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access,
-		      &nr_present))
+		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
 			continue;
 
 		if (gfn != sp->gfns[i]) {
@@ -1131,8 +1135,6 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			continue;
 		}
 
-		nr_present++;
-
 		host_writable = sp->spt[i] & shadow_host_writable_mask;
 
 		set_spte_ret |= set_spte(vcpu, &sp->spt[i],
@@ -1141,10 +1143,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 					 true, false, host_writable);
 	}
 
-	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
-		kvm_flush_remote_tlbs(vcpu->kvm);
-
-	return nr_present;
+	return set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH;
 }
 
 #undef pt_element_t
-- 
2.19.1.6.gb485710b



* [PATCH V2 07/10] KVM: X86: Zap the invalid list after remote tlb flushing
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (5 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 06/10] KVM: X86: Change kvm_sync_page() to return true when remote flush is needed Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 08/10] KVM: X86: Remove FNAME(update_pte) Lai Jiangshan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

In mmu_sync_children(), the invalid list can be zapped as part of the
remote TLB flush.  Emptying the invalid list as soon as possible may
save a remote TLB flush in some cases.
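
For reference, kvm_mmu_remote_flush_or_zap() is roughly (sketch):

	static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
						struct list_head *invalid_list,
						bool remote_flush)
	{
		if (!remote_flush && list_empty(invalid_list))
			return false;

		if (!list_empty(invalid_list))
			/* the commit path flushes remote TLBs itself */
			kvm_mmu_commit_zap_page(kvm, invalid_list);
		else
			kvm_flush_remote_tlbs(kvm);
		return true;
	}

so passing the invalid list here both flushes the TLBs and empties
the list in a single step.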

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2f3f47dc96b0..ff52e8354148 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2032,7 +2032,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 			protected |= rmap_write_protect(vcpu, sp->gfn);
 
 		if (protected) {
-			kvm_flush_remote_tlbs(vcpu->kvm);
+			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, true);
 			flush = false;
 		}
 
-- 
2.19.1.6.gb485710b



* [PATCH V2 08/10] KVM: X86: Remove FNAME(update_pte)
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (6 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 07/10] KVM: X86: Zap the invalid list after remote tlb flushing Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  1:49   ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 09/10] KVM: X86: Don't unsync pagetables when speculative Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 10/10] KVM: X86: Don't check unsync if the original spte is writible Lai Jiangshan
  9 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

Its sole caller is changed to use FNAME(prefetch_gpte) directly.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c3edbc0f06b3..ca5fdd07cfa2 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -589,14 +589,6 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	return true;
 }
 
-static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			      u64 *spte, const void *pte)
-{
-	pt_element_t gpte = *(const pt_element_t *)pte;
-
-	FNAME(prefetch_gpte)(vcpu, sp, spte, gpte, false);
-}
-
 static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
 				struct guest_walker *gw, int level)
 {
@@ -1001,7 +993,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 						       sizeof(pt_element_t)))
 				break;
 
-			FNAME(update_pte)(vcpu, sp, sptep, &gpte);
+			FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
 		}
 
 		if (!sp->unsync_children)
-- 
2.19.1.6.gb485710b



* [PATCH V2 09/10] KVM: X86: Don't unsync pagetables when speculative
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (7 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 08/10] KVM: X86: Remove FNAME(update_pte) Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  2021-09-18  0:56 ` [PATCH V2 10/10] KVM: X86: Don't check unsync if the original spte is writible Lai Jiangshan
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

We'd better unsync the pagetable only when there has just been a real
write fault on a level-1 pagetable, not on a speculative access.
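
With this change, mmu_try_to_unsync_pages() backs off on speculative
accesses instead of unsyncing (sketch of the resulting loop):

	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
		if (!can_unsync)
			return -EPERM;	/* must stay write-protected */

		if (sp->unsync)
			continue;	/* already unsync, nothing to do */

		if (speculative)
			return -EEXIST;	/* don't unsync speculatively; the
					   spte is just left non-writable */
		...
		kvm_unsync_page(vcpu, sp);
	}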

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/mmu.c          | 6 +++++-
 arch/x86/kvm/mmu/mmu_internal.h | 3 ++-
 arch/x86/kvm/mmu/spte.c         | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ff52e8354148..e3213d804647 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2577,7 +2577,8 @@ static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
  * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
  * be write-protected.
  */
-int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
+			    bool speculative)
 {
 	struct kvm_mmu_page *sp;
 	bool locked = false;
@@ -2603,6 +2604,9 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
 		if (sp->unsync)
 			continue;
 
+		if (speculative)
+			return -EEXIST;
+
 		/*
 		 * TDP MMU page faults require an additional spinlock as they
 		 * run with mmu_lock held for read, not write, and the unsync
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 658d8d228d43..f5d8be787993 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -116,7 +116,8 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
 	       kvm_x86_ops.cpu_dirty_log_size;
 }
 
-int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync);
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
+			    bool speculative);
 
 void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 3e97cdb13eb7..b68a580f3510 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -159,7 +159,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 		 * e.g. it's write-tracked (upper-level SPs) or has one or more
 		 * shadow pages and unsync'ing pages is not allowed.
 		 */
-		if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync)) {
+		if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync, speculative)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
 			ret |= SET_SPTE_WRITE_PROTECTED_PT;
-- 
2.19.1.6.gb485710b



* [PATCH V2 10/10] KVM: X86: Don't check unsync if the original spte is writible
  2021-09-18  0:56 [PATCH V2 00/10] KVM: X86: MMU: misc fixes and cleanups Lai Jiangshan
                   ` (8 preceding siblings ...)
  2021-09-18  0:56 ` [PATCH V2 09/10] KVM: X86: Don't unsync pagetables when speculative Lai Jiangshan
@ 2021-09-18  0:56 ` Lai Jiangshan
  9 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  0:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, kvm

From: Lai Jiangshan <laijs@linux.alibaba.com>

If the original spte is writable, the target gfn cannot be the gfn of
a synchronized shadow page, so it can continue to be writable without
checking for unsync shadow pages.

When !can_unsync, speculative must be false.  So when the
"!can_unsync" check is removed, the "out" label needs to be moved up,
so that mark_spte_for_access_track() is no longer skipped on that
path.

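After the move, the relevant control flow in make_spte() is (sketch):

	if (is_writable_pte(old_spte))
		goto out;	/* may now be reached with speculative == true */
	...
out:
	if (speculative)
		spte = mark_spte_for_access_track(spte); /* no longer skipped */
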
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 arch/x86/kvm/mmu/spte.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index b68a580f3510..a33c581aabd6 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -150,7 +150,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 		 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
 		 * Same reasoning can be applied to dirty page accounting.
 		 */
-		if (!can_unsync && is_writable_pte(old_spte))
+		if (is_writable_pte(old_spte))
 			goto out;
 
 		/*
@@ -171,10 +171,10 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	if (pte_access & ACC_WRITE_MASK)
 		spte |= spte_shadow_dirty_mask(spte);
 
+out:
 	if (speculative)
 		spte = mark_spte_for_access_track(spte);
 
-out:
 	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
 		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
-- 
2.19.1.6.gb485710b



* Re: [PATCH V2 08/10] KVM: X86: Remove FNAME(update_pte)
  2021-09-18  0:56 ` [PATCH V2 08/10] KVM: X86: Remove FNAME(update_pte) Lai Jiangshan
@ 2021-09-18  1:49   ` Lai Jiangshan
  0 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-18  1:49 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, kvm



On 2021/9/18 08:56, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Its sole caller is changed to use FNAME(prefetch_gpte) directly.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---

Sorry, I forgot to add Maxim's Reviewed-by from V1:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

>   arch/x86/kvm/mmu/paging_tmpl.h | 10 +---------
>   1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index c3edbc0f06b3..ca5fdd07cfa2 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -589,14 +589,6 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>   	return true;
>   }
>   
> -static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> -			      u64 *spte, const void *pte)
> -{
> -	pt_element_t gpte = *(const pt_element_t *)pte;
> -
> -	FNAME(prefetch_gpte)(vcpu, sp, spte, gpte, false);
> -}
> -
>   static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
>   				struct guest_walker *gw, int level)
>   {
> @@ -1001,7 +993,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>   						       sizeof(pt_element_t)))
>   				break;
>   
> -			FNAME(update_pte)(vcpu, sp, sptep, &gpte);
> +			FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
>   		}
>   
>   		if (!sp->unsync_children)
> 


* Re: [PATCH V2 02/10] KVM: X86: Synchronize the shadow pagetable before link it
  2021-09-18  0:56 ` [PATCH V2 02/10] KVM: X86: Synchronize the shadow pagetable before link it Lai Jiangshan
@ 2021-09-23 14:36   ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2021-09-23 14:36 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel
  Cc: Lai Jiangshan, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Marcelo Tosatti,
	Avi Kivity, kvm

On 18/09/21 02:56, Lai Jiangshan wrote:
> +			 * It also makes KVM_REQ_MMU_SYNC request if the @sp
> +			 * is linked on a different addr to expedite it.
> +			 */
> +			if (sp->unsync_children &&
> +			    mmu_sync_children(vcpu, sp, false)) {
> +				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> +				return RET_PF_RETRY;
> +			}
>   		}

I think we can put the sync in mmu_sync_children:

-			if (!can_yield)
+			if (!can_yield) {
+				/*
+				 * Some progress has been made so the caller
+				 * can simply retry, but we can expedite the
+				 * process by forcing a sync to happen.
+				 */
+				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
  				return -EINTR;
+			}

Paolo



* Re: [PATCH V2 03/10] KVM: Remove tlbs_dirty
  2021-09-18  0:56 ` [PATCH V2 03/10] KVM: Remove tlbs_dirty Lai Jiangshan
@ 2021-09-23 15:23   ` Paolo Bonzini
  2021-09-24 15:40     ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2021-09-23 15:23 UTC (permalink / raw)
  To: Lai Jiangshan, linux-kernel; +Cc: Lai Jiangshan, kvm

On 18/09/21 02:56, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> There is no user of tlbs_dirty.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>   include/linux/kvm_host.h | 1 -
>   virt/kvm/kvm_main.c      | 9 +--------
>   2 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e4d712e9f760..3b7846cd0637 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -608,7 +608,6 @@ struct kvm {
>   	unsigned long mmu_notifier_range_start;
>   	unsigned long mmu_notifier_range_end;
>   #endif
> -	long tlbs_dirty;
>   	struct list_head devices;
>   	u64 manual_dirty_log_protect;
>   	struct dentry *debugfs_dentry;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3e67c93ca403..6d6be42ec78d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -312,12 +312,6 @@ EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
>   #ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
>   void kvm_flush_remote_tlbs(struct kvm *kvm)
>   {
> -	/*
> -	 * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
> -	 * kvm_make_all_cpus_request.
> -	 */
> -	long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
> -
>   	/*
>   	 * We want to publish modifications to the page tables before reading
>   	 * mode. Pairs with a memory barrier in arch-specific code.
> @@ -332,7 +326,6 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>   	if (!kvm_arch_flush_remote_tlb(kvm)
>   	    || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>   		++kvm->stat.generic.remote_tlb_flush;
> -	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>   }
>   EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
>   #endif
> @@ -537,7 +530,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>   		}
>   	}
>   
> -	if (range->flush_on_ret && (ret || kvm->tlbs_dirty))
> +	if (range->flush_on_ret && ret)
>   		kvm_flush_remote_tlbs(kvm);
>   
>   	if (locked)
> 

Queued up to here for 5.15, thanks!

Paolo



* Re: [PATCH V2 03/10] KVM: Remove tlbs_dirty
  2021-09-23 15:23   ` Paolo Bonzini
@ 2021-09-24 15:40     ` Lai Jiangshan
  2021-09-24 16:03       ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2021-09-24 15:40 UTC (permalink / raw)
  To: Paolo Bonzini, Lai Jiangshan, linux-kernel; +Cc: kvm



On 2021/9/23 23:23, Paolo Bonzini wrote:
> On 18/09/21 02:56, Lai Jiangshan wrote:

> 
> Queued up to here for 5.15, thanks!
> 
> Paolo

Any comments on other commits?

Thanks
Lai


* Re: [PATCH V2 03/10] KVM: Remove tlbs_dirty
  2021-09-24 15:40     ` Lai Jiangshan
@ 2021-09-24 16:03       ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2021-09-24 16:03 UTC (permalink / raw)
  To: Lai Jiangshan, Lai Jiangshan, linux-kernel; +Cc: kvm

On 24/09/21 17:40, Lai Jiangshan wrote:
> 
> 
> On 2021/9/23 23:23, Paolo Bonzini wrote:
>> On 18/09/21 02:56, Lai Jiangshan wrote:
> 
>>
>> Queued up to here for 5.15, thanks!
>>
>> Paolo
> 
> Any comments on other commits?

Queued now for 5.16. :)

More precisely, this is what I have queued from you for 5.16 only:

       KVM: X86: Don't flush current tlb on shadow page modification
       KVM: X86: Remove kvm_mmu_flush_or_zap()
       KVM: X86: Change kvm_sync_page() to return true when remote flush is needed
       KVM: X86: Zap the invalid list after remote tlb flushing
       KVM: X86: Remove FNAME(update_pte)
       KVM: X86: Don't unsync pagetables when speculative
       KVM: X86: Don't check unsync if the original spte is writible
       KVM: X86: Move PTE present check from loop body to __shadow_walk_next()

Paolo



