kvm.vger.kernel.org archive mirror
* [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler
@ 2022-09-21 17:35 David Matlack
  2022-09-21 17:35 ` [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter David Matlack
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

This series changes the tdp_mmu module parameter to read-only so that
the TDP MMU can be tracked in a global variable instead of per-VM
state. Then it splits out a separate page fault handler for the TDP MMU
and makes some cleanups along the way.

v3:
 - Use __ro_after_init for tdp_mmu_allowed [Kai]
 - Use a macro instead of a const bool for 32-bit [Kai, Sean]
 - Drop unnecessary whitespace change [Isaku]
 - Make kvm_tdp_mmu_page_fault() static [Isaku]

v2: https://lore.kernel.org/kvm/20220826231227.4096391-1-dmatlack@google.com/
 - Make tdp_mmu read-only instead of deleting it entirely [Paolo]
 - Fix 32-bit compilation failures [kernel test robot]

v1: https://lore.kernel.org/kvm/20220815230110.2266741-1-dmatlack@google.com/

Cc: Kai Huang <kai.huang@intel.com>
Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
Cc: Peter Xu <peterx@redhat.com>

David Matlack (10):
  KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled
  KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()
  KVM: x86/mmu: Handle error PFNs in kvm_faultin_pfn()
  KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON
    handling
  KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn()
  KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs
  KVM: x86/mmu: Split out TDP MMU page fault handling
  KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU
    faults
  KVM: x86/mmu: Rename __direct_map() to direct_map()

 arch/x86/include/asm/kvm_host.h |   9 --
 arch/x86/kvm/mmu.h              |   6 +-
 arch/x86/kvm/mmu/mmu.c          | 237 ++++++++++++++++++++------------
 arch/x86/kvm/mmu/mmu_internal.h |   8 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |  12 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  11 --
 arch/x86/kvm/mmu/tdp_mmu.h      |   7 +-
 7 files changed, 164 insertions(+), 126 deletions(-)


base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
prerequisite-patch-id: 2e3661ba8856c29b769499bac525b6943d9284b8
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-09-27  9:19   ` Huang, Kai
  2022-10-11 20:12   ` Sean Christopherson
  2022-09-21 17:35 ` [PATCH v3 02/10] KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled David Matlack
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Change tdp_mmu to a read-only parameter and drop the per-VM
tdp_mmu_enabled. For 32-bit KVM, make tdp_mmu_enabled a macro that is
always false so that the compiler can continue omitting calls to the TDP
MMU.

The TDP MMU was introduced in 5.10 and has been enabled by default since
5.15. At this point there are no known functionality gaps between the
TDP MMU and the shadow MMU, and the TDP MMU uses less memory and scales
better with the number of vCPUs. In other words, there is no good reason
to disable the TDP MMU on a live system.

Purposely do not drop tdp_mmu=N support (i.e. do not force 64-bit KVM to
always use the TDP MMU) since tdp_mmu=N is still used to get test
coverage of KVM's shadow MMU TDP support, which is used in 32-bit KVM.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h |  9 ------
 arch/x86/kvm/mmu.h              |  6 ++--
 arch/x86/kvm/mmu/mmu.c          | 51 ++++++++++++++++++++++-----------
 arch/x86/kvm/mmu/tdp_mmu.c      |  9 ++----
 4 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..d76059270a43 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1262,15 +1262,6 @@ struct kvm_arch {
 	struct task_struct *nx_lpage_recovery_thread;
 
 #ifdef CONFIG_X86_64
-	/*
-	 * Whether the TDP MMU is enabled for this VM. This contains a
-	 * snapshot of the TDP MMU module parameter from when the VM was
-	 * created and remains unchanged for the life of the VM. If this is
-	 * true, TDP MMU handler functions will run for various MMU
-	 * operations.
-	 */
-	bool tdp_mmu_enabled;
-
 	/*
 	 * List of struct kvm_mmu_pages being used as roots.
 	 * All struct kvm_mmu_pages in the list should have
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6bdaacb6faa0..168c46fd8dd1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -230,14 +230,14 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 }
 
 #ifdef CONFIG_X86_64
-static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
+extern bool tdp_mmu_enabled;
 #else
-static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
+#define tdp_mmu_enabled false
 #endif
 
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-	return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
+	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);
 }
 
 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e418ef3ecfcb..ccb0b18fd194 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -98,6 +98,13 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
  */
 bool tdp_enabled = false;
 
+bool __ro_after_init tdp_mmu_allowed;
+
+#ifdef CONFIG_X86_64
+bool __read_mostly tdp_mmu_enabled = true;
+module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
+#endif
+
 static int max_huge_page_level __read_mostly;
 static int tdp_root_level __read_mostly;
 static int max_tdp_level __read_mostly;
@@ -1253,7 +1260,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 {
 	struct kvm_rmap_head *rmap_head;
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, true);
 
@@ -1286,7 +1293,7 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 {
 	struct kvm_rmap_head *rmap_head;
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, false);
 
@@ -1369,7 +1376,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 		}
 	}
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		write_protected |=
 			kvm_tdp_mmu_write_protect_gfn(kvm, slot, gfn, min_level);
 
@@ -1532,7 +1539,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (kvm_memslots_have_rmaps(kvm))
 		flush = kvm_handle_gfn_range(kvm, range, kvm_zap_rmap);
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
 
 	return flush;
@@ -1545,7 +1552,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (kvm_memslots_have_rmaps(kvm))
 		flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmap);
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		flush |= kvm_tdp_mmu_set_spte_gfn(kvm, range);
 
 	return flush;
@@ -1618,7 +1625,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (kvm_memslots_have_rmaps(kvm))
 		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
 
 	return young;
@@ -1631,7 +1638,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (kvm_memslots_have_rmaps(kvm))
 		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
 
 	return young;
@@ -3543,7 +3550,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	if (r < 0)
 		goto out_unlock;
 
-	if (is_tdp_mmu_enabled(vcpu->kvm)) {
+	if (tdp_mmu_enabled) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
 		mmu->root.hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
@@ -5662,6 +5669,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 	tdp_root_level = tdp_forced_root_level;
 	max_tdp_level = tdp_max_root_level;
 
+#ifdef CONFIG_X86_64
+	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
+#endif
 	/*
 	 * max_huge_page_level reflects KVM's MMU capabilities irrespective
 	 * of kernel support, e.g. KVM may be capable of using 1GB pages when
@@ -5909,7 +5919,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	 * write and in the same critical section as making the reload request,
 	 * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield.
 	 */
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_invalidate_all_roots(kvm);
 
 	/*
@@ -5934,7 +5944,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	 * Deferring the zap until the final reference to the root is put would
 	 * lead to use-after-free.
 	 */
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_zap_invalidated_roots(kvm);
 }
 
@@ -6046,7 +6056,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
 
-	if (is_tdp_mmu_enabled(kvm)) {
+	if (tdp_mmu_enabled) {
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 			flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
 						      gfn_end, true, flush);
@@ -6079,7 +6089,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (is_tdp_mmu_enabled(kvm)) {
+	if (tdp_mmu_enabled) {
 		read_lock(&kvm->mmu_lock);
 		kvm_tdp_mmu_wrprot_slot(kvm, memslot, start_level);
 		read_unlock(&kvm->mmu_lock);
@@ -6322,7 +6332,7 @@ void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
 				   u64 start, u64 end,
 				   int target_level)
 {
-	if (!is_tdp_mmu_enabled(kvm))
+	if (!tdp_mmu_enabled)
 		return;
 
 	if (kvm_memslots_have_rmaps(kvm))
@@ -6343,7 +6353,7 @@ void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
 	u64 start = memslot->base_gfn;
 	u64 end = start + memslot->npages;
 
-	if (!is_tdp_mmu_enabled(kvm))
+	if (!tdp_mmu_enabled)
 		return;
 
 	if (kvm_memslots_have_rmaps(kvm)) {
@@ -6426,7 +6436,7 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (is_tdp_mmu_enabled(kvm)) {
+	if (tdp_mmu_enabled) {
 		read_lock(&kvm->mmu_lock);
 		kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot);
 		read_unlock(&kvm->mmu_lock);
@@ -6461,7 +6471,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (is_tdp_mmu_enabled(kvm)) {
+	if (tdp_mmu_enabled) {
 		read_lock(&kvm->mmu_lock);
 		kvm_tdp_mmu_clear_dirty_slot(kvm, memslot);
 		read_unlock(&kvm->mmu_lock);
@@ -6496,7 +6506,7 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 
-	if (is_tdp_mmu_enabled(kvm))
+	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_zap_all(kvm);
 
 	write_unlock(&kvm->mmu_lock);
@@ -6661,6 +6671,13 @@ void __init kvm_mmu_x86_module_init(void)
 	if (nx_huge_pages == -1)
 		__set_nx_huge_pages(get_nx_auto_mode());
 
+	/*
+	 * Snapshot userspace's desire to enable the TDP MMU. Whether or not the
+	 * TDP MMU is actually enabled is determined in kvm_configure_mmu()
+	 * when the vendor module is loaded.
+	 */
+	tdp_mmu_allowed = tdp_mmu_enabled;
+
 	kvm_mmu_spte_module_init();
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bf2ccf9debca..e7d0f21fbbe8 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -10,23 +10,18 @@
 #include <asm/cmpxchg.h>
 #include <trace/events/kvm.h>
 
-static bool __read_mostly tdp_mmu_enabled = true;
-module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
-
 /* Initializes the TDP MMU for the VM, if enabled. */
 int kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 {
 	struct workqueue_struct *wq;
 
-	if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
+	if (!tdp_mmu_enabled)
 		return 0;
 
 	wq = alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE, 0);
 	if (!wq)
 		return -ENOMEM;
 
-	/* This should not be changed for the lifetime of the VM. */
-	kvm->arch.tdp_mmu_enabled = true;
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
 	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
@@ -48,7 +43,7 @@ static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
 
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 {
-	if (!kvm->arch.tdp_mmu_enabled)
+	if (!tdp_mmu_enabled)
 		return;
 
 	/* Also waits for any queued work items.  */
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 02/10] KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
  2022-09-21 17:35 ` [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:01   ` Isaku Yamahata
  2022-09-21 17:35 ` [PATCH v3 03/10] KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn() David Matlack
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Move kvm_mmu_{init,uninit}_tdp_mmu() behind tdp_mmu_enabled. This makes
these functions consistent with the rest of the calls into the TDP MMU
from mmu.c, and is now possible since tdp_mmu_enabled is only modified
when the x86 vendor module is loaded, i.e. it will never change during
the lifetime of a VM.

This change also enables removing the stub definitions for 32-bit KVM,
as the compiler will just optimize the calls out like it does for all
the other TDP MMU functions.
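
To spell out why the stubs can go (an illustrative sketch, assuming the
declarations are made visible to both builds as in the tdp_mmu.h hunk
below, and that the kernel is built with optimizations enabled as it
always is):

    /* Declaration is visible to 32-bit builds, but never defined there... */
    int kvm_mmu_init_tdp_mmu(struct kvm *kvm);

    /*
     * ...which is fine: with tdp_mmu_enabled defined to false on 32-bit,
     * the branch is folded away at compile time and no call to (or link
     * reference against) kvm_mmu_init_tdp_mmu() is ever generated.
     */
    if (tdp_mmu_enabled) {
        r = kvm_mmu_init_tdp_mmu(kvm);
        if (r < 0)
            return r;
    }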

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 11 +++++++----
 arch/x86/kvm/mmu/tdp_mmu.c |  6 ------
 arch/x86/kvm/mmu/tdp_mmu.h |  7 +++----
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ccb0b18fd194..dd261cd2ad4e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5970,9 +5970,11 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-	r = kvm_mmu_init_tdp_mmu(kvm);
-	if (r < 0)
-		return r;
+	if (tdp_mmu_enabled) {
+		r = kvm_mmu_init_tdp_mmu(kvm);
+		if (r < 0)
+			return r;
+	}
 
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
@@ -6002,7 +6004,8 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
 
 	kvm_page_track_unregister_notifier(kvm, node);
 
-	kvm_mmu_uninit_tdp_mmu(kvm);
+	if (tdp_mmu_enabled)
+		kvm_mmu_uninit_tdp_mmu(kvm);
 
 	mmu_free_vm_memory_caches(kvm);
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e7d0f21fbbe8..08ab3596dfaa 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -15,9 +15,6 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 {
 	struct workqueue_struct *wq;
 
-	if (!tdp_mmu_enabled)
-		return 0;
-
 	wq = alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE, 0);
 	if (!wq)
 		return -ENOMEM;
@@ -43,9 +40,6 @@ static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
 
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 {
-	if (!tdp_mmu_enabled)
-		return;
-
 	/* Also waits for any queued work items.  */
 	destroy_workqueue(kvm->arch.tdp_mmu_zap_wq);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index c163f7cc23ca..9d086a103f77 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -5,6 +5,9 @@
 
 #include <linux/kvm_host.h>
 
+int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
+void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
+
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
@@ -66,8 +69,6 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 					u64 *spte);
 
 #ifdef CONFIG_X86_64
-int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
-void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
 
 static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
@@ -87,8 +88,6 @@ static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
 	return sp && is_tdp_mmu_page(sp) && sp->root_count;
 }
 #else
-static inline int kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return 0; }
-static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
 static inline bool is_tdp_mmu(struct kvm_mmu *mmu) { return false; }
 #endif
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 03/10] KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
  2022-09-21 17:35 ` [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter David Matlack
  2022-09-21 17:35 ` [PATCH v3 02/10] KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:05   ` Isaku Yamahata
  2022-09-21 17:35 ` [PATCH v3 04/10] KVM: x86/mmu: Handle error PFNs " David Matlack
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Grab mmu_invalidate_seq in kvm_faultin_pfn() and stash it in struct
kvm_page_fault. This eliminates duplicate code and reduces the number of
parameters needed for is_page_fault_stale().
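
For context, the invalidation-retry protocol that mmu_seq participates
in looks roughly like this (simplified; the real checks live in
is_page_fault_stale() and mmu_invalidate_retry_hva(), see the diff
below):

    fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
    smp_rmb();                       /* order the seq read before the pfn lookup */

    r = __kvm_faultin_pfn(vcpu, fault);   /* may sleep; can race with invalidation */
    ...
    write_lock(&vcpu->kvm->mmu_lock);
    if (mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva))
        goto out_unlock;             /* an invalidation ran in between: retry */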

Preemptively split out __kvm_faultin_pfn() to a separate function for
use in subsequent commits.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 21 ++++++++++++---------
 arch/x86/kvm/mmu/mmu_internal.h |  1 +
 arch/x86/kvm/mmu/paging_tmpl.h  |  6 +-----
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dd261cd2ad4e..31b835d20762 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4129,7 +4129,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }
 
-static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
@@ -4185,12 +4185,20 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return RET_PF_CONTINUE;
 }
 
+static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	smp_rmb();
+
+	return __kvm_faultin_pfn(vcpu, fault);
+}
+
 /*
  * Returns true if the page fault is stale and needs to be retried, i.e. if the
  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
  */
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-				struct kvm_page_fault *fault, int mmu_seq)
+				struct kvm_page_fault *fault)
 {
 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 
@@ -4210,14 +4218,12 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 		return true;
 
 	return fault->slot &&
-	       mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
+	       mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva);
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
-
-	unsigned long mmu_seq;
 	int r;
 
 	fault->gfn = fault->addr >> PAGE_SHIFT;
@@ -4234,9 +4240,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
-	smp_rmb();
-
 	r = kvm_faultin_pfn(vcpu, fault);
 	if (r != RET_PF_CONTINUE)
 		return r;
@@ -4252,7 +4255,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (is_page_fault_stale(vcpu, fault, mmu_seq))
+	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
 	r = make_mmu_pages_available(vcpu);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 582def531d4d..1c0a1e7c796d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -221,6 +221,7 @@ struct kvm_page_fault {
 	struct kvm_memory_slot *slot;
 
 	/* Outputs of kvm_faultin_pfn.  */
+	unsigned long mmu_seq;
 	kvm_pfn_t pfn;
 	hva_t hva;
 	bool map_writable;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 39e0205e7300..98f4abce4eaf 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -791,7 +791,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 {
 	struct guest_walker walker;
 	int r;
-	unsigned long mmu_seq;
 	bool is_self_change_mapping;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
@@ -838,9 +837,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	else
 		fault->max_level = walker.level;
 
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
-	smp_rmb();
-
 	r = kvm_faultin_pfn(vcpu, fault);
 	if (r != RET_PF_CONTINUE)
 		return r;
@@ -871,7 +867,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
 
-	if (is_page_fault_stale(vcpu, fault, mmu_seq))
+	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
 	r = make_mmu_pages_available(vcpu);
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 04/10] KVM: x86/mmu: Handle error PFNs in kvm_faultin_pfn()
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (2 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 03/10] KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn() David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-09-21 17:35 ` [PATCH v3 05/10] KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling David Matlack
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Handle error PFNs in kvm_faultin_pfn() rather than relying on the caller
to invoke handle_abnormal_pfn() after kvm_faultin_pfn().
Opportunistically rename kvm_handle_bad_page() to kvm_handle_error_pfn()
to make it more consistent with is_error_pfn().

This commit moves KVM closer to being able to drop
handle_abnormal_pfn(), which will reduce the amount of duplicate code in
the various page fault handlers.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 31b835d20762..49a5e38ecc5c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3141,7 +3141,7 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
 }
 
-static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
+static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
 {
 	/*
 	 * Do not cache the mmio info caused by writing the readonly gfn
@@ -3162,10 +3162,6 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
 static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			       unsigned int access)
 {
-	/* The pfn is invalid, report the error! */
-	if (unlikely(is_error_pfn(fault->pfn)))
-		return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
-
 	if (unlikely(!fault->slot)) {
 		gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
@@ -4187,10 +4183,19 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
+	int ret;
+
 	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	smp_rmb();
 
-	return __kvm_faultin_pfn(vcpu, fault);
+	ret = __kvm_faultin_pfn(vcpu, fault);
+	if (ret != RET_PF_CONTINUE)
+		return ret;
+
+	if (unlikely(is_error_pfn(fault->pfn)))
+		return kvm_handle_error_pfn(vcpu, fault->gfn, fault->pfn);
+
+	return RET_PF_CONTINUE;
 }
 
 /*
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 05/10] KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (3 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 04/10] KVM: x86/mmu: Handle error PFNs " David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:23   ` Isaku Yamahata
  2022-09-21 17:35 ` [PATCH v3 06/10] KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn() David Matlack
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Pass the kvm_page_fault struct down to kvm_handle_error_pfn() to avoid a
memslot lookup when handling KVM_PFN_ERR_HWPOISON. Opportunistically
move the gfn_to_hva_memslot() call and @current down into
kvm_send_hwpoison_signal() to cut down on line lengths.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 49a5e38ecc5c..b6f84e470677 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3136,23 +3136,25 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return ret;
 }
 
-static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
+static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
+	unsigned long hva = gfn_to_hva_memslot(slot, gfn);
+
+	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current);
 }
 
-static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
+static int kvm_handle_error_pfn(struct kvm_page_fault *fault)
 {
 	/*
 	 * Do not cache the mmio info caused by writing the readonly gfn
 	 * into the spte otherwise read access on readonly gfn also can
 	 * caused mmio page fault and treat it as mmio access.
 	 */
-	if (pfn == KVM_PFN_ERR_RO_FAULT)
+	if (fault->pfn == KVM_PFN_ERR_RO_FAULT)
 		return RET_PF_EMULATE;
 
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
+	if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
+		kvm_send_hwpoison_signal(fault->slot, fault->gfn);
 		return RET_PF_RETRY;
 	}
 
@@ -4193,7 +4195,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		return ret;
 
 	if (unlikely(is_error_pfn(fault->pfn)))
-		return kvm_handle_error_pfn(vcpu, fault->gfn, fault->pfn);
+		return kvm_handle_error_pfn(fault);
 
 	return RET_PF_CONTINUE;
 }
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 06/10] KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn()
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (4 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 05/10] KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-09-21 17:35 ` [PATCH v3 07/10] KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs David Matlack
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Handle faults on GFNs that do not have a backing memslot in
kvm_faultin_pfn() and drop handle_abnormal_pfn(). This eliminates
duplicate code in the various page fault handlers.

Opportunistically tweak the comment about handling gfn > host.MAXPHYADDR
to reflect that the effect of returning RET_PF_EMULATE at that point is
to avoid creating an MMIO SPTE for such GFNs.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 56 ++++++++++++++++++----------------
 arch/x86/kvm/mmu/paging_tmpl.h |  6 +---
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b6f84e470677..e3b248385154 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3161,28 +3161,32 @@ static int kvm_handle_error_pfn(struct kvm_page_fault *fault)
 	return -EFAULT;
 }
 
-static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
-			       unsigned int access)
+static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault,
+				   unsigned int access)
 {
-	if (unlikely(!fault->slot)) {
-		gva_t gva = fault->is_tdp ? 0 : fault->addr;
+	gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
-		vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
-				     access & shadow_mmio_access_mask);
-		/*
-		 * If MMIO caching is disabled, emulate immediately without
-		 * touching the shadow page tables as attempting to install an
-		 * MMIO SPTE will just be an expensive nop.  Do not cache MMIO
-		 * whose gfn is greater than host.MAXPHYADDR, any guest that
-		 * generates such gfns is running nested and is being tricked
-		 * by L0 userspace (you can observe gfn > L1.MAXPHYADDR if
-		 * and only if L1's MAXPHYADDR is inaccurate with respect to
-		 * the hardware's).
-		 */
-		if (unlikely(!enable_mmio_caching) ||
-		    unlikely(fault->gfn > kvm_mmu_max_gfn()))
-			return RET_PF_EMULATE;
-	}
+	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
+			     access & shadow_mmio_access_mask);
+
+	/*
+	 * If MMIO caching is disabled, emulate immediately without
+	 * touching the shadow page tables as attempting to install an
+	 * MMIO SPTE will just be an expensive nop.
+	 */
+	if (unlikely(!enable_mmio_caching))
+		return RET_PF_EMULATE;
+
+	/*
+	 * Do not create an MMIO SPTE for a gfn greater than host.MAXPHYADDR,
+	 * any guest that generates such gfns is running nested and is being
+	 * tricked by L0 userspace (you can observe gfn > L1.MAXPHYADDR if and
+	 * only if L1's MAXPHYADDR is inaccurate with respect to the
+	 * hardware's).
+	 */
+	if (unlikely(fault->gfn > kvm_mmu_max_gfn()))
+		return RET_PF_EMULATE;
 
 	return RET_PF_CONTINUE;
 }
@@ -4183,7 +4187,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	return RET_PF_CONTINUE;
 }
 
-static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			   unsigned int access)
 {
 	int ret;
 
@@ -4197,6 +4202,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	if (unlikely(is_error_pfn(fault->pfn)))
 		return kvm_handle_error_pfn(fault);
 
+	if (unlikely(!fault->slot))
+		return kvm_handle_noslot_fault(vcpu, fault, access);
+
 	return RET_PF_CONTINUE;
 }
 
@@ -4247,11 +4255,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	r = kvm_faultin_pfn(vcpu, fault);
-	if (r != RET_PF_CONTINUE)
-		return r;
-
-	r = handle_abnormal_pfn(vcpu, fault, ACC_ALL);
+	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
 	if (r != RET_PF_CONTINUE)
 		return r;
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 98f4abce4eaf..e014e09ac2c1 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -837,11 +837,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	else
 		fault->max_level = walker.level;
 
-	r = kvm_faultin_pfn(vcpu, fault);
-	if (r != RET_PF_CONTINUE)
-		return r;
-
-	r = handle_abnormal_pfn(vcpu, fault, walker.pte_access);
+	r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
 	if (r != RET_PF_CONTINUE)
 		return r;
 
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 07/10] KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (5 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 06/10] KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn() David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:27   ` Isaku Yamahata
  2022-09-21 17:35 ` [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling David Matlack
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Move the initialization of fault.{gfn,slot} earlier in the page fault
handling code for fully direct MMUs. This will enable a future commit to
split out TDP MMU page fault handling without needing to duplicate the
initialization of these 2 fields.

Opportunistically use the now-initialized fault.gfn in
kvm_tdp_page_fault() rather than recomputing it from fault->addr.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 5 +----
 arch/x86/kvm/mmu/mmu_internal.h | 5 +++++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e3b248385154..dc203973de83 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4241,9 +4241,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
 	int r;
 
-	fault->gfn = fault->addr >> PAGE_SHIFT;
-	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
-
 	if (page_fault_handle_page_track(vcpu, fault))
 		return RET_PF_EMULATE;
 
@@ -4347,7 +4344,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
 		for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
 			int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
-			gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);
+			gfn_t base = fault->gfn & ~(page_num - 1);
 
 			if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
 				break;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1c0a1e7c796d..1e91f24bd865 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -279,6 +279,11 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	};
 	int r;
 
+	if (vcpu->arch.mmu->root_role.direct) {
+		fault.gfn = fault.addr >> PAGE_SHIFT;
+		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
+	}
+
 	/*
 	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
 	 * guest perspective and have already been counted at the time of the
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (6 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 07/10] KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:28   ` Isaku Yamahata
  2022-10-11 21:56   ` Sean Christopherson
  2022-09-21 17:35 ` [PATCH v3 09/10] KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults David Matlack
  2022-09-21 17:35 ` [PATCH v3 10/10] KVM: x86/mmu: Rename __direct_map() to direct_map() David Matlack
  9 siblings, 2 replies; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Split out the page fault handling for the TDP MMU to a separate
function.  This creates some duplicate code, but makes the TDP MMU fault
handler simpler to read by eliminating branches and will enable future
cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge.

Only compile in the TDP MMU fault handler for 64-bit builds since
kvm_tdp_mmu_map() does not exist in 32-bit builds.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 62 ++++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dc203973de83..b36f351138f7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4238,7 +4238,6 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
 	int r;
 
 	if (page_fault_handle_page_track(vcpu, fault))
@@ -4257,11 +4256,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return r;
 
 	r = RET_PF_RETRY;
-
-	if (is_tdp_mmu_fault)
-		read_lock(&vcpu->kvm->mmu_lock);
-	else
-		write_lock(&vcpu->kvm->mmu_lock);
+	write_lock(&vcpu->kvm->mmu_lock);
 
 	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
@@ -4270,16 +4265,10 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		goto out_unlock;
 
-	if (is_tdp_mmu_fault)
-		r = kvm_tdp_mmu_map(vcpu, fault);
-	else
-		r = __direct_map(vcpu, fault);
+	r = __direct_map(vcpu, fault);
 
 out_unlock:
-	if (is_tdp_mmu_fault)
-		read_unlock(&vcpu->kvm->mmu_lock);
-	else
-		write_unlock(&vcpu->kvm->mmu_lock);
+	write_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
@@ -4327,6 +4316,46 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 }
 EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
 
+#ifdef CONFIG_X86_64
+static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
+				  struct kvm_page_fault *fault)
+{
+	int r;
+
+	if (page_fault_handle_page_track(vcpu, fault))
+		return RET_PF_EMULATE;
+
+	r = fast_page_fault(vcpu, fault);
+	if (r != RET_PF_INVALID)
+		return r;
+
+	r = mmu_topup_memory_caches(vcpu, false);
+	if (r)
+		return r;
+
+	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
+	if (r != RET_PF_CONTINUE)
+		return r;
+
+	r = RET_PF_RETRY;
+	read_lock(&vcpu->kvm->mmu_lock);
+
+	if (is_page_fault_stale(vcpu, fault))
+		goto out_unlock;
+
+	r = make_mmu_pages_available(vcpu);
+	if (r)
+		goto out_unlock;
+
+	r = kvm_tdp_mmu_map(vcpu, fault);
+
+out_unlock:
+	read_unlock(&vcpu->kvm->mmu_lock);
+	kvm_release_pfn_clean(fault->pfn);
+	return r;
+}
+#endif
+
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	/*
@@ -4351,6 +4380,11 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
+#ifdef CONFIG_X86_64
+	if (tdp_mmu_enabled)
+		return kvm_tdp_mmu_page_fault(vcpu, fault);
+#endif
+
 	return direct_page_fault(vcpu, fault);
 }
 
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 09/10] KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (7 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:28   ` Isaku Yamahata
  2022-09-21 17:35 ` [PATCH v3 10/10] KVM: x86/mmu: Rename __direct_map() to direct_map() David Matlack
  9 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Stop calling make_mmu_pages_available() when handling TDP MMU faults.
The TDP MMU does not participate in the "available MMU pages" tracking
and limiting, so calling this function is unnecessary work when handling
TDP MMU faults.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b36f351138f7..4ad70fa371df 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4343,10 +4343,6 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
-	r = make_mmu_pages_available(vcpu);
-	if (r)
-		goto out_unlock;
-
 	r = kvm_tdp_mmu_map(vcpu, fault);
 
 out_unlock:
-- 
2.37.3.998.g577e59143f-goog



* [PATCH v3 10/10] KVM: x86/mmu: Rename __direct_map() to direct_map()
  2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TDP MMU fault handler David Matlack
                   ` (8 preceding siblings ...)
  2022-09-21 17:35 ` [PATCH v3 09/10] KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults David Matlack
@ 2022-09-21 17:35 ` David Matlack
  2022-10-03 19:29   ` Isaku Yamahata
  9 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, David Matlack, Kai Huang,
	Isaku Yamahata, Peter Xu

Rename __direct_map() to direct_map() since the leading underscores are
unnecessary. This also makes the page fault handler names more
consistent: kvm_tdp_mmu_page_fault() calls kvm_tdp_mmu_map() and
direct_page_fault() calls direct_map().

Opportunistically make some trivial cleanups to comments that had to be
modified anyway since they mentioned __direct_map(). Specifically, use
"()" when referring to functions, and include kvm_tdp_mmu_map() among
the various callers of disallowed_hugepage_adjust().

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 14 +++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ad70fa371df..a0b4bc3c9202 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3079,11 +3079,11 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte)) {
 		/*
-		 * A small SPTE exists for this pfn, but FNAME(fetch)
-		 * and __direct_map would like to create a large PTE
-		 * instead: just force them to go down another level,
-		 * patching back for them into pfn the next 9 bits of
-		 * the address.
+		 * A small SPTE exists for this pfn, but FNAME(fetch),
+		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
+		 * large PTE instead: just force them to go down another level,
+		 * patching back for them into pfn the next 9 bits of the
+		 * address.
 		 */
 		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
 				KVM_PAGES_PER_HPAGE(cur_level - 1);
@@ -3092,7 +3092,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	}
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_shadow_walk_iterator it;
 	struct kvm_mmu_page *sp;
@@ -4265,7 +4265,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		goto out_unlock;
 
-	r = __direct_map(vcpu, fault);
+	r = direct_map(vcpu, fault);
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1e91f24bd865..b8c116ec1a89 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -198,7 +198,7 @@ struct kvm_page_fault {
 
 	/*
 	 * Maximum page size that can be created for this fault; input to
-	 * FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
+	 * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
 	 */
 	u8 max_level;
 
-- 
2.37.3.998.g577e59143f-goog



* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-21 17:35 ` [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter David Matlack
@ 2022-09-27  9:19   ` Huang, Kai
  2022-09-27 16:14     ` David Matlack
  2022-10-11 20:12   ` Sean Christopherson
  1 sibling, 1 reply; 28+ messages in thread
From: Huang, Kai @ 2022-09-27  9:19 UTC (permalink / raw)
  To: pbonzini, dmatlack; +Cc: kvm, peterx, Christopherson,, Sean, isaku.yamahata


>  
> +bool __ro_after_init tdp_mmu_allowed;
> +

[...]

> @@ -5662,6 +5669,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  	tdp_root_level = tdp_forced_root_level;
>  	max_tdp_level = tdp_max_root_level;
>  
> +#ifdef CONFIG_X86_64
> +	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> +#endif
> 

[...]

> @@ -6661,6 +6671,13 @@ void __init kvm_mmu_x86_module_init(void)
>  	if (nx_huge_pages == -1)
>  		__set_nx_huge_pages(get_nx_auto_mode());
>  
> +	/*
> +	 * Snapshot userspace's desire to enable the TDP MMU. Whether or not the
> +	 * TDP MMU is actually enabled is determined in kvm_configure_mmu()
> +	 * when the vendor module is loaded.
> +	 */
> +	tdp_mmu_allowed = tdp_mmu_enabled;
> +
>  	kvm_mmu_spte_module_init();
>  }
> 

Sorry last time I didn't review deeply, but I am wondering why do we need
'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
disabled in the first call, so it can be against user's desire in the second
call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.

So, should we just do below in kvm_configure_mmu()?

	#ifdef CONFIG_X86_64
	if (!tdp_enabled)
		tdp_mmu_enabled = false;
	#endif


-- 
Thanks,
-Kai




* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-27  9:19   ` Huang, Kai
@ 2022-09-27 16:14     ` David Matlack
  2022-09-27 21:10       ` Huang, Kai
  0 siblings, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-09-27 16:14 UTC (permalink / raw)
  To: Huang, Kai; +Cc: pbonzini, kvm, peterx, Christopherson,, Sean, isaku.yamahata

On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
>
>
> >
> > +bool __ro_after_init tdp_mmu_allowed;
> > +
>
> [...]
>
> > @@ -5662,6 +5669,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >       tdp_root_level = tdp_forced_root_level;
> >       max_tdp_level = tdp_max_root_level;
> >
> > +#ifdef CONFIG_X86_64
> > +     tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> > +#endif
> >
>
> [...]
>
> > @@ -6661,6 +6671,13 @@ void __init kvm_mmu_x86_module_init(void)
> >       if (nx_huge_pages == -1)
> >               __set_nx_huge_pages(get_nx_auto_mode());
> >
> > +     /*
> > +      * Snapshot userspace's desire to enable the TDP MMU. Whether or not the
> > +      * TDP MMU is actually enabled is determined in kvm_configure_mmu()
> > +      * when the vendor module is loaded.
> > +      */
> > +     tdp_mmu_allowed = tdp_mmu_enabled;
> > +
> >       kvm_mmu_spte_module_init();
> >  }
> >
>
> Sorry last time I didn't review deeply, but I am wondering why do we need
> 'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
> kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
> disabled in the first call, so it can be against user's desire in the second
> call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
> as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.

tdp_mmu_allowed is needed because kvm_intel and kvm_amd are separate
modules from kvm. So kvm_configure_mmu() can be called multiple times
(each time kvm_intel or kvm_amd is loaded).
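
In other words (a sketch of the intent, reusing the two assignments from
the patch itself):

    /* kvm.ko load, kvm_mmu_x86_module_init(): snapshot the user's request. */
    tdp_mmu_allowed = tdp_mmu_enabled;

    /*
     * Each kvm_intel/kvm_amd load, kvm_configure_mmu(): recompute the
     * effective value from the snapshot, so a previous vendor module
     * having run with TDP disabled (ept=0/npt=0) doesn't leave
     * tdp_mmu_enabled stuck at false for the next load.
     */
    tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;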

>
> So, should we just do below in kvm_configure_mmu()?
>
>         #ifdef CONFIG_X86_64
>         if (!tdp_enabled)
>                 tdp_mmu_enabled = false;
>         #endif
>
>
> --
> Thanks,
> -Kai
>
>


* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-27 16:14     ` David Matlack
@ 2022-09-27 21:10       ` Huang, Kai
  2022-09-29  1:56         ` Huang, Kai
  2022-10-03 18:58         ` Isaku Yamahata
  0 siblings, 2 replies; 28+ messages in thread
From: Huang, Kai @ 2022-09-27 21:10 UTC (permalink / raw)
  To: dmatlack; +Cc: kvm, pbonzini, peterx, Christopherson,, Sean, isaku.yamahata

On Tue, 2022-09-27 at 09:14 -0700, David Matlack wrote:
> On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
> > 
> > 
> > > 
> > > +bool __ro_after_init tdp_mmu_allowed;
> > > +
> > 
> > [...]
> > 
> > > @@ -5662,6 +5669,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> > >       tdp_root_level = tdp_forced_root_level;
> > >       max_tdp_level = tdp_max_root_level;
> > > 
> > > +#ifdef CONFIG_X86_64
> > > +     tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> > > +#endif
> > > 
> > 
> > [...]
> > 
> > > @@ -6661,6 +6671,13 @@ void __init kvm_mmu_x86_module_init(void)
> > >       if (nx_huge_pages == -1)
> > >               __set_nx_huge_pages(get_nx_auto_mode());
> > > 
> > > +     /*
> > > +      * Snapshot userspace's desire to enable the TDP MMU. Whether or not the
> > > +      * TDP MMU is actually enabled is determined in kvm_configure_mmu()
> > > +      * when the vendor module is loaded.
> > > +      */
> > > +     tdp_mmu_allowed = tdp_mmu_enabled;
> > > +
> > >       kvm_mmu_spte_module_init();
> > >  }
> > > 
> > 
> > Sorry last time I didn't review deeply, but I am wondering why do we need
> > 'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
> > kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
> > disabled in the first call, so it can be against user's desire in the second
> > call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
> > as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.
> 
> tdp_mmu_allowed is needed because kvm_intel and kvm_amd are separate
> modules from kvm. So kvm_configure_mmu() can be called multiple times
> (each time kvm_intel or kvm_amd is loaded).
> 
> 

Indeed. :)

Reviewed-by: Kai Huang <kai.huang@intel.com>



* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-27 21:10       ` Huang, Kai
@ 2022-09-29  1:56         ` Huang, Kai
  2022-10-03 18:58         ` Isaku Yamahata
  1 sibling, 0 replies; 28+ messages in thread
From: Huang, Kai @ 2022-09-29  1:56 UTC (permalink / raw)
  To: dmatlack; +Cc: kvm, pbonzini, peterx, Christopherson,, Sean, isaku.yamahata

On Tue, 2022-09-27 at 21:10 +0000, Huang, Kai wrote:
> On Tue, 2022-09-27 at 09:14 -0700, David Matlack wrote:
> > On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
> > > 
> > > 
> > > > 
> > > > +bool __ro_after_init tdp_mmu_allowed;
> > > > +

Nit: it appears this can be static.

-- 
Thanks,
-Kai




* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-27 21:10       ` Huang, Kai
  2022-09-29  1:56         ` Huang, Kai
@ 2022-10-03 18:58         ` Isaku Yamahata
  2022-10-03 20:02           ` David Matlack
  2022-10-03 20:11           ` Huang, Kai
  1 sibling, 2 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 18:58 UTC (permalink / raw)
  To: Huang, Kai
  Cc: dmatlack, kvm, pbonzini, peterx, Christopherson,, Sean, isaku.yamahata

On Tue, Sep 27, 2022 at 09:10:43PM +0000,
"Huang, Kai" <kai.huang@intel.com> wrote:

> On Tue, 2022-09-27 at 09:14 -0700, David Matlack wrote:
> > On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
> > > 
> > > 
> > > > 
> > > > +bool __ro_after_init tdp_mmu_allowed;
> > > > +
> > > 
> > > [...]
> > > 
> > > > @@ -5662,6 +5669,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> > > >       tdp_root_level = tdp_forced_root_level;
> > > >       max_tdp_level = tdp_max_root_level;
> > > > 
> > > > +#ifdef CONFIG_X86_64
> > > > +     tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> > > > +#endif
> > > > 
> > > 
> > > [...]
> > > 
> > > > @@ -6661,6 +6671,13 @@ void __init kvm_mmu_x86_module_init(void)
> > > >       if (nx_huge_pages == -1)
> > > >               __set_nx_huge_pages(get_nx_auto_mode());
> > > > 
> > > > +     /*
> > > > +      * Snapshot userspace's desire to enable the TDP MMU. Whether or not the
> > > > +      * TDP MMU is actually enabled is determined in kvm_configure_mmu()
> > > > +      * when the vendor module is loaded.
> > > > +      */
> > > > +     tdp_mmu_allowed = tdp_mmu_enabled;
> > > > +
> > > >       kvm_mmu_spte_module_init();
> > > >  }
> > > > 
> > > 
> > > Sorry last time I didn't review deeply, but I am wondering why do we need
> > > 'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
> > > kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
> > > disabled in the first call, so it can be against user's desire in the second
> > > call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
> > > as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.
> > 
> > tdp_mmu_allowed is needed because kvm_intel and kvm_amd are separate
> > modules from kvm. So kvm_configure_mmu() can be called multiple times
> > (each time kvm_intel or kvm_amd is loaded).
> > 
> > 
> 
> Indeed. :)
> 
> Reviewed-by: Kai Huang <kai.huang@intel.com>

kvm_arch_init(), which is called early during vendor module initialization
(before kvm_configure_mmu() is reached via kvm_arch_hardware_setup()), checks
whether a vendor module (kvm_intel or kvm_amd) was already loaded.  If so, it
returns -EEXIST.

So kvm_configure_mmu() won't be called twice.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [PATCH v3 02/10] KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled
  2022-09-21 17:35 ` [PATCH v3 02/10] KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled David Matlack
@ 2022-10-03 19:01   ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:01 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu


On Wed, Sep 21, 2022 at 10:35:38AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Move kvm_mmu_{init,uninit}_tdp_mmu() behind tdp_mmu_enabled. This makes
> these functions consistent with the rest of the calls into the TDP MMU
> from mmu.c, which is now possible since tdp_mmu_enabled is only
> modified when the x86 vendor module is loaded, i.e. it will never change
> during the lifetime of a VM.
> 
> This change also enables removing the stub definitions for 32-bit KVM,
> as the compiler will just optimize the calls out like it does for all
> the other TDP MMU functions.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c     | 11 +++++++----
>  arch/x86/kvm/mmu/tdp_mmu.c |  6 ------
>  arch/x86/kvm/mmu/tdp_mmu.h |  7 +++----
>  3 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index ccb0b18fd194..dd261cd2ad4e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5970,9 +5970,11 @@ int kvm_mmu_init_vm(struct kvm *kvm)
>  	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
>  	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
>  
> -	r = kvm_mmu_init_tdp_mmu(kvm);
> -	if (r < 0)
> -		return r;
> +	if (tdp_mmu_enabled) {
> +		r = kvm_mmu_init_tdp_mmu(kvm);
> +		if (r < 0)
> +			return r;
> +	}
>  
>  	node->track_write = kvm_mmu_pte_write;
>  	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
> @@ -6002,7 +6004,8 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
>  
>  	kvm_page_track_unregister_notifier(kvm, node);
>  
> -	kvm_mmu_uninit_tdp_mmu(kvm);
> +	if (tdp_mmu_enabled)
> +		kvm_mmu_uninit_tdp_mmu(kvm);
>  
>  	mmu_free_vm_memory_caches(kvm);
>  }
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index e7d0f21fbbe8..08ab3596dfaa 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -15,9 +15,6 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm)
>  {
>  	struct workqueue_struct *wq;
>  
> -	if (!tdp_mmu_enabled)
> -		return 0;
> -
>  	wq = alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE, 0);
>  	if (!wq)
>  		return -ENOMEM;
> @@ -43,9 +40,6 @@ static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
>  
>  void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>  {
> -	if (!tdp_mmu_enabled)
> -		return;
> -
>  	/* Also waits for any queued work items.  */
>  	destroy_workqueue(kvm->arch.tdp_mmu_zap_wq);
>  
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index c163f7cc23ca..9d086a103f77 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -5,6 +5,9 @@
>  
>  #include <linux/kvm_host.h>
>  
> +int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
> +void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
> +
>  hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
>  
>  __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
> @@ -66,8 +69,6 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
>  					u64 *spte);
>  
>  #ifdef CONFIG_X86_64
> -int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
> -void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
>  
>  static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
> @@ -87,8 +88,6 @@ static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
>  	return sp && is_tdp_mmu_page(sp) && sp->root_count;
>  }
>  #else
> -static inline int kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return 0; }
> -static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
>  static inline bool is_tdp_mmu(struct kvm_mmu *mmu) { return false; }
>  #endif
> -- 
> 2.37.3.998.g577e59143f-goog

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 03/10] KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()
  2022-09-21 17:35 ` [PATCH v3 03/10] KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn() David Matlack
@ 2022-10-03 19:05   ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:05 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022 at 10:35:39AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Grab mmu_invalidate_seq in kvm_faultin_pfn() and stash it in struct
> kvm_page_fault. This eliminates duplicate code and reduces the number of
> parameters needed for is_page_fault_stale().
> 
> Preemptively split out __kvm_faultin_pfn() to a separate function for
> use in subsequent commits.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 21 ++++++++++++---------
>  arch/x86/kvm/mmu/mmu_internal.h |  1 +
>  arch/x86/kvm/mmu/paging_tmpl.h  |  6 +-----
>  3 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index dd261cd2ad4e..31b835d20762 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4129,7 +4129,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
>  }
>  
> -static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	struct kvm_memory_slot *slot = fault->slot;
>  	bool async;
> @@ -4185,12 +4185,20 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  	return RET_PF_CONTINUE;
>  }
>  
> +static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +{
> +	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> +	smp_rmb();
> +
> +	return __kvm_faultin_pfn(vcpu, fault);
> +}
> +
>  /*
>   * Returns true if the page fault is stale and needs to be retried, i.e. if the
>   * root was invalidated by a memslot update or a relevant mmu_notifier fired.
>   */
>  static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
> -				struct kvm_page_fault *fault, int mmu_seq)
> +				struct kvm_page_fault *fault)
>  {
>  	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
>  
> @@ -4210,14 +4218,12 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>  		return true;
>  
>  	return fault->slot &&
> -	       mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
> +	       mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva);
>  }
>  
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
> -
> -	unsigned long mmu_seq;
>  	int r;
>  
>  	fault->gfn = fault->addr >> PAGE_SHIFT;
> @@ -4234,9 +4240,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	if (r)
>  		return r;
>  
> -	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> -	smp_rmb();
> -
>  	r = kvm_faultin_pfn(vcpu, fault);
>  	if (r != RET_PF_CONTINUE)
>  		return r;
> @@ -4252,7 +4255,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	else
>  		write_lock(&vcpu->kvm->mmu_lock);
>  
> -	if (is_page_fault_stale(vcpu, fault, mmu_seq))
> +	if (is_page_fault_stale(vcpu, fault))
>  		goto out_unlock;
>  
>  	r = make_mmu_pages_available(vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 582def531d4d..1c0a1e7c796d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -221,6 +221,7 @@ struct kvm_page_fault {
>  	struct kvm_memory_slot *slot;
>  
>  	/* Outputs of kvm_faultin_pfn.  */
> +	unsigned long mmu_seq;
>  	kvm_pfn_t pfn;
>  	hva_t hva;
>  	bool map_writable;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 39e0205e7300..98f4abce4eaf 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -791,7 +791,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  {
>  	struct guest_walker walker;
>  	int r;
> -	unsigned long mmu_seq;
>  	bool is_self_change_mapping;
>  
>  	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
> @@ -838,9 +837,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	else
>  		fault->max_level = walker.level;
>  
> -	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> -	smp_rmb();
> -
>  	r = kvm_faultin_pfn(vcpu, fault);
>  	if (r != RET_PF_CONTINUE)
>  		return r;
> @@ -871,7 +867,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	r = RET_PF_RETRY;
>  	write_lock(&vcpu->kvm->mmu_lock);
>  
> -	if (is_page_fault_stale(vcpu, fault, mmu_seq))
> +	if (is_page_fault_stale(vcpu, fault))
>  		goto out_unlock;
>  
>  	r = make_mmu_pages_available(vcpu);
> -- 
> 2.37.3.998.g577e59143f-goog
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 05/10] KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling
  2022-09-21 17:35 ` [PATCH v3 05/10] KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling David Matlack
@ 2022-10-03 19:23   ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:23 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022 at 10:35:41AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Pass the kvm_page_fault struct down to kvm_handle_error_pfn() to avoid a
> memslot lookup when handling KVM_PFN_ERR_HWPOISON. Opportunistically
> move the gfn_to_hva_memslot() call and @current down into
> kvm_send_hwpoison_signal() to cut down on line lengths.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 49a5e38ecc5c..b6f84e470677 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3136,23 +3136,25 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  	return ret;
>  }
>  
> -static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
> +static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
>  {
> -	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
> +	unsigned long hva = gfn_to_hva_memslot(slot, gfn);
> +
> +	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current);
>  }
>  
> -static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> +static int kvm_handle_error_pfn(struct kvm_page_fault *fault)
>  {
>  	/*
>  	 * Do not cache the mmio info caused by writing the readonly gfn
>  	 * into the spte otherwise read access on readonly gfn also can
>  	 * caused mmio page fault and treat it as mmio access.
>  	 */
> -	if (pfn == KVM_PFN_ERR_RO_FAULT)
> +	if (fault->pfn == KVM_PFN_ERR_RO_FAULT)
>  		return RET_PF_EMULATE;
>  
> -	if (pfn == KVM_PFN_ERR_HWPOISON) {
> -		kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
> +	if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
> +		kvm_send_hwpoison_signal(fault->slot, fault->gfn);
>  		return RET_PF_RETRY;
>  	}
>  
> @@ -4193,7 +4195,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  		return ret;
>  
>  	if (unlikely(is_error_pfn(fault->pfn)))
> -		return kvm_handle_error_pfn(vcpu, fault->gfn, fault->pfn);
> +		return kvm_handle_error_pfn(fault);
>  
>  	return RET_PF_CONTINUE;
>  }
> -- 
> 2.37.3.998.g577e59143f-goog
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 07/10] KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs
  2022-09-21 17:35 ` [PATCH v3 07/10] KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs David Matlack
@ 2022-10-03 19:27   ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:27 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022 at 10:35:43AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Move the initialization of fault.{gfn,slot} earlier in the page fault
> handling code for fully direct MMUs. This will enable a future commit to
> split out TDP MMU page fault handling without needing to duplicate the
> initialization of these 2 fields.
> 
> Opportunistically take advantage of the fact that fault.gfn is
> initialized in kvm_tdp_page_fault() rather than recomputing it from
> fault->addr.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 5 +----
>  arch/x86/kvm/mmu/mmu_internal.h | 5 +++++
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e3b248385154..dc203973de83 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4241,9 +4241,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
>  	int r;
>  
> -	fault->gfn = fault->addr >> PAGE_SHIFT;
> -	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
> -
>  	if (page_fault_handle_page_track(vcpu, fault))
>  		return RET_PF_EMULATE;
>  
> @@ -4347,7 +4344,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  	if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
>  		for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
>  			int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
> -			gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);
> +			gfn_t base = fault->gfn & ~(page_num - 1);
>  
>  			if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
>  				break;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 1c0a1e7c796d..1e91f24bd865 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -279,6 +279,11 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  	};
>  	int r;
>  
> +	if (vcpu->arch.mmu->root_role.direct) {
> +		fault.gfn = fault.addr >> PAGE_SHIFT;
> +		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
> +	}
> +
>  	/*
>  	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
>  	 * guest perspective and have already been counted at the time of the
> -- 
> 2.37.3.998.g577e59143f-goog
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling
  2022-09-21 17:35 ` [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling David Matlack
@ 2022-10-03 19:28   ` Isaku Yamahata
  2022-10-11 21:56   ` Sean Christopherson
  1 sibling, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:28 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022 at 10:35:44AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Split out the page fault handling for the TDP MMU to a separate
> function.  This creates some duplicate code, but makes the TDP MMU fault
> handler simpler to read by eliminating branches and will enable future
> cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge.
> 
> Only compile in the TDP MMU fault handler for 64-bit builds since
> kvm_tdp_mmu_map() does not exist in 32-bit builds.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 62 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 48 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index dc203973de83..b36f351138f7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4238,7 +4238,6 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>  
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
> -	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
>  	int r;
>  
>  	if (page_fault_handle_page_track(vcpu, fault))
> @@ -4257,11 +4256,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		return r;
>  
>  	r = RET_PF_RETRY;
> -
> -	if (is_tdp_mmu_fault)
> -		read_lock(&vcpu->kvm->mmu_lock);
> -	else
> -		write_lock(&vcpu->kvm->mmu_lock);
> +	write_lock(&vcpu->kvm->mmu_lock);
>  
>  	if (is_page_fault_stale(vcpu, fault))
>  		goto out_unlock;
> @@ -4270,16 +4265,10 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	if (r)
>  		goto out_unlock;
>  
> -	if (is_tdp_mmu_fault)
> -		r = kvm_tdp_mmu_map(vcpu, fault);
> -	else
> -		r = __direct_map(vcpu, fault);
> +	r = __direct_map(vcpu, fault);
>  
>  out_unlock:
> -	if (is_tdp_mmu_fault)
> -		read_unlock(&vcpu->kvm->mmu_lock);
> -	else
> -		write_unlock(&vcpu->kvm->mmu_lock);
> +	write_unlock(&vcpu->kvm->mmu_lock);
>  	kvm_release_pfn_clean(fault->pfn);
>  	return r;
>  }
> @@ -4327,6 +4316,46 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>  }
>  EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
>  
> +#ifdef CONFIG_X86_64
> +static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> +				  struct kvm_page_fault *fault)
> +{
> +	int r;
> +
> +	if (page_fault_handle_page_track(vcpu, fault))
> +		return RET_PF_EMULATE;
> +
> +	r = fast_page_fault(vcpu, fault);
> +	if (r != RET_PF_INVALID)
> +		return r;
> +
> +	r = mmu_topup_memory_caches(vcpu, false);
> +	if (r)
> +		return r;
> +
> +	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
> +	if (r != RET_PF_CONTINUE)
> +		return r;
> +
> +	r = RET_PF_RETRY;
> +	read_lock(&vcpu->kvm->mmu_lock);
> +
> +	if (is_page_fault_stale(vcpu, fault))
> +		goto out_unlock;
> +
> +	r = make_mmu_pages_available(vcpu);
> +	if (r)
> +		goto out_unlock;
> +
> +	r = kvm_tdp_mmu_map(vcpu, fault);
> +
> +out_unlock:
> +	read_unlock(&vcpu->kvm->mmu_lock);
> +	kvm_release_pfn_clean(fault->pfn);
> +	return r;
> +}
> +#endif
> +
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	/*
> @@ -4351,6 +4380,11 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  		}
>  	}
>  
> +#ifdef CONFIG_X86_64
> +	if (tdp_mmu_enabled)
> +		return kvm_tdp_mmu_page_fault(vcpu, fault);
> +#endif
> +
>  	return direct_page_fault(vcpu, fault);
>  }
>  
> -- 
> 2.37.3.998.g577e59143f-goog
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 09/10] KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults
  2022-09-21 17:35 ` [PATCH v3 09/10] KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults David Matlack
@ 2022-10-03 19:28   ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:28 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022 at 10:35:45AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Stop calling make_mmu_pages_available() when handling TDP MMU faults.
> The TDP MMU does not participate in the "available MMU pages" tracking
> and limiting so calling this function is unnecessary work when handling
> TDP MMU faults.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b36f351138f7..4ad70fa371df 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4343,10 +4343,6 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>  	if (is_page_fault_stale(vcpu, fault))
>  		goto out_unlock;
>  
> -	r = make_mmu_pages_available(vcpu);
> -	if (r)
> -		goto out_unlock;
> -
>  	r = kvm_tdp_mmu_map(vcpu, fault);
>  
>  out_unlock:
> -- 
> 2.37.3.998.g577e59143f-goog
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 10/10] KVM: x86/mmu: Rename __direct_map() to direct_map()
  2022-09-21 17:35 ` [PATCH v3 10/10] KVM: x86/mmu: Rename __direct_map() to direct_map() David Matlack
@ 2022-10-03 19:29   ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 19:29 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, kvm, Kai Huang,
	Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022 at 10:35:46AM -0700,
David Matlack <dmatlack@google.com> wrote:

> Rename __direct_map() to direct_map() since the leading underscores are
> unnecessary. This also makes the page fault handler names more
> consistent: kvm_tdp_mmu_page_fault() calls kvm_tdp_mmu_map() and
> direct_page_fault() calls direct_map().
> 
> Opportunistically make some trivial cleanups to comments that had to be
> modified anyway since they mentioned __direct_map(). Specifically, use
> "()" when referring to functions, and include kvm_tdp_mmu_map() among
> the various callers of disallowed_hugepage_adjust().
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 14 +++++++-------
>  arch/x86/kvm/mmu/mmu_internal.h |  2 +-
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4ad70fa371df..a0b4bc3c9202 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3079,11 +3079,11 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>  	    is_shadow_present_pte(spte) &&
>  	    !is_large_pte(spte)) {
>  		/*
> -		 * A small SPTE exists for this pfn, but FNAME(fetch)
> -		 * and __direct_map would like to create a large PTE
> -		 * instead: just force them to go down another level,
> -		 * patching back for them into pfn the next 9 bits of
> -		 * the address.
> +		 * A small SPTE exists for this pfn, but FNAME(fetch),
> +		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
> +		 * large PTE instead: just force them to go down another level,
> +		 * patching back for them into pfn the next 9 bits of the
> +		 * address.
>  		 */
>  		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
>  				KVM_PAGES_PER_HPAGE(cur_level - 1);
> @@ -3092,7 +3092,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>  	}
>  }
>  
> -static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	struct kvm_shadow_walk_iterator it;
>  	struct kvm_mmu_page *sp;
> @@ -4265,7 +4265,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	if (r)
>  		goto out_unlock;
>  
> -	r = __direct_map(vcpu, fault);
> +	r = direct_map(vcpu, fault);
>  
>  out_unlock:
>  	write_unlock(&vcpu->kvm->mmu_lock);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 1e91f24bd865..b8c116ec1a89 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -198,7 +198,7 @@ struct kvm_page_fault {
>  
>  	/*
>  	 * Maximum page size that can be created for this fault; input to
> -	 * FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
> +	 * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
>  	 */
>  	u8 max_level;
>  
> -- 
> 2.37.3.998.g577e59143f-goog
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-10-03 18:58         ` Isaku Yamahata
@ 2022-10-03 20:02           ` David Matlack
  2022-10-03 21:56             ` Isaku Yamahata
  2022-10-03 20:11           ` Huang, Kai
  1 sibling, 1 reply; 28+ messages in thread
From: David Matlack @ 2022-10-03 20:02 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: Huang, Kai, kvm, pbonzini, peterx, Christopherson, Sean

On Mon, Oct 03, 2022 at 11:58:34AM -0700, Isaku Yamahata wrote:
> On Tue, Sep 27, 2022 at 09:10:43PM +0000, "Huang, Kai" <kai.huang@intel.com> wrote:
> > On Tue, 2022-09-27 at 09:14 -0700, David Matlack wrote:
> > > On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
> > > > 
> > > > Sorry last time I didn't review deeply, but I am wondering why do we need
> > > > 'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
> > > > kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
> > > > disabled in the first call, so it can be against user's desire in the second
> > > > call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
> > > > as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.
> > > 
> > > tdp_mmu_allowed is needed because kvm_intel and kvm_amd are separate
> > > modules from kvm. So kvm_configure_mmu() can be called multiple times
> > > (each time kvm_intel or kvm_amd is loaded).
> > > 
> > > 
> > 
> > Indeed. :)
> > 
> > Reviewed-by: Kai Huang <kai.huang@intel.com>
> 
> kvm_arch_init() which is called early during the module initialization before
> kvm_configure_mmu() via kvm_arch_hardware_setup() checks if the vendor module
> (kvm_intel or kvm_amd) was already loaded.  If yes, it results in -EEXIST.
> 
> So kvm_configure_mmu() won't be called twice.

kvm_configure_mmu() can be called multiple times if the vendor module is
unloaded without unloading the kvm module. For example:

 $ modprobe kvm
 $ modprobe kvm_intel ept=Y  # kvm_configure_mmu(true, ...)
 $ modprobe -r kvm_intel
 $ modprobe kvm_intel ept=N  # kvm_configure_mmu(false, ...)
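
To make that interaction concrete, here is a minimal sketch of the snapshot
scheme. This is simplified, not the literal patch code: the real
kvm_configure_mmu() also takes the TDP root/huge-page level parameters and
derives tdp_enabled internally, and tdp_mmu_enabled is a module param under
CONFIG_X86_64.

/* kvm.ko: set once from the command line, then recomputed below. */
bool tdp_mmu_enabled = true;
/* Snapshot of the user's original request (static, per the earlier nit). */
static bool __ro_after_init tdp_mmu_allowed;

void __init kvm_mmu_x86_module_init(void)
{
	/* Runs once, when kvm.ko loads: capture what the user asked for. */
	tdp_mmu_allowed = tdp_mmu_enabled;
}

void kvm_configure_mmu(bool enable_tdp)
{
	/*
	 * Runs on every kvm_intel/kvm_amd load.  Recompute from the snapshot
	 * so a previous ept=N load cannot permanently clear the user's
	 * tdp_mmu=Y request for later vendor-module loads.
	 */
	tdp_mmu_enabled = tdp_mmu_allowed && enable_tdp;
}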

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-10-03 18:58         ` Isaku Yamahata
  2022-10-03 20:02           ` David Matlack
@ 2022-10-03 20:11           ` Huang, Kai
  1 sibling, 0 replies; 28+ messages in thread
From: Huang, Kai @ 2022-10-03 20:11 UTC (permalink / raw)
  To: isaku.yamahata; +Cc: kvm, pbonzini, peterx, dmatlack, Christopherson, Sean

On Mon, 2022-10-03 at 11:58 -0700, Isaku Yamahata wrote:
> On Tue, Sep 27, 2022 at 09:10:43PM +0000,
> "Huang, Kai" <kai.huang@intel.com> wrote:
> 
> > On Tue, 2022-09-27 at 09:14 -0700, David Matlack wrote:
> > > On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
> > > > 
> > > > 
> > > > > 
> > > > > +bool __ro_after_init tdp_mmu_allowed;
> > > > > +
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -5662,6 +5669,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> > > > >       tdp_root_level = tdp_forced_root_level;
> > > > >       max_tdp_level = tdp_max_root_level;
> > > > > 
> > > > > +#ifdef CONFIG_X86_64
> > > > > +     tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
> > > > > +#endif
> > > > > 
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -6661,6 +6671,13 @@ void __init kvm_mmu_x86_module_init(void)
> > > > >       if (nx_huge_pages == -1)
> > > > >               __set_nx_huge_pages(get_nx_auto_mode());
> > > > > 
> > > > > +     /*
> > > > > +      * Snapshot userspace's desire to enable the TDP MMU. Whether or not the
> > > > > +      * TDP MMU is actually enabled is determined in kvm_configure_mmu()
> > > > > +      * when the vendor module is loaded.
> > > > > +      */
> > > > > +     tdp_mmu_allowed = tdp_mmu_enabled;
> > > > > +
> > > > >       kvm_mmu_spte_module_init();
> > > > >  }
> > > > > 
> > > > 
> > > > Sorry last time I didn't review deeply, but I am wondering why do we need
> > > > 'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
> > > > kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
> > > > disabled in the first call, so it can be against user's desire in the second
> > > > call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
> > > > as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.
> > > 
> > > tdp_mmu_allowed is needed because kvm_intel and kvm_amd are separate
> > > modules from kvm. So kvm_configure_mmu() can be called multiple times
> > > (each time kvm_intel or kvm_amd is loaded).
> > > 
> > > 
> > 
> > Indeed. :)
> > 
> > Reviewed-by: Kai Huang <kai.huang@intel.com>
> 
> kvm_arch_init() which is called early during the module initialization before
> kvm_configure_mmu() via kvm_arch_hardware_setup() checks if the vendor module
> (kvm_intel or kvm_amd) was already loaded.  If yes, it results in -EEXIST.
> 
> So kvm_configure_mmu() won't be called twice.
> 

Hi Isaku,

Please consider module reload.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-10-03 20:02           ` David Matlack
@ 2022-10-03 21:56             ` Isaku Yamahata
  0 siblings, 0 replies; 28+ messages in thread
From: Isaku Yamahata @ 2022-10-03 21:56 UTC (permalink / raw)
  To: David Matlack
  Cc: Isaku Yamahata, Huang, Kai, kvm, pbonzini, peterx, Christopherson, Sean

On Mon, Oct 03, 2022 at 01:02:42PM -0700,
David Matlack <dmatlack@google.com> wrote:

> On Mon, Oct 03, 2022 at 11:58:34AM -0700, Isaku Yamahata wrote:
> > On Tue, Sep 27, 2022 at 09:10:43PM +0000, "Huang, Kai" <kai.huang@intel.com> wrote:
> > > On Tue, 2022-09-27 at 09:14 -0700, David Matlack wrote:
> > > > On Tue, Sep 27, 2022 at 2:19 AM Huang, Kai <kai.huang@intel.com> wrote:
> > > > > 
> > > > > Sorry last time I didn't review deeply, but I am wondering why do we need
> > > > > 'tdp_mmu_allowed' at all?  The purpose of having 'allow_mmio_caching' is because
> > > > > kvm_mmu_set_mmio_spte_mask() is called twice, and 'enable_mmio_caching' can be
> > > > > disabled in the first call, so it can be against user's desire in the second
> > > > > call.  However it appears for 'tdp_mmu_enabled' we don't need 'tdp_mmu_allowed',
> > > > > as kvm_configure_mmu() is only called once by VMX or SVM, if I read correctly.
> > > > 
> > > > tdp_mmu_allowed is needed because kvm_intel and kvm_amd are separate
> > > > modules from kvm. So kvm_configure_mmu() can be called multiple times
> > > > (each time kvm_intel or kvm_amd is loaded).
> > > > 
> > > > 
> > > 
> > > Indeed. :)
> > > 
> > > Reviewed-by: Kai Huang <kai.huang@intel.com>
> > 
> > kvm_arch_init() which is called early during the module initialization before
> > kvm_configure_mmu() via kvm_arch_hardware_setup() checks if the vendor module
> > (kvm_intel or kvm_amd) was already loaded.  If yes, it results in -EEXIST.
> > 
> > So kvm_configure_mmu() won't be called twice.
> 
> kvm_configure_mmu() can be called multiple times if the vendor module is
> unloaded without unloading the kvm module. For example:
> 
>  $ modprobe kvm
>  $ modprobe kvm_intel ept=Y  # kvm_configure_mmu(true, ...)
>  $ modprobe -r kvm_intel
>  $ modprobe kvm_intel ept=N  # kvm_configure_mmu(false, ...)

Oh, yes, you're right.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter
  2022-09-21 17:35 ` [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter David Matlack
  2022-09-27  9:19   ` Huang, Kai
@ 2022-10-11 20:12   ` Sean Christopherson
  1 sibling, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-11 20:12 UTC (permalink / raw)
  To: David Matlack; +Cc: Paolo Bonzini, kvm, Kai Huang, Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022, David Matlack wrote:
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 6bdaacb6faa0..168c46fd8dd1 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -230,14 +230,14 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
>  }
>  
>  #ifdef CONFIG_X86_64
> -static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
> +extern bool tdp_mmu_enabled;
>  #else
> -static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
> +#define tdp_mmu_enabled false
>  #endif

Rather than open code references to the variable, keep the wrappers so that the
guts can be changed without needing to churn a pile of code.  I'll follow up in
the "Split out TDP MMU page fault handling" patch with the reasoning.

E.g.

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6bdaacb6faa0..1ad6d02e103f 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -230,14 +230,21 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 }
 
 #ifdef CONFIG_X86_64
-static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
+extern bool tdp_mmu_enabled;
+#endif
+
+static inline bool is_tdp_mmu_enabled(void)
+{
+#ifdef CONFIG_X86_64
+	return tdp_mmu_enabled;
 #else
-static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
+	return false;
 #endif
+}
 
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-	return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
+	return !is_tdp_mmu_enabled() || kvm_shadow_root_allocated(kvm);
 }
 
 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling
  2022-09-21 17:35 ` [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling David Matlack
  2022-10-03 19:28   ` Isaku Yamahata
@ 2022-10-11 21:56   ` Sean Christopherson
  1 sibling, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-11 21:56 UTC (permalink / raw)
  To: David Matlack; +Cc: Paolo Bonzini, kvm, Kai Huang, Isaku Yamahata, Peter Xu

On Wed, Sep 21, 2022, David Matlack wrote:
> Split out the page fault handling for the TDP MMU to a separate
> function.  This creates some duplicate code, but makes the TDP MMU fault
> handler simpler to read by eliminating branches and will enable future
> cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge.

IMO, the code duplication isn't worth the relatively minor simplification.  And
some of the "cleanups" to reduce the duplication arguably add complexity, e.g.
moving code around in "Initialize fault.{gfn,slot} earlier for direct MMUs" isn't
a net positive IMO.

With the TDP MMU being all-or-nothing, the need to check if the MMU is a TDP MMU
goes away.  If the TDP MMU is enabled, direct_page_fault() will be reached only
for non-nested TDP page faults.  Paging is required for NPT, and EPT faults will
always go through ept_page_fault(), i.e. nonpaging_page_fault() is unused.

In other words, this code

	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);

gets replaced with direct checks on is_tdp_mmu_enabled().  And because fast page
faults are used only for direct MMUs,  that code can also simply query whether or
not the TDP MMU is enabled.

Avoiding make_mmu_pages_available() for the TDP MMU is easy enough and doesn't
require any new conditionals.

	if (is_tdp_mmu_enabled()) {
		r = kvm_tdp_mmu_map(vcpu, fault);
	} else {
		r = make_mmu_pages_available(vcpu);
		if (r)
			goto out_unlock;

		r = __direct_map(vcpu, fault);
	}


That leaves the read vs. write lock as the only code that is borderline difficult
to read.  I agree that's a little ugly, but IMO it's not worth duplicating the
entire function.  If we really wanted to hide that ugliness, we could add helpers
to do the lock+unlock, but even that seems somewhat superfluous.
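
If we did go that route, the helpers would presumably look something like the
sketch below (hypothetical names, building on the argument-less
is_tdp_mmu_enabled() wrapper suggested earlier; not code from the series):

static void kvm_mmu_lock_for_fault(struct kvm *kvm)
{
	/* TDP MMU faults only need mmu_lock for read. */
	if (is_tdp_mmu_enabled())
		read_lock(&kvm->mmu_lock);
	else
		write_lock(&kvm->mmu_lock);
}

static void kvm_mmu_unlock_for_fault(struct kvm *kvm)
{
	if (is_tdp_mmu_enabled())
		read_unlock(&kvm->mmu_lock);
	else
		write_unlock(&kvm->mmu_lock);
}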

From a performance perspective, with the flag being read-only after the vendor
module is loaded, we can add a static key to track TDP MMU enabling, biased to the
TDP MMU being enabled.  That turns all of the if-statements into NOPs for the TDP MMU.
(this is why I suggested keeping the is_tdp_mmu_enabled() wrapper in patch 1).
Static branches actually shrink the code size too (because that matters), e.g. a
5-byte nop instead of 8 bytes for TEST+Jcc.

I'll speculatively post a v4 with the static key idea, minus patches 7, 8, and 10.
I think that'll make it easier to discuss the static key/branch implementation.
If we decide we still want to split out the TDP MMU page fault handler, then it
should be relatively simple to fold back in those patches.

E.g. the generated code looks like this:

4282		if (is_tdp_mmu_enabled())
4283			read_lock(&vcpu->kvm->mmu_lock);
   0x000000000005ff7e <+494>:   nopl   0x0(%rax,%rax,1)
   0x000000000005ff83 <+499>:	mov    (%rbx),%rdi
   0x000000000005ff86 <+502>:	call   0x5ff8b <direct_page_fault+507>

4286	
4287		if (is_page_fault_stale(vcpu, fault))
   0x000000000005ff8b <+507>:	mov    %r15,%rsi
   0x000000000005ff8e <+510>:	mov    %rbx,%rdi
   0x000000000005ff91 <+513>:	call   0x558b0 <is_page_fault_stale>
   0x000000000005ff96 <+518>:	test   %al,%al
   0x000000000005ff98 <+520>:	jne    0x603e1 <direct_page_fault+1617>

4288			goto out_unlock;
4289	
4290		if (is_tdp_mmu_enabled()) {
4291			r = kvm_tdp_mmu_map(vcpu, fault);
   0x000000000005ff9e <+526>:   nopl   0x0(%rax,%rax,1)
   0x000000000005ffa3 <+531>:	mov    %rbx,%rdi
   0x000000000005ffa6 <+534>:	mov    %r15,%rsi
   0x000000000005ffa9 <+537>:	call   0x5ffae <direct_page_fault+542>
   0x000000000005ffae <+542>:	mov    (%rbx),%rdi
   0x000000000005ffb1 <+545>:	mov    %eax,%ebp


#ifdef CONFIG_X86_64
DECLARE_STATIC_KEY_TRUE(tdp_mmu_enabled);
#endif
                                     
static inline bool is_tdp_mmu_enabled(void)
{
#ifdef CONFIG_X86_64
        return static_branch_likely(&tdp_mmu_enabled);
#else
        return false;
#endif
}
                              
static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
{
        return is_tdp_mmu_enabled() && mmu->root_role.direct;
}
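
For reference, one way the key could be defined and flipped when the vendor
module calls kvm_configure_mmu().  The helper below is made up for
illustration, it is not part of the series, and it assumes the key takes over
the tdp_mmu_enabled name while tdp_mmu_allowed keeps the snapshotted module
param:

#ifdef CONFIG_X86_64
/* Biased true so the common case (TDP MMU enabled) compiles to a NOP. */
DEFINE_STATIC_KEY_TRUE(tdp_mmu_enabled);
#endif

static void kvm_mmu_update_tdp_mmu_key(bool enable_tdp)
{
#ifdef CONFIG_X86_64
	/* Called from kvm_configure_mmu() on every vendor-module load. */
	if (tdp_mmu_allowed && enable_tdp)
		static_branch_enable(&tdp_mmu_enabled);
	else
		static_branch_disable(&tdp_mmu_enabled);
#endif
}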





^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
2022-09-21 17:35 [PATCH v3 00/10] KVM: x86/mmu: Make tdp_mmu read-only and clean up TPD MMU fault handler David Matlack
2022-09-21 17:35 ` [PATCH v3 01/10] KVM: x86/mmu: Change tdp_mmu to a read-only parameter David Matlack
2022-09-27  9:19   ` Huang, Kai
2022-09-27 16:14     ` David Matlack
2022-09-27 21:10       ` Huang, Kai
2022-09-29  1:56         ` Huang, Kai
2022-10-03 18:58         ` Isaku Yamahata
2022-10-03 20:02           ` David Matlack
2022-10-03 21:56             ` Isaku Yamahata
2022-10-03 20:11           ` Huang, Kai
2022-10-11 20:12   ` Sean Christopherson
2022-09-21 17:35 ` [PATCH v3 02/10] KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled David Matlack
2022-10-03 19:01   ` Isaku Yamahata
2022-09-21 17:35 ` [PATCH v3 03/10] KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn() David Matlack
2022-10-03 19:05   ` Isaku Yamahata
2022-09-21 17:35 ` [PATCH v3 04/10] KVM: x86/mmu: Handle error PFNs " David Matlack
2022-09-21 17:35 ` [PATCH v3 05/10] KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling David Matlack
2022-10-03 19:23   ` Isaku Yamahata
2022-09-21 17:35 ` [PATCH v3 06/10] KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn() David Matlack
2022-09-21 17:35 ` [PATCH v3 07/10] KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs David Matlack
2022-10-03 19:27   ` Isaku Yamahata
2022-09-21 17:35 ` [PATCH v3 08/10] KVM: x86/mmu: Split out TDP MMU page fault handling David Matlack
2022-10-03 19:28   ` Isaku Yamahata
2022-10-11 21:56   ` Sean Christopherson
2022-09-21 17:35 ` [PATCH v3 09/10] KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults David Matlack
2022-10-03 19:28   ` Isaku Yamahata
2022-09-21 17:35 ` [PATCH v3 10/10] KVM: x86/mmu: Rename __direct_map() to direct_map() David Matlack
2022-10-03 19:29   ` Isaku Yamahata
