* [PATCH 0/6] Lazily allocate memslot rmaps
@ 2021-04-27 22:36 Ben Gardon
  2021-04-27 22:36 ` [PATCH 1/6] KVM: x86/mmu: Track if shadow MMU active Ben Gardon
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

This series enables KVM to save memory when using the TDP MMU by waiting
to allocate memslot rmaps until they are needed. To do this, KVM tracks
whether or not a shadow root has been allocated. To get away with not
allocating the rmaps, KVM must also skip operations which iterate over
the rmaps. If the TDP MMU is in use and no shadow root has been
allocated, these operations would essentially be no-ops anyway.
Skipping the rmap operations has a secondary benefit of avoiding
acquiring the MMU lock in write mode in many cases, substantially
reducing MMU lock contention.

This series was tested on an Intel Skylake machine. With the TDP MMU off
and on, this series introduced no new failures in kvm-unit-tests or the KVM selftests.
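
Concretely, the rmap-iterating paths end up following a pattern like
the one below (adapted from the kvm_age_gfn() change in patch 2;
comments added here for illustration), where shadow_mmu_active is the
flag introduced in patch 1:

	bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
	{
		bool young = false;

		/* The rmaps are only populated once the shadow MMU is active. */
		if (kvm->arch.shadow_mmu_active)
			young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);

		/* The TDP MMU tracks accessed state in its own SPTEs. */
		if (is_tdp_mmu_enabled(kvm))
			young |= kvm_tdp_mmu_age_gfn_range(kvm, range);

		return young;
	}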

Ben Gardon (6):
  KVM: x86/mmu: Track if shadow MMU active
  KVM: x86/mmu: Skip rmap operations if shadow MMU inactive
  KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap
  KVM: x86/mmu: Factor out allocating memslot rmap
  KVM: x86/mmu: Protect kvm->memslots with a mutex
  KVM: x86/mmu: Lazily allocate memslot rmaps

 arch/x86/include/asm/kvm_host.h |  20 +++++
 arch/x86/kvm/mmu/mmu.c          | 153 +++++++++++++++++++++-----------
 arch/x86/kvm/mmu/mmu_internal.h |   2 +
 arch/x86/kvm/mmu/tdp_mmu.c      |   6 +-
 arch/x86/kvm/mmu/tdp_mmu.h      |   4 +-
 arch/x86/kvm/x86.c              | 133 +++++++++++++++++++++++----
 include/linux/kvm_host.h        |   2 +
 virt/kvm/kvm_main.c             |  48 +++++++---
 8 files changed, 283 insertions(+), 85 deletions(-)

-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] KVM: x86/mmu: Track if shadow MMU active
  2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
@ 2021-04-27 22:36 ` Ben Gardon
  2021-04-27 22:36 ` [PATCH 2/6] KVM: x86/mmu: Skip rmap operations if shadow MMU inactive Ben Gardon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Add a field to each VM to track if the shadow / legacy MMU is actually
in use. If the shadow MMU is not in use, then that knowledge opens the
door to other optimizations which will be added in future patches.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 10 +++++++++-
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 arch/x86/kvm/mmu/tdp_mmu.c      |  6 ++++--
 arch/x86/kvm/mmu/tdp_mmu.h      |  4 ++--
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad22d4839bcc..3900dcf2439e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1122,6 +1122,8 @@ struct kvm_arch {
 	 */
 	spinlock_t tdp_mmu_pages_lock;
 #endif /* CONFIG_X86_64 */
+
+	bool shadow_mmu_active;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 930ac8a7e7c9..3975272321d0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3110,6 +3110,11 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return ret;
 }
 
+void activate_shadow_mmu(struct kvm *kvm)
+{
+	kvm->arch.shadow_mmu_active = true;
+}
+
 static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 			       struct list_head *invalid_list)
 {
@@ -3280,6 +3285,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	activate_shadow_mmu(vcpu->kvm);
+
 	write_lock(&vcpu->kvm->mmu_lock);
 	r = make_mmu_pages_available(vcpu);
 	if (r < 0)
@@ -5467,7 +5474,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
-	kvm_mmu_init_tdp_mmu(kvm);
+	if (!kvm_mmu_init_tdp_mmu(kvm))
+		activate_shadow_mmu(kvm);
 
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f2546d6d390c..297a911c018c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -165,4 +165,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
+void activate_shadow_mmu(struct kvm *kvm);
+
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 83cbdbe5de5a..5342aca2c8e0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -14,10 +14,10 @@ static bool __read_mostly tdp_mmu_enabled = false;
 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
 
 /* Initializes the TDP MMU for the VM, if enabled. */
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 {
 	if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
-		return;
+		return false;
 
 	/* This should not be changed for the lifetime of the VM. */
 	kvm->arch.tdp_mmu_enabled = true;
@@ -25,6 +25,8 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
 	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
+
+	return true;
 }
 
 static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..b046ab5137a1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -80,12 +80,12 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 			 int *root_level);
 
 #ifdef CONFIG_X86_64
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
 #else
-static inline void kvm_mmu_init_tdp_mmu(struct kvm *kvm) {}
+static inline bool kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return false; }
 static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/6] KVM: x86/mmu: Skip rmap operations if shadow MMU inactive
  2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
  2021-04-27 22:36 ` [PATCH 1/6] KVM: x86/mmu: Track if shadow MMU active Ben Gardon
@ 2021-04-27 22:36 ` Ben Gardon
  2021-04-27 22:36 ` [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap Ben Gardon
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

If the shadow MMU is not in use, and only the TDP MMU is being used to
manage the memory mappings for a VM, then many rmap operations can be
skipped as they are guaranteed to be no-ops. This saves the time that
would otherwise be spent on those rmap operations. It also avoids
acquiring the MMU lock in write mode for many operations.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 128 +++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3975272321d0..e252af46f205 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1189,6 +1189,10 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	if (is_tdp_mmu_enabled(kvm))
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, true);
+
+	if (!kvm->arch.shadow_mmu_active)
+		return;
+
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
@@ -1218,6 +1222,10 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 	if (is_tdp_mmu_enabled(kvm))
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, false);
+
+	if (!kvm->arch.shadow_mmu_active)
+		return;
+
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
@@ -1260,9 +1268,12 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	int i;
 	bool write_protected = false;
 
-	for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
-		rmap_head = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(kvm, rmap_head, true);
+	if (kvm->arch.shadow_mmu_active) {
+		for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+			rmap_head = __gfn_to_rmap(gfn, i, slot);
+			write_protected |= __rmap_write_protect(kvm, rmap_head,
+								true);
+		}
 	}
 
 	if (is_tdp_mmu_enabled(kvm))
@@ -1433,9 +1444,10 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
 
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool flush;
+	bool flush = false;
 
-	flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
+	if (kvm->arch.shadow_mmu_active)
+		flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		flush |= kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
@@ -1445,9 +1457,10 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool flush;
+	bool flush = false;
 
-	flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
+	if (kvm->arch.shadow_mmu_active)
+		flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		flush |= kvm_tdp_mmu_set_spte_gfn(kvm, range);
@@ -1500,9 +1513,10 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool young;
+	bool young = false;
 
-	young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
+	if (kvm->arch.shadow_mmu_active)
+		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1512,9 +1526,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool young;
+	bool young = false;
 
-	young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
+	if (kvm->arch.shadow_mmu_active)
+		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
@@ -5447,7 +5462,8 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	 */
 	kvm_reload_remote_mmus(kvm);
 
-	kvm_zap_obsolete_pages(kvm);
+	if (kvm->arch.shadow_mmu_active)
+		kvm_zap_obsolete_pages(kvm);
 
 	write_unlock(&kvm->mmu_lock);
 
@@ -5498,29 +5514,29 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	int i;
 	bool flush = false;
 
-	write_lock(&kvm->mmu_lock);
-	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-		slots = __kvm_memslots(kvm, i);
-		kvm_for_each_memslot(memslot, slots) {
-			gfn_t start, end;
-
-			start = max(gfn_start, memslot->base_gfn);
-			end = min(gfn_end, memslot->base_gfn + memslot->npages);
-			if (start >= end)
-				continue;
+	if (kvm->arch.shadow_mmu_active) {
+		write_lock(&kvm->mmu_lock);
+		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+			slots = __kvm_memslots(kvm, i);
+			kvm_for_each_memslot(memslot, slots) {
+				gfn_t start, end;
+
+				start = max(gfn_start, memslot->base_gfn);
+				end = min(gfn_end, memslot->base_gfn + memslot->npages);
+				if (start >= end)
+					continue;
 
-			flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
-							PG_LEVEL_4K,
-							KVM_MAX_HUGEPAGE_LEVEL,
-							start, end - 1, true, flush);
+				flush = slot_handle_level_range(kvm, memslot,
+						kvm_zap_rmapp, PG_LEVEL_4K,
+						KVM_MAX_HUGEPAGE_LEVEL, start,
+						end - 1, true, flush);
+			}
 		}
+		if (flush)
+			kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
-
-	write_unlock(&kvm->mmu_lock);
-
 	if (is_tdp_mmu_enabled(kvm)) {
 		flush = false;
 
@@ -5547,12 +5563,15 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot,
 				      int start_level)
 {
-	bool flush;
+	bool flush = false;
 
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
-				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
-	write_unlock(&kvm->mmu_lock);
+	if (kvm->arch.shadow_mmu_active) {
+		write_lock(&kvm->mmu_lock);
+		flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
+					  start_level, KVM_MAX_HUGEPAGE_LEVEL,
+					  false);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
 		read_lock(&kvm->mmu_lock);
@@ -5622,16 +5641,15 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 	struct kvm_memory_slot *slot = (struct kvm_memory_slot *)memslot;
 	bool flush;
 
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
-
-	if (flush)
-		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
-	write_unlock(&kvm->mmu_lock);
+	if (kvm->arch.shadow_mmu_active) {
+		write_lock(&kvm->mmu_lock);
+		flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
+		if (flush)
+			kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
-		flush = false;
-
 		read_lock(&kvm->mmu_lock);
 		flush = kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot, flush);
 		if (flush)
@@ -5658,11 +5676,14 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot)
 {
-	bool flush;
+	bool flush = false;
 
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
-	write_unlock(&kvm->mmu_lock);
+	if (kvm->arch.shadow_mmu_active) {
+		write_lock(&kvm->mmu_lock);
+		flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
+					 false);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
 		read_lock(&kvm->mmu_lock);
@@ -5687,6 +5708,14 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	int ign;
 
 	write_lock(&kvm->mmu_lock);
+	if (is_tdp_mmu_enabled(kvm))
+		kvm_tdp_mmu_zap_all(kvm);
+
+	if (!kvm->arch.shadow_mmu_active) {
+		write_unlock(&kvm->mmu_lock);
+		return;
+	}
+
 restart:
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
 		if (WARN_ON(sp->role.invalid))
@@ -5699,9 +5728,6 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 
-	if (is_tdp_mmu_enabled(kvm))
-		kvm_tdp_mmu_zap_all(kvm);
-
 	write_unlock(&kvm->mmu_lock);
 }
 
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap
  2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
  2021-04-27 22:36 ` [PATCH 1/6] KVM: x86/mmu: Track if shadow MMU active Ben Gardon
  2021-04-27 22:36 ` [PATCH 2/6] KVM: x86/mmu: Skip rmap operations if shadow MMU inactive Ben Gardon
@ 2021-04-27 22:36 ` Ben Gardon
  2021-04-28 10:00   ` Paolo Bonzini
  2021-04-27 22:36 ` [PATCH 4/6] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Small code deduplication. No functional change expected.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/x86.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf3b67679cf0..5bcf07465c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_hv_destroy_vm(kvm);
 }
 
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+static void free_memslot_rmap(struct kvm_memory_slot *slot)
 {
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.rmap[i]);
 		slot->arch.rmap[i] = NULL;
+	}
+}
 
-		if (i == 0)
-			continue;
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	int i;
+
+	free_memslot_rmap(slot);
 
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.lpage_info[i - 1]);
 		slot->arch.lpage_info[i - 1] = NULL;
 	}
@@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 	return 0;
 
 out_free:
-	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
-		kvfree(slot->arch.rmap[i]);
-		slot->arch.rmap[i] = NULL;
-		if (i == 0)
-			continue;
+	free_memslot_rmap(slot);
 
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.lpage_info[i - 1]);
 		slot->arch.lpage_info[i - 1] = NULL;
 	}
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/6] KVM: x86/mmu: Factor out allocating memslot rmap
  2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
                   ` (2 preceding siblings ...)
  2021-04-27 22:36 ` [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap Ben Gardon
@ 2021-04-27 22:36 ` Ben Gardon
  2021-04-27 22:36 ` [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex Ben Gardon
  2021-04-27 22:36 ` [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
  5 siblings, 0 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Small refactor to facilitate allocating rmaps for all memslots at once.

No functional change expected.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bcf07465c47..fc32a7dbe4c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	kvm_page_track_free_memslot(slot);
 }
 
+static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
+			      unsigned long npages)
+{
+	int i;
+
+	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+		int lpages;
+		int level = i + 1;
+
+		lpages = gfn_to_index(slot->base_gfn + npages - 1,
+				      slot->base_gfn, level) + 1;
+
+		slot->arch.rmap[i] =
+			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
+				 GFP_KERNEL_ACCOUNT);
+		if (!slot->arch.rmap[i])
+			goto out_free;
+	}
+
+	return 0;
+
+out_free:
+	free_memslot_rmap(slot);
+	return -ENOMEM;
+}
+
 static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 				      unsigned long npages)
 {
 	int i;
+	int r;
 
 	/*
 	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
@@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 	 */
 	memset(&slot->arch, 0, sizeof(slot->arch));
 
-	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+	r = alloc_memslot_rmap(slot, npages);
+	if (r)
+		return r;
+
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		struct kvm_lpage_info *linfo;
 		unsigned long ugfn;
 		int lpages;
@@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 		lpages = gfn_to_index(slot->base_gfn + npages - 1,
 				      slot->base_gfn, level) + 1;
 
-		slot->arch.rmap[i] =
-			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
-				 GFP_KERNEL_ACCOUNT);
-		if (!slot->arch.rmap[i])
-			goto out_free;
-		if (i == 0)
-			continue;
-
 		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
 		if (!linfo)
 			goto out_free;
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
                   ` (3 preceding siblings ...)
  2021-04-27 22:36 ` [PATCH 4/6] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
@ 2021-04-27 22:36 ` Ben Gardon
  2021-04-28  6:25   ` Paolo Bonzini
  2021-04-27 22:36 ` [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
  5 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

Add a lock around memslot changes. Currently this lock does not have
any effect on the synchronization model, but it will be used in a future
commit to facilitate lazy rmap allocation.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/x86.c              | 11 +++++++++++
 include/linux/kvm_host.h        |  2 ++
 virt/kvm/kvm_main.c             |  9 ++++++++-
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3900dcf2439e..bce7fa152473 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1124,6 +1124,11 @@ struct kvm_arch {
 #endif /* CONFIG_X86_64 */
 
 	bool shadow_mmu_active;
+
+	/*
+	 * Protects kvm->memslots.
+	 */
+	struct mutex memslot_assignment_lock;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc32a7dbe4c4..30234fe96f48 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10649,6 +10649,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
+	mutex_init(&kvm->arch.memslot_assignment_lock);
 
 	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
 	pvclock_update_vm_gtod_copy(kvm);
@@ -10868,6 +10869,16 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
 	return -ENOMEM;
 }
 
+
+void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+			     struct kvm_memslots *slots)
+{
+	mutex_lock(&kvm->arch.memslot_assignment_lock);
+	rcu_assign_pointer(kvm->memslots[as_id], slots);
+	mutex_unlock(&kvm->arch.memslot_assignment_lock);
+}
+
+
 static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 				      unsigned long npages)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8895b95b6a22..146bb839c754 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -720,6 +720,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change);
+void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+			     struct kvm_memslots *slots);
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				struct kvm_memory_slot *old,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2799c6660cce..e62a37bc5b90 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1270,6 +1270,12 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
 	return 0;
 }
 
+__weak void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+				    struct kvm_memslots *slots)
+{
+	rcu_assign_pointer(kvm->memslots[as_id], slots);
+}
+
 static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 		int as_id, struct kvm_memslots *slots)
 {
@@ -1279,7 +1285,8 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
 	slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
 
-	rcu_assign_pointer(kvm->memslots[as_id], slots);
+	kvm_arch_assign_memslots(kvm, as_id, slots);
+
 	synchronize_srcu_expedited(&kvm->srcu);
 
 	/*
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps
  2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
                   ` (4 preceding siblings ...)
  2021-04-27 22:36 ` [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex Ben Gardon
@ 2021-04-27 22:36 ` Ben Gardon
  2021-04-28 10:03   ` Paolo Bonzini
  5 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-04-27 22:36 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong, Ben Gardon

If the TDP MMU is in use, wait to allocate the rmaps until the shadow
MMU is actually used (i.e., when a nested VM is launched). This saves memory
equal to 0.2% of guest memory in cases where the TDP MMU is used and
there are no nested guests involved.
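
(For reference: with one 8-byte rmap_head per 4 KiB page at the 4K
level, the rmaps cost roughly 8 / 4096, i.e. about 0.2% of guest
memory; the 2M and 1G rmap levels add comparatively little on top of
that.)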

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/include/asm/kvm_host.h | 15 +++++++-
 arch/x86/kvm/mmu/mmu.c          | 21 ++++++++--
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 arch/x86/kvm/x86.c              | 68 ++++++++++++++++++++++++++++++---
 include/linux/kvm_host.h        |  2 +-
 virt/kvm/kvm_main.c             | 43 +++++++++++++++------
 6 files changed, 129 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bce7fa152473..9ce4cfaf6539 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1126,7 +1126,18 @@ struct kvm_arch {
 	bool shadow_mmu_active;
 
 	/*
-	 * Protects kvm->memslots.
+	 * If set, the rmap should be allocated for any newly created or
+	 * modified memslots. If allocating rmaps lazily, this may be set
+	 * before the rmaps are allocated for existing memslots, but
+	 * shadow_mmu_active will not be set until after the rmaps are fully
+	 * allocated. Protected by the memslot assignment lock, below.
+	 */
+	bool alloc_memslot_rmaps;
+
+	/*
+	 * Protects kvm->memslots and alloc_memslot_rmaps (above) to ensure
+	 * that once alloc_memslot_rmaps is set, no memslot is left without an
+	 * rmap.
 	 */
 	struct mutex memslot_assignment_lock;
 };
@@ -1860,4 +1871,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 
 int kvm_cpu_dirty_log_size(void);
 
+int alloc_all_memslots_rmaps(struct kvm *kvm);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e252af46f205..b2a6585bd978 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3125,9 +3125,17 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return ret;
 }
 
-void activate_shadow_mmu(struct kvm *kvm)
+int activate_shadow_mmu(struct kvm *kvm)
 {
+	int r;
+
+	r = alloc_all_memslots_rmaps(kvm);
+	if (r)
+		return r;
+
 	kvm->arch.shadow_mmu_active = true;
+
+	return 0;
 }
 
 static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
@@ -3300,7 +3308,9 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	activate_shadow_mmu(vcpu->kvm);
+	r = activate_shadow_mmu(vcpu->kvm);
+	if (r)
+		return r;
 
 	write_lock(&vcpu->kvm->mmu_lock);
 	r = make_mmu_pages_available(vcpu);
@@ -5491,7 +5501,12 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
 	if (!kvm_mmu_init_tdp_mmu(kvm))
-		activate_shadow_mmu(kvm);
+		/*
+		 * No memslots can have been allocated at this point.
+		 * activate_shadow_mmu won't actually need to allocate
+		 * rmaps, so it cannot fail.
+		 */
+		WARN_ON(activate_shadow_mmu(kvm));
 
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 297a911c018c..c6b21a916452 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -165,6 +165,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
-void activate_shadow_mmu(struct kvm *kvm);
+int activate_shadow_mmu(struct kvm *kvm);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 30234fe96f48..1aca39673168 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10843,11 +10843,24 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	kvm_page_track_free_memslot(slot);
 }
 
-static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
+static int alloc_memslot_rmap(struct kvm *kvm, struct kvm_memory_slot *slot,
 			      unsigned long npages)
 {
 	int i;
 
+	if (!kvm->arch.alloc_memslot_rmaps)
+		return 0;
+
+	/*
+	 * All rmaps for a memslot should be allocated either before
+	 * the memslot is installed (in which case no other threads
+	 * should have a pointer to it), or under the
+	 * memslot_assignment_lock. Avoid overwriting already allocated
+	 * rmaps.
+	 */
+	if (slot->arch.rmap[0])
+		return 0;
+
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		int lpages;
 		int level = i + 1;
@@ -10869,17 +10882,62 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
 	return -ENOMEM;
 }
 
+int alloc_memslots_rmaps(struct kvm *kvm, struct kvm_memslots *slots)
+{
+	struct kvm_memory_slot *slot;
+	int r = 0;
+
+	kvm_for_each_memslot(slot, slots) {
+		r = alloc_memslot_rmap(kvm, slot, slot->npages);
+		if (r)
+			break;
+	}
+	return r;
+}
+
+int alloc_all_memslots_rmaps(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	int r = 0;
+	int i;
 
-void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+	mutex_lock(&kvm->arch.memslot_assignment_lock);
+	kvm->arch.alloc_memslot_rmaps = true;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		r = alloc_memslots_rmaps(kvm, slots);
+		if (r)
+			break;
+	}
+	mutex_unlock(&kvm->arch.memslot_assignment_lock);
+	return r;
+}
+
+int kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
 			     struct kvm_memslots *slots)
 {
+	int r;
+
 	mutex_lock(&kvm->arch.memslot_assignment_lock);
+
+	if (kvm->arch.alloc_memslot_rmaps) {
+		r = alloc_memslots_rmaps(kvm, slots);
+		if (r) {
+			mutex_unlock(&kvm->arch.memslot_assignment_lock);
+			return r;
+		}
+	}
+
 	rcu_assign_pointer(kvm->memslots[as_id], slots);
 	mutex_unlock(&kvm->arch.memslot_assignment_lock);
+
+	return 0;
 }
 
 
-static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
+static int kvm_alloc_memslot_metadata(struct kvm *kvm,
+				      struct kvm_memory_slot *slot,
 				      unsigned long npages)
 {
 	int i;
@@ -10892,7 +10950,7 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 	 */
 	memset(&slot->arch, 0, sizeof(slot->arch));
 
-	r = alloc_memslot_rmap(slot, npages);
+	r = alloc_memslot_rmap(kvm, slot, npages);
 	if (r)
 		return r;
 
@@ -10965,7 +11023,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				enum kvm_mr_change change)
 {
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
-		return kvm_alloc_memslot_metadata(memslot,
+		return kvm_alloc_memslot_metadata(kvm, memslot,
 						  mem->memory_size >> PAGE_SHIFT);
 	return 0;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 146bb839c754..0a34491a5c40 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -720,7 +720,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change);
-void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+int kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
 			     struct kvm_memslots *slots);
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e62a37bc5b90..657e29ce8a05 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1270,22 +1270,31 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
 	return 0;
 }
 
-__weak void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
+__weak int kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
 				    struct kvm_memslots *slots)
 {
 	rcu_assign_pointer(kvm->memslots[as_id], slots);
+	return 0;
 }
 
-static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
-		int as_id, struct kvm_memslots *slots)
+static int install_new_memslots(struct kvm *kvm, int as_id,
+				struct kvm_memslots *slots,
+				struct kvm_memslots **old_slots)
 {
-	struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id);
-	u64 gen = old_memslots->generation;
+	u64 gen;
+	int r;
+
+	*old_slots = __kvm_memslots(kvm, as_id);
+	gen = (*old_slots)->generation;
 
 	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
 	slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
 
-	kvm_arch_assign_memslots(kvm, as_id, slots);
+	r = kvm_arch_assign_memslots(kvm, as_id, slots);
+	if (r) {
+		old_slots = NULL;
+		return r;
+	}
 
 	synchronize_srcu_expedited(&kvm->srcu);
 
@@ -1310,7 +1319,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 
 	slots->generation = gen;
 
-	return old_memslots;
+	return 0;
 }
 
 /*
@@ -1346,6 +1355,7 @@ static int kvm_set_memslot(struct kvm *kvm,
 			   enum kvm_mr_change change)
 {
 	struct kvm_memory_slot *slot;
+	struct kvm_memslots *old_slots;
 	struct kvm_memslots *slots;
 	int r;
 
@@ -1367,7 +1377,10 @@ static int kvm_set_memslot(struct kvm *kvm,
 		 * dropped by update_memslots anyway.  We'll also revert to the
 		 * old memslots if preparing the new memory region fails.
 		 */
-		slots = install_new_memslots(kvm, as_id, slots);
+		r = install_new_memslots(kvm, as_id, slots,  &old_slots);
+		if (r)
+			goto out_free;
+		slots = old_slots;
 
 		/* From this point no new shadow pages pointing to a deleted,
 		 * or moved, memslot will be created.
@@ -1384,7 +1397,10 @@ static int kvm_set_memslot(struct kvm *kvm,
 		goto out_slots;
 
 	update_memslots(slots, new, change);
-	slots = install_new_memslots(kvm, as_id, slots);
+	r = install_new_memslots(kvm, as_id, slots,  &old_slots);
+	if (r)
+		goto out_slots;
+	slots = old_slots;
 
 	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
 
@@ -1392,8 +1408,13 @@ static int kvm_set_memslot(struct kvm *kvm,
 	return 0;
 
 out_slots:
-	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
-		slots = install_new_memslots(kvm, as_id, slots);
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+		r = install_new_memslots(kvm, as_id, slots, &old_slots);
+		if (r)
+			goto out_slots;
+		slots = old_slots;
+	}
+out_free:
 	kvfree(slots);
 	return r;
 }
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-27 22:36 ` [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex Ben Gardon
@ 2021-04-28  6:25   ` Paolo Bonzini
  2021-04-28 16:40     ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-28  6:25 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Junaid Shahid,
	Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong

On 28/04/21 00:36, Ben Gardon wrote:
> +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
> +			     struct kvm_memslots *slots)
> +{
> +	mutex_lock(&kvm->arch.memslot_assignment_lock);
> +	rcu_assign_pointer(kvm->memslots[as_id], slots);
> +	mutex_unlock(&kvm->arch.memslot_assignment_lock);
> +}

Does the assignment also need the lock, or only the rmap allocation?  I 
would prefer the hook to be something like kvm_arch_setup_new_memslots.

(Also it is useful to have a comment somewhere explaining why the 
slots_lock does not work.  IIUC there would be a deadlock because you'd 
be taking the slots_lock inside an SRCU critical region, while usually 
the slots_lock critical section is the one that includes a 
synchronize_srcu; I should dig that up and document that ordering in 
Documentation/virt/kvm too).
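
Roughly, the problematic ordering would be something like this (a
sketch of the two threads, not actual code):

	/* vCPU thread, inside its SRCU read-side critical section: */
	idx = srcu_read_lock(&kvm->srcu);
	...
	mutex_lock(&kvm->slots_lock);		/* blocks on the updater */

	/* memslot updater, in install_new_memslots(): */
	mutex_lock(&kvm->slots_lock);
	...
	synchronize_srcu_expedited(&kvm->srcu);	/* waits for the reader above */
	mutex_unlock(&kvm->slots_lock);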

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap
  2021-04-27 22:36 ` [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap Ben Gardon
@ 2021-04-28 10:00   ` Paolo Bonzini
  2021-04-28 16:23     ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-28 10:00 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Junaid Shahid,
	Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong

Typo in the commit subject, I guess?

Paolo

On 28/04/21 00:36, Ben Gardon wrote:
> Small code deduplication. No functional change expected.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>   arch/x86/kvm/x86.c | 19 +++++++++++--------
>   1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf3b67679cf0..5bcf07465c47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   	kvm_hv_destroy_vm(kvm);
>   }
>   
> -void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +static void free_memslot_rmap(struct kvm_memory_slot *slot)
>   {
>   	int i;
>   
>   	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
>   		kvfree(slot->arch.rmap[i]);
>   		slot->arch.rmap[i] = NULL;
> +	}
> +}
>   
> -		if (i == 0)
> -			continue;
> +void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> +	int i;
> +
> +	free_memslot_rmap(slot);
>   
> +	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
>   		kvfree(slot->arch.lpage_info[i - 1]);
>   		slot->arch.lpage_info[i - 1] = NULL;
>   	}
> @@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>   	return 0;
>   
>   out_free:
> -	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> -		kvfree(slot->arch.rmap[i]);
> -		slot->arch.rmap[i] = NULL;
> -		if (i == 0)
> -			continue;
> +	free_memslot_rmap(slot);
>   
> +	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
>   		kvfree(slot->arch.lpage_info[i - 1]);
>   		slot->arch.lpage_info[i - 1] = NULL;
>   	}
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps
  2021-04-27 22:36 ` [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
@ 2021-04-28 10:03   ` Paolo Bonzini
  2021-04-28 16:45     ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-28 10:03 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Junaid Shahid,
	Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong

On 28/04/21 00:36, Ben Gardon wrote:
> -static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> +static int kvm_alloc_memslot_metadata(struct kvm *kvm,
> +				      struct kvm_memory_slot *slot,
>   				      unsigned long npages)
>   {
>   	int i;
> @@ -10892,7 +10950,7 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>   	 */
>   	memset(&slot->arch, 0, sizeof(slot->arch));
>   
> -	r = alloc_memslot_rmap(slot, npages);
> +	r = alloc_memslot_rmap(kvm, slot, npages);
>   	if (r)
>   		return r;
>   

I wonder why you need alloc_memslot_rmap at all here in 
kvm_alloc_memslot_metadata, or alternatively why you need to do it in 
kvm_arch_assign_memslots.  It seems like only one of those would be 
necessary.

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap
  2021-04-28 10:00   ` Paolo Bonzini
@ 2021-04-28 16:23     ` Ben Gardon
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-28 16:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Wed, Apr 28, 2021 at 3:00 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Typo in the commit subject, I guess?

Oh whoops, yeah, it should just be "Deduplicate rmap freeing" or
something to that effect.

>
> Paolo
>
> On 28/04/21 00:36, Ben Gardon wrote:
> > Small code deduplication. No functional change expected.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> >   arch/x86/kvm/x86.c | 19 +++++++++++--------
> >   1 file changed, 11 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index cf3b67679cf0..5bcf07465c47 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >       kvm_hv_destroy_vm(kvm);
> >   }
> >
> > -void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +static void free_memslot_rmap(struct kvm_memory_slot *slot)
> >   {
> >       int i;
> >
> >       for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> >               kvfree(slot->arch.rmap[i]);
> >               slot->arch.rmap[i] = NULL;
> > +     }
> > +}
> >
> > -             if (i == 0)
> > -                     continue;
> > +void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > +     int i;
> > +
> > +     free_memslot_rmap(slot);
> >
> > +     for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> >               kvfree(slot->arch.lpage_info[i - 1]);
> >               slot->arch.lpage_info[i - 1] = NULL;
> >       }
> > @@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> >       return 0;
> >
> >   out_free:
> > -     for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > -             kvfree(slot->arch.rmap[i]);
> > -             slot->arch.rmap[i] = NULL;
> > -             if (i == 0)
> > -                     continue;
> > +     free_memslot_rmap(slot);
> >
> > +     for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> >               kvfree(slot->arch.lpage_info[i - 1]);
> >               slot->arch.lpage_info[i - 1] = NULL;
> >       }
> >
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28  6:25   ` Paolo Bonzini
@ 2021-04-28 16:40     ` Ben Gardon
  2021-04-28 17:46       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-04-28 16:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 28/04/21 00:36, Ben Gardon wrote:
> > +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
> > +                          struct kvm_memslots *slots)
> > +{
> > +     mutex_lock(&kvm->arch.memslot_assignment_lock);
> > +     rcu_assign_pointer(kvm->memslots[as_id], slots);
> > +     mutex_unlock(&kvm->arch.memslot_assignment_lock);
> > +}
>
> Does the assignment also needs the lock, or only the rmap allocation?  I
> would prefer the hook to be something like kvm_arch_setup_new_memslots.

The assignment does need to be under the lock to prevent the following race:
1. Thread 1 (installing a new memslot): Acquires memslot assignment
lock (or perhaps in this case rmap_allocation_lock would be more apt.)
2. Thread 1: Check alloc_memslot_rmaps (it is false)
3. Thread 1: doesn't allocate memslot rmaps for new slot
4. Thread 1: Releases memslot assignment lock
5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock
6. Thread 2: Sets alloc_memslot_rmaps = true
7. Thread 2: Allocates rmaps for all existing slots
8. Thread 2: Releases memslot assignment lock
9. Thread 2: Sets shadow_mmu_active = true
10. Thread 1: Installs the new memslots
11. Thread 3: Null pointer dereference when trying to access rmaps on
the new slot.

Putting the assignment under the lock prevents 5-8 from happening
between 2 and 10.
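
For reference, that is why patch 6 ends up doing both the rmap check
and the pointer assignment under the same mutex, roughly (error
handling elided):

	mutex_lock(&kvm->arch.memslot_assignment_lock);
	if (kvm->arch.alloc_memslot_rmaps)
		alloc_memslots_rmaps(kvm, slots);	/* serializes with steps 5-8 */
	rcu_assign_pointer(kvm->memslots[as_id], slots);	/* step 10 */
	mutex_unlock(&kvm->arch.memslot_assignment_lock);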

I'm open to other ideas as far as how to prevent this race though. I
admit this solution is not the most elegant looking.

>
> (Also it is useful to have a comment somewhere explaining why the
> slots_lock does not work.  IIUC there would be a deadlock because you'd
> be taking the slots_lock inside an SRCU critical region, while usually
> the slots_lock critical section is the one that includes a
> synchronize_srcu; I should dig that up and document that ordering in
> Documentation/virt/kvm too).

Yeah, sorry about that. I should have added a comment to that effect.
As you suspected, it's because of the slots lock / SRCU deadlock.
Using the slots lock was my original implementation, until the
deadlock issue came up.

I can add comments about that in a v2.

>
> Paolo
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps
  2021-04-28 10:03   ` Paolo Bonzini
@ 2021-04-28 16:45     ` Ben Gardon
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-28 16:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Wed, Apr 28, 2021 at 3:04 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 28/04/21 00:36, Ben Gardon wrote:
> > -static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> > +static int kvm_alloc_memslot_metadata(struct kvm *kvm,
> > +                                   struct kvm_memory_slot *slot,
> >                                     unsigned long npages)
> >   {
> >       int i;
> > @@ -10892,7 +10950,7 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> >        */
> >       memset(&slot->arch, 0, sizeof(slot->arch));
> >
> > -     r = alloc_memslot_rmap(slot, npages);
> > +     r = alloc_memslot_rmap(kvm, slot, npages);
> >       if (r)
> >               return r;
> >
>
> I wonder why you need alloc_memslot_rmap at all here in
> kvm_alloc_memslot_metadata, or alternatively why you need to do it in
> kvm_arch_assign_memslots.  It seems like only one of those would be
> necessary.

Oh, that's a good point. We need it in kvm_arch_assign_memslots
because of the race I identified in the thread for patch 5 of this
series, but we could remove it from kvm_alloc_memslot_metadata with
this patch.

Of course, it would be much nicer if we could just keep it in
kvm_alloc_memslot_metadata and find some other way to stop memslots
without rmaps from being inappropriately installed, if anyone has a
simpler way to accomplish that.

>
> Paolo
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28 16:40     ` Ben Gardon
@ 2021-04-28 17:46       ` Paolo Bonzini
  2021-04-28 20:40         ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-28 17:46 UTC (permalink / raw)
  To: Ben Gardon
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 28/04/21 18:40, Ben Gardon wrote:
> On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 28/04/21 00:36, Ben Gardon wrote:
>>> +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
>>> +                          struct kvm_memslots *slots)
>>> +{
>>> +     mutex_lock(&kvm->arch.memslot_assignment_lock);
>>> +     rcu_assign_pointer(kvm->memslots[as_id], slots);
>>> +     mutex_unlock(&kvm->arch.memslot_assignment_lock);
>>> +}
>>
>> Does the assignment also needs the lock, or only the rmap allocation?  I
>> would prefer the hook to be something like kvm_arch_setup_new_memslots.
> 
> The assignment does need to be under the lock to prevent the following race:
> 1. Thread 1 (installing a new memslot): Acquires memslot assignment
> lock (or perhaps in this case rmap_allocation_lock would be more apt.)
> 2. Thread 1: Check alloc_memslot_rmaps (it is false)
> 3. Thread 1: doesn't allocate memslot rmaps for new slot
> 4. Thread 1: Releases memslot assignment lock
> 5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock
> 6. Thread 2: Sets alloc_memslot_rmaps = true
> 7. Thread 2: Allocates rmaps for all existing slots
> 8. Thread 2: Releases memslot assignment lock
> 9. Thread 2: Sets shadow_mmu_active = true
> 10. Thread 1: Installs the new memslots
> 11. Thread 3: Null pointer dereference when trying to access rmaps on
> the new slot.

... because thread 3 would be under mmu_lock and therefore cannot 
allocate the rmap itself (you have to do it in mmu_alloc_shadow_roots, 
as in patch 6).

Related to this, your solution does not have to protect kvm_dup_memslots 
with the new lock, because the first update of the memslots will not go 
through kvm_arch_prepare_memory_region but it _will_ go through 
install_new_memslots and therefore through the new hook.  But overall I 
think I'd prefer to have a kvm->slots_arch_lock mutex in generic code, 
and place the call to kvm_dup_memslots and 
kvm_arch_prepare_memory_region inside that mutex.

That makes the new lock decently intuitive, and easily documented as 
"Architecture code can use slots_arch_lock if the contents of struct 
kvm_arch_memory_slot needs to be written outside 
kvm_arch_prepare_memory_region.  Unlike slots_lock, slots_arch_lock can 
be taken inside a ``kvm->srcu`` read-side critical section".

I admit I haven't thought about it very thoroughly, but if something 
like this is enough, it is relatively pretty:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9b8e30dd5b9b..6e5106365597 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1333,6 +1333,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,

  	rcu_assign_pointer(kvm->memslots[as_id], slots);

+	mutex_unlock(&kvm->slots_arch_lock);
  	synchronize_srcu_expedited(&kvm->srcu);

  	/*
@@ -1399,6 +1398,7 @@ static int kvm_set_memslot(struct kvm *kvm,
  	struct kvm_memslots *slots;
  	int r;

+	mutex_lock(&kvm->slots_arch_lock);
  	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
  	if (!slots)
  		return -ENOMEM;
@@ -1427,6 +1427,7 @@ static int kvm_set_memslot(struct kvm *kvm,
  		 *	- kvm_is_visible_gfn (mmu_check_root)
  		 */
  		kvm_arch_flush_shadow_memslot(kvm, slot);
+		mutex_lock(&kvm->slots_arch_lock);
  	}

  	r = kvm_arch_prepare_memory_region(kvm, new, mem, change);

It does make the critical section a bit larger, so that the 
initialization of the shadow page (which is in KVM_RUN context) contends 
with slightly more code than necessary.  However, it's hardly a 
performance-critical situation, as it will only happen once per VM.

WDYT?

Paolo

> Putting the assignment under the lock prevents 5-8 from happening
> between 2 and 10.
> 
> I'm open to other ideas as far as how to prevent this race though. I
> admit this solution is not the most elegant looking.



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28 17:46       ` Paolo Bonzini
@ 2021-04-28 20:40         ` Ben Gardon
  2021-04-28 21:41           ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-04-28 20:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Wed, Apr 28, 2021 at 10:46 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 28/04/21 18:40, Ben Gardon wrote:
> > On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 28/04/21 00:36, Ben Gardon wrote:
> >>> +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id,
> >>> +                          struct kvm_memslots *slots)
> >>> +{
> >>> +     mutex_lock(&kvm->arch.memslot_assignment_lock);
> >>> +     rcu_assign_pointer(kvm->memslots[as_id], slots);
> >>> +     mutex_unlock(&kvm->arch.memslot_assignment_lock);
> >>> +}
> >>
> >> Does the assignment also needs the lock, or only the rmap allocation?  I
> >> would prefer the hook to be something like kvm_arch_setup_new_memslots.
> >
> > The assignment does need to be under the lock to prevent the following race:
> > 1. Thread 1 (installing a new memslot): Acquires memslot assignment
> > lock (or perhaps in this case rmap_allocation_lock would be more apt.)
> > 2. Thread 1: Check alloc_memslot_rmaps (it is false)
> > 3. Thread 1: doesn't allocate memslot rmaps for new slot
> > 4. Thread 1: Releases memslot assignment lock
> > 5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock
> > 6. Thread 2: Sets alloc_memslot_rmaps = true
> > 7. Thread 2: Allocates rmaps for all existing slots
> > 8. Thread 2: Releases memslot assignment lock
> > 9. Thread 2: Sets shadow_mmu_active = true
> > 10. Thread 1: Installs the new memslots
> > 11. Thread 3: Null pointer dereference when trying to access rmaps on
> > the new slot.
>
> ... because thread 3 would be under mmu_lock and therefore cannot
> allocate the rmap itself (you have to do it in mmu_alloc_shadow_roots,
> as in patch 6).
>
> Related to this, your solution does not have to protect kvm_dup_memslots
> with the new lock, because the first update of the memslots will not go
> through kvm_arch_prepare_memory_region but it _will_ go through
> install_new_memslots and therefore through the new hook.  But overall I
> think I'd prefer to have a kvm->slots_arch_lock mutex in generic code,
> and place the call to kvm_dup_memslots and
> kvm_arch_prepare_memory_region inside that mutex.

That makes sense, and I think it also avoids a bug in this series.
Currently, if the rmaps are allocated between kvm_dup_memslots and
install_new_memslots, we run into a problem where the copied slots
will have new rmaps allocated for them before installation.
Potentially creating all sorts of problems. I had fixed that issue in
the past by allocating the array of per-level rmaps at memslot
creation, seperate from the memslot. That meant that the copy would
get the newly allocated rmaps as well, but I thought that was obsolete
after some memslot refactors went in.

I hoped we could get away without that change, but I was probably
wrong. However, with this enlarged critical section, that should not
be an issue for creating memslots since kvm_dup_memslots will either
copy memslots with the rmaps already allocated, or the whole thing
will happen before the rmaps are allocated.

... However with the locking you propose below, we might still run
into issues on a move or delete, which would mean we'd still need the
separate memory allocation for the rmaps array. Or we do some
shenanigans where we try to copy the rmap pointers from the other set
of memslots.
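
(Purely as an illustration of the "copy the rmap pointers" idea, and
not something from this series, that copy could be as simple as:

	/* Hypothetical helper; the name and placement are invented here. */
	static void copy_memslot_rmaps(struct kvm_memory_slot *new,
				       const struct kvm_memory_slot *old)
	{
		int i;

		for (i = 0; i < KVM_NR_PAGE_SIZES; ++i)
			new->arch.rmap[i] = old->arch.rmap[i];
	}

though freeing would then need care to avoid double-freeing the shared
arrays when the old set of memslots is destroyed.)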

I can put together a v2 with the separate rmap memory and more generic
locking and see how that looks.

>
> That makes the new lock decently intuitive, and easily documented as
> "Architecture code can use slots_arch_lock if the contents of struct
> kvm_arch_memory_slot needs to be written outside
> kvm_arch_prepare_memory_region.  Unlike slots_lock, slots_arch_lock can
> be taken inside a ``kvm->srcu`` read-side critical section".
>
> I admit I haven't thought about it very thoroughly, but if something
> like this is enough, it is relatively pretty:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9b8e30dd5b9b..6e5106365597 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1333,6 +1333,7 @@ static struct kvm_memslots
> *install_new_memslots(struct kvm *kvm,
>
>         rcu_assign_pointer(kvm->memslots[as_id], slots);
>
> +       mutex_unlock(&kvm->slots_arch_lock);
>         synchronize_srcu_expedited(&kvm->srcu);
>
>         /*
> @@ -1399,6 +1398,7 @@ static int kvm_set_memslot(struct kvm *kvm,
>         struct kvm_memslots *slots;
>         int r;
>
> +       mutex_lock(&kvm->slots_arch_lock);
>         slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
>         if (!slots)
>                 return -ENOMEM;
> @@ -1427,6 +1427,7 @@ static int kvm_set_memslot(struct kvm *kvm,
>                  *      - kvm_is_visible_gfn (mmu_check_root)
>                  */
>                 kvm_arch_flush_shadow_memslot(kvm, slot);
> +               mutex_lock(&kvm->slots_arch_lock);
>         }
>
>         r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
>
> It does make the critical section a bit larger, so that the
> initialization of the shadow page (which is in KVM_RUN context) contends
> with slightly more code than necessary.  However it's all but a
> performance critical situation, as it will only happen just once per VM.

I agree performance is not a huge concern here. Excluding
kvm_arch_flush_shadow_memslot from the critical section also helps a
lot because that's where most of the work could be if we're deleting /
moving a slot.
My only worry is the latency this could add to a nested VM launch, but
it seems pretty unlikely that that would be frequently coinciding with
a memslot change in practice.

>
> WDYT?
>
> Paolo
>
> > Putting the assignment under the lock prevents 5-8 from happening
> > between 2 and 10.
> >
> > I'm open to other ideas as far as how to prevent this race though. I
> > admit this solution is not the most elegant looking.
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28 20:40         ` Ben Gardon
@ 2021-04-28 21:41           ` Paolo Bonzini
  2021-04-28 21:46             ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-28 21:41 UTC (permalink / raw)
  To: Ben Gardon
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 28/04/21 22:40, Ben Gardon wrote:
> ... However with the locking you propose below, we might still run
> into issues on a move or delete, which would mean we'd still need the
> separate memory allocation for the rmaps array. Or we do some
> shenanigans where we try to copy the rmap pointers from the other set
> of memslots.

If that's (almost) as easy as passing old to 
kvm_arch_prepare_memory_region, that would be totally okay.

> My only worry is the latency this could add to a nested VM launch, but
> it seems pretty unlikely that that would be frequently coinciding with
> a memslot change in practice.

Right, memslot changes in practice occur only at boot and on hotplug. 
If that were a problem we could always make the allocation state
off/in-progress/on, allowing the allocation state to be checked outside
the lock.  This would only potentially slow down the first nested VM launch.
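
Something along these lines, purely as a sketch; the state field and the
helper below are invented for illustration:

enum kvm_rmap_alloc_state {
	KVM_RMAPS_OFF,
	KVM_RMAPS_IN_PROGRESS,
	KVM_RMAPS_ON,
};

/*
 * Readers can check for "on" without taking the lock; the transitions
 * off -> in-progress -> on all happen under the lock, and the final
 * store is a release so that a reader observing "on" also observes the
 * allocated rmaps.
 */
static bool memslots_have_rmaps(struct kvm *kvm)
{
	return smp_load_acquire(&kvm->arch.rmap_alloc_state) == KVM_RMAPS_ON;
}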

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28 21:41           ` Paolo Bonzini
@ 2021-04-28 21:46             ` Ben Gardon
  2021-04-28 23:42               ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2021-04-28 21:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Wed, Apr 28, 2021 at 2:41 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 28/04/21 22:40, Ben Gardon wrote:
> > ... However with the locking you propose below, we might still run
> > into issues on a move or delete, which would mean we'd still need the
> > separate memory allocation for the rmaps array. Or we do some
> > shenanigans where we try to copy the rmap pointers from the other set
> > of memslots.
>
> If that's (almost) as easy as passing old to
> kvm_arch_prepare_memory_region, that would be totally okay.

Unfortunately it's not quite that easy, because we'd need to copy the
rmaps for all the slots _besides_ the one being modified.
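
To illustrate, the "shenanigans" would look roughly like this: reuse the
rmap arrays already hanging off the live memslots for every slot except
the one being changed. Rough sketch only; it assumes the rmap pointers
still live in slot->arch.rmap, and the helper name is made up:

static void copy_rmaps_from_active(struct kvm *kvm, int as_id,
				   struct kvm_memslots *slots, short skip_id)
{
	struct kvm_memslots *active = __kvm_memslots(kvm, as_id);
	struct kvm_memory_slot *slot;

	kvm_for_each_memslot(slot, slots) {
		struct kvm_memory_slot *old;

		if (slot->id == skip_id)
			continue;

		/* Share the already-allocated rmap arrays with the live slot. */
		old = id_to_memslot(active, slot->id);
		memcpy(slot->arch.rmap, old->arch.rmap, sizeof(slot->arch.rmap));
	}
}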

>
> > My only worry is the latency this could add to a nested VM launch, but
> > it seems pretty unlikely that that would be frequently coinciding with
> > a memslot change in practice.
>
> Right, memslot changes in practice occur only at boot and on hotplug.
> If that were a problem we could always make the allocation state
> off/in-progress/on, allowing the allocation state to be checked outside
> the lock.  This would only potentially slow down the first nested VM launch.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28 21:46             ` Ben Gardon
@ 2021-04-28 23:42               ` Paolo Bonzini
  2021-04-29  0:40                 ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-28 23:42 UTC (permalink / raw)
  To: Ben Gardon
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On 28/04/21 23:46, Ben Gardon wrote:
> On Wed, Apr 28, 2021 at 2:41 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 28/04/21 22:40, Ben Gardon wrote:
>>> ... However with the locking you propose below, we might still run
>>> into issues on a move or delete, which would mean we'd still need the
>>> separate memory allocation for the rmaps array. Or we do some
>>> shenanigans where we try to copy the rmap pointers from the other set
>>> of memslots.
>>
>> If that's (almost) as easy as passing old to
>> kvm_arch_prepare_memory_region, that would be totally okay.
> 
> Unfortunately it's not quite that easy, because we'd need to copy the
> rmaps for all the slots _besides_ the one being modified.

Ah, now I understand the whole race.  And it seems to me that if one
kvm_dup_memslots within the new lock fixed a bug, two kvm_dup_memslots
within the new lock are going to fix two bugs. :)

Seriously: unless I'm missing another case (it's late here...), it's
not ugly and it's still relatively easy to explain.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2799c6660cce..48929dd5fb29 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1270,7 +1270,7 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
  	return 0;
  }
  
-static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
+static void install_new_memslots(struct kvm *kvm,
  		int as_id, struct kvm_memslots *slots)
  {
  	struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id);
@@ -1280,7 +1280,9 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
  	slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
  
  	rcu_assign_pointer(kvm->memslots[as_id], slots);
+	mutex_unlock(&kvm->slots_arch_lock);
  	synchronize_srcu_expedited(&kvm->srcu);
+	kvfree(old_memslots);
  
  	/*
  	 * Increment the new memslot generation a second time, dropping the
@@ -1302,8 +1304,6 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
  	kvm_arch_memslots_updated(kvm, gen);
  
  	slots->generation = gen;
-
-	return old_memslots;
  }
  
  /*
@@ -1342,6 +1342,7 @@ static int kvm_set_memslot(struct kvm *kvm,
  	struct kvm_memslots *slots;
  	int r;
  
+	mutex_lock(&kvm->slots_arch_lock);
  	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
  	if (!slots)
  		return -ENOMEM;
@@ -1353,14 +1354,7 @@ static int kvm_set_memslot(struct kvm *kvm,
  		 */
  		slot = id_to_memslot(slots, old->id);
  		slot->flags |= KVM_MEMSLOT_INVALID;
-
-		/*
-		 * We can re-use the old memslots, the only difference from the
-		 * newly installed memslots is the invalid flag, which will get
-		 * dropped by update_memslots anyway.  We'll also revert to the
-		 * old memslots if preparing the new memory region fails.
-		 */
-		slots = install_new_memslots(kvm, as_id, slots);
+		install_new_memslots(kvm, as_id, slots);
  
  		/* From this point no new shadow pages pointing to a deleted,
  		 * or moved, memslot will be created.
@@ -1370,6 +1364,9 @@ static int kvm_set_memslot(struct kvm *kvm,
  		 *	- kvm_is_visible_gfn (mmu_check_root)
  		 */
  		kvm_arch_flush_shadow_memslot(kvm, slot);
+
+		mutex_lock(&kvm->slots_arch_lock);
+		slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
  	}
  
  	r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
@@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm,
  		goto out_slots;
  
  	update_memslots(slots, new, change);
-	slots = install_new_memslots(kvm, as_id, slots);
+	install_new_memslots(kvm, as_id, slots);
  
  	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
-
-	kvfree(slots);
  	return 0;
  
  out_slots:
-	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+		slot = id_to_memslot(slots, old->id);
+		slot->flags &= ~KVM_MEMSLOT_INVALID;
  		slots = install_new_memslots(kvm, as_id, slots);
+	}
  	kvfree(slots);
  	return r;
  }

One could optimize things a bit by reusing the allocation and only
doing a memcpy from the new memslots array to the old one under the
slots_arch_lock.  (Plus the above still lacks a mutex_init and
should be split in two patches, with the mutex going in the second;
but you get the idea and code sometimes is easier than words).

Paolo


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-28 23:42               ` Paolo Bonzini
@ 2021-04-29  0:40                 ` Sean Christopherson
  2021-04-29  1:42                   ` Sean Christopherson
  2021-04-29  7:02                   ` Paolo Bonzini
  0 siblings, 2 replies; 22+ messages in thread
From: Sean Christopherson @ 2021-04-29  0:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, LKML, kvm, Peter Xu, Peter Shier, Junaid Shahid,
	Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong

On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> it's not ugly and it's still relatively easy to explain.

LOL, that's debatable.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2799c6660cce..48929dd5fb29 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm,
>  		goto out_slots;
>  	update_memslots(slots, new, change);
> -	slots = install_new_memslots(kvm, as_id, slots);
> +	install_new_memslots(kvm, as_id, slots);
>  	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
> -
> -	kvfree(slots);
>  	return 0;
>  out_slots:
> -	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
> +	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
> +		slot = id_to_memslot(slots, old->id);
> +		slot->flags &= ~KVM_MEMSLOT_INVALID;

Modifying flags on an SRCU-protected field outside of said protection is sketchy.
It's probably ok to do this prior to the generation update, emphasis on
"probably".  Of course, the VM is also likely about to be killed in this case...

>  		slots = install_new_memslots(kvm, as_id, slots);

This will explode if memory allocation for KVM_MR_MOVE fails.  In that case,
the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata().

> +	}
>  	kvfree(slots);
>  	return r;
>  }

The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop
the SRCU lock if activate_shadow_mmu() needs to do work so that it can take
slots_lock?  That seems simpler and I think would avoid modifying the common
memslot code.

kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that
looks scary, but that should be impossible to reach with the correct MMU context.
We could always add an explicit sanity check on the rmaps being available.
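
For the record, the SRCU dance I have in mind is something like this,
completely untested, and assuming the shadow_mmu_active flag from earlier
in the series is reachable as kvm->arch.shadow_mmu_active and that
activate_shadow_mmu() takes the struct kvm:

	struct kvm *kvm = vcpu->kvm;

	if (!kvm->arch.shadow_mmu_active) {
		/*
		 * Drop the vCPU's SRCU read-side critical section so that
		 * slots_lock can be taken safely, then reacquire it before
		 * touching memslots again.
		 */
		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);

		mutex_lock(&kvm->slots_lock);
		activate_shadow_mmu(kvm);
		mutex_unlock(&kvm->slots_lock);

		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
	}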

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-29  0:40                 ` Sean Christopherson
@ 2021-04-29  1:42                   ` Sean Christopherson
  2021-04-29  7:02                   ` Paolo Bonzini
  1 sibling, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2021-04-29  1:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, LKML, kvm, Peter Xu, Peter Shier, Junaid Shahid,
	Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong

On Thu, Apr 29, 2021, Sean Christopherson wrote:
> On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> > it's not ugly and it's still relatively easy to explain.
> 
> LOL, that's debatable.
> 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 2799c6660cce..48929dd5fb29 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm,
> >  		goto out_slots;
> >  	update_memslots(slots, new, change);
> > -	slots = install_new_memslots(kvm, as_id, slots);
> > +	install_new_memslots(kvm, as_id, slots);
> >  	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
> > -
> > -	kvfree(slots);
> >  	return 0;
> >  out_slots:
> > -	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
> > +	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
> > +		slot = id_to_memslot(slots, old->id);
> > +		slot->flags &= ~KVM_MEMSLOT_INVALID;
> 
> Modifying flags on an SRCU-protected field outside of said protection is sketchy.
> It's probably ok to do this prior to the generation update, emphasis on
> "probably".  Of course, the VM is also likely about to be killed in this case...
> 
> >  		slots = install_new_memslots(kvm, as_id, slots);
> 
> This will explode if memory allocation for KVM_MR_MOVE fails.  In that case,
> the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata().

Gah, that's all wrong: "slots" is the second duplicate, and the clear happens on
the new slot, not the old slot with the same id.

Though I still think temporarily dropping the SRCU lock would be simpler.  If
performance is a concern, it could be mitigated by adding a capability to
preallocate the rmaps.

> > +	}
> >  	kvfree(slots);
> >  	return r;
> >  }
> 
> The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop
> the SRCU lock if activate_shadow_mmu() needs to do work so that it can take
> slots_lock?  That seems simpler and I think would avoid modifying the common
> memslot code.
> 
> kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that
> looks scary, but that should be impossible to reach with the correct MMU context.
> We could always add an explicit sanity check on the rmaps being available.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-29  0:40                 ` Sean Christopherson
  2021-04-29  1:42                   ` Sean Christopherson
@ 2021-04-29  7:02                   ` Paolo Bonzini
  2021-04-29 17:45                     ` Ben Gardon
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2021-04-29  7:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ben Gardon, LKML, kvm, Peter Xu, Peter Shier, Junaid Shahid,
	Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong

On 29/04/21 02:40, Sean Christopherson wrote:
> On Thu, Apr 29, 2021, Paolo Bonzini wrote:
>> it's not ugly and it's still relatively easy to explain.
> 
> LOL, that's debatable.

 From your remark below it looks like we have different priorities on 
what to avoid modifying.

I like the locks to be either very coarse or fine-grained enough for 
them to be leaves, as I find that to be the easiest way to avoid 
deadlocks and complex hierarchies.  For this reason, I treat unlocking 
in the middle of a large critical section as "scary by default"; you 
have to worry about which invariants might be required (in the case of 
RCU, which pointers might be stored somewhere and would be invalidated), 
and which locks are held at that point, since the subsequent relocking
could change the lock order from AB to BA.

This applies to every path leading to the unlock/relock.  So instead 
what matters IMO is shielding architecture code from the races that Ben
had to point out to me, _and making it possible to apply easily explained
rules_ outside the more complex core code.

So, well, "relatively easy" because it's indeed subtle.  But if you 
consider what the locking rules are, "you can choose to protect 
slots->arch data with this mutex and it will have no problematic 
interactions with the memslot copy/update code" is as simple as it can get.

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 2799c6660cce..48929dd5fb29 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm,
>>   		goto out_slots;
>>   	update_memslots(slots, new, change);
>> -	slots = install_new_memslots(kvm, as_id, slots);
>> +	install_new_memslots(kvm, as_id, slots);
>>   	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
>> -
>> -	kvfree(slots);
>>   	return 0;
>>   out_slots:
>> -	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
>> +	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
>> +		slot = id_to_memslot(slots, old->id);
>> +		slot->flags &= ~KVM_MEMSLOT_INVALID;
> 
> Modifying flags on an SRCU-protected field outside of said protection is sketchy.
> It's probably ok to do this prior to the generation update, emphasis on
> "probably".  Of course, the VM is also likely about to be killed in this case...
> 
>>   		slots = install_new_memslots(kvm, as_id, slots);
> 
> This will explode if memory allocation for KVM_MR_MOVE fails.  In that case,
> the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata().

I take your subsequent reply as a sort-of-review that the above approach 
works, though we may disagree on its elegance and complexity.

Paolo

> The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop
> the SRCU lock if activate_shadow_mmu() needs to do work so that it can take
> slots_lock?  That seems simpler and I think would avoid modifying the common
> memslot code.
>
> kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that
> looks scary, but that should be impossible to reach with the correct MMU context.
> We could always add an explicit sanity check on the rmaps being available.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex
  2021-04-29  7:02                   ` Paolo Bonzini
@ 2021-04-29 17:45                     ` Ben Gardon
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Gardon @ 2021-04-29 17:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, LKML, kvm, Peter Xu, Peter Shier,
	Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
	Vitaly Kuznetsov, Xiao Guangrong

On Thu, Apr 29, 2021 at 12:02 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 29/04/21 02:40, Sean Christopherson wrote:
> > On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> >> it's not ugly and it's still relatively easy to explain.
> >
> > LOL, that's debatable.
>
>  From your remark below it looks like we have different priorities on
> what to avoid modifying.
>
> I like the locks to be either very coarse or fine-grained enough for
> them to be leaves, as I find that to be the easiest way to avoid
> deadlocks and complex hierarchies.  For this reason, I treat unlocking
> in the middle of a large critical section as "scary by default"; you
> have to worry about which invariants might be required (in the case of
> RCU, which pointers might be stored somewhere and would be invalidated),
> and which locks are taken at that point so that the subsequent relocking
> would change the lock order from AB to BA.

The idea of dropping the SRCU read lock to allocate the rmaps scares
me too. I have no idea what else it protects besides the memslots,
or where we might be making assumptions about things staying valid
because of it.
Is there anywhere that we enumerate the things protected by that SRCU?
I wonder if we have complete enough annotations for the SRCU / lockdep
checker to find a problem if we were dropping the SRCU read lock in a
bad place.

>
> This applies to every path leading to the unlock/relock.  So instead
> what matters IMO is shielding architecture code from the races that Ben
> had to point out to me, _and the possibility to apply easily explained
> rules_ outside more complex core code.
>
> So, well, "relatively easy" because it's indeed subtle.  But if you
> consider what the locking rules are, "you can choose to protect
> slots->arch data with this mutex and it will have no problematic
> interactions with the memslot copy/update code" is as simple as it can get.
>
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index 2799c6660cce..48929dd5fb29 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm,
> >>              goto out_slots;
> >>      update_memslots(slots, new, change);
> >> -    slots = install_new_memslots(kvm, as_id, slots);
> >> +    install_new_memslots(kvm, as_id, slots);
> >>      kvm_arch_commit_memory_region(kvm, mem, old, new, change);
> >> -
> >> -    kvfree(slots);
> >>      return 0;
> >>   out_slots:
> >> -    if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
> >> +    if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
> >> +            slot = id_to_memslot(slots, old->id);
> >> +            slot->flags &= ~KVM_MEMSLOT_INVALID;
> >
> > Modifying flags on an SRCU-protected field outside of said protection is sketchy.
> > It's probably ok to do this prior to the generation update, emphasis on
> > "probably".  Of course, the VM is also likely about to be killed in this case...
> >
> >>              slots = install_new_memslots(kvm, as_id, slots);
> >
> > This will explode if memory allocation for KVM_MR_MOVE fails.  In that case,
> > the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata().
>
> I take your subsequent reply as a sort-of-review that the above approach
> works, though we may disagree on its elegance and complexity.

I'll try Paolo's suggestion of using a second kvm_dup_memslots to avoid
backing the higher-level rmap arrays with dynamic memory, and send out
a v2.

>
> Paolo
>
> > The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop
> > the SRCU lock if activate_shadow_mmu() needs to do work so that it can take
> > slots_lock?  That seems simpler and I think would avoid modifying the common
> > memslot code.
> >
> > kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that
> > looks scary, but that should be impossible to reach with the correct MMU context.
> > We could always add an explicit sanity check on the rmaps being available.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-04-29 17:45 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27 22:36 [PATCH 0/6] Lazily allocate memslot rmaps Ben Gardon
2021-04-27 22:36 ` [PATCH 1/6] KVM: x86/mmu: Track if shadow MMU active Ben Gardon
2021-04-27 22:36 ` [PATCH 2/6] KVM: x86/mmu: Skip rmap operations if shadow MMU inactive Ben Gardon
2021-04-27 22:36 ` [PATCH 3/6] KVM: x86/mmu: Deduplicate rmap freeing in allocate_memslot_rmap Ben Gardon
2021-04-28 10:00   ` Paolo Bonzini
2021-04-28 16:23     ` Ben Gardon
2021-04-27 22:36 ` [PATCH 4/6] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
2021-04-27 22:36 ` [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex Ben Gardon
2021-04-28  6:25   ` Paolo Bonzini
2021-04-28 16:40     ` Ben Gardon
2021-04-28 17:46       ` Paolo Bonzini
2021-04-28 20:40         ` Ben Gardon
2021-04-28 21:41           ` Paolo Bonzini
2021-04-28 21:46             ` Ben Gardon
2021-04-28 23:42               ` Paolo Bonzini
2021-04-29  0:40                 ` Sean Christopherson
2021-04-29  1:42                   ` Sean Christopherson
2021-04-29  7:02                   ` Paolo Bonzini
2021-04-29 17:45                     ` Ben Gardon
2021-04-27 22:36 ` [PATCH 6/6] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
2021-04-28 10:03   ` Paolo Bonzini
2021-04-28 16:45     ` Ben Gardon
