kvm.vger.kernel.org archive mirror
* [PATCH v3 0/8] Lazily allocate memslot rmaps
@ 2021-05-06 18:42 Ben Gardon
  2021-05-06 18:42 ` [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing Ben Gardon
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

This series enables KVM to save memory when using the TDP MMU by waiting
to allocate memslot rmaps until they are needed. To do this, KVM tracks
whether or not a shadow root has been allocated. In order to get away
with not allocating the rmaps, KVM must also be sure to skip operations
which iterate over the rmaps. If the TDP MMU is in use and a shadow root
has not been allocated, these operations would essentially be no-ops
anyway. Skipping the rmap operations has a secondary benefit of avoiding
acquiring the MMU lock in write mode in many cases, substantially
reducing MMU lock contention.

This series was tested on an Intel Skylake machine. With the TDP MMU both
off and on, it introduced no new failures in kvm-unit-tests or the KVM
selftests.

Changelog:
v2:
	Incorporated feedback from Paolo and Sean
	Replaced the memslot_assignment_lock with slots_arch_lock, which
	has a larger critical section.

v3:
	Removed shadow_mmu_active as suggested by Sean
	Removed everything except adding a return value to
	kvm_mmu_init_tdp_mmu from patch 1 of v2
	Added RCU protection and better memory ordering for installing the
	memslot rmaps as suggested by Paolo
	Reordered most of the patches

Ben Gardon (8):
  KVM: x86/mmu: Deduplicate rmap freeing
  KVM: x86/mmu: Factor out allocating memslot rmap
  KVM: mmu: Refactor memslot copy
  KVM: mmu: Add slots_arch_lock for memslot arch fields
  KVM: x86/mmu: Add a field to control memslot rmap allocation
  KVM: x86/mmu: Skip rmap operations if rmaps not allocated
  KVM: x86/mmu: Protect rmaps independently with SRCU
  KVM: x86/mmu: Lazily allocate memslot rmaps

 arch/x86/include/asm/kvm_host.h |   9 ++
 arch/x86/kvm/mmu/mmu.c          | 195 ++++++++++++++++++++------------
 arch/x86/kvm/mmu/tdp_mmu.c      |   6 +-
 arch/x86/kvm/mmu/tdp_mmu.h      |   4 +-
 arch/x86/kvm/x86.c              | 107 ++++++++++++++----
 include/linux/kvm_host.h        |   9 ++
 virt/kvm/kvm_main.c             |  54 +++++++--
 7 files changed, 275 insertions(+), 109 deletions(-)

-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-07  7:42   ` David Hildenbrand
  2021-05-06 18:42 ` [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

Small code deduplication. No functional change expected.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/x86.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf3b67679cf0..5bcf07465c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_hv_destroy_vm(kvm);
 }
 
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+static void free_memslot_rmap(struct kvm_memory_slot *slot)
 {
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.rmap[i]);
 		slot->arch.rmap[i] = NULL;
+	}
+}
 
-		if (i == 0)
-			continue;
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	int i;
+
+	free_memslot_rmap(slot);
 
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.lpage_info[i - 1]);
 		slot->arch.lpage_info[i - 1] = NULL;
 	}
@@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 	return 0;
 
 out_free:
-	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
-		kvfree(slot->arch.rmap[i]);
-		slot->arch.rmap[i] = NULL;
-		if (i == 0)
-			continue;
+	free_memslot_rmap(slot);
 
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.lpage_info[i - 1]);
 		slot->arch.lpage_info[i - 1] = NULL;
 	}
-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
  2021-05-06 18:42 ` [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-07  7:46   ` David Hildenbrand
  2021-05-06 18:42 ` [PATCH v3 3/8] KVM: mmu: Refactor memslot copy Ben Gardon
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

Small refactor to facilitate allocating rmaps for all memslots at once.

No functional change expected.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bcf07465c47..fc32a7dbe4c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	kvm_page_track_free_memslot(slot);
 }
 
+static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
+			      unsigned long npages)
+{
+	int i;
+
+	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+		int lpages;
+		int level = i + 1;
+
+		lpages = gfn_to_index(slot->base_gfn + npages - 1,
+				      slot->base_gfn, level) + 1;
+
+		slot->arch.rmap[i] =
+			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
+				 GFP_KERNEL_ACCOUNT);
+		if (!slot->arch.rmap[i])
+			goto out_free;
+	}
+
+	return 0;
+
+out_free:
+	free_memslot_rmap(slot);
+	return -ENOMEM;
+}
+
 static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 				      unsigned long npages)
 {
 	int i;
+	int r;
 
 	/*
 	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
@@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 	 */
 	memset(&slot->arch, 0, sizeof(slot->arch));
 
-	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+	r = alloc_memslot_rmap(slot, npages);
+	if (r)
+		return r;
+
+	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		struct kvm_lpage_info *linfo;
 		unsigned long ugfn;
 		int lpages;
@@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 		lpages = gfn_to_index(slot->base_gfn + npages - 1,
 				      slot->base_gfn, level) + 1;
 
-		slot->arch.rmap[i] =
-			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
-				 GFP_KERNEL_ACCOUNT);
-		if (!slot->arch.rmap[i])
-			goto out_free;
-		if (i == 0)
-			continue;
-
 		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
 		if (!linfo)
 			goto out_free;
-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 3/8] KVM: mmu: Refactor memslot copy
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
  2021-05-06 18:42 ` [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing Ben Gardon
  2021-05-06 18:42 ` [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-07  7:48   ` David Hildenbrand
  2021-05-06 18:42 ` [PATCH v3 4/8] KVM: mmu: Add slots_arch_lock for memslot arch fields Ben Gardon
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

Factor out copying kvm_memslots from allocating the memory for new ones
in preparation for adding a new lock to protect the arch-specific fields
of the memslots.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 virt/kvm/kvm_main.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2799c6660cce..c8010f55e368 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1306,6 +1306,18 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	return old_memslots;
 }
 
+static size_t kvm_memslots_size(int slots)
+{
+	return sizeof(struct kvm_memslots) +
+	       (sizeof(struct kvm_memory_slot) * slots);
+}
+
+static void kvm_copy_memslots(struct kvm_memslots *from,
+			      struct kvm_memslots *to)
+{
+	memcpy(to, from, kvm_memslots_size(from->used_slots));
+}
+
 /*
  * Note, at a minimum, the current number of used slots must be allocated, even
  * when deleting a memslot, as we need a complete duplicate of the memslots for
@@ -1315,19 +1327,16 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
 					     enum kvm_mr_change change)
 {
 	struct kvm_memslots *slots;
-	size_t old_size, new_size;
-
-	old_size = sizeof(struct kvm_memslots) +
-		   (sizeof(struct kvm_memory_slot) * old->used_slots);
+	size_t new_size;
 
 	if (change == KVM_MR_CREATE)
-		new_size = old_size + sizeof(struct kvm_memory_slot);
+		new_size = kvm_memslots_size(old->used_slots + 1);
 	else
-		new_size = old_size;
+		new_size = kvm_memslots_size(old->used_slots);
 
 	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
 	if (likely(slots))
-		memcpy(slots, old, old_size);
+		kvm_copy_memslots(old, slots);
 
 	return slots;
 }
-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 4/8] KVM: mmu: Add slots_arch_lock for memslot arch fields
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
                   ` (2 preceding siblings ...)
  2021-05-06 18:42 ` [PATCH v3 3/8] KVM: mmu: Refactor memslot copy Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-06 18:42 ` [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation Ben Gardon
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

Add a new lock to protect the arch-specific fields of memslots if they
need to be modified in a kvm->srcu read critical section. A future
commit will use this lock to lazily allocate memslot rmaps for x86.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 include/linux/kvm_host.h |  9 +++++++++
 virt/kvm/kvm_main.c      | 31 ++++++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8895b95b6a22..2d5e797fbb08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -472,6 +472,15 @@ struct kvm {
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
 	struct mutex slots_lock;
+
+	/*
+	 * Protects the arch-specific fields of struct kvm_memory_slots in
+	 * use by the VM. To be used under the slots_lock (above) or in a
+	 * kvm->srcu read critical section where acquiring the slots_lock
+	 * would lead to deadlock with the synchronize_srcu in
+	 * install_new_memslots.
+	 */
+	struct mutex slots_arch_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
 	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c8010f55e368..97b03fa2d0c8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -908,6 +908,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	mutex_init(&kvm->lock);
 	mutex_init(&kvm->irq_lock);
 	mutex_init(&kvm->slots_lock);
+	mutex_init(&kvm->slots_arch_lock);
 	INIT_LIST_HEAD(&kvm->devices);
 
 	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
@@ -1280,6 +1281,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
 
 	rcu_assign_pointer(kvm->memslots[as_id], slots);
+
+	/* Acquired in kvm_set_memslot. */
+	mutex_unlock(&kvm->slots_arch_lock);
+
 	synchronize_srcu_expedited(&kvm->srcu);
 
 	/*
@@ -1351,6 +1356,9 @@ static int kvm_set_memslot(struct kvm *kvm,
 	struct kvm_memslots *slots;
 	int r;
 
+	/* Released in install_new_memslots. */
+	mutex_lock(&kvm->slots_arch_lock);
+
 	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
 	if (!slots)
 		return -ENOMEM;
@@ -1364,10 +1372,9 @@ static int kvm_set_memslot(struct kvm *kvm,
 		slot->flags |= KVM_MEMSLOT_INVALID;
 
 		/*
-		 * We can re-use the old memslots, the only difference from the
-		 * newly installed memslots is the invalid flag, which will get
-		 * dropped by update_memslots anyway.  We'll also revert to the
-		 * old memslots if preparing the new memory region fails.
+		 * We can re-use the memory from the old memslots.
+		 * It will be overwritten with a copy of the new memslots
+		 * after reacquiring the slots_arch_lock below.
 		 */
 		slots = install_new_memslots(kvm, as_id, slots);
 
@@ -1379,6 +1386,17 @@ static int kvm_set_memslot(struct kvm *kvm,
 		 *	- kvm_is_visible_gfn (mmu_check_root)
 		 */
 		kvm_arch_flush_shadow_memslot(kvm, slot);
+
+		/* Released in install_new_memslots. */
+		mutex_lock(&kvm->slots_arch_lock);
+
+		/*
+		 * The arch-specific fields of the memslots could have changed
+		 * between releasing the slots_arch_lock in
+		 * install_new_memslots and here, so get a fresh copy of the
+		 * slots.
+		 */
+		kvm_copy_memslots(__kvm_memslots(kvm, as_id), slots);
 	}
 
 	r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
@@ -1394,8 +1412,11 @@ static int kvm_set_memslot(struct kvm *kvm,
 	return 0;
 
 out_slots:
-	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+		slot = id_to_memslot(slots, old->id);
+		slot->flags &= ~KVM_MEMSLOT_INVALID;
 		slots = install_new_memslots(kvm, as_id, slots);
+	}
 	kvfree(slots);
 	return r;
 }
-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
                   ` (3 preceding siblings ...)
  2021-05-06 18:42 ` [PATCH v3 4/8] KVM: mmu: Add slots_arch_lock for memslot arch fields Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-06 23:44   ` Ben Gardon
  2021-05-06 18:42 ` [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated Ben Gardon
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

Add a field to control whether new memslots should have rmaps allocated
for them. As of this change, it's not safe to skip allocating rmaps, so
the field is always set to allocate rmaps. Future changes will make it
safe to operate without rmaps, using the TDP MMU. Then further changes
will allow the rmaps to be allocated lazily when needed for nested
operation.

No functional change expected.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/include/asm/kvm_host.h |  8 ++++++++
 arch/x86/kvm/mmu/mmu.c          |  2 ++
 arch/x86/kvm/x86.c              | 18 +++++++++++++-----
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad22d4839bcc..00065f9bbc5e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1122,6 +1122,12 @@ struct kvm_arch {
 	 */
 	spinlock_t tdp_mmu_pages_lock;
 #endif /* CONFIG_X86_64 */
+
+	/*
+	 * If set, rmaps have been allocated for all memslots and should be
+	 * allocated for any newly created or modified memslots.
+	 */
+	bool memslots_have_rmaps;
 };
 
 struct kvm_vm_stat {
@@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 
 int kvm_cpu_dirty_log_size(void);
 
+bool kvm_memslots_have_rmaps(struct kvm *kvm);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 930ac8a7e7c9..8761b4925755 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5469,6 +5469,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 
 	kvm_mmu_init_tdp_mmu(kvm);
 
+	kvm->arch.memslots_have_rmaps = true;
+
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc32a7dbe4c4..d7a40ce342cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10868,7 +10868,13 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
 	return -ENOMEM;
 }
 
-static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
+bool kvm_memslots_have_rmaps(struct kvm *kvm)
+{
+	return kvm->arch.memslots_have_rmaps;
+}
+
+static int kvm_alloc_memslot_metadata(struct kvm *kvm,
+				      struct kvm_memory_slot *slot,
 				      unsigned long npages)
 {
 	int i;
@@ -10881,9 +10887,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 	 */
 	memset(&slot->arch, 0, sizeof(slot->arch));
 
-	r = alloc_memslot_rmap(slot, npages);
-	if (r)
-		return r;
+	if (kvm_memslots_have_rmaps(kvm)) {
+		r = alloc_memslot_rmap(slot, npages);
+		if (r)
+			return r;
+	}
 
 	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		struct kvm_lpage_info *linfo;
@@ -10954,7 +10962,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				enum kvm_mr_change change)
 {
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
-		return kvm_alloc_memslot_metadata(memslot,
+		return kvm_alloc_memslot_metadata(kvm, memslot,
 						  mem->memory_size >> PAGE_SHIFT);
 	return 0;
 }
-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
                   ` (4 preceding siblings ...)
  2021-05-06 18:42 ` [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-06 23:07   ` kernel test robot
  2021-05-06 18:42 ` [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU Ben Gardon
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

If only the TDP MMU is being used to manage the memory mappings for a VM,
then many rmap operations can be skipped, as they are guaranteed to be
no-ops. This saves the time that would otherwise be spent on the rmap
operations and avoids acquiring the MMU lock in write mode for many of them.

This makes it safe to run the VM without rmaps allocated when only the
TDP MMU is in use, and sets the stage for waiting to allocate the rmaps
until they are needed.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 128 +++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8761b4925755..730ea84bf7e7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1189,6 +1189,10 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	if (is_tdp_mmu_enabled(kvm))
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, true);
+
+	if (!kvm_memslots_have_rmaps(kvm))
+		return;
+
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
@@ -1218,6 +1222,10 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 	if (is_tdp_mmu_enabled(kvm))
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, false);
+
+	if (!kvm_memslots_have_rmaps(kvm))
+		return;
+
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
@@ -1260,9 +1268,12 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	int i;
 	bool write_protected = false;
 
-	for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
-		rmap_head = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(kvm, rmap_head, true);
+	if (kvm_memslots_have_rmaps(kvm)) {
+		for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+			rmap_head = __gfn_to_rmap(gfn, i, slot);
+			write_protected |= __rmap_write_protect(kvm, rmap_head,
+								true);
+		}
 	}
 
 	if (is_tdp_mmu_enabled(kvm))
@@ -1433,9 +1444,10 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
 
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool flush;
+	bool flush = false;
 
-	flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
+	if (kvm_memslots_have_rmaps(kvm))
+		flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		flush |= kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
@@ -1445,9 +1457,10 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool flush;
+	bool flush = false;
 
-	flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
+	if (kvm_memslots_have_rmaps(kvm))
+		flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		flush |= kvm_tdp_mmu_set_spte_gfn(kvm, range);
@@ -1500,9 +1513,10 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool young;
+	bool young = false;
 
-	young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
+	if (kvm_memslots_have_rmaps(kvm))
+		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1512,9 +1526,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	bool young;
+	bool young = false;
 
-	young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
+	if (kvm_memslots_have_rmaps(kvm))
+		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
 
 	if (is_tdp_mmu_enabled(kvm))
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
@@ -5440,7 +5455,8 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	 */
 	kvm_reload_remote_mmus(kvm);
 
-	kvm_zap_obsolete_pages(kvm);
+	if (kvm_memslots_have_rmaps(kvm))
+		kvm_zap_obsolete_pages(kvm);
 
 	write_unlock(&kvm->mmu_lock);
 
@@ -5492,29 +5508,29 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	int i;
 	bool flush = false;
 
-	write_lock(&kvm->mmu_lock);
-	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-		slots = __kvm_memslots(kvm, i);
-		kvm_for_each_memslot(memslot, slots) {
-			gfn_t start, end;
-
-			start = max(gfn_start, memslot->base_gfn);
-			end = min(gfn_end, memslot->base_gfn + memslot->npages);
-			if (start >= end)
-				continue;
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
+		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+			slots = __kvm_memslots(kvm, i);
+			kvm_for_each_memslot(memslot, slots) {
+				gfn_t start, end;
+
+				start = max(gfn_start, memslot->base_gfn);
+				end = min(gfn_end, memslot->base_gfn + memslot->npages);
+				if (start >= end)
+					continue;
 
-			flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
-							PG_LEVEL_4K,
-							KVM_MAX_HUGEPAGE_LEVEL,
-							start, end - 1, true, flush);
+				flush = slot_handle_level_range(kvm, memslot,
+						kvm_zap_rmapp, PG_LEVEL_4K,
+						KVM_MAX_HUGEPAGE_LEVEL, start,
+						end - 1, true, flush);
+			}
 		}
+		if (flush)
+			kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
-
-	write_unlock(&kvm->mmu_lock);
-
 	if (is_tdp_mmu_enabled(kvm)) {
 		flush = false;
 
@@ -5541,12 +5557,15 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot,
 				      int start_level)
 {
-	bool flush;
+	bool flush = false;
 
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
-				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
-	write_unlock(&kvm->mmu_lock);
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
+		flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
+					  start_level, KVM_MAX_HUGEPAGE_LEVEL,
+					  false);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
 		read_lock(&kvm->mmu_lock);
@@ -5616,16 +5635,15 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 	struct kvm_memory_slot *slot = (struct kvm_memory_slot *)memslot;
 	bool flush;
 
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
-
-	if (flush)
-		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
-	write_unlock(&kvm->mmu_lock);
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
+		flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
+		if (flush)
+			kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
-		flush = false;
-
 		read_lock(&kvm->mmu_lock);
 		flush = kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot, flush);
 		if (flush)
@@ -5652,11 +5670,14 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot)
 {
-	bool flush;
+	bool flush = false;
 
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
-	write_unlock(&kvm->mmu_lock);
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
+		flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
+					 false);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
 		read_lock(&kvm->mmu_lock);
@@ -5681,6 +5702,14 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	int ign;
 
 	write_lock(&kvm->mmu_lock);
+	if (is_tdp_mmu_enabled(kvm))
+		kvm_tdp_mmu_zap_all(kvm);
+
+	if (!kvm_memslots_have_rmaps(kvm)) {
+		write_unlock(&kvm->mmu_lock);
+		return;
+	}
+
 restart:
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
 		if (WARN_ON(sp->role.invalid))
@@ -5693,9 +5722,6 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 
-	if (is_tdp_mmu_enabled(kvm))
-		kvm_tdp_mmu_zap_all(kvm);
-
 	write_unlock(&kvm->mmu_lock);
 }
 
-- 
2.31.1.607.g51e8a6a459-goog



* [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
                   ` (5 preceding siblings ...)
  2021-05-06 18:42 ` [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-06 23:58   ` kernel test robot
                     ` (2 more replies)
  2021-05-06 18:42 ` [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
  2021-05-07  7:40 ` [PATCH v3 0/8] " David Hildenbrand
  8 siblings, 3 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

In preparation for lazily allocating the rmaps when the TDP MMU is in
use, protect the rmaps with SRCU. Unfortunately, this requires
propagating a pointer to struct kvm around to several functions.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 57 +++++++++++++++++++++++++-----------------
 arch/x86/kvm/x86.c     |  6 ++---
 2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 730ea84bf7e7..48067c572c02 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -927,13 +927,18 @@ static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
 	__pte_list_remove(sptep, rmap_head);
 }
 
-static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
+static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
+					   int level,
 					   struct kvm_memory_slot *slot)
 {
+	struct kvm_rmap_head *head;
 	unsigned long idx;
 
 	idx = gfn_to_index(gfn, slot->base_gfn, level);
-	return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
+	head = srcu_dereference_check(slot->arch.rmap[level - PG_LEVEL_4K],
+				      &kvm->srcu,
+				      lockdep_is_held(&kvm->slots_arch_lock));
+	return &head[idx];
 }
 
 static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
@@ -944,7 +949,7 @@ static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
 
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
-	return __gfn_to_rmap(gfn, sp->role.level, slot);
+	return __gfn_to_rmap(kvm, gfn, sp->role.level, slot);
 }
 
 static bool rmap_can_add(struct kvm_vcpu *vcpu)
@@ -1194,7 +1199,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 		return;
 
 	while (mask) {
-		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+		rmap_head = __gfn_to_rmap(kvm,
+					  slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
 		__rmap_write_protect(kvm, rmap_head, false);
 
@@ -1227,7 +1233,8 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 		return;
 
 	while (mask) {
-		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+		rmap_head = __gfn_to_rmap(kvm,
+					  slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
 		__rmap_clear_dirty(kvm, rmap_head, slot);
 
@@ -1270,7 +1277,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 
 	if (kvm_memslots_have_rmaps(kvm)) {
 		for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
-			rmap_head = __gfn_to_rmap(gfn, i, slot);
+			rmap_head = __gfn_to_rmap(kvm, gfn, i, slot);
 			write_protected |= __rmap_write_protect(kvm, rmap_head,
 								true);
 		}
@@ -1373,17 +1380,19 @@ struct slot_rmap_walk_iterator {
 };
 
 static void
-rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+rmap_walk_init_level(struct kvm *kvm, struct slot_rmap_walk_iterator *iterator,
+		     int level)
 {
 	iterator->level = level;
 	iterator->gfn = iterator->start_gfn;
-	iterator->rmap = __gfn_to_rmap(iterator->gfn, level, iterator->slot);
-	iterator->end_rmap = __gfn_to_rmap(iterator->end_gfn, level,
+	iterator->rmap = __gfn_to_rmap(kvm, iterator->gfn, level,
+				       iterator->slot);
+	iterator->end_rmap = __gfn_to_rmap(kvm, iterator->end_gfn, level,
 					   iterator->slot);
 }
 
 static void
-slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+slot_rmap_walk_init(struct kvm *kvm, struct slot_rmap_walk_iterator *iterator,
 		    struct kvm_memory_slot *slot, int start_level,
 		    int end_level, gfn_t start_gfn, gfn_t end_gfn)
 {
@@ -1393,7 +1402,7 @@ slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
 	iterator->start_gfn = start_gfn;
 	iterator->end_gfn = end_gfn;
 
-	rmap_walk_init_level(iterator, iterator->start_level);
+	rmap_walk_init_level(kvm, iterator, iterator->start_level);
 }
 
 static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
@@ -1401,7 +1410,8 @@ static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
 	return !!iterator->rmap;
 }
 
-static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+static void slot_rmap_walk_next(struct kvm *kvm,
+				struct slot_rmap_walk_iterator *iterator)
 {
 	if (++iterator->rmap <= iterator->end_rmap) {
 		iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
@@ -1413,15 +1423,15 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
 		return;
 	}
 
-	rmap_walk_init_level(iterator, iterator->level);
+	rmap_walk_init_level(kvm, iterator, iterator->level);
 }
 
-#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_,	\
-	   _start_gfn, _end_gfn, _iter_)				\
-	for (slot_rmap_walk_init(_iter_, _slot_, _start_level_,		\
-				 _end_level_, _start_gfn, _end_gfn);	\
-	     slot_rmap_walk_okay(_iter_);				\
-	     slot_rmap_walk_next(_iter_))
+#define for_each_slot_rmap_range(_kvm_, _slot_, _start_level_, _end_level_,	\
+				 _start_gfn, _end_gfn, _iter_)			\
+	for (slot_rmap_walk_init(_kvm_, _iter_, _slot_, _start_level_,		\
+				 _end_level_, _start_gfn, _end_gfn);		\
+	     slot_rmap_walk_okay(_iter_);					\
+	     slot_rmap_walk_next(_kvm_, _iter_))
 
 typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			       struct kvm_memory_slot *slot, gfn_t gfn,
@@ -1434,8 +1444,9 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
 	struct slot_rmap_walk_iterator iterator;
 	bool ret = false;
 
-	for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
-				 range->start, range->end - 1, &iterator)
+	for_each_slot_rmap_range(kvm, range->slot, PG_LEVEL_4K,
+				 KVM_MAX_HUGEPAGE_LEVEL, range->start,
+				 range->end - 1, &iterator)
 		ret |= handler(kvm, iterator.rmap, range->slot, iterator.gfn,
 			       iterator.level, range->pte);
 
@@ -5233,8 +5244,8 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 {
 	struct slot_rmap_walk_iterator iterator;
 
-	for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
-			end_gfn, &iterator) {
+	for_each_slot_rmap_range(kvm, memslot, start_level, end_level,
+				 start_gfn, end_gfn, &iterator) {
 		if (iterator.rmap)
 			flush |= fn(kvm, iterator.rmap, memslot);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7a40ce342cc..1098ab73a704 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10854,9 +10854,9 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
 		lpages = gfn_to_index(slot->base_gfn + npages - 1,
 				      slot->base_gfn, level) + 1;
 
-		slot->arch.rmap[i] =
-			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
-				 GFP_KERNEL_ACCOUNT);
+		rcu_assign_pointer(slot->arch.rmap[i],
+				   kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
+					    GFP_KERNEL_ACCOUNT));
 		if (!slot->arch.rmap[i])
 			goto out_free;
 	}
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
                   ` (6 preceding siblings ...)
  2021-05-06 18:42 ` [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU Ben Gardon
@ 2021-05-06 18:42 ` Ben Gardon
  2021-05-07  1:10   ` kernel test robot
  2021-05-07  8:28   ` Paolo Bonzini
  2021-05-07  7:40 ` [PATCH v3 0/8] " David Hildenbrand
  8 siblings, 2 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 18:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu,
	Ben Gardon

If the TDP MMU is in use, wait to allocate the rmaps until the shadow
MMU is actually used (i.e., a nested VM is launched). This saves memory
equal to 0.2% of guest memory when the TDP MMU is used and no nested
guests are involved.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          | 14 ++++++++++---
 arch/x86/kvm/mmu/tdp_mmu.c      |  6 ++++--
 arch/x86/kvm/mmu/tdp_mmu.h      |  4 ++--
 arch/x86/kvm/x86.c              | 37 ++++++++++++++++++++++++++++++++-
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 00065f9bbc5e..7b8e1532fb55 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1860,5 +1860,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 int kvm_cpu_dirty_log_size(void);
 
 inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
+int alloc_all_memslots_rmaps(struct kvm *kvm);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 48067c572c02..e3a3b65829c5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3306,6 +3306,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	r = alloc_all_memslots_rmaps(vcpu->kvm);
+	if (r)
+		return r;
+
 	write_lock(&vcpu->kvm->mmu_lock);
 	r = make_mmu_pages_available(vcpu);
 	if (r < 0)
@@ -5494,9 +5498,13 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 
-	kvm_mmu_init_tdp_mmu(kvm);
-
-	kvm->arch.memslots_have_rmaps = true;
+	if (!kvm_mmu_init_tdp_mmu(kvm))
+		/*
+		 * No smp_load/store wrappers needed here as we are in
+		 * VM init and there cannot be any memslots / other threads
+		 * accessing this struct kvm yet.
+		 */
+		kvm->arch.memslots_have_rmaps = true;
 
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 83cbdbe5de5a..5342aca2c8e0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -14,10 +14,10 @@ static bool __read_mostly tdp_mmu_enabled = false;
 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
 
 /* Initializes the TDP MMU for the VM, if enabled. */
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 {
 	if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
-		return;
+		return false;
 
 	/* This should not be changed for the lifetime of the VM. */
 	kvm->arch.tdp_mmu_enabled = true;
@@ -25,6 +25,8 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
 	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
+
+	return true;
 }
 
 static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..b046ab5137a1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -80,12 +80,12 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 			 int *root_level);
 
 #ifdef CONFIG_X86_64
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
 #else
-static inline void kvm_mmu_init_tdp_mmu(struct kvm *kvm) {}
+static inline bool kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return false; }
 static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1098ab73a704..95e74fb9fc20 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10868,9 +10868,44 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
 	return -ENOMEM;
 }
 
+int alloc_all_memslots_rmaps(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	int r = 0;
+	int i;
+
+	if (kvm_memslots_have_rmaps(kvm))
+		return 0;
+
+	mutex_lock(&kvm->slots_arch_lock);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(slot, slots) {
+			r = alloc_memslot_rmap(slot, slot->npages);
+			if (r) {
+				mutex_unlock(&kvm->slots_arch_lock);
+				return r;
+			}
+		}
+	}
+
+	/*
+	 * memslots_have_rmaps is set and read in different lock contexts,
+	 * so protect it with smp_load/store.
+	 */
+	smp_store_release(&kvm->arch.memslots_have_rmaps, true);
+	mutex_unlock(&kvm->slots_arch_lock);
+	return 0;
+}
+
 bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-	return kvm->arch.memslots_have_rmaps;
+	/*
+	 * memslots_have_rmaps is set and read in different lock contexts,
+	 * so protect it with smp_load/store.
+	 */
+	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
 }
 
 static int kvm_alloc_memslot_metadata(struct kvm *kvm,
-- 
2.31.1.607.g51e8a6a459-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated
  2021-05-06 18:42 ` [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated Ben Gardon
@ 2021-05-06 23:07   ` kernel test robot
  0 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-05-06 23:07 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: kbuild-all, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang

[-- Attachment #1: Type: text/plain, Size: 3127 bytes --]

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-s021-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/c7c4f34907c708172e52fb04dc753a917a5eebb6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
        git checkout c7c4f34907c708172e52fb04dc753a917a5eebb6
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   arch/x86/kvm/mmu/mmu.c: note: in included file (through include/linux/kvm_host.h, arch/x86/kvm/irq.h):
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition

vim +1871 arch/x86/include/asm/kvm_host.h

fb04a1eddb1a65 Peter Xu   2020-09-30  1870  
fac0d516d29b67 Ben Gardon 2021-05-06 @1871  inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
fac0d516d29b67 Ben Gardon 2021-05-06  1872  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38255 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-06 18:42 ` [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation Ben Gardon
@ 2021-05-06 23:44   ` Ben Gardon
  2021-05-07  7:50     ` David Hildenbrand
  2021-05-07  8:28     ` Paolo Bonzini
  0 siblings, 2 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-06 23:44 UTC (permalink / raw)
  To: LKML, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On Thu, May 6, 2021 at 11:43 AM Ben Gardon <bgardon@google.com> wrote:
>
> Add a field to control whether new memslots should have rmaps allocated
> for them. As of this change, it's not safe to skip allocating rmaps, so
> the field is always set to allocate rmaps. Future changes will make it
> safe to operate without rmaps, using the TDP MMU. Then further changes
> will allow the rmaps to be allocated lazily when needed for nested
> operation.
>
> No functional change expected.
>
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  8 ++++++++
>  arch/x86/kvm/mmu/mmu.c          |  2 ++
>  arch/x86/kvm/x86.c              | 18 +++++++++++++-----
>  3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ad22d4839bcc..00065f9bbc5e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1122,6 +1122,12 @@ struct kvm_arch {
>          */
>         spinlock_t tdp_mmu_pages_lock;
>  #endif /* CONFIG_X86_64 */
> +
> +       /*
> +        * If set, rmaps have been allocated for all memslots and should be
> +        * allocated for any newly created or modified memslots.
> +        */
> +       bool memslots_have_rmaps;
>  };
>
>  struct kvm_vm_stat {
> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
>
>  int kvm_cpu_dirty_log_size(void);
>
> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);

Woops, this shouldn't be marked inline as it creates build problems
for the next patch with some configs.

> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 930ac8a7e7c9..8761b4925755 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5469,6 +5469,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
>
>         kvm_mmu_init_tdp_mmu(kvm);
>
> +       kvm->arch.memslots_have_rmaps = true;
> +
>         node->track_write = kvm_mmu_pte_write;
>         node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
>         kvm_page_track_register_notifier(kvm, node);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fc32a7dbe4c4..d7a40ce342cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10868,7 +10868,13 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
>         return -ENOMEM;
>  }
>
> -static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> +bool kvm_memslots_have_rmaps(struct kvm *kvm)
> +{
> +       return kvm->arch.memslots_have_rmaps;
> +}
> +
> +static int kvm_alloc_memslot_metadata(struct kvm *kvm,
> +                                     struct kvm_memory_slot *slot,
>                                       unsigned long npages)
>  {
>         int i;
> @@ -10881,9 +10887,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>          */
>         memset(&slot->arch, 0, sizeof(slot->arch));
>
> -       r = alloc_memslot_rmap(slot, npages);
> -       if (r)
> -               return r;
> +       if (kvm_memslots_have_rmaps(kvm)) {
> +               r = alloc_memslot_rmap(slot, npages);
> +               if (r)
> +                       return r;
> +       }
>
>         for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
>                 struct kvm_lpage_info *linfo;
> @@ -10954,7 +10962,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>                                 enum kvm_mr_change change)
>  {
>         if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
> -               return kvm_alloc_memslot_metadata(memslot,
> +               return kvm_alloc_memslot_metadata(kvm, memslot,
>                                                   mem->memory_size >> PAGE_SHIFT);
>         return 0;
>  }
> --
> 2.31.1.607.g51e8a6a459-goog
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-06 18:42 ` [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU Ben Gardon
@ 2021-05-06 23:58   ` kernel test robot
  2021-05-07  0:56   ` kernel test robot
  2021-05-07  8:42   ` Paolo Bonzini
  2 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-05-06 23:58 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: kbuild-all, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang

[-- Attachment #1: Type: text/plain, Size: 4224 bytes --]

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-s021-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
        git checkout dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> arch/x86/kvm/mmu/mmu.c:938:16: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> arch/x86/kvm/mmu/mmu.c:938:16: sparse:    struct kvm_rmap_head [noderef] __rcu *
>> arch/x86/kvm/mmu/mmu.c:938:16: sparse:    struct kvm_rmap_head *
   arch/x86/kvm/mmu/mmu.c: note: in included file (through include/linux/kvm_host.h, arch/x86/kvm/irq.h):
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
   arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
--
   arch/x86/kvm/x86.c:10926:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> arch/x86/kvm/x86.c:10926:17: sparse:    struct kvm_rmap_head [noderef] __rcu *
>> arch/x86/kvm/x86.c:10926:17: sparse:    struct kvm_rmap_head *
   arch/x86/kvm/x86.c: note: in included file (through include/linux/notifier.h, include/linux/memory_hotplug.h, include/linux/mmzone.h, ...):
   include/linux/srcu.h:182:9: sparse: sparse: context imbalance in 'vcpu_enter_guest' - unexpected unlock

vim +938 arch/x86/kvm/mmu/mmu.c

   929	
   930	static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
   931						   int level,
   932						   struct kvm_memory_slot *slot)
   933	{
   934		struct kvm_rmap_head *head;
   935		unsigned long idx;
   936	
   937		idx = gfn_to_index(gfn, slot->base_gfn, level);
 > 938		head = srcu_dereference_check(slot->arch.rmap[level - PG_LEVEL_4K],
   939					      &kvm->srcu,
   940					      lockdep_is_held(&kvm->slots_arch_lock));
   941		return &head[idx];
   942	}
   943	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38255 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-06 18:42 ` [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU Ben Gardon
  2021-05-06 23:58   ` kernel test robot
@ 2021-05-07  0:56   ` kernel test robot
  2021-05-07  8:42   ` Paolo Bonzini
  2 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-05-07  0:56 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: kbuild-all, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang

[-- Attachment #1: Type: text/plain, Size: 8939 bytes --]

Hi Ben,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/queue]
[also build test ERROR on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: i386-randconfig-r032-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
        git checkout dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
        # save the attached .config to linux build tree
        make W=1 W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from arch/x86/kvm/mmu/mmu.c:1821:
   arch/x86/kvm/mmu/mmu_audit.c: In function 'inspect_spte_has_rmap':
>> arch/x86/kvm/mmu/mmu_audit.c:150:28: warning: passing argument 1 of '__gfn_to_rmap' makes pointer from integer without a cast [-Wint-conversion]
     150 |  rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
         |                            ^~~
         |                            |
         |                            gfn_t {aka long long unsigned int}
   arch/x86/kvm/mmu/mmu.c:930:56: note: expected 'struct kvm *' but argument is of type 'gfn_t' {aka 'long long unsigned int'}
     930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
         |                                            ~~~~~~~~~~~~^~~
   In file included from arch/x86/kvm/mmu/mmu.c:1821:
>> arch/x86/kvm/mmu/mmu_audit.c:150:53: warning: passing argument 3 of '__gfn_to_rmap' makes integer from pointer without a cast [-Wint-conversion]
     150 |  rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
         |                                                     ^~~~
         |                                                     |
         |                                                     struct kvm_memory_slot *
   arch/x86/kvm/mmu/mmu.c:931:13: note: expected 'int' but argument is of type 'struct kvm_memory_slot *'
     931 |         int level,
         |         ~~~~^~~~~
   In file included from arch/x86/kvm/mmu/mmu.c:1821:
>> arch/x86/kvm/mmu/mmu_audit.c:150:14: error: too few arguments to function '__gfn_to_rmap'
     150 |  rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
         |              ^~~~~~~~~~~~~
   arch/x86/kvm/mmu/mmu.c:930:30: note: declared here
     930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
         |                              ^~~~~~~~~~~~~
   In file included from arch/x86/kvm/mmu/mmu.c:1821:
   arch/x86/kvm/mmu/mmu_audit.c: In function 'audit_write_protection':
   arch/x86/kvm/mmu/mmu_audit.c:203:30: warning: passing argument 1 of '__gfn_to_rmap' makes pointer from integer without a cast [-Wint-conversion]
     203 |  rmap_head = __gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
         |                            ~~^~~~~
         |                              |
         |                              gfn_t {aka long long unsigned int}
   arch/x86/kvm/mmu/mmu.c:930:56: note: expected 'struct kvm *' but argument is of type 'gfn_t' {aka 'long long unsigned int'}
     930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
         |                                            ~~~~~~~~~~~~^~~
   In file included from arch/x86/kvm/mmu/mmu.c:1821:
   arch/x86/kvm/mmu/mmu_audit.c:203:50: warning: passing argument 3 of '__gfn_to_rmap' makes integer from pointer without a cast [-Wint-conversion]
     203 |  rmap_head = __gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
         |                                                  ^~~~
         |                                                  |
         |                                                  struct kvm_memory_slot *
   arch/x86/kvm/mmu/mmu.c:931:13: note: expected 'int' but argument is of type 'struct kvm_memory_slot *'
     931 |         int level,
         |         ~~~~^~~~~
   In file included from arch/x86/kvm/mmu/mmu.c:1821:
   arch/x86/kvm/mmu/mmu_audit.c:203:14: error: too few arguments to function '__gfn_to_rmap'
     203 |  rmap_head = __gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
         |              ^~~~~~~~~~~~~
   arch/x86/kvm/mmu/mmu.c:930:30: note: declared here
     930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
         |                              ^~~~~~~~~~~~~


vim +/__gfn_to_rmap +150 arch/x86/kvm/mmu/mmu_audit.c

2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  125  
eb2591865a234c arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  126  static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  127  {
bd80158aff71a8 arch/x86/kvm/mmu_audit.c     Jan Kiszka          2011-09-12  128  	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
018aabb56d6109 arch/x86/kvm/mmu_audit.c     Takuya Yoshikawa    2015-11-20  129  	struct kvm_rmap_head *rmap_head;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  130  	struct kvm_mmu_page *rev_sp;
699023e239658e arch/x86/kvm/mmu_audit.c     Paolo Bonzini       2015-05-18  131  	struct kvm_memslots *slots;
699023e239658e arch/x86/kvm/mmu_audit.c     Paolo Bonzini       2015-05-18  132  	struct kvm_memory_slot *slot;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  133  	gfn_t gfn;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  134  
573546820b792e arch/x86/kvm/mmu/mmu_audit.c Sean Christopherson 2020-06-22  135  	rev_sp = sptep_to_sp(sptep);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  136  	gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  137  
699023e239658e arch/x86/kvm/mmu_audit.c     Paolo Bonzini       2015-05-18  138  	slots = kvm_memslots_for_spte_role(kvm, rev_sp->role);
699023e239658e arch/x86/kvm/mmu_audit.c     Paolo Bonzini       2015-05-18  139  	slot = __gfn_to_memslot(slots, gfn);
699023e239658e arch/x86/kvm/mmu_audit.c     Paolo Bonzini       2015-05-18  140  	if (!slot) {
bd80158aff71a8 arch/x86/kvm/mmu_audit.c     Jan Kiszka          2011-09-12  141  		if (!__ratelimit(&ratelimit_state))
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  142  			return;
b034cf0105235e arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-12-23  143  		audit_printk(kvm, "no memslot for gfn %llx\n", gfn);
b034cf0105235e arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-12-23  144  		audit_printk(kvm, "index %ld of sp (gfn=%llx)\n",
38904e128778c3 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-09-27  145  		       (long int)(sptep - rev_sp->spt), rev_sp->gfn);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  146  		dump_stack();
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  147  		return;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  148  	}
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  149  
018aabb56d6109 arch/x86/kvm/mmu_audit.c     Takuya Yoshikawa    2015-11-20 @150  	rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
018aabb56d6109 arch/x86/kvm/mmu_audit.c     Takuya Yoshikawa    2015-11-20  151  	if (!rmap_head->val) {
bd80158aff71a8 arch/x86/kvm/mmu_audit.c     Jan Kiszka          2011-09-12  152  		if (!__ratelimit(&ratelimit_state))
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  153  			return;
b034cf0105235e arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-12-23  154  		audit_printk(kvm, "no rmap for writable spte %llx\n",
b034cf0105235e arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-12-23  155  			     *sptep);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  156  		dump_stack();
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  157  	}
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  158  }
2f4f337248cd56 arch/x86/kvm/mmu_audit.c     Xiao Guangrong      2010-08-30  159  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43598 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps
  2021-05-06 18:42 ` [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
@ 2021-05-07  1:10   ` kernel test robot
  2021-05-07  8:28   ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-05-07  1:10 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: kbuild-all, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang

[-- Attachment #1: Type: text/plain, Size: 6970 bytes --]

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-s021-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/43798461d3f4a38ef487d9c626dd0fb13e403328
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
        git checkout 43798461d3f4a38ef487d9c626dd0fb13e403328
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   arch/x86/kvm/x86.c:10926:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
   arch/x86/kvm/x86.c:10926:17: sparse:    struct kvm_rmap_head [noderef] __rcu *
   arch/x86/kvm/x86.c:10926:17: sparse:    struct kvm_rmap_head *
   arch/x86/kvm/x86.c: note: in included file (through include/linux/notifier.h, include/linux/memory_hotplug.h, include/linux/mmzone.h, ...):
   include/linux/srcu.h:182:9: sparse: sparse: context imbalance in 'vcpu_enter_guest' - unexpected unlock
>> arch/x86/kvm/x86.c:10971:29: sparse: sparse: marked inline, but without a definition

vim +10971 arch/x86/kvm/x86.c

db3fe4eb45f355 Takuya Yoshikawa 2012-02-08  10913  
11bb59d1c3e83b Ben Gardon       2021-05-06  10914  static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
11bb59d1c3e83b Ben Gardon       2021-05-06  10915  			      unsigned long npages)
11bb59d1c3e83b Ben Gardon       2021-05-06  10916  {
11bb59d1c3e83b Ben Gardon       2021-05-06  10917  	int i;
11bb59d1c3e83b Ben Gardon       2021-05-06  10918  
11bb59d1c3e83b Ben Gardon       2021-05-06  10919  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
11bb59d1c3e83b Ben Gardon       2021-05-06  10920  		int lpages;
11bb59d1c3e83b Ben Gardon       2021-05-06  10921  		int level = i + 1;
11bb59d1c3e83b Ben Gardon       2021-05-06  10922  
11bb59d1c3e83b Ben Gardon       2021-05-06  10923  		lpages = gfn_to_index(slot->base_gfn + npages - 1,
11bb59d1c3e83b Ben Gardon       2021-05-06  10924  				      slot->base_gfn, level) + 1;
11bb59d1c3e83b Ben Gardon       2021-05-06  10925  
dd56af97c1d2b2 Ben Gardon       2021-05-06 @10926  		rcu_assign_pointer(slot->arch.rmap[i],
11bb59d1c3e83b Ben Gardon       2021-05-06  10927  				   kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
dd56af97c1d2b2 Ben Gardon       2021-05-06  10928  					    GFP_KERNEL_ACCOUNT));
11bb59d1c3e83b Ben Gardon       2021-05-06  10929  		if (!slot->arch.rmap[i])
11bb59d1c3e83b Ben Gardon       2021-05-06  10930  			goto out_free;
11bb59d1c3e83b Ben Gardon       2021-05-06  10931  	}
11bb59d1c3e83b Ben Gardon       2021-05-06  10932  
11bb59d1c3e83b Ben Gardon       2021-05-06  10933  	return 0;
11bb59d1c3e83b Ben Gardon       2021-05-06  10934  
11bb59d1c3e83b Ben Gardon       2021-05-06  10935  out_free:
11bb59d1c3e83b Ben Gardon       2021-05-06  10936  	free_memslot_rmap(slot);
11bb59d1c3e83b Ben Gardon       2021-05-06  10937  	return -ENOMEM;
11bb59d1c3e83b Ben Gardon       2021-05-06  10938  }
11bb59d1c3e83b Ben Gardon       2021-05-06  10939  
43798461d3f4a3 Ben Gardon       2021-05-06  10940  int alloc_all_memslots_rmaps(struct kvm *kvm)
43798461d3f4a3 Ben Gardon       2021-05-06  10941  {
43798461d3f4a3 Ben Gardon       2021-05-06  10942  	struct kvm_memslots *slots;
43798461d3f4a3 Ben Gardon       2021-05-06  10943  	struct kvm_memory_slot *slot;
43798461d3f4a3 Ben Gardon       2021-05-06  10944  	int r = 0;
43798461d3f4a3 Ben Gardon       2021-05-06  10945  	int i;
43798461d3f4a3 Ben Gardon       2021-05-06  10946  
43798461d3f4a3 Ben Gardon       2021-05-06  10947  	if (kvm_memslots_have_rmaps(kvm))
43798461d3f4a3 Ben Gardon       2021-05-06  10948  		return 0;
43798461d3f4a3 Ben Gardon       2021-05-06  10949  
43798461d3f4a3 Ben Gardon       2021-05-06  10950  	mutex_lock(&kvm->slots_arch_lock);
43798461d3f4a3 Ben Gardon       2021-05-06  10951  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
43798461d3f4a3 Ben Gardon       2021-05-06  10952  		slots = __kvm_memslots(kvm, i);
43798461d3f4a3 Ben Gardon       2021-05-06  10953  		kvm_for_each_memslot(slot, slots) {
43798461d3f4a3 Ben Gardon       2021-05-06  10954  			r = alloc_memslot_rmap(slot, slot->npages);
43798461d3f4a3 Ben Gardon       2021-05-06  10955  			if (r) {
43798461d3f4a3 Ben Gardon       2021-05-06  10956  				mutex_unlock(&kvm->slots_arch_lock);
43798461d3f4a3 Ben Gardon       2021-05-06  10957  				return r;
43798461d3f4a3 Ben Gardon       2021-05-06  10958  			}
43798461d3f4a3 Ben Gardon       2021-05-06  10959  		}
43798461d3f4a3 Ben Gardon       2021-05-06  10960  	}
43798461d3f4a3 Ben Gardon       2021-05-06  10961  
43798461d3f4a3 Ben Gardon       2021-05-06  10962  	/*
43798461d3f4a3 Ben Gardon       2021-05-06  10963  	 * memslots_have_rmaps is set and read in different lock contexts,
43798461d3f4a3 Ben Gardon       2021-05-06  10964  	 * so protect it with smp_load/store.
43798461d3f4a3 Ben Gardon       2021-05-06  10965  	 */
43798461d3f4a3 Ben Gardon       2021-05-06  10966  	smp_store_release(&kvm->arch.memslots_have_rmaps, true);
43798461d3f4a3 Ben Gardon       2021-05-06  10967  	mutex_unlock(&kvm->slots_arch_lock);
43798461d3f4a3 Ben Gardon       2021-05-06  10968  	return 0;
43798461d3f4a3 Ben Gardon       2021-05-06  10969  }
43798461d3f4a3 Ben Gardon       2021-05-06  10970  
fac0d516d29b67 Ben Gardon       2021-05-06 @10971  bool kvm_memslots_have_rmaps(struct kvm *kvm)
fac0d516d29b67 Ben Gardon       2021-05-06  10972  {
43798461d3f4a3 Ben Gardon       2021-05-06  10973  	/*
43798461d3f4a3 Ben Gardon       2021-05-06  10974  	 * memslots_have_rmaps is set and read in different lock contexts,
43798461d3f4a3 Ben Gardon       2021-05-06  10975  	 * so protect it with smp_load/store.
43798461d3f4a3 Ben Gardon       2021-05-06  10976  	 */
43798461d3f4a3 Ben Gardon       2021-05-06  10977  	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
fac0d516d29b67 Ben Gardon       2021-05-06  10978  }
fac0d516d29b67 Ben Gardon       2021-05-06  10979  


[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38255 bytes --]


* Re: [PATCH v3 0/8] Lazily allocate memslot rmaps
  2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
                   ` (7 preceding siblings ...)
  2021-05-06 18:42 ` [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
@ 2021-05-07  7:40 ` David Hildenbrand
  8 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2021-05-07  7:40 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 06.05.21 20:42, Ben Gardon wrote:
> This series enables KVM to save memory when using the TDP MMU by waiting
> to allocate memslot rmaps until they are needed. To do this, KVM tracks
> whether or not a shadow root has been allocated. In order to get away
> with not allocating the rmaps, KVM must also be sure to skip operations
> which iterate over the rmaps. If the TDP MMU is in use and we have not
> allocated a shadow root, these operations would essentially be no-ops
> anyway. Skipping the rmap operations has a secondary benefit of avoiding
> acquiring the MMU lock in write mode in many cases, substantially
> reducing MMU lock contention.
> 
> This series was tested on an Intel Skylake machine. With the TDP MMU off
> and on, this introduced no new failures on kvm-unit-tests or KVM selftests.
> 

Happy to see this change pop up, I remember discussing this with Paolo 
recently.

Another step to reduce the rmap overhead could be looking into using a 
dynamic datastructure to manage the rmap, instead of allocating a 
fixed-sized array. That could also significantly reduce memory overhead 
in some setups and give us more flexibility, for example, for resizing 
or splitting slots atomically.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing
  2021-05-06 18:42 ` [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing Ben Gardon
@ 2021-05-07  7:42   ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2021-05-07  7:42 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 06.05.21 20:42, Ben Gardon wrote:
> Small code deduplication. No functional change expected.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>   arch/x86/kvm/x86.c | 19 +++++++++++--------
>   1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf3b67679cf0..5bcf07465c47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   	kvm_hv_destroy_vm(kvm);
>   }
>   
> -void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +static void free_memslot_rmap(struct kvm_memory_slot *slot)
>   {
>   	int i;
>   
>   	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
>   		kvfree(slot->arch.rmap[i]);
>   		slot->arch.rmap[i] = NULL;
> +	}
> +}
>   
> -		if (i == 0)
> -			continue;
> +void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> +	int i;
> +
> +	free_memslot_rmap(slot);
>   
> +	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
>   		kvfree(slot->arch.lpage_info[i - 1]);
>   		slot->arch.lpage_info[i - 1] = NULL;
>   	}
> @@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>   	return 0;
>   
>   out_free:
> -	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> -		kvfree(slot->arch.rmap[i]);
> -		slot->arch.rmap[i] = NULL;
> -		if (i == 0)
> -			continue;
> +	free_memslot_rmap(slot);
>   
> +	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
>   		kvfree(slot->arch.lpage_info[i - 1]);
>   		slot->arch.lpage_info[i - 1] = NULL;
>   	}
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap
  2021-05-06 18:42 ` [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
@ 2021-05-07  7:46   ` David Hildenbrand
  2021-05-10 16:29     ` Ben Gardon
  0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2021-05-07  7:46 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 06.05.21 20:42, Ben Gardon wrote:
> Small refactor to facilitate allocating rmaps for all memslots at once.
> 
> No functional change expected.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>   arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
>   1 file changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bcf07465c47..fc32a7dbe4c4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
>   	kvm_page_track_free_memslot(slot);
>   }
>   
> +static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
> +			      unsigned long npages)

I'd have called the functions memslot_rmap_alloc() and memslot_rmap_free()


> +{
> +	int i;
> +
> +	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> +		int lpages;
> +		int level = i + 1;
> +
> +		lpages = gfn_to_index(slot->base_gfn + npages - 1,
> +				      slot->base_gfn, level) + 1;
> +
> +		slot->arch.rmap[i] =
> +			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> +				 GFP_KERNEL_ACCOUNT);
> +		if (!slot->arch.rmap[i])
> +			goto out_free;

you can just avoid the goto here and do the free_memslot_rmap() right away.

> +	}
> +
> +	return 0;
> +
> +out_free:
> +	free_memslot_rmap(slot);
> +	return -ENOMEM;
> +}
> +
>   static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>   				      unsigned long npages)
>   {
>   	int i;
> +	int r;
>   
>   	/*
>   	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> @@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>   	 */
>   	memset(&slot->arch, 0, sizeof(slot->arch));
>   
> -	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> +	r = alloc_memslot_rmap(slot, npages);
> +	if (r)
> +		return r;
> +
> +	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
>   		struct kvm_lpage_info *linfo;
>   		unsigned long ugfn;
>   		int lpages;
> @@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>   		lpages = gfn_to_index(slot->base_gfn + npages - 1,
>   				      slot->base_gfn, level) + 1;
>   
> -		slot->arch.rmap[i] =
> -			kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> -				 GFP_KERNEL_ACCOUNT);
> -		if (!slot->arch.rmap[i])
> -			goto out_free;
> -		if (i == 0)
> -			continue;
> -
>   		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
>   		if (!linfo)
>   			goto out_free;
> 

apart from that LGTM

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 3/8] KVM: mmu: Refactor memslot copy
  2021-05-06 18:42 ` [PATCH v3 3/8] KVM: mmu: Refactor memslot copy Ben Gardon
@ 2021-05-07  7:48   ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2021-05-07  7:48 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 06.05.21 20:42, Ben Gardon wrote:
> Factor out copying kvm_memslots from allocating the memory for new ones
> in preparation for adding a new lock to protect the arch-specific fields
> of the memslots.
> 
> No functional change intended.
> 
> Signed-off-by: Ben Gardon <bgardon@google.com>
> ---
>   virt/kvm/kvm_main.c | 23 ++++++++++++++++-------
>   1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2799c6660cce..c8010f55e368 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1306,6 +1306,18 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>   	return old_memslots;
>   }
>   
> +static size_t kvm_memslots_size(int slots)
> +{
> +	return sizeof(struct kvm_memslots) +
> +	       (sizeof(struct kvm_memory_slot) * slots);

no need for the extra set of parentheses

> +}
> +
> +static void kvm_copy_memslots(struct kvm_memslots *from,
> +			      struct kvm_memslots *to)
> +{
> +	memcpy(to, from, kvm_memslots_size(from->used_slots));
> +}
> +
>   /*
>    * Note, at a minimum, the current number of used slots must be allocated, even
>    * when deleting a memslot, as we need a complete duplicate of the memslots for
> @@ -1315,19 +1327,16 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
>   					     enum kvm_mr_change change)
>   {
>   	struct kvm_memslots *slots;
> -	size_t old_size, new_size;
> -
> -	old_size = sizeof(struct kvm_memslots) +
> -		   (sizeof(struct kvm_memory_slot) * old->used_slots);
> +	size_t new_size;
>   
>   	if (change == KVM_MR_CREATE)
> -		new_size = old_size + sizeof(struct kvm_memory_slot);
> +		new_size = kvm_memslots_size(old->used_slots + 1);
>   	else
> -		new_size = old_size;
> +		new_size = kvm_memslots_size(old->used_slots);
>   
>   	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
>   	if (likely(slots))
> -		memcpy(slots, old, old_size);
> +		kvm_copy_memslots(old, slots);
>   
>   	return slots;
>   }
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-06 23:44   ` Ben Gardon
@ 2021-05-07  7:50     ` David Hildenbrand
  2021-05-07  8:28     ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2021-05-07  7:50 UTC (permalink / raw)
  To: Ben Gardon, LKML, kvm
  Cc: Paolo Bonzini, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 07.05.21 01:44, Ben Gardon wrote:
> On Thu, May 6, 2021 at 11:43 AM Ben Gardon <bgardon@google.com> wrote:
>>
>> Add a field to control whether new memslots should have rmaps allocated
>> for them. As of this change, it's not safe to skip allocating rmaps, so
>> the field is always set to allocate rmaps. Future changes will make it
>> safe to operate without rmaps, using the TDP MMU. Then further changes
>> will allow the rmaps to be allocated lazily when needed for nested
>> operation.
>>
>> No functional change expected.
>>
>> Signed-off-by: Ben Gardon <bgardon@google.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  8 ++++++++
>>   arch/x86/kvm/mmu/mmu.c          |  2 ++
>>   arch/x86/kvm/x86.c              | 18 +++++++++++++-----
>>   3 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ad22d4839bcc..00065f9bbc5e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1122,6 +1122,12 @@ struct kvm_arch {
>>           */
>>          spinlock_t tdp_mmu_pages_lock;
>>   #endif /* CONFIG_X86_64 */
>> +
>> +       /*
>> +        * If set, rmaps have been allocated for all memslots and should be
>> +        * allocated for any newly created or modified memslots.
>> +        */
>> +       bool memslots_have_rmaps;
>>   };
>>
>>   struct kvm_vm_stat {
>> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
>>
>>   int kvm_cpu_dirty_log_size(void);
>>
>> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
> 
> Woops, this shouldn't be marked inline as it creates build problems
> for the next patch with some configs.

With that fixed

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps
  2021-05-06 18:42 ` [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
  2021-05-07  1:10   ` kernel test robot
@ 2021-05-07  8:28   ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2021-05-07  8:28 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Yulei Zhang,
	Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 06/05/21 20:42, Ben Gardon wrote:
> +	/*
> +	 * memslots_have_rmaps is set and read in different lock contexts,
> +	 * so protect it with smp_load/store.
> +	 */
> +	smp_store_release(&kvm->arch.memslots_have_rmaps, true);

Shorter and better: /* write rmap pointers before memslots_have_rmaps */

> +	mutex_unlock(&kvm->slots_arch_lock);
> +	return 0;
> +}
> +
>   bool kvm_memslots_have_rmaps(struct kvm *kvm)
>   {
> -	return kvm->arch.memslots_have_rmaps;
> +	/*
> +	 * memslots_have_rmaps is set and read in different lock contexts,
> +	 * so protect it with smp_load/store.
> +	 */
> +	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
>   }
>   

Likewise, /* read memslots_have_rmaps before the rmaps themselves */

Paolo



* Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-06 23:44   ` Ben Gardon
  2021-05-07  7:50     ` David Hildenbrand
@ 2021-05-07  8:28     ` Paolo Bonzini
  2021-05-10 16:14       ` Ben Gardon
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-05-07  8:28 UTC (permalink / raw)
  To: Ben Gardon, LKML, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Yulei Zhang,
	Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 07/05/21 01:44, Ben Gardon wrote:
>>   struct kvm_vm_stat {
>> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
>>
>>   int kvm_cpu_dirty_log_size(void);
>>
>> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
> Woops, this shouldn't be marked inline as it creates build problems
> for the next patch with some configs.
> 

Possibly stupid (or at least lazy) question: why can't it be a "normal" 
static inline function?

Paolo



* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-06 18:42 ` [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU Ben Gardon
  2021-05-06 23:58   ` kernel test robot
  2021-05-07  0:56   ` kernel test robot
@ 2021-05-07  8:42   ` Paolo Bonzini
  2021-05-10 17:45     ` Sean Christopherson
  2 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-05-07  8:42 UTC (permalink / raw)
  To: Ben Gardon, linux-kernel, kvm
  Cc: Peter Xu, Sean Christopherson, Peter Shier, Yulei Zhang,
	Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 06/05/21 20:42, Ben Gardon wrote:
> In preparation for lazily allocating the rmaps when the TDP MMU is in
> use, protect the rmaps with SRCU. Unfortunately, this requires
> propagating a pointer to struct kvm around to several functions.

Thinking more about it, this is not needed because all reads of the rmap 
array are guarded by the load-acquire of kvm->arch.memslots_have_rmaps. 
  That is, the pattern is always

	if (!load-acquire(memslot_have_rmaps))
		return;
	... = __gfn_to_rmap(...)

				slots->arch.rmap[x] = ...
				store-release(memslot_have_rmaps, true)

where the load-acquire/store-release have the same role that 
srcu_dereference/rcu_assign_pointer had before this patch.

We also know that any read that misses the check has the potential for a 
NULL pointer dereference, so it *has* to be like that.

That said, srcu_dereference has zero cost unless debugging options are 
enabled, and it *is* true that the rmap can disappear if kvm->srcu is 
not held, so I lean towards keeping this change and just changing the 
commit message like this:

---------
Currently, rmaps are always allocated and published together with a new 
memslot, so the srcu_dereference for the memslots array already ensures 
that the memory pointed to by slots->arch.rmap is valid at the time
slots->arch.rmap is dereferenced.  However, they still need to be accessed in an SRCU
read-side critical section, as the whole memslot can be deleted outside 
SRCU.
--------

Thanks,

Paolo

> 
> Suggested-by: Paolo Bonzini<pbonzini@redhat.com>
> Signed-off-by: Ben Gardon<bgardon@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 57 +++++++++++++++++++++++++-----------------
>   arch/x86/kvm/x86.c     |  6 ++---
>   2 files changed, 37 insertions(+), 26 deletions(-)



* Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-07  8:28     ` Paolo Bonzini
@ 2021-05-10 16:14       ` Ben Gardon
  2021-05-10 16:33         ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-10 16:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On Fri, May 7, 2021 at 1:28 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 07/05/21 01:44, Ben Gardon wrote:
> >>   struct kvm_vm_stat {
> >> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
> >>
> >>   int kvm_cpu_dirty_log_size(void);
> >>
> >> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
> > Woops, this shouldn't be marked inline as it creates build problems
> > for the next patch with some configs.
> >
>
> Possibly stupid (or at least lazy) question: why can't it be a "normal"
> static inline function?

That was my initial approach (hence the leftover inline) but I got
some warnings about a forward declaration of struct kvm because
arch/x86/include/asm/kvm_host.h doesn't include virt/kvm/kvm_host.h.
Maybe there's a way to fix that, but I didn't want to mess with it.

>
> Paolo
>


* Re: [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap
  2021-05-07  7:46   ` David Hildenbrand
@ 2021-05-10 16:29     ` Ben Gardon
  0 siblings, 0 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-10 16:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, kvm, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang,
	Keqian Zhu

On Fri, May 7, 2021 at 12:47 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 06.05.21 20:42, Ben Gardon wrote:
> > Small refactor to facilitate allocating rmaps for all memslots at once.
> >
> > No functional change expected.
> >
> > Signed-off-by: Ben Gardon <bgardon@google.com>
> > ---
> >   arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
> >   1 file changed, 32 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5bcf07465c47..fc32a7dbe4c4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >       kvm_page_track_free_memslot(slot);
> >   }
> >
> > +static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
> > +                           unsigned long npages)
>
> I'd have called the functions memslot_rmap_alloc() and memslot_rmap_free()
>
>
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > +             int lpages;
> > +             int level = i + 1;
> > +
> > +             lpages = gfn_to_index(slot->base_gfn + npages - 1,
> > +                                   slot->base_gfn, level) + 1;
> > +
> > +             slot->arch.rmap[i] =
> > +                     kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> > +                              GFP_KERNEL_ACCOUNT);
> > +             if (!slot->arch.rmap[i])
> > +                     goto out_free;
>
> you can just avoid the goto here and do the free_memslot_rmap() right away.

Oh good catch. I'll incorporate your suggestions in v4.

>
> > +     }
> > +
> > +     return 0;
> > +
> > +out_free:
> > +     free_memslot_rmap(slot);
> > +     return -ENOMEM;
> > +}
> > +
> >   static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> >                                     unsigned long npages)
> >   {
> >       int i;
> > +     int r;
> >
> >       /*
> >        * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> > @@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> >        */
> >       memset(&slot->arch, 0, sizeof(slot->arch));
> >
> > -     for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > +     r = alloc_memslot_rmap(slot, npages);
> > +     if (r)
> > +             return r;
> > +
> > +     for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> >               struct kvm_lpage_info *linfo;
> >               unsigned long ugfn;
> >               int lpages;
> > @@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> >               lpages = gfn_to_index(slot->base_gfn + npages - 1,
> >                                     slot->base_gfn, level) + 1;
> >
> > -             slot->arch.rmap[i] =
> > -                     kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> > -                              GFP_KERNEL_ACCOUNT);
> > -             if (!slot->arch.rmap[i])
> > -                     goto out_free;
> > -             if (i == 0)
> > -                     continue;
> > -
> >               linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
> >               if (!linfo)
> >                       goto out_free;
> >
>
> apart from that LGTM
>
> --
> Thanks,
>
> David / dhildenb
>


* Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-10 16:14       ` Ben Gardon
@ 2021-05-10 16:33         ` Paolo Bonzini
  2021-05-10 16:37           ` Ben Gardon
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-05-10 16:33 UTC (permalink / raw)
  To: Ben Gardon
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 10/05/21 18:14, Ben Gardon wrote:
>> Possibly stupid (or at least lazy) question: why can't it be a "normal"
>> static inline function?
> That was my initial approach (hence the leftover inline) but I got
> some warnings about a forward declaration of struct kvm because
> arch/x86/include/asm/kvm_host.h doesn't include virt/kvm/kvm_host.h.
> Maybe there's a way to fix that, but I didn't want to mess with it.
> 

Let's just use the field directly.

Paolo



* Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation
  2021-05-10 16:33         ` Paolo Bonzini
@ 2021-05-10 16:37           ` Ben Gardon
  0 siblings, 0 replies; 31+ messages in thread
From: Ben Gardon @ 2021-05-10 16:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Peter Xu, Sean Christopherson, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On Mon, May 10, 2021 at 9:33 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/05/21 18:14, Ben Gardon wrote:
> >> Possibly stupid (or at least lazy) question: why can't it be a "normal"
> >> static inline function?
> > That was my initial approach (hence the leftover inline) but I got
> > some warnings about a forward declaration of struct kvm because
> > arch/x86/include/asm/kvm_host.h doesn't include virt/kvm/kvm_host.h.
> > Maybe there's a way to fix that, but I didn't want to mess with it.
> >
>
> Let's just use the field directly.

That works for me too. I moved to the wrapper because adding the
smp_load_acquire and a comment explaining why we were doing that
looked bloated and I thought it would be easier to document in one
place, but it's not that much bloat, and having the subtleties
documented directly in the function is probably clearer for readers
anyway.

>
> Paolo
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-07  8:42   ` Paolo Bonzini
@ 2021-05-10 17:45     ` Sean Christopherson
  2021-05-10 17:53       ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-05-10 17:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, linux-kernel, kvm, Peter Xu, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On Fri, May 07, 2021, Paolo Bonzini wrote:
> On 06/05/21 20:42, Ben Gardon wrote:
> > In preparation for lazily allocating the rmaps when the TDP MMU is in
> > use, protect the rmaps with SRCU. Unfortunately, this requires
> > propagating a pointer to struct kvm around to several functions.
> 
> Thinking more about it, this is not needed because all reads of the rmap
> array are guarded by the load-acquire of kvm->arch.memslots_have_rmaps.
> That is, the pattern is always
> 
> 	if (!load-acquire(memslot_have_rmaps))
> 		return;
> 	... = __gfn_to_rmap(...)
> 
> 				slots->arch.rmap[x] = ...
> 				store-release(memslot_have_rmaps, true)
> 
> where the load-acquire/store-release have the same role that
> srcu_dereference/rcu_assign_pointer had before this patch.
> 
> We also know that any read that misses the check has the potential for a
> NULL pointer dereference, so it *has* to be like that.
> 
> That said, srcu_dereference has zero cost unless debugging options are
> enabled, and it *is* true that the rmap can disappear if kvm->srcu is not
> held, so I lean towards keeping this change and just changing the commit
> message like this:
> 
> ---------
> Currently, rmaps are always allocated and published together with a new
> memslot, so the srcu_dereference for the memslots array already ensures that
> the memory pointed to by slots->arch.rmap is zero at the time
> slots->arch.rmap.  However, they still need to be accessed in an SRCU
> read-side critical section, as the whole memslot can be deleted outside
> SRCU.
> --------

I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
harm than good.  Adding the unnecessary tag could be quite misleading as it
would imply the rmap pointers can _change_ independent of the memslots.

Similarly, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that it's
safe to access the rmap after its pointer is assigned, and that's simply not
true since an rmap array can be freed if rmap allocation for a different memslot
fails.  Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
if arch.memslots_have_rmaps is true, as you pointed out.

Furthermore, to actually gain any protection from SRCU, there would have to be
a synchronize_srcu() call after assigning the pointers, and that _does_ have an
associated.  Not to mention that to truly respect the __rcu annotation, deleting
the rmaps would also have to be done "independently" with the correct
rcu_assign_pointer() and synchronize_srcu() logic.

* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-10 17:45     ` Sean Christopherson
@ 2021-05-10 17:53       ` Paolo Bonzini
  2021-05-10 18:28         ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2021-05-10 17:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ben Gardon, linux-kernel, kvm, Peter Xu, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On 10/05/21 19:45, Sean Christopherson wrote:
>>
>> ---------
>> Currently, rmaps are always allocated and published together with a new
>> memslot, so the srcu_dereference for the memslots array already ensures that
>> the memory pointed to by slots->arch.rmap is zero at the time
>> slots->arch.rmap.  However, they still need to be accessed in an SRCU
>> read-side critical section, as the whole memslot can be deleted outside
>> SRCU.
>> --------
> I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
> harm than good.  Adding the unnecessary tag could be quite misleading as it
> would imply the rmap pointers can _change_ independent of the memslots.
> 
> Similarly, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that it's
> safe to access the rmap after its pointer is assigned, and that's simply not
> true since an rmap array can be freed if rmap allocation for a different memslot
> fails.  Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
> if arch.memslots_have_rmaps is true, as you pointed out.

This about freeing is a very good point.

> Furthermore, to actually gain any protection from SRCU, there would have to be
> a synchronize_srcu() call after assigning the pointers, and that _does_ have an
> associated.

... but this is incorrect (I was almost going to point out the below in 
my reply to Ben, then decided I was pointing out the obvious; lesson 
learned).

synchronize_srcu() is only needed after *deleting* something, which in 
this case is done as part of deleting the memslots---it's perfectly fine 
to batch multiple synchronize_*() calls given how expensive some of them 
are.

(BTW an associated what?)

So they still count as RCU-protected in my opinion, just because reading 
them outside SRCU is a big no and ought to warn (it's unlikely that it 
happens with rmaps, but then we just had 2-3 bugs like this being 
reported in a short time for memslots so never say never).  However, 
rcu_assign_pointer is not needed because the visibility of the rmaps is 
further protected by the have-rmaps flag (to be accessed with 
load-acquire/store-release) and not just by the pointer being there and 
non-NULL.
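[A pseudocode sketch of the batching Paolo describes; the RCU/SRCU calls (rcu_dereference_protected, rcu_assign_pointer, synchronize_srcu) are the real kernel APIs, but the surrounding function and the freeing helper are hypothetical, not actual KVM code:]

```c
/* Sketch: deletion is what needs the grace period. One
 * synchronize_srcu() after publishing the new memslots array covers
 * everything reachable only through the old array -- including the
 * old slots' rmaps -- so the rmaps need no synchronize of their own. */
static void install_new_memslots_sketch(struct kvm *kvm, int as_id,
					struct kvm_memslots *new)
{
	struct kvm_memslots *old =
		rcu_dereference_protected(kvm->memslots[as_id], 1);

	rcu_assign_pointer(kvm->memslots[as_id], new);
	synchronize_srcu(&kvm->srcu);	/* wait out all SRCU readers */
	free_memslots_and_rmaps(old);	/* hypothetical helper */
}
```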

Paolo

> Not to mention that to truly respect the __rcu annotation, deleting
> the rmaps would also have to be done "independently" with the correct
> rcu_assign_pointer() and synchronize_srcu() logic.
> 


* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-10 17:53       ` Paolo Bonzini
@ 2021-05-10 18:28         ` Sean Christopherson
  2021-05-11 16:22           ` Ben Gardon
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2021-05-10 18:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, linux-kernel, kvm, Peter Xu, Peter Shier,
	Yulei Zhang, Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On Mon, May 10, 2021, Paolo Bonzini wrote:
> On 10/05/21 19:45, Sean Christopherson wrote:
> > > 
> > > ---------
> > > Currently, rmaps are always allocated and published together with a new
> > > memslot, so the srcu_dereference for the memslots array already ensures that
> > > the memory pointed to by slots->arch.rmap is zero at the time
> > > slots->arch.rmap.  However, they still need to be accessed in an SRCU
> > > read-side critical section, as the whole memslot can be deleted outside
> > > SRCU.
> > > --------
> > I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
> > harm than good.  Adding the unnecessary tag could be quite misleading as it
> > would imply the rmap pointers can _change_ independent of the memslots.
> > 
> > Similarly, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that it's
> > safe to access the rmap after its pointer is assigned, and that's simply not
> > true since an rmap array can be freed if rmap allocation for a different memslot
> > fails.  Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
> > if arch.memslots_have_rmaps is true, as you pointed out.
> 
> This about freeing is a very good point.
> 
> > Furthermore, to actually gain any protection from SRCU, there would have to be
> > a synchronize_srcu() call after assigning the pointers, and that _does_ have an
> > associated.
> 
> ... but this is incorrect (I was almost going to point out the below in my
> reply to Ben, then decided I was pointing out the obvious; lesson learned).
> 
> synchronize_srcu() is only needed after *deleting* something, which in this

No, synchronization is required any time the writer needs to ensure readers have
recognized the change.  E.g. making a memslot RO, moving a memslot's gfn base,
adding an MSR to the filter list.  I suppose you could frame any modification as
"deleting" something, but IMO that's cheating :-)

> case is done as part of deleting the memslots---it's perfectly fine to batch
> multiple synchronize_*() calls given how expensive some of them are.

Yes, but the shortlog says "Protect rmaps _independently_ with SRCU", emphasis
mine.  If the rmaps are truly protected independently, then they need to have
their own synchronization.  Setting all rmaps could be batched under a single
synchronize_srcu(), but IMO batching the rmaps with the memslot itself would be
in direct contradiction with the shortlog.

> (BTW an associated what?)

Doh.  "associated memslot."

> So they still count as RCU-protected in my opinion, just because reading
> them outside SRCU is a big no and ought to warn (it's unlikely that it
> happens with rmaps, but then we just had 2-3 bugs like this being reported
> in a short time for memslots so never say never).

Yes, but that interpretation holds true for literally everything that is hidden
behind an SRCU-protected pointer.  E.g. this would also be wrong, it's just much
more obviously broken:

bool kvm_is_gfn_writable(struct kvm* kvm, gfn_t gfn)
{
	struct kvm_memory_slot *slot;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	slot = gfn_to_memslot(kvm, gfn);
	srcu_read_unlock(&kvm->srcu);

	return slot && !(slot->flags & KVM_MEMSLOT_INVALID) &&
	       !(slot->flags & KVM_MEM_READONLY);
}
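[For contrast, a sketch of the corrected form of Sean's deliberately broken example: every dereference of the SRCU-protected slot stays inside the read-side critical section (same hypothetical function; kernel-context pseudocode, not runnable KVM code):]

```c
bool kvm_is_gfn_writable(struct kvm *kvm, gfn_t gfn)
{
	struct kvm_memory_slot *slot;
	bool writable;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	slot = gfn_to_memslot(kvm, gfn);
	/* The slot may be freed once the read side is exited, so the
	 * flags must be read before srcu_read_unlock(), not after. */
	writable = slot && !(slot->flags & KVM_MEMSLOT_INVALID) &&
		   !(slot->flags & KVM_MEM_READONLY);
	srcu_read_unlock(&kvm->srcu, idx);

	return writable;
}
```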


> However, rcu_assign_pointer is not needed because the visibility of the rmaps
> is further protected by the have-rmaps flag (to be accessed with
> load-acquire/store-release) and not just by the pointer being there and
> non-NULL.

Yes, and I'm arguing that annotating the rmaps as __rcu is wrong because they
themselves are not protected by SRCU.  The memslot that contains the rmaps is
protected by SRCU, and because of that asserting SRCU is held for read will hold
true.  But, if the memslot code were changed to use a different protection scheme,
e.g. a rwlock for argument's sake, then the SRCU assertion would fail even though
the rmap logic itself didn't change.

* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-10 18:28         ` Sean Christopherson
@ 2021-05-11 16:22           ` Ben Gardon
  2021-05-11 16:45             ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Ben Gardon @ 2021-05-11 16:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, LKML, kvm, Peter Xu, Peter Shier, Yulei Zhang,
	Wanpeng Li, Xiao Guangrong, Kai Huang, Keqian Zhu

On Mon, May 10, 2021 at 11:28 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 10, 2021, Paolo Bonzini wrote:
> > On 10/05/21 19:45, Sean Christopherson wrote:
> > > >
> > > > ---------
> > > > Currently, rmaps are always allocated and published together with a new
> > > > memslot, so the srcu_dereference for the memslots array already ensures that
> > > > the memory pointed to by slots->arch.rmap is zero at the time
> > > > slots->arch.rmap.  However, they still need to be accessed in an SRCU
> > > > read-side critical section, as the whole memslot can be deleted outside
> > > > SRCU.
> > > > --------
> > > I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
> > > harm than good.  Adding the unnecessary tag could be quite misleading as it
> > > would imply the rmap pointers can _change_ independent of the memslots.
> > >
> > > Similarly, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that it's
> > > safe to access the rmap after its pointer is assigned, and that's simply not
> > > true since an rmap array can be freed if rmap allocation for a different memslot
> > > fails.  Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
> > > if arch.memslots_have_rmaps is true, as you pointed out.
> >
> > This about freeing is a very good point.
> >
> > > Furthermore, to actually gain any protection from SRCU, there would have to be
> > > a synchronize_srcu() call after assigning the pointers, and that _does_ have an
> > > associated.
> >
> > ... but this is incorrect (I was almost going to point out the below in my
> > reply to Ben, then decided I was pointing out the obvious; lesson learned).
> >
> > synchronize_srcu() is only needed after *deleting* something, which in this
>
> No, synchronization is required any time the writer needs to ensure readers have
> recognized the change.  E.g. making a memslot RO, moving a memslot's gfn base,
> adding an MSR to the filter list.  I suppose you could frame any modification as
> "deleting" something, but IMO that's cheating :-)
>
> > case is done as part of deleting the memslots---it's perfectly fine to batch
> > multiple synchronize_*() calls given how expensive some of them are.
>
> Yes, but the shortlog says "Protect rmaps _independently_ with SRCU", emphasis
> mine.  If the rmaps are truly protected independently, then they need to have
> their own synchronization.  Setting all rmaps could be batched under a single
> synchronize_srcu(), but IMO batching the rmaps with the memslot itself would be
> in direct contradiction with the shortlog.
>
> > (BTW an associated what?)
>
> Doh.  "associated memslot."
>
> > So they still count as RCU-protected in my opinion, just because reading
> > them outside SRCU is a big no and ought to warn (it's unlikely that it
> > happens with rmaps, but then we just had 2-3 bugs like this being reported
> > in a short time for memslots so never say never).
>
> Yes, but that interpretation holds true for literally everything that is hidden
> behind an SRCU-protected pointer.  E.g. this would also be wrong, it's just much
> more obviously broken:
>
> bool kvm_is_gfn_writable(struct kvm* kvm, gfn_t gfn)
> {
>         struct kvm_memory_slot *slot;
>         int idx;
>
>         idx = srcu_read_lock(&kvm->srcu);
>         slot = gfn_to_memslot(kvm, gfn);
>         srcu_read_unlock(&kvm->srcu);
>
>         return slot && !(slot->flags & KVM_MEMSLOT_INVALID) &&
>                !(slot->flags & KVM_MEM_READONLY);
> }
>
>
> > However, rcu_assign_pointer is not needed because the visibility of the rmaps
> > is further protected by the have-rmaps flag (to be accessed with
> > load-acquire/store-release) and not just by the pointer being there and
> > non-NULL.
>
> Yes, and I'm arguing that annotating the rmaps as __rcu is wrong because they
> themselves are not protected by SRCU.  The memslot that contains the rmaps is
> protected by SRCU, and because of that asserting SRCU is held for read will hold
> true.  But, if the memslot code were changed to use a different protection scheme,
> e.g. a rwlock for argument's sake, then the SRCU assertion would fail even though
> the rmap logic itself didn't change.

I'm inclined to agree with Sean that the extra RCU annotations are
probably unnecessary since we're already doing the srcu dereference
for all the slots. I'll move all these RCU annotations to their own
patch and put it at the end of the series when I send v4.

* Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU
  2021-05-11 16:22           ` Ben Gardon
@ 2021-05-11 16:45             ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2021-05-11 16:45 UTC (permalink / raw)
  To: Ben Gardon, Sean Christopherson
  Cc: LKML, kvm, Peter Xu, Peter Shier, Yulei Zhang, Wanpeng Li,
	Xiao Guangrong, Kai Huang, Keqian Zhu

On 11/05/21 18:22, Ben Gardon wrote:
>> Yes, and I'm arguing that annotating the rmaps as __rcu is wrong because they
>> themselves are not protected by SRCU.  The memslot that contains the rmaps is
>> protected by SRCU, and because of that asserting SRCU is held for read will hold
>> true.  But, if the memslot code were changed to use a different protection scheme,
>> e.g. a rwlock for argument's sake, then the SRCU assertion would fail even though
>> the rmap logic itself didn't change.
>
> I'm inclined to agree with Sean that the extra RCU annotations are
> probably unnecessary since we're already doing the srcu dereference
> for all the slots. I'll move all these RCU annotations to their own
> patch and put it at the end of the series when I send v4.
> 

Fair enough, you can even remove them then.

Paolo


end of thread, other threads:[~2021-05-11 16:50 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 18:42 [PATCH v3 0/8] Lazily allocate memslot rmaps Ben Gardon
2021-05-06 18:42 ` [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing Ben Gardon
2021-05-07  7:42   ` David Hildenbrand
2021-05-06 18:42 ` [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap Ben Gardon
2021-05-07  7:46   ` David Hildenbrand
2021-05-10 16:29     ` Ben Gardon
2021-05-06 18:42 ` [PATCH v3 3/8] KVM: mmu: Refactor memslot copy Ben Gardon
2021-05-07  7:48   ` David Hildenbrand
2021-05-06 18:42 ` [PATCH v3 4/8] KVM: mmu: Add slots_arch_lock for memslot arch fields Ben Gardon
2021-05-06 18:42 ` [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation Ben Gardon
2021-05-06 23:44   ` Ben Gardon
2021-05-07  7:50     ` David Hildenbrand
2021-05-07  8:28     ` Paolo Bonzini
2021-05-10 16:14       ` Ben Gardon
2021-05-10 16:33         ` Paolo Bonzini
2021-05-10 16:37           ` Ben Gardon
2021-05-06 18:42 ` [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated Ben Gardon
2021-05-06 23:07   ` kernel test robot
2021-05-06 18:42 ` [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU Ben Gardon
2021-05-06 23:58   ` kernel test robot
2021-05-07  0:56   ` kernel test robot
2021-05-07  8:42   ` Paolo Bonzini
2021-05-10 17:45     ` Sean Christopherson
2021-05-10 17:53       ` Paolo Bonzini
2021-05-10 18:28         ` Sean Christopherson
2021-05-11 16:22           ` Ben Gardon
2021-05-11 16:45             ` Paolo Bonzini
2021-05-06 18:42 ` [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps Ben Gardon
2021-05-07  1:10   ` kernel test robot
2021-05-07  8:28   ` Paolo Bonzini
2021-05-07  7:40 ` [PATCH v3 0/8] " David Hildenbrand
