* [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
@ 2013-01-23 10:12 Takuya Yoshikawa
  2013-01-23 10:13 ` [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros Takuya Yoshikawa
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:12 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

This patch set mitigates another mmu_lock hold time issue.  Although
this is not enough and I'm thinking of additional work already, this
alone can reduce the lock hold time to some extent.

Takuya Yoshikawa (8):
  KVM: MMU: Fix and clean up for_each_gfn_* macros
  KVM: MMU: Use list_for_each_entry_safe in kvm_mmu_commit_zap_page()
  KVM: MMU: Add a parameter to kvm_mmu_prepare_zap_page() to update the next position
  KVM: MMU: Introduce for_each_gfn_indirect_valid_sp_safe macro
  KVM: MMU: Delete hash_link node in kvm_mmu_prepare_zap_page()
  KVM: MMU: Introduce free_zapped_mmu_pages() for freeing mmu pages in a list
  KVM: MMU: Split out free_zapped_mmu_pages() from kvm_mmu_commit_zap_page()
  KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock

 arch/x86/kvm/mmu.c |  149 +++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 101 insertions(+), 48 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
@ 2013-01-23 10:13 ` Takuya Yoshikawa
  2013-01-28 12:24   ` Gleb Natapov
  2013-01-23 10:13 ` [PATCH 2/8] KVM: MMU: Use list_for_each_entry_safe in kvm_mmu_commit_zap_page() Takuya Yoshikawa
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:13 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

The expression (sp)->gfn should not be expanded using @gfn.

Although no user of these macros passes a string other than gfn now,
this should be fixed before anyone sees strange errors.
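
For illustration only, a minimal userspace sketch (made-up names, not
kernel code) of what goes wrong when the argument is not literally
named gfn:

#include <stdio.h>

struct page { unsigned long gfn; };

/* Same hygiene problem as the old macros: the parameter shares its name
 * with the struct member, so (p)->gfn is rewritten by the argument.
 */
#define for_each_matching(p, gfn)  if ((p)->gfn != (gfn)) {} else

int main(void)
{
	struct page pg = { .gfn = 5 };
	unsigned long gfn = 5;

	for_each_matching(&pg, gfn)	/* fine: expands to (&pg)->gfn != (gfn) */
		printf("match\n");

	/*
	 * for_each_matching(&pg, base_gfn) would instead expand to
	 * (&pg)->base_gfn != (base_gfn) and fail to compile, because
	 * struct page has no base_gfn member.  Prefixing the macro
	 * parameters with underscores avoids this accidental rewrite.
	 */
	return 0;
}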

Also, the counter-intuitive conditions, which had been used before these
macros were introduced to avoid extra indentations, should not be used.

Note: ignored the following checkpatch report:
  ERROR: Macros with complex values should be enclosed in parenthesis

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   18 ++++++++----------
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9f628f7..376cec8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1658,16 +1658,14 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 				    struct list_head *invalid_list);
 
-#define for_each_gfn_sp(kvm, sp, gfn, pos)				\
-  hlist_for_each_entry(sp, pos,						\
-   &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)	\
-	if ((sp)->gfn != (gfn)) {} else
-
-#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos)		\
-  hlist_for_each_entry(sp, pos,						\
-   &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)	\
-		if ((sp)->gfn != (gfn) || (sp)->role.direct ||		\
-			(sp)->role.invalid) {} else
+#define for_each_gfn_sp(_kvm, _sp, _gfn, _pos)				\
+	hlist_for_each_entry(_sp, _pos,					\
+	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
+		if ((_sp)->gfn == (_gfn))
+
+#define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn, _pos)		\
+	for_each_gfn_sp(_kvm, _sp, _gfn, _pos)				\
+		if (!(_sp)->role.direct && !(_sp)->role.invalid)
 
 /* @sp->gfn should be write-protected at the call site */
 static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/8] KVM: MMU: Use list_for_each_entry_safe in kvm_mmu_commit_zap_page()
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
  2013-01-23 10:13 ` [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros Takuya Yoshikawa
@ 2013-01-23 10:13 ` Takuya Yoshikawa
  2013-01-23 10:14 ` [PATCH 3/8] KVM: MMU: Add a parameter to kvm_mmu_prepare_zap_page() to update the next position Takuya Yoshikawa
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:13 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

We traverse the linked list invalid_list while deleting each entry with
kvm_mmu_free_page(); the _safe variant of the iterator exists for exactly
this case.
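
For reference, a minimal userspace sketch of the pattern (plain C, not
the kernel list API): the next pointer has to be saved before the
current entry is freed, which is exactly what the _safe iterators do:

#include <stdlib.h>

struct node {
	struct node *next;
	/* payload ... */
};

/* Free every node of a singly linked list.  Reading 'next' before
 * free(n) is what makes deleting during the traversal safe.
 */
static void free_all(struct node *head)
{
	struct node *n, *next;

	for (n = head; n; n = next) {
		next = n->next;		/* save before freeing */
		free(n);
	}
}

int main(void)
{
	struct node *head = NULL;
	int i;

	for (i = 0; i < 3; i++) {
		struct node *n = calloc(1, sizeof(*n));
		n->next = head;
		head = n;
	}
	free_all(head);
	return 0;
}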

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 376cec8..46b1435 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2103,7 +2103,7 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 				    struct list_head *invalid_list)
 {
-	struct kvm_mmu_page *sp;
+	struct kvm_mmu_page *sp, *nsp;
 
 	if (list_empty(invalid_list))
 		return;
@@ -2120,12 +2120,11 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	 */
 	kvm_flush_remote_tlbs(kvm);
 
-	do {
-		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+	list_for_each_entry_safe(sp, nsp, invalid_list, link) {
 		WARN_ON(!sp->role.invalid || sp->root_count);
 		kvm_mmu_isolate_page(sp);
 		kvm_mmu_free_page(sp);
-	} while (!list_empty(invalid_list));
+	}
 }
 
 /*
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/8] KVM: MMU: Add a parameter to kvm_mmu_prepare_zap_page() to update the next position
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
  2013-01-23 10:13 ` [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros Takuya Yoshikawa
  2013-01-23 10:13 ` [PATCH 2/8] KVM: MMU: Use list_for_each_entry_safe in kvm_mmu_commit_zap_page() Takuya Yoshikawa
@ 2013-01-23 10:14 ` Takuya Yoshikawa
  2013-01-23 10:15 ` [PATCH 4/8] KVM: MMU: Introduce for_each_gfn_indirect_valid_sp_safe macro Takuya Yoshikawa
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:14 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

Currently we cannot do the isolation of mmu pages, i.e. deleting the
current hash_link node by hlist_del(), in this function, because we
may call it while traversing the linked list; we cannot solve the
problem by hlist_for_each_entry_safe as zapping can happen recursively.

Since the isolation must be done before releasing mmu_lock, we are now
forced to call kvm_mmu_isolate_page() for each mmu page found in the
invalid_list in kvm_mmu_commit_zap_page().

This patch adds a new parameter to kvm_mmu_prepare_zap_page() as a
preparation for solving this issue: all callers just pass NULL now.

Note: introducing the sp_next_pos abstraction, instead of passing a bare
hlist_node pointer, makes it possible to support the other list later as
well.
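
For reference, a userspace sketch of the idea (illustrative names, not
the kernel code): the function that deletes a node repairs the caller's
saved next position, so the traversal can continue even when deletions
happen somewhere down the call chain:

#include <stdio.h>
#include <stdlib.h>

struct node {
	int key;
	struct node *next;
};

struct next_pos {
	struct node *n;		/* next node the caller will visit */
};

/* Unlink and free @victim; if the caller's saved next position points
 * at the victim, advance it so that the traversal stays valid.
 */
static void delete_node(struct node **head, struct node *victim,
			struct next_pos *npos)
{
	struct node **pp;

	if (npos && npos->n == victim)
		npos->n = victim->next;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

int main(void)
{
	struct node *head = NULL, *n;
	struct next_pos npos;
	int i;

	for (i = 0; i < 4; i++) {
		n = malloc(sizeof(*n));
		n->key = i;
		n->next = head;
		head = n;		/* list: 3 2 1 0 */
	}

	for (n = head; n; n = npos.n) {
		npos.n = n->next;
		if (n->key == 3)	/* delete the *next* node, not this one */
			delete_node(&head, n->next, &npos);
		printf("%d\n", n->key);
	}
	return 0;
}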

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   41 +++++++++++++++++++++++++++--------------
 1 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 46b1435..2a48533 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1653,8 +1653,18 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	--kvm->stat.mmu_unsync;
 }
 
+/*
+ * Used to hold a pointer to the next mmu page's node when traversing through
+ * one of the linked lists.  This must be updated correctly when deleting any
+ * entries from the list.
+ */
+struct sp_next_pos {
+	struct hlist_node *hn;	/* next hash_link node */
+};
+
 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				    struct list_head *invalid_list);
+				    struct list_head *invalid_list,
+				    struct sp_next_pos *npos);
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 				    struct list_head *invalid_list);
 
@@ -1672,7 +1682,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   struct list_head *invalid_list, bool clear_unsync)
 {
 	if (sp->role.cr4_pae != !!is_pae(vcpu)) {
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
+		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list, NULL);
 		return 1;
 	}
 
@@ -1680,7 +1690,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		kvm_unlink_unsync_page(vcpu->kvm, sp);
 
 	if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
+		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list, NULL);
 		return 1;
 	}
 
@@ -1730,7 +1740,8 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 		kvm_unlink_unsync_page(vcpu->kvm, s);
 		if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
 			(vcpu->arch.mmu.sync_page(vcpu, s))) {
-			kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list);
+			kvm_mmu_prepare_zap_page(vcpu->kvm, s,
+						 &invalid_list, NULL);
 			continue;
 		}
 		flush = true;
@@ -2062,7 +2073,7 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 		struct kvm_mmu_page *sp;
 
 		for_each_sp(pages, sp, parents, i) {
-			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list, NULL);
 			mmu_pages_clear_parents(&parents);
 			zapped++;
 		}
@@ -2073,7 +2084,8 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 }
 
 static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				    struct list_head *invalid_list)
+				    struct list_head *invalid_list,
+				    struct sp_next_pos *npos)
 {
 	int ret;
 
@@ -2149,7 +2161,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 
 			page = container_of(kvm->arch.active_mmu_pages.prev,
 					    struct kvm_mmu_page, link);
-			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list);
+			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list, NULL);
 		}
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
@@ -2174,7 +2186,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 		pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
 			 sp->role.word);
 		r = 1;
-		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL);
 	}
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
@@ -2894,7 +2906,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 		sp = page_header(root);
 		--sp->root_count;
 		if (!sp->root_count && sp->role.invalid) {
-			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
+			kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
+						 &invalid_list, NULL);
 			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 		}
 		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
@@ -2910,7 +2923,7 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 			--sp->root_count;
 			if (!sp->root_count && sp->role.invalid)
 				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
-							 &invalid_list);
+							 &invalid_list, NULL);
 		}
 		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
 	}
@@ -3987,7 +4000,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		if (detect_write_misaligned(sp, gpa, bytes) ||
 		      detect_write_flooding(sp)) {
 			zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
-						     &invalid_list);
+							&invalid_list, NULL);
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
@@ -4041,7 +4054,7 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 
 		sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
 				  struct kvm_mmu_page, link);
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
+		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list, NULL);
 		++vcpu->kvm->stat.mmu_recycled;
 	}
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
@@ -4203,7 +4216,7 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	spin_lock(&kvm->mmu_lock);
 restart:
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
-		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
+		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL))
 			goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
@@ -4220,7 +4233,7 @@ static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
 
 	page = container_of(kvm->arch.active_mmu_pages.prev,
 			    struct kvm_mmu_page, link);
-	kvm_mmu_prepare_zap_page(kvm, page, invalid_list);
+	kvm_mmu_prepare_zap_page(kvm, page, invalid_list, NULL);
 }
 
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/8] KVM: MMU: Introduce for_each_gfn_indirect_valid_sp_safe macro
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2013-01-23 10:14 ` [PATCH 3/8] KVM: MMU: Add a parameter to kvm_mmu_prepare_zap_page() to update the next position Takuya Yoshikawa
@ 2013-01-23 10:15 ` Takuya Yoshikawa
  2013-01-23 10:16 ` [PATCH 5/8] KVM: MMU: Delete hash_link node in kvm_mmu_prepare_zap_page() Takuya Yoshikawa
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:15 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

This is a preparation for moving hlist_del(&sp->hash_link) from
kvm_mmu_isolate_page() to kvm_mmu_prepare_zap_page().

Every for_each_gfn_indirect_valid_sp loop whose body calls a function
that can reach kvm_mmu_prepare_zap_page(), and which does not break out
of the loop right after that call, is converted to the new macro.

Note: ignored the following checkpatch report:
  ERROR: Macros with complex values should be enclosed in parenthesis

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a48533..d5bf373 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1677,6 +1677,18 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	for_each_gfn_sp(_kvm, _sp, _gfn, _pos)				\
 		if (!(_sp)->role.direct && !(_sp)->role.invalid)
 
+/*
+ * Used for zapping mmu pages while traversing the mmu page hash list.
+ * Users must update @_n so that it points to the new next node after deleting
+ * any entries in such a way that can make the value prepared by
+ * hlist_for_each_entry_safe invalid.
+ */
+#define for_each_gfn_indirect_valid_sp_safe(_kvm, _sp, _gfn, _pos, _n)	\
+	hlist_for_each_entry_safe(_sp, _pos, _n,			\
+	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
+		if (((_sp)->gfn == (_gfn)) &&				\
+		    !(_sp)->role.direct && !(_sp)->role.invalid)
+
 /* @sp->gfn should be write-protected at the call site */
 static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   struct list_head *invalid_list, bool clear_unsync)
@@ -1729,10 +1741,11 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 {
 	struct kvm_mmu_page *s;
 	struct hlist_node *node;
+	struct sp_next_pos npos;
 	LIST_HEAD(invalid_list);
 	bool flush = false;
 
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
+	for_each_gfn_indirect_valid_sp_safe(vcpu->kvm, s, gfn, node, npos.hn) {
 		if (!s->unsync)
 			continue;
 
@@ -1741,7 +1754,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 		if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
 			(vcpu->arch.mmu.sync_page(vcpu, s))) {
 			kvm_mmu_prepare_zap_page(vcpu->kvm, s,
-						 &invalid_list, NULL);
+						 &invalid_list, &npos);
 			continue;
 		}
 		flush = true;
@@ -2176,17 +2189,18 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 {
 	struct kvm_mmu_page *sp;
 	struct hlist_node *node;
+	struct sp_next_pos npos;
 	LIST_HEAD(invalid_list);
 	int r;
 
 	pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
 	r = 0;
 	spin_lock(&kvm->mmu_lock);
-	for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
+	for_each_gfn_indirect_valid_sp_safe(kvm, sp, gfn, node, npos.hn) {
 		pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
 			 sp->role.word);
 		r = 1;
-		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL);
+		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &npos);
 	}
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
@@ -3966,6 +3980,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	union kvm_mmu_page_role mask = { .word = 0 };
 	struct kvm_mmu_page *sp;
 	struct hlist_node *node;
+	struct sp_next_pos npos;
 	LIST_HEAD(invalid_list);
 	u64 entry, gentry, *spte;
 	int npte;
@@ -3996,11 +4011,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
 
 	mask.cr0_wp = mask.cr4_pae = mask.nxe = 1;
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) {
+	for_each_gfn_indirect_valid_sp_safe(vcpu->kvm, sp, gfn, node, npos.hn) {
 		if (detect_write_misaligned(sp, gpa, bytes) ||
 		      detect_write_flooding(sp)) {
 			zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
-							&invalid_list, NULL);
+							&invalid_list, &npos);
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/8] KVM: MMU: Delete hash_link node in kvm_mmu_prepare_zap_page()
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (3 preceding siblings ...)
  2013-01-23 10:15 ` [PATCH 4/8] KVM: MMU: Introduce for_each_gfn_indirect_valid_sp_safe macro Takuya Yoshikawa
@ 2013-01-23 10:16 ` Takuya Yoshikawa
  2013-01-23 10:16 ` [PATCH 6/8] KVM: MMU: Introduce free_zapped_mmu_pages() for freeing mmu pages in a list Takuya Yoshikawa
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:16 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

Now that we are using for_each_gfn_indirect_valid_sp_safe, we can safely
delete the node by correctly updating the pointer to the next one.

The only case we need to care about is when mmu_zap_unsync_children()
has zapped something other than the current page.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d5bf373..a72c573 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1469,7 +1469,6 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
 static void kvm_mmu_isolate_page(struct kvm_mmu_page *sp)
 {
 	ASSERT(is_empty_shadow_page(sp->spt));
-	hlist_del(&sp->hash_link);
 	if (!sp->role.direct)
 		free_page((unsigned long)sp->gfns);
 }
@@ -2111,9 +2110,15 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 		unaccount_shadowed(kvm, sp->gfn);
 	if (sp->unsync)
 		kvm_unlink_unsync_page(kvm, sp);
+
+	/* Next entry might be deleted by mmu_zap_unsync_children(). */
+	if (npos && ret)
+		npos->hn = sp->hash_link.next;
+
 	if (!sp->root_count) {
 		/* Count self */
 		ret++;
+		hlist_del(&sp->hash_link);
 		list_move(&sp->link, invalid_list);
 		kvm_mod_used_mmu_pages(kvm, -1);
 	} else {
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/8] KVM: MMU: Introduce free_zapped_mmu_pages() for freeing mmu pages in a list
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (4 preceding siblings ...)
  2013-01-23 10:16 ` [PATCH 5/8] KVM: MMU: Delete hash_link node in kvm_mmu_prepare_zap_page() Takuya Yoshikawa
@ 2013-01-23 10:16 ` Takuya Yoshikawa
  2013-01-23 10:17 ` [PATCH 7/8] KVM: MMU: Split out free_zapped_mmu_pages() from kvm_mmu_commit_zap_page() Takuya Yoshikawa
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:16 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

This will be split out from kvm_mmu_commit_zap_page() and moved out of
the protection of the mmu_lock later.

Note: kvm_mmu_isolate_page() is folded into kvm_mmu_free_page() since it
now does nothing but free sp->gfns.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   35 +++++++++++++++++------------------
 1 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a72c573..97d372a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1461,27 +1461,32 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
 }
 
 /*
- * Remove the sp from shadow page cache, after call it,
- * we can not find this sp from the cache, and the shadow
- * page table is still valid.
- * It should be under the protection of mmu lock.
+ * Free the shadow page table and the sp, we can do it
+ * out of the protection of mmu lock.
  */
-static void kvm_mmu_isolate_page(struct kvm_mmu_page *sp)
+static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
 	ASSERT(is_empty_shadow_page(sp->spt));
+
 	if (!sp->role.direct)
 		free_page((unsigned long)sp->gfns);
+
+	list_del(&sp->link);
+	free_page((unsigned long)sp->spt);
+	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
 /*
- * Free the shadow page table and the sp, we can do it
- * out of the protection of mmu lock.
+ * Free zapped mmu pages in @invalid_list.
+ * Call this after releasing mmu_lock if possible.
  */
-static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+static void free_zapped_mmu_pages(struct kvm *kvm,
+				  struct list_head *invalid_list)
 {
-	list_del(&sp->link);
-	free_page((unsigned long)sp->spt);
-	kmem_cache_free(mmu_page_header_cache, sp);
+	struct kvm_mmu_page *sp, *nsp;
+
+	list_for_each_entry_safe(sp, nsp, invalid_list, link)
+		kvm_mmu_free_page(sp);
 }
 
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
@@ -2133,8 +2138,6 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 				    struct list_head *invalid_list)
 {
-	struct kvm_mmu_page *sp, *nsp;
-
 	if (list_empty(invalid_list))
 		return;
 
@@ -2150,11 +2153,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	 */
 	kvm_flush_remote_tlbs(kvm);
 
-	list_for_each_entry_safe(sp, nsp, invalid_list, link) {
-		WARN_ON(!sp->role.invalid || sp->root_count);
-		kvm_mmu_isolate_page(sp);
-		kvm_mmu_free_page(sp);
-	}
+	free_zapped_mmu_pages(kvm, invalid_list);
 }
 
 /*
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/8] KVM: MMU: Split out free_zapped_mmu_pages() from kvm_mmu_commit_zap_page()
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (5 preceding siblings ...)
  2013-01-23 10:16 ` [PATCH 6/8] KVM: MMU: Introduce free_zapped_mmu_pages() for freeing mmu pages in a list Takuya Yoshikawa
@ 2013-01-23 10:17 ` Takuya Yoshikawa
  2013-01-23 10:18 ` [PATCH 8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock Takuya Yoshikawa
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:17 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

Just trivial conversions at this point.  Some of these will be moved out
of the protection of the mmu_lock in the following patch.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 97d372a..dd7b455 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1721,8 +1721,10 @@ static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
 	int ret;
 
 	ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
-	if (ret)
+	if (ret) {
 		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+		free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
+	}
 
 	return ret;
 }
@@ -1765,6 +1767,8 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 	}
 
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
+
 	if (flush)
 		kvm_mmu_flush_tlb(vcpu);
 }
@@ -1852,6 +1856,8 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 			mmu_pages_clear_parents(&parents);
 		}
 		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+		free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
+
 		cond_resched_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_pages_init(parent, &parents, &pages);
 	}
@@ -2152,8 +2158,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	 * page table walks.
 	 */
 	kvm_flush_remote_tlbs(kvm);
-
-	free_zapped_mmu_pages(kvm, invalid_list);
 }
 
 /*
@@ -2181,6 +2185,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list, NULL);
 		}
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
+		free_zapped_mmu_pages(kvm, &invalid_list);
+
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
 	}
 
@@ -2207,6 +2213,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &npos);
 	}
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	free_zapped_mmu_pages(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
 
 	return r;
@@ -2927,6 +2934,7 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
 						 &invalid_list, NULL);
 			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+			free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		}
 		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -2946,6 +2954,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
 	}
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
+
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 }
@@ -4042,7 +4052,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		}
 	}
 	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
+
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
+
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
@@ -4076,7 +4089,9 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list, NULL);
 		++vcpu->kvm->stat.mmu_recycled;
 	}
+
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 }
 
 static bool is_mmio_page_fault(struct kvm_vcpu *vcpu, gva_t addr)
@@ -4239,6 +4254,8 @@ restart:
 			goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	free_zapped_mmu_pages(kvm, &invalid_list);
+
 	spin_unlock(&kvm->mmu_lock);
 }
 
@@ -4291,6 +4308,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 
 		kvm_mmu_remove_some_alloc_mmu_pages(kvm, &invalid_list);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
+		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (6 preceding siblings ...)
  2013-01-23 10:17 ` [PATCH 7/8] KVM: MMU: Split out free_zapped_mmu_pages() from kvm_mmu_commit_zap_page() Takuya Yoshikawa
@ 2013-01-23 10:18 ` Takuya Yoshikawa
  2013-02-04 13:50   ` Marcelo Tosatti
  2013-01-23 10:44 ` [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Xiao Guangrong
  2013-02-04 13:29 ` Marcelo Tosatti
  9 siblings, 1 reply; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 10:18 UTC (permalink / raw)
  To: mtosatti, gleb; +Cc: kvm

We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
for zapping mmu pages with mmu_lock held.

Although we need to do conditional rescheduling for completely
fixing this issue, we can reduce the hold time to some extent by moving
free_zapped_mmu_pages() out of the protection.  Since invalid_list can
be very long, the effect is not negligible.

Note: this patch does not treat non-trivial cases.
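
For reference, a generic userspace sketch of the pattern (pthreads,
illustrative names): unlink everything while holding the lock, then do
the potentially long freeing after the lock has been dropped:

#include <pthread.h>
#include <stdlib.h>

struct obj {
	struct obj *next;
	/* payload ... */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *active;	/* protected by 'lock' */

static void zap_all(void)
{
	struct obj *invalid, *o, *next;

	pthread_mutex_lock(&lock);
	invalid = active;	/* unlink under the lock: cheap */
	active = NULL;
	pthread_mutex_unlock(&lock);

	/* Freeing may take long, but nobody can reach these objects now. */
	for (o = invalid; o; o = next) {
		next = o->next;
		free(o);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < 3; i++) {
		struct obj *o = calloc(1, sizeof(*o));
		o->next = active;
		active = o;
	}
	zap_all();
	return 0;
}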

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dd7b455..7976f55 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2185,7 +2185,6 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list, NULL);
 		}
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
-		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
 	}
@@ -2193,6 +2192,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 	kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
 
 	spin_unlock(&kvm->mmu_lock);
+
+	free_zapped_mmu_pages(kvm, &invalid_list);
 }
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
@@ -2213,9 +2214,10 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &npos);
 	}
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	free_zapped_mmu_pages(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
 
+	free_zapped_mmu_pages(kvm, &invalid_list);
+
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
@@ -2934,10 +2936,11 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
 						 &invalid_list, NULL);
 			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-			free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		}
 		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 		spin_unlock(&vcpu->kvm->mmu_lock);
+
+		free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		return;
 	}
 	for (i = 0; i < 4; ++i) {
@@ -2954,9 +2957,10 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
 	}
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 }
 
@@ -4054,10 +4058,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
 
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 }
 
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
@@ -4254,9 +4259,10 @@ restart:
 			goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	free_zapped_mmu_pages(kvm, &invalid_list);
 
 	spin_unlock(&kvm->mmu_lock);
+
+	free_zapped_mmu_pages(kvm, &invalid_list);
 }
 
 static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
@@ -4308,10 +4314,10 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 
 		kvm_mmu_remove_some_alloc_mmu_pages(kvm, &invalid_list);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
-		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
+		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		list_move_tail(&kvm->vm_list, &vm_list);
 		break;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (7 preceding siblings ...)
  2013-01-23 10:18 ` [PATCH 8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock Takuya Yoshikawa
@ 2013-01-23 10:44 ` Xiao Guangrong
  2013-01-23 13:28   ` Takuya Yoshikawa
  2013-02-04 13:42   ` Marcelo Tosatti
  2013-02-04 13:29 ` Marcelo Tosatti
  9 siblings, 2 replies; 21+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:44 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, gleb, kvm

On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
> This patch set mitigates another mmu_lock hold time issue.  Although
> this is not enough and I'm thinking of additional work already, this
> alone can reduce the lock hold time to some extent.
> 

It is not worth doing this kind of complex thing, usually, only a few pages on
the invalid list. The *really* heavily case is kvm_mmu_zap_all() which can be speeded
up by using generation number, this is a todo work in kvm wiki:

http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number

I am doing this work for some weeks and will post the patch out during these days.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 10:44 ` [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Xiao Guangrong
@ 2013-01-23 13:28   ` Takuya Yoshikawa
  2013-01-23 13:45     ` Xiao Guangrong
  2013-02-04 13:42   ` Marcelo Tosatti
  1 sibling, 1 reply; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 13:28 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Takuya Yoshikawa, mtosatti, gleb, kvm

On Wed, 23 Jan 2013 18:44:52 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
> > This patch set mitigates another mmu_lock hold time issue.  Although
> > this is not enough and I'm thinking of additional work already, this
> > alone can reduce the lock hold time to some extent.
> > 
> 
> It is not worth doing this kind of complex thing, usually, only a few pages on
> the invalid list. The *really* heavily case is kvm_mmu_zap_all() which can be speeded
> up by using generation number, this is a todo work in kvm wiki:

I don't think this is so complex.  This is a basic programming technique
using linked lists.

The current code which deletes the two link nodes in different functions
looks unnatural to me: traversing the sp->link nodes forces us to break
the loop and sp->hash_link nodes alone is allowed to continue ...

Making each function semantically clear should be more important than
other things.

But maybe a matter of taste, so I'll wait for the maintainers' comments.

> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
> 
> I am doing this work for some weeks and will post the patch out during these days.

I remember that Avi originally wrote the idea of introducing the
generation of mmu pages in his other work.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 13:28   ` Takuya Yoshikawa
@ 2013-01-23 13:45     ` Xiao Guangrong
  2013-01-23 14:49       ` Takuya Yoshikawa
  0 siblings, 1 reply; 21+ messages in thread
From: Xiao Guangrong @ 2013-01-23 13:45 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, mtosatti, gleb, kvm

On 01/23/2013 09:28 PM, Takuya Yoshikawa wrote:
> On Wed, 23 Jan 2013 18:44:52 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
>>> This patch set mitigates another mmu_lock hold time issue.  Although
>>> this is not enough and I'm thinking of additional work already, this
>>> alone can reduce the lock hold time to some extent.
>>>
>>
>> It is not worth doing this kind of complex thing, usually, only a few pages on
>> the invalid list. The *really* heavily case is kvm_mmu_zap_all() which can be speeded
>> up by using generation number, this is a todo work in kvm wiki:
> 
> I don't think this is so complex.  This is a basic programming technique
> using linked lists.

I just think that only a few pages exist on the invalid list; it is not
worth introducing this.

> 
> The current code which deletes the two link nodes in different functions
> looks unnatural to me: traversing the sp->link nodes forces us to break
> the loop and sp->hash_link nodes alone is allowed to continue ...
> 
> Making each function semantically clear should be more important than
> other things.
> 

The reason the code like this is, we have lockless shadow page walker.

> But maybe a matter of taste, so I'll wait for the maintainers' comments.
> 
>> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
>>
>> I am doing this work for some weeks and will post the patch out during these days.
> 
> I remember that Avi originally wrote the idea of introducing the
> generation of mmu pages in his other work.
> 

Whatever the original consideration is, the idea can speed up mmu invalidation a lot.
(Actually, i mentioned this idea to you when discussion fast write protect long time ago.)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 13:45     ` Xiao Guangrong
@ 2013-01-23 14:49       ` Takuya Yoshikawa
  2013-01-23 15:45         ` Xiao Guangrong
  0 siblings, 1 reply; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-23 14:49 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Takuya Yoshikawa, mtosatti, gleb, kvm

On Wed, 23 Jan 2013 21:45:23 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> > The current code which deletes the two link nodes in different functions
> > looks unnatural to me: traversing the sp->link nodes forces us to break
> > the loop and sp->hash_link nodes alone is allowed to continue ...
> > 
> > Making each function semantically clear should be more important than
> > other things.
> > 
> 
> The reason the code like this is, we have lockless shadow page walker.

But hash_link needs to be protected by mmu_lock anyway?

> > But maybe a matter of taste, so I'll wait for the maintainers' comments.
> > 
> >> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
> >>
> >> I am doing this work for some weeks and will post the patch out during these days.
> > 
> > I remember that Avi originally wrote the idea of introducing the
> > generation of mmu pages in his other work.
> > 
> 
> Whatever the original consideration is, the idea can speed up mmu invalidation a lot.
> (Actually, i mentioned this idea to you when discussion fast write protect long time ago.)

It's fine.  I just wanted to know if my memory was correct.

	Takuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 14:49       ` Takuya Yoshikawa
@ 2013-01-23 15:45         ` Xiao Guangrong
  0 siblings, 0 replies; 21+ messages in thread
From: Xiao Guangrong @ 2013-01-23 15:45 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Xiao Guangrong, Takuya Yoshikawa, mtosatti, gleb, kvm

On 01/23/2013 10:49 PM, Takuya Yoshikawa wrote:
> On Wed, 23 Jan 2013 21:45:23 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>>> The current code which deletes the two link nodes in different functions
>>> looks unnatural to me: traversing the sp->link nodes forces us to break
>>> the loop and sp->hash_link nodes alone is allowed to continue ...
>>>
>>> Making each function semantically clear should be more important than
>>> other things.
>>>
>>
>> The reason the code like this is, we have lockless shadow page walker.
> 
> But hash_link needs to be protected by mmu_lock anyway?

The purpose of not deleting the hlist node is to be able to keep walking the
hash table entry.  Deleting the link node is for reusing it on the invalid list,
to save memory space.  If you really want to be able to keep walking sp->link as
well, we can introduce another list for the invalid-list use, but it is not
worthwhile.

To be honest, I do not care much about this; no one asks us to obey a rule that
"all lists should have the same walking behaviour". ;)  But a comment for this
code is always appreciated.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros
  2013-01-23 10:13 ` [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros Takuya Yoshikawa
@ 2013-01-28 12:24   ` Gleb Natapov
  2013-01-28 12:29     ` Takuya Yoshikawa
  0 siblings, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2013-01-28 12:24 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm

On Wed, Jan 23, 2013 at 07:13:21PM +0900, Takuya Yoshikawa wrote:
> The expression (sp)->gfn should not be expanded using @gfn.
> 
> Although no user of these macros passes a string other than gfn now,
> this should be fixed before anyone sees strange errors.
> 
> Also, the counter-intuitive conditions, which had been used before these
> macros were introduced to avoid extra indentations, should not be used.
> 
Not sure what do you mean here. This counter-intuitive conditions are
used so that if "else" follows the macro it will not be interpreted as
belonging to the hidden if(). Like in the following code:

 if (x)
    for_each_gfn_sp()
 else
   do_y();

You do not expect do_y() to be called for each sp->gfn != gfn.
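
A compilable illustration of the same point, with made-up macros (not
the kernel ones): the empty-body "{} else" keeps a trailing else bound
to the outer if:

#include <stdio.h>

#define each_positive_bad(v)	if ((v) > 0)
#define each_positive_good(v)	if ((v) <= 0) {} else

int main(void)
{
	int x = 0, v = -1;

	if (x)
		each_positive_bad(v)
			printf("positive\n");
	else
		printf("x is false\n");	/* never printed: else binds to the macro's if */

	if (x)
		each_positive_good(v)
			printf("positive\n");
	else
		printf("x is false\n");	/* printed, as the reader expects */

	return 0;
}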

> Note: ignored the following checkpatch report:
>   ERROR: Macros with complex values should be enclosed in parenthesis
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>  arch/x86/kvm/mmu.c |   18 ++++++++----------
>  1 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9f628f7..376cec8 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1658,16 +1658,14 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>  static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>  				    struct list_head *invalid_list);
>  
> -#define for_each_gfn_sp(kvm, sp, gfn, pos)				\
> -  hlist_for_each_entry(sp, pos,						\
> -   &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)	\
> -	if ((sp)->gfn != (gfn)) {} else
> -
> -#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos)		\
> -  hlist_for_each_entry(sp, pos,						\
> -   &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)	\
> -		if ((sp)->gfn != (gfn) || (sp)->role.direct ||		\
> -			(sp)->role.invalid) {} else
> +#define for_each_gfn_sp(_kvm, _sp, _gfn, _pos)				\
> +	hlist_for_each_entry(_sp, _pos,					\
> +	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
> +		if ((_sp)->gfn == (_gfn))
> +
> +#define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn, _pos)		\
> +	for_each_gfn_sp(_kvm, _sp, _gfn, _pos)				\
> +		if (!(_sp)->role.direct && !(_sp)->role.invalid)
>  
>  /* @sp->gfn should be write-protected at the call site */
>  static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> -- 
> 1.7.5.4

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros
  2013-01-28 12:24   ` Gleb Natapov
@ 2013-01-28 12:29     ` Takuya Yoshikawa
  0 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-01-28 12:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Takuya Yoshikawa, mtosatti, kvm

On Mon, 28 Jan 2013 14:24:25 +0200
Gleb Natapov <gleb@redhat.com> wrote:

> On Wed, Jan 23, 2013 at 07:13:21PM +0900, Takuya Yoshikawa wrote:
> > The expression (sp)->gfn should not be expanded using @gfn.
> > 
> > Although no user of these macros passes a string other than gfn now,
> > this should be fixed before anyone sees strange errors.
> > 
> > Also, the counter-intuitive conditions, which had been used before these
> > macros were introduced to avoid extra indentations, should not be used.
> > 
> Not sure what do you mean here. This counter-intuitive conditions are
> used so that if "else" follows the macro it will not be interpreted as
> belonging to the hidden if(). Like in the following code:
> 
>  if (x)
>     for_each_gfn_sp()
>  else
>    do_y();
> 
> You do not expect do_y() to be called for each sp->gfn != gfn.

I had not thought of this case.

I will fix the patch so as not to change the current conditions.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 10:12 [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Takuya Yoshikawa
                   ` (8 preceding siblings ...)
  2013-01-23 10:44 ` [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Xiao Guangrong
@ 2013-02-04 13:29 ` Marcelo Tosatti
  9 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2013-02-04 13:29 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: gleb, kvm

On Wed, Jan 23, 2013 at 07:12:31PM +0900, Takuya Yoshikawa wrote:
> This patch set mitigates another mmu_lock hold time issue.  Although
> this is not enough and I'm thinking of additional work already, this
> alone can reduce the lock hold time to some extent.
> 
> Takuya Yoshikawa (8):
>   KVM: MMU: Fix and clean up for_each_gfn_* macros
>   KVM: MMU: Use list_for_each_entry_safe in kvm_mmu_commit_zap_page()
>   KVM: MMU: Add a parameter to kvm_mmu_prepare_zap_page() to update the next position
>   KVM: MMU: Introduce for_each_gfn_indirect_valid_sp_safe macro
>   KVM: MMU: Delete hash_link node in kvm_mmu_prepare_zap_page()
>   KVM: MMU: Introduce free_zapped_mmu_pages() for freeing mmu pages in a list
>   KVM: MMU: Split out free_zapped_mmu_pages() from kvm_mmu_commit_zap_page()
>   KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock
> 
>  arch/x86/kvm/mmu.c |  149 +++++++++++++++++++++++++++++++++++-----------------
>  1 files changed, 101 insertions(+), 48 deletions(-)

A limit is needed on the number of pages whose freeing is delayed.  Note that
n_used_mmu_pages is used both by SLAB freeing (to know how much pressure
to apply) and by allocators (to decide when to allocate more).

You allow n_used_mmu_pages to be inaccurate, which is fine as long as
the error is limited.

Perhaps have a maximum of 64 pages on invalid_list per round and, if that is
exceeded, release the memory inside mmu_lock (one by one)?
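
One possible shape of that bound, as a rough sketch on top of this
series (illustrative and untested; MAX_DEFERRED_FREE and the helper
name are made up):

#define MAX_DEFERRED_FREE	64

/*
 * Called with mmu_lock held, after kvm_mmu_commit_zap_page(): free
 * everything beyond the budget right away, so that at most
 * MAX_DEFERRED_FREE pages stay on @invalid_list for freeing outside
 * the lock.
 */
static void trim_invalid_list(struct list_head *invalid_list)
{
	struct kvm_mmu_page *sp, *nsp;
	unsigned int n = 0;

	list_for_each_entry_safe(sp, nsp, invalid_list, link)
		if (++n > MAX_DEFERRED_FREE)
			kvm_mmu_free_page(sp);	/* over budget: free now */
}

and then in the callers:

	trim_invalid_list(&invalid_list);
	spin_unlock(&kvm->mmu_lock);
	free_zapped_mmu_pages(kvm, &invalid_list);	/* at most 64 pages */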


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-01-23 10:44 ` [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages Xiao Guangrong
  2013-01-23 13:28   ` Takuya Yoshikawa
@ 2013-02-04 13:42   ` Marcelo Tosatti
  2013-02-05  5:30     ` Xiao Guangrong
  1 sibling, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2013-02-04 13:42 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Takuya Yoshikawa, gleb, kvm

On Wed, Jan 23, 2013 at 06:44:52PM +0800, Xiao Guangrong wrote:
> On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
> > This patch set mitigates another mmu_lock hold time issue.  Although
> > this is not enough and I'm thinking of additional work already, this
> > alone can reduce the lock hold time to some extent.
> > 
> 
> It is not worth doing this kind of complex thing, usually, only a few pages on
> the invalid list.

I think its a good idea - memory freeing can be done outside mmu_lock
protection (as long as its bounded). It reduces mmu_lock contention
overall.

> The *really* heavily case is kvm_mmu_zap_all() which can be speeded
> up by using generation number, this is a todo work in kvm wiki:
> 
> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
> 
> I am doing this work for some weeks and will post the patch out during these days.

Can you describe the generation number scheme in more detail, please?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock
  2013-01-23 10:18 ` [PATCH 8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock Takuya Yoshikawa
@ 2013-02-04 13:50   ` Marcelo Tosatti
  2013-02-05  2:21     ` Takuya Yoshikawa
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2013-02-04 13:50 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: gleb, kvm

On Wed, Jan 23, 2013 at 07:18:11PM +0900, Takuya Yoshikawa wrote:
> We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
> for zapping mmu pages with mmu_lock held.
> 
> Although we need to do conditional rescheduling for completely
> fixing this issue, we can reduce the hold time to some extent by moving
> free_zapped_mmu_pages() out of the protection.  Since invalid_list can
> be very long, the effect is not negligible.
> 
> Note: this patch does not treat non-trivial cases.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>

Can you describe the case thats biting? Is it

        /*
         * If memory slot is created, or moved, we need to clear all
         * mmio sptes.
         */
        if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) {
                kvm_mmu_zap_all(kvm);
                kvm_reload_remote_mmus(kvm);
        }

Because conditional rescheduling for kvm_mmu_zap_all() might not be
desirable: KVM_SET_USER_MEMORY has low latency requirements.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock
  2013-02-04 13:50   ` Marcelo Tosatti
@ 2013-02-05  2:21     ` Takuya Yoshikawa
  0 siblings, 0 replies; 21+ messages in thread
From: Takuya Yoshikawa @ 2013-02-05  2:21 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: gleb, kvm

On Mon, 4 Feb 2013 11:50:00 -0200
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Wed, Jan 23, 2013 at 07:18:11PM +0900, Takuya Yoshikawa wrote:
> > We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
> > for zapping mmu pages with mmu_lock held.
> > 
> > Although we need to do conditional rescheduling for completely
> > fixing this issue, we can reduce the hold time to some extent by moving
> > free_zapped_mmu_pages() out of the protection.  Since invalid_list can
> > be very long, the effect is not negligible.
> > 
> > Note: this patch does not treat non-trivial cases.
> > 
> > Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> 
> Can you describe the case thats biting? Is it

Non-trivial cases: a few functions that indirectly call zap_page() and
cannot access invalid_list outside of mmu_lock.  Not worth fixing, I think.

> 
>         /*
>          * If memory slot is created, or moved, we need to clear all
>          * mmio sptes.
>          */
>         if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) {
>                 kvm_mmu_zap_all(kvm);
>                 kvm_reload_remote_mmus(kvm);
>         }
> 
> Because conditional rescheduling for kvm_mmu_zap_all() might not be
> desirable: KVM_SET_USER_MEMORY has low latency requirements.

This case is problematic.  With huge pages in use, things improve
to some extent: big guests need TDP and THP anyway.

But as Avi noted once, we need a way to make long mmu_lock holders break
in order to achieve lock-less TLB flushes.  Xiao's work may help the
zap_all() case.


But in general, protecting post-zap work with a spin_lock is not desirable.
I'll think about the inaccurate n_used_mmu_pages problem you pointed out.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] KVM: Reduce mmu_lock hold time when zapping mmu pages
  2013-02-04 13:42   ` Marcelo Tosatti
@ 2013-02-05  5:30     ` Xiao Guangrong
  0 siblings, 0 replies; 21+ messages in thread
From: Xiao Guangrong @ 2013-02-05  5:30 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Takuya Yoshikawa, gleb, kvm

On 02/04/2013 09:42 PM, Marcelo Tosatti wrote:
> On Wed, Jan 23, 2013 at 06:44:52PM +0800, Xiao Guangrong wrote:
>> On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
>>> This patch set mitigates another mmu_lock hold time issue.  Although
>>> this is not enough and I'm thinking of additional work already, this
>>> alone can reduce the lock hold time to some extent.
>>>
>>
>> It is not worth doing this kind of complex thing, usually, only a few pages on
>> the invalid list.
> 
> I think its a good idea - memory freeing can be done outside mmu_lock
> protection (as long as its bounded). It reduces mmu_lock contention
> overall.

It does not help much, since we still need to walk and delete all shadow
pages, rmaps and parent-pte-lists - that still costs a lot of time and is
not good for scalability.

> 
>> The *really* heavily case is kvm_mmu_zap_all() which can be speeded
>> up by using generation number, this is a todo work in kvm wiki:
>>
>> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
>>
>> I am doing this work for some weeks and will post the patch out during these days.
> 
> Can you describe the generation number scheme in more detail, please?

Yes, but currently I use a simpler approach instead of the generation number.

The optimization is to switch the hashtable and rmaps over to new ones, so that
later page faults install shadow pages and rmaps on the new structures, while
the old ones can be freed directly outside of mmu_lock.

More detail:

zap_all_shadow_pages:

hold mmu_lock;
LIST_HEAD(active_list);
LIST_HEAD(pte_list_desc);

/*
 * Prepare the root shadow pages since they can not be
 * freed directly.
 */
for_each_root_sp(sp, mmu->root_sp_list) {
	prepare_zap(sp);
	/* Delete it from mmu->active_list */
	list_del_init(sp->link);
}

/* Zap the hashtable and rmap. */
memset(mmu->hashtable, 0);
memset(memslot->rmap, 0);

list_replace_init(mmu->active_sp_list, active_list);

/* All the pte_list_desc for rmap and parent_list */
list_replace_init(mmu->pte_list_desc_list, pte_list_desc);

/* Reload mmu, let the old shadow pages be zapped. */
kvm_reload_remote_mmus(kvm);

release_mmu_lock;

for_each_sp_on_active_list(sp, active_list)
	kvm_mmu_free_page(sp);

for_each_pte_desc(desc, pte_list_desc)
	mmu_free_pte_list_desc(desc);

The patches are being tested on my box; they work well and can improve
zap_all_shadow_pages by more than 75%.

============
Note: later we can use the generation number to continue to optimize it:
zap_all_shadow_pages:
   generation_number++;
   kvm_reload_remote_mmus(kvm);

And, on unload_mmu path:

hold mmu_lock
   if (kvm->generation_number != generation_number) {
	switch hashtable and rmap to the new one;
	kvm->generation_number = generation_number
   }
   release mmu_lock

   free the old one

We need to adjust the page-fault and sync-children code so that they do not
install sps into the old shadow page cache.
=============



^ permalink raw reply	[flat|nested] 21+ messages in thread
