* [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
@ 2013-05-31  0:36 Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 01/11] KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall Xiao Guangrong
                   ` (13 more replies)
  0 siblings, 14 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Hi Gleb, Paolo, Marcelo,

I have moved the potentially controversial patches, patch 8 ~ 10, to the end
of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
and I think they are ready to be merged. If we are not lucky enough and further
discussion is needed, could you please apply those patches first? :)

Thank you in advance!

Some points were raised during discussion but are missing in this version:
1) Gleb's idea of skipping obsolete pages in the hash list walkers

   Unfortunately, it is not safe. There is a window between updating the
   valid generation and reloading the mmus; in that window, an obsolete page
   can be used by a vcpu, but the guest page table fails to be
   write-protected (since the obsolete page is skipped in
   mmu_need_write_protect()).

   Instead, we can only skip the zapped-obsolete pages
   (is_obsolete_sp(sp) && sp->role.invalid); the current code already
   skips them, and a comment is placed around the hash list walkers to
   warn future development. A sketch of this condition is given after
   these points.

2) Marcelo's comment that obsolete pages can cause the number of shadow
   pages to exceed n_max_mmu_pages

   I am not sure this is really a problem: it only exists in a really tiny
   window, and the page-reclaim path is able to handle the obsolete pages.
   Furthermore, we can reduce n_max_mmu_pages accordingly to make that
   window even smaller.

   Anyway, as commit 5d21881432 puts it, "the mmu counters are for
   beancounting purposes only", so perhaps that window is acceptable.
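
As an illustration of the safe skip condition from point 1, a hypothetical
helper (the name sp_is_zapped_obsolete is ours, not part of the patches)
would look like:

static bool sp_is_zapped_obsolete(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	/*
	 * Only a page that is both obsolete and already zapped
	 * (role.invalid set) may safely be skipped by a hash list
	 * walker; a merely-obsolete page can still be in use by a
	 * vcpu in the window between the mmu_valid_gen bump and the
	 * mmu reload.
	 */
	return is_obsolete_sp(kvm, sp) && sp->role.invalid;
}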

Changelog:
V8:
  1): add some comments to explain the FIFO property of active_mmu_pages,
      addressing Marcelo's comments.

  2): the page-reclaim path may fail to free zapped-obsolete pages, as
      pointed out by Marcelo; the patchset fixes it by keeping all zapped
      obsolete pages on a global list and always freeing pages from that
      list first.

  3): address Marcelo's suggestion to move the "zap pages in batch" patch
      toward the end of the series.

  4): drop the previous patch which introduced
      kvm_mmu_prepare_zap_obsolete_page(); instead, we put comments
      around the hash list walkers to warn the user that zapped-obsolete
      pages still live on the hash list.

  5): add a note to the changelog of the "zap pages in batch" patch
      explaining that the batch number is a speculative value, based on
      Takuya's comments.

V7:
  1): separate some optimizations into two patches, one that does not
      reuse obsolete pages and one that collapses tlb flushes, as
      suggested by Marcelo.

  2): base the series on Gleb's diff which reduces KVM_REQ_MMU_RELOAD
      when a root page is being zapped.

  3): remove the call to kvm_mmu_zap_all when patching a hypercall, as
      investigated by Gleb.

  4): drop the patch which deleted pages from the hash list at "prepare"
      time, since it can break walks based on the hash list.

  5): rename kvm_mmu_invalidate_all_pages to kvm_mmu_invalidate_zap_all_pages.

  6): introduce kvm_mmu_prepare_zap_obsolete_page, which zaps obsolete
      pages in a way that collapses tlb flushes.

V6:
  1): walk active_mmu_pages in reverse to skip newly created pages, based
      on the comments from Gleb and Paolo.

  2): completely replace kvm_mmu_zap_all with kvm_mmu_invalidate_all_pages
      based on Gleb's comments.

  3): improve the parameters of kvm_mmu_invalidate_all_pages based on
      Gleb's comments.

  4): rename kvm_mmu_invalidate_memslot_pages to kvm_mmu_invalidate_all_pages
  5): rename zap_invalid_pages to kvm_zap_obsolete_pages

V5:
  1): rename is_valid_sp to is_obsolete_sp
  2): use the lock-break technique to zap all old pages instead of only
      the pages linked on an invalid slot's rmap, as suggested by Marcelo.
  3): add tracepoints for invalid pages and kvm_mmu_invalidate_memslot_pages()
  4): rename kvm_mmu_invalid_memslot_pages to kvm_mmu_invalidate_memslot_pages
      according to Takuya's comments.

V4:
  1): drop unmapping invalid rmaps outside of mmu-lock and use the
      lock-break technique instead. Thanks to Gleb's comments.

  2): no need to handle invalid-gen pages specially, since the page table
      is always switched by KVM_REQ_MMU_RELOAD. Thanks to Marcelo's comments.

V3:
  completely redesign the algorithm, please see below.

V2:
  - do not reset n_requested_mmu_pages and n_max_mmu_pages
  - batch-free root shadow pages to reduce vcpu notifications and mmu-lock
    contention
  - remove the first patch, which introduced kvm->arch.mmu_cache, since we
    only 'memset zero' the hashtable rather than all mmu cache members in
    this version
  - remove the unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all

* Issue
The current kvm_mmu_zap_all is really slow - it holds mmu-lock while
walking and zapping all shadow pages one by one, and it also needs to zap
all guest pages' rmaps and all shadow pages' parent spte lists. Things
become particularly bad as the guest uses more memory or vcpus. It is not
good for scalability.

* Idea
KVM maintains a global mmu valid generation number, stored in
kvm->arch.mmu_valid_gen, and every shadow page stores the current global
generation number into sp->mmu_valid_gen when it is created.

When KVM needs to zap all shadow page sptes, it simply increases the
global generation number and then reloads the root shadow pages on all
vcpus. Each vcpu will then create a new shadow page table according to
the current generation number; this ensures the old pages are not used
any more.

The invalid-gen pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
are then zapped using a lock-break technique.
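
Condensed from patches 3, 6 and 9 below, the control flow is roughly as
follows (a sketch, not the exact patch code; tracing and the zap loop
body are omitted):

static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	/* A page created under an older generation is obsolete. */
	return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
}

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
{
	spin_lock(&kvm->mmu_lock);
	/* O(1): every existing shadow page becomes obsolete at once. */
	kvm->arch.mmu_valid_gen++;

	/*
	 * Make all vcpus drop their root pages and rebuild them with
	 * the new generation number (added by patch 9).
	 */
	kvm_reload_remote_mmus(kvm);

	/* Free the obsolete pages afterwards, breaking the lock as needed. */
	kvm_zap_obsolete_pages(kvm);
	spin_unlock(&kvm->mmu_lock);
}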

Gleb Natapov (1):
  KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped

Xiao Guangrong (10):
  KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall
  KVM: MMU: drop unnecessary kvm_reload_remote_mmus
  KVM: MMU: fast invalidate all pages
  KVM: x86: use the fast way to invalidate all pages
  KVM: MMU: show mmu_valid_gen in shadow page related tracepoints
  KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages
  KVM: MMU: do not reuse the obsolete page
  KVM: MMU: zap pages in batch
  KVM: MMU: collapse TLB flushes when zap all pages
  KVM: MMU: reclaim the zapped-obsolete page first

 arch/x86/include/asm/kvm_host.h |    4 +
 arch/x86/kvm/mmu.c              |  128 ++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/mmu.h              |    1 +
 arch/x86/kvm/mmutrace.h         |   42 ++++++++++---
 arch/x86/kvm/x86.c              |   17 +----
 5 files changed, 161 insertions(+), 31 deletions(-)

-- 
1.7.7.6



* [PATCH v8 01/11] KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 02/11] KVM: MMU: drop unnecessary kvm_reload_remote_mmus Xiao Guangrong
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Quoting Gleb's mail:

| Back then kvm->lock protected memslot access so code like:
|
| mutex_lock(&vcpu->kvm->lock);
| kvm_mmu_zap_all(vcpu->kvm);
| mutex_unlock(&vcpu->kvm->lock);
|
| which is what 7aa81cc0 does was enough to guarantee that no vcpu will
| run while code is patched. This is no longer the case and
| mutex_lock(&vcpu->kvm->lock); is gone from that code path a long time
| ago, so now kvm_mmu_zap_all() there is useless and the code is incorrect.

So we drop it here; the underlying problem will be fixed later.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8d28810..6739b1d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5523,13 +5523,6 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 	char instruction[3];
 	unsigned long rip = kvm_rip_read(vcpu);
 
-	/*
-	 * Blow out the MMU to ensure that no other VCPU has an active mapping
-	 * to ensure that the updated hypercall appears atomically across all
-	 * VCPUs.
-	 */
-	kvm_mmu_zap_all(vcpu->kvm);
-
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
 	return emulator_write_emulated(ctxt, rip, instruction, 3, NULL);
-- 
1.7.7.6



* [PATCH v8 02/11] KVM: MMU: drop unnecessary kvm_reload_remote_mmus
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 01/11] KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 03/11] KVM: MMU: fast invalidate all pages Xiao Guangrong
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

It is the responsibility of kvm_mmu_zap_all to keep the mmu and tlbs
consistent. The reload is also unnecessary after zapping all mmio
sptes, since no mmio spte exists on the root shadow page and it can
not be cached into the tlb.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6739b1d..3758ff9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7060,16 +7060,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	 * If memory slot is created, or moved, we need to clear all
 	 * mmio sptes.
 	 */
-	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
+	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
 		kvm_mmu_zap_mmio_sptes(kvm);
-		kvm_reload_remote_mmus(kvm);
-	}
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	kvm_mmu_zap_all(kvm);
-	kvm_reload_remote_mmus(kvm);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-- 
1.7.7.6



* [PATCH v8 03/11] KVM: MMU: fast invalidate all pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 01/11] KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 02/11] KVM: MMU: drop unnecessary kvm_reload_remote_mmus Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 04/11] KVM: x86: use the fast way to " Xiao Guangrong
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

The current kvm_mmu_zap_all is really slow - it holds mmu-lock while
walking and zapping all shadow pages one by one, and it also needs to zap
all guest pages' rmaps and all shadow pages' parent spte lists. Things
become particularly bad as the guest uses more memory or vcpus. It is not
good for scalability.

In this patch, we introduce a faster way to invalidate all shadow pages.
KVM maintains a global mmu valid generation number, stored in
kvm->arch.mmu_valid_gen, and every shadow page stores the current global
generation number into sp->mmu_valid_gen when it is created.

When KVM needs to zap all shadow page sptes, it simply increases the
global generation number and then reloads the root shadow pages on all
vcpus. Each vcpu will then create a new shadow page table according to
the current generation number; this ensures the old pages are not used
any more. The obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
are then zapped using a lock-break technique.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +
 arch/x86/kvm/mmu.c              |   90 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h              |    1 +
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3741c65..bff7d46 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -222,6 +222,7 @@ struct kvm_mmu_page {
 	int root_count;          /* Currently serving as active root */
 	unsigned int unsync_children;
 	unsigned long parent_ptes;	/* Reverse mapping for parent_pte */
+	unsigned long mmu_valid_gen;
 	DECLARE_BITMAP(unsync_child_bitmap, 512);
 
 #ifdef CONFIG_X86_32
@@ -529,6 +530,7 @@ struct kvm_arch {
 	unsigned int n_requested_mmu_pages;
 	unsigned int n_max_mmu_pages;
 	unsigned int indirect_shadow_pages;
+	unsigned long mmu_valid_gen;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	/*
 	 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f8ca2f3..d71bf8f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1511,6 +1511,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	if (!direct)
 		sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
+
+	/*
+	 * The active_mmu_pages list is a FIFO list: do not move the
+	 * page until it is zapped. kvm_zap_obsolete_pages() depends
+	 * on this property. See the comments in kvm_zap_obsolete_pages().
+	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
 	sp->parent_ptes = 0;
 	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
@@ -1838,6 +1844,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
 	__clear_sp_write_flooding_count(sp);
 }
 
+static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t gfn,
 					     gva_t gaddr,
@@ -1900,6 +1911,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 
 		account_shadowed(vcpu->kvm, gfn);
 	}
+	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	init_shadow_page_table(sp);
 	trace_kvm_mmu_get_page(sp, true);
 	return sp;
@@ -2070,8 +2082,10 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 	ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
 	kvm_mmu_page_unlink_children(kvm, sp);
 	kvm_mmu_unlink_parents(kvm, sp);
+
 	if (!sp->role.invalid && !sp->role.direct)
 		unaccount_shadowed(kvm, sp->gfn);
+
 	if (sp->unsync)
 		kvm_unlink_unsync_page(kvm, sp);
 	if (!sp->root_count) {
@@ -4195,6 +4209,82 @@ restart:
 	spin_unlock(&kvm->mmu_lock);
 }
 
+static void kvm_zap_obsolete_pages(struct kvm *kvm)
+{
+	struct kvm_mmu_page *sp, *node;
+	LIST_HEAD(invalid_list);
+
+restart:
+	list_for_each_entry_safe_reverse(sp, node,
+	      &kvm->arch.active_mmu_pages, link) {
+		/*
+		 * No obsolete page can exist before a newly created page,
+		 * since active_mmu_pages is a FIFO list.
+		 */
+		if (!is_obsolete_sp(kvm, sp))
+			break;
+
+		/*
+		 * Do not repeatedly zap a root page to avoid unnecessary
+		 * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
+		 * progress:
+		 *    vcpu 0                        vcpu 1
+		 *                         call vcpu_enter_guest():
+		 *                            1): handle KVM_REQ_MMU_RELOAD
+		 *                                and require mmu-lock to
+		 *                                load mmu
+		 * repeat:
+		 *    1): zap root page and
+		 *        send KVM_REQ_MMU_RELOAD
+		 *
+		 *    2): if (cond_resched_lock(mmu-lock))
+		 *
+		 *                            2): hold mmu-lock and load mmu
+		 *
+		 *                            3): see KVM_REQ_MMU_RELOAD bit
+		 *                                on vcpu->requests is set
+		 *                                then return 1 to call
+		 *                                vcpu_enter_guest() again.
+		 *            goto repeat;
+		 *
+		 * Since we are walking the list in reverse and invalid
+		 * pages are moved to the head, skipping invalid pages
+		 * lets us avoid walking the list forever.
+		 */
+		if (sp->role.invalid)
+			continue;
+
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+			kvm_mmu_commit_zap_page(kvm, &invalid_list);
+			cond_resched_lock(&kvm->mmu_lock);
+			goto restart;
+		}
+
+		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
+			goto restart;
+	}
+
+	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+}
+
+/*
+ * Fast-invalidate all shadow pages, then use a lock-break
+ * technique to zap the obsolete pages.
+ *
+ * It is required when a memslot is being deleted or the VM is
+ * being destroyed; in these cases, we must ensure that the KVM
+ * MMU does not use any resource of the being-deleted slot, or of
+ * any slot, after this function returns.
+ */
+void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
+{
+	spin_lock(&kvm->mmu_lock);
+	kvm->arch.mmu_valid_gen++;
+
+	kvm_zap_obsolete_pages(kvm);
+	spin_unlock(&kvm->mmu_lock);
+}
+
 void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2adcbc2..922bfae 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -97,4 +97,5 @@ static inline bool permission_fault(struct kvm_mmu *mmu, unsigned pte_access,
 	return (mmu->permissions[pfec >> 1] >> pte_access) & 1;
 }
 
+void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
 #endif
-- 
1.7.7.6



* [PATCH v8 04/11] KVM: x86: use the fast way to invalidate all pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (2 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 03/11] KVM: MMU: fast invalidate all pages Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 05/11] KVM: MMU: show mmu_valid_gen in shadow page related tracepoints Xiao Guangrong
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Replace kvm_mmu_zap_all with kvm_mmu_invalidate_zap_all_pages.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   15 ---------------
 arch/x86/kvm/x86.c |    4 ++--
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d71bf8f..c8063b9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4194,21 +4194,6 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }
 
-void kvm_mmu_zap_all(struct kvm *kvm)
-{
-	struct kvm_mmu_page *sp, *node;
-	LIST_HEAD(invalid_list);
-
-	spin_lock(&kvm->mmu_lock);
-restart:
-	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
-		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
-			goto restart;
-
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	spin_unlock(&kvm->mmu_lock);
-}
-
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3758ff9..15e10f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7066,13 +7066,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	kvm_mmu_zap_all(kvm);
+	kvm_mmu_invalidate_zap_all_pages(kvm);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
-	kvm_arch_flush_shadow_all(kvm);
+	kvm_mmu_invalidate_zap_all_pages(kvm);
 }
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
-- 
1.7.7.6



* [PATCH v8 05/11] KVM: MMU: show mmu_valid_gen in shadow page related tracepoints
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (3 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 04/11] KVM: x86: use the fast way to " Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 06/11] KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages Xiao Guangrong
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Show sp->mmu_valid_gen

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmutrace.h |   22 ++++++++++++----------
 1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index b8f6172..697f466 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -7,16 +7,18 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvmmmu
 
-#define KVM_MMU_PAGE_FIELDS \
-	__field(__u64, gfn) \
-	__field(__u32, role) \
-	__field(__u32, root_count) \
+#define KVM_MMU_PAGE_FIELDS			\
+	__field(unsigned long, mmu_valid_gen)	\
+	__field(__u64, gfn)			\
+	__field(__u32, role)			\
+	__field(__u32, root_count)		\
 	__field(bool, unsync)
 
-#define KVM_MMU_PAGE_ASSIGN(sp)			     \
-	__entry->gfn = sp->gfn;			     \
-	__entry->role = sp->role.word;		     \
-	__entry->root_count = sp->root_count;        \
+#define KVM_MMU_PAGE_ASSIGN(sp)				\
+	__entry->mmu_valid_gen = sp->mmu_valid_gen;	\
+	__entry->gfn = sp->gfn;				\
+	__entry->role = sp->role.word;			\
+	__entry->root_count = sp->root_count;		\
 	__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({				        \
@@ -28,8 +30,8 @@
 								        \
 	role.word = __entry->role;					\
 									\
-	trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s"		\
-			 " %snxe root %u %s%c",				\
+	trace_seq_printf(p, "sp gen %lx gfn %llx %u%s q%u%s %s%s"	\
+			 " %snxe root %u %s%c",	__entry->mmu_valid_gen,	\
 			 __entry->gfn, role.level,			\
 			 role.cr4_pae ? " pae" : "",			\
 			 role.quadrant,					\
-- 
1.7.7.6



* [PATCH v8 06/11] KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (4 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 05/11] KVM: MMU: show mmu_valid_gen in shadow page related tracepoints Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 07/11] KVM: MMU: do not reuse the obsolete page Xiao Guangrong
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

It is useful for debugging and development.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c      |    1 +
 arch/x86/kvm/mmutrace.h |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c8063b9..3fd060a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4264,6 +4264,7 @@ restart:
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 {
 	spin_lock(&kvm->mmu_lock);
+	trace_kvm_mmu_invalidate_zap_all_pages(kvm);
 	kvm->arch.mmu_valid_gen++;
 
 	kvm_zap_obsolete_pages(kvm);
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 697f466..eb444dd 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -276,6 +276,26 @@ TRACE_EVENT(
 		  __spte_satisfied(old_spte), __spte_satisfied(new_spte)
 	)
 );
+
+TRACE_EVENT(
+	kvm_mmu_invalidate_zap_all_pages,
+	TP_PROTO(struct kvm *kvm),
+	TP_ARGS(kvm),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, mmu_valid_gen)
+		__field(unsigned int, mmu_used_pages)
+	),
+
+	TP_fast_assign(
+		__entry->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+		__entry->mmu_used_pages = kvm->arch.n_used_mmu_pages;
+	),
+
+	TP_printk("kvm-mmu-valid-gen %lx used_pages %x",
+		  __entry->mmu_valid_gen, __entry->mmu_used_pages
+	)
+);
 #endif /* _TRACE_KVMMMU_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
1.7.7.6



* [PATCH v8 07/11] KVM: MMU: do not reuse the obsolete page
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (5 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 06/11] KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 08/11] KVM: MMU: zap pages in batch Xiao Guangrong
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

The obsolete page will be zapped soon, so do not reuse it in an
attempt to reduce future page faults.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3fd060a..0880b9b4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1875,6 +1875,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		role.quadrant = quadrant;
 	}
 	for_each_gfn_sp(vcpu->kvm, sp, gfn) {
+		if (is_obsolete_sp(vcpu->kvm, sp))
+			continue;
+
 		if (!need_sync && sp->unsync)
 			need_sync = true;
 
-- 
1.7.7.6



* [PATCH v8 08/11] KVM: MMU: zap pages in batch
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (6 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 07/11] KVM: MMU: do not reuse the obsolete page Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 09/11] KVM: MMU: collapse TLB flushes when zap all pages Xiao Guangrong
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Zap at least 10 pages before releasing mmu-lock to reduce the overhead
caused by re-acquiring the lock.

After the patch, kvm_zap_obsolete_pages is guaranteed to make forward
progress, so update the comments accordingly.

[ It improves by 0.6% ~ 1% the case of doing a kernel build while
  reading a PCI ROM. ]

Note: I am not sure that "10" is the best value; it is a guess
intended to keep a vcpu from spending too long in
kvm_zap_obsolete_pages without starving mmu-lock.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   35 +++++++++++------------------------
 1 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0880b9b4..fe9d6f1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4197,14 +4197,18 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }
 
+#define BATCH_ZAP_PAGES	10
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
+	int batch = 0;
 
 restart:
 	list_for_each_entry_safe_reverse(sp, node,
 	      &kvm->arch.active_mmu_pages, link) {
+		int ret;
+
 		/*
 		 * No obsolete page can exist before a newly created page,
 		 * since active_mmu_pages is a FIFO list.
@@ -4213,28 +4217,6 @@ restart:
 			break;
 
 		/*
-		 * Do not repeatedly zap a root page to avoid unnecessary
-		 * KVM_REQ_MMU_RELOAD, otherwise we may not be able to
-		 * progress:
-		 *    vcpu 0                        vcpu 1
-		 *                         call vcpu_enter_guest():
-		 *                            1): handle KVM_REQ_MMU_RELOAD
-		 *                                and require mmu-lock to
-		 *                                load mmu
-		 * repeat:
-		 *    1): zap root page and
-		 *        send KVM_REQ_MMU_RELOAD
-		 *
-		 *    2): if (cond_resched_lock(mmu-lock))
-		 *
-		 *                            2): hold mmu-lock and load mmu
-		 *
-		 *                            3): see KVM_REQ_MMU_RELOAD bit
-		 *                                on vcpu->requests is set
-		 *                                then return 1 to call
-		 *                                vcpu_enter_guest() again.
-		 *            goto repeat;
-		 *
 		 * Since we are walking the list in reverse and invalid
 		 * pages are moved to the head, skipping invalid pages
 		 * lets us avoid walking the list forever.
@@ -4242,13 +4224,18 @@ restart:
 		if (sp->role.invalid)
 			continue;
 
-		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+		if (batch >= BATCH_ZAP_PAGES &&
+		      (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
+			batch = 0;
 			kvm_mmu_commit_zap_page(kvm, &invalid_list);
 			cond_resched_lock(&kvm->mmu_lock);
 			goto restart;
 		}
 
-		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
+		ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		batch += ret;
+
+		if (ret)
 			goto restart;
 	}
 
-- 
1.7.7.6



* [PATCH v8 09/11] KVM: MMU: collapse TLB flushes when zap all pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (7 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 08/11] KVM: MMU: zap pages in batch Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 10/11] KVM: MMU: reclaim the zapped-obsolete page first Xiao Guangrong
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

kvm_zap_obsolete_pages uses a lock-break technique to zap pages, and
it flushes the tlb every time it breaks the lock.

Instead, we can reload the mmu on all vcpus after updating the
generation number so that the obsolete pages are not used by any vcpu;
after that, we do not need to flush the tlb when obsolete pages are
zapped.

This calls kvm_mmu_prepare_zap_page many times and uses a single
kvm_mmu_commit_zap_page to collapse the tlb flushes; the side effect
is that obsolete pages are unlinked from active_mmu_pages but remain
on the hash list, so we add a comment around the hash list walkers.

Note: kvm_mmu_commit_zap_page is still needed before freeing the
pages, since other vcpus may be walking the shadow pages locklessly.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   33 ++++++++++++++++++++++++++++++---
 1 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fe9d6f1..674c044 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1654,6 +1654,16 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 				    struct list_head *invalid_list);
 
+/*
+ * NOTE: pay close attention to zapped-obsolete pages
+ * (is_obsolete_sp(sp) && sp->role.invalid) when walking the hash list,
+ * since they have been deleted from active_mmu_pages but can still be
+ * found on the hash list.
+ *
+ * for_each_gfn_indirect_valid_sp already skips that kind of page, and
+ * kvm_mmu_get_page(), the only user of for_each_gfn_sp(), skips all
+ * the obsolete pages.
+ */
 #define for_each_gfn_sp(_kvm, _sp, _gfn)				\
 	hlist_for_each_entry(_sp,					\
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
@@ -4224,11 +4234,13 @@ restart:
 		if (sp->role.invalid)
 			continue;
 
+		/*
+		 * No need to flush the tlb since we only zap sps with an
+		 * invalid generation number.
+		 */
 		if (batch >= BATCH_ZAP_PAGES &&
-		      (need_resched() || spin_needbreak(&kvm->mmu_lock))) {
+		      cond_resched_lock(&kvm->mmu_lock)) {
 			batch = 0;
-			kvm_mmu_commit_zap_page(kvm, &invalid_list);
-			cond_resched_lock(&kvm->mmu_lock);
 			goto restart;
 		}
 
@@ -4239,6 +4251,10 @@ restart:
 			goto restart;
 	}
 
+	/*
+	 * We should flush the tlb before freeing the page tables, since
+	 * lockless walkers may still be using the pages.
+	 */
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 }
 
@@ -4257,6 +4273,17 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 	trace_kvm_mmu_invalidate_zap_all_pages(kvm);
 	kvm->arch.mmu_valid_gen++;
 
+	/*
+	 * Notify all vcpus to reload their shadow page tables
+	 * and flush the TLB. Then all vcpus will switch to the
+	 * new shadow page table with the new mmu_valid_gen.
+	 *
+	 * Note: we should do this under the protection of
+	 * mmu-lock, otherwise a vcpu could purge a shadow page
+	 * but miss the tlb flush.
+	 */
+	kvm_reload_remote_mmus(kvm);
+
 	kvm_zap_obsolete_pages(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
-- 
1.7.7.6



* [PATCH v8 10/11] KVM: MMU: reclaim the zapped-obsolete page first
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (8 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 09/11] KVM: MMU: collapse TLB flushes when zap all pages Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-05-31  0:36 ` [PATCH v8 11/11] KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped Xiao Guangrong
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

As Marcelo pointed out:
| "(retention of large number of pages while zapping)
| can be fatal, it can lead to OOM and host crash"

We introduce a list, kvm->arch.zapped_obsolete_pages, to link all
the pages which have been deleted from the mmu cache but not yet
actually freed. When page reclaim is needed, we always free this
kind of page first.

[
  Can we use this list to replace all uses of "invalid_list"? That may
  be interesting but would be a big change. It will be done separately
  if necessary.
]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/mmu.c              |   21 +++++++++++++++++----
 arch/x86/kvm/x86.c              |    1 +
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bff7d46..1f98c1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,6 +536,8 @@ struct kvm_arch {
 	 * Hash table of struct kvm_mmu_page.
 	 */
 	struct list_head active_mmu_pages;
+	struct list_head zapped_obsolete_pages;
+
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
 	int iommu_flags;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 674c044..79af88a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4211,7 +4211,6 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
-	LIST_HEAD(invalid_list);
 	int batch = 0;
 
 restart:
@@ -4244,7 +4243,8 @@ restart:
 			goto restart;
 		}
 
-		ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		ret = kvm_mmu_prepare_zap_page(kvm, sp,
+				&kvm->arch.zapped_obsolete_pages);
 		batch += ret;
 
 		if (ret)
@@ -4255,7 +4255,7 @@ restart:
 	 * Should flush tlb before free page tables since lockless-walking
 	 * may use the pages.
 	 */
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
 }
 
 /*
@@ -4306,6 +4306,11 @@ restart:
 	spin_unlock(&kvm->mmu_lock);
 }
 
+static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
+{
+	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
+}
+
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
@@ -4334,15 +4339,23 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 		 * want to shrink a VM that only started to populate its MMU
 		 * anyway.
 		 */
-		if (!kvm->arch.n_used_mmu_pages)
+		if (!kvm->arch.n_used_mmu_pages &&
+		      !kvm_has_zapped_obsolete_pages(kvm))
 			continue;
 
 		idx = srcu_read_lock(&kvm->srcu);
 		spin_lock(&kvm->mmu_lock);
 
+		if (kvm_has_zapped_obsolete_pages(kvm)) {
+			kvm_mmu_commit_zap_page(kvm,
+			      &kvm->arch.zapped_obsolete_pages);
+			goto unlock;
+		}
+
 		prepare_zap_oldest_mmu_page(kvm, &invalid_list);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
 
+unlock:
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 15e10f7..6402951 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6832,6 +6832,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 
 	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
-- 
1.7.7.6



* [PATCH v8 11/11] KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (9 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 10/11] KVM: MMU: reclaim the zapped-obsolete page first Xiao Guangrong
@ 2013-05-31  0:36 ` Xiao Guangrong
  2013-06-05  1:26 ` [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Marcelo Tosatti
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 18+ messages in thread
From: Xiao Guangrong @ 2013-05-31  0:36 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

From: Gleb Natapov <gleb@redhat.com>

Quoting Gleb's mail:
| why don't we check for sp->role.invalid in
| kvm_mmu_prepare_zap_page before calling kvm_reload_remote_mmus()?

and

| Actually we can add check for is_obsolete_sp() there too since
| kvm_mmu_invalidate_all_pages() already calls kvm_reload_remote_mmus()
| after incrementing mmu_valid_gen.

[ Xiao: add some comments and the check of is_obsolete_sp() ]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 79af88a..6941fa7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2108,7 +2108,13 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 		kvm_mod_used_mmu_pages(kvm, -1);
 	} else {
 		list_move(&sp->link, &kvm->arch.active_mmu_pages);
-		kvm_reload_remote_mmus(kvm);
+
+		/*
+		 * The obsolete pages can not be used on any vcpu.
+		 * See the comments in kvm_mmu_invalidate_zap_all_pages().
+		 */
+		if (!sp->role.invalid && !is_obsolete_sp(kvm, sp))
+			kvm_reload_remote_mmus(kvm);
 	}
 
 	sp->role.invalid = 1;
-- 
1.7.7.6



* Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (10 preceding siblings ...)
  2013-05-31  0:36 ` [PATCH v8 11/11] KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped Xiao Guangrong
@ 2013-06-05  1:26 ` Marcelo Tosatti
  2013-06-05  9:51   ` Gleb Natapov
  2013-06-05  9:52 ` Gleb Natapov
  2013-06-09  8:53 ` Gleb Natapov
  13 siblings, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2013-06-05  1:26 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, avi.kivity, pbonzini, linux-kernel, kvm

On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
> Hi Gleb, Paolo, Marcelo,
> 
> I have moved the potentially controversial patches, patch 8 ~ 10, to the end
> of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
> and I think they are ready to be merged. If we are not lucky enough and further
> discussion is needed, could you please apply those patches first? :)
> 
> Thank you in advance!

<snip>

Looks good to me.


* Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
  2013-06-05  1:26 ` [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Marcelo Tosatti
@ 2013-06-05  9:51   ` Gleb Natapov
  0 siblings, 0 replies; 18+ messages in thread
From: Gleb Natapov @ 2013-06-05  9:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, avi.kivity, pbonzini, linux-kernel, kvm

On Tue, Jun 04, 2013 at 10:26:01PM -0300, Marcelo Tosatti wrote:
> On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
> > Hi Gleb, Paolo, Marcelo,
> > 
> > I have moved the potentially controversial patches, patch 8 ~ 10, to the end
> > of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
> > and I think they are ready to be merged. If we are not lucky enough and further
> > discussion is needed, could you please apply those patches first? :)
> > 
> > Thank you in advance!
> 
> <snip>
> 
> Looks good to me.
I'll take it as Reviewed-by for the entire series :)

--
			Gleb.


* Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (11 preceding siblings ...)
  2013-06-05  1:26 ` [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Marcelo Tosatti
@ 2013-06-05  9:52 ` Gleb Natapov
  2013-06-09  8:53 ` Gleb Natapov
  13 siblings, 0 replies; 18+ messages in thread
From: Gleb Natapov @ 2013-06-05  9:52 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm

On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
> Hi Gleb, Paolo, Marcelo,
> 
> I have moved the potentially controversial patches, patch 8 ~ 10, to the end
> of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
> and I think they are ready to be merged. If we are not lucky enough and further
> discussion is needed, could you please apply those patches first? :)
> 
> Thank you in advance!
> 
Applied all of them to queue. Thanks!

--
			Gleb.


* Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
  2013-05-31  0:36 [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages Xiao Guangrong
                   ` (12 preceding siblings ...)
  2013-06-05  9:52 ` Gleb Natapov
@ 2013-06-09  8:53 ` Gleb Natapov
  2013-06-09  9:06   ` Xiao Guangrong
  13 siblings, 1 reply; 18+ messages in thread
From: Gleb Natapov @ 2013-06-09  8:53 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm

On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
> Hi Gleb, Paolo, Marcelo,
> 
> I have moved the potentially controversial patches, patch 8 ~ 10, to the end
> of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
> and I think they are ready to be merged. If we are not lucky enough and further
> discussion is needed, could you please apply those patches first? :)
> 
> Thank you in advance!
> 
> Some points were raised during discussion but are missing in this version:
> 1) Gleb's idea of skipping obsolete pages in the hash list walkers
> 
>    Unfortunately, it is not safe. There is a window between updating the
>    valid generation and reloading the mmus; in that window, an obsolete page
>    can be used by a vcpu, but the guest page table fails to be
>    write-protected (since the obsolete page is skipped in
>    mmu_need_write_protect()).
> 
Can you elaborate on how this can happen? valid_gen is updated under
mmu_lock and the reloading of mmus happens under the same lock, so to all
other vcpus this should look like an atomic operation.

<snip>

--
			Gleb.


* Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
  2013-06-09  8:53 ` Gleb Natapov
@ 2013-06-09  9:06   ` Xiao Guangrong
  2013-06-09  9:09     ` Gleb Natapov
  0 siblings, 1 reply; 18+ messages in thread
From: Xiao Guangrong @ 2013-06-09  9:06 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm

On 06/09/2013 04:53 PM, Gleb Natapov wrote:
> On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
>> Hi Gleb, Paolo, Marcelo,
>>
>> I have moved the potentially controversial patches, patch 8 ~ 10, to the end
>> of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
>> and I think they are ready to be merged. If we are not lucky enough and further
>> discussion is needed, could you please apply those patches first? :)
>>
>> Thank you in advance!
>>
>> Some points were raised during discussion but are missing in this version:
>> 1) Gleb's idea of skipping obsolete pages in the hash list walkers
>>
>>    Unfortunately, it is not safe. There is a window between updating the
>>    valid generation and reloading the mmus; in that window, an obsolete page
>>    can be used by a vcpu, but the guest page table fails to be
>>    write-protected (since the obsolete page is skipped in
>>    mmu_need_write_protect()).
>>
> Can you elaborate on how this can happen? valid_gen is updated under
> mmu_lock and the reloading of mmus happens under the same lock, so to all
> other vcpus this should look like an atomic operation.

You're right.

Actually, I made another optimization patch for this version that moves
kvm_reload_remote_mmus() out of mmu-lock, but did not attach it to this
series. It seems my brain is not parallel-able enough. :(



* Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
  2013-06-09  9:06   ` Xiao Guangrong
@ 2013-06-09  9:09     ` Gleb Natapov
  0 siblings, 0 replies; 18+ messages in thread
From: Gleb Natapov @ 2013-06-09  9:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm

On Sun, Jun 09, 2013 at 05:06:24PM +0800, Xiao Guangrong wrote:
> On 06/09/2013 04:53 PM, Gleb Natapov wrote:
> > On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
> >> Hi Gleb, Paolo, Marcelo,
> >>
> >> I have moved the potentially controversial patches, patch 8 ~ 10, to the end
> >> of the series; patch 11 depends on patch 9. The other patches are fully reviewed,
> >> and I think they are ready to be merged. If we are not lucky enough and further
> >> discussion is needed, could you please apply those patches first? :)
> >>
> >> Thank you in advance!
> >>
> >> Some points were raised during discussion but are missing in this version:
> >> 1) Gleb's idea of skipping obsolete pages in the hash list walkers
> >>
> >>    Unfortunately, it is not safe. There is a window between updating the
> >>    valid generation and reloading the mmus; in that window, an obsolete page
> >>    can be used by a vcpu, but the guest page table fails to be
> >>    write-protected (since the obsolete page is skipped in
> >>    mmu_need_write_protect()).
> >>
> > Can you elaborate on how this can happen? valid_gen is updated under
> > mmu_lock and the reloading of mmus happens under the same lock, so to all
> > other vcpus this should look like an atomic operation.
> 
> You're right.
> 
> Actually, I made another optimization patch for this version that moves
> kvm_reload_remote_mmus() out of mmu-lock, but did not attach it to this
> series. It seems my brain is not parallel-able enough. :(
Yours is the most parallel-able I ever saw :)

--
			Gleb.

