* [PATCH 01/11] KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write
@ 2011-08-16  6:40 Xiao Guangrong
  2011-08-16  6:41 ` [PATCH 02/11] KVM: x86: tag the instructions which are used to write page table Xiao Guangrong
                   ` (9 more replies)
  0 siblings, 10 replies; 41+ messages in thread
From: Xiao Guangrong @ 2011-08-16  6:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

kvm_mmu_pte_write is unsafe since it needs to allocate pte_list_desc
objects while sptes are prefetched. Unfortunately, we cannot know in
advance how many sptes will be prefetched on this path, so we may run out
of free pte_list_desc objects in the cache and trigger the BUG_ON(). Also,
some paths do not fill the cache at all, such as emulation of the INS
instruction, which does not trigger a page fault.
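The failure mode and the guard can be illustrated with a small, self-contained model of the fixed-size memory cache. The names mirror the kernel's, but this is a simplified sketch, not kernel code: the pool size and the prefetch loop are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct kvm_mmu_memory_cache: a fixed pool of
 * preallocated objects that is consumed without sleeping. */
#define CACHE_SIZE 4

struct mmu_memory_cache {
	int nobjs;                  /* objects currently available */
	void *objects[CACHE_SIZE];
};

static int mmu_memory_cache_free_objects(struct mmu_memory_cache *cache)
{
	return cache->nobjs;
}

/* Mirrors rmap_can_add(): prefetch is allowed only while free
 * pte_list_desc objects remain, so the empty-cache BUG_ON() can never
 * fire on the prefetch path. */
static int rmap_can_add(struct mmu_memory_cache *cache)
{
	return mmu_memory_cache_free_objects(cache) > 0;
}

static void *mmu_memory_cache_alloc(struct mmu_memory_cache *cache)
{
	assert(cache->nobjs > 0);   /* stands in for the kernel's BUG_ON() */
	return cache->objects[--cache->nobjs];
}

/* Prefetch an unknown number of sptes, stopping when the cache runs dry
 * instead of overrunning it. Returns how many were actually prefetched. */
static int prefetch_sptes(struct mmu_memory_cache *cache, int wanted)
{
	int done = 0;

	while (done < wanted && rmap_can_add(cache)) {
		(void)mmu_memory_cache_alloc(cache);
		done++;
	}
	return done;
}
```

With a 4-object cache, asking to prefetch 16 sptes completes only 4 and then stops cleanly; without the rmap_can_add() check, the fifth allocation would hit the BUG_ON().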

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c |   25 ++++++++++++++++++++-----
 1 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5d7fbf0..b01afee 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -592,6 +592,11 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 	return 0;
 }
 
+static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
+{
+	return cache->nobjs;
+}
+
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
 				  struct kmem_cache *cache)
 {
@@ -969,6 +974,14 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
 	return &linfo->rmap_pde;
 }
 
+static bool rmap_can_add(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu_memory_cache *cache;
+
+	cache = &vcpu->arch.mmu_pte_list_desc_cache;
+	return mmu_memory_cache_free_objects(cache);
+}
+
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
 	struct kvm_mmu_page *sp;
@@ -3585,6 +3598,12 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		break;
 	}
 
+	/*
+	 * No need to care whether the memory allocation succeeds,
+	 * since pte prefetch is skipped if there are not enough
+	 * objects in the cache.
+	 */
+	mmu_topup_memory_caches(vcpu);
 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
 		gentry = 0;
@@ -3655,7 +3674,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 			mmu_page_zap_pte(vcpu->kvm, sp, spte);
 			if (gentry &&
 			      !((sp->role.word ^ vcpu->arch.mmu.base_role.word)
-			      & mask.word))
+			      & mask.word) && rmap_can_add(vcpu))
 				mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
 			if (!remote_flush && need_remote_flush(entry, *spte))
 				remote_flush = true;
@@ -3716,10 +3735,6 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
 		goto out;
 	}
 
-	r = mmu_topup_memory_caches(vcpu);
-	if (r)
-		goto out;
-
 	er = x86_emulate_instruction(vcpu, cr2, 0, insn, insn_len);
 
 	switch (er) {
-- 
1.7.5.4

* [PATCH 0/11] KVM: x86: optimize for guest page written
@ 2011-07-26 11:24 Xiao Guangrong
  2011-07-26 11:32 ` [PATCH 11/11] KVM: MMU: improve write flooding detected Xiao Guangrong
  0 siblings, 1 reply; 41+ messages in thread
From: Xiao Guangrong @ 2011-07-26 11:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

To keep shadow pages consistent, we write-protect a guest page if it is
used as a page structure. Unfortunately, even after the guest page structure
is torn down and the page is reused for another purpose, we still
write-protect it and take a page fault whenever it is written. In that case,
we need to zap the corresponding shadow page and let the guest page become a
normal page as soon as possible; that is just what kvm_mmu_pte_write does.
However, sometimes it does not work well:
- kvm_mmu_pte_write is unsafe since it needs to allocate pte_list_desc
  objects while sptes are prefetched. Unfortunately, we cannot know in
  advance how many sptes will be prefetched on this path, so we may run out
  of free pte_list_desc objects in the cache and trigger the BUG_ON(). Also,
  some paths do not fill the cache at all, such as emulation of the INS
  instruction, which does not trigger a page fault.


- We usually use repeat string instructions to clear a page; for example,
  memset is called to clear a page table and uses 'stosb', repeated 1024
  times. That means we take the mmu lock 1024 times and walk the shadow
  page cache 1024 times, which is terrible.

- Sometimes only the last byte of a pte is modified to update a status bit;
  for example, the Linux kernel uses clear_bit to clear the r/w bit, and an
  'andb' instruction is used in this function. In this case,
  kvm_mmu_pte_write treats the write as a misaligned access and zaps the
  shadow page table.

- Write-flooding detection does not work well: when handling a guest page
  write, if the last speculative spte has not been accessed, we treat the
  page as write-flooded. However, sptes are speculated on many paths, such
  as pte prefetch and page sync, so the last speculative spte may not point
  to the written page at all, and the written page may still be accessed via
  other sptes. Relying on the Accessed bit of the last speculative spte is
  therefore not enough.
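The single-byte status-bit update in the third item above can be shown in miniature. Both helpers below are illustrative models, not kernel code: the misalignment predicate is a simplified stand-in for the old check in kvm_mmu_pte_write, and a little-endian pte layout is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* A guest pte is 8 bytes wide. */
#define PTE_BYTES 8

/* Clearing a status bit the way clear_bit compiles on x86: an 'andb',
 * i.e. a read-modify-write of a single byte of the pte (byte 0 here,
 * assuming little-endian layout; r/w is bit 1). */
static void clear_rw_bit(uint8_t *pte_bytes)
{
	pte_bytes[0] &= (uint8_t)~(1u << 1);
}

/* Simplified stand-in for the misalignment test applied before this
 * series (not the kernel's exact predicate): any write smaller than a
 * pte, or not aligned within one, is treated as misaligned and leads
 * to zapping the shadow page. */
static int write_looks_misaligned(uint64_t gpa, int bytes)
{
	int offset = gpa & (PTE_BYTES - 1);

	return bytes < PTE_BYTES || (offset & (bytes - 1)) != 0;
}
```

A pte-aligned 8-byte write passes the check, but the 1-byte 'andb' is always flagged as misaligned even though it is a perfectly valid status-bit update.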


In this patchset, we fix/avoid these issues:
- Instead of filling the cache on the page fault path, we do it in
  kvm_mmu_pte_write, and skip spte prefetch if there are no free
  pte_list_desc objects in the cache.

- If a repeat string instruction is being emulated and the access is not
  IO/MMIO, we can zap all the corresponding shadow pages and return to the
  guest; the mapping can then become writable and the guest can write the
  page directly.

- Do not zap the shadow page if the write only modifies the last byte of a
  pte.

- Instead of detecting whether the page was accessed, we detect whether the
  spte is accessed or not. If the spte is not accessed but the page is
  written frequently, we treat the page as not being a page table, or as
  one that has not been used for a long time.
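The improved flooding heuristic in the last item can be sketched as a small model. The field name, threshold, and helper names below are illustrative; patch 11 in the series carries the real implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-shadow-page flooding state: count guest writes to the backing
 * page; any real access through the shadow page resets the count.
 * Too many writes with no access in between suggests the page is no
 * longer used as a page table. */
#define FLOOD_THRESHOLD 3

struct shadow_page {
	int write_flooding_count;
};

/* Called when the guest writes the page backing this shadow page;
 * returns true once the page looks write-flooded and should be zapped. */
static bool note_write_should_zap(struct shadow_page *sp)
{
	return ++sp->write_flooding_count >= FLOOD_THRESHOLD;
}

/* Called when a spte in this shadow page is actually used: a page that
 * is still a live page table keeps getting its count reset. */
static void note_access(struct shadow_page *sp)
{
	sp->write_flooding_count = 0;
}
```

Because the counter is per shadow page and reset by accesses to that page's own sptes, it does not suffer from the "last speculative spte points elsewhere" problem described above.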


Performance test result:
kernbench shows an obvious improvement:

Before patchset      After patchset
3m0.094s               2m50.177s
3m1.813s               2m52.774s
3m6.239s               2m51.512s







Thread overview: 41+ messages
2011-08-16  6:40 [PATCH 01/11] KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write Xiao Guangrong
2011-08-16  6:41 ` [PATCH 02/11] KVM: x86: tag the instructions which are used to write page table Xiao Guangrong
2011-08-22 14:32   ` Marcelo Tosatti
2011-08-22 14:36     ` Avi Kivity
2011-08-16  6:42 ` [PATCH 03/11] KVM: x86: retry non-page-table writing instruction Xiao Guangrong
2011-08-22 19:59   ` Marcelo Tosatti
2011-08-22 20:21     ` Xiao Guangrong
2011-08-22 20:42       ` Marcelo Tosatti
2011-08-16  6:42 ` [PATCH 04/11] KVM: x86: cleanup port-in/port-out emulated Xiao Guangrong
2011-08-16  6:43 ` [PATCH 05/11] KVM: MMU: do not mark access bit on pte write path Xiao Guangrong
2011-08-16  6:44 ` [PATCH 06/11] KVM: MMU: cleanup FNAME(invlpg) Xiao Guangrong
2011-08-16  6:44 ` [PATCH 07/11] KVM: MMU: fast prefetch spte on invlpg path Xiao Guangrong
2011-08-22 22:28   ` Marcelo Tosatti
2011-08-23  1:50     ` Xiao Guangrong
2011-08-16  6:45 ` [PATCH 08/11] KVM: MMU: remove unnecessary kvm_mmu_free_some_pages Xiao Guangrong
2011-08-16  6:45 ` [PATCH 09/11] KVM: MMU: split kvm_mmu_pte_write function Xiao Guangrong
2011-08-16  6:46 ` [PATCH 10/11] KVM: MMU: fix detecting misaligned accessed Xiao Guangrong
2011-08-16  6:46 ` [PATCH 11/11] KVM: MMU: improve write flooding detected Xiao Guangrong
2011-08-23  8:00   ` Marcelo Tosatti
2011-08-23 10:55     ` Xiao Guangrong
2011-08-23 12:38       ` Marcelo Tosatti
2011-08-23 16:32         ` Xiao Guangrong
2011-08-23 19:09           ` Marcelo Tosatti
2011-08-23 20:16             ` Xiao Guangrong
2011-08-24 20:05               ` Marcelo Tosatti
2011-08-25  2:04                 ` Marcelo Tosatti
2011-08-25  4:42                   ` Avi Kivity
2011-08-25 13:21                     ` Marcelo Tosatti
2011-08-25 14:06                       ` Avi Kivity
2011-08-25 14:07                         ` Avi Kivity
2011-08-25  7:40                   ` Xiao Guangrong
2011-08-25  7:57             ` Xiao Guangrong
2011-08-25 13:47               ` Marcelo Tosatti
2011-08-26  3:18                 ` Xiao Guangrong
2011-08-26 10:53                   ` Marcelo Tosatti
2011-08-26 14:24                     ` Xiao Guangrong
  -- strict thread matches above, loose matches on Subject: below --
2011-07-26 11:24 [PATCH 0/11] KVM: x86: optimize for guest page written Xiao Guangrong
2011-07-26 11:32 ` [PATCH 11/11] KVM: MMU: improve write flooding detected Xiao Guangrong
2011-07-27  9:23   ` Avi Kivity
2011-07-27 10:20     ` Xiao Guangrong
2011-07-27 11:08       ` Avi Kivity
2011-07-28  2:43         ` Xiao Guangrong
