* [PATCH v2 00/16] KVM: MMU: fast page fault
@ 2012-04-13 10:05 Xiao Guangrong
  2012-04-13 10:09 ` [PATCH v2 01/16] KVM: MMU: cleanup __direct_map Xiao Guangrong
                   ` (16 more replies)
  0 siblings, 17 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

Thanks to Avi and Marcelo for their review; I have simplified the whole
thing in this version:
- it only fixes the page fault with PFEC.P = 1 && PFEC.W = 1 (a write
  fault on a present spte), which means the lockless set_spte path can
  be dropped.

- it only fixes the page fault caused by dirty logging

In this version, all the information we need comes from the spte itself,
via the SPTE_ALLOW_WRITE bit and the SPTE_WRITE_PROTECT bit:
   - SPTE_ALLOW_WRITE is set if the gpte is writable and the pfn pointed
     to by the spte is writable on the host.
   - SPTE_WRITE_PROTECT is set if the spte is write-protected by shadow
     page table protection.

All these bits can be protected by cmpxchg, so everything is now much
simpler than before. :)
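
To illustrate the core idea, here is a rough sketch of the lockless
fix-up (illustrative only: the helper name and exact checks are mine,
the real code is in patch 14 and additionally marks the page dirty):

/* Sketch only -- not the code from this series. */
static bool fast_fix_write_protect(u64 *sptep)
{
	u64 spte = ACCESS_ONCE(*sptep);

	if (!is_shadow_present_pte(spte))
		return false;	/* let the slow path build the spte */

	if (is_writable_pte(spte))
		return true;	/* another vcpu already fixed it */

	/*
	 * The spte may only be made writable if the gpte and the host
	 * pfn allow write (SPTE_ALLOW_WRITE) and the spte is not
	 * write-protected for shadow page table protection
	 * (SPTE_WRITE_PROTECT).
	 */
	if (!(spte & SPTE_ALLOW_WRITE) || (spte & SPTE_WRITE_PROTECT))
		return false;

	/* The race with the write-protect side is closed by cmpxchg. */
	return cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte;
}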

Performance test:

autotest migration:
(Host: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz x 12, 32GB memory)

- For ept:

Before:
                    smp2.Fedora.16.64.migrate
Times   .unix      .with_autotest.dbench.unix     total
 1       104           214                         323
 2       68            238                         310
 3       68            242                         314

After:
                    smp2.Fedora.16.64.migrate
Times   .unix      .with_autotest.dbench.unix     total
 1       101           190                         295
 2       67            188                         259
 3       66            217                         289


- For shadow mmu:

Before:
                    smp2.Fedora.16.64.migrate
Times   .unix      .with_autotest.dbench.unix     total
 1       103           235                         342
 2       64            219                         286
 3       68            234                         305

After:
                    smp2.Fedora.16.64.migrate
Times   .unix      .with_autotest.dbench.unix     total
 1       104           220                         328
 2       65            204                         273
 3       64            219                         286



* [PATCH v2 01/16] KVM: MMU: cleanup __direct_map
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
@ 2012-04-13 10:09 ` Xiao Guangrong
  2012-04-13 10:10 ` [PATCH v2 02/16] KVM: MMU: introduce mmu_spte_establish Xiao Guangrong
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Use link_shadow_page to link the sp to the spte in __direct_map

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 29ad6f9..151fcddc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1850,9 +1850,9 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
 {
 	u64 spte;

-	spte = __pa(sp->spt)
-		| PT_PRESENT_MASK | PT_ACCESSED_MASK
-		| PT_WRITABLE_MASK | PT_USER_MASK;
+	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+
 	mmu_spte_set(sptep, spte);
 }

@@ -2541,11 +2541,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 				return -ENOMEM;
 			}

-			mmu_spte_set(iterator.sptep,
-				     __pa(sp->spt)
-				     | PT_PRESENT_MASK | PT_WRITABLE_MASK
-				     | shadow_user_mask | shadow_x_mask
-				     | shadow_accessed_mask);
+			link_shadow_page(iterator.sptep, sp);
 		}
 	}
 	return emulate;
-- 
1.7.7.6



* [PATCH v2 02/16] KVM: MMU: introduce mmu_spte_establish
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
  2012-04-13 10:09 ` [PATCH v2 01/16] KVM: MMU: cleanup __direct_map Xiao Guangrong
@ 2012-04-13 10:10 ` Xiao Guangrong
  2012-04-13 10:10 ` [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path Xiao Guangrong
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:10 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Introduce mmu_spte_establish(), which establishes the spte if it is not
present, to clean up the code. It also marks the spte present before
linking it to the sp's parent_ptes list, so that the rmap walking and
parent_ptes walking code can be unified in a later patch.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c         |   68 +++++++++++++++++++++----------------------
 arch/x86/kvm/paging_tmpl.h |   16 ++++------
 2 files changed, 40 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 151fcddc..aee13c6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1368,9 +1368,6 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
 				    struct kvm_mmu_page *sp, u64 *parent_pte)
 {
-	if (!parent_pte)
-		return;
-
 	pte_list_add(vcpu, parent_pte, &sp->parent_ptes);
 }

@@ -1401,7 +1398,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
 	bitmap_zero(sp->slot_bitmap, KVM_MEM_SLOTS_NUM);
 	sp->parent_ptes = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
 }
@@ -1765,12 +1761,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
 			break;

-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-		if (sp->unsync_children) {
+		if (sp->unsync_children)
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-			kvm_mmu_mark_parents_unsync(sp);
-		} else if (sp->unsync)
-			kvm_mmu_mark_parents_unsync(sp);

 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
@@ -1797,6 +1789,35 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	return sp;
 }

+static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+{
+	u64 spte;
+
+	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+
+	mmu_spte_set(sptep, spte);
+}
+
+static struct kvm_mmu_page *
+mmu_spte_establish(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, gva_t gaddr,
+		   unsigned level, int direct, unsigned access)
+{
+	struct kvm_mmu_page *sp;
+
+	WARN_ON(is_shadow_present_pte(*spte));
+
+	sp = kvm_mmu_get_page(vcpu, gfn, gaddr, level, direct, access, spte);
+
+	link_shadow_page(spte, sp);
+	mmu_page_add_parent_pte(vcpu, sp, spte);
+
+	if (sp->unsync_children || sp->unsync)
+		kvm_mmu_mark_parents_unsync(sp);
+
+	return sp;
+}
+
 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 			     struct kvm_vcpu *vcpu, u64 addr)
 {
@@ -1846,16 +1867,6 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	return __shadow_walk_next(iterator, *iterator->sptep);
 }

-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
-{
-	u64 spte;
-
-	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
-
-	mmu_spte_set(sptep, spte);
-}
-
 static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
 {
 	if (is_large_pte(*sptep)) {
@@ -1921,11 +1932,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 		mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }

-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-	mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
@@ -2511,7 +2517,6 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
-	struct kvm_mmu_page *sp;
 	int emulate = 0;
 	gfn_t pseudo_gfn;

@@ -2532,16 +2537,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,

 			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
 			pseudo_gfn = base_addr >> PAGE_SHIFT;
-			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-					      iterator.level - 1,
-					      1, ACC_ALL, iterator.sptep);
-			if (!sp) {
-				pgprintk("nonpaging_map: ENOMEM\n");
-				kvm_release_pfn_clean(pfn);
-				return -ENOMEM;
-			}
-
-			link_shadow_page(iterator.sptep, sp);
+			mmu_spte_establish(vcpu, iterator.sptep, pseudo_gfn,
+					   iterator.addr, iterator.level - 1,
+					   1, ACC_ALL);
 		}
 	}
 	return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index df5a703..e3cf43b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -503,8 +503,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		sp = NULL;
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
-			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-					      false, access, it.sptep);
+			sp = mmu_spte_establish(vcpu, it.sptep, table_gfn,
+						addr, it.level-1, false,
+						access);
 		}

 		/*
@@ -513,9 +514,6 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		 */
 		if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
 			goto out_gpte_changed;
-
-		if (sp)
-			link_shadow_page(it.sptep, sp);
 	}

 	for (;
@@ -533,9 +531,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

 		direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);

-		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-				      true, direct_access, it.sptep);
-		link_shadow_page(it.sptep, sp);
+		mmu_spte_establish(vcpu, it.sptep, direct_gfn, addr, it.level-1,
+				      true, direct_access);
 	}

 	clear_sp_write_flooding_count(it.sptep);
@@ -548,7 +545,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

 out_gpte_changed:
 	if (sp)
-		kvm_mmu_put_page(sp, it.sptep);
+		drop_parent_pte(sp, it.sptep);
+
 	kvm_release_pfn_clean(pfn);
 	return NULL;
 }
-- 
1.7.7.6



* [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
  2012-04-13 10:09 ` [PATCH v2 01/16] KVM: MMU: cleanup __direct_map Xiao Guangrong
  2012-04-13 10:10 ` [PATCH v2 02/16] KVM: MMU: introduce mmu_spte_establish Xiao Guangrong
@ 2012-04-13 10:10 ` Xiao Guangrong
  2012-04-14  2:15   ` Takuya Yoshikawa
  2012-04-13 10:11 ` [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect Xiao Guangrong
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:10 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Testing only the present bit is not enough, since an mmio spte also has
this bit set; use is_shadow_present_pte() instead.

Also move the BUG_ON()s into the common functions (they become
WARN_ON()s there) to clean up the code.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   38 ++++++++++++++++++++------------------
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index aee13c6..91518b6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -993,17 +993,25 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (!rmap)
 		return NULL;

 	if (!(rmap & 1)) {
 		iter->desc = NULL;
-		return (u64 *)rmap;
+		sptep = (u64 *)rmap;
+
+		goto exit;
 	}

 	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
 	iter->pos = 0;
-	return iter->desc->sptes[iter->pos];
+	sptep = iter->desc->sptes[iter->pos];
+
+exit:
+	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
+	return sptep;
 }

 /*
@@ -1013,14 +1021,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+	u64 *sptep = NULL;
+
 	if (iter->desc) {
 		if (iter->pos < PTE_LIST_EXT - 1) {
-			u64 *sptep;
-
 			++iter->pos;
 			sptep = iter->desc->sptes[iter->pos];
 			if (sptep)
-				return sptep;
+				goto exit;
 		}

 		iter->desc = iter->desc->more;
@@ -1028,11 +1036,14 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 		if (iter->desc) {
 			iter->pos = 0;
 			/* desc->sptes[0] cannot be NULL */
-			return iter->desc->sptes[iter->pos];
+			sptep = iter->desc->sptes[iter->pos];
+			goto exit;
 		}
 	}

-	return NULL;
+exit:
+	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
+	return sptep;
 }

 static void drop_spte(struct kvm *kvm, u64 *sptep)
@@ -1048,7 +1059,6 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
 	int write_protected = 0;

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
 		rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);

 		if (!is_writable_pte(*sptep)) {
@@ -1123,7 +1133,6 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	int need_tlb_flush = 0;

 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
 		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);

 		drop_spte(kvm, sptep);
@@ -1147,7 +1156,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	new_pfn = pte_pfn(*ptep);

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);

 		need_flush = 1;
@@ -1242,14 +1250,11 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		return kvm_unmap_rmapp(kvm, rmapp, data);

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+	     sptep = rmap_get_next(&iter))
 		if (*sptep & PT_ACCESSED_MASK) {
 			young = 1;
 			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)sptep);
 		}
-	}

 	return young;
 }
@@ -1270,14 +1275,11 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		goto out;

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+	     sptep = rmap_get_next(&iter))
 		if (*sptep & PT_ACCESSED_MASK) {
 			young = 1;
 			break;
 		}
-	}
 out:
 	return young;
 }
-- 
1.7.7.6



* [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (2 preceding siblings ...)
  2012-04-13 10:10 ` [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path Xiao Guangrong
@ 2012-04-13 10:11 ` Xiao Guangrong
  2012-04-14  2:00   ` Takuya Yoshikawa
  2012-04-13 10:11 ` [PATCH v2 05/16] KVM: MMU: abstract spte write-protect Xiao Guangrong
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:11 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

The return value of __rmap_write_protect() is always either 1 or 0, so
switch it to bool and use true/false instead.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 91518b6..7589e56 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1052,11 +1052,12 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }

-static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
+static bool
+__rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
-	int write_protected = 0;
+	bool write_protected = false;

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
@@ -1076,7 +1077,7 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
 			sptep = rmap_get_first(*rmapp, &iter);
 		}

-		write_protected = 1;
+		write_protected = true;
 	}

 	return write_protected;
@@ -1107,12 +1108,12 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	}
 }

-static int rmap_write_protect(struct kvm *kvm, u64 gfn)
+static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	int i;
-	int write_protected = 0;
+	bool write_protected = false;

 	slot = gfn_to_memslot(kvm, gfn);

@@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,

 	kvm_mmu_pages_init(parent, &parents, &pages);
 	while (mmu_unsync_walk(parent, &pages)) {
-		int protected = 0;
+		bool protected = false;

 		for_each_sp(pages, sp, parents, i)
 			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
-- 
1.7.7.6



* [PATCH v2 05/16] KVM: MMU: abstract spte write-protect
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (3 preceding siblings ...)
  2012-04-13 10:11 ` [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect Xiao Guangrong
@ 2012-04-13 10:11 ` Xiao Guangrong
  2012-04-14  2:26   ` Takuya Yoshikawa
  2012-04-13 10:12 ` [PATCH v2 06/16] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:11 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Introduce a common function that abstracts spte write-protection, to
clean up the code.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   60 ++++++++++++++++++++++++++++++---------------------
 1 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7589e56..a1c3628 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1052,6 +1052,34 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }

+/* Return true if the spte is dropped. */
+static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
+			       bool *flush)
+{
+	u64 spte = *sptep;
+
+	if (!is_writable_pte(spte))
+		return false;
+
+	*flush |= true;
+
+	if (large) {
+		pgprintk("rmap_write_protect(large): spte %p %llx\n",
+			 spte, *spte);
+		BUG_ON(!is_large_pte(spte));
+
+		drop_spte(kvm, sptep);
+		--kvm->stat.lpages;
+		return true;
+	}
+
+	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
+	spte = spte & ~PT_WRITABLE_MASK;
+	mmu_spte_update(sptep, spte);
+
+	return false;
+}
+
 static bool
 __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
 {
@@ -1060,24 +1088,13 @@ __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
 	bool write_protected = false;

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
-
-		if (!is_writable_pte(*sptep)) {
-			sptep = rmap_get_next(&iter);
-			continue;
-		}
-
-		if (level == PT_PAGE_TABLE_LEVEL) {
-			mmu_spte_update(sptep, *sptep & ~PT_WRITABLE_MASK);
-			sptep = rmap_get_next(&iter);
-		} else {
-			BUG_ON(!is_large_pte(*sptep));
-			drop_spte(kvm, sptep);
-			--kvm->stat.lpages;
+		if (spte_write_protect(kvm, sptep, level > PT_PAGE_TABLE_LEVEL,
+			  &write_protected)) {
 			sptep = rmap_get_first(*rmapp, &iter);
+			continue;
 		}

-		write_protected = true;
+		sptep = rmap_get_next(&iter);
 	}

 	return write_protected;
@@ -3898,6 +3915,7 @@ int kvm_mmu_setup(struct kvm_vcpu *vcpu)
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 {
 	struct kvm_mmu_page *sp;
+	bool flush = false;

 	list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
 		int i;
@@ -3912,16 +3930,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 			      !is_last_spte(pt[i], sp->role.level))
 				continue;

-			if (is_large_pte(pt[i])) {
-				drop_spte(kvm, &pt[i]);
-				--kvm->stat.lpages;
-				continue;
-			}
-
-			/* avoid RMW */
-			if (is_writable_pte(pt[i]))
-				mmu_spte_update(&pt[i],
-						pt[i] & ~PT_WRITABLE_MASK);
+			spte_write_protect(kvm, &pt[i],
+					   is_large_pte(pt[i]), &flush);
 		}
 	}
 	kvm_flush_remote_tlbs(kvm);
-- 
1.7.7.6



* [PATCH v2 06/16] KVM: VMX: export PFEC.P bit on ept
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (4 preceding siblings ...)
  2012-04-13 10:11 ` [PATCH v2 05/16] KVM: MMU: abstract spte write-protect Xiao Guangrong
@ 2012-04-13 10:12 ` Xiao Guangrong
  2012-04-13 10:12 ` [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte Xiao Guangrong
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:12 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Export the present bit of the page fault error code; a later patch
will use it.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/vmx.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 52f6856..2c98057 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4735,6 +4735,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification;
 	gpa_t gpa;
+	u32 error_code;
 	int gla_validity;

 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -4759,7 +4760,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)

 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	trace_kvm_page_fault(gpa, exit_qualification);
-	return kvm_mmu_page_fault(vcpu, gpa, exit_qualification & 0x3, NULL, 0);
+
+	/* It is a write fault? */
+	error_code = exit_qualification & (1U << 1);
+	/* ept page table is present? */
+	error_code |= (exit_qualification >> 3) & 0x1;
+
+	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }

 static u64 ept_rsvd_mask(u64 spte, int level)
-- 
1.7.7.6



* [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (5 preceding siblings ...)
  2012-04-13 10:12 ` [PATCH v2 06/16] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
@ 2012-04-13 10:12 ` Xiao Guangrong
  2012-04-14  2:44   ` Takuya Yoshikawa
  2012-04-13 10:13 ` [PATCH v2 08/16] KVM: MMU: store more bits in rmap Xiao Guangrong
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:12 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Introduce for_each_pte_list_spte, used to walk all the sptes of the
specified pte_list; after this, the pte_list_walk code can be removed.

It also restarts the walk automatically if the current spte has been
zapped, so callers may safely drop entries while iterating.
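
A sketch of how a caller can rely on the restart behaviour (illustrative
only, not part of the patch; kvm_unmap_rmapp below is converted to
exactly this pattern):

/* Sketch only -- not part of the patch. */
static int zap_rmap_sketch(struct kvm *kvm, unsigned long *rmapp)
{
	struct spte_iterator iter;
	u64 *sptep;
	int zapped = 0;

	for_each_pte_list_spte(rmapp, &iter, sptep) {
		/*
		 * Removing the current entry is safe: the iterator sees
		 * that *sptep is no longer present and restarts from the
		 * head of the (now shorter) list.
		 */
		drop_spte(kvm, sptep);
		zapped++;
	}

	return zapped;
}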

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c       |  233 +++++++++++++++++++++++-----------------------
 arch/x86/kvm/mmu_audit.c |    6 +-
 2 files changed, 118 insertions(+), 121 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a1c3628..4e91e94 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -900,26 +900,110 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }

-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
-{
+/*
+ * Used by the following functions to iterate through the sptes linked by a
+ * pte_list. All fields are private and not assumed to be used outside.
+ */
+struct spte_iterator {
+	/* private fields */
+	unsigned long *pte_list;
+	/* holds the sptep if not NULL */
 	struct pte_list_desc *desc;
-	int i;
+	/* index of the sptep, -1 means the walking does not start. */
+	int pos;
+};

-	if (!*pte_list)
-		return;
+static void pte_list_walk_init(struct spte_iterator *iter,
+			       unsigned long *pte_list)
+{
+	iter->pte_list = pte_list;
+	iter->pos = -1;
+}
+
+static void pte_list_walk_check_restart(struct spte_iterator *iter, u64 *spte)
+{
+	/*
+	 * If the spte is zapped, we should set 'iter->pos = -1' to restart
+	 * the walk.
+	 */
+	if (!is_shadow_present_pte(*spte))
+		iter->pos = -1;
+}
+
+static u64 *pte_list_first(struct spte_iterator *iter)
+{
+	unsigned long pte_list = *iter->pte_list;
+	u64 *sptep;
+
+	if (!pte_list)
+		return NULL;
+
+	if (!(pte_list & 1)) {
+		iter->desc = NULL;
+		iter->pos = 0;
+		sptep = (u64 *)pte_list;
+
+		goto exit;
+	}
+
+	iter->desc = (struct pte_list_desc *)(pte_list & ~1ul);
+	iter->pos = 0;
+	sptep = iter->desc->sptes[iter->pos];
+
+exit:
+	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
+	return sptep;
+}

-	if (!(*pte_list & 1))
-		return fn((u64 *)*pte_list);
+static u64 *pte_list_next(struct spte_iterator *iter)
+{
+	u64 *sptep = NULL;

-	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-			fn(desc->sptes[i]);
-		desc = desc->more;
+	if (iter->pos < 0)
+		return pte_list_first(iter);
+
+	if (iter->desc) {
+		if (iter->pos < PTE_LIST_EXT - 1) {
+			++iter->pos;
+			sptep = iter->desc->sptes[iter->pos];
+			if (sptep)
+				goto exit;
+		}
+
+		iter->desc = iter->desc->more;
+
+		if (iter->desc) {
+			iter->pos = 0;
+			/* desc->sptes[0] cannot be NULL */
+			sptep = iter->desc->sptes[iter->pos];
+			goto exit;
+		}
 	}
+
+exit:
+	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
+	return sptep;
 }

+
+#define for_each_pte_list_spte(pte_list, iter, spte)	\
+	for (pte_list_walk_init(iter, pte_list);	\
+	      (spte = pte_list_next(iter)) != NULL;	\
+		pte_list_walk_check_restart(iter, spte))
+
+typedef void (*pte_list_walk_fn) (u64 *spte);
+static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
+{
+	struct spte_iterator iter;
+	u64 *spte;
+
+	for_each_pte_list_spte(pte_list, &iter, spte)
+		fn(spte);
+}
+
+#define for_each_rmap_spte(rmap, iter, spte)	\
+		for_each_pte_list_spte(rmap, iter, spte)
+
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
@@ -974,92 +1058,19 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	pte_list_remove(spte, rmapp);
 }

-/*
- * Used by the following functions to iterate through the sptes linked by a
- * rmap.  All fields are private and not assumed to be used outside.
- */
-struct rmap_iterator {
-	/* private fields */
-	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
-	int pos;			/* index of the sptep */
-};
-
-/*
- * Iteration must be started by this function.  This should also be used after
- * removing/dropping sptes from the rmap link because in such cases the
- * information in the itererator may not be valid.
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
-{
-	u64 *sptep;
-
-	if (!rmap)
-		return NULL;
-
-	if (!(rmap & 1)) {
-		iter->desc = NULL;
-		sptep = (u64 *)rmap;
-
-		goto exit;
-	}
-
-	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
-	iter->pos = 0;
-	sptep = iter->desc->sptes[iter->pos];
-
-exit:
-	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
-	return sptep;
-}
-
-/*
- * Must be used with a valid iterator: e.g. after rmap_get_first().
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_next(struct rmap_iterator *iter)
-{
-	u64 *sptep = NULL;
-
-	if (iter->desc) {
-		if (iter->pos < PTE_LIST_EXT - 1) {
-			++iter->pos;
-			sptep = iter->desc->sptes[iter->pos];
-			if (sptep)
-				goto exit;
-		}
-
-		iter->desc = iter->desc->more;
-
-		if (iter->desc) {
-			iter->pos = 0;
-			/* desc->sptes[0] cannot be NULL */
-			sptep = iter->desc->sptes[iter->pos];
-			goto exit;
-		}
-	}
-
-exit:
-	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
-	return sptep;
-}
-
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
 	if (mmu_spte_clear_track_bits(sptep))
 		rmap_remove(kvm, sptep);
 }

-/* Return true if the spte is dropped. */
-static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
+static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
 			       bool *flush)
 {
 	u64 spte = *sptep;

 	if (!is_writable_pte(spte))
-		return false;
+		return;

 	*flush |= true;

@@ -1070,32 +1081,24 @@ static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,

 		drop_spte(kvm, sptep);
 		--kvm->stat.lpages;
-		return true;
+		return;
 	}

 	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
 	spte = spte & ~PT_WRITABLE_MASK;
 	mmu_spte_update(sptep, spte);
-
-	return false;
 }

 static bool
 __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;
 	bool write_protected = false;

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		if (spte_write_protect(kvm, sptep, level > PT_PAGE_TABLE_LEVEL,
-			  &write_protected)) {
-			sptep = rmap_get_first(*rmapp, &iter);
-			continue;
-		}
-
-		sptep = rmap_get_next(&iter);
-	}
+	for_each_rmap_spte(rmapp, &iter, sptep)
+		spte_write_protect(kvm, sptep, level > PT_PAGE_TABLE_LEVEL,
+			  &write_protected);

 	return write_protected;
 }
@@ -1147,10 +1150,10 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;
 	int need_tlb_flush = 0;

-	while ((sptep = rmap_get_first(*rmapp, &iter))) {
+	for_each_rmap_spte(rmapp, &iter, sptep) {
 		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);

 		drop_spte(kvm, sptep);
@@ -1164,7 +1167,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			     unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;
 	int need_flush = 0;
 	u64 new_spte;
 	pte_t *ptep = (pte_t *)data;
@@ -1173,15 +1176,14 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	WARN_ON(pte_huge(*ptep));
 	new_pfn = pte_pfn(*ptep);

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+	for_each_rmap_spte(rmapp, &iter, sptep) {
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);

 		need_flush = 1;

-		if (pte_write(*ptep)) {
+		if (pte_write(*ptep))
 			drop_spte(kvm, sptep);
-			sptep = rmap_get_first(*rmapp, &iter);
-		} else {
+		else {
 			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
 			new_spte |= (u64)new_pfn << PAGE_SHIFT;

@@ -1191,7 +1193,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

 			mmu_spte_clear_track_bits(sptep);
 			mmu_spte_set(sptep, new_spte);
-			sptep = rmap_get_next(&iter);
 		}
 	}

@@ -1254,7 +1255,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			 unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;
 	int young = 0;

 	/*
@@ -1267,8 +1268,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		return kvm_unmap_rmapp(kvm, rmapp, data);

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter))
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		if (*sptep & PT_ACCESSED_MASK) {
 			young = 1;
 			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)sptep);
@@ -1281,7 +1281,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			      unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;
 	int young = 0;

 	/*
@@ -1292,8 +1292,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		goto out;

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter))
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		if (*sptep & PT_ACCESSED_MASK) {
 			young = 1;
 			break;
@@ -1955,9 +1954,9 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;

-	while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
+	for_each_pte_list_spte(&sp->parent_ptes, &iter, sptep)
 		drop_parent_pte(sp, sptep);
 }

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 7d7d0b9..a42bd85 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -193,7 +193,7 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct spte_iterator iter;

 	if (sp->role.direct || sp->unsync || sp->role.invalid)
 		return;
@@ -201,13 +201,11 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slot = gfn_to_memslot(kvm, sp->gfn);
 	rmapp = &slot->rmap[sp->gfn - slot->base_gfn];

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		if (is_writable_pte(*sptep))
 			audit_printk(kvm, "shadow page has writable "
 				     "mappings: gfn %llx role %x\n",
 				     sp->gfn, sp->role.word);
-	}
 }

 static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-- 
1.7.7.6



* [PATCH v2 08/16] KVM: MMU: store more bits in rmap
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (6 preceding siblings ...)
  2012-04-13 10:12 ` [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte Xiao Guangrong
@ 2012-04-13 10:13 ` Xiao Guangrong
  2012-04-13 10:13 ` [PATCH v2 09/16] KVM: MMU: fast mmu_need_write_protect path for hard mmu Xiao Guangrong
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:13 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

In the current code, only one bit (bit 0) of the rmap word is used;
this patch exposes more bits of the rmap: during spte add/remove, only
bit 0 is touched and the other flag bits are preserved. (This works
because the pointers stored in the rmap word are at least 8-byte
aligned, so the low bits are always free to hold flags.)

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |  138 ++++++++++++++++++++++++++++++++--------------------
 1 files changed, 85 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e91e94..53e92de 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -796,13 +796,24 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 	return level - 1;
 }

+#define PTE_LIST_DESC		(0x1ull)
+#define PTE_LIST_FLAG_MASK	(0x3ull)
+
+static void
+pte_list_decode(const unsigned long *pte_list, unsigned long *map,
+		unsigned long *flags)
+{
+	*map = *pte_list & (~PTE_LIST_FLAG_MASK);
+	*flags = *pte_list & PTE_LIST_FLAG_MASK;
+}
+
 /*
  * Pte mapping structures:
  *
- * If pte_list bit zero is zero, then pte_list point to the spte.
+ * If PTE_LIST_DESC bit is zero, then pte_list point to the spte.
  *
- * If pte_list bit zero is one, (then pte_list & ~1) points to a struct
- * pte_list_desc containing more mappings.
+ * If PTE_LIST_DESC bit is one, (then pte_list & ~PTE_LIST_FLAG_MASK) points
+ * to a struct pte_list_desc containing more mappings.
  *
  * Returns the number of pte entries before the spte was added or zero if
  * the spte was not added.
@@ -812,39 +823,52 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 			unsigned long *pte_list)
 {
 	struct pte_list_desc *desc;
+	unsigned long map, flags;
 	int i, count = 0;

-	if (!*pte_list) {
+	pte_list_decode(pte_list, &map, &flags);
+
+	if (!map) {
 		rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-		*pte_list = (unsigned long)spte;
-	} else if (!(*pte_list & 1)) {
+		WARN_ON(flags & PTE_LIST_DESC);
+		*pte_list = (unsigned long)spte | flags;
+
+		 return 0;
+	}
+
+	if (!(flags & PTE_LIST_DESC)) {
 		rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
 		desc = mmu_alloc_pte_list_desc(vcpu);
-		desc->sptes[0] = (u64 *)*pte_list;
+		desc->sptes[0] = (u64 *)map;
 		desc->sptes[1] = spte;
-		*pte_list = (unsigned long)desc | 1;
-		++count;
-	} else {
-		rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-		while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
-			desc = desc->more;
-			count += PTE_LIST_EXT;
-		}
-		if (desc->sptes[PTE_LIST_EXT-1]) {
-			desc->more = mmu_alloc_pte_list_desc(vcpu);
-			desc = desc->more;
-		}
-		for (i = 0; desc->sptes[i]; ++i)
-			++count;
-		desc->sptes[i] = spte;
+		*pte_list = (unsigned long)desc | flags | PTE_LIST_DESC;
+
+		return 1;
+	}
+
+	rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
+	desc = (struct pte_list_desc *)map;
+	while (desc->sptes[PTE_LIST_EXT - 1] && desc->more) {
+		desc = desc->more;
+		count += PTE_LIST_EXT;
 	}
+
+	if (desc->sptes[PTE_LIST_EXT - 1]) {
+		desc->more = mmu_alloc_pte_list_desc(vcpu);
+		desc = desc->more;
+	}
+
+	for (i = 0; desc->sptes[i]; ++i)
+		++count;
+	desc->sptes[i] = spte;
+
 	return count;
 }

 static void
 pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-			   int i, struct pte_list_desc *prev_desc)
+			   int i, struct pte_list_desc *prev_desc,
+			   unsigned long flags)
 {
 	int j;

@@ -855,12 +879,13 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
 	if (j != 0)
 		return;
 	if (!prev_desc && !desc->more)
-		*pte_list = (unsigned long)desc->sptes[0];
+		*pte_list = (unsigned long)desc->sptes[0] |
+					(flags & (~PTE_LIST_DESC)) ;
 	else
 		if (prev_desc)
 			prev_desc->more = desc->more;
 		else
-			*pte_list = (unsigned long)desc->more | 1;
+			*pte_list = (unsigned long)desc->more | flags;
 	mmu_free_pte_list_desc(desc);
 }

@@ -868,36 +893,42 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 {
 	struct pte_list_desc *desc;
 	struct pte_list_desc *prev_desc;
+	unsigned long map, flags;
 	int i;

-	if (!*pte_list) {
+	pte_list_decode(pte_list, &map, &flags);
+
+	if (!map) {
 		printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
 		BUG();
-	} else if (!(*pte_list & 1)) {
+		return;
+	}
+
+	if (!(flags & PTE_LIST_DESC)) {
 		rmap_printk("pte_list_remove:  %p 1->0\n", spte);
-		if ((u64 *)*pte_list != spte) {
+		if ((u64 *)map != spte) {
 			printk(KERN_ERR "pte_list_remove:  %p 1->BUG\n", spte);
 			BUG();
 		}
-		*pte_list = 0;
-	} else {
-		rmap_printk("pte_list_remove:  %p many->many\n", spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-		prev_desc = NULL;
-		while (desc) {
-			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-				if (desc->sptes[i] == spte) {
-					pte_list_desc_remove_entry(pte_list,
-							       desc, i,
-							       prev_desc);
-					return;
-				}
-			prev_desc = desc;
-			desc = desc->more;
-		}
-		pr_err("pte_list_remove: %p many->many\n", spte);
-		BUG();
+		*pte_list = flags;
+		return;
+	}
+
+	rmap_printk("pte_list_remove:  %p many->many\n", spte);
+	desc = (struct pte_list_desc *)map;
+	prev_desc = NULL;
+	while (desc) {
+		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
+			if (desc->sptes[i] == spte) {
+				pte_list_desc_remove_entry(pte_list,
+						desc, i, prev_desc, flags);
+				return;
+			}
+		prev_desc = desc;
+		desc = desc->more;
 	}
+	pr_err("pte_list_remove: %p many->many\n", spte);
+	BUG();
 }

 /*
@@ -932,21 +963,22 @@ static void pte_list_walk_check_restart(struct spte_iterator *iter, u64 *spte)

 static u64 *pte_list_first(struct spte_iterator *iter)
 {
-	unsigned long pte_list = *iter->pte_list;
+	unsigned long map, flags;
 	u64 *sptep;

-	if (!pte_list)
+	pte_list_decode(iter->pte_list, &map, &flags);
+
+	if (!map)
 		return NULL;

-	if (!(pte_list & 1)) {
+	if (!(flags & PTE_LIST_DESC)) {
 		iter->desc = NULL;
 		iter->pos = 0;
-		sptep = (u64 *)pte_list;
-
+		sptep = (u64 *)map;
 		goto exit;
 	}

-	iter->desc = (struct pte_list_desc *)(pte_list & ~1ul);
+	iter->desc = (struct pte_list_desc *)map;
 	iter->pos = 0;
 	sptep = iter->desc->sptes[iter->pos];

-- 
1.7.7.6



* [PATCH v2 09/16] KVM: MMU: fast mmu_need_write_protect path for hard mmu
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (7 preceding siblings ...)
  2012-04-13 10:13 ` [PATCH v2 08/16] KVM: MMU: store more bits in rmap Xiao Guangrong
@ 2012-04-13 10:13 ` Xiao Guangrong
  2012-04-13 10:14 ` [PATCH v2 10/16] KVM: MMU: fast check whether page is writable Xiao Guangrong
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:13 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On a direct mmu without nested guests, no page is write-protected by
shadow page table protection, so mmu_need_write_protect() can return
early when there are no indirect shadow pages.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 53e92de..0c6e92d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2293,6 +2293,9 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	struct hlist_node *node;
 	bool need_unsync = false;

+	if (!vcpu->kvm->arch.indirect_shadow_pages)
+		return 0;
+
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
 		if (!can_unsync)
 			return 1;
-- 
1.7.7.6



* [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (8 preceding siblings ...)
  2012-04-13 10:13 ` [PATCH v2 09/16] KVM: MMU: fast mmu_need_write_protect path for hard mmu Xiao Guangrong
@ 2012-04-13 10:14 ` Xiao Guangrong
  2012-04-14  3:01   ` Takuya Yoshikawa
  2012-04-15 15:16   ` Avi Kivity
  2012-04-13 10:14 ` [PATCH v2 11/16] KVM: MMU: introduce SPTE_ALLOW_WRITE bit Xiao Guangrong
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:14 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Use bit 1 (PTE_LIST_WP_BIT) of the rmap word to store the gfn's
write-protect status, so that unnecessary shadow page walking can be
avoided.
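
Roughly, the protocol for the bit is:

  - rmap_write_protect() sets the bit before write-protecting the gfn and
    skips the rmap walk entirely if the bit was already set;
  - rmap_add()/kvm_set_pte_rmapp() also set it when the host pfn is not
    writable (!SPTE_HOST_WRITEABLE);
  - mmu_need_write_protect() returns early if the bit is clear and clears
    it once the gfn is known to be safe to map writable.

A later patch tests the bit locklessly on the fast page fault path; the
helper below only illustrates that check and is not part of this patch:

/* Illustrative helper, not part of the patch. */
static bool gfn_may_be_writable(struct kvm *kvm, gfn_t gfn)
{
	unsigned long *rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);

	return !test_bit(PTE_LIST_WP_BIT, rmapp);
}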

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
 1 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0c6e92d..8b71908 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 	return level - 1;
 }

-#define PTE_LIST_DESC		(0x1ull)
+#define PTE_LIST_DESC_BIT	0
+#define PTE_LIST_WP_BIT	1
+#define PTE_LIST_DESC		(1 << PTE_LIST_DESC_BIT)
 #define PTE_LIST_FLAG_MASK	(0x3ull)

 static void
@@ -1067,6 +1069,12 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
 	return mmu_memory_cache_free_objects(cache);
 }

+static void host_page_write_protect(u64 *spte, unsigned long *rmapp)
+{
+	if (!(*spte & SPTE_HOST_WRITEABLE))
+		__test_and_set_bit(PTE_LIST_WP_BIT, rmapp);
+}
+
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
 	struct kvm_mmu_page *sp;
@@ -1074,7 +1082,10 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)

 	sp = page_header(__pa(spte));
 	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
+
 	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
+	host_page_write_protect(spte, rmapp);
+
 	return pte_list_add(vcpu, spte, rmapp);
 }

@@ -1164,16 +1175,23 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
-	int i;
+	int i = PT_PAGE_TABLE_LEVEL;
 	bool write_protected = false;

 	slot = gfn_to_memslot(kvm, gfn);
+	rmapp = __gfn_to_rmap(gfn, i, slot);
+
+	if (__test_and_set_bit(PTE_LIST_WP_BIT, rmapp))
+		return false;
+
+	do {
+		write_protected |= __rmap_write_protect(kvm, rmapp, i++);
+
+		if (i >= PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES)
+			break;

-	for (i = PT_PAGE_TABLE_LEVEL;
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
 		rmapp = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(kvm, rmapp, i);
-	}
+	} while (true);

 	return write_protected;
 }
@@ -1225,6 +1243,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

 			mmu_spte_clear_track_bits(sptep);
 			mmu_spte_set(sptep, new_spte);
+			host_page_write_protect(sptep, rmapp);
 		}
 	}

@@ -2291,9 +2310,15 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 {
 	struct kvm_mmu_page *s;
 	struct hlist_node *node;
+	unsigned long *rmap;
 	bool need_unsync = false;

+	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
+
 	if (!vcpu->kvm->arch.indirect_shadow_pages)
+		goto write_free;
+
+	if (!test_bit(PTE_LIST_WP_BIT, rmap))
 		return 0;

 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
@@ -2309,6 +2334,9 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	}
 	if (need_unsync)
 		kvm_unsync_pages(vcpu, gfn);
+
+write_free:
+	__clear_bit(PTE_LIST_WP_BIT, rmap);
 	return 0;
 }

-- 
1.7.7.6



* [PATCH v2 11/16] KVM: MMU: introduce SPTE_ALLOW_WRITE bit
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (9 preceding siblings ...)
  2012-04-13 10:14 ` [PATCH v2 10/16] KVM: MMU: fast check whether page is writable Xiao Guangrong
@ 2012-04-13 10:14 ` Xiao Guangrong
  2012-04-13 10:15 ` [PATCH v2 12/16] KVM: MMU: introduce SPTE_WRITE_PROTECT bit Xiao Guangrong
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:14 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

This bit indicates whether the spte is allowed to be writable, meaning
that the gpte behind this spte is writable and the pfn pointed to by
this spte is writable on the host.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8b71908..1a06776 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -145,7 +145,8 @@ module_param(dbg, bool, 0644);
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"

-#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_ALLOW_WRITE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))

 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)

@@ -1237,9 +1238,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
 			new_spte |= (u64)new_pfn << PAGE_SHIFT;

-			new_spte &= ~PT_WRITABLE_MASK;
-			new_spte &= ~SPTE_HOST_WRITEABLE;
-			new_spte &= ~shadow_accessed_mask;
+			new_spte &= ~(PT_WRITABLE_MASK | SPTE_HOST_WRITEABLE |
+				      shadow_accessed_mask | SPTE_ALLOW_WRITE);

 			mmu_spte_clear_track_bits(sptep);
 			mmu_spte_set(sptep, new_spte);
@@ -2386,7 +2386,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			goto done;
 		}

-		spte |= PT_WRITABLE_MASK;
+		spte |= PT_WRITABLE_MASK | SPTE_ALLOW_WRITE;

 		if (!vcpu->arch.mmu.direct_map
 		    && !(pte_access & ACC_WRITE_MASK)) {
@@ -2415,8 +2415,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 				 __func__, gfn);
 			ret = 1;
 			pte_access &= ~ACC_WRITE_MASK;
-			if (is_writable_pte(spte))
-				spte &= ~PT_WRITABLE_MASK;
+			spte &= ~PT_WRITABLE_MASK;
 		}
 	}

-- 
1.7.7.6



* [PATCH v2 12/16] KVM: MMU: introduce SPTE_WRITE_PROTECT bit
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (10 preceding siblings ...)
  2012-04-13 10:14 ` [PATCH v2 11/16] KVM: MMU: introduce SPTE_ALLOW_WRITE bit Xiao Guangrong
@ 2012-04-13 10:15 ` Xiao Guangrong
  2012-04-13 10:15 ` [PATCH v2 13/16] KVM: MMU: break sptes write-protect if gfn is writable Xiao Guangrong
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

If this bit is set, it means the W bit of the spte has been cleared due
to shadow page table protection.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   55 +++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1a06776..578a1e2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -147,6 +147,7 @@ module_param(dbg, bool, 0644);

 #define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
 #define SPTE_ALLOW_WRITE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
+#define SPTE_WRITE_PROTECT	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 2))

 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)

@@ -1108,33 +1109,49 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }

+static bool spte_wp_by_dirty_log(u64 spte)
+{
+	WARN_ON(is_writable_pte(spte));
+
+	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
+}
+
 static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
-			       bool *flush)
+			       bool *flush, bool page_table_protect)
 {
 	u64 spte = *sptep;

-	if (!is_writable_pte(spte))
-		return;
+	if (is_writable_pte(spte)) {
+		*flush |= true;

-	*flush |= true;
+		if (large) {
+			pgprintk("rmap_write_protect(large): spte %p %llx\n",
+				 spte, *spte);
+			BUG_ON(!is_large_pte(spte));

-	if (large) {
-		pgprintk("rmap_write_protect(large): spte %p %llx\n",
-			 spte, *spte);
-		BUG_ON(!is_large_pte(spte));
+			drop_spte(kvm, sptep);
+			--kvm->stat.lpages;
+			return;
+		}

-		drop_spte(kvm, sptep);
-		--kvm->stat.lpages;
-		return;
+		goto reset_spte;
 	}

+	if (page_table_protect && spte_wp_by_dirty_log(spte))
+		goto reset_spte;
+
+	return;
+
+reset_spte:
 	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
 	spte = spte & ~PT_WRITABLE_MASK;
+	if (page_table_protect)
+		spte |= SPTE_WRITE_PROTECT;
 	mmu_spte_update(sptep, spte);
 }

-static bool
-__rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
+static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
+				 int level, bool page_table_protect)
 {
 	u64 *sptep;
 	struct spte_iterator iter;
@@ -1142,7 +1159,7 @@ __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)

 	for_each_rmap_spte(rmapp, &iter, sptep)
 		spte_write_protect(kvm, sptep, level > PT_PAGE_TABLE_LEVEL,
-			  &write_protected);
+			  &write_protected, page_table_protect);

 	return write_protected;
 }
@@ -1165,7 +1182,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,

 	while (mask) {
 		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
-		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
+		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL, false);

 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1186,7 +1203,7 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 		return false;

 	do {
-		write_protected |= __rmap_write_protect(kvm, rmapp, i++);
+		write_protected |= __rmap_write_protect(kvm, rmapp, i++, true);

 		if (i >= PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES)
 			break;
@@ -1239,7 +1256,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			new_spte |= (u64)new_pfn << PAGE_SHIFT;

 			new_spte &= ~(PT_WRITABLE_MASK | SPTE_HOST_WRITEABLE |
-				      shadow_accessed_mask | SPTE_ALLOW_WRITE);
+				      shadow_accessed_mask | SPTE_ALLOW_WRITE |
+				      SPTE_WRITE_PROTECT);

 			mmu_spte_clear_track_bits(sptep);
 			mmu_spte_set(sptep, new_spte);
@@ -2416,6 +2434,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			ret = 1;
 			pte_access &= ~ACC_WRITE_MASK;
 			spte &= ~PT_WRITABLE_MASK;
+			spte |= SPTE_WRITE_PROTECT;
 		}
 	}

@@ -3992,7 +4011,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 				continue;

 			spte_write_protect(kvm, &pt[i],
-					   is_large_pte(pt[i]), &flush);
+					   is_large_pte(pt[i]), &flush, false);
 		}
 	}
 	kvm_flush_remote_tlbs(kvm);
-- 
1.7.7.6



* [PATCH v2 13/16] KVM: MMU: break sptes write-protect if gfn is writable
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (11 preceding siblings ...)
  2012-04-13 10:15 ` [PATCH v2 12/16] KVM: MMU: introduce SPTE_WRITE_PROTECT bit Xiao Guangrong
@ 2012-04-13 10:15 ` Xiao Guangrong
  2012-04-13 10:16 ` [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Make all sptes that point to a gfn writable when the gfn becomes
write-free (no longer needs write protection), to reduce later page
faults.

The idea is from Avi.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   34 +++++++++++++++++++++++++++++++++-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 578a1e2..efa5d59 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2323,15 +2323,45 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 	}
 }

+/*
+ * If the gfn become write-free, we make all sptes which point to this
+ * gfn to be writable.
+ * Note: we should call mark_page_dirty for the gfn later.
+ */
+static void rmap_break_page_table_wp(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct spte_iterator iter;
+	u64 *sptep;
+	int i;
+
+	for (i = PT_PAGE_TABLE_LEVEL;
+	      i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; i++) {
+		unsigned long *rmap = __gfn_to_rmap(gfn, i, slot);
+
+		for_each_rmap_spte(rmap, &iter, sptep) {
+			u64 spte = *sptep;
+
+			if (!is_writable_pte(spte) &&
+			      (spte & SPTE_ALLOW_WRITE)) {
+				spte &= ~SPTE_WRITE_PROTECT;
+				spte |= PT_WRITABLE_MASK;
+				mmu_spte_update(sptep, spte);
+			}
+		}
+	}
+}
+
 static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 				  bool can_unsync)
 {
+	struct kvm_memory_slot *slot;
 	struct kvm_mmu_page *s;
 	struct hlist_node *node;
 	unsigned long *rmap;
 	bool need_unsync = false;

-	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
+	slot = gfn_to_memslot(vcpu->kvm, gfn);
+	rmap =	__gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);

 	if (!vcpu->kvm->arch.indirect_shadow_pages)
 		goto write_free;
@@ -2353,6 +2383,8 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (need_unsync)
 		kvm_unsync_pages(vcpu, gfn);

+	rmap_break_page_table_wp(slot, gfn);
+
 write_free:
 	__clear_bit(PTE_LIST_WP_BIT, rmap);
 	return 0;
-- 
1.7.7.6



* [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (12 preceding siblings ...)
  2012-04-13 10:15 ` [PATCH v2 13/16] KVM: MMU: break sptes write-protect if gfn is writable Xiao Guangrong
@ 2012-04-13 10:16 ` Xiao Guangrong
  2012-04-18  1:47   ` Marcelo Tosatti
  2012-04-13 10:17 ` [PATCH v2 15/16] KVM: MMU: trace fast " Xiao Guangrong
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:16 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

If the present bit of the page fault error code is set, it indicates that
the shadow page table is populated on all levels, so all we need to do is
modify the access bits, which can be done out of mmu-lock

Currently, in order to simplify the code, we only fix the page fault
caused by write-protect on the fast path

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c         |  205 ++++++++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/paging_tmpl.h |    3 +
 2 files changed, 192 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index efa5d59..fc91667 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -446,6 +446,13 @@ static bool __check_direct_spte_mmio_pf(u64 spte)
 }
 #endif

+static bool spte_wp_by_dirty_log(u64 spte)
+{
+	WARN_ON(is_writable_pte(spte));
+
+	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
+}
+
 static bool spte_has_volatile_bits(u64 spte)
 {
 	if (!shadow_accessed_mask)
@@ -454,9 +461,18 @@ static bool spte_has_volatile_bits(u64 spte)
 	if (!is_shadow_present_pte(spte))
 		return false;

-	if ((spte & shadow_accessed_mask) &&
-	      (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
-		return false;
+	if (spte & shadow_accessed_mask) {
+		if (is_writable_pte(spte))
+			return !(spte & shadow_dirty_mask);
+
+		/*
+		 * If the spte is write-protected by dirty-log, it may
+		 * be marked writable on the fast page fault path, and then
+		 * the CPU can modify the Dirty bit.
+		 */
+		if (!spte_wp_by_dirty_log(spte))
+			return false;
+	}

 	return true;
 }
@@ -1109,26 +1125,18 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }

-static bool spte_wp_by_dirty_log(u64 spte)
-{
-	WARN_ON(is_writable_pte(spte));
-
-	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
-}
-
 static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
 			       bool *flush, bool page_table_protect)
 {
 	u64 spte = *sptep;

 	if (is_writable_pte(spte)) {
-		*flush |= true;
-
 		if (large) {
 			pgprintk("rmap_write_protect(large): spte %p %llx\n",
 				 spte, *spte);
 			BUG_ON(!is_large_pte(spte));

+			*flush |= true;
 			drop_spte(kvm, sptep);
 			--kvm->stat.lpages;
 			return;
@@ -1137,6 +1145,9 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
 		goto reset_spte;
 	}

+	/* We need to flush TLBs in this case: the fast page fault path
+	 * can mark the spte writable after we read the sptep.
+	 */
 	if (page_table_protect && spte_wp_by_dirty_log(spte))
 		goto reset_spte;

@@ -1144,6 +1155,8 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,

 reset_spte:
 	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
+
+	*flush |= true;
 	spte = spte & ~PT_WRITABLE_MASK;
 	if (page_table_protect)
 		spte |= SPTE_WRITE_PROTECT;
@@ -2767,18 +2780,172 @@ exit:
 	return ret;
 }

+static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, gfn_t gfn,
+				   u32 error_code)
+{
+	unsigned long *rmap;
+
+	/*
+	 * #PF can be fast only if the shadow page table is present and it
+	 * is caused by write-protect, that means we just need change the
+	 * W bit of the spte which can be done out of mmu-lock.
+	 */
+	if (!(error_code & PFERR_PRESENT_MASK) ||
+	      !(error_code & PFERR_WRITE_MASK))
+		return false;
+
+	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
+
+	/* Quickly check the page can be writable. */
+	if (test_bit(PTE_LIST_WP_BIT, ACCESS_ONCE(rmap)))
+		return false;
+
+	return true;
+}
+
+static bool
+fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			  u64 *sptep, u64 spte, gfn_t gfn)
+{
+	pfn_t pfn;
+	bool ret = false;
+
+	/*
+	 * For an indirect spte, it is hard to get a stable gfn since
+	 * we just use a cmpxchg to avoid the races, which is not
+	 * enough to avoid the ABA problem: the host can arbitrarily
+	 * change the spte and the mapping from gfn to pfn.
+	 *
+	 * What we do is call gfn_to_pfn_atomic to bind the gfn and the
+	 * pfn, because after the call:
+	 * - we hold a refcount on the pfn, which means the pfn cannot
+	 *   be freed and reused for another gfn.
+	 * - the pfn is writable, which means it cannot be shared by
+	 *   different gfns.
+	 */
+	pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
+
+	/* The host page is swapped out or merged. */
+	if (mmu_invalid_pfn(pfn))
+		goto exit;
+
+	ret = true;
+
+	if (pfn != spte_to_pfn(spte))
+		goto exit;
+
+	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
+		mark_page_dirty(vcpu->kvm, gfn);
+
+exit:
+	kvm_release_pfn_clean(pfn);
+	return ret;
+}
+
+static bool
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			u64 *sptep, u64 spte)
+{
+	gfn_t gfn;
+
+	WARN_ON(!sp->role.direct);
+
+	/*
+	 * The gfn of direct spte is stable since it is calculated
+	 * by sp->gfn.
+	 */
+	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
+		mark_page_dirty(vcpu->kvm, gfn);
+
+	return true;
+}
+
+/*
+ * Return value:
+ * - true: let the vcpu access the same address again.
+ * - false: let the real page fault path fix it.
+ */
+static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+			    int level, u32 error_code)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	struct kvm_mmu_page *sp;
+	bool ret = false;
+	u64 spte = 0ull;
+
+	if (!page_fault_can_be_fast(vcpu, gfn, error_code))
+		return false;
+
+	walk_shadow_page_lockless_begin(vcpu);
+	for_each_shadow_entry_lockless(vcpu, gva, iterator, spte)
+		if (!is_shadow_present_pte(spte) || iterator.level < level)
+			break;
+
+	/*
+	 * If the mapping has been changed, let the vcpu fault on the
+	 * same address again.
+	 */
+	if (!is_rmap_spte(spte)) {
+		ret = true;
+		goto exit;
+	}
+
+	if (!is_last_spte(spte, level))
+		goto exit;
+
+	/*
+	 * Check if it is a spurious fault caused by a lazily flushed TLB.
+	 *
+	 * There is no need to check the access of upper-level table
+	 * entries since they are always ACC_ALL.
+	 */
+	if (is_writable_pte(spte)) {
+		ret = true;
+		goto exit;
+	}
+
+	/*
+	 * If the page fault is caused by a write but the host does not
+	 * allow writing the page, we need to COW the host page.
+	 */
+	if (!(spte & SPTE_ALLOW_WRITE))
+		goto exit;
+
+	/*
+	 * Currently, to simplify the code, only the spte write-protected
+	 * by dirty-log can be fixed on the fast path.
+	 */
+	if (!spte_wp_by_dirty_log(spte))
+		goto exit;
+
+	sp = page_header(__pa(iterator.sptep));
+
+	if (sp->role.direct)
+		ret = fast_pf_fix_direct_spte(vcpu, sp, iterator.sptep, spte);
+	else
+		ret = fast_pf_fix_indirect_spte(vcpu, sp, iterator.sptep,
+						spte, gfn);
+
+exit:
+	walk_shadow_page_lockless_end(vcpu);
+
+	return ret;
+}
+
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			 gva_t gva, pfn_t *pfn, bool write, bool *writable);

-static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
-			 bool prefault)
+static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
+			 gfn_t gfn, bool prefault)
 {
 	int r;
 	int level;
 	int force_pt_level;
 	pfn_t pfn;
 	unsigned long mmu_seq;
-	bool map_writable;
+	bool map_writable, write = error_code & PFERR_WRITE_MASK;

 	force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
 	if (likely(!force_pt_level)) {
@@ -2795,6 +2962,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;

+	if (fast_page_fault(vcpu, v, gfn, level, error_code))
+		return 0;
+
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();

@@ -3183,7 +3353,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 	gfn = gva >> PAGE_SHIFT;

 	return nonpaging_map(vcpu, gva & PAGE_MASK,
-			     error_code & PFERR_WRITE_MASK, gfn, prefault);
+			     error_code, gfn, prefault);
 }

 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
@@ -3263,6 +3433,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;

+	if (fast_page_fault(vcpu, gpa, gfn, level, error_code))
+		return 0;
+
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e3cf43b..c85dbbe 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -615,6 +615,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 		walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
 	}

+	if (fast_page_fault(vcpu, addr, walker.gfn, level, error_code))
+		return 0;
+
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();

-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 15/16] KVM: MMU: trace fast page fault
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (13 preceding siblings ...)
  2012-04-13 10:16 ` [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
@ 2012-04-13 10:17 ` Xiao Guangrong
  2012-04-13 10:17 ` [PATCH v2 16/16] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
  2012-04-14  3:37 ` [PATCH v2 00/16] KVM: MMU: fast page fault Takuya Yoshikawa
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:17 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

To see what happens on this path and to help us optimize it

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c      |    2 ++
 arch/x86/kvm/mmutrace.h |   41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fc91667..589b507 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2929,6 +2929,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 						spte, gfn);

 exit:
+	trace_fast_page_fault(vcpu, gva, gfn, error_code, iterator.sptep,
+			      spte, ret);
 	walk_shadow_page_lockless_end(vcpu);

 	return ret;
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 89fb0e8..84da94f 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -243,6 +243,47 @@ TRACE_EVENT(
 	TP_printk("addr:%llx gfn %llx access %x", __entry->addr, __entry->gfn,
 		  __entry->access)
 );
+
+#define __spte_satisfied(__spte)				\
+	(__entry->retry && is_writable_pte(__entry->__spte))
+
+TRACE_EVENT(
+	fast_page_fault,
+	TP_PROTO(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn, u32 error_code,
+		 u64 *sptep, u64 old_spte, bool retry),
+	TP_ARGS(vcpu, gva, gfn, error_code, sptep, old_spte, retry),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gva_t, gva)
+		__field(gfn_t, gfn)
+		__field(u32, error_code)
+		__field(u64 *, sptep)
+		__field(u64, old_spte)
+		__field(u64, new_spte)
+		__field(bool, retry)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gva = gva;
+		__entry->gfn = gfn;
+		__entry->error_code = error_code;
+		__entry->sptep = sptep;
+		__entry->old_spte = old_spte;
+		__entry->new_spte = *sptep;
+		__entry->retry = retry;
+	),
+
+	TP_printk("vcpu %d gva %lx gfn %llx error_code %s sptep %p "
+		  "old %#llx new %llx spurious %d fixed %d", __entry->vcpu_id,
+		  __entry->gva, __entry->gfn,
+		  __print_flags(__entry->error_code, "|",
+				kvm_mmu_trace_pferr_flags),
+		  __entry->sptep, __entry->old_spte, __entry->new_spte,
+		  __spte_satisfied(old_spte), __spte_satisfied(new_spte)
+	)
+);
 #endif /* _TRACE_KVMMMU_H */

 #undef TRACE_INCLUDE_PATH
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 16/16] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (14 preceding siblings ...)
  2012-04-13 10:17 ` [PATCH v2 15/16] KVM: MMU: trace fast " Xiao Guangrong
@ 2012-04-13 10:17 ` Xiao Guangrong
  2012-04-14  3:37 ` [PATCH v2 00/16] KVM: MMU: fast page fault Takuya Yoshikawa
  16 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-13 10:17 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

The P bit of the page fault error code is missing from this tracepoint; fix
it by passing the full error code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmutrace.h    |    7 +++----
 arch/x86/kvm/paging_tmpl.h |    3 +--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 84da94f..e762d35 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -54,8 +54,8 @@
  */
 TRACE_EVENT(
 	kvm_mmu_pagetable_walk,
-	TP_PROTO(u64 addr, int write_fault, int user_fault, int fetch_fault),
-	TP_ARGS(addr, write_fault, user_fault, fetch_fault),
+	TP_PROTO(u64 addr, u32 pferr),
+	TP_ARGS(addr, pferr),

 	TP_STRUCT__entry(
 		__field(__u64, addr)
@@ -64,8 +64,7 @@ TRACE_EVENT(

 	TP_fast_assign(
 		__entry->addr = addr;
-		__entry->pferr = (!!write_fault << 1) | (!!user_fault << 2)
-		                 | (!!fetch_fault << 4);
+		__entry->pferr = pferr;
 	),

 	TP_printk("addr %llx pferr %x %s", __entry->addr, __entry->pferr,
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index c85dbbe..ff85616 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -154,8 +154,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	const int fetch_fault = access & PFERR_FETCH_MASK;
 	u16 errcode = 0;

-	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
-				     fetch_fault);
+	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
 	eperm = false;
 	walker->level = mmu->root_level;
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect
  2012-04-13 10:11 ` [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect Xiao Guangrong
@ 2012-04-14  2:00   ` Takuya Yoshikawa
  2012-04-15 11:25     ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  2:00 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Fri, 13 Apr 2012 18:11:13 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> The return value of __rmap_write_protect is either 1 or 0, use
> true/false instead of these

...

> @@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
> 
>  	kvm_mmu_pages_init(parent, &parents, &pages);
>  	while (mmu_unsync_walk(parent, &pages)) {
> -		int protected = 0;
> +		bool protected = false;
> 
>  		for_each_sp(pages, sp, parents, i)
>  			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);

Isn't this the reason we prefer int to bool?

Not sure people like to use |= with boolean.

	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path
  2012-04-13 10:10 ` [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path Xiao Guangrong
@ 2012-04-14  2:15   ` Takuya Yoshikawa
  2012-04-16  3:26     ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  2:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Fri, 13 Apr 2012 18:10:45 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

>  static u64 *rmap_get_next(struct rmap_iterator *iter)
>  {
> +	u64 *sptep = NULL;
> +
>  	if (iter->desc) {
>  		if (iter->pos < PTE_LIST_EXT - 1) {
> -			u64 *sptep;
> -
>  			++iter->pos;
>  			sptep = iter->desc->sptes[iter->pos];
>  			if (sptep)
> -				return sptep;
> +				goto exit;
>  		}
> 
>  		iter->desc = iter->desc->more;
> @@ -1028,11 +1036,14 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
>  		if (iter->desc) {
>  			iter->pos = 0;
>  			/* desc->sptes[0] cannot be NULL */
> -			return iter->desc->sptes[iter->pos];
> +			sptep = iter->desc->sptes[iter->pos];
> +			goto exit;
>  		}
>  	}
> 
> -	return NULL;
> +exit:
> +	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
> +	return sptep;
>  }

This will probably again force an rmap_get_next function call even with EPT/NPT:
the CPU cannot skip it via branch prediction.

	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 05/16] KVM: MMU: abstract spte write-protect
  2012-04-13 10:11 ` [PATCH v2 05/16] KVM: MMU: abstract spte write-protect Xiao Guangrong
@ 2012-04-14  2:26   ` Takuya Yoshikawa
  2012-04-16  3:27     ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  2:26 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Fri, 13 Apr 2012 18:11:45 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> +/* Return true if the spte is dropped. */

The return value does not correspond to the function name, so it is confusing.

People may think that true means write protection has been done.

> +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> +			       bool *flush)
> +{
> +	u64 spte = *sptep;
> +
> +	if (!is_writable_pte(spte))
> +		return false;
> +
> +	*flush |= true;
> +
> +	if (large) {
> +		pgprintk("rmap_write_protect(large): spte %p %llx\n",
> +			 spte, *spte);
> +		BUG_ON(!is_large_pte(spte));
> +
> +		drop_spte(kvm, sptep);
> +		--kvm->stat.lpages;
> +		return true;
> +	}

This suggests we should use separate functions?
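
For example, a rough sketch of such a split (spte_drop_large is a hypothetical
name, not part of the posted series; it only reuses callees that already
appear in this patch):

static void spte_drop_large(struct kvm *kvm, u64 *sptep, bool *flush)
{
	pgprintk("rmap_write_protect(large): spte %p %llx\n", sptep, *sptep);
	BUG_ON(!is_large_pte(*sptep));

	/* dropping the large spte always needs a TLB flush */
	*flush = true;
	drop_spte(kvm, sptep);
	--kvm->stat.lpages;
}

static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush)
{
	u64 spte = *sptep;

	if (!is_writable_pte(spte))
		return;

	/* only clear the W bit; never drops the spte */
	*flush = true;
	spte &= ~PT_WRITABLE_MASK;
	mmu_spte_update(sptep, spte);
}

The caller would then choose spte_drop_large() for large sptes and
spte_write_protect() otherwise, so neither helper has to report whether
the spte was dropped.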

	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-13 10:12 ` [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte Xiao Guangrong
@ 2012-04-14  2:44   ` Takuya Yoshikawa
  2012-04-16  3:36     ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  2:44 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Fri, 13 Apr 2012 18:12:41 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> It is used to walk all the sptes of the specified pte_list, after
> this, the code of pte_list_walk can be removed
> 
> And it can restart the walking automatically if the spte is zapped

Well, I want to ask two questions:

	- why do you prefer pte_list_* naming to rmap_*?
	  (not a big issue but just curious)
	- Are you sure the whole indirection by this patch will
	  not introduce any regression?
	  (not restricted to get_dirty)

As my previous patch showed, even a slight, trivial change can introduce a
significant performance regression/improvement.

Thanks,
	Takuya

> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c       |  233 +++++++++++++++++++++++-----------------------
>  arch/x86/kvm/mmu_audit.c |    6 +-
>  2 files changed, 118 insertions(+), 121 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a1c3628..4e91e94 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -900,26 +900,110 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>  	}
>  }
> 
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> -{
> +/*
> + * Used by the following functions to iterate through the sptes linked by a
> + * pte_list. All fields are private and not assumed to be used outside.
> + */
> +struct spte_iterator {
> +	/* private fields */
> +	unsigned long *pte_list;
> +	/* holds the sptep if not NULL */
>  	struct pte_list_desc *desc;
> -	int i;
> +	/* index of the sptep, -1 means the walking does not start. */
> +	int pos;
> +};
> 
> -	if (!*pte_list)
> -		return;
> +static void pte_list_walk_init(struct spte_iterator *iter,
> +			       unsigned long *pte_list)
> +{
> +	iter->pte_list = pte_list;
> +	iter->pos = -1;
> +}
> +
> +static void pte_list_walk_check_restart(struct spte_iterator *iter, u64 *spte)
> +{
> +	/*
> +	 * If the spte is zapped, we should set 'iter->pos = -1' to restart
> +	 * the walk.
> +	 */
> +	if (!is_shadow_present_pte(*spte))
> +		iter->pos = -1;
> +}
> +
> +static u64 *pte_list_first(struct spte_iterator *iter)
> +{
> +	unsigned long pte_list = *iter->pte_list;
> +	u64 *sptep;
> +
> +	if (!pte_list)
> +		return NULL;
> +
> +	if (!(pte_list & 1)) {
> +		iter->desc = NULL;
> +		iter->pos = 0;
> +		sptep = (u64 *)pte_list;
> +
> +		goto exit;
> +	}
> +
> +	iter->desc = (struct pte_list_desc *)(pte_list & ~1ul);
> +	iter->pos = 0;
> +	sptep = iter->desc->sptes[iter->pos];
> +
> +exit:
> +	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
> +	return sptep;
> +}
> 
> -	if (!(*pte_list & 1))
> -		return fn((u64 *)*pte_list);
> +static u64 *pte_list_next(struct spte_iterator *iter)
> +{
> +	u64 *sptep = NULL;
> 
> -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> -	while (desc) {
> -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> -			fn(desc->sptes[i]);
> -		desc = desc->more;
> +	if (iter->pos < 0)
> +		return pte_list_first(iter);
> +
> +	if (iter->desc) {
> +		if (iter->pos < PTE_LIST_EXT - 1) {
> +			++iter->pos;
> +			sptep = iter->desc->sptes[iter->pos];
> +			if (sptep)
> +				goto exit;
> +		}
> +
> +		iter->desc = iter->desc->more;
> +
> +		if (iter->desc) {
> +			iter->pos = 0;
> +			/* desc->sptes[0] cannot be NULL */
> +			sptep = iter->desc->sptes[iter->pos];
> +			goto exit;
> +		}
>  	}
> +
> +exit:
> +	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
> +	return sptep;
>  }
> 
> +
> +#define for_each_pte_list_spte(pte_list, iter, spte)	\
> +	for (pte_list_walk_init(iter, pte_list);	\
> +	      (spte = pte_list_next(iter)) != NULL;	\
> +		pte_list_walk_check_restart(iter, spte))
> +
> +typedef void (*pte_list_walk_fn) (u64 *spte);
> +static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> +{
> +	struct spte_iterator iter;
> +	u64 *spte;
> +
> +	for_each_pte_list_spte(pte_list, &iter, spte)
> +		fn(spte);
> +}
> +
> +#define for_each_rmap_spte(rmap, iter, spte)	\
> +		for_each_pte_list_spte(rmap, iter, spte)
> +
>  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>  				    struct kvm_memory_slot *slot)
>  {
> @@ -974,92 +1058,19 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
>  	pte_list_remove(spte, rmapp);
>  }
> 
> -/*
> - * Used by the following functions to iterate through the sptes linked by a
> - * rmap.  All fields are private and not assumed to be used outside.
> - */
> -struct rmap_iterator {
> -	/* private fields */
> -	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
> -	int pos;			/* index of the sptep */
> -};
> -
> -/*
> - * Iteration must be started by this function.  This should also be used after
> - * removing/dropping sptes from the rmap link because in such cases the
> - * information in the itererator may not be valid.
> - *
> - * Returns sptep if found, NULL otherwise.
> - */
> -static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
> -{
> -	u64 *sptep;
> -
> -	if (!rmap)
> -		return NULL;
> -
> -	if (!(rmap & 1)) {
> -		iter->desc = NULL;
> -		sptep = (u64 *)rmap;
> -
> -		goto exit;
> -	}
> -
> -	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
> -	iter->pos = 0;
> -	sptep = iter->desc->sptes[iter->pos];
> -
> -exit:
> -	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
> -	return sptep;
> -}
> -
> -/*
> - * Must be used with a valid iterator: e.g. after rmap_get_first().
> - *
> - * Returns sptep if found, NULL otherwise.
> - */
> -static u64 *rmap_get_next(struct rmap_iterator *iter)
> -{
> -	u64 *sptep = NULL;
> -
> -	if (iter->desc) {
> -		if (iter->pos < PTE_LIST_EXT - 1) {
> -			++iter->pos;
> -			sptep = iter->desc->sptes[iter->pos];
> -			if (sptep)
> -				goto exit;
> -		}
> -
> -		iter->desc = iter->desc->more;
> -
> -		if (iter->desc) {
> -			iter->pos = 0;
> -			/* desc->sptes[0] cannot be NULL */
> -			sptep = iter->desc->sptes[iter->pos];
> -			goto exit;
> -		}
> -	}
> -
> -exit:
> -	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
> -	return sptep;
> -}
> -
>  static void drop_spte(struct kvm *kvm, u64 *sptep)
>  {
>  	if (mmu_spte_clear_track_bits(sptep))
>  		rmap_remove(kvm, sptep);
>  }
> 
> -/* Return true if the spte is dropped. */
> -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> +static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
>  			       bool *flush)
>  {
>  	u64 spte = *sptep;
> 
>  	if (!is_writable_pte(spte))
> -		return false;
> +		return;
> 
>  	*flush |= true;
> 
> @@ -1070,32 +1081,24 @@ static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> 
>  		drop_spte(kvm, sptep);
>  		--kvm->stat.lpages;
> -		return true;
> +		return;
>  	}
> 
>  	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
>  	spte = spte & ~PT_WRITABLE_MASK;
>  	mmu_spte_update(sptep, spte);
> -
> -	return false;
>  }
> 
>  static bool
>  __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
>  	bool write_protected = false;
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
> -		if (spte_write_protect(kvm, sptep, level > PT_PAGE_TABLE_LEVEL,
> -			  &write_protected)) {
> -			sptep = rmap_get_first(*rmapp, &iter);
> -			continue;
> -		}
> -
> -		sptep = rmap_get_next(&iter);
> -	}
> +	for_each_rmap_spte(rmapp, &iter, sptep)
> +		spte_write_protect(kvm, sptep, level > PT_PAGE_TABLE_LEVEL,
> +			  &write_protected);
> 
>  	return write_protected;
>  }
> @@ -1147,10 +1150,10 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			   unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
>  	int need_tlb_flush = 0;
> 
> -	while ((sptep = rmap_get_first(*rmapp, &iter))) {
> +	for_each_rmap_spte(rmapp, &iter, sptep) {
>  		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);
> 
>  		drop_spte(kvm, sptep);
> @@ -1164,7 +1167,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			     unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
>  	int need_flush = 0;
>  	u64 new_spte;
>  	pte_t *ptep = (pte_t *)data;
> @@ -1173,15 +1176,14 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	WARN_ON(pte_huge(*ptep));
>  	new_pfn = pte_pfn(*ptep);
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
> +	for_each_rmap_spte(rmapp, &iter, sptep) {
>  		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);
> 
>  		need_flush = 1;
> 
> -		if (pte_write(*ptep)) {
> +		if (pte_write(*ptep))
>  			drop_spte(kvm, sptep);
> -			sptep = rmap_get_first(*rmapp, &iter);
> -		} else {
> +		else {
>  			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
>  			new_spte |= (u64)new_pfn << PAGE_SHIFT;
> 
> @@ -1191,7 +1193,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
> 
>  			mmu_spte_clear_track_bits(sptep);
>  			mmu_spte_set(sptep, new_spte);
> -			sptep = rmap_get_next(&iter);
>  		}
>  	}
> 
> @@ -1254,7 +1255,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			 unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
>  	int young = 0;
> 
>  	/*
> @@ -1267,8 +1268,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	if (!shadow_accessed_mask)
>  		return kvm_unmap_rmapp(kvm, rmapp, data);
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> -	     sptep = rmap_get_next(&iter))
> +	for_each_rmap_spte(rmapp, &iter, sptep)
>  		if (*sptep & PT_ACCESSED_MASK) {
>  			young = 1;
>  			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)sptep);
> @@ -1281,7 +1281,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			      unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
>  	int young = 0;
> 
>  	/*
> @@ -1292,8 +1292,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	if (!shadow_accessed_mask)
>  		goto out;
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> -	     sptep = rmap_get_next(&iter))
> +	for_each_rmap_spte(rmapp, &iter, sptep)
>  		if (*sptep & PT_ACCESSED_MASK) {
>  			young = 1;
>  			break;
> @@ -1955,9 +1954,9 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
> 
> -	while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
> +	for_each_pte_list_spte(&sp->parent_ptes, &iter, sptep)
>  		drop_parent_pte(sp, sptep);
>  }
> 
> diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
> index 7d7d0b9..a42bd85 100644
> --- a/arch/x86/kvm/mmu_audit.c
> +++ b/arch/x86/kvm/mmu_audit.c
> @@ -193,7 +193,7 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	struct kvm_memory_slot *slot;
>  	unsigned long *rmapp;
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct spte_iterator iter;
> 
>  	if (sp->role.direct || sp->unsync || sp->role.invalid)
>  		return;
> @@ -201,13 +201,11 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	slot = gfn_to_memslot(kvm, sp->gfn);
>  	rmapp = &slot->rmap[sp->gfn - slot->base_gfn];
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> -	     sptep = rmap_get_next(&iter)) {
> +	for_each_rmap_spte(rmapp, &iter, sptep)
>  		if (is_writable_pte(*sptep))
>  			audit_printk(kvm, "shadow page has writable "
>  				     "mappings: gfn %llx role %x\n",
>  				     sp->gfn, sp->role.word);
> -	}
>  }
> 
>  static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Takuya Yoshikawa <takuya.yoshikawa@gmail.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-13 10:14 ` [PATCH v2 10/16] KVM: MMU: fast check whether page is writable Xiao Guangrong
@ 2012-04-14  3:01   ` Takuya Yoshikawa
  2012-04-16  3:38     ` Xiao Guangrong
  2012-04-15 15:16   ` Avi Kivity
  1 sibling, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  3:01 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Fri, 13 Apr 2012 18:14:26 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status
> to avoid unnecessary shadow page walking
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
>  1 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 0c6e92d..8b71908 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>  	return level - 1;
>  }
> 
> -#define PTE_LIST_DESC		(0x1ull)
> +#define PTE_LIST_DESC_BIT	0
> +#define PTE_LIST_WP_BIT	1

_BIT ?

> +#define PTE_LIST_DESC		(1 << PTE_LIST_DESC_BIT)
>  #define PTE_LIST_FLAG_MASK	(0x3ull)
> 

Well, I think that rmap is losing its original meaning:
Before, it held just the information needed to link spteps.

Now it has more roles, and is becoming a kind of descriptor?

...

> @@ -2291,9 +2310,15 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>  {
>  	struct kvm_mmu_page *s;
>  	struct hlist_node *node;
> +	unsigned long *rmap;
>  	bool need_unsync = false;
> 
> +	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);

Please use consistent variable names.

In other parts of this patch, you are using rmapp for this.

	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/16] KVM: MMU: fast page fault
  2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
                   ` (15 preceding siblings ...)
  2012-04-13 10:17 ` [PATCH v2 16/16] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
@ 2012-04-14  3:37 ` Takuya Yoshikawa
  2012-04-16  3:50   ` Xiao Guangrong
  16 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  3:37 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Fri, 13 Apr 2012 18:05:29 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> Thanks for Avi and Marcelo's review, i have simplified the whole things
> in this version:
> - it only fix the page fault with PFEC.P = 1 && PFEC.W = 0 that means
>   unlock set_spte path can be dropped.
> 
> - it only fixes the page fault caused by dirty-log
> 
> In this version, all the information we need is from spte, the
> SPTE_ALLOW_WRITE bit and SPTE_WRITE_PROTECT bit:
>    - SPTE_ALLOW_WRITE is set if the gpte is writable and the pfn pointed
>      by the spte is writable on host.
>    - SPTE_WRITE_PROTECT is set if the spte is write-protected by shadow
>      page table protection.
> 
> All these bits can be protected by cmpxchg, now, all the things is fairly
> simple than before. :)

Well, could you remove cleanup patches not needed for "lock-less" from
this patch series?

I want to see them separately.

Or was everything needed for "lock-less"?

> Performance test:
> 
> autotest migration:
> (Host: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz * 12 + 32G)

Please explain what this test result means, not just numbers.

There are many aspects:
	- how fast migration can converge/complete
	- how fast programs inside the guest can run during migration:
	  -- throughput
	  -- latency
	- ...

I think lock-less will reduce latency a lot, but I am not sure about convergence:
why did it become faster?

> - For ept:
> 
> Before:
>                     smp2.Fedora.16.64.migrate
> Times   .unix      .with_autotest.dbench.unix     total
>  1       104           214                         323
>  2       68            238                         310
>  3       68            242                         314
> 
> After:
>                     smp2.Fedora.16.64.migrate
> Times   .unix      .with_autotest.dbench.unix     total
>  1       101           190                         295
>  2       67            188                         259
>  3       66            217                         289
> 

As discussed on v1-threads, the main goal of this "lock-less" should be
the elimination of mmu_lock contentions.

So what we should measure is latency.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect
  2012-04-14  2:00   ` Takuya Yoshikawa
@ 2012-04-15 11:25     ` Avi Kivity
  2012-04-16 14:14       ` Takuya Yoshikawa
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2012-04-15 11:25 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM

On 04/14/2012 05:00 AM, Takuya Yoshikawa wrote:
> On Fri, 13 Apr 2012 18:11:13 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
>
> > The return value of __rmap_write_protect is either 1 or 0, use
> > true/false instead of these
>
> ...
>
> > @@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
> > 
> >  	kvm_mmu_pages_init(parent, &parents, &pages);
> >  	while (mmu_unsync_walk(parent, &pages)) {
> > -		int protected = 0;
> > +		bool protected = false;
> > 
> >  		for_each_sp(pages, sp, parents, i)
> >  			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
>
> Isn't this the reason we prefer int to bool?
>
> Not sure people like to use |= with boolean.
>

Why not?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-13 10:14 ` [PATCH v2 10/16] KVM: MMU: fast check whether page is writable Xiao Guangrong
  2012-04-14  3:01   ` Takuya Yoshikawa
@ 2012-04-15 15:16   ` Avi Kivity
  2012-04-16  3:25     ` Xiao Guangrong
  1 sibling, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2012-04-15 15:16 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 04/13/2012 01:14 PM, Xiao Guangrong wrote:
> Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status
> to avoid unnecessary shadow page walking
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
>  1 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 0c6e92d..8b71908 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>  	return level - 1;
>  }
>
> -#define PTE_LIST_DESC		(0x1ull)
> +#define PTE_LIST_DESC_BIT	0
> +#define PTE_LIST_WP_BIT	1
> +#define PTE_LIST_DESC		(1 << PTE_LIST_DESC_BIT)
>  #define PTE_LIST_FLAG_MASK	(0x3ull)
>
>  static void
> @@ -1067,6 +1069,12 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
>  	return mmu_memory_cache_free_objects(cache);
>  }
>
> +static void host_page_write_protect(u64 *spte, unsigned long *rmapp)
> +{
> +	if (!(*spte & SPTE_HOST_WRITEABLE))
> +		__test_and_set_bit(PTE_LIST_WP_BIT, rmapp);
> +}
>

Why is this needed, in addition to spte.SPTE_WRITE_PROTECT?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-15 15:16   ` Avi Kivity
@ 2012-04-16  3:25     ` Xiao Guangrong
  2012-04-16 10:02       ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16  3:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 04/15/2012 11:16 PM, Avi Kivity wrote:

> On 04/13/2012 01:14 PM, Xiao Guangrong wrote:
>> Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status
>> to avoid unnecessary shadow page walking
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
>>  1 files changed, 34 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 0c6e92d..8b71908 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>>  	return level - 1;
>>  }
>>
>> -#define PTE_LIST_DESC		(0x1ull)
>> +#define PTE_LIST_DESC_BIT	0
>> +#define PTE_LIST_WP_BIT	1
>> +#define PTE_LIST_DESC		(1 << PTE_LIST_DESC_BIT)
>>  #define PTE_LIST_FLAG_MASK	(0x3ull)
>>
>>  static void
>> @@ -1067,6 +1069,12 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
>>  	return mmu_memory_cache_free_objects(cache);
>>  }
>>
>> +static void host_page_write_protect(u64 *spte, unsigned long *rmapp)
>> +{
>> +	if (!(*spte & SPTE_HOST_WRITEABLE))
>> +		__test_and_set_bit(PTE_LIST_WP_BIT, rmapp);
>> +}
>>
> 
> Why is this needed, in addition to spte.SPTE_WRITE_PROTECT?
> 


It is used to avoid the unnecessary overload for fast page fault if
KSM is enabled. On the fast check path, it can see the gfn is write-protected
by host, then the fast page fault path is not called.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path
  2012-04-14  2:15   ` Takuya Yoshikawa
@ 2012-04-16  3:26     ` Xiao Guangrong
  0 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16  3:26 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/14/2012 10:15 AM, Takuya Yoshikawa wrote:

> On Fri, 13 Apr 2012 18:10:45 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>>  static u64 *rmap_get_next(struct rmap_iterator *iter)
>>  {
>> +	u64 *sptep = NULL;
>> +
>>  	if (iter->desc) {
>>  		if (iter->pos < PTE_LIST_EXT - 1) {
>> -			u64 *sptep;
>> -
>>  			++iter->pos;
>>  			sptep = iter->desc->sptes[iter->pos];
>>  			if (sptep)
>> -				return sptep;
>> +				goto exit;
>>  		}
>>
>>  		iter->desc = iter->desc->more;
>> @@ -1028,11 +1036,14 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
>>  		if (iter->desc) {
>>  			iter->pos = 0;
>>  			/* desc->sptes[0] cannot be NULL */
>> -			return iter->desc->sptes[iter->pos];
>> +			sptep = iter->desc->sptes[iter->pos];
>> +			goto exit;
>>  		}
>>  	}
>>
>> -	return NULL;
>> +exit:
>> +	WARN_ON(sptep && !is_shadow_present_pte(*sptep));
>> +	return sptep;
>>  }
> 
> This will probably again force an rmap_get_next function call even with EPT/NPT:
> the CPU cannot skip it via branch prediction.
> 

No, EPT/NPT also needs it.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 05/16] KVM: MMU: abstract spte write-protect
  2012-04-14  2:26   ` Takuya Yoshikawa
@ 2012-04-16  3:27     ` Xiao Guangrong
  0 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16  3:27 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/14/2012 10:26 AM, Takuya Yoshikawa wrote:

> On Fri, 13 Apr 2012 18:11:45 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> +/* Return true if the spte is dropped. */
> 
> The return value does not correspond to the function name, so it is confusing.
> 


That is why I put a comment here.

> People may think that true means write protection has been done.
> 


You have a better name?


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-14  2:44   ` Takuya Yoshikawa
@ 2012-04-16  3:36     ` Xiao Guangrong
  2012-04-17 14:47       ` Takuya Yoshikawa
  0 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16  3:36 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/14/2012 10:44 AM, Takuya Yoshikawa wrote:

> On Fri, 13 Apr 2012 18:12:41 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> It is used to walk all the sptes of the specified pte_list, after
>> this, the code of pte_list_walk can be removed
>>
>> And it can restart the walking automatically if the spte is zapped
> 
> Well, I want to ask two questions:
> 
> 	- why do you prefer pte_list_* naming to rmap_*?
> 	  (not a big issue but just curious)


pte_list is a common infrastructure for both parent-list and rmap.

> 	- Are you sure the whole indirection by this patch will
> 	  not introduce any regression?
> 	  (not restricted to get_dirty)
> 


I tested it with kernbench; no regression was found.

It is not a problem since the iter and spte should be in the cache.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-14  3:01   ` Takuya Yoshikawa
@ 2012-04-16  3:38     ` Xiao Guangrong
  0 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16  3:38 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/14/2012 11:01 AM, Takuya Yoshikawa wrote:

> On Fri, 13 Apr 2012 18:14:26 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status
>> to avoid unnecessary shadow page walking
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
>>  1 files changed, 34 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 0c6e92d..8b71908 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>>  	return level - 1;
>>  }
>>
>> -#define PTE_LIST_DESC		(0x1ull)
>> +#define PTE_LIST_DESC_BIT	0
>> +#define PTE_LIST_WP_BIT	1
> 
> _BIT ?


What is the problem?

> 
>> @@ -2291,9 +2310,15 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>>  {
>>  	struct kvm_mmu_page *s;
>>  	struct hlist_node *node;
>> +	unsigned long *rmap;
>>  	bool need_unsync = false;
>>
>> +	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
> 
> Please use consistent variable names.
> 
> In other parts of this patch, you are using rmapp for this.


Okay.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/16] KVM: MMU: fast page fault
  2012-04-14  3:37 ` [PATCH v2 00/16] KVM: MMU: fast page fault Takuya Yoshikawa
@ 2012-04-16  3:50   ` Xiao Guangrong
  0 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16  3:50 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/14/2012 11:37 AM, Takuya Yoshikawa wrote:

> On Fri, 13 Apr 2012 18:05:29 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> Thanks for Avi and Marcelo's review, i have simplified the whole things
>> in this version:
>> - it only fix the page fault with PFEC.P = 1 && PFEC.W = 0 that means
>>   unlock set_spte path can be dropped.
>>
>> - it only fixes the page fault caused by dirty-log
>>
>> In this version, all the information we need is from spte, the
>> SPTE_ALLOW_WRITE bit and SPTE_WRITE_PROTECT bit:
>>    - SPTE_ALLOW_WRITE is set if the gpte is writable and the pfn pointed
>>      by the spte is writable on host.
>>    - SPTE_WRITE_PROTECT is set if the spte is write-protected by shadow
>>      page table protection.
>>
>> All these bits can be protected by cmpxchg, now, all the things is fairly
>> simple than before. :)
> 
> Well, could you remove cleanup patches not needed for "lock-less" from
> this patch series?
> 
> I want to see them separately.
> 
> Or was everything needed for "lock-less"?
> 


The cleanup patches do the preparatory work for fast page fault, so the later
patches are easy to implement; for example, the for_each_spte_rmap patches let
the "store more bits in rmap" patch make only small changes since spte_list_walk
is removed.

>> Performance test:
>>
>> autotest migration:
>> (Host: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz * 12 + 32G)
> 
> Please explain what this test result means, not just numbers.
> 
> There are many aspects:
> 	- how fast migration can converge/complete
> 	- how fast programs inside the guest can run during migration:
> 	  -- throughput
> 	  -- latency
> 	- ...
> 


The result is rather straightforward; I think an explanation is not needed.

> I think lock-less will reduce latency a lot, but I am not sure about convergence:
> why did it become faster?
> 


Is it hard to understand? It is faster since it can run in parallel.

>> - For ept:
>>
>> Before:
>>                     smp2.Fedora.16.64.migrate
>> Times   .unix      .with_autotest.dbench.unix     total
>>  1       104           214                         323
>>  2       68            238                         310
>>  3       68            242                         314
>>
>> After:
>>                     smp2.Fedora.16.64.migrate
>> Times   .unix      .with_autotest.dbench.unix     total
>>  1       101           190                         295
>>  2       67            188                         259
>>  3       66            217                         289
>>
> 
> As discussed on v1-threads, the main goal of this "lock-less" should be
> the elimination of mmu_lock contentions
> 
> So what we should measure is latency.
> 


I think the migration time test is enough to see the effect.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-16  3:25     ` Xiao Guangrong
@ 2012-04-16 10:02       ` Avi Kivity
  2012-04-16 10:20         ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2012-04-16 10:02 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 04/16/2012 06:25 AM, Xiao Guangrong wrote:
> On 04/15/2012 11:16 PM, Avi Kivity wrote:
>
> > On 04/13/2012 01:14 PM, Xiao Guangrong wrote:
> >> Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status
> >> to avoid unnecessary shadow page walking
> >>
> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> ---
> >>  arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
> >>  1 files changed, 34 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 0c6e92d..8b71908 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> >>  	return level - 1;
> >>  }
> >>
> >> -#define PTE_LIST_DESC		(0x1ull)
> >> +#define PTE_LIST_DESC_BIT	0
> >> +#define PTE_LIST_WP_BIT	1
> >> +#define PTE_LIST_DESC		(1 << PTE_LIST_DESC_BIT)
> >>  #define PTE_LIST_FLAG_MASK	(0x3ull)
> >>
> >>  static void
> >> @@ -1067,6 +1069,12 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
> >>  	return mmu_memory_cache_free_objects(cache);
> >>  }
> >>
> >> +static void host_page_write_protect(u64 *spte, unsigned long *rmapp)
> >> +{
> >> +	if (!(*spte & SPTE_HOST_WRITEABLE))
> >> +		__test_and_set_bit(PTE_LIST_WP_BIT, rmapp);
> >> +}
> >>
> > 
> > Why is this needed, in addition to spte.SPTE_WRITE_PROTECT?
> > 
>
>
> It is used to avoid the unnecessary overload 

It's overloading me :(

> for fast page fault if
> KSM is enabled. On the fast check path, it can see the gfn is write-protected
> by host, then the fast page fault path is not called.

The fast page fault path is supposed to be fast, so it's okay if we take
a bit of extra overhead before a COW (which is going to be slow anyway).

Let's get the simplest possible version in, and then discuss if/how we
need to optimize it further.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-16 10:02       ` Avi Kivity
@ 2012-04-16 10:20         ` Xiao Guangrong
  2012-04-16 11:47           ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-16 10:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 04/16/2012 06:02 PM, Avi Kivity wrote:

> On 04/16/2012 06:25 AM, Xiao Guangrong wrote:
>> On 04/15/2012 11:16 PM, Avi Kivity wrote:
>>
>>> On 04/13/2012 01:14 PM, Xiao Guangrong wrote:
>>>> Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status
>>>> to avoid unnecessary shadow page walking
>>>>
>>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/x86/kvm/mmu.c |   40 ++++++++++++++++++++++++++++++++++------
>>>>  1 files changed, 34 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 0c6e92d..8b71908 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -796,7 +796,9 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>>>>  	return level - 1;
>>>>  }
>>>>
>>>> -#define PTE_LIST_DESC		(0x1ull)
>>>> +#define PTE_LIST_DESC_BIT	0
>>>> +#define PTE_LIST_WP_BIT	1
>>>> +#define PTE_LIST_DESC		(1 << PTE_LIST_DESC_BIT)
>>>>  #define PTE_LIST_FLAG_MASK	(0x3ull)
>>>>
>>>>  static void
>>>> @@ -1067,6 +1069,12 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
>>>>  	return mmu_memory_cache_free_objects(cache);
>>>>  }
>>>>
>>>> +static void host_page_write_protect(u64 *spte, unsigned long *rmapp)
>>>> +{
>>>> +	if (!(*spte & SPTE_HOST_WRITEABLE))
>>>> +		__test_and_set_bit(PTE_LIST_WP_BIT, rmapp);
>>>> +}
>>>>
>>>
>>> Why is this needed, in addition to spte.SPTE_WRITE_PROTECT?
>>>
>>
>>
>> It is used to avoid the unnecessary overload 
> 
> It's overloading me :(
> 


Sorry.

>> for fast page fault if
>> KSM is enabled. On the fast check path, it can see the gfn is write-protected
>> by host, then the fast page fault path is not called.
> 
> The fast page fault path is supposed to be fast, so it's okay if we take
> a bit of extra overhead before a COW (which is going to be slow anyway).
> 
> Let's get the simplest possible version in, and then discuss if/how we
> need to optimize it further.
> 


Okay, I will drop setting PTE_LIST_WP_BIT for this case in the next version. :)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fast check whether page is writable
  2012-04-16 10:20         ` Xiao Guangrong
@ 2012-04-16 11:47           ` Avi Kivity
  2012-04-17  3:55             ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2012-04-16 11:47 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 04/16/2012 01:20 PM, Xiao Guangrong wrote:
> >>
> >> It is used to avoid the unnecessary overload 
> > 
> > It's overloading me :(
> > 
>
>
> Sorry.
>

The trick is to send those in a separate patchset so the maintainer
doesn't notice.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect
  2012-04-15 11:25     ` Avi Kivity
@ 2012-04-16 14:14       ` Takuya Yoshikawa
  2012-04-16 14:28         ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-16 14:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM

On Sun, 15 Apr 2012 14:25:30 +0300
Avi Kivity <avi@redhat.com> wrote:

> > > @@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
> > > 
> > >  	kvm_mmu_pages_init(parent, &parents, &pages);
> > >  	while (mmu_unsync_walk(parent, &pages)) {
> > > -		int protected = 0;
> > > +		bool protected = false;
> > > 
> > >  		for_each_sp(pages, sp, parents, i)
> > >  			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
> >
> > Isn't this the reason we prefer int to bool?
> >
> > Not sure people like to use |= with boolean.
> >
> 
> Why not?
> 

The code "bitwise OR assignment" is assuming the internal representations
of true and false: true=1, false=0.

I might have seen some point if it had been
	protected = protected || rmap_...


But the real question is whether there is any point in re-writing completely
correct C code: there are tons of int like this in the kernel code.

__rmap_write_protect() was introduced recently, so if this conversion is
really worthwhile, I should have been told to use bool at that time, no?


Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect
  2012-04-16 14:14       ` Takuya Yoshikawa
@ 2012-04-16 14:28         ` Avi Kivity
  2012-04-16 15:54           ` Takuya Yoshikawa
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2012-04-16 14:28 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM

On 04/16/2012 05:14 PM, Takuya Yoshikawa wrote:
> On Sun, 15 Apr 2012 14:25:30 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > > @@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
> > > > 
> > > >  	kvm_mmu_pages_init(parent, &parents, &pages);
> > > >  	while (mmu_unsync_walk(parent, &pages)) {
> > > > -		int protected = 0;
> > > > +		bool protected = false;
> > > > 
> > > >  		for_each_sp(pages, sp, parents, i)
> > > >  			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
> > >
> > > Isn't this the reason we prefer int to bool?
> > >
> > > Not sure people like to use |= with boolean.
> > >
> > 
> > Why not?
> > 
>
> The code "bitwise OR assignment" is assuming the internal representations
> of true and false: true=1, false=0.

No, it doesn't.  |= converts the result back to bool.

In fact it's better than

   int x;
   ...
   x |= some_value() & MASK;

   since MASK might be of a type longer than int, and the result can be
truncated.  With bool |=, it cannot.

Disassembly of section .text:

0000000000000000 <f>:

static bool x;

void f(long y)
{
    x |= y;
   0:   0f b6 05 00 00 00 00    movzbl 0x0(%rip),%eax        # 7 <f+0x7>
                        3: R_X86_64_PC32        .bss-0x4
   7:   48 09 c7                or     %rax,%rdi
   a:   0f 95 05 00 00 00 00    setne  0x0(%rip)        # 11 <f+0x11>
                        d: R_X86_64_PC32        .bss-0x4
}
  11:   c3                      retq  

The corresponding code with 'int x' would just store the truncated
result back into x.
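
For illustration only, here is a user-space sketch of the truncation
described above (MASK here is a made-up flag above the int range, not
anything from the patch):

#include <stdbool.h>
#include <stdio.h>

#define MASK (1UL << 40)		/* a flag above the int range */

static unsigned long some_value(void)
{
	return MASK;			/* pretend the flag is set */
}

int main(void)
{
	int  x_int  = 0;
	bool x_bool = false;

	x_int  |= some_value() & MASK;	/* converted back to int: the set bit
					   is above bit 31, so x_int stays 0
					   (implementation-defined truncation) */
	x_bool |= some_value() & MASK;	/* converted back to bool: any nonzero
					   value becomes 1 */

	printf("int: %d  bool: %d\n", x_int, x_bool);
	return 0;
}

Note also that in the loop quoted above, using || instead would
short-circuit and skip the rmap_write_protect() call once protected became
true, so the |= form is what keeps the side effect on every sp.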


> I might have seen some point if it had been
> 	protected = protected || rmap_...
>
>
> But the real question is whether there is any point in re-writing completely
> correct C code: there are tons of int like this in the kernel code.
>
> __rmap_write_protect() was introduced recently, so if this conversion is
> really worthwhile, I should have been told to use bool at that time, no?

It's up to developer and maintainer preference.  I like bool since it
documents the usage and is safer, but sometimes I miss it on review.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect
  2012-04-16 14:28         ` Avi Kivity
@ 2012-04-16 15:54           ` Takuya Yoshikawa
  0 siblings, 0 replies; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-16 15:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, Marcelo Tosatti, LKML, KVM

On Mon, 16 Apr 2012 17:28:32 +0300
Avi Kivity <avi@redhat.com> wrote:

> > But the real question is whether there is any point in re-writing completely
> > correct C code: there are tons of int like this in the kernel code.
> >
> > __rmap_write_protect() was introduced recently, so if this conversion is
> > really worthwhile, I should have been told to use bool at that time, no?
> 
> It's up to developer and maintainer preference.  I like bool since it
> documents the usage and is safer, but sometimes I miss it on review.

OK.

Thank you for your explanation!

	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fask check whether page is writable
  2012-04-16 11:47           ` Avi Kivity
@ 2012-04-17  3:55             ` Xiao Guangrong
  2012-04-17  7:41               ` Avi Kivity
  0 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-17  3:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 04/16/2012 07:47 PM, Avi Kivity wrote:

> On 04/16/2012 01:20 PM, Xiao Guangrong wrote:
>>>>
>>>> It is used to avoid the unnecessary overload 
>>>
>>> It's overloading me :(
>>>
>>
>>
>> Sorry.
>>
> 
> The trick is to send those in separate patchset so the maintainer
> doesn't notice.
> 


Thanks for your suggestion, I will pay more attention to this in the
future.

For this patch, what did you mean by "those"? Do you mean the whole of
rmap.PTE_LIST_WP_BIT (the fast check for shadow page table write protection
and host write protection), or just host_page_write_protect
(for KSM only)?

If we do not have rmap.PTE_LIST_WP_BIT, there may be a regression on the
shadow mmu.

Hmm, do I need to implement rmap.PTE_LIST_WP_BIT first, and then fast page fault?


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fask check whether page is writable
  2012-04-17  3:55             ` Xiao Guangrong
@ 2012-04-17  7:41               ` Avi Kivity
  2012-04-17 12:10                 ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Avi Kivity @ 2012-04-17  7:41 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 04/17/2012 06:55 AM, Xiao Guangrong wrote:
> On 04/16/2012 07:47 PM, Avi Kivity wrote:
>
> > On 04/16/2012 01:20 PM, Xiao Guangrong wrote:
> >>>>
> >>>> It is used to avoid the unnecessary overload 
> >>>
> >>> It's overloading me :(
> >>>
> >>
> >>
> >> Sorry.
> >>
> > 
> > The trick is to send those in separate patchset so the maintainer
> > doesn't notice.
> > 
>
>
> Thanks for your suggestion, i will pay more attention on it in the
> further.
>
> For this patch, what did you mean of "those"? You mean the whole
> rmap.PTE_LIST_WP_BIT (fast check for shadow page table write protection
> and host write protection) or just about host_page_write_protect
> (for KSM only)?

All of it.  Let's start with just modifying sptes concurrently and only
later add reading bits from rmap concurrently, if it proves necessary.

>
> If we do not have rmap.PTE_LIST_WP_BIT, there may have regression on
> shadow mmu.
>
> Hmm, do i need implement rmap.PTE_LIST_WP_BIT, then fast page fault?

Let's try to measure the effect without rmap.PTE_LIST_WP_BIT.  Usually
PTE chains for page tables are short, so the effect should be small.  Of
course we can't tell for all guests.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 10/16] KVM: MMU: fask check whether page is writable
  2012-04-17  7:41               ` Avi Kivity
@ 2012-04-17 12:10                 ` Xiao Guangrong
  0 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-17 12:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 04/17/2012 03:41 PM, Avi Kivity wrote:

> On 04/17/2012 06:55 AM, Xiao Guangrong wrote:
>> On 04/16/2012 07:47 PM, Avi Kivity wrote:
>>
>>> On 04/16/2012 01:20 PM, Xiao Guangrong wrote:
>>>>>>
>>>>>> It is used to avoid the unnecessary overload 
>>>>>
>>>>> It's overloading me :(
>>>>>
>>>>
>>>>
>>>> Sorry.
>>>>
>>>
>>> The trick is to send those in separate patchset so the maintainer
>>> doesn't notice.
>>>
>>
>>
>> Thanks for your suggestion, i will pay more attention on it in the
>> further.
>>
>> For this patch, what did you mean of "those"? You mean the whole
>> rmap.PTE_LIST_WP_BIT (fast check for shadow page table write protection
>> and host write protection) or just about host_page_write_protect
>> (for KSM only)?
> 
> All of it.  Let's start with just modifying sptes concurrently and only
> later add reading bits from rmap concurrently, if it proves necessary.
> 


Okay, I agree.

>>
>> If we do not have rmap.PTE_LIST_WP_BIT, there may have regression on
>> shadow mmu.
>>
>> Hmm, do i need implement rmap.PTE_LIST_WP_BIT, then fast page fault?
> 
> Let's try to measure the effect without rmap.PTE_LIST_WP_BIT.  Usually
> PTE chains for page tables are short so the effect would be small.  Of
> course we can't tell about all guest.
> 


It is not about the rmap's sptes, it is about sp.sync write protection: if a synced
sp is written, the fast page fault path will be triggered even with no migration and
no framebuffer.

I have done a quick kernbench test, run 10 times and averaged, without X Window:

keeping rmap.PTE_LIST_WP_BIT: 53.494
commenting out the rmap.PTE_LIST_WP_BIT check in page_fault_can_be_fast: 53.948

Anyway, for the sake of review, let's get fast page fault in first and discuss this
in a separate patchset later.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-16  3:36     ` Xiao Guangrong
@ 2012-04-17 14:47       ` Takuya Yoshikawa
  2012-04-18  4:01         ` Xiao Guangrong
  2012-04-18 10:03         ` Xiao Guangrong
  0 siblings, 2 replies; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-17 14:47 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Mon, 16 Apr 2012 11:36:25 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> I tested it with kernbench, no regression is found.

Because kernbench is not a good test for this at all.

> It is not a problem since the iter and spte should be in the cache.

I have checked dirty-log-perf myself with this patch [01-07].

GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower.

This is not a pure cleanup and introduces a significant regression in
rmap handling.

This is the reason why I asked you to remove the "cleanup patches" from this
patch series.  Had they been really trivial cleanups, I would not have
asked.


Note: if you had checked the worst-case latency with this patch series,
you would have noticed this regression yourself.

Autotest and kernbench are not enough to see the effect of this work.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault
  2012-04-13 10:16 ` [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
@ 2012-04-18  1:47   ` Marcelo Tosatti
  2012-04-18  3:53     ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Marcelo Tosatti @ 2012-04-18  1:47 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Fri, Apr 13, 2012 at 06:16:33PM +0800, Xiao Guangrong wrote:
> If the the present bit of page fault error code is set, it indicates
> the shadow page is populated on all levels, it means what we do is
> only modify the access bit which can be done out of mmu-lock
> 
> Currently, in order to simplify the code, we only fix the page fault
> caused by write-protect on the fast path
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c         |  205 ++++++++++++++++++++++++++++++++++++++++----
>  arch/x86/kvm/paging_tmpl.h |    3 +
>  2 files changed, 192 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index efa5d59..fc91667 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -446,6 +446,13 @@ static bool __check_direct_spte_mmio_pf(u64 spte)
>  }
>  #endif
> 
> +static bool spte_wp_by_dirty_log(u64 spte)
> +{
> +	WARN_ON(is_writable_pte(spte));
> +
> +	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
> +}
> +
>  static bool spte_has_volatile_bits(u64 spte)
>  {
>  	if (!shadow_accessed_mask)
> @@ -454,9 +461,18 @@ static bool spte_has_volatile_bits(u64 spte)
>  	if (!is_shadow_present_pte(spte))
>  		return false;
> 
> -	if ((spte & shadow_accessed_mask) &&
> -	      (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
> -		return false;
> +	if (spte & shadow_accessed_mask) {
> +		if (is_writable_pte(spte))
> +			return !(spte & shadow_dirty_mask);
> +
> +		/*
> +		 * If the spte is write-protected by dirty-log, it may
> +		 * be marked writable on fast page fault path, then CPU
> +		 * can modify the Dirty bit.
> +		 */
> +		if (!spte_wp_by_dirty_log(spte))
> +			return false;
> +	}
> 
>  	return true;
>  }
> @@ -1109,26 +1125,18 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
>  		rmap_remove(kvm, sptep);
>  }
> 
> -static bool spte_wp_by_dirty_log(u64 spte)
> -{
> -	WARN_ON(is_writable_pte(spte));
> -
> -	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
> -}
> -
>  static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
>  			       bool *flush, bool page_table_protect)
>  {
>  	u64 spte = *sptep;
> 
>  	if (is_writable_pte(spte)) {
> -		*flush |= true;
> -
>  		if (large) {
>  			pgprintk("rmap_write_protect(large): spte %p %llx\n",
>  				 spte, *spte);
>  			BUG_ON(!is_large_pte(spte));
> 
> +			*flush |= true;
>  			drop_spte(kvm, sptep);
>  			--kvm->stat.lpages;
>  			return;
> @@ -1137,6 +1145,9 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
>  		goto reset_spte;
>  	}
> 
> +	/* We need flush tlbs in this case: the fast page fault path
> +	 * can mark the spte writable after we read the sptep.
> +	 */
>  	if (page_table_protect && spte_wp_by_dirty_log(spte))
>  		goto reset_spte;
> 
> @@ -1144,6 +1155,8 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> 
>  reset_spte:
>  	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
> +
> +	*flush |= true;
>  	spte = spte & ~PT_WRITABLE_MASK;
>  	if (page_table_protect)
>  		spte |= SPTE_WRITE_PROTECT;
> @@ -2767,18 +2780,172 @@ exit:
>  	return ret;
>  }
> 
> +static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, gfn_t gfn,
> +				   u32 error_code)
> +{
> +	unsigned long *rmap;
> +
> +	/*
> +	 * #PF can be fast only if the shadow page table is present and it
> +	 * is caused by write-protect, that means we just need change the
> +	 * W bit of the spte which can be done out of mmu-lock.
> +	 */
> +	if (!(error_code & PFERR_PRESENT_MASK) ||
> +	      !(error_code & PFERR_WRITE_MASK))
> +		return false;
> +
> +	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
> +
> +	/* Quickly check the page can be writable. */
> +	if (test_bit(PTE_LIST_WP_BIT, ACCESS_ONCE(rmap)))
> +		return false;
> +
> +	return true;
> +}
> +
> +static bool
> +fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> +			  u64 *sptep, u64 spte, gfn_t gfn)
> +{
> +	pfn_t pfn;
> +	bool ret = false;
> +
> +	/*
> +	 * For the indirect spte, it is hard to get a stable gfn since
> +	 * we just use a cmpxchg to avoid all the races which is not
> +	 * enough to avoid the ABA problem: the host can arbitrarily
> +	 * change spte and the mapping from gfn to pfh.
> +	 *
> +	 * What we do is call gfn_to_pfn_atomic to bind the gfn and the
> +	 * pfn because after the call:
> +	 * - we have held the refcount of pfn that means the pfn can not
> +	 *   be freed and be reused for another gfn.
> +	 * - the pfn is writable that means it can not be shared by different
> +	 *   gfn.
> +	 */
> +	pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);

Please document what can happen in parallel whenever you manipulate
sptes without mmu_lock held, convincing the reader that this is safe.

> +
> +/*
> + * Return value:
> + * - true: let the vcpu to access on the same address again.
> + * - false: let the real page fault path to fix it.
> + */
> +static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
> +			    int level, u32 error_code)
> +{
> +	struct kvm_shadow_walk_iterator iterator;
> +	struct kvm_mmu_page *sp;
> +	bool ret = false;
> +	u64 spte = 0ull;
> +
> +	if (!page_fault_can_be_fast(vcpu, gfn, error_code))
> +		return false;

What prevents kvm_mmu_commit_zap_page from nuking the "shadow_page"
here, again? At this point the faulting spte could be zero (!present), 
and you have not yet increased reader_counter.

Same with current users of walk_shadow_page_lockless_begin.

I agree with Takuya: please reduce the size of the patchset to only what
is strictly necessary; it appears many of the first patches are not.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault
  2012-04-18  1:47   ` Marcelo Tosatti
@ 2012-04-18  3:53     ` Xiao Guangrong
  2012-04-18 23:08       ` Marcelo Tosatti
  0 siblings, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-18  3:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 04/18/2012 09:47 AM, Marcelo Tosatti wrote:

> On Fri, Apr 13, 2012 at 06:16:33PM +0800, Xiao Guangrong wrote:
>> If the the present bit of page fault error code is set, it indicates
>> the shadow page is populated on all levels, it means what we do is
>> only modify the access bit which can be done out of mmu-lock
>>
>> Currently, in order to simplify the code, we only fix the page fault
>> caused by write-protect on the fast path
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c         |  205 ++++++++++++++++++++++++++++++++++++++++----
>>  arch/x86/kvm/paging_tmpl.h |    3 +
>>  2 files changed, 192 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index efa5d59..fc91667 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -446,6 +446,13 @@ static bool __check_direct_spte_mmio_pf(u64 spte)
>>  }
>>  #endif
>>
>> +static bool spte_wp_by_dirty_log(u64 spte)
>> +{
>> +	WARN_ON(is_writable_pte(spte));
>> +
>> +	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
>> +}
>> +
>>  static bool spte_has_volatile_bits(u64 spte)
>>  {
>>  	if (!shadow_accessed_mask)
>> @@ -454,9 +461,18 @@ static bool spte_has_volatile_bits(u64 spte)
>>  	if (!is_shadow_present_pte(spte))
>>  		return false;
>>
>> -	if ((spte & shadow_accessed_mask) &&
>> -	      (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
>> -		return false;
>> +	if (spte & shadow_accessed_mask) {
>> +		if (is_writable_pte(spte))
>> +			return !(spte & shadow_dirty_mask);
>> +
>> +		/*
>> +		 * If the spte is write-protected by dirty-log, it may
>> +		 * be marked writable on fast page fault path, then CPU
>> +		 * can modify the Dirty bit.
>> +		 */
>> +		if (!spte_wp_by_dirty_log(spte))
>> +			return false;
>> +	}
>>
>>  	return true;
>>  }
>> @@ -1109,26 +1125,18 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
>>  		rmap_remove(kvm, sptep);
>>  }
>>
>> -static bool spte_wp_by_dirty_log(u64 spte)
>> -{
>> -	WARN_ON(is_writable_pte(spte));
>> -
>> -	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
>> -}
>> -
>>  static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
>>  			       bool *flush, bool page_table_protect)
>>  {
>>  	u64 spte = *sptep;
>>
>>  	if (is_writable_pte(spte)) {
>> -		*flush |= true;
>> -
>>  		if (large) {
>>  			pgprintk("rmap_write_protect(large): spte %p %llx\n",
>>  				 spte, *spte);
>>  			BUG_ON(!is_large_pte(spte));
>>
>> +			*flush |= true;
>>  			drop_spte(kvm, sptep);
>>  			--kvm->stat.lpages;
>>  			return;
>> @@ -1137,6 +1145,9 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
>>  		goto reset_spte;
>>  	}
>>
>> +	/* We need flush tlbs in this case: the fast page fault path
>> +	 * can mark the spte writable after we read the sptep.
>> +	 */
>>  	if (page_table_protect && spte_wp_by_dirty_log(spte))
>>  		goto reset_spte;
>>
>> @@ -1144,6 +1155,8 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
>>
>>  reset_spte:
>>  	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
>> +
>> +	*flush |= true;
>>  	spte = spte & ~PT_WRITABLE_MASK;
>>  	if (page_table_protect)
>>  		spte |= SPTE_WRITE_PROTECT;
>> @@ -2767,18 +2780,172 @@ exit:
>>  	return ret;
>>  }
>>
>> +static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, gfn_t gfn,
>> +				   u32 error_code)
>> +{
>> +	unsigned long *rmap;
>> +
>> +	/*
>> +	 * #PF can be fast only if the shadow page table is present and it
>> +	 * is caused by write-protect, that means we just need change the
>> +	 * W bit of the spte which can be done out of mmu-lock.
>> +	 */
>> +	if (!(error_code & PFERR_PRESENT_MASK) ||
>> +	      !(error_code & PFERR_WRITE_MASK))
>> +		return false;
>> +
>> +	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
>> +
>> +	/* Quickly check the page can be writable. */
>> +	if (test_bit(PTE_LIST_WP_BIT, ACCESS_ONCE(rmap)))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +static bool
>> +fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>> +			  u64 *sptep, u64 spte, gfn_t gfn)
>> +{
>> +	pfn_t pfn;
>> +	bool ret = false;
>> +
>> +	/*
>> +	 * For the indirect spte, it is hard to get a stable gfn since
>> +	 * we just use a cmpxchg to avoid all the races which is not
>> +	 * enough to avoid the ABA problem: the host can arbitrarily
>> +	 * change spte and the mapping from gfn to pfh.
>> +	 *
>> +	 * What we do is call gfn_to_pfn_atomic to bind the gfn and the
>> +	 * pfn because after the call:
>> +	 * - we have held the refcount of pfn that means the pfn can not
>> +	 *   be freed and be reused for another gfn.
>> +	 * - the pfn is writable that means it can not be shared by different
>> +	 *   gfn.
>> +	 */
>> +	pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
> 
> Please document what can happen in parallel whenever you manipulate
> sptes without mmu_lock held, convincing the reader that this is safe.
> 


OK, I will document it in locking.txt.

I am not good at documenting; how about the description below?

What we use to avoid all the races are spte.SPTE_ALLOW_WRITE and spte.SPTE_WRITE_PROTECT:
SPTE_ALLOW_WRITE means the gfn is writable on both guest and host, and SPTE_WRITE_PROTECT
means this gfn is write-protected by shadow page table protection.

On the fast page fault path, we atomically set the spte.W bit if
spte.SPTE_ALLOW_WRITE = 1 and spte.SPTE_WRITE_PROTECT = 0; this is safe because any
change to these bits can be detected by the cmpxchg.

But we need to carefully check the mapping from gfn to pfn, since the cmpxchg can only
ensure the pfn recorded in the spte is unchanged, not the gfn it is mapped from. This is
an ABA problem; for example, the case below can happen:

At the beginning:
gpte = gfn1
gfn1 is mapped to pfn1 on host
spte is the shadow page table entry corresponding with gpte and
spte = pfn1

   VCPU 0                                           VCPU 1
on page fault path:

   old_spte = *spte;
                                                 pfn1 is swapped out:
                                                             spte = 0;
                                                 pfn1 is re-alloced for gfn2
                                                 gpte is changed by host, and pointing to gfn2:
                                                             spte = pfn1;

   if (cmpxchg(spte, old_spte, old_spte+W))
	mark_page_dirty(vcpu->kvm, gfn1)
  	OOPS!!!
   we dirty-log for gfn1, which means the dirty information for gfn2 is lost from the dirty bitmap.

For a direct sp, we can easily avoid it since the spte of a direct sp is fixed to its gfn.
For an indirect sp, before we do the cmpxchg, we call gfn_to_pfn_atomic to pin the gfn to
the pfn, because after gfn_to_pfn_atomic:
- we hold a reference on the pfn, which means the pfn cannot
  be freed and reused for another gfn.
- the pfn is writable, which means it cannot be shared between different
  gfns by KSM.
Then we can ensure the dirty bitmap is correctly set for the gfn.
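
As a rough sketch only (this is a paraphrase of the flow above, not the code
actually posted in this series; helper names such as gfn_to_pfn_atomic,
spte_to_pfn, is_error_pfn, mark_page_dirty and kvm_release_pfn_clean are
assumed from the existing KVM code, and error handling/levels are omitted),
the indirect case would look something like:

static bool fast_pf_fix_indirect_spte_sketch(struct kvm_vcpu *vcpu,
					     u64 *sptep, u64 old_spte,
					     gfn_t gfn)
{
	u64 new_spte = old_spte | PT_WRITABLE_MASK;
	pfn_t pfn;
	bool ret = false;

	/* Pin gfn -> pfn so the page cannot be freed and reused for
	 * another gfn while we flip the W bit (the ABA case above). */
	pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
	if (is_error_pfn(pfn))
		return false;

	/* If the gfn no longer maps the pfn recorded in the spte, the
	 * mapping changed under us: fall back to the slow path. */
	if (pfn != spte_to_pfn(old_spte))
		goto out;

	/* Only the W bit is set; any concurrent change to the spte,
	 * including SPTE_ALLOW_WRITE/SPTE_WRITE_PROTECT, makes the
	 * cmpxchg fail. */
	if (cmpxchg64(sptep, old_spte, new_spte) == old_spte) {
		mark_page_dirty(vcpu->kvm, gfn);
		ret = true;
	}

out:
	kvm_release_pfn_clean(pfn);
	return ret;
}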

>> +
>> +/*
>> + * Return value:
>> + * - true: let the vcpu to access on the same address again.
>> + * - false: let the real page fault path to fix it.
>> + */
>> +static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
>> +			    int level, u32 error_code)
>> +{
>> +	struct kvm_shadow_walk_iterator iterator;
>> +	struct kvm_mmu_page *sp;
>> +	bool ret = false;
>> +	u64 spte = 0ull;
>> +
>> +	if (!page_fault_can_be_fast(vcpu, gfn, error_code))
>> +		return false;
> 
> What prevents kvm_mmu_commit_zap_page from nukeing the "shadow_page"
> here, again? At this point the faulting spte could be zero (!present), 
> and you have not yet increased reader_counter.
> 


Hmm, do you mean page_fault_can_be_fast? We do not use any spte in this function,
so kvm_mmu_commit_zap_page can happily zap the shadow page.

> Same with current users of walk_shadow_page_lockless_begin.


After walk_shadow_page_lockless_begin, it is safe since reader_counter has been
increased:

static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
{
	rcu_read_lock();
	atomic_inc(&vcpu->kvm->arch.reader_counter);

	/* Increase the counter before walking shadow page table */
	smp_mb__after_atomic_inc();
}
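
For completeness, the matching end helper drops the counter before
rcu_read_unlock; quoting roughly from memory, so details may differ:

static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
{
	/* Decrease the counter after walking shadow page table finished */
	smp_mb__before_atomic_dec();
	atomic_dec(&vcpu->kvm->arch.reader_counter);
	rcu_read_unlock();
}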

Is that not enough?

 
> I agree with Takuya, please reduce the size of the patchset to only to
> what is strictly necessary, it appears many of the first patches are
> not.


I will post a simplified patchset soon, of course after collecting your comments
from this mail. ;)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-17 14:47       ` Takuya Yoshikawa
@ 2012-04-18  4:01         ` Xiao Guangrong
  2012-04-21  1:01           ` Takuya Yoshikawa
  2012-04-18 10:03         ` Xiao Guangrong
  1 sibling, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-18  4:01 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/17/2012 10:47 PM, Takuya Yoshikawa wrote:

> On Mon, 16 Apr 2012 11:36:25 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> I tested it with kernbench, no regression is found.
> 
> Because kernbench is not at all good test for this.
> 
>> It is not a problem since the iter and spte should be in the cache.
> 
> I have checked dirty-log-perf myself with this patch [01-07].
> 
> GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower.
> 


Thanks for your test!

Unbelievable; I will do more tests and check it more carefully.

Could you please release your tool, so that I can reproduce the result and find the
real reason?

> 
> Note: if you had checked the worst case latency with this patch series,
> you should have noticed this regression by yourself.
> 
> Auto-test and kernbench are not enough to see the effect of this work.
> 


I will check whether your tool is better than kernbench/autotest after
reviewing it.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-17 14:47       ` Takuya Yoshikawa
  2012-04-18  4:01         ` Xiao Guangrong
@ 2012-04-18 10:03         ` Xiao Guangrong
  2012-04-21  1:03           ` Takuya Yoshikawa
  1 sibling, 1 reply; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-18 10:03 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/17/2012 10:47 PM, Takuya Yoshikawa wrote:

> On Mon, 16 Apr 2012 11:36:25 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> I tested it with kernbench, no regression is found.
> 
> Because kernbench is not at all good test for this.
> 
>> It is not a problem since the iter and spte should be in the cache.
> 
> I have checked dirty-log-perf myself with this patch [01-07].
> 
> GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower.


By the way, which case did you test? ept = 1?


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault
  2012-04-18  3:53     ` Xiao Guangrong
@ 2012-04-18 23:08       ` Marcelo Tosatti
  0 siblings, 0 replies; 49+ messages in thread
From: Marcelo Tosatti @ 2012-04-18 23:08 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Wed, Apr 18, 2012 at 11:53:51AM +0800, Xiao Guangrong wrote:
> On 04/18/2012 09:47 AM, Marcelo Tosatti wrote:
> 
> > On Fri, Apr 13, 2012 at 06:16:33PM +0800, Xiao Guangrong wrote:
> >> If the the present bit of page fault error code is set, it indicates
> >> the shadow page is populated on all levels, it means what we do is
> >> only modify the access bit which can be done out of mmu-lock
> >>
> >> Currently, in order to simplify the code, we only fix the page fault
> >> caused by write-protect on the fast path
> >>
> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> ---
> >>  arch/x86/kvm/mmu.c         |  205 ++++++++++++++++++++++++++++++++++++++++----
> >>  arch/x86/kvm/paging_tmpl.h |    3 +
> >>  2 files changed, 192 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index efa5d59..fc91667 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -446,6 +446,13 @@ static bool __check_direct_spte_mmio_pf(u64 spte)
> >>  }
> >>  #endif
> >>
> >> +static bool spte_wp_by_dirty_log(u64 spte)
> >> +{
> >> +	WARN_ON(is_writable_pte(spte));
> >> +
> >> +	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
> >> +}
> >> +
> >>  static bool spte_has_volatile_bits(u64 spte)
> >>  {
> >>  	if (!shadow_accessed_mask)
> >> @@ -454,9 +461,18 @@ static bool spte_has_volatile_bits(u64 spte)
> >>  	if (!is_shadow_present_pte(spte))
> >>  		return false;
> >>
> >> -	if ((spte & shadow_accessed_mask) &&
> >> -	      (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
> >> -		return false;
> >> +	if (spte & shadow_accessed_mask) {
> >> +		if (is_writable_pte(spte))
> >> +			return !(spte & shadow_dirty_mask);
> >> +
> >> +		/*
> >> +		 * If the spte is write-protected by dirty-log, it may
> >> +		 * be marked writable on fast page fault path, then CPU
> >> +		 * can modify the Dirty bit.
> >> +		 */
> >> +		if (!spte_wp_by_dirty_log(spte))
> >> +			return false;
> >> +	}
> >>
> >>  	return true;
> >>  }
> >> @@ -1109,26 +1125,18 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
> >>  		rmap_remove(kvm, sptep);
> >>  }
> >>
> >> -static bool spte_wp_by_dirty_log(u64 spte)
> >> -{
> >> -	WARN_ON(is_writable_pte(spte));
> >> -
> >> -	return (spte & SPTE_ALLOW_WRITE) && !(spte & SPTE_WRITE_PROTECT);
> >> -}
> >> -
> >>  static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> >>  			       bool *flush, bool page_table_protect)
> >>  {
> >>  	u64 spte = *sptep;
> >>
> >>  	if (is_writable_pte(spte)) {
> >> -		*flush |= true;
> >> -
> >>  		if (large) {
> >>  			pgprintk("rmap_write_protect(large): spte %p %llx\n",
> >>  				 spte, *spte);
> >>  			BUG_ON(!is_large_pte(spte));
> >>
> >> +			*flush |= true;
> >>  			drop_spte(kvm, sptep);
> >>  			--kvm->stat.lpages;
> >>  			return;
> >> @@ -1137,6 +1145,9 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> >>  		goto reset_spte;
> >>  	}
> >>
> >> +	/* We need flush tlbs in this case: the fast page fault path
> >> +	 * can mark the spte writable after we read the sptep.
> >> +	 */
> >>  	if (page_table_protect && spte_wp_by_dirty_log(spte))
> >>  		goto reset_spte;
> >>
> >> @@ -1144,6 +1155,8 @@ static void spte_write_protect(struct kvm *kvm, u64 *sptep, bool large,
> >>
> >>  reset_spte:
> >>  	rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
> >> +
> >> +	*flush |= true;
> >>  	spte = spte & ~PT_WRITABLE_MASK;
> >>  	if (page_table_protect)
> >>  		spte |= SPTE_WRITE_PROTECT;
> >> @@ -2767,18 +2780,172 @@ exit:
> >>  	return ret;
> >>  }
> >>
> >> +static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, gfn_t gfn,
> >> +				   u32 error_code)
> >> +{
> >> +	unsigned long *rmap;
> >> +
> >> +	/*
> >> +	 * #PF can be fast only if the shadow page table is present and it
> >> +	 * is caused by write-protect, that means we just need change the
> >> +	 * W bit of the spte which can be done out of mmu-lock.
> >> +	 */
> >> +	if (!(error_code & PFERR_PRESENT_MASK) ||
> >> +	      !(error_code & PFERR_WRITE_MASK))
> >> +		return false;
> >> +
> >> +	rmap = gfn_to_rmap(vcpu->kvm, gfn, PT_PAGE_TABLE_LEVEL);
> >> +
> >> +	/* Quickly check the page can be writable. */
> >> +	if (test_bit(PTE_LIST_WP_BIT, ACCESS_ONCE(rmap)))
> >> +		return false;
> >> +
> >> +	return true;
> >> +}
> >> +
> >> +static bool
> >> +fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> >> +			  u64 *sptep, u64 spte, gfn_t gfn)
> >> +{
> >> +	pfn_t pfn;
> >> +	bool ret = false;
> >> +
> >> +	/*
> >> +	 * For the indirect spte, it is hard to get a stable gfn since
> >> +	 * we just use a cmpxchg to avoid all the races which is not
> >> +	 * enough to avoid the ABA problem: the host can arbitrarily
> >> +	 * change spte and the mapping from gfn to pfh.
> >> +	 *
> >> +	 * What we do is call gfn_to_pfn_atomic to bind the gfn and the
> >> +	 * pfn because after the call:
> >> +	 * - we have held the refcount of pfn that means the pfn can not
> >> +	 *   be freed and be reused for another gfn.
> >> +	 * - the pfn is writable that means it can not be shared by different
> >> +	 *   gfn.
> >> +	 */
> >> +	pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
> > 
> > Please document what can happen in parallel whenever you manipulate
> > sptes without mmu_lock held, convincing the reader that this is safe.
> > 
> 
> 
> OK, I will documents it in locking.txt
> 
> I am not good at documenting, How about the below description?
> 
> What we use to avoid all the race is spte.SPTE_ALLOW_WRITE and spte.SPTE_WRITE_PROTECT:
> SPTE_ALLOW_WRITE means the gfn is writable on both guest and host, and SPTE_ALLOW_WRITE
> means this gfn is not write-protected for shadow page write protection.
> 
> On fast page fault path, we will atomically set the spte.W bit if
> spte.SPTE_WRITE_PROTECT = 1 and spte.SPTE_WRITE_PROTECT = 0, this is safe because whenever
> changing these bits can be detected by cmpxchg.
> 
> But we need carefully check the mapping between gfn to pfn since we can only ensure the pfn
> is not changed during cmpxchg. This is a ABA problem, for example, below case will happen:
> 
> At the beginning:
> gpte = gfn1
> gfn1 is mapped to pfn1 on host
> spte is the shadow page table entry corresponding with gpte and
> spte = pfn1
> 
>    VCPU 0                                           VCPU0
> on page fault path:
> 
>    old_spte = *spte;
>                                                  pfn1 is swapped out:
>                                                              spte = 0;
>                                                  pfn1 is re-alloced for gfn2
>                                                  gpte is changed by host, and pointing to gfn2:
>                                                              spte = pfn1;
> 
>    if (cmpxchg(spte, old_spte, old_spte+W)
> 	mark_page_dirty(vcpu->kvm, gfn1)
>   	OOPS!!!
>    we dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.
> 
> For direct sp, we can easily avoid it since the spte of direct sp is fixed to gfn.
> For indirect sp, before we do cmpxchg, we call gfn_to_pfn_atomic to pin gfn to pfn,
> because after gfn_to_pfn_atomic:
> - we have held the refcount of pfn that means the pfn can not
>   be freed and be reused for another gfn.
> - the pfn is writable that means it can not be shared by different
>   gfn by KSM.
> Then, we can ensure the dirty bitmaps is correctly set for a gfn.

This is one possible scenario, OK. If you can list all possibilities,
better.

We need to check every possible case carefully.

> > Same with current users of walk_shadow_page_lockless_begin.
> 
> 
> After walk_shadow_page_lockless_begin, it is safe since reader_counter has been
> increased:
> 
> static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
> {
> 	rcu_read_lock();
> 	atomic_inc(&vcpu->kvm->arch.reader_counter);
> 
> 	/* Increase the counter before walking shadow page table */
> 	smp_mb__after_atomic_inc();
> }
> 
> It is not enough?

It is; I don't know what I was talking about.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-18  4:01         ` Xiao Guangrong
@ 2012-04-21  1:01           ` Takuya Yoshikawa
  2012-04-21  4:36             ` Xiao Guangrong
  0 siblings, 1 reply; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-21  1:01 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Sorry for the delay.

On Wed, 18 Apr 2012 12:01:03 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> > I have checked dirty-log-perf myself with this patch [01-07].
> > 
> > GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower.

> Thanks for your test!
> 
> Unbelievable, i will do more test and check it more carefully.

GET_DIRTY_LOG now traverses rmap lists intensively.
So only a small change can affect the performance when there are many
dirty pages.

> Could you please open your tool, then i can reproduction it and find the
> real reason?

It's already in kvm unit tests!

> I will check whether your tool is better then kernbench/autotest after
> review your tool.

Let's focus on "lock-less" for now, so dirty-log-perf is not needed yet.

I think you failed to convey the real advantage of your "lock-less" approach!
I will write about this in the v3 threads.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-18 10:03         ` Xiao Guangrong
@ 2012-04-21  1:03           ` Takuya Yoshikawa
  0 siblings, 0 replies; 49+ messages in thread
From: Takuya Yoshikawa @ 2012-04-21  1:03 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Wed, 18 Apr 2012 18:03:10 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> By the way, what is the case did you test? ept = 1 ?

Yes!

I am not worried about migration without EPT/NPT.

	Takuya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte
  2012-04-21  1:01           ` Takuya Yoshikawa
@ 2012-04-21  4:36             ` Xiao Guangrong
  0 siblings, 0 replies; 49+ messages in thread
From: Xiao Guangrong @ 2012-04-21  4:36 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Xiao Guangrong, Avi Kivity, Marcelo Tosatti, LKML, KVM

On 04/21/2012 09:01 AM, Takuya Yoshikawa wrote:

> Sorry for the delay.
> 
> On Wed, 18 Apr 2012 12:01:03 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>>> I have checked dirty-log-perf myself with this patch [01-07].
>>>
>>> GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower.
> 
>> Thanks for your test!
>>
>> Unbelievable, i will do more test and check it more carefully.
> 
> GET_DIRTY_LOG now traverses rmap lists intensively.
> So only a small change can affect the performance when there are many
> dirty pages.
> 
>> Could you please open your tool, then i can reproduction it and find the
>> real reason?
> 
> It's already in kvm unit tests!
> 


Great, I will check that.

>> I will check whether your tool is better then kernbench/autotest after
>> review your tool.
> 
> Let's focus on "lock-less" now: so dirty-log-perf is not needed now.
> 
> I think you failed to appeal the real advantage of your "lock-less" approach!
> I will write about this on v3-threads.


Thank you very much, Takuya!

^ permalink raw reply	[flat|nested] 49+ messages in thread

Thread overview: 49+ messages
2012-04-13 10:05 [PATCH v2 00/16] KVM: MMU: fast page fault Xiao Guangrong
2012-04-13 10:09 ` [PATCH v2 01/16] KVM: MMU: cleanup __direct_map Xiao Guangrong
2012-04-13 10:10 ` [PATCH v2 02/16] KVM: MMU: introduce mmu_spte_establish Xiao Guangrong
2012-04-13 10:10 ` [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path Xiao Guangrong
2012-04-14  2:15   ` Takuya Yoshikawa
2012-04-16  3:26     ` Xiao Guangrong
2012-04-13 10:11 ` [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect Xiao Guangrong
2012-04-14  2:00   ` Takuya Yoshikawa
2012-04-15 11:25     ` Avi Kivity
2012-04-16 14:14       ` Takuya Yoshikawa
2012-04-16 14:28         ` Avi Kivity
2012-04-16 15:54           ` Takuya Yoshikawa
2012-04-13 10:11 ` [PATCH v2 05/16] KVM: MMU: abstract spte write-protect Xiao Guangrong
2012-04-14  2:26   ` Takuya Yoshikawa
2012-04-16  3:27     ` Xiao Guangrong
2012-04-13 10:12 ` [PATCH v2 06/16] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
2012-04-13 10:12 ` [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte Xiao Guangrong
2012-04-14  2:44   ` Takuya Yoshikawa
2012-04-16  3:36     ` Xiao Guangrong
2012-04-17 14:47       ` Takuya Yoshikawa
2012-04-18  4:01         ` Xiao Guangrong
2012-04-21  1:01           ` Takuya Yoshikawa
2012-04-21  4:36             ` Xiao Guangrong
2012-04-18 10:03         ` Xiao Guangrong
2012-04-21  1:03           ` Takuya Yoshikawa
2012-04-13 10:13 ` [PATCH v2 08/16] KVM: MMU: store more bits in rmap Xiao Guangrong
2012-04-13 10:13 ` [PATCH v2 09/16] KVM: MMU: fast mmu_need_write_protect path for hard mmu Xiao Guangrong
2012-04-13 10:14 ` [PATCH v2 10/16] KVM: MMU: fask check whether page is writable Xiao Guangrong
2012-04-14  3:01   ` Takuya Yoshikawa
2012-04-16  3:38     ` Xiao Guangrong
2012-04-15 15:16   ` Avi Kivity
2012-04-16  3:25     ` Xiao Guangrong
2012-04-16 10:02       ` Avi Kivity
2012-04-16 10:20         ` Xiao Guangrong
2012-04-16 11:47           ` Avi Kivity
2012-04-17  3:55             ` Xiao Guangrong
2012-04-17  7:41               ` Avi Kivity
2012-04-17 12:10                 ` Xiao Guangrong
2012-04-13 10:14 ` [PATCH v2 11/16] KVM: MMU: introduce SPTE_ALLOW_WRITE bit Xiao Guangrong
2012-04-13 10:15 ` [PATCH v2 12/16] KVM: MMU: introduce SPTE_WRITE_PROTECT bit Xiao Guangrong
2012-04-13 10:15 ` [PATCH v2 13/16] KVM: MMU: break sptes write-protect if gfn is writable Xiao Guangrong
2012-04-13 10:16 ` [PATCH v2 14/16] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
2012-04-18  1:47   ` Marcelo Tosatti
2012-04-18  3:53     ` Xiao Guangrong
2012-04-18 23:08       ` Marcelo Tosatti
2012-04-13 10:17 ` [PATCH v2 15/16] KVM: MMU: trace fast " Xiao Guangrong
2012-04-13 10:17 ` [PATCH v2 16/16] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
2012-04-14  3:37 ` [PATCH v2 00/16] KVM: MMU: fast page fault Takuya Yoshikawa
2012-04-16  3:50   ` Xiao Guangrong
