* [PATCH v2 01/12] KVM: MMU: lazily drop large spte
@ 2013-01-23 10:04 Xiao Guangrong
  2013-01-23 10:04 ` [PATCH v2 02/12] KVM: MMU: cleanup mapping-level Xiao Guangrong
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:04 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

Do not drop the large spte until it can be replaced by small pages so that
the guest can happily read memory through it

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   21 ++++++---------------
 1 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9f628f7..0f90269 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1105,7 +1105,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)

 /*
  * Write-protect on the specified @sptep, @pt_protect indicates whether
- * spte writ-protection is caused by protecting shadow page table.
+ * spte write-protection is caused by protecting shadow page table.
  * @flush indicates whether tlb need be flushed.
  *
  * Note: write protection is difference between drity logging and spte
@@ -1114,31 +1114,23 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
  *   its dirty bitmap is properly set.
  * - for spte protection, the spte can be writable only after unsync-ing
  *   shadow page.
- *
- * Return true if the spte is dropped.
  */
-static bool
+static void
 spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
 {
 	u64 spte = *sptep;

 	if (!is_writable_pte(spte) &&
 	      !(pt_protect && spte_is_locklessly_modifiable(spte)))
-		return false;
+		return;

 	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);

-	if (__drop_large_spte(kvm, sptep)) {
-		*flush |= true;
-		return true;
-	}
-
 	if (pt_protect)
 		spte &= ~SPTE_MMU_WRITEABLE;
 	spte = spte & ~PT_WRITABLE_MASK;

 	*flush |= mmu_spte_update(sptep, spte);
-	return false;
 }

 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
@@ -1150,11 +1142,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,

 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
-			sptep = rmap_get_first(*rmapp, &iter);
-			continue;
-		}

+		spte_write_protect(kvm, sptep, &flush, pt_protect);
 		sptep = rmap_get_next(&iter);
 	}

@@ -2611,6 +2600,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			break;
 		}

+		drop_large_spte(vcpu, iterator.sptep);
+
 		if (!is_shadow_present_pte(*iterator.sptep)) {
 			u64 base_addr = iterator.addr;

-- 
1.7.7.6
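
For context, a simplified sketch of the helpers involved (an approximation for
illustration, not quoted from the tree): after this patch the write-protect path
leaves a read-only large spte in place, and the large spte is only dropped at
fault time, right before __direct_map() installs the small-page mapping.

static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
{
	if (is_large_pte(*sptep)) {
		drop_spte(kvm, sptep);	/* zap the large mapping */
		return true;
	}

	return false;
}

static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
{
	/* After this patch, only the fault paths call this. */
	if (__drop_large_spte(vcpu->kvm, sptep))
		kvm_flush_remote_tlbs(vcpu->kvm);
}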



* [PATCH v2 02/12] KVM: MMU: cleanup mapping-level
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
@ 2013-01-23 10:04 ` Xiao Guangrong
  2013-01-23 10:05 ` [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:04 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

Use min() to clean up mapping_level

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0f90269..8dca8af 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -831,8 +831,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 	if (host_level == PT_PAGE_TABLE_LEVEL)
 		return host_level;

-	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
-		kvm_x86_ops->get_lpage_level() : host_level;
+	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);

 	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
 		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
-- 
1.7.7.6



* [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
  2013-01-23 10:04 ` [PATCH v2 02/12] KVM: MMU: cleanup mapping-level Xiao Guangrong
@ 2013-01-23 10:05 ` Xiao Guangrong
  2013-01-29  0:21   ` Marcelo Tosatti
  2013-01-23 10:06 ` [PATCH v2 04/12] KVM: MMU: simplify set_spte Xiao Guangrong
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:05 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

In order to detect spte remapping, we can simply check whether the
spte already points to the pfn, even if the spte is not the last spte,
since a middle spte points to a kernel pfn which cannot be mapped to
userspace

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   28 +++++++---------------------
 1 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8dca8af..a999755 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2407,33 +2407,20 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 int write_fault, int *emulate, int level, gfn_t gfn,
 			 pfn_t pfn, bool speculative, bool host_writable)
 {
-	int was_rmapped = 0;
-	int rmap_count;
+	bool was_rmapped = false;

 	pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
 		 __func__, *sptep, pt_access,
 		 write_fault, gfn);

 	if (is_rmap_spte(*sptep)) {
-		/*
-		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
-		 * the parent of the now unreachable PTE.
-		 */
-		if (level > PT_PAGE_TABLE_LEVEL &&
-		    !is_large_pte(*sptep)) {
-			struct kvm_mmu_page *child;
-			u64 pte = *sptep;
+		if (pfn != spte_to_pfn(*sptep)) {
+			struct kvm_mmu_page *sp = page_header(__pa(sptep));

-			child = page_header(pte & PT64_BASE_ADDR_MASK);
-			drop_parent_pte(child, sptep);
-			kvm_flush_remote_tlbs(vcpu->kvm);
-		} else if (pfn != spte_to_pfn(*sptep)) {
-			pgprintk("hfn old %llx new %llx\n",
-				 spte_to_pfn(*sptep), pfn);
-			drop_spte(vcpu->kvm, sptep);
-			kvm_flush_remote_tlbs(vcpu->kvm);
+			if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
+				kvm_flush_remote_tlbs(vcpu->kvm);
 		} else
-			was_rmapped = 1;
+			was_rmapped = true;
 	}

 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
@@ -2456,8 +2443,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

 	if (is_shadow_present_pte(*sptep)) {
 		if (!was_rmapped) {
-			rmap_count = rmap_add(vcpu, sptep, gfn);
-			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
+			if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
 				rmap_recycle(vcpu, sptep, gfn);
 		}
 	}
-- 
1.7.7.6



* [PATCH v2 04/12] KVM: MMU: simplify set_spte
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
  2013-01-23 10:04 ` [PATCH v2 02/12] KVM: MMU: cleanup mapping-level Xiao Guangrong
  2013-01-23 10:05 ` [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
@ 2013-01-23 10:06 ` Xiao Guangrong
  2013-01-23 10:06 ` [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access Xiao Guangrong
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:06 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

Logically, the function can be divided into two parts: one adjusts
pte_access, the other sets the spte according to the pte_access. This makes
the code more readable

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   51 ++++++++++++++++++++++++++-------------------------
 1 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a999755..af8bcb2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2336,32 +2336,13 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		return 0;

 	spte = PT_PRESENT_MASK;
-	if (!speculative)
-		spte |= shadow_accessed_mask;
-
-	if (pte_access & ACC_EXEC_MASK)
-		spte |= shadow_x_mask;
-	else
-		spte |= shadow_nx_mask;
-
-	if (pte_access & ACC_USER_MASK)
-		spte |= shadow_user_mask;
-
-	if (level > PT_PAGE_TABLE_LEVEL)
-		spte |= PT_PAGE_SIZE_MASK;
-	if (tdp_enabled)
-		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
-			kvm_is_mmio_pfn(pfn));

 	if (host_writable)
 		spte |= SPTE_HOST_WRITEABLE;
 	else
 		pte_access &= ~ACC_WRITE_MASK;

-	spte |= (u64)pfn << PAGE_SHIFT;
-
 	if (pte_access & ACC_WRITE_MASK) {
-
 		/*
 		 * Other vcpu creates new sp in the window between
 		 * mapping_level() and acquiring mmu-lock. We can
@@ -2369,11 +2350,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * be fixed if guest refault.
 		 */
 		if (level > PT_PAGE_TABLE_LEVEL &&
-		    has_wrprotected_page(vcpu->kvm, gfn, level))
+		      has_wrprotected_page(vcpu->kvm, gfn, level))
 			goto done;

-		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
-
 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
 		 * lookup is unnecessary (and expensive). Write protection
@@ -2381,21 +2360,43 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * Same reasoning can be applied to dirty page accounting.
 		 */
 		if (!can_unsync && is_writable_pte(*sptep))
-			goto set_pte;
+			goto out_access_adjust;

 		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
 			ret = 1;
 			pte_access &= ~ACC_WRITE_MASK;
-			spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
 		}
 	}

+out_access_adjust:
+	if (!speculative)
+		spte |= shadow_accessed_mask;
+
+	if (pte_access & ACC_EXEC_MASK)
+		spte |= shadow_x_mask;
+	else
+		spte |= shadow_nx_mask;
+
+	if (pte_access & ACC_USER_MASK)
+		spte |= shadow_user_mask;
+
 	if (pte_access & ACC_WRITE_MASK)
+		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
+
+	if (level > PT_PAGE_TABLE_LEVEL)
+		spte |= PT_PAGE_SIZE_MASK;
+
+	if (tdp_enabled)
+		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
+			kvm_is_mmio_pfn(pfn));
+
+	spte |= (u64)pfn << PAGE_SHIFT;
+
+	if (is_writable_pte(spte))
 		mark_page_dirty(vcpu->kvm, gfn);

-set_pte:
 	if (mmu_spte_update(sptep, spte))
 		kvm_flush_remote_tlbs(vcpu->kvm);
 done:
-- 
1.7.7.6



* [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (2 preceding siblings ...)
  2013-01-23 10:06 ` [PATCH v2 04/12] KVM: MMU: simplify set_spte Xiao Guangrong
@ 2013-01-23 10:06 ` Xiao Guangrong
  2013-01-24 10:36   ` Gleb Natapov
  2013-01-23 10:07 ` [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access Xiao Guangrong
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:06 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

Introduce it to split the code of adjusting pte_access from the large
function of set_spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   63 +++++++++++++++++++++++++++++++++-------------------
 1 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index af8bcb2..43b7e0c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2324,25 +2324,18 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return 0;
 }

-static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-		    unsigned pte_access, int level,
-		    gfn_t gfn, pfn_t pfn, bool speculative,
-		    bool can_unsync, bool host_writable)
+/*
+ * Return -1 if a race condition is detected, 1 if @gfn need to be
+ * write-protected, otherwise 0 is returned.
+ */
+static int vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
+			      unsigned *pte_access, int level, gfn_t gfn,
+			      bool can_unsync, bool host_writable)
 {
-	u64 spte;
-	int ret = 0;
-
-	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
-		return 0;
+	if (!host_writable)
+		*pte_access &= ~ACC_WRITE_MASK;

-	spte = PT_PRESENT_MASK;
-
-	if (host_writable)
-		spte |= SPTE_HOST_WRITEABLE;
-	else
-		pte_access &= ~ACC_WRITE_MASK;
-
-	if (pte_access & ACC_WRITE_MASK) {
+	if (*pte_access & ACC_WRITE_MASK) {
 		/*
 		 * Other vcpu creates new sp in the window between
 		 * mapping_level() and acquiring mmu-lock. We can
@@ -2351,7 +2344,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 */
 		if (level > PT_PAGE_TABLE_LEVEL &&
 		      has_wrprotected_page(vcpu->kvm, gfn, level))
-			goto done;
+			return -1;

 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
@@ -2360,17 +2353,41 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * Same reasoning can be applied to dirty page accounting.
 		 */
 		if (!can_unsync && is_writable_pte(*sptep))
-			goto out_access_adjust;
+			return 0;

 		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
-			ret = 1;
-			pte_access &= ~ACC_WRITE_MASK;
+
+			*pte_access &= ~ACC_WRITE_MASK;
+			return 1;
 		}
 	}

-out_access_adjust:
+	return 0;
+}
+
+static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
+		    unsigned pte_access, int level,
+		    gfn_t gfn, pfn_t pfn, bool speculative,
+		    bool can_unsync, bool host_writable)
+{
+	u64 spte;
+	int ret;
+
+	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
+		return 0;
+
+	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, level, gfn,
+				 can_unsync, host_writable);
+	if (ret < 0)
+		return 0;
+
+	spte = PT_PRESENT_MASK;
+
+	if (host_writable)
+		spte |= SPTE_HOST_WRITEABLE;
+
 	if (!speculative)
 		spte |= shadow_accessed_mask;

@@ -2399,7 +2416,7 @@ out_access_adjust:

 	if (mmu_spte_update(sptep, spte))
 		kvm_flush_remote_tlbs(vcpu->kvm);
-done:
+
 	return ret;
 }

-- 
1.7.7.6



* [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (3 preceding siblings ...)
  2013-01-23 10:06 ` [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access Xiao Guangrong
@ 2013-01-23 10:07 ` Xiao Guangrong
  2013-01-25  0:15   ` Marcelo Tosatti
  2013-01-23 10:07 ` [PATCH v2 07/12] KVM: MMU: remove pt_access in mmu_set_spte Xiao Guangrong
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:07 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

It makes set_spte cleaner and reduces branch mispredictions

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 43b7e0c..a8a9c0e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -235,6 +235,29 @@ static inline u64 rsvd_bits(int s, int e)
 	return ((1ULL << (e - s + 1)) - 1) << s;
 }

+static u64 gaccess_to_spte_access[ACC_ALL + 1];
+static void build_access_table(void)
+{
+	int access;
+
+	for (access = 0; access < ACC_ALL + 1; access++) {
+		u64 spte_access = 0;
+
+		if (access & ACC_EXEC_MASK)
+			spte_access |= shadow_x_mask;
+		else
+			spte_access |= shadow_nx_mask;
+
+		if (access & ACC_USER_MASK)
+			spte_access |= shadow_user_mask;
+
+		if (access & ACC_WRITE_MASK)
+			spte_access |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
+
+		gaccess_to_spte_access[access] = spte_access;
+	}
+}
+
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask)
 {
@@ -243,6 +266,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 	shadow_dirty_mask = dirty_mask;
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
+	build_access_table();
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);

@@ -2391,20 +2415,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!speculative)
 		spte |= shadow_accessed_mask;

-	if (pte_access & ACC_EXEC_MASK)
-		spte |= shadow_x_mask;
-	else
-		spte |= shadow_nx_mask;
-
-	if (pte_access & ACC_USER_MASK)
-		spte |= shadow_user_mask;
-
-	if (pte_access & ACC_WRITE_MASK)
-		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
-
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;

+	spte |= gaccess_to_spte_access[pte_access];
+
 	if (tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));
-- 
1.7.7.6



* [PATCH v2 07/12] KVM: MMU: remove pt_access in mmu_set_spte
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (4 preceding siblings ...)
  2013-01-23 10:07 ` [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access Xiao Guangrong
@ 2013-01-23 10:07 ` Xiao Guangrong
  2013-01-23 10:08 ` [PATCH v2 08/12] KVM: MMU: cleanup __direct_map Xiao Guangrong
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:07 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

It is only used in debug code, so drop it

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c         |   15 ++++++---------
 arch/x86/kvm/paging_tmpl.h |    9 ++++-----
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a8a9c0e..c0bb7cf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2436,15 +2436,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 }

 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			 unsigned pt_access, unsigned pte_access,
-			 int write_fault, int *emulate, int level, gfn_t gfn,
-			 pfn_t pfn, bool speculative, bool host_writable)
+			 unsigned pte_access, int write_fault, int *emulate,
+			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
+			 bool host_writable)
 {
 	bool was_rmapped = false;

 	pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
-		 __func__, *sptep, pt_access,
-		 write_fault, gfn);
+		 __func__, *sptep, write_fault, gfn);

 	if (is_rmap_spte(*sptep)) {
 		if (pfn != spte_to_pfn(*sptep)) {
@@ -2547,7 +2546,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;

 	for (i = 0; i < ret; i++, gfn++, start++)
-		mmu_set_spte(vcpu, start, ACC_ALL, access, 0, NULL,
+		mmu_set_spte(vcpu, start, access, 0, NULL,
 			     sp->role.level, gfn, page_to_pfn(pages[i]),
 			     true, true);

@@ -2608,9 +2607,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,

 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
-			unsigned pte_access = ACC_ALL;
-
-			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
+			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
 				     write, &emulate, level, gfn, pfn,
 				     prefault, map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index ca69dcc..b9a0df6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -326,8 +326,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0,
-		     NULL, PT_PAGE_TABLE_LEVEL, gfn, pfn, true, true);
+	mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
+		     gfn, pfn, true, true);

 	return true;
 }
@@ -473,9 +473,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}

 	clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
-		     write_fault, &emulate, it.level,
-		     gw->gfn, pfn, prefault, map_writable);
+	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
+		     it.level, gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);

 	return emulate;
-- 
1.7.7.6



* [PATCH v2 08/12] KVM: MMU: cleanup __direct_map
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (5 preceding siblings ...)
  2013-01-23 10:07 ` [PATCH v2 07/12] KVM: MMU: remove pt_access in mmu_set_spte Xiao Guangrong
@ 2013-01-23 10:08 ` Xiao Guangrong
  2013-01-23 10:09 ` [PATCH v2 09/12] KVM: MMU: introduce mmu_spte_establish Xiao Guangrong
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:08 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

Use link_shadow_page to link the sp to the spte in __direct_map

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0bb7cf..7d7eb4a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1984,9 +1984,9 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
 {
 	u64 spte;

-	spte = __pa(sp->spt)
-		| PT_PRESENT_MASK | PT_ACCESSED_MASK
-		| PT_WRITABLE_MASK | PT_USER_MASK;
+	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+
 	mmu_spte_set(sptep, spte);
 }

@@ -2626,11 +2626,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);

-			mmu_spte_set(iterator.sptep,
-				     __pa(sp->spt)
-				     | PT_PRESENT_MASK | PT_WRITABLE_MASK
-				     | shadow_user_mask | shadow_x_mask
-				     | shadow_accessed_mask);
+			link_shadow_page(iterator.sptep, sp);
 		}
 	}
 	return emulate;
-- 
1.7.7.6



* [PATCH v2 09/12] KVM: MMU: introduce mmu_spte_establish
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (6 preceding siblings ...)
  2013-01-23 10:08 ` [PATCH v2 08/12] KVM: MMU: cleanup __direct_map Xiao Guangrong
@ 2013-01-23 10:09 ` Xiao Guangrong
  2013-01-23 10:09 ` [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list Xiao Guangrong
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

It is used to establish the spte if it is not present, which cleans up the
code. It also marks the spte present before linking it to the sp's
parent_ptes list, so that the rmap-walking and parent-list-walking code can
be unified in a later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c         |   81 ++++++++++++++++++++++---------------------
 arch/x86/kvm/paging_tmpl.h |   16 ++++-----
 2 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d7eb4a..55198a1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1505,9 +1505,6 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
 				    struct kvm_mmu_page *sp, u64 *parent_pte)
 {
-	if (!parent_pte)
-		return;
-
 	pte_list_add(vcpu, parent_pte, &sp->parent_ptes);
 }

@@ -1525,7 +1522,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 }

 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-					       u64 *parent_pte, int direct)
+					       int direct)
 {
 	struct kvm_mmu_page *sp;
 	sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
@@ -1535,7 +1532,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
 	sp->parent_ptes = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
 }
@@ -1868,8 +1864,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gva_t gaddr,
 					     unsigned level,
 					     int direct,
-					     unsigned access,
-					     u64 *parent_pte)
+					     unsigned access)
 {
 	union kvm_mmu_page_role role;
 	unsigned quadrant;
@@ -1899,19 +1894,15 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
 			break;

-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-		if (sp->unsync_children) {
+		if (sp->unsync_children)
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-			kvm_mmu_mark_parents_unsync(sp);
-		} else if (sp->unsync)
-			kvm_mmu_mark_parents_unsync(sp);

 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
 		return sp;
 	}
 	++vcpu->kvm->stat.mmu_cache_miss;
-	sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
+	sp = kvm_mmu_alloc_page(vcpu, direct);
 	if (!sp)
 		return sp;
 	sp->gfn = gfn;
@@ -1931,6 +1922,35 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	return sp;
 }

+static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+{
+	u64 spte;
+
+	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+
+	mmu_spte_set(sptep, spte);
+}
+
+static struct kvm_mmu_page *
+mmu_spte_establish(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, gva_t gaddr,
+		   unsigned level, int direct, unsigned access)
+{
+	struct kvm_mmu_page *sp;
+
+	WARN_ON(is_shadow_present_pte(*spte));
+
+	sp = kvm_mmu_get_page(vcpu, gfn, gaddr, level, direct, access);
+
+	link_shadow_page(spte, sp);
+	mmu_page_add_parent_pte(vcpu, sp, spte);
+
+	if (sp->unsync_children || sp->unsync)
+		kvm_mmu_mark_parents_unsync(sp);
+
+	return sp;
+}
+
 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 			     struct kvm_vcpu *vcpu, u64 addr)
 {
@@ -1980,16 +2000,6 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	return __shadow_walk_next(iterator, *iterator->sptep);
 }

-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
-{
-	u64 spte;
-
-	spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
-
-	mmu_spte_set(sptep, spte);
-}
-
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 				   unsigned direct_access)
 {
@@ -2046,11 +2056,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 		mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }

-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-	mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
@@ -2601,9 +2606,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
-	struct kvm_mmu_page *sp;
 	int emulate = 0;
-	gfn_t pseudo_gfn;

 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
@@ -2621,12 +2624,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			u64 base_addr = iterator.addr;

 			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
-			pseudo_gfn = base_addr >> PAGE_SHIFT;
-			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-					      iterator.level - 1,
-					      1, ACC_ALL, iterator.sptep);
+			mmu_spte_establish(vcpu, iterator.sptep,
+					   gpa_to_gfn(base_addr),
+					   iterator.addr, iterator.level - 1,
+					   1, ACC_ALL);

-			link_shadow_page(iterator.sptep, sp);
 		}
 	}
 	return emulate;
@@ -2945,7 +2947,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_free_some_pages(vcpu);
 		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
-				      1, ACC_ALL, NULL);
+				      1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -2958,8 +2960,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			kvm_mmu_free_some_pages(vcpu);
 			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
 					      i << 30,
-					      PT32_ROOT_LEVEL, 1, ACC_ALL,
-					      NULL);
+					      PT32_ROOT_LEVEL, 1, ACC_ALL);
 			root = __pa(sp->spt);
 			++sp->root_count;
 			spin_unlock(&vcpu->kvm->mmu_lock);
@@ -2996,7 +2997,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->kvm->mmu_lock);
 		kvm_mmu_free_some_pages(vcpu);
 		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
-				      0, ACC_ALL, NULL);
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3031,7 +3032,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		kvm_mmu_free_some_pages(vcpu);
 		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
 				      PT32_ROOT_LEVEL, 0,
-				      ACC_ALL, NULL);
+				      ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b9a0df6..9dbb79a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -437,8 +437,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		sp = NULL;
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
-			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-					      false, access, it.sptep);
+			sp = mmu_spte_establish(vcpu, it.sptep, table_gfn,
+						addr, it.level-1, false,
+						access);
 		}

 		/*
@@ -447,9 +448,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		 */
 		if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
 			goto out_gpte_changed;
-
-		if (sp)
-			link_shadow_page(it.sptep, sp);
 	}

 	for (;
@@ -467,9 +465,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

 		direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);

-		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-				      true, direct_access, it.sptep);
-		link_shadow_page(it.sptep, sp);
+		mmu_spte_establish(vcpu, it.sptep, direct_gfn, addr,
+				   it.level-1, true, direct_access);
 	}

 	clear_sp_write_flooding_count(it.sptep);
@@ -481,7 +478,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

 out_gpte_changed:
 	if (sp)
-		kvm_mmu_put_page(sp, it.sptep);
+		drop_parent_pte(sp, it.sptep);
+
 	kvm_release_pfn_clean(pfn);
 	return 0;
 }
-- 
1.7.7.6



* [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (7 preceding siblings ...)
  2013-01-23 10:09 ` [PATCH v2 09/12] KVM: MMU: introduce mmu_spte_establish Xiao Guangrong
@ 2013-01-23 10:09 ` Xiao Guangrong
  2013-01-27 13:28   ` Gleb Natapov
  2013-01-23 10:10 ` [PATCH v2 11/12] KVM: MMU: fix spte assertion Xiao Guangrong
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

Walking the parent sptes and walking the rmap share the same logic; this
patch introduces for_each_spte_in_pte_list to unify their code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c       |  199 ++++++++++++++++++++++------------------------
 arch/x86/kvm/mmu_audit.c |    5 +-
 2 files changed, 97 insertions(+), 107 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 55198a1..b7da3fb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -968,26 +968,75 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }

-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
+/*
+ * Used by the following functions to iterate through the sptes linked by a
+ * pte_list.  All fields are private and not assumed to be used outside.
+ */
+struct pte_list_iterator {
+	/* private fields */
+	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
+	int pos;			/* index of the sptep */
+};
+
+/*
+ * Iteration must be started by this function.  This should also be used after
+ * removing/dropping sptes from the pte_list link because in such cases the
+ * information in the itererator may not be valid.
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+static u64 *pte_list_get_first(unsigned long pte_list,
+			       struct pte_list_iterator *iter)
 {
-	struct pte_list_desc *desc;
-	int i;
+	if (!pte_list)
+		return NULL;

-	if (!*pte_list)
-		return;
+	if (!(pte_list & 1)) {
+		iter->desc = NULL;
+		return (u64 *)pte_list;
+	}

-	if (!(*pte_list & 1))
-		return fn((u64 *)*pte_list);
+	iter->desc = (struct pte_list_desc *)(pte_list & ~1ul);
+	iter->pos = 0;
+	return iter->desc->sptes[iter->pos];
+}

-	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-			fn(desc->sptes[i]);
-		desc = desc->more;
+/*
+ * Must be used with a valid iterator: e.g. after pte_list_get_next().
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+static u64 *pte_list_get_next(struct pte_list_iterator *iter)
+{
+	if (iter->desc) {
+		if (iter->pos < PTE_LIST_EXT - 1) {
+			u64 *sptep;
+
+			++iter->pos;
+			sptep = iter->desc->sptes[iter->pos];
+			if (sptep)
+				return sptep;
+		}
+
+		iter->desc = iter->desc->more;
+
+		if (iter->desc) {
+			iter->pos = 0;
+			/* desc->sptes[0] cannot be NULL */
+			return iter->desc->sptes[iter->pos];
+		}
 	}
+
+	return NULL;
 }

+#define for_each_spte_in_pte_list(pte_list, iter, spte)		\
+	   for (spte = pte_list_get_first(pte_list, &(iter));	\
+	      spte != NULL; spte = pte_list_get_next(&(iter)))
+
+#define for_each_spte_in_rmap(rmap, iter, spte)			\
+	   for_each_spte_in_pte_list(rmap, iter, spte)
+
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
@@ -1039,67 +1088,6 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	pte_list_remove(spte, rmapp);
 }

-/*
- * Used by the following functions to iterate through the sptes linked by a
- * rmap.  All fields are private and not assumed to be used outside.
- */
-struct rmap_iterator {
-	/* private fields */
-	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
-	int pos;			/* index of the sptep */
-};
-
-/*
- * Iteration must be started by this function.  This should also be used after
- * removing/dropping sptes from the rmap link because in such cases the
- * information in the itererator may not be valid.
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
-{
-	if (!rmap)
-		return NULL;
-
-	if (!(rmap & 1)) {
-		iter->desc = NULL;
-		return (u64 *)rmap;
-	}
-
-	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
-	iter->pos = 0;
-	return iter->desc->sptes[iter->pos];
-}
-
-/*
- * Must be used with a valid iterator: e.g. after rmap_get_first().
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_next(struct rmap_iterator *iter)
-{
-	if (iter->desc) {
-		if (iter->pos < PTE_LIST_EXT - 1) {
-			u64 *sptep;
-
-			++iter->pos;
-			sptep = iter->desc->sptes[iter->pos];
-			if (sptep)
-				return sptep;
-		}
-
-		iter->desc = iter->desc->more;
-
-		if (iter->desc) {
-			iter->pos = 0;
-			/* desc->sptes[0] cannot be NULL */
-			return iter->desc->sptes[iter->pos];
-		}
-	}
-
-	return NULL;
-}
-
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
 	if (mmu_spte_clear_track_bits(sptep))
@@ -1160,14 +1148,13 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 				 bool pt_protect)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct pte_list_iterator iter;
 	bool flush = false;

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+	for_each_spte_in_rmap(*rmapp, iter, sptep) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));

 		spte_write_protect(kvm, sptep, &flush, pt_protect);
-		sptep = rmap_get_next(&iter);
 	}

 	return flush;
@@ -1221,15 +1208,14 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   struct kvm_memory_slot *slot, unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct pte_list_iterator iter;
 	int need_tlb_flush = 0;

-	while ((sptep = rmap_get_first(*rmapp, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);
-
+restart:
+	for_each_spte_in_rmap(*rmapp, iter, sptep) {
 		drop_spte(kvm, sptep);
 		need_tlb_flush = 1;
+		goto restart;
 	}

 	return need_tlb_flush;
@@ -1239,7 +1225,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			     struct kvm_memory_slot *slot, unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct pte_list_iterator iter;
 	int need_flush = 0;
 	u64 new_spte;
 	pte_t *ptep = (pte_t *)data;
@@ -1248,7 +1234,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	WARN_ON(pte_huge(*ptep));
 	new_pfn = pte_pfn(*ptep);

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+restart:
+	for_each_spte_in_rmap(*rmapp, iter, sptep) {
 		BUG_ON(!is_shadow_present_pte(*sptep));
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);

@@ -1256,19 +1243,18 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

 		if (pte_write(*ptep)) {
 			drop_spte(kvm, sptep);
-			sptep = rmap_get_first(*rmapp, &iter);
-		} else {
-			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
-			new_spte |= (u64)new_pfn << PAGE_SHIFT;
+			goto restart;
+		}

-			new_spte &= ~PT_WRITABLE_MASK;
-			new_spte &= ~SPTE_HOST_WRITEABLE;
-			new_spte &= ~shadow_accessed_mask;
+		new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
+		new_spte |= (u64)new_pfn << PAGE_SHIFT;

-			mmu_spte_clear_track_bits(sptep);
-			mmu_spte_set(sptep, new_spte);
-			sptep = rmap_get_next(&iter);
-		}
+		new_spte &= ~PT_WRITABLE_MASK;
+		new_spte &= ~SPTE_HOST_WRITEABLE;
+		new_spte &= ~shadow_accessed_mask;
+
+		mmu_spte_clear_track_bits(sptep);
+		mmu_spte_set(sptep, new_spte);
 	}

 	if (need_flush)
@@ -1359,7 +1345,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			 struct kvm_memory_slot *slot, unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator uninitialized_var(iter);
+	struct pte_list_iterator uninitialized_var(iter);
 	int young = 0;

 	/*
@@ -1375,8 +1361,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		goto out;
 	}

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
+	for_each_spte_in_rmap(*rmapp, iter, sptep) {
 		BUG_ON(!is_shadow_present_pte(*sptep));

 		if (*sptep & shadow_accessed_mask) {
@@ -1395,7 +1380,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			      struct kvm_memory_slot *slot, unsigned long data)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct pte_list_iterator iter;
 	int young = 0;

 	/*
@@ -1406,8 +1391,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		goto out;

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
+	for_each_spte_in_rmap(*rmapp, iter, sptep) {
 		BUG_ON(!is_shadow_present_pte(*sptep));

 		if (*sptep & shadow_accessed_mask) {
@@ -1539,7 +1523,11 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+	struct pte_list_iterator iter;
+	u64 *spte;
+
+	for_each_spte_in_pte_list(sp->parent_ptes, iter, spte)
+		mark_unsync(spte);
 }

 static void mark_unsync(u64 *spte)
@@ -2059,10 +2047,13 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct pte_list_iterator iter;

-	while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
+restart:
+	for_each_spte_in_rmap(sp->parent_ptes, iter, sptep) {
 		drop_parent_pte(sp, sptep);
+		goto restart;
+	}
 }

 static int mmu_zap_unsync_children(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index daff69e..a08d384 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -190,15 +190,14 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	unsigned long *rmapp;
 	u64 *sptep;
-	struct rmap_iterator iter;
+	struct pte_list_iterator iter;

 	if (sp->role.direct || sp->unsync || sp->role.invalid)
 		return;

 	rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);

-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
+	for_each_spte_in_rmap(*rmapp, iter, sptep) {
 		if (is_writable_pte(*sptep))
 			audit_printk(kvm, "shadow page has writable "
 				     "mappings: gfn %llx role %x\n",
-- 
1.7.7.6
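
The iterator introduced above keeps the existing pte_list encoding: bit 0 of the
value distinguishes a single spte from a chain of descriptors. A minimal sketch
of that encoding (the helper names below are illustrative only and are not part
of the patch):

/*
 * The pte_list value is a tagged pointer.  With bit 0 clear the value is
 * the single spte pointer itself; with bit 0 set the remaining bits point
 * to a pte_list_desc holding up to PTE_LIST_EXT sptes, chained via ->more.
 */
static inline bool pte_list_is_single_spte(unsigned long pte_list)
{
	return pte_list && !(pte_list & 1);
}

static inline struct pte_list_desc *pte_list_first_desc(unsigned long pte_list)
{
	return (struct pte_list_desc *)(pte_list & ~1ul);
}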



* [PATCH v2 11/12] KVM: MMU: fix spte assertion
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (8 preceding siblings ...)
  2013-01-23 10:09 ` [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list Xiao Guangrong
@ 2013-01-23 10:10 ` Xiao Guangrong
  2013-01-23 10:10 ` [PATCH v2 12/12] KVM: MMU: fast drop all spte on the pte_list Xiao Guangrong
  2013-01-27 12:06 ` [PATCH v2 01/12] KVM: MMU: lazily drop large spte Gleb Natapov
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:10 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

The PT_PRESENT_MASK bit is not enough to tell whether the spte has already
been mapped into the pte-list, since mmio sptes also set this bit. Use
is_shadow_present_pte instead to fix it

Also, this patch moves many assertions to a common place to clean up the
code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   20 ++++++--------------
 1 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b7da3fb..2c0a786 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1032,7 +1032,9 @@ static u64 *pte_list_get_next(struct pte_list_iterator *iter)

 #define for_each_spte_in_pte_list(pte_list, iter, spte)		\
 	   for (spte = pte_list_get_first(pte_list, &(iter));	\
-	      spte != NULL; spte = pte_list_get_next(&(iter)))
+	      spte != NULL &&					\
+	      ({WARN_ON(!is_shadow_present_pte(*(spte))); 1; });\
+		   spte = pte_list_get_next(&iter))

 #define for_each_spte_in_rmap(rmap, iter, spte)			\
 	   for_each_spte_in_pte_list(rmap, iter, spte)
@@ -1151,11 +1153,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 	struct pte_list_iterator iter;
 	bool flush = false;

-	for_each_spte_in_rmap(*rmapp, iter, sptep) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+	for_each_spte_in_rmap(*rmapp, iter, sptep)
 		spte_write_protect(kvm, sptep, &flush, pt_protect);
-	}

 	return flush;
 }
@@ -1236,7 +1235,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

 restart:
 	for_each_spte_in_rmap(*rmapp, iter, sptep) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);

 		need_flush = 1;
@@ -1361,15 +1359,12 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		goto out;
 	}

-	for_each_spte_in_rmap(*rmapp, iter, sptep) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
-
+	for_each_spte_in_rmap(*rmapp, iter, sptep)
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			clear_bit((ffs(shadow_accessed_mask) - 1),
 				 (unsigned long *)sptep);
 		}
-	}
 out:
 	/* @data has hva passed to kvm_age_hva(). */
 	trace_kvm_age_page(data, slot, young);
@@ -1391,14 +1386,11 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		goto out;

-	for_each_spte_in_rmap(*rmapp, iter, sptep) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
-
+	for_each_spte_in_rmap(*rmapp, iter, sptep)
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			break;
 		}
-	}
 out:
 	return young;
 }
-- 
1.7.7.6
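
For reference, the reason is_shadow_present_pte() is the right test here: mmio
sptes are installed with PT_PRESENT_MASK set but are filtered out by the mmio
check. A rough sketch of the helper as it looks in this era (an approximation,
not quoted from the patch):

static int is_shadow_present_pte(u64 pte)
{
	/* mmio sptes also carry PT_PRESENT_MASK, so exclude them here. */
	return pte & PT_PRESENT_MASK && !is_mmio_spte(pte);
}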



* [PATCH v2 12/12] KVM: MMU: fast drop all spte on the pte_list
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (9 preceding siblings ...)
  2013-01-23 10:10 ` [PATCH v2 11/12] KVM: MMU: fix spte assertion Xiao Guangrong
@ 2013-01-23 10:10 ` Xiao Guangrong
  2013-01-27 12:06 ` [PATCH v2 01/12] KVM: MMU: lazily drop large spte Gleb Natapov
  11 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-23 10:10 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, Gleb Natapov, LKML, KVM

If the pte_list needs to be destroyed, there is no need to delete its sptes
one by one; we can directly reset it and free the memory it uses

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   36 +++++++++++++++++++++++++++---------
 1 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2c0a786..0afe8da 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -968,6 +968,25 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }

+static void pte_list_destroy(unsigned long *pte_list)
+{
+	struct pte_list_desc *desc;
+	unsigned long list_value = *pte_list;
+
+	*pte_list = 0;
+
+	if (!(list_value & 1))
+		return;
+
+	desc = (struct pte_list_desc *)(list_value & ~1ul);
+	while (desc) {
+		struct pte_list_desc *next_desc = desc->more;
+
+		mmu_free_pte_list_desc(desc);
+		desc = next_desc;
+	}
+}
+
 /*
  * Used by the following functions to iterate through the sptes linked by a
  * pte_list.  All fields are private and not assumed to be used outside.
@@ -1206,17 +1225,17 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   struct kvm_memory_slot *slot, unsigned long data)
 {
-	u64 *sptep;
 	struct pte_list_iterator iter;
+	u64 *sptep;
 	int need_tlb_flush = 0;

-restart:
 	for_each_spte_in_rmap(*rmapp, iter, sptep) {
-		drop_spte(kvm, sptep);
+		mmu_spte_clear_track_bits(sptep);
 		need_tlb_flush = 1;
-		goto restart;
 	}

+	pte_list_destroy(rmapp);
+
 	return need_tlb_flush;
 }

@@ -2041,11 +2060,10 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct pte_list_iterator iter;

-restart:
-	for_each_spte_in_rmap(sp->parent_ptes, iter, sptep) {
-		drop_parent_pte(sp, sptep);
-		goto restart;
-	}
+	for_each_spte_in_rmap(sp->parent_ptes, iter, sptep)
+		mmu_spte_clear_no_track(sptep);
+
+	pte_list_destroy(&sp->parent_ptes);
 }

 static int mmu_zap_unsync_children(struct kvm *kvm,
-- 
1.7.7.6



* Re: [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access
  2013-01-23 10:06 ` [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access Xiao Guangrong
@ 2013-01-24 10:36   ` Gleb Natapov
  2013-01-24 11:33     ` Xiao Guangrong
  0 siblings, 1 reply; 28+ messages in thread
From: Gleb Natapov @ 2013-01-24 10:36 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On Wed, Jan 23, 2013 at 06:06:36PM +0800, Xiao Guangrong wrote:
> Introduce it to split the code of adjusting pte_access from the large
> function of set_spte
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   63 +++++++++++++++++++++++++++++++++-------------------
>  1 files changed, 40 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index af8bcb2..43b7e0c 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2324,25 +2324,18 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>  	return 0;
>  }
> 
> -static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> -		    unsigned pte_access, int level,
> -		    gfn_t gfn, pfn_t pfn, bool speculative,
> -		    bool can_unsync, bool host_writable)
> +/*
> + * Return -1 if a race condition is detected, 1 if @gfn need to be
> + * write-protected, otherwise 0 is returned.
> + */
That's a little bit crafty.

Isn't it better to handle race condition in set_spte() explicitly?
Something like do:

 if (host_writable && (pte_access & ACC_WRITE_MASK) &&
        level > PT_PAGE_TABLE_LEVEL && has_wrprotected_page(vcpu->kvm, gfn, level))
    return 0;

before calling vcpu_adjust_access() in set_spte()?

Or even do:

 if ((pte_access & ACC_WRITE_MASK) && level > PT_PAGE_TABLE_LEVEL &&
           has_wrprotected_page(vcpu->kvm, gfn, level))
    return 0;

After calling vcpu_adjust_access().

The latter will create a read-only large page mapping where one is not
created now, but it shouldn't be a problem as far as I can see.

> +static int vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
> +			      unsigned *pte_access, int level, gfn_t gfn,
> +			      bool can_unsync, bool host_writable)
>  {
> -	u64 spte;
> -	int ret = 0;
> -
> -	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
> -		return 0;
> +	if (!host_writable)
> +		*pte_access &= ~ACC_WRITE_MASK;
> 
> -	spte = PT_PRESENT_MASK;
> -
> -	if (host_writable)
> -		spte |= SPTE_HOST_WRITEABLE;
> -	else
> -		pte_access &= ~ACC_WRITE_MASK;
> -
> -	if (pte_access & ACC_WRITE_MASK) {
> +	if (*pte_access & ACC_WRITE_MASK) {
>  		/*
>  		 * Other vcpu creates new sp in the window between
>  		 * mapping_level() and acquiring mmu-lock. We can
> @@ -2351,7 +2344,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		 */
>  		if (level > PT_PAGE_TABLE_LEVEL &&
>  		      has_wrprotected_page(vcpu->kvm, gfn, level))
> -			goto done;
> +			return -1;
> 
>  		/*
>  		 * Optimization: for pte sync, if spte was writable the hash
> @@ -2360,17 +2353,41 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		 * Same reasoning can be applied to dirty page accounting.
>  		 */
>  		if (!can_unsync && is_writable_pte(*sptep))
> -			goto out_access_adjust;
> +			return 0;
> 
>  		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>  			pgprintk("%s: found shadow page for %llx, marking ro\n",
>  				 __func__, gfn);
> -			ret = 1;
> -			pte_access &= ~ACC_WRITE_MASK;
> +
> +			*pte_access &= ~ACC_WRITE_MASK;
> +			return 1;
>  		}
>  	}
> 
> -out_access_adjust:
> +	return 0;
> +}
> +
> +static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> +		    unsigned pte_access, int level,
> +		    gfn_t gfn, pfn_t pfn, bool speculative,
> +		    bool can_unsync, bool host_writable)
> +{
> +	u64 spte;
> +	int ret;
> +
> +	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
> +		return 0;
> +
> +	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, level, gfn,
> +				 can_unsync, host_writable);
> +	if (ret < 0)
> +		return 0;
> +
> +	spte = PT_PRESENT_MASK;
> +
> +	if (host_writable)
> +		spte |= SPTE_HOST_WRITEABLE;
> +
>  	if (!speculative)
>  		spte |= shadow_accessed_mask;
> 
> @@ -2399,7 +2416,7 @@ out_access_adjust:
> 
>  	if (mmu_spte_update(sptep, spte))
>  		kvm_flush_remote_tlbs(vcpu->kvm);
> -done:
> +
>  	return ret;
>  }
> 
> -- 
> 1.7.7.6

--
			Gleb.


* Re: [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access
  2013-01-24 10:36   ` Gleb Natapov
@ 2013-01-24 11:33     ` Xiao Guangrong
  0 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-24 11:33 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 01/24/2013 06:36 PM, Gleb Natapov wrote:
> On Wed, Jan 23, 2013 at 06:06:36PM +0800, Xiao Guangrong wrote:
>> Introduce it to split the code of adjusting pte_access from the large
>> function of set_spte
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   63 +++++++++++++++++++++++++++++++++-------------------
>>  1 files changed, 40 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index af8bcb2..43b7e0c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2324,25 +2324,18 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>>  	return 0;
>>  }
>>
>> -static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>> -		    unsigned pte_access, int level,
>> -		    gfn_t gfn, pfn_t pfn, bool speculative,
>> -		    bool can_unsync, bool host_writable)
>> +/*
>> + * Return -1 if a race condition is detected, 1 if @gfn need to be
>> + * write-protected, otherwise 0 is returned.
>> + */
> That's a little bit crafty.
> 
> Isn't it better to handle race condition in set_spte() explicitly?
> Something like do:
> 
>  if (host_writable && (pte_access & ACC_WRITE_MASK) &&
>         level > PT_PAGE_TABLE_LEVEL && has_wrprotected_page(vcpu->kvm, gfn, level))
>     return 0;
> 
> before calling vcpu_adjust_access() in set_spte()?
> 
> Or even do:
> 
>  if ((pte_access & ACC_WRITE_MASK) && level > PT_PAGE_TABLE_LEVEL &&
>            has_wrprotected_page(vcpu->kvm, gfn, level))
>     return 0;
> 
> After calling vcpu_adjust_access().
> 
> The latter will create a read-only large page mapping where one is not
> created now, but it shouldn't be a problem as far as I can see.

Yes. I like the latter way. Will update it. Thanks for your suggestion, Gleb!
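
A minimal sketch of what set_spte() could look like with the latter suggestion
applied, i.e. checking for the large-page race after vcpu_adjust_access()
(assumed shape only; the posted follow-up may differ):

static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
		    unsigned pte_access, int level,
		    gfn_t gfn, pfn_t pfn, bool speculative,
		    bool can_unsync, bool host_writable)
{
	int ret;

	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
		return 0;

	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, level, gfn,
				 can_unsync, host_writable);

	/*
	 * Handle the race with another vcpu creating a new sp between
	 * mapping_level() and acquiring mmu-lock.  This may leave a
	 * read-only large mapping in place, which is harmless.
	 */
	if ((pte_access & ACC_WRITE_MASK) && level > PT_PAGE_TABLE_LEVEL &&
	    has_wrprotected_page(vcpu->kvm, gfn, level))
		return 0;

	/* ... build and install the spte exactly as in this patch ... */

	return ret;
}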



* Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-23 10:07 ` [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access Xiao Guangrong
@ 2013-01-25  0:15   ` Marcelo Tosatti
  2013-01-25  2:46     ` Xiao Guangrong
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2013-01-25  0:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
> It makes set_spte cleaner and reduces branch mispredictions
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
>  1 files changed, 26 insertions(+), 11 deletions(-)

Don't see set_spte as being a performance problem?
IMO the current code is quite simple.





* Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-25  0:15   ` Marcelo Tosatti
@ 2013-01-25  2:46     ` Xiao Guangrong
  2013-01-29  0:07       ` Marcelo Tosatti
  0 siblings, 1 reply; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-25  2:46 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On 01/25/2013 08:15 AM, Marcelo Tosatti wrote:
> On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
>> It makes set_spte cleaner and reduces branch mispredictions
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
>>  1 files changed, 26 insertions(+), 11 deletions(-)
> 
> Don't see set_spte as being a performance problem?
> IMO the current code is quite simple.

Yes, this is not a performance problem.

I just dislike the many consecutive "if"-s in the function:

if (xxx)
	xxx
if (xxx)
	xxx
....

In total, it has 7 "if"-s before this patch.

Okay, if you think this is unnecessary, I will drop this patch. :)
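
Just for reference, the idea was roughly of this shape (a simplified,
hypothetical sketch; the names access_to_spte/build_access_table are
illustrative and the real patch differs in details, e.g. writability still
needs the unsync/write-protect checks):

	static u64 access_to_spte[ACC_ALL + 1];

	static void build_access_table(void)
	{
		unsigned int access;

		for (access = 0; access <= ACC_ALL; access++) {
			u64 spte = 0;

			if (access & ACC_EXEC_MASK)
				spte |= shadow_x_mask;
			else
				spte |= shadow_nx_mask;

			if (access & ACC_USER_MASK)
				spte |= shadow_user_mask;

			access_to_spte[access] = spte;
		}
	}

so that set_spte() starts from "spte |= access_to_spte[pte_access & ACC_ALL]"
instead of testing each bit with a separate "if".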


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 01/12] KVM: MMU: lazily drop large spte
  2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (10 preceding siblings ...)
  2013-01-23 10:10 ` [PATCH v2 12/12] KVM: MMU: fast drop all spte on the pte_list Xiao Guangrong
@ 2013-01-27 12:06 ` Gleb Natapov
  2013-01-29  2:57   ` Xiao Guangrong
  11 siblings, 1 reply; 28+ messages in thread
From: Gleb Natapov @ 2013-01-27 12:06 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On Wed, Jan 23, 2013 at 06:04:17PM +0800, Xiao Guangrong wrote:
> Do not drop a large spte until it can be replaced by small pages so that
> the guest can happily read memory through it
> 
> The idea is from Avi:
> | As I mentioned before, write-protecting a large spte is a good idea,
> | since it moves some work from protect-time to fault-time, so it reduces
> | jitter.  This removes the need for the return value.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   21 ++++++---------------
>  1 files changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9f628f7..0f90269 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1105,7 +1105,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
> 
>  /*
>   * Write-protect on the specified @sptep, @pt_protect indicates whether
> - * spte writ-protection is caused by protecting shadow page table.
> + * spte write-protection is caused by protecting shadow page table.
>   * @flush indicates whether tlb need be flushed.
>   *
>   * Note: write protection is difference between drity logging and spte
> @@ -1114,31 +1114,23 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>   *   its dirty bitmap is properly set.
>   * - for spte protection, the spte can be writable only after unsync-ing
>   *   shadow page.
> - *
> - * Return true if the spte is dropped.
>   */
> -static bool
> +static void
>  spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
Since the return value is no longer used, make the function return true if a flush is needed
instead of returning it via a pointer to a variable.

>  {
>  	u64 spte = *sptep;
> 
>  	if (!is_writable_pte(spte) &&
>  	      !(pt_protect && spte_is_locklessly_modifiable(spte)))
> -		return false;
> +		return;
> 
>  	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
> 
> -	if (__drop_large_spte(kvm, sptep)) {
> -		*flush |= true;
> -		return true;
> -	}
> -
>  	if (pt_protect)
>  		spte &= ~SPTE_MMU_WRITEABLE;
>  	spte = spte & ~PT_WRITABLE_MASK;
> 
>  	*flush |= mmu_spte_update(sptep, spte);
> -	return false;
>  }
> 
>  static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> @@ -1150,11 +1142,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> 
>  	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
>  		BUG_ON(!(*sptep & PT_PRESENT_MASK));
> -		if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
> -			sptep = rmap_get_first(*rmapp, &iter);
> -			continue;
> -		}
> 
> +		spte_write_protect(kvm, sptep, &flush, pt_protect);
>  		sptep = rmap_get_next(&iter);
>  	}
> 
> @@ -2611,6 +2600,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  			break;
>  		}
> 
> +		drop_large_spte(vcpu, iterator.sptep);
> +
>  		if (!is_shadow_present_pte(*iterator.sptep)) {
>  			u64 base_addr = iterator.addr;
> 
> -- 
> 1.7.7.6

--
			Gleb.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list
  2013-01-23 10:09 ` [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list Xiao Guangrong
@ 2013-01-27 13:28   ` Gleb Natapov
  2013-01-29  3:01     ` Xiao Guangrong
  0 siblings, 1 reply; 28+ messages in thread
From: Gleb Natapov @ 2013-01-27 13:28 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On Wed, Jan 23, 2013 at 06:09:34PM +0800, Xiao Guangrong wrote:
> Walking the parent sptes and walking the rmap share the same logic; this patch
> introduces for_each_spte_in_pte_list to unify their code
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c       |  199 ++++++++++++++++++++++------------------------
>  arch/x86/kvm/mmu_audit.c |    5 +-
>  2 files changed, 97 insertions(+), 107 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 55198a1..b7da3fb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -968,26 +968,75 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>  	}
>  }
> 
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> +/*
> + * Used by the following functions to iterate through the sptes linked by a
> + * pte_list.  All fields are private and not assumed to be used outside.
> + */
> +struct pte_list_iterator {
> +	/* private fields */
> +	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
> +	int pos;			/* index of the sptep */
> +};
> +
> +/*
> + * Iteration must be started by this function.  This should also be used after
> + * removing/dropping sptes from the pte_list link because in such cases the
> + * information in the itererator may not be valid.
> + *
> + * Returns sptep if found, NULL otherwise.
> + */
> +static u64 *pte_list_get_first(unsigned long pte_list,
> +			       struct pte_list_iterator *iter)
>  {
> -	struct pte_list_desc *desc;
> -	int i;
> +	if (!pte_list)
> +		return NULL;
> 
> -	if (!*pte_list)
> -		return;
> +	if (!(pte_list & 1)) {
> +		iter->desc = NULL;
> +		return (u64 *)pte_list;
> +	}
> 
> -	if (!(*pte_list & 1))
> -		return fn((u64 *)*pte_list);
> +	iter->desc = (struct pte_list_desc *)(pte_list & ~1ul);
> +	iter->pos = 0;
> +	return iter->desc->sptes[iter->pos];
> +}
> 
> -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> -	while (desc) {
> -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> -			fn(desc->sptes[i]);
> -		desc = desc->more;
> +/*
> + * Must be used with a valid iterator: e.g. after pte_list_get_next().
> + *
> + * Returns sptep if found, NULL otherwise.
> + */
> +static u64 *pte_list_get_next(struct pte_list_iterator *iter)
> +{
> +	if (iter->desc) {
> +		if (iter->pos < PTE_LIST_EXT - 1) {
> +			u64 *sptep;
> +
> +			++iter->pos;
> +			sptep = iter->desc->sptes[iter->pos];
> +			if (sptep)
> +				return sptep;
> +		}
> +
> +		iter->desc = iter->desc->more;
> +
> +		if (iter->desc) {
> +			iter->pos = 0;
> +			/* desc->sptes[0] cannot be NULL */
> +			return iter->desc->sptes[iter->pos];
> +		}
>  	}
> +
> +	return NULL;
>  }
> 
> +#define for_each_spte_in_pte_list(pte_list, iter, spte)		\
> +	   for (spte = pte_list_get_first(pte_list, &(iter));	\
> +	      spte != NULL; spte = pte_list_get_next(&(iter)))
> +
> +#define for_each_spte_in_rmap(rmap, iter, spte)			\
> +	   for_each_spte_in_pte_list(rmap, iter, spte)
> +
>  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>  				    struct kvm_memory_slot *slot)
>  {
> @@ -1039,67 +1088,6 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
>  	pte_list_remove(spte, rmapp);
>  }
> 
> -/*
> - * Used by the following functions to iterate through the sptes linked by a
> - * rmap.  All fields are private and not assumed to be used outside.
> - */
> -struct rmap_iterator {
> -	/* private fields */
> -	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
> -	int pos;			/* index of the sptep */
> -};
> -
> -/*
> - * Iteration must be started by this function.  This should also be used after
> - * removing/dropping sptes from the rmap link because in such cases the
> - * information in the itererator may not be valid.
> - *
> - * Returns sptep if found, NULL otherwise.
> - */
> -static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
> -{
> -	if (!rmap)
> -		return NULL;
> -
> -	if (!(rmap & 1)) {
> -		iter->desc = NULL;
> -		return (u64 *)rmap;
> -	}
> -
> -	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
> -	iter->pos = 0;
> -	return iter->desc->sptes[iter->pos];
> -}
> -
> -/*
> - * Must be used with a valid iterator: e.g. after rmap_get_first().
> - *
> - * Returns sptep if found, NULL otherwise.
> - */
> -static u64 *rmap_get_next(struct rmap_iterator *iter)
> -{
> -	if (iter->desc) {
> -		if (iter->pos < PTE_LIST_EXT - 1) {
> -			u64 *sptep;
> -
> -			++iter->pos;
> -			sptep = iter->desc->sptes[iter->pos];
> -			if (sptep)
> -				return sptep;
> -		}
> -
> -		iter->desc = iter->desc->more;
> -
> -		if (iter->desc) {
> -			iter->pos = 0;
> -			/* desc->sptes[0] cannot be NULL */
> -			return iter->desc->sptes[iter->pos];
> -		}
> -	}
> -
> -	return NULL;
> -}
> -
>  static void drop_spte(struct kvm *kvm, u64 *sptep)
>  {
>  	if (mmu_spte_clear_track_bits(sptep))
> @@ -1160,14 +1148,13 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>  				 bool pt_protect)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct pte_list_iterator iter;
>  	bool flush = false;
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
> +	for_each_spte_in_rmap(*rmapp, iter, sptep) {
>  		BUG_ON(!(*sptep & PT_PRESENT_MASK));
> 
>  		spte_write_protect(kvm, sptep, &flush, pt_protect);
> -		sptep = rmap_get_next(&iter);
>  	}
> 
>  	return flush;
> @@ -1221,15 +1208,14 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			   struct kvm_memory_slot *slot, unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct pte_list_iterator iter;
>  	int need_tlb_flush = 0;
> 
> -	while ((sptep = rmap_get_first(*rmapp, &iter))) {
> -		BUG_ON(!(*sptep & PT_PRESENT_MASK));
> -		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", sptep, *sptep);
> -
> +restart:
> +	for_each_spte_in_rmap(*rmapp, iter, sptep) {
>  		drop_spte(kvm, sptep);
>  		need_tlb_flush = 1;
> +		goto restart;
>  	}
> 
>  	return need_tlb_flush;
> @@ -1239,7 +1225,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			     struct kvm_memory_slot *slot, unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct pte_list_iterator iter;
>  	int need_flush = 0;
>  	u64 new_spte;
>  	pte_t *ptep = (pte_t *)data;
> @@ -1248,7 +1234,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	WARN_ON(pte_huge(*ptep));
>  	new_pfn = pte_pfn(*ptep);
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
> +restart:
> +	for_each_spte_in_rmap(*rmapp, iter, sptep) {
>  		BUG_ON(!is_shadow_present_pte(*sptep));
>  		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);
> 
> @@ -1256,19 +1243,18 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
> 
>  		if (pte_write(*ptep)) {
>  			drop_spte(kvm, sptep);
> -			sptep = rmap_get_first(*rmapp, &iter);
> -		} else {
> -			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
> -			new_spte |= (u64)new_pfn << PAGE_SHIFT;
> +			goto restart;
I do not like this "goto restart" pattern throughout this patch. A follow-up
patch gets rid of most of them, so they are tolerable as a temporary solution
here, but this one stays. What about moving the pte_write(*ptep) check outside
the for_each_spte_in_rmap() loop, like this:

if (pte_write(*ptep))
	need_flush = kvm_unmap_rmapp();
else
	for_each_spte_in_rmap() {
	}

> +		}
> 
> -			new_spte &= ~PT_WRITABLE_MASK;
> -			new_spte &= ~SPTE_HOST_WRITEABLE;
> -			new_spte &= ~shadow_accessed_mask;
> +		new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
> +		new_spte |= (u64)new_pfn << PAGE_SHIFT;
> 
> -			mmu_spte_clear_track_bits(sptep);
> -			mmu_spte_set(sptep, new_spte);
> -			sptep = rmap_get_next(&iter);
> -		}
> +		new_spte &= ~PT_WRITABLE_MASK;
> +		new_spte &= ~SPTE_HOST_WRITEABLE;
> +		new_spte &= ~shadow_accessed_mask;
> +
> +		mmu_spte_clear_track_bits(sptep);
> +		mmu_spte_set(sptep, new_spte);
>  	}
> 
>  	if (need_flush)
> @@ -1359,7 +1345,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			 struct kvm_memory_slot *slot, unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator uninitialized_var(iter);
> +	struct pte_list_iterator uninitialized_var(iter);
>  	int young = 0;
> 
>  	/*
> @@ -1375,8 +1361,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  		goto out;
>  	}
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> -	     sptep = rmap_get_next(&iter)) {
> +	for_each_spte_in_rmap(*rmapp, iter, sptep) {
>  		BUG_ON(!is_shadow_present_pte(*sptep));
> 
>  		if (*sptep & shadow_accessed_mask) {
> @@ -1395,7 +1380,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  			      struct kvm_memory_slot *slot, unsigned long data)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct pte_list_iterator iter;
>  	int young = 0;
> 
>  	/*
> @@ -1406,8 +1391,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	if (!shadow_accessed_mask)
>  		goto out;
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> -	     sptep = rmap_get_next(&iter)) {
> +	for_each_spte_in_rmap(*rmapp, iter, sptep) {
>  		BUG_ON(!is_shadow_present_pte(*sptep));
> 
>  		if (*sptep & shadow_accessed_mask) {
> @@ -1539,7 +1523,11 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>  static void mark_unsync(u64 *spte);
>  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>  {
> -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> +	struct pte_list_iterator iter;
> +	u64 *spte;
> +
> +	for_each_spte_in_pte_list(sp->parent_ptes, iter, spte)
> +		mark_unsync(spte);
>  }
> 
>  static void mark_unsync(u64 *spte)
> @@ -2059,10 +2047,13 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct pte_list_iterator iter;
> 
> -	while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
> +restart:
> +	for_each_spte_in_rmap(sp->parent_ptes, iter, sptep) {
>  		drop_parent_pte(sp, sptep);
> +		goto restart;
> +	}
>  }
> 
>  static int mmu_zap_unsync_children(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
> index daff69e..a08d384 100644
> --- a/arch/x86/kvm/mmu_audit.c
> +++ b/arch/x86/kvm/mmu_audit.c
> @@ -190,15 +190,14 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	unsigned long *rmapp;
>  	u64 *sptep;
> -	struct rmap_iterator iter;
> +	struct pte_list_iterator iter;
> 
>  	if (sp->role.direct || sp->unsync || sp->role.invalid)
>  		return;
> 
>  	rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
> 
> -	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> -	     sptep = rmap_get_next(&iter)) {
> +	for_each_spte_in_rmap(*rmapp, iter, sptep) {
>  		if (is_writable_pte(*sptep))
>  			audit_printk(kvm, "shadow page has writable "
>  				     "mappings: gfn %llx role %x\n",
> -- 
> 1.7.7.6

--
			Gleb.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-25  2:46     ` Xiao Guangrong
@ 2013-01-29  0:07       ` Marcelo Tosatti
  2013-01-29  1:07         ` Marcelo Tosatti
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2013-01-29  0:07 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On Fri, Jan 25, 2013 at 10:46:31AM +0800, Xiao Guangrong wrote:
> On 01/25/2013 08:15 AM, Marcelo Tosatti wrote:
> > On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
> >> It makes set_spte cleaner and reduces the number of conditional branches
> >>
> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> ---
> >>  arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
> >>  1 files changed, 26 insertions(+), 11 deletions(-)
> > 
> > Don't see set_spte as being a performance problem?
> > IMO the current code is quite simple.
> 
> Yes, this is not a performance problem.
> 
> I just dislike the many consecutive "if"-s in the function:
> 
> if (xxx)
> 	xxx
> if (xxx)
> 	xxx
> ....
> 
> In total, it has 7 "if"-s before this patch.
> 
> Okay, if you think this is unnecessary, I will drop this patch. :)

Yes, please (unless you can show set_spte is a performance problem).



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte
  2013-01-23 10:05 ` [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
@ 2013-01-29  0:21   ` Marcelo Tosatti
  2013-01-29  2:55     ` Xiao Guangrong
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2013-01-29  0:21 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On Wed, Jan 23, 2013 at 06:05:29PM +0800, Xiao Guangrong wrote:
> In order to detect spte remapping, we can simply check whether the
> spte is already pointing to the pfn, even if the spte is not the
> last spte, because a middle spte points to a kernel pfn which cannot
> be mapped to userspace

This check is detecting spte overwrite, when a large spte is replaced by 
pointer to spte table.

Can't see why check for different pfn is safe: only 'int level' can
differ, and pfn be equivalent, for example.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-29  0:07       ` Marcelo Tosatti
@ 2013-01-29  1:07         ` Marcelo Tosatti
  2013-01-29 13:16           ` Gleb Natapov
  2013-01-30  3:53           ` Xiao Guangrong
  0 siblings, 2 replies; 28+ messages in thread
From: Marcelo Tosatti @ 2013-01-29  1:07 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On Mon, Jan 28, 2013 at 10:07:15PM -0200, Marcelo Tosatti wrote:
> On Fri, Jan 25, 2013 at 10:46:31AM +0800, Xiao Guangrong wrote:
> > On 01/25/2013 08:15 AM, Marcelo Tosatti wrote:
> > > On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
> > >> It makes set_spte cleaner and reduces the number of conditional branches
> > >>
> > >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> > >> ---
> > >>  arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
> > >>  1 files changed, 26 insertions(+), 11 deletions(-)
> > > 
> > > Don't see set_spte as being a performance problem?
> > > IMO the current code is quite simple.
> > 
> > Yes, this is not a performance problem.
> > 
> > I just dislike the many consecutive "if"-s in the function:
> > 
> > if (xxx)
> > 	xxx
> > if (xxx)
> > 	xxx
> > ....
> > 
> > In total, it has 7 "if"-s before this patch.
> > 
> > Okay, if you think this is unnecessary, I will drop this patch. :)
> 
> Yes, please (unless you can show set_spte is a performance problem).

Same thing for spte fast drop: is it a performance problem? 

Please try to group changes into smaller, less controversial sets with 
a clear goal:

- Debated performance improvement.
- Cleanups (eg mmu_set_spte argument removal).
- Bug fixes.
- Performance improvements.

Thanks.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte
  2013-01-29  0:21   ` Marcelo Tosatti
@ 2013-01-29  2:55     ` Xiao Guangrong
  2013-01-29 21:53       ` Marcelo Tosatti
  0 siblings, 1 reply; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-29  2:55 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On 01/29/2013 08:21 AM, Marcelo Tosatti wrote:
> On Wed, Jan 23, 2013 at 06:05:29PM +0800, Xiao Guangrong wrote:
>> In order to detect spte remapping, we can simply check whether the
>> spte is already pointing to the pfn, even if the spte is not the
>> last spte, because a middle spte points to a kernel pfn which cannot
>> be mapped to userspace
> 
> This check is detecting spte overwrite, when a large spte is replaced by 
> pointer to spte table.
> 
> Can't see why check for different pfn is safe: only 'int level' can
> differ, and pfn be equivalent, for example.

The 'u64 *sptep' must be on the "int level" we want to set, that means:
 page_header(__pa(sptep))->role.level == "int level".


We discussed this before :), the discussion can be found at:
http://marc.info/?l=kvm&m=135345057329427&w=2.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 01/12] KVM: MMU: lazily drop large spte
  2013-01-27 12:06 ` [PATCH v2 01/12] KVM: MMU: lazily drop large spte Gleb Natapov
@ 2013-01-29  2:57   ` Xiao Guangrong
  0 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-29  2:57 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 01/27/2013 08:06 PM, Gleb Natapov wrote:
> On Wed, Jan 23, 2013 at 06:04:17PM +0800, Xiao Guangrong wrote:
>> Do not drop a large spte until it can be replaced by small pages so that
>> the guest can happily read memory through it
>>
>> The idea is from Avi:
>> | As I mentioned before, write-protecting a large spte is a good idea,
>> | since it moves some work from protect-time to fault-time, so it reduces
>> | jitter.  This removes the need for the return value.
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   21 ++++++---------------
>>  1 files changed, 6 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9f628f7..0f90269 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1105,7 +1105,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>>
>>  /*
>>   * Write-protect on the specified @sptep, @pt_protect indicates whether
>> - * spte writ-protection is caused by protecting shadow page table.
>> + * spte write-protection is caused by protecting shadow page table.
>>   * @flush indicates whether tlb need be flushed.
>>   *
>>   * Note: write protection is difference between drity logging and spte
>> @@ -1114,31 +1114,23 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>>   *   its dirty bitmap is properly set.
>>   * - for spte protection, the spte can be writable only after unsync-ing
>>   *   shadow page.
>> - *
>> - * Return true if the spte is dropped.
>>   */
>> -static bool
>> +static void
>>  spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
> Since the return value is no longer used, make the function return true if a flush is needed
> instead of returning it via a pointer to a variable.

Right, I forgot to check it, will update it in the next version. Thanks for pointing
it out.
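
The plan is something like this (a sketch only, not the final patch):

	static bool
	spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
	{
		u64 spte = *sptep;

		if (!is_writable_pte(spte) &&
		      !(pt_protect && spte_is_locklessly_modifiable(spte)))
			return false;

		rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);

		if (pt_protect)
			spte &= ~SPTE_MMU_WRITEABLE;
		spte &= ~PT_WRITABLE_MASK;

		return mmu_spte_update(sptep, spte);
	}

with the caller in __rmap_write_protect() doing:

		flush |= spte_write_protect(kvm, sptep, pt_protect);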



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list
  2013-01-27 13:28   ` Gleb Natapov
@ 2013-01-29  3:01     ` Xiao Guangrong
  0 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-29  3:01 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 01/27/2013 09:28 PM, Gleb Natapov wrote:

>>
>> @@ -1256,19 +1243,18 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>>
>>  		if (pte_write(*ptep)) {
>>  			drop_spte(kvm, sptep);
>> -			sptep = rmap_get_first(*rmapp, &iter);
>> -		} else {
>> -			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
>> -			new_spte |= (u64)new_pfn << PAGE_SHIFT;
>> +			goto restart;
> I do not like this "goto restart" pattern throughout this patch. A follow-up
> patch gets rid of most of them, so they are tolerable as a temporary solution
> here, but this one stays. What about moving the pte_write(*ptep) check outside
> the for_each_spte_in_rmap() loop, like this:
> 
> if (pte_write(*ptep))
> 	need_flush = kvm_unmap_rmapp();
> else
> 	for_each_spte_in_rmap() {
> 	}

Nice. Will do.
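
I.e. roughly (a sketch of the restructured function, reusing kvm_unmap_rmapp()
with its current signature; details may still change in the next version):

	static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
				     struct kvm_memory_slot *slot, unsigned long data)
	{
		u64 *sptep;
		struct pte_list_iterator iter;
		int need_flush = 0;
		u64 new_spte;
		pte_t *ptep = (pte_t *)data;
		pfn_t new_pfn;

		WARN_ON(pte_huge(*ptep));
		new_pfn = pte_pfn(*ptep);

		if (pte_write(*ptep)) {
			/* every spte must be dropped, so reuse the unmap path */
			need_flush = kvm_unmap_rmapp(kvm, rmapp, slot, data);
		} else {
			for_each_spte_in_rmap(*rmapp, iter, sptep) {
				BUG_ON(!is_shadow_present_pte(*sptep));

				/* sptes are only modified in place, no restart needed */
				new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
				new_spte |= (u64)new_pfn << PAGE_SHIFT;
				new_spte &= ~PT_WRITABLE_MASK;
				new_spte &= ~SPTE_HOST_WRITEABLE;
				new_spte &= ~shadow_accessed_mask;

				mmu_spte_clear_track_bits(sptep);
				mmu_spte_set(sptep, new_spte);
				need_flush = 1;
			}
		}

		if (need_flush)
			kvm_flush_remote_tlbs(kvm);

		return 0;
	}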


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-29  1:07         ` Marcelo Tosatti
@ 2013-01-29 13:16           ` Gleb Natapov
  2013-01-30  3:53           ` Xiao Guangrong
  1 sibling, 0 replies; 28+ messages in thread
From: Gleb Natapov @ 2013-01-29 13:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, Avi Kivity, LKML, KVM

On Mon, Jan 28, 2013 at 11:07:58PM -0200, Marcelo Tosatti wrote:
> On Mon, Jan 28, 2013 at 10:07:15PM -0200, Marcelo Tosatti wrote:
> > On Fri, Jan 25, 2013 at 10:46:31AM +0800, Xiao Guangrong wrote:
> > > On 01/25/2013 08:15 AM, Marcelo Tosatti wrote:
> > > > On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
> > > >> It makes set_spte cleaner and reduces the number of conditional branches
> > > >>
> > > >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> > > >> ---
> > > >>  arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
> > > >>  1 files changed, 26 insertions(+), 11 deletions(-)
> > > > 
> > > > Don't see set_spte as being a performance problem?
> > > > IMO the current code is quite simple.
> > > 
> > > Yes, this is not a performance problem.
> > > 
> > > I just dislike the many consecutive "if"-s in the function:
> > > 
> > > if (xxx)
> > > 	xxx
> > > if (xxx)
> > > 	xxx
> > > ....
> > > 
> > > In total, it has 7 "if"-s before this patch.
> > > 
> > > Okay, if you think this is unnecessary, I will drop this patch. :)
> > 
> > Yes, please (unless you can show set_spte is a performance problem).
> 
> Same thing for spte fast drop: is it a performance problem? 
> 
I like spte fast drop because it gets rid of the "goto restart" pattern.
for_each_spte_in_rmap_safe() can be the alternative.

> Please try to group changes into smaller, less controversial sets with 
> a clear goal:
> 
> - Debated performance improvement.
> - Cleanups (eg mmu_set_spte argument removal).
> - Bug fixes.
> - Performance improvements.
> 
> Thanks.

--
			Gleb.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte
  2013-01-29  2:55     ` Xiao Guangrong
@ 2013-01-29 21:53       ` Marcelo Tosatti
  2013-01-30  3:22         ` Xiao Guangrong
  0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2013-01-29 21:53 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On Tue, Jan 29, 2013 at 10:55:24AM +0800, Xiao Guangrong wrote:
> On 01/29/2013 08:21 AM, Marcelo Tosatti wrote:
> > On Wed, Jan 23, 2013 at 06:05:29PM +0800, Xiao Guangrong wrote:
> >> In order to detect spte remapping, we can simply check whether the
> >> spte is already pointing to the pfn, even if the spte is not the
> >> last spte, because a middle spte points to a kernel pfn which cannot
> >> be mapped to userspace
> > 
> > This check is detecting spte overwrite, when a large spte is replaced by 
> > pointer to spte table.
> > 
> > Can't see why check for different pfn is safe: only 'int level' can
> > differ, and pfn be equivalent, for example.
> 
> The 'u64 *sptep' must be on the "int level" we want to set, that means:
>  page_header(__pa(sptep))->role.level == "int level".

Right, then stick a comment there noting which cases that condition handles.
Keep the current comment and add more.

> We discussed this before :), the discussion can be found at:
> http://marc.info/?l=kvm&m=135345057329427&w=2.

Note http://marc.info/?l=kvm&m=135345059929436&w=2, please take into
account in the future.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte
  2013-01-29 21:53       ` Marcelo Tosatti
@ 2013-01-30  3:22         ` Xiao Guangrong
  0 siblings, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-30  3:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On 01/30/2013 05:53 AM, Marcelo Tosatti wrote:
> On Tue, Jan 29, 2013 at 10:55:24AM +0800, Xiao Guangrong wrote:
>> On 01/29/2013 08:21 AM, Marcelo Tosatti wrote:
>>> On Wed, Jan 23, 2013 at 06:05:29PM +0800, Xiao Guangrong wrote:
>>>> In order to detect spte remapping, we can simply check whether the
>>>> spte is already pointing to the pfn, even if the spte is not the
>>>> last spte, because a middle spte points to a kernel pfn which cannot
>>>> be mapped to userspace
>>>
>>> This check is detecting spte overwrite, when a large spte is replaced by 
>>> pointer to spte table.
>>>
>>> Can't see why check for different pfn is safe: only 'int level' can
>>> differ, and pfn be equivalent, for example.
>>
>> The 'u64 *sptep' must be on the "int level" we want to set, that means:
>>  page_header(__pa(sptep))->role.level == "int level".
> 
> Right, then stick a comment there noting which cases that condition handles.
> Keep the current comment and add more.
> 

Okay.
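
Something along these lines, perhaps (wording still to be refined):

	/*
	 * The sptep is always on the level we are going to map, i.e.
	 * page_header(__pa(sptep))->role.level == level, so a spte that
	 * already points to @pfn can only be a remapping of the same
	 * last-level translation: a middle spte points to a kernel-allocated
	 * page-table page, whose pfn is never mapped to userspace and
	 * therefore never equals a guest pfn.
	 */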

>> We discussed this before :), the discussion can be found at:
>> http://marc.info/?l=kvm&m=135345057329427&w=2.
> 
> Note http://marc.info/?l=kvm&m=135345059929436&w=2, please take into
> account in the future.

Okay, i will make that patch more simpler in the next version.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access
  2013-01-29  1:07         ` Marcelo Tosatti
  2013-01-29 13:16           ` Gleb Natapov
@ 2013-01-30  3:53           ` Xiao Guangrong
  1 sibling, 0 replies; 28+ messages in thread
From: Xiao Guangrong @ 2013-01-30  3:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Gleb Natapov, LKML, KVM

On 01/29/2013 09:07 AM, Marcelo Tosatti wrote:
> On Mon, Jan 28, 2013 at 10:07:15PM -0200, Marcelo Tosatti wrote:
>> On Fri, Jan 25, 2013 at 10:46:31AM +0800, Xiao Guangrong wrote:
>>> On 01/25/2013 08:15 AM, Marcelo Tosatti wrote:
>>>> On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
>>>>> It makes set_spte cleaner and reduces the number of conditional branches
>>>>>
>>>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>>>>> ---
>>>>>  arch/x86/kvm/mmu.c |   37 ++++++++++++++++++++++++++-----------
>>>>>  1 files changed, 26 insertions(+), 11 deletions(-)
>>>>
>>>> Don't see set_spte as being a performance problem?
>>>> IMO the current code is quite simple.
>>>
>>> Yes, this is not a performance problem.
>>>
>>> I just dislike the many consecutive "if"-s in the function:
>>>
>>> if (xxx)
>>> 	xxx
>>> if (xxx)
>>> 	xxx
>>> ....
>>>
>>> In total, it has 7 "if"-s before this patch.
>>>
>>> Okay, if you think this is unnecessary, I will drop this patch. :)
>>
>> Yes, please (unless you can show set_spte is a performance problem).
> 
> Same thing for spte fast drop: is it a performance problem? 

It does not fix a performance problem. The patch is an optimization on a
non-crucial path and a cleanup for the previous patch.

> 
> Please try to group changes into smaller, less controversial sets with 
> a clear goal:

Ah. I thought they were simple and most of them had been reviewed, so I put
them into one group; now I know I made a mistake. ;)

> 
> - Debated performance improvement.
> - Cleanups (eg mmu_set_spte argument removal).
> - Bug fixes.
> - Performance improvements.

Okay. I will split the long series into these sets:

Set 1 for improvement:
[PATCH v2 01/12] KVM: MMU: lazily drop large spte

Set 2 for cleanups:
[PATCH v2 02/12] KVM: MMU: cleanup mapping-level
[PATCH v2 07/12] KVM: MMU: remove pt_access in mmu_set_spte
[PATCH v2 08/12] KVM: MMU: cleanup __direct_map

Set 3 for simplifying set spte path:
[PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte
[PATCH v2 04/12] KVM: MMU: simplify set_spte
[PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access

Set 4 for fixing and unifying rmap walking
[PATCH v2 11/12] KVM: MMU: fix spte assertion
[PATCH v2 09/12] KVM: MMU: introduce mmu_spte_establish
[PATCH v2 10/12] KVM: MMU: unify the code of walking pte list
[PATCH v2 12/12] KVM: MMU: fast drop all spte on the pte_list

Thanks!


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-01-30  3:54 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-23 10:04 [PATCH v2 01/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-01-23 10:04 ` [PATCH v2 02/12] KVM: MMU: cleanup mapping-level Xiao Guangrong
2013-01-23 10:05 ` [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
2013-01-29  0:21   ` Marcelo Tosatti
2013-01-29  2:55     ` Xiao Guangrong
2013-01-29 21:53       ` Marcelo Tosatti
2013-01-30  3:22         ` Xiao Guangrong
2013-01-23 10:06 ` [PATCH v2 04/12] KVM: MMU: simplify set_spte Xiao Guangrong
2013-01-23 10:06 ` [PATCH v2 05/12] KVM: MMU: introduce vcpu_adjust_access Xiao Guangrong
2013-01-24 10:36   ` Gleb Natapov
2013-01-24 11:33     ` Xiao Guangrong
2013-01-23 10:07 ` [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access Xiao Guangrong
2013-01-25  0:15   ` Marcelo Tosatti
2013-01-25  2:46     ` Xiao Guangrong
2013-01-29  0:07       ` Marcelo Tosatti
2013-01-29  1:07         ` Marcelo Tosatti
2013-01-29 13:16           ` Gleb Natapov
2013-01-30  3:53           ` Xiao Guangrong
2013-01-23 10:07 ` [PATCH v2 07/12] KVM: MMU: remove pt_access in mmu_set_spte Xiao Guangrong
2013-01-23 10:08 ` [PATCH v2 08/12] KVM: MMU: cleanup __direct_map Xiao Guangrong
2013-01-23 10:09 ` [PATCH v2 09/12] KVM: MMU: introduce mmu_spte_establish Xiao Guangrong
2013-01-23 10:09 ` [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list Xiao Guangrong
2013-01-27 13:28   ` Gleb Natapov
2013-01-29  3:01     ` Xiao Guangrong
2013-01-23 10:10 ` [PATCH v2 11/12] KVM: MMU: fix spte assertion Xiao Guangrong
2013-01-23 10:10 ` [PATCH v2 12/12] KVM: MMU: fast drop all spte on the pte_list Xiao Guangrong
2013-01-27 12:06 ` [PATCH v2 01/12] KVM: MMU: lazily drop large spte Gleb Natapov
2013-01-29  2:57   ` Xiao Guangrong
