linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/5] KVM: MMU: cleanup mapping-level
@ 2012-11-05 12:09 Xiao Guangrong
  2012-11-05 12:10 ` [PATCH 2/5] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-05 12:09 UTC
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

Use min() to clean up mapping_level.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1d8869c..692ebb1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -831,8 +831,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 	if (host_level == PT_PAGE_TABLE_LEVEL)
 		return host_level;

-	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
-		kvm_x86_ops->get_lpage_level() : host_level;
+	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);

 	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
 		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
-- 
1.7.7.6
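
For readers following along outside the kernel tree, the effect of this cleanup can be sketched in user space. This is only an illustration under stated assumptions: min_int(), get_lpage_level_stub() and the call counter are hypothetical stand-ins, not the kernel's min() macro or the real kvm_x86_ops callback. It shows that the open-coded ternary may invoke the callback twice, while a helper that takes its arguments by value evaluates it once, with the same result.

```c
#include <assert.h>

/* Hypothetical stand-in for kvm_x86_ops->get_lpage_level(); counts calls. */
static int lpage_calls;

static int get_lpage_level_stub(void)
{
	lpage_calls++;
	return 2;	/* illustrative value only */
}

/* Open-coded form, as on the removed lines: the callback may run twice. */
static int max_level_ternary(int host_level)
{
	return get_lpage_level_stub() < host_level ?
		get_lpage_level_stub() : host_level;
}

/* Function-style helper: each argument is evaluated exactly once. */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

static int max_level_min(int host_level)
{
	return min_int(get_lpage_level_stub(), host_level);
}

/* Both forms agree on the result; only the number of calls differs. */
static void check_forms_agree(int host_level)
{
	assert(max_level_ternary(host_level) == max_level_min(host_level));
}
```

Calling check_forms_agree() with any host level exercises both variants; inspecting lpage_calls before and after each call shows the difference in evaluation count.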



* [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-05 12:09 [PATCH 1/5] KVM: MMU: cleanup mapping-level Xiao Guangrong
@ 2012-11-05 12:10 ` Xiao Guangrong
  2012-11-12 23:12   ` Marcelo Tosatti
  2012-11-05 12:11 ` [PATCH 3/5] KVM: MMU: simplify set_spte Xiao Guangrong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-05 12:10 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

To detect spte remapping, we can simply check whether the spte already
points to the pfn, even if the spte is not the last spte: a middle spte
points to a kernel pfn, which cannot be mapped to userspace, so it can
never match the new pfn.

Also, update the slot and stat.lpages iff the spte is not remapped.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |   40 +++++++++++++---------------------------
 1 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 692ebb1..4ea731e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2420,8 +2420,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 pfn_t pfn, bool speculative,
 			 bool host_writable)
 {
-	int was_rmapped = 0;
-	int rmap_count;
+	bool was_rmapped = false;

 	pgprintk("%s: spte %llx access %x write_fault %d"
 		 " user_fault %d gfn %llx\n",
@@ -2429,25 +2428,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 write_fault, user_fault, gfn);

 	if (is_rmap_spte(*sptep)) {
-		/*
-		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
-		 * the parent of the now unreachable PTE.
-		 */
-		if (level > PT_PAGE_TABLE_LEVEL &&
-		    !is_large_pte(*sptep)) {
-			struct kvm_mmu_page *child;
-			u64 pte = *sptep;
+		if (pfn != spte_to_pfn(*sptep)) {
+			struct kvm_mmu_page *sp = page_header(__pa(sptep));

-			child = page_header(pte & PT64_BASE_ADDR_MASK);
-			drop_parent_pte(child, sptep);
-			kvm_flush_remote_tlbs(vcpu->kvm);
-		} else if (pfn != spte_to_pfn(*sptep)) {
-			pgprintk("hfn old %llx new %llx\n",
-				 spte_to_pfn(*sptep), pfn);
-			drop_spte(vcpu->kvm, sptep);
-			kvm_flush_remote_tlbs(vcpu->kvm);
+			if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
+				kvm_flush_remote_tlbs(vcpu->kvm);
 		} else
-			was_rmapped = 1;
+			was_rmapped = true;
 	}

 	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
@@ -2466,16 +2453,15 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 is_large_pte(*sptep)? "2MB" : "4kB",
 		 *sptep & PT_PRESENT_MASK ?"RW":"R", gfn,
 		 *sptep, sptep);
-	if (!was_rmapped && is_large_pte(*sptep))
-		++vcpu->kvm->stat.lpages;

-	if (is_shadow_present_pte(*sptep)) {
+	if (is_shadow_present_pte(*sptep) && !was_rmapped) {
+		if (is_large_pte(*sptep))
+			++vcpu->kvm->stat.lpages;
+
 		page_header_update_slot(vcpu->kvm, sptep, gfn);
-		if (!was_rmapped) {
-			rmap_count = rmap_add(vcpu, sptep, gfn);
-			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-				rmap_recycle(vcpu, sptep, gfn);
-		}
+
+		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
+			rmap_recycle(vcpu, sptep, gfn);
 	}

 	kvm_release_pfn_clean(pfn);
-- 
1.7.7.6



* [PATCH 3/5] KVM: MMU: simplify set_spte
  2012-11-05 12:09 [PATCH 1/5] KVM: MMU: cleanup mapping-level Xiao Guangrong
  2012-11-05 12:10 ` [PATCH 2/5] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
@ 2012-11-05 12:11 ` Xiao Guangrong
  2012-11-20 22:24   ` Marcelo Tosatti
  2012-11-05 12:12 ` [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault) Xiao Guangrong
  2012-11-05 12:12 ` [PATCH 5/5] KVM: MMU: remove pt_access in mmu_set_spte Xiao Guangrong
  3 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-05 12:11 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

It is cleaner to update pte_access first and then set the spte according
to pte_access. Also introduce gfn_need_write_protect() to check whether
the gfn needs to be write-protected.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c |  109 ++++++++++++++++++++++++++++++++--------------------
 1 files changed, 67 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4ea731e..49957df 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2329,6 +2329,63 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	return 0;
 }

+static bool gfn_need_write_protect(struct kvm_vcpu *vcpu, u64 *sptep,
+				   int level,  gfn_t gfn, bool can_unsync)
+{
+	/*
+	 * Optimization: for pte sync, if spte was writable the hash
+	 * lookup is unnecessary (and expensive). Write protection
+	 * is responsibility of mmu_get_page / kvm_sync_page.
+	 * Same reasoning can be applied to dirty page accounting.
+	 */
+	if (!can_unsync && is_writable_pte(*sptep))
+		return false;
+
+	if ((level > PT_PAGE_TABLE_LEVEL &&
+	   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
+	      mmu_need_write_protect(vcpu, gfn, can_unsync))
+		return true;
+
+	return false;
+}
+
+/* The return value indicates whether the @gfn need to be write protected. */
+static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
+			       unsigned *pte_access, int user_fault,
+			       int write_fault, int level, gfn_t gfn,
+			       bool can_unsync, bool host_writable)
+{
+	bool ret = false;
+	unsigned access = *pte_access;
+
+	if (!host_writable)
+		access &= ~ACC_WRITE_MASK;
+
+	if (!(access & ACC_WRITE_MASK) && (!vcpu->arch.mmu.direct_map &&
+	      write_fault && !is_write_protection(vcpu) && !user_fault)) {
+		access |= ACC_WRITE_MASK;
+		access &= ~ACC_USER_MASK;
+
+		/*
+		 * If we converted a user page to a kernel page,
+		 * so that the kernel can write to it when cr0.wp=0,
+		 * then we should prevent the kernel from executing it
+		 * if SMEP is enabled.
+		 */
+		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
+			access &= ~ACC_EXEC_MASK;
+	}
+
+	if ((access & ACC_WRITE_MASK) &&
+		  gfn_need_write_protect(vcpu, sptep, level, gfn, can_unsync)) {
+		access &= ~ACC_WRITE_MASK;
+		ret = true;
+	}
+
+	*pte_access = access;
+	return ret;
+}
+
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    unsigned pte_access, int user_fault,
 		    int write_fault, int level,
@@ -2341,6 +2398,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
 		return 0;

+	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, user_fault,
+		      write_fault, level, gfn, can_unsync, host_writable);
+
 	spte = PT_PRESENT_MASK;
 	if (!speculative)
 		spte |= shadow_accessed_mask;
@@ -2353,61 +2413,26 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (pte_access & ACC_USER_MASK)
 		spte |= shadow_user_mask;

+	if (pte_access & ACC_WRITE_MASK) {
+		spte |= PT_WRITABLE_MASK;
+		spte |= SPTE_MMU_WRITEABLE;
+	}
+
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
+
 	if (tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));

 	if (host_writable)
 		spte |= SPTE_HOST_WRITEABLE;
-	else
-		pte_access &= ~ACC_WRITE_MASK;

 	spte |= (u64)pfn << PAGE_SHIFT;

-	if ((pte_access & ACC_WRITE_MASK)
-	    || (!vcpu->arch.mmu.direct_map && write_fault
-		&& !is_write_protection(vcpu) && !user_fault)) {
-		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
-
-		if (!vcpu->arch.mmu.direct_map
-		    && !(pte_access & ACC_WRITE_MASK)) {
-			spte &= ~PT_USER_MASK;
-			/*
-			 * If we converted a user page to a kernel page,
-			 * so that the kernel can write to it when cr0.wp=0,
-			 * then we should prevent the kernel from executing it
-			 * if SMEP is enabled.
-			 */
-			if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
-				spte |= PT64_NX_MASK;
-		}
-
-		/*
-		 * Optimization: for pte sync, if spte was writable the hash
-		 * lookup is unnecessary (and expensive). Write protection
-		 * is responsibility of mmu_get_page / kvm_sync_page.
-		 * Same reasoning can be applied to dirty page accounting.
-		 */
-		if (!can_unsync && is_writable_pte(*sptep))
-			goto set_pte;
-
-		if ((level > PT_PAGE_TABLE_LEVEL &&
-		   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
-		      mmu_need_write_protect(vcpu, gfn, can_unsync)) {
-			pgprintk("%s: found shadow page for %llx, marking ro\n",
-				 __func__, gfn);
-			ret = 1;
-			pte_access &= ~ACC_WRITE_MASK;
-			spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
-		}
-	}
-
-	if (pte_access & ACC_WRITE_MASK)
+	if (is_writable_pte(spte))
 		mark_page_dirty(vcpu->kvm, gfn);

-set_pte:
 	if (mmu_spte_update(sptep, spte))
 		kvm_flush_remote_tlbs(vcpu->kvm);
 	return ret;
-- 
1.7.7.6
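
The ordering that this patch introduces (adjust the access bits first, then derive the spte from them) can be modelled in a few lines of user-space C. This is a rough sketch, not the kernel code: struct adjust_ctx and adjust_access() are hypothetical names, the ACC_* values are illustrative, and the result of gfn_need_write_protect() (including its can_unsync shortcut) is folded into a single boolean input.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the real ACC_* masks live in the KVM MMU code. */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u

/* Hypothetical bundle of the inputs the patch reads from the vcpu and caller. */
struct adjust_ctx {
	bool host_writable;
	bool direct_map;
	bool write_fault;
	bool user_fault;
	bool cr0_wp;              /* models is_write_protection(vcpu) */
	bool smep;                /* models CR4.SMEP being set */
	bool gfn_write_protected; /* models gfn_need_write_protect(...) */
};

/* Returns true if write access was removed because the gfn is write-protected. */
static bool adjust_access(const struct adjust_ctx *c, unsigned *pte_access)
{
	unsigned access = *pte_access;
	bool ret = false;

	if (!c->host_writable)
		access &= ~ACC_WRITE_MASK;

	/* Supervisor write with CR0.WP=0 on a shadow (non-direct) MMU. */
	if (!(access & ACC_WRITE_MASK) && !c->direct_map &&
	    c->write_fault && !c->cr0_wp && !c->user_fault) {
		access |= ACC_WRITE_MASK;
		access &= ~ACC_USER_MASK;
		if (c->smep)
			access &= ~ACC_EXEC_MASK;
	}

	if ((access & ACC_WRITE_MASK) && c->gfn_write_protected) {
		access &= ~ACC_WRITE_MASK;
		ret = true;
	}

	*pte_access = access;
	return ret;
}

/* Example: supervisor write fault on a read-only user page, CR0.WP=0, SMEP on. */
static void example_cr0_wp_clear(void)
{
	struct adjust_ctx c = { .host_writable = true, .write_fault = true,
				.smep = true };
	unsigned access = ACC_EXEC_MASK | ACC_USER_MASK;

	assert(!adjust_access(&c, &access));
	assert(access == ACC_WRITE_MASK);
}
```

Once the access bits are final, setting the writable bits in the spte reduces to a single test of ACC_WRITE_MASK, which is the simplification the patch aims for.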



* [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault)
  2012-11-05 12:09 [PATCH 1/5] KVM: MMU: cleanup mapping-level Xiao Guangrong
  2012-11-05 12:10 ` [PATCH 2/5] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
  2012-11-05 12:11 ` [PATCH 3/5] KVM: MMU: simplify set_spte Xiao Guangrong
@ 2012-11-05 12:12 ` Xiao Guangrong
  2012-11-20 22:27   ` Marcelo Tosatti
  2012-11-05 12:12 ` [PATCH 5/5] KVM: MMU: remove pt_access in mmu_set_spte Xiao Guangrong
  3 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-05 12:12 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

With this change, no mmu-specific code remains in the common function,
and two parameters can be dropped from set_spte.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c         |   42 +++++++++++-------------------------------
 arch/x86/kvm/paging_tmpl.h |   25 ++++++++++++++++++++-----
 2 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 49957df..4229e78 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2351,8 +2351,7 @@ static bool gfn_need_write_protect(struct kvm_vcpu *vcpu, u64 *sptep,

 /* The return value indicates whether the @gfn need to be write protected. */
 static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
-			       unsigned *pte_access, int user_fault,
-			       int write_fault, int level, gfn_t gfn,
+			       unsigned *pte_access, int level, gfn_t gfn,
 			       bool can_unsync, bool host_writable)
 {
 	bool ret = false;
@@ -2361,21 +2360,6 @@ static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!host_writable)
 		access &= ~ACC_WRITE_MASK;

-	if (!(access & ACC_WRITE_MASK) && (!vcpu->arch.mmu.direct_map &&
-	      write_fault && !is_write_protection(vcpu) && !user_fault)) {
-		access |= ACC_WRITE_MASK;
-		access &= ~ACC_USER_MASK;
-
-		/*
-		 * If we converted a user page to a kernel page,
-		 * so that the kernel can write to it when cr0.wp=0,
-		 * then we should prevent the kernel from executing it
-		 * if SMEP is enabled.
-		 */
-		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
-			access &= ~ACC_EXEC_MASK;
-	}
-
 	if ((access & ACC_WRITE_MASK) &&
 		  gfn_need_write_protect(vcpu, sptep, level, gfn, can_unsync)) {
 		access &= ~ACC_WRITE_MASK;
@@ -2387,8 +2371,7 @@ static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
 }

 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-		    unsigned pte_access, int user_fault,
-		    int write_fault, int level,
+		    unsigned pte_access, int level,
 		    gfn_t gfn, pfn_t pfn, bool speculative,
 		    bool can_unsync, bool host_writable)
 {
@@ -2398,8 +2381,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
 		return 0;

-	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, user_fault,
-		      write_fault, level, gfn, can_unsync, host_writable);
+	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, level, gfn,
+				 can_unsync, host_writable);

 	spte = PT_PRESENT_MASK;
 	if (!speculative)
@@ -2440,17 +2423,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 unsigned pt_access, unsigned pte_access,
-			 int user_fault, int write_fault,
-			 int *emulate, int level, gfn_t gfn,
-			 pfn_t pfn, bool speculative,
+			 int write_fault, int *emulate, int level,
+			 gfn_t gfn, pfn_t pfn, bool speculative,
 			 bool host_writable)
 {
 	bool was_rmapped = false;

-	pgprintk("%s: spte %llx access %x write_fault %d"
-		 " user_fault %d gfn %llx\n",
-		 __func__, *sptep, pt_access,
-		 write_fault, user_fault, gfn);
+	pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
+		 __func__, *sptep, pt_access, write_fault, gfn);

 	if (is_rmap_spte(*sptep)) {
 		if (pfn != spte_to_pfn(*sptep)) {
@@ -2462,7 +2442,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			was_rmapped = true;
 	}

-	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
+	if (set_spte(vcpu, sptep, pte_access,
 		      level, gfn, pfn, speculative, true,
 		      host_writable)) {
 		if (write_fault)
@@ -2556,7 +2536,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,

 	for (i = 0; i < ret; i++, gfn++, start++)
 		mmu_set_spte(vcpu, start, ACC_ALL,
-			     access, 0, 0, NULL,
+			     access, 0, NULL,
 			     sp->role.level, gfn,
 			     page_to_pfn(pages[i]), true, true);

@@ -2620,7 +2600,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			unsigned pte_access = ACC_ALL;

 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
-				     0, write, &emulate,
+				     write, &emulate,
 				     level, gfn, pfn, prefault, map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 891eb6d..b1bcd68 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -330,7 +330,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
+	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0,
 		     NULL, PT_PAGE_TABLE_LEVEL, gfn, pfn, true, true);

 	return true;
@@ -405,7 +405,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
  */
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *gw,
-			 int user_fault, int write_fault, int hlevel,
+			 int write_fault, int hlevel,
 			 pfn_t pfn, bool map_writable, bool prefault)
 {
 	struct kvm_mmu_page *sp = NULL;
@@ -478,7 +478,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

 	clear_sp_write_flooding_count(it.sptep);
 	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
-		     user_fault, write_fault, &emulate, it.level,
+		     write_fault, &emulate, it.level,
 		     gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);

@@ -544,6 +544,21 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 		return 0;
 	}

+	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
+	      !is_write_protection(vcpu) && !user_fault) {
+		walker.pte_access |= ACC_WRITE_MASK;
+		walker.pte_access &= ~ACC_USER_MASK;
+
+		/*
+		 * If we converted a user page to a kernel page,
+		 * so that the kernel can write to it when cr0.wp=0,
+		 * then we should prevent the kernel from executing it
+		 * if SMEP is enabled.
+		 */
+		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
+			walker.pte_access &= ~ACC_EXEC_MASK;
+	}
+
 	if (walker.level >= PT_DIRECTORY_LEVEL)
 		force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
 	else
@@ -572,7 +587,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	kvm_mmu_free_some_pages(vcpu);
 	if (!force_pt_level)
 		transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
-	r = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
+	r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
 			 level, pfn, map_writable, prefault);
 	++vcpu->stat.pf_fixed;
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
@@ -747,7 +762,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)

 		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;

-		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
+		set_spte(vcpu, &sp->spt[i], pte_access,
 			 PT_PAGE_TABLE_LEVEL, gfn,
 			 spte_to_pfn(sp->spt[i]), true, false,
 			 host_writable);
-- 
1.7.7.6



* [PATCH 5/5] KVM: MMU: remove pt_access in mmu_set_spte
  2012-11-05 12:09 [PATCH 1/5] KVM: MMU: cleanup mapping-level Xiao Guangrong
                   ` (2 preceding siblings ...)
  2012-11-05 12:12 ` [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault) Xiao Guangrong
@ 2012-11-05 12:12 ` Xiao Guangrong
  3 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-05 12:12 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

It is only used in debug code, so drop it

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c         |   16 +++++++---------
 arch/x86/kvm/paging_tmpl.h |    9 ++++-----
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4229e78..1faded1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2422,15 +2422,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 }

 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			 unsigned pt_access, unsigned pte_access,
-			 int write_fault, int *emulate, int level,
-			 gfn_t gfn, pfn_t pfn, bool speculative,
+			 unsigned pte_access, int write_fault, int *emulate,
+			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
 			 bool host_writable)
 {
 	bool was_rmapped = false;

 	pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
-		 __func__, *sptep, pt_access, write_fault, gfn);
+		 __func__, *sptep, pte_access, write_fault, gfn);

 	if (is_rmap_spte(*sptep)) {
 		if (pfn != spte_to_pfn(*sptep)) {
@@ -2535,8 +2534,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;

 	for (i = 0; i < ret; i++, gfn++, start++)
-		mmu_set_spte(vcpu, start, ACC_ALL,
-			     access, 0, NULL,
+		mmu_set_spte(vcpu, start, access, 0, NULL,
 			     sp->role.level, gfn,
 			     page_to_pfn(pages[i]), true, true);

@@ -2599,9 +2597,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 		if (iterator.level == level) {
 			unsigned pte_access = ACC_ALL;

-			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
-				     write, &emulate,
-				     level, gfn, pfn, prefault, map_writable);
+			mmu_set_spte(vcpu, iterator.sptep, pte_access,
+				     write, &emulate, level, gfn, pfn,
+				     prefault, map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
 			break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b1bcd68..d6b9c59 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -330,8 +330,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0,
-		     NULL, PT_PAGE_TABLE_LEVEL, gfn, pfn, true, true);
+	mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
+		     gfn, pfn, true, true);

 	return true;
 }
@@ -477,9 +477,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}

 	clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
-		     write_fault, &emulate, it.level,
-		     gw->gfn, pfn, prefault, map_writable);
+	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
+		     it.level, gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);

 	return emulate;
-- 
1.7.7.6



* Re: [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-05 12:10 ` [PATCH 2/5] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
@ 2012-11-12 23:12   ` Marcelo Tosatti
  2012-11-13  8:39     ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2012-11-12 23:12 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Mon, Nov 05, 2012 at 08:10:08PM +0800, Xiao Guangrong wrote:
> In order to detecting spte remapping, we can simply check whether the
> spte has already been pointing to the pfn even if the spte is not the
> last spte for middle spte is pointing to the kernel pfn which can not
> be mapped to userspace
> 
> Also, update slot and stat.lpages iff the spte is not remapped
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c |   40 +++++++++++++---------------------------
>  1 files changed, 13 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 692ebb1..4ea731e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2420,8 +2420,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			 pfn_t pfn, bool speculative,
>  			 bool host_writable)
>  {
> -	int was_rmapped = 0;
> -	int rmap_count;
> +	bool was_rmapped = false;
> 
>  	pgprintk("%s: spte %llx access %x write_fault %d"
>  		 " user_fault %d gfn %llx\n",
> @@ -2429,25 +2428,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		 write_fault, user_fault, gfn);
> 
>  	if (is_rmap_spte(*sptep)) {
> -		/*
> -		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
> -		 * the parent of the now unreachable PTE.
> -		 */
> -		if (level > PT_PAGE_TABLE_LEVEL &&
> -		    !is_large_pte(*sptep)) {
> -			struct kvm_mmu_page *child;
> -			u64 pte = *sptep;
> +		if (pfn != spte_to_pfn(*sptep)) {
> +			struct kvm_mmu_page *sp = page_header(__pa(sptep));
> 
> -			child = page_header(pte & PT64_BASE_ADDR_MASK);
> -			drop_parent_pte(child, sptep);
> -			kvm_flush_remote_tlbs(vcpu->kvm);

How come it's safe to drop this case?



* Re: [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-12 23:12   ` Marcelo Tosatti
@ 2012-11-13  8:39     ` Xiao Guangrong
  2012-11-20 22:18       ` Marcelo Tosatti
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-13  8:39 UTC
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/13/2012 07:12 AM, Marcelo Tosatti wrote:
> On Mon, Nov 05, 2012 at 08:10:08PM +0800, Xiao Guangrong wrote:
>> In order to detecting spte remapping, we can simply check whether the
>> spte has already been pointing to the pfn even if the spte is not the
>> last spte for middle spte is pointing to the kernel pfn which can not
>> be mapped to userspace
>>
>> Also, update slot and stat.lpages iff the spte is not remapped
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c |   40 +++++++++++++---------------------------
>>  1 files changed, 13 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 692ebb1..4ea731e 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2420,8 +2420,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>  			 pfn_t pfn, bool speculative,
>>  			 bool host_writable)
>>  {
>> -	int was_rmapped = 0;
>> -	int rmap_count;
>> +	bool was_rmapped = false;
>>
>>  	pgprintk("%s: spte %llx access %x write_fault %d"
>>  		 " user_fault %d gfn %llx\n",
>> @@ -2429,25 +2428,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>  		 write_fault, user_fault, gfn);
>>
>>  	if (is_rmap_spte(*sptep)) {
>> -		/*
>> -		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
>> -		 * the parent of the now unreachable PTE.
>> -		 */
>> -		if (level > PT_PAGE_TABLE_LEVEL &&
>> -		    !is_large_pte(*sptep)) {
>> -			struct kvm_mmu_page *child;
>> -			u64 pte = *sptep;
>> +		if (pfn != spte_to_pfn(*sptep)) {
>> +			struct kvm_mmu_page *sp = page_header(__pa(sptep));
>>
>> -			child = page_header(pte & PT64_BASE_ADDR_MASK);
>> -			drop_parent_pte(child, sptep);
>> -			kvm_flush_remote_tlbs(vcpu->kvm);
> 
> How come its safe to drop this case?

We use "if (pfn != spte_to_pfn(*sptep))" to simplify things.
There are two cases:
1) The sptep is not the last mapping.
   In this case, sptep must point to a shadow page table, which means
   spte_to_pfn(*sptep) is a page used by the KVM module, while 'pfn' is used
   by userspace. So the 'if' condition must be satisfied and the sptep will
   be dropped.

   Actually, this is the original case:
  | if (level > PT_PAGE_TABLE_LEVEL &&
  |	    !is_large_pte(*sptep))

2) The sptep is the last mapping.
   In this case, the level of the spte (sp.level) must equal the 'level'
   that we pass to mmu_set_spte. If they point to the same pfn, it is a
   'remap'; otherwise we drop it.

I think this is safe. :)
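
The two cases above can be summarized as a small decision function. What follows is a simplified user-space sketch, not the code in mmu.c: struct old_spte, enum spte_action and classify() are hypothetical names, and the pfn values are arbitrary. It only captures the argument that one pfn comparison covers both the non-leaf and the leaf case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum spte_action { SPTE_INSTALL, SPTE_ZAP_THEN_INSTALL, SPTE_REMAP };

/* Hypothetical, simplified view of the spte that is about to be overwritten. */
struct old_spte {
	bool present;	/* models is_rmap_spte(*sptep) */
	uint64_t pfn;	/* models spte_to_pfn(*sptep) */
};

static enum spte_action classify(const struct old_spte *old, uint64_t new_pfn)
{
	if (!old->present)
		return SPTE_INSTALL;
	if (old->pfn != new_pfn)
		return SPTE_ZAP_THEN_INSTALL;	/* models mmu_page_zap_pte() */
	return SPTE_REMAP;			/* models was_rmapped = true */
}

static void example_cases(void)
{
	/* Case 1: non-leaf spte; its pfn is a KVM-owned shadow page table. */
	struct old_spte middle = { .present = true, .pfn = 0x1000 };
	/* Case 2: last-level spte that already maps the guest page. */
	struct old_spte leaf = { .present = true, .pfn = 0x2000 };
	uint64_t guest_pfn = 0x2000;

	assert(classify(&middle, guest_pfn) == SPTE_ZAP_THEN_INSTALL);
	assert(classify(&leaf, guest_pfn) == SPTE_REMAP);
}
```

The sketch relies on the same assumption as the explanation: a page that backs a shadow page table is never handed out as a guest pfn, so case 1 always takes the zap path.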



* Re: [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-13  8:39     ` Xiao Guangrong
@ 2012-11-20 22:18       ` Marcelo Tosatti
  2012-11-20 23:23         ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2012-11-20 22:18 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Nov 13, 2012 at 04:39:44PM +0800, Xiao Guangrong wrote:
> On 11/13/2012 07:12 AM, Marcelo Tosatti wrote:
> > On Mon, Nov 05, 2012 at 08:10:08PM +0800, Xiao Guangrong wrote:
> >> In order to detecting spte remapping, we can simply check whether the
> >> spte has already been pointing to the pfn even if the spte is not the
> >> last spte for middle spte is pointing to the kernel pfn which can not
> >> be mapped to userspace
> >>
> >> Also, update slot and stat.lpages iff the spte is not remapped
> >>
> >> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> ---
> >>  arch/x86/kvm/mmu.c |   40 +++++++++++++---------------------------
> >>  1 files changed, 13 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 692ebb1..4ea731e 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -2420,8 +2420,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >>  			 pfn_t pfn, bool speculative,
> >>  			 bool host_writable)
> >>  {
> >> -	int was_rmapped = 0;
> >> -	int rmap_count;
> >> +	bool was_rmapped = false;
> >>
> >>  	pgprintk("%s: spte %llx access %x write_fault %d"
> >>  		 " user_fault %d gfn %llx\n",
> >> @@ -2429,25 +2428,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >>  		 write_fault, user_fault, gfn);
> >>
> >>  	if (is_rmap_spte(*sptep)) {
> >> -		/*
> >> -		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
> >> -		 * the parent of the now unreachable PTE.
> >> -		 */
> >> -		if (level > PT_PAGE_TABLE_LEVEL &&
> >> -		    !is_large_pte(*sptep)) {
> >> -			struct kvm_mmu_page *child;
> >> -			u64 pte = *sptep;
> >> +		if (pfn != spte_to_pfn(*sptep)) {
> >> +			struct kvm_mmu_page *sp = page_header(__pa(sptep));
> >>
> >> -			child = page_header(pte & PT64_BASE_ADDR_MASK);
> >> -			drop_parent_pte(child, sptep);
> >> -			kvm_flush_remote_tlbs(vcpu->kvm);
> > 
> > How come its safe to drop this case?
> 
> We use "if (pfn != spte_to_pfn(*sptep))" to simplify the thing.
> There are two cases:
> 1) the sptep is not the last mapping.
>    under this case, sptep must point to a shadow page table, that means
>    spte_to_pfn(*sptep)) is used by KVM module, and 'pfn' is used by userspace.
>    so, 'if' condition must be satisfied, the sptep will be dropped.
> 
>    Actually, This is the origin case:
>   | if (level > PT_PAGE_TABLE_LEVEL &&
>   |	    !is_large_pte(*sptep))"
> 
> 2) the sptep is the last mapping.
>    under this case, the level of spte (sp.level) must equal the 'level' which
>    we pass to mmu_set_spte. If they point to the same pfn, it is 'remap', otherwise
>    we drop it.
> 
> I think this is safe. :)

mmu_page_zap_pte takes care of it, OK.

What if was_rmapped=true but gfn is different? Say if the spte comes
from an unsync shadow page, the guest modifies that shadow page (but
does not invalidate it with invlpg), then faults. gfn can still point
to the same gfn (but in that case, with your patch,
page_header_update_slot is not called).



* Re: [PATCH 3/5] KVM: MMU: simplify set_spte
  2012-11-05 12:11 ` [PATCH 3/5] KVM: MMU: simplify set_spte Xiao Guangrong
@ 2012-11-20 22:24   ` Marcelo Tosatti
  2012-11-20 23:26     ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2012-11-20 22:24 UTC
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Mon, Nov 05, 2012 at 08:11:03PM +0800, Xiao Guangrong wrote:
> It is more cleaner if we can update pte_access fist then set spte according
> to pte_access, also introduce gfn_need_write_protect to check whether the
> gfn need to be write-protected
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

Please separate the patch into:
- code movement with no logical modification.
- logical modification (such as the condition for mark_page_dirty).
- moving code to helper functions.

>  arch/x86/kvm/mmu.c |  109 ++++++++++++++++++++++++++++++++--------------------
>  1 files changed, 67 insertions(+), 42 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 4ea731e..49957df 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2329,6 +2329,63 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>  	return 0;
>  }
> 
> +static bool gfn_need_write_protect(struct kvm_vcpu *vcpu, u64 *sptep,
> +				   int level,  gfn_t gfn, bool can_unsync)
> +{
> +	/*
> +	 * Optimization: for pte sync, if spte was writable the hash
> +	 * lookup is unnecessary (and expensive). Write protection
> +	 * is responsibility of mmu_get_page / kvm_sync_page.
> +	 * Same reasoning can be applied to dirty page accounting.
> +	 */
> +	if (!can_unsync && is_writable_pte(*sptep))
> +		return false;
> +
> +	if ((level > PT_PAGE_TABLE_LEVEL &&
> +	   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
> +	      mmu_need_write_protect(vcpu, gfn, can_unsync))
> +		return true;
> +
> +	return false;
> +}
> +
> +/* The return value indicates whether the @gfn need to be write protected. */
> +static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
> +			       unsigned *pte_access, int user_fault,
> +			       int write_fault, int level, gfn_t gfn,
> +			       bool can_unsync, bool host_writable)
> +{
> +	bool ret = false;
> +	unsigned access = *pte_access;
> +
> +	if (!host_writable)
> +		access &= ~ACC_WRITE_MASK;
> +
> +	if (!(access & ACC_WRITE_MASK) && (!vcpu->arch.mmu.direct_map &&
> +	      write_fault && !is_write_protection(vcpu) && !user_fault)) {
> +		access |= ACC_WRITE_MASK;
> +		access &= ~ACC_USER_MASK;
> +
> +		/*
> +		 * If we converted a user page to a kernel page,
> +		 * so that the kernel can write to it when cr0.wp=0,
> +		 * then we should prevent the kernel from executing it
> +		 * if SMEP is enabled.
> +		 */
> +		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
> +			access &= ~ACC_EXEC_MASK;
> +	}
> +
> +	if ((access & ACC_WRITE_MASK) &&
> +		  gfn_need_write_protect(vcpu, sptep, level, gfn, can_unsync)) {
> +		access &= ~ACC_WRITE_MASK;
> +		ret = true;
> +	}
> +
> +	*pte_access = access;
> +	return ret;
> +}
> +
>  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		    unsigned pte_access, int user_fault,
>  		    int write_fault, int level,
> @@ -2341,6 +2398,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
>  		return 0;
> 
> +	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, user_fault,
> +		      write_fault, level, gfn, can_unsync, host_writable);
> +
>  	spte = PT_PRESENT_MASK;
>  	if (!speculative)
>  		spte |= shadow_accessed_mask;
> @@ -2353,61 +2413,26 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	if (pte_access & ACC_USER_MASK)
>  		spte |= shadow_user_mask;
> 
> +	if (pte_access & ACC_WRITE_MASK) {
> +		spte |= PT_WRITABLE_MASK;
> +		spte |= SPTE_MMU_WRITEABLE;
> +	}
> +
>  	if (level > PT_PAGE_TABLE_LEVEL)
>  		spte |= PT_PAGE_SIZE_MASK;
> +
>  	if (tdp_enabled)
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
> 
>  	if (host_writable)
>  		spte |= SPTE_HOST_WRITEABLE;
> -	else
> -		pte_access &= ~ACC_WRITE_MASK;
> 
>  	spte |= (u64)pfn << PAGE_SHIFT;
> 
> -	if ((pte_access & ACC_WRITE_MASK)
> -	    || (!vcpu->arch.mmu.direct_map && write_fault
> -		&& !is_write_protection(vcpu) && !user_fault)) {
> -		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
> -
> -		if (!vcpu->arch.mmu.direct_map
> -		    && !(pte_access & ACC_WRITE_MASK)) {
> -			spte &= ~PT_USER_MASK;
> -			/*
> -			 * If we converted a user page to a kernel page,
> -			 * so that the kernel can write to it when cr0.wp=0,
> -			 * then we should prevent the kernel from executing it
> -			 * if SMEP is enabled.
> -			 */
> -			if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
> -				spte |= PT64_NX_MASK;
> -		}
> -
> -		/*
> -		 * Optimization: for pte sync, if spte was writable the hash
> -		 * lookup is unnecessary (and expensive). Write protection
> -		 * is responsibility of mmu_get_page / kvm_sync_page.
> -		 * Same reasoning can be applied to dirty page accounting.
> -		 */
> -		if (!can_unsync && is_writable_pte(*sptep))
> -			goto set_pte;
> -
> -		if ((level > PT_PAGE_TABLE_LEVEL &&
> -		   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
> -		      mmu_need_write_protect(vcpu, gfn, can_unsync)) {
> -			pgprintk("%s: found shadow page for %llx, marking ro\n",
> -				 __func__, gfn);
> -			ret = 1;
> -			pte_access &= ~ACC_WRITE_MASK;
> -			spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
> -		}
> -	}
> -
> -	if (pte_access & ACC_WRITE_MASK)
> +	if (is_writable_pte(spte))
>  		mark_page_dirty(vcpu->kvm, gfn);
> 
> -set_pte:
>  	if (mmu_spte_update(sptep, spte))
>  		kvm_flush_remote_tlbs(vcpu->kvm);
>  	return ret;
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault)
  2012-11-05 12:12 ` [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault) Xiao Guangrong
@ 2012-11-20 22:27   ` Marcelo Tosatti
  2012-11-20 23:28     ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2012-11-20 22:27 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Mon, Nov 05, 2012 at 08:12:07PM +0800, Xiao Guangrong wrote:
> Then, no mmu specified code exists in the common function and drop two
> parameters in set_spte
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>  arch/x86/kvm/mmu.c         |   42 +++++++++++-------------------------------
>  arch/x86/kvm/paging_tmpl.h |   25 ++++++++++++++++++++-----
>  2 files changed, 31 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 49957df..4229e78 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2351,8 +2351,7 @@ static bool gfn_need_write_protect(struct kvm_vcpu *vcpu, u64 *sptep,
> 
>  /* The return value indicates whether the @gfn need to be write protected. */
>  static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
> -			       unsigned *pte_access, int user_fault,
> -			       int write_fault, int level, gfn_t gfn,
> +			       unsigned *pte_access, int level, gfn_t gfn,
>  			       bool can_unsync, bool host_writable)
>  {
>  	bool ret = false;
> @@ -2361,21 +2360,6 @@ static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
>  	if (!host_writable)
>  		access &= ~ACC_WRITE_MASK;
> 
> -	if (!(access & ACC_WRITE_MASK) && (!vcpu->arch.mmu.direct_map &&
> -	      write_fault && !is_write_protection(vcpu) && !user_fault)) {
> -		access |= ACC_WRITE_MASK;
> -		access &= ~ACC_USER_MASK;
> -
> -		/*
> -		 * If we converted a user page to a kernel page,
> -		 * so that the kernel can write to it when cr0.wp=0,
> -		 * then we should prevent the kernel from executing it
> -		 * if SMEP is enabled.
> -		 */
> -		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
> -			access &= ~ACC_EXEC_MASK;
> -	}
> -
>  	if ((access & ACC_WRITE_MASK) &&
>  		  gfn_need_write_protect(vcpu, sptep, level, gfn, can_unsync)) {
>  		access &= ~ACC_WRITE_MASK;
> @@ -2387,8 +2371,7 @@ static bool vcpu_adjust_access(struct kvm_vcpu *vcpu, u64 *sptep,
>  }
> 
>  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> -		    unsigned pte_access, int user_fault,
> -		    int write_fault, int level,
> +		    unsigned pte_access, int level,
>  		    gfn_t gfn, pfn_t pfn, bool speculative,
>  		    bool can_unsync, bool host_writable)
>  {
> @@ -2398,8 +2381,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	if (set_mmio_spte(sptep, gfn, pfn, pte_access))
>  		return 0;
> 
> -	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, user_fault,
> -		      write_fault, level, gfn, can_unsync, host_writable);
> +	ret = vcpu_adjust_access(vcpu, sptep, &pte_access, level, gfn,
> +				 can_unsync, host_writable);
> 
>  	spte = PT_PRESENT_MASK;
>  	if (!speculative)
> @@ -2440,17 +2423,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> 
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			 unsigned pt_access, unsigned pte_access,
> -			 int user_fault, int write_fault,
> -			 int *emulate, int level, gfn_t gfn,
> -			 pfn_t pfn, bool speculative,
> +			 int write_fault, int *emulate, int level,
> +			 gfn_t gfn, pfn_t pfn, bool speculative,
>  			 bool host_writable)
>  {
>  	bool was_rmapped = false;
> 
> -	pgprintk("%s: spte %llx access %x write_fault %d"
> -		 " user_fault %d gfn %llx\n",
> -		 __func__, *sptep, pt_access,
> -		 write_fault, user_fault, gfn);
> +	pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
> +		 __func__, *sptep, pt_access, write_fault, gfn);
> 
>  	if (is_rmap_spte(*sptep)) {
>  		if (pfn != spte_to_pfn(*sptep)) {
> @@ -2462,7 +2442,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			was_rmapped = true;
>  	}
> 
> -	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
> +	if (set_spte(vcpu, sptep, pte_access,
>  		      level, gfn, pfn, speculative, true,
>  		      host_writable)) {
>  		if (write_fault)
> @@ -2556,7 +2536,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
> 
>  	for (i = 0; i < ret; i++, gfn++, start++)
>  		mmu_set_spte(vcpu, start, ACC_ALL,
> -			     access, 0, 0, NULL,
> +			     access, 0, NULL,
>  			     sp->role.level, gfn,
>  			     page_to_pfn(pages[i]), true, true);
> 
> @@ -2620,7 +2600,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  			unsigned pte_access = ACC_ALL;
> 
>  			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
> -				     0, write, &emulate,
> +				     write, &emulate,
>  				     level, gfn, pfn, prefault, map_writable);
>  			direct_pte_prefetch(vcpu, iterator.sptep);
>  			++vcpu->stat.pf_fixed;
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 891eb6d..b1bcd68 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -330,7 +330,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  	 * we call mmu_set_spte() with host_writable = true because
>  	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
>  	 */
> -	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
> +	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0,
>  		     NULL, PT_PAGE_TABLE_LEVEL, gfn, pfn, true, true);
> 
>  	return true;
> @@ -405,7 +405,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
>   */
>  static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  			 struct guest_walker *gw,
> -			 int user_fault, int write_fault, int hlevel,
> +			 int write_fault, int hlevel,
>  			 pfn_t pfn, bool map_writable, bool prefault)
>  {
>  	struct kvm_mmu_page *sp = NULL;
> @@ -478,7 +478,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
> 
>  	clear_sp_write_flooding_count(it.sptep);
>  	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
> -		     user_fault, write_fault, &emulate, it.level,
> +		     write_fault, &emulate, it.level,
>  		     gw->gfn, pfn, prefault, map_writable);
>  	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
> 
> @@ -544,6 +544,21 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>  		return 0;
>  	}
> 
> +	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
> +	      !is_write_protection(vcpu) && !user_fault) {
> +		walker.pte_access |= ACC_WRITE_MASK;
> +		walker.pte_access &= ~ACC_USER_MASK;
> +
> +		/*
> +		 * If we converted a user page to a kernel page,
> +		 * so that the kernel can write to it when cr0.wp=0,
> +		 * then we should prevent the kernel from executing it
> +		 * if SMEP is enabled.
> +		 */
> +		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
> +			walker.pte_access &= ~ACC_EXEC_MASK;
> +	}
> +

What about sync_page path?

>  	if (walker.level >= PT_DIRECTORY_LEVEL)
>  		force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
>  	else
> @@ -572,7 +587,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>  	kvm_mmu_free_some_pages(vcpu);
>  	if (!force_pt_level)
>  		transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
> -	r = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
> +	r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
>  			 level, pfn, map_writable, prefault);
>  	++vcpu->stat.pf_fixed;
>  	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
> @@ -747,7 +762,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> 
>  		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
> 
> -		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
> +		set_spte(vcpu, &sp->spt[i], pte_access,
>  			 PT_PAGE_TABLE_LEVEL, gfn,
>  			 spte_to_pfn(sp->spt[i]), true, false,
>  			 host_writable);
> -- 
> 1.7.7.6
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-20 22:18       ` Marcelo Tosatti
@ 2012-11-20 23:23         ` Xiao Guangrong
  2012-11-20 23:51           ` Marcelo Tosatti
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-20 23:23 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/21/2012 06:18 AM, Marcelo Tosatti wrote:

>>>> -			child = page_header(pte & PT64_BASE_ADDR_MASK);
>>>> -			drop_parent_pte(child, sptep);
>>>> -			kvm_flush_remote_tlbs(vcpu->kvm);
>>>
>>> How come its safe to drop this case?
>>
>> We use "if (pfn != spte_to_pfn(*sptep))" to simplify the thing.
>> There are two cases:
>> 1) the sptep is not the last mapping.
>>    under this case, sptep must point to a shadow page table, that means
>>    spte_to_pfn(*sptep)) is used by KVM module, and 'pfn' is used by userspace.
>>    so, 'if' condition must be satisfied, the sptep will be dropped.
>>
>>    Actually, This is the origin case:
>>   | if (level > PT_PAGE_TABLE_LEVEL &&
>>   |	    !is_large_pte(*sptep))"
>>
>> 2) the sptep is the last mapping.
>>    under this case, the level of spte (sp.level) must equal the 'level' which
>>    we pass to mmu_set_spte. If they point to the same pfn, it is 'remap', otherwise
>>    we drop it.
>>
>> I think this is safe. :)
> 
> mmu_page_zap_pte takes care of it, OK.
> 
> What if was_rmapped=true but gfn is different? Say if the spte comes
> from an unsync shadow page, the guest modifies that shadow page (but
> does not invalidate it with invlpg), then faults. gfn can still point
> to the same gfn (but in that case, with your patch,
> page_header_update_slot is not called.

Marcelo,

The page fault path and the other sync/prefetch paths reread the guest page
table, so they get a different target pfn.

The scenario is like this:

gfn1 = pfn1, gfn2 = pfn2
gpte = pfn1, spte is shadowed by gpte and it is an unsync spte

Guest                               Host
                                    spte = (gfn1, pfn1)

modify gpte to let it point to gfn2
                                    spte = (gfn1, pfn1)
page-fault on gpte
                                    intercept the page-fault, then
                                    want to update spte to (gfn2, pfn2)

                                    in mmu_set_spte, we can detect
                                    pfn2 != pfn1, then drop it.

Hmm, the interesting question is what happens if different gfns map to the
same pfn. For example, spte1 is shadowed by gfn1 and spte2 is shadowed by gfn2,
and both gfn1 and gfn2 map to pfn. The code (including the current code) will
put spte1 on gfn2's rmap and spte2 on gfn1's rmap. But I think it is OK.
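
The pfn-based check described above can be sketched as a small standalone
model. This is not the kernel implementation: toy_check_remap, toy_spte_to_pfn,
the enum and the bit layout are hypothetical, and clearing the spte merely
stands in for dropping it and flushing TLBs. The sketch shows both halves of
the discussion: a changed pfn is detected and the old spte is dropped, while a
different gfn that maps to the same pfn is treated as a plain remap because
the gfn is never compared.

```c
#include <stdint.h>

typedef uint64_t pfn_t;

#define TOY_PRESENT	0x1ull	/* illustrative "present" bit */
#define TOY_PAGE_SHIFT	12

enum remap_result { FRESH_MAP, REMAP_SAME_PFN, DROPPED_OLD };

/* Toy layout: the pfn sits directly above the low 12 bits. */
static pfn_t toy_spte_to_pfn(uint64_t spte)
{
	return spte >> TOY_PAGE_SHIFT;
}

static uint64_t toy_make_spte(pfn_t pfn)
{
	return (pfn << TOY_PAGE_SHIFT) | TOY_PRESENT;
}

/* Classify an update of *sptep to new_pfn using only the pfn. */
static enum remap_result toy_check_remap(uint64_t *sptep, pfn_t new_pfn)
{
	if (!(*sptep & TOY_PRESENT))
		return FRESH_MAP;

	if (toy_spte_to_pfn(*sptep) != new_pfn) {
		*sptep = 0;		/* stands in for drop + TLB flush */
		return DROPPED_OLD;
	}

	/* Same pfn: treated as a remap, whatever gfn it was reached by. */
	return REMAP_SAME_PFN;
}
```

In the scenario from the table, an spte built with toy_make_spte(pfn1) and
updated to pfn2 yields DROPPED_OLD. If gfn1 and gfn2 alias the same pfn, the
same call yields REMAP_SAME_PFN, so in this model any per-gfn bookkeeping has
to be refreshed separately.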





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] KVM: MMU: simplify set_spte
  2012-11-20 22:24   ` Marcelo Tosatti
@ 2012-11-20 23:26     ` Xiao Guangrong
  0 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-20 23:26 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/21/2012 06:24 AM, Marcelo Tosatti wrote:
> On Mon, Nov 05, 2012 at 08:11:03PM +0800, Xiao Guangrong wrote:
>> It is more cleaner if we can update pte_access fist then set spte according
>> to pte_access, also introduce gfn_need_write_protect to check whether the
>> gfn need to be write-protected
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> Please separate patch in:
> - code movement with no logical modification.
> - logical modification (such as condition for mark_page_dirty).
> - move code to helper functions.

Okay, will split it.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault)
  2012-11-20 22:27   ` Marcelo Tosatti
@ 2012-11-20 23:28     ` Xiao Guangrong
  0 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-20 23:28 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/21/2012 06:27 AM, Marcelo Tosatti wrote:

>> @@ -544,6 +544,21 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>>  		return 0;
>>  	}
>>
>> +	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
>> +	      !is_write_protection(vcpu) && !user_fault) {
>> +		walker.pte_access |= ACC_WRITE_MASK;
>> +		walker.pte_access &= ~ACC_USER_MASK;
>> +
>> +		/*
>> +		 * If we converted a user page to a kernel page,
>> +		 * so that the kernel can write to it when cr0.wp=0,
>> +		 * then we should prevent the kernel from executing it
>> +		 * if SMEP is enabled.
>> +		 */
>> +		if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
>> +			walker.pte_access &= ~ACC_EXEC_MASK;
>> +	}
>> +
> 
> What about sync_page path?

The sync_page path and the other prefetch paths only do read-prefetch, which
means they call set_spte with write_fault = 0.
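
A minimal standalone sketch of why write_fault = 0 makes the adjustment a
no-op, assuming the condition from the quoted hunk. struct toy_vcpu and
toy_adjust_access are hypothetical stand-ins for the vcpu state and the
moved code, and the ACC_* values are illustrative:

```c
#include <stdbool.h>

#define ACC_EXEC_MASK	0x1u	/* illustrative values for this sketch */
#define ACC_WRITE_MASK	0x2u
#define ACC_USER_MASK	0x4u

/* Toy vcpu state: only the two control bits the condition reads. */
struct toy_vcpu {
	bool cr0_wp;	/* stands in for is_write_protection(vcpu) */
	bool cr4_smep;	/* stands in for kvm_read_cr4_bits(.., X86_CR4_SMEP) */
};

static unsigned toy_adjust_access(const struct toy_vcpu *v, unsigned access,
				  bool write_fault, bool user_fault)
{
	/* Kernel write to a read-only page while cr0.wp=0. */
	if (write_fault && !(access & ACC_WRITE_MASK) &&
	    !v->cr0_wp && !user_fault) {
		access |= ACC_WRITE_MASK;
		access &= ~ACC_USER_MASK;

		/* Converted user page must not be kernel-executable. */
		if (v->cr4_smep)
			access &= ~ACC_EXEC_MASK;
	}

	return access;
}
```

With write_fault = false the function returns its access argument unchanged
for every vcpu state, which is the situation on the sync_page and prefetch
paths as described above. Only a kernel write fault with cr0.wp=0 rewrites
the bits.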


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-20 23:23         ` Xiao Guangrong
@ 2012-11-20 23:51           ` Marcelo Tosatti
  2012-11-21  3:19             ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2012-11-20 23:51 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Wed, Nov 21, 2012 at 07:23:26AM +0800, Xiao Guangrong wrote:
> On 11/21/2012 06:18 AM, Marcelo Tosatti wrote:
> 
> >>>> -			child = page_header(pte & PT64_BASE_ADDR_MASK);
> >>>> -			drop_parent_pte(child, sptep);
> >>>> -			kvm_flush_remote_tlbs(vcpu->kvm);
> >>>
> >>> How come its safe to drop this case?
> >>
> >> We use "if (pfn != spte_to_pfn(*sptep))" to simplify the thing.
> >> There are two cases:
> >> 1) the sptep is not the last mapping.
> >>    under this case, sptep must point to a shadow page table, that means
> >>    spte_to_pfn(*sptep)) is used by KVM module, and 'pfn' is used by userspace.
> >>    so, 'if' condition must be satisfied, the sptep will be dropped.
> >>
> >>    Actually, This is the origin case:
> >>   | if (level > PT_PAGE_TABLE_LEVEL &&
> >>   |	    !is_large_pte(*sptep))"
> >>
> >> 2) the sptep is the last mapping.
> >>    under this case, the level of spte (sp.level) must equal the 'level' which
> >>    we pass to mmu_set_spte. If they point to the same pfn, it is 'remap', otherwise
> >>    we drop it.
> >>
> >> I think this is safe. :)
> > 
> > mmu_page_zap_pte takes care of it, OK.
> > 
> > What if was_rmapped=true but gfn is different? Say if the spte comes
> > from an unsync shadow page, the guest modifies that shadow page (but
> > does not invalidate it with invlpg), then faults. gfn can still point
> > to the same gfn (but in that case, with your patch,
> > page_header_update_slot is not called.
> 
> Marcelo,
> 
> Page fault path and other sync/prefetch paths will reread guest page table,
> then it get a different target pfn.
> 
> The scenario is like this:
> 
> gfn1 = pfn1, gfn2 = pfn2
> gpte = pfn1, spte is shadowed by gpte and it is a unsync spte
> 
> Guest                               Host
>                                      spte = (gfn1, pfn1)
> 
> modify gpte to let it point to gfn2
>                                     spte = (gfn1, pfn1)
> page-fault on gpte
>                                     intercept the page-fault, then
>                                     want to update spte to (gfn2, pfn2)
> 
>                                     in mmu_set_spte, we can detect
>                                     pfn2 != pfn1, then drop it.
> 
> Hmm, the interesting thing is what if different gfns map to the same pfn.
> For example, spte1 is shadowed by gfn1 and spte2 is shadowed by pfn2, both
> gfn1 and gfn2 map to pfn, the code (including the current code) will set
> spte1 to the gfn2's rmap and spte2 to the gfn1's rmap. But i think it is ok.

The current code updates the gfn properly in set_spte via
page_header_update_slot.

Better to keep the state properly updated.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/5] KVM: MMU: simplify mmu_set_spte
  2012-11-20 23:51           ` Marcelo Tosatti
@ 2012-11-21  3:19             ` Xiao Guangrong
  0 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2012-11-21  3:19 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/21/2012 07:51 AM, Marcelo Tosatti wrote:
> On Wed, Nov 21, 2012 at 07:23:26AM +0800, Xiao Guangrong wrote:
>> On 11/21/2012 06:18 AM, Marcelo Tosatti wrote:
>>
>>>>>> -			child = page_header(pte & PT64_BASE_ADDR_MASK);
>>>>>> -			drop_parent_pte(child, sptep);
>>>>>> -			kvm_flush_remote_tlbs(vcpu->kvm);
>>>>>
>>>>> How come its safe to drop this case?
>>>>
>>>> We use "if (pfn != spte_to_pfn(*sptep))" to simplify the thing.
>>>> There are two cases:
>>>> 1) the sptep is not the last mapping.
>>>>    under this case, sptep must point to a shadow page table, that means
>>>>    spte_to_pfn(*sptep)) is used by KVM module, and 'pfn' is used by userspace.
>>>>    so, 'if' condition must be satisfied, the sptep will be dropped.
>>>>
>>>>    Actually, This is the origin case:
>>>>   | if (level > PT_PAGE_TABLE_LEVEL &&
>>>>   |	    !is_large_pte(*sptep))"
>>>>
>>>> 2) the sptep is the last mapping.
>>>>    under this case, the level of spte (sp.level) must equal the 'level' which
>>>>    we pass to mmu_set_spte. If they point to the same pfn, it is 'remap', otherwise
>>>>    we drop it.
>>>>
>>>> I think this is safe. :)
>>>
>>> mmu_page_zap_pte takes care of it, OK.
>>>
>>> What if was_rmapped=true but gfn is different? Say if the spte comes
>>> from an unsync shadow page, the guest modifies that shadow page (but
>>> does not invalidate it with invlpg), then faults. gfn can still point
>>> to the same gfn (but in that case, with your patch,
>>> page_header_update_slot is not called.
>>
>> Marcelo,
>>
>> Page fault path and other sync/prefetch paths will reread guest page table,
>> then it get a different target pfn.
>>
>> The scenario is like this:
>>
>> gfn1 = pfn1, gfn2 = pfn2
>> gpte = pfn1, spte is shadowed by gpte and it is a unsync spte
>>
>> Guest                               Host
>>                                      spte = (gfn1, pfn1)
>>
>> modify gpte to let it point to gfn2
>>                                     spte = (gfn1, pfn1)
>> page-fault on gpte
>>                                     intercept the page-fault, then
>>                                     want to update spte to (gfn2, pfn2)
>>
>>                                     in mmu_set_spte, we can detect
>>                                     pfn2 != pfn1, then drop it.
>>
>> Hmm, the interesting thing is what if different gfns map to the same pfn.
>> For example, spte1 is shadowed by gfn1 and spte2 is shadowed by pfn2, both
>> gfn1 and gfn2 map to pfn, the code (including the current code) will set
>> spte1 to the gfn2's rmap and spte2 to the gfn1's rmap. But i think it is ok.
> 
> Current code updates gfn properly in set_spte by
> page_header_update_slot. 
> 
> Better keep state properly.

Okay, I will not change the position of page_header_update_slot in the
next version. Thank you, Marcelo!



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-11-21  3:20 UTC | newest]

Thread overview: 15+ messages
2012-11-05 12:09 [PATCH 1/5] KVM: MMU: cleanup mapping-level Xiao Guangrong
2012-11-05 12:10 ` [PATCH 2/5] KVM: MMU: simplify mmu_set_spte Xiao Guangrong
2012-11-12 23:12   ` Marcelo Tosatti
2012-11-13  8:39     ` Xiao Guangrong
2012-11-20 22:18       ` Marcelo Tosatti
2012-11-20 23:23         ` Xiao Guangrong
2012-11-20 23:51           ` Marcelo Tosatti
2012-11-21  3:19             ` Xiao Guangrong
2012-11-05 12:11 ` [PATCH 3/5] KVM: MMU: simplify set_spte Xiao Guangrong
2012-11-20 22:24   ` Marcelo Tosatti
2012-11-20 23:26     ` Xiao Guangrong
2012-11-05 12:12 ` [PATCH 4/5] KVM: MMU: move adjusting softmmu pte access to FNAME(page_fault) Xiao Guangrong
2012-11-20 22:27   ` Marcelo Tosatti
2012-11-20 23:28     ` Xiao Guangrong
2012-11-05 12:12 ` [PATCH 5/5] KVM: MMU: remove pt_access in mmu_set_spte Xiao Guangrong
