* [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work
@ 2015-11-12 11:48 Takuya Yoshikawa
  2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:48 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

v1->v2:
  Patches 5 and 7 were added based on Paolo's suggestions.
  Patches 8-10 are new.

Patch 1/2/3/4: no change.
Patch 5: Needed a bit more work than I had expected.
Patch 6: Removed the extra comment from v1 (patch 5 made it inappropriate).
Patch 7: As expected, many places needed to be converted.
Patch 8: This is new, but only a small change.

Patch 9: Kind of an RFC (though I have checked it to some extent).
  The following two places need to be checked carefully:
  - in kvm_mmu_get_page: "if (!direct)" block after kvm_mmu_alloc_page()
  - in FNAME(fetch): "if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))" case
Patch 10: Trivial cleanup, assuming that patch 9 is correct.


In summary: patches 1-7 are the result of updating v1 based on the suggestions.
  Although patch 5 does not look as nice as I had hoped, it is the most
  conservative approach, and patches 8-10 try to alleviate that.

  Takuya


Takuya Yoshikawa (10):
  01:  KVM: x86: MMU: Remove unused parameter of __direct_map()
  02:  KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  03:  KVM: x86: MMU: Make mmu_set_spte() return emulate value
  04:  KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  05:  KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  06:  KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  07:  KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  08:  KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
  09:  KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  10:  KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c                | 357 ++++++++++++++++++--------------------
 arch/x86/kvm/mmu_audit.c          |  15 +-
 arch/x86/kvm/paging_tmpl.h        |  20 +--
 5 files changed, 196 insertions(+), 208 deletions(-)

-- 
2.1.0



* [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
@ 2015-11-12 11:49 ` Takuya Yoshikawa
  2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:49 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e7c2c14..c3bbc82 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int map_writable, int level, gfn_t gfn, pfn_t pfn,
-			bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+			int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-			 prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
-
 	return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable,
-			 level, gfn, pfn, prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
-- 
2.1.0



* [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
  2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
@ 2015-11-12 11:50 ` Takuya Yoshikawa
  2015-11-18  2:44   ` Xiao Guangrong
  2015-11-12 11:51 ` [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value Takuya Yoshikawa
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:50 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Both __mmu_unsync_walk() and mmu_pages_clear_parents() contain the same
three lines of code that clear a bit in the unsync child bitmap; the
former places them inside a loop and uses a few goto statements to jump
to them.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.
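
For illustration (a condensed sketch of the change, not the patch itself),
the three repeated lines collapse into a single helper call:

	static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
	{
		--sp->unsync_children;
		WARN_ON((int)sp->unsync_children < 0);
		__clear_bit(idx, sp->unsync_child_bitmap);
	}

	/* e.g. in mmu_pages_clear_parents(), instead of open-coding the
	 * decrement/WARN_ON/__clear_bit sequence: */
	clear_unsync_child_bit(sp, idx);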

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c3bbc82..f3120aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 	return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+	--sp->unsync_children;
+	WARN_ON((int)sp->unsync_children < 0);
+	__clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
-		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-			goto clear_child_bitmap;
+		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+			clear_unsync_child_bit(sp, i);
+			continue;
+		}
 
 		child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				return -ENOSPC;
 
 			ret = __mmu_unsync_walk(child, pvec);
-			if (!ret)
-				goto clear_child_bitmap;
-			else if (ret > 0)
+			if (!ret) {
+				clear_unsync_child_bit(sp, i);
+				continue;
+			} else if (ret > 0) {
 				nr_unsync_leaf += ret;
-			else
+			} else
 				return ret;
 		} else if (child->unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 		} else
-			 goto clear_child_bitmap;
-
-		continue;
-
-clear_child_bitmap:
-		__clear_bit(i, sp->unsync_child_bitmap);
-		sp->unsync_children--;
-		WARN_ON((int)sp->unsync_children < 0);
+			clear_unsync_child_bit(sp, i);
 	}
 
-
 	return nr_unsync_leaf;
 }
 
@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		if (!sp)
 			return;
 
-		--sp->unsync_children;
-		WARN_ON((int)sp->unsync_children < 0);
-		__clear_bit(idx, sp->unsync_child_bitmap);
+		clear_unsync_child_bit(sp, idx);
 		level++;
 	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0



* [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
  2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
  2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
@ 2015-11-12 11:51 ` Takuya Yoshikawa
  2015-11-12 11:51 ` [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() Takuya Yoshikawa
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:51 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing the emulate pointer and make mmu_set_spte() return the
emulate value instead, to clean up this complex interface.  Prefetch
functions can simply discard the return value.
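
As a sketch of the resulting interface (condensed from the diff below,
not new code), fault paths read the return value while prefetch paths
simply ignore it:

	/* fault path, e.g. in FNAME(fetch): */
	emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
			       it.level, gw->gfn, pfn, prefault, map_writable);

	/* prefetch path, e.g. in FNAME(prefetch_gpte): return value ignored */
	mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
		     true, true);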

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 27 ++++++++++++++-------------
 arch/x86/kvm/paging_tmpl.h | 10 +++++-----
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f3120aa..c229356 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
 	return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			 unsigned pte_access, int write_fault, int *emulate,
-			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
-			 bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
+			 int write_fault, int level, gfn_t gfn, pfn_t pfn,
+			 bool speculative, bool host_writable)
 {
 	int was_rmapped = 0;
 	int rmap_count;
+	bool emulate = false;
 
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
 	      true, host_writable)) {
 		if (write_fault)
-			*emulate = 1;
+			emulate = true;
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	}
 
-	if (unlikely(is_mmio_spte(*sptep) && emulate))
-		*emulate = 1;
+	if (unlikely(is_mmio_spte(*sptep)))
+		emulate = true;
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 
 	kvm_release_pfn_clean(pfn);
+
+	return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;
 
 	for (i = 0; i < ret; i++, gfn++, start++)
-		mmu_set_spte(vcpu, start, access, 0, NULL,
-			     sp->role.level, gfn, page_to_pfn(pages[i]),
-			     true, true);
+		mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+			     page_to_pfn(pages[i]), true, true);
 
 	return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
-			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-				     write, &emulate, level, gfn, pfn,
-				     prefault, map_writable);
+			emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+					       write, level, gfn, pfn, prefault,
+					       map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
 			break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3058a22..9d21b44 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-		     gfn, pfn, true, true);
+	mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+		     true, true);
 
 	return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
-	int top_level, emulate = 0;
+	int top_level, emulate;
 
 	direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-		     it.level, gw->gfn, pfn, prefault, map_writable);
+	emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
+			       it.level, gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
 
 	return emulate;
-- 
2.1.0



* [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2015-11-12 11:51 ` [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value Takuya Yoshikawa
@ 2015-11-12 11:51 ` Takuya Yoshikawa
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:51 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to implement the bypass_guest_pf optimization, which no
longer exists.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them with no clear distinction just makes the code
confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c       | 13 ++++---------
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c229356..e8cfdc4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
 	return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-	return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
 	if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	u64 old_spte = *sptep;
 	bool ret = false;
 
-	WARN_ON(!is_rmap_spte(new_spte));
+	WARN_ON(!is_shadow_present_pte(new_spte));
 
 	if (!is_shadow_present_pte(old_spte)) {
 		mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 	else
 		old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-	if (!is_rmap_spte(old_spte))
+	if (!is_shadow_present_pte(old_spte))
 		return 0;
 
 	pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
-	if (is_rmap_spte(*sptep)) {
+	if (is_shadow_present_pte(*sptep)) {
 		/*
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 		 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	 * If the mapping has been changed, let the vcpu fault on the
 	 * same address again.
 	 */
-	if (!is_rmap_spte(spte)) {
+	if (!is_shadow_present_pte(spte)) {
 		ret = true;
 		goto exit;
 	}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-		if (!is_rmap_spte(sp->spt[i]))
+		if (!is_shadow_present_pte(sp->spt[i]))
 			continue;
 
 		inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



* [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (3 preceding siblings ...)
  2015-11-12 11:51 ` [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() Takuya Yoshikawa
@ 2015-11-12 11:52 ` Takuya Yoshikawa
  2015-11-13 21:47   ` Marcelo Tosatti
  2015-11-18  3:07   ` Xiao Guangrong
  2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:52 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

kvm_mmu_mark_parents_unsync() is the only user of pte_list_walk(),
which does nearly the same as the for_each_rmap_spte macro.  The only
difference is that the is_shadow_present_pte() check cannot be placed
there, because kvm_mmu_mark_parents_unsync() can be called with a new
parent pointer whose entry is not set yet.

By calling mark_unsync() separately for that new parent and adding the
parent pointer to the parent_ptes chain later in kvm_mmu_get_page(),
the macro works without problems.
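
For reference (a sketch based on pte_list_walk() and the rmap iterator
code elsewhere in this series, not part of the patch), the parent_ptes
word that both walkers decode uses the following encoding:

	/*
	 * parent_ptes == 0        : no parent sptes
	 * (parent_ptes & 1) == 0  : parent_ptes is a single (u64 *)spte
	 * (parent_ptes & 1) == 1  : (parent_ptes & ~1ul) points to a
	 *                           pte_list_desc holding up to PTE_LIST_EXT
	 *                           sptes, chained through ->more
	 */

for_each_rmap_spte walks exactly this structure but also checks
is_shadow_present_pte() on each entry, which is why the not-yet-set
parent pointer has to be handled before it is added to the chain.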

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e8cfdc4..1691171 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
-{
-	struct pte_list_desc *desc;
-	int i;
-
-	if (!*pte_list)
-		return;
-
-	if (!(*pte_list & 1))
-		return fn((u64 *)*pte_list);
-
-	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-			fn(desc->sptes[i]);
-		desc = desc->more;
-	}
-}
-
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
@@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+		mark_unsync(sptep);
+	}
 }
 
 static void mark_unsync(u64 *spte)
@@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
 			break;
 
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-		} else if (sp->unsync)
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		}
+		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
-- 
2.1.0



* [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (4 preceding siblings ...)
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
@ 2015-11-12 11:53 ` Takuya Yoshikawa
  2015-11-13 22:08   ` Marcelo Tosatti
  2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:53 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, are checked the same
way.  In addition, change the BUG_ON to WARN_ON since killing the whole
host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present (at least until drop_parent_pte()
actually unlinks them) and are never mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c                | 26 +++++++++++++++++---------
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
     page cannot be destroyed.  See role.invalid.
   parent_ptes:
     The reverse mapping for the pte/ptes pointing at this page's spt. If
-    parent_ptes bit 0 is zero, only one spte points at this pages and
+    parent_ptes bit 0 is zero, only one spte points at this page and
     parent_ptes points at this single spte, otherwise, there exists multiple
     sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-    structure with a list of parent_ptes.
+    structure with a list of parent sptes.
   unsync:
     If true, then the translations in this page may not match the guest's
     translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1691171..ee7b101 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1079,17 +1079,23 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (!rmap)
 		return NULL;
 
 	if (!(rmap & 1)) {
 		iter->desc = NULL;
-		return (u64 *)rmap;
+		sptep = (u64 *)rmap;
+		goto out;
 	}
 
 	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
 	iter->pos = 0;
-	return iter->desc->sptes[iter->pos];
+	sptep = iter->desc->sptes[iter->pos];
+out:
+	WARN_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 /*
@@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (iter->desc) {
 		if (iter->pos < PTE_LIST_EXT - 1) {
-			u64 *sptep;
-
 			++iter->pos;
 			sptep = iter->desc->sptes[iter->pos];
 			if (sptep)
-				return sptep;
+				goto out;
 		}
 
 		iter->desc = iter->desc->more;
@@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 		if (iter->desc) {
 			iter->pos = 0;
 			/* desc->sptes[0] cannot be NULL */
-			return iter->desc->sptes[iter->pos];
+			sptep = iter->desc->sptes[iter->pos];
+			goto out;
 		}
 	}
 
 	return NULL;
+out:
+	WARN_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
 	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
-		_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
-			_spte_ = rmap_get_next(_iter_))
+		_spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	bool flush = false;
 
 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
 		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
-- 
2.1.0



* [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (5 preceding siblings ...)
  2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
@ 2015-11-12 11:55 ` Takuya Yoshikawa
  2015-11-18  3:21   ` Xiao Guangrong
  2015-11-12 11:55 ` [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() Takuya Yoshikawa
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:55 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

The new struct kvm_rmap_head makes the code type-safe to some extent.
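
As a small illustration of what "type-safe to some extent" buys (a
sketch, not taken from the patch): with a bare unsigned long *, any
pointer to an unsigned long could be passed where an rmap-chain head was
expected, while the one-member wrapper struct makes the compiler reject
such mix-ups:

	struct kvm_rmap_head {
		unsigned long val;
	};

	static bool __rmap_write_protect(struct kvm *kvm,
					 struct kvm_rmap_head *rmap_head,
					 bool pt_protect);

	/*
	 * Passing a plain &some_unsigned_long here no longer compiles;
	 * only a genuine struct kvm_rmap_head * is accepted.
	 */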

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.c              | 169 +++++++++++++++++++++-------------------
 arch/x86/kvm/mmu_audit.c        |  13 ++--
 3 files changed, 100 insertions(+), 90 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0535359..c5a0c4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -214,6 +214,10 @@ union kvm_mmu_page_role {
 	};
 };
 
+struct kvm_rmap_head {
+	unsigned long val;
+};
+
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
@@ -231,7 +235,7 @@ struct kvm_mmu_page {
 	bool unsync;
 	int root_count;          /* Currently serving as active root */
 	unsigned int unsync_children;
-	unsigned long parent_ptes;	/* Reverse mapping for parent_pte */
+	struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 
 	/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen.  */
 	unsigned long mmu_valid_gen;
@@ -604,7 +608,7 @@ struct kvm_lpage_info {
 };
 
 struct kvm_arch_memory_slot {
-	unsigned long *rmap[KVM_NR_PAGE_SIZES];
+	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ee7b101..85f4bbd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -916,24 +916,24 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
  *
  */
 static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
-			unsigned long *pte_list)
+			struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
 	int i, count = 0;
 
-	if (!*pte_list) {
+	if (!rmap_head->val) {
 		rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-		*pte_list = (unsigned long)spte;
-	} else if (!(*pte_list & 1)) {
+		rmap_head->val = (unsigned long)spte;
+	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
 		desc = mmu_alloc_pte_list_desc(vcpu);
-		desc->sptes[0] = (u64 *)*pte_list;
+		desc->sptes[0] = (u64 *)rmap_head->val;
 		desc->sptes[1] = spte;
-		*pte_list = (unsigned long)desc | 1;
+		rmap_head->val = (unsigned long)desc | 1;
 		++count;
 	} else {
 		rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 		while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
 			desc = desc->more;
 			count += PTE_LIST_EXT;
@@ -950,8 +950,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 }
 
 static void
-pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-			   int i, struct pte_list_desc *prev_desc)
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+			   struct pte_list_desc *desc, int i,
+			   struct pte_list_desc *prev_desc)
 {
 	int j;
 
@@ -962,43 +963,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
 	if (j != 0)
 		return;
 	if (!prev_desc && !desc->more)
-		*pte_list = (unsigned long)desc->sptes[0];
+		rmap_head->val = (unsigned long)desc->sptes[0];
 	else
 		if (prev_desc)
 			prev_desc->more = desc->more;
 		else
-			*pte_list = (unsigned long)desc->more | 1;
+			rmap_head->val = (unsigned long)desc->more | 1;
 	mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, unsigned long *pte_list)
+static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
 	struct pte_list_desc *prev_desc;
 	int i;
 
-	if (!*pte_list) {
+	if (!rmap_head->val) {
 		printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
 		BUG();
-	} else if (!(*pte_list & 1)) {
+	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("pte_list_remove:  %p 1->0\n", spte);
-		if ((u64 *)*pte_list != spte) {
+		if ((u64 *)rmap_head->val != spte) {
 			printk(KERN_ERR "pte_list_remove:  %p 1->BUG\n", spte);
 			BUG();
 		}
-		*pte_list = 0;
+		rmap_head->val = 0;
 	} else {
 		rmap_printk("pte_list_remove:  %p many->many\n", spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 		prev_desc = NULL;
 		while (desc) {
-			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
+			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i) {
 				if (desc->sptes[i] == spte) {
-					pte_list_desc_remove_entry(pte_list,
-							       desc, i,
-							       prev_desc);
+					pte_list_desc_remove_entry(rmap_head,
+							desc, i, prev_desc);
 					return;
 				}
+			}
 			prev_desc = desc;
 			desc = desc->more;
 		}
@@ -1007,8 +1008,8 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }
 
-static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
-				    struct kvm_memory_slot *slot)
+static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
+					   struct kvm_memory_slot *slot)
 {
 	unsigned long idx;
 
@@ -1016,10 +1017,8 @@ static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 	return &slot->arch.rmap[level - PT_PAGE_TABLE_LEVEL][idx];
 }
 
-/*
- * Take gfn and return the reverse mapping to it.
- */
-static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, struct kvm_mmu_page *sp)
+static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
+					 struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -1040,24 +1039,24 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
 	struct kvm_mmu_page *sp;
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	sp = page_header(__pa(spte));
 	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
-	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp);
-	return pte_list_add(vcpu, spte, rmapp);
+	rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
+	return pte_list_add(vcpu, spte, rmap_head);
 }
 
 static void rmap_remove(struct kvm *kvm, u64 *spte)
 {
 	struct kvm_mmu_page *sp;
 	gfn_t gfn;
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	sp = page_header(__pa(spte));
 	gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
-	rmapp = gfn_to_rmap(kvm, gfn, sp);
-	pte_list_remove(spte, rmapp);
+	rmap_head = gfn_to_rmap(kvm, gfn, sp);
+	pte_list_remove(spte, rmap_head);
 }
 
 /*
@@ -1077,20 +1076,21 @@ struct rmap_iterator {
  *
  * Returns sptep if found, NULL otherwise.
  */
-static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
+static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
+			   struct rmap_iterator *iter)
 {
 	u64 *sptep;
 
-	if (!rmap)
+	if (!rmap_head->val)
 		return NULL;
 
-	if (!(rmap & 1)) {
+	if (!(rmap_head->val & 1)) {
 		iter->desc = NULL;
-		sptep = (u64 *)rmap;
+		sptep = (u64 *)rmap_head->val;
 		goto out;
 	}
 
-	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
+	iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 	iter->pos = 0;
 	sptep = iter->desc->sptes[iter->pos];
 out:
@@ -1131,9 +1131,9 @@ out:
 	return sptep;
 }
 
-#define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
-	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
-		_spte_; _spte_ = rmap_get_next(_iter_))
+#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)		\
+	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);	\
+	     _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1191,14 +1191,15 @@ static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
+static bool __rmap_write_protect(struct kvm *kvm,
+				 struct kvm_rmap_head *rmap_head,
 				 bool pt_protect)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep)
 		flush |= spte_write_protect(kvm, sptep, pt_protect);
 
 	return flush;
@@ -1215,13 +1216,13 @@ static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep)
 		flush |= spte_clear_dirty(kvm, sptep);
 
 	return flush;
@@ -1238,13 +1239,13 @@ static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
+static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep)
 		flush |= spte_set_dirty(kvm, sptep);
 
 	return flush;
@@ -1264,12 +1265,12 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
 				     gfn_t gfn_offset, unsigned long mask)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	while (mask) {
-		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-				      PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_write_protect(kvm, rmapp, false);
+		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					  PT_PAGE_TABLE_LEVEL, slot);
+		__rmap_write_protect(kvm, rmap_head, false);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1289,12 +1290,12 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
 				     gfn_t gfn_offset, unsigned long mask)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	while (mask) {
-		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-				      PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_clear_dirty(kvm, rmapp);
+		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					  PT_PAGE_TABLE_LEVEL, slot);
+		__rmap_clear_dirty(kvm, rmap_head);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1326,27 +1327,27 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	int i;
 	bool write_protected = false;
 
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
 	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		rmapp = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(vcpu->kvm, rmapp, true);
+		rmap_head = __gfn_to_rmap(gfn, i, slot);
+		write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true);
 	}
 
 	return write_protected;
 }
 
-static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	while ((sptep = rmap_get_first(*rmapp, &iter))) {
+	while ((sptep = rmap_get_first(rmap_head, &iter))) {
 		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
@@ -1356,14 +1357,14 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	return flush;
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			   struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			   unsigned long data)
 {
-	return kvm_zap_rmapp(kvm, rmapp);
+	return kvm_zap_rmapp(kvm, rmap_head);
 }
 
-static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			     struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			     unsigned long data)
 {
@@ -1378,7 +1379,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	new_pfn = pte_pfn(*ptep);
 
 restart:
-	for_each_rmap_spte(rmapp, &iter, sptep) {
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n",
 			     sptep, *sptep, gfn, level);
 
@@ -1416,11 +1417,11 @@ struct slot_rmap_walk_iterator {
 
 	/* output fields. */
 	gfn_t gfn;
-	unsigned long *rmap;
+	struct kvm_rmap_head *rmap;
 	int level;
 
 	/* private field. */
-	unsigned long *end_rmap;
+	struct kvm_rmap_head *end_rmap;
 };
 
 static void
@@ -1479,7 +1480,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 				unsigned long end,
 				unsigned long data,
 				int (*handler)(struct kvm *kvm,
-					       unsigned long *rmapp,
+					       struct kvm_rmap_head *rmap_head,
 					       struct kvm_memory_slot *slot,
 					       gfn_t gfn,
 					       int level,
@@ -1523,7 +1524,8 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 			  unsigned long data,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+			  int (*handler)(struct kvm *kvm,
+					 struct kvm_rmap_head *rmap_head,
 					 struct kvm_memory_slot *slot,
 					 gfn_t gfn, int level,
 					 unsigned long data))
@@ -1546,7 +1548,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
 }
 
-static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			 struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			 unsigned long data)
 {
@@ -1556,18 +1558,19 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	BUG_ON(!shadow_accessed_mask);
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			clear_bit((ffs(shadow_accessed_mask) - 1),
 				 (unsigned long *)sptep);
 		}
+	}
 
 	trace_kvm_age_page(gfn, level, slot, young);
 	return young;
 }
 
-static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			      struct kvm_memory_slot *slot, gfn_t gfn,
 			      int level, unsigned long data)
 {
@@ -1583,11 +1586,12 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		goto out;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			break;
 		}
+	}
 out:
 	return young;
 }
@@ -1596,14 +1600,14 @@ out:
 
 static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	struct kvm_mmu_page *sp;
 
 	sp = page_header(__pa(spte));
 
-	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp);
+	rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
 
-	kvm_unmap_rmapp(vcpu->kvm, rmapp, NULL, gfn, sp->role.level, 0);
+	kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, 0);
 	kvm_flush_remote_tlbs(vcpu->kvm);
 }
 
@@ -1720,7 +1724,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	 * this feature. See the comments in kvm_zap_obsolete_pages().
 	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	sp->parent_ptes = 0;
+	sp->parent_ptes.val = 0;
 	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
@@ -2273,7 +2277,7 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
+	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
 		drop_parent_pte(sp, sptep);
 }
 
@@ -4485,7 +4489,7 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 }
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_level_handler) (struct kvm *kvm, unsigned long *rmap);
+typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
 /* The caller should hold mmu-lock before calling this function. */
 static bool
@@ -4579,9 +4583,10 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	spin_unlock(&kvm->mmu_lock);
 }
 
-static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
+static bool slot_rmap_write_protect(struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head)
 {
-	return __rmap_write_protect(kvm, rmapp, false);
+	return __rmap_write_protect(kvm, rmap_head, false);
 }
 
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
@@ -4617,7 +4622,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 }
 
 static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
-		unsigned long *rmapp)
+					 struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -4626,7 +4631,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	struct kvm_mmu_page *sp;
 
 restart:
-	for_each_rmap_spte(rmapp, &iter, sptep) {
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		sp = page_header(__pa(sptep));
 		pfn = spte_to_pfn(*sptep);
 
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 90ee420..1cee3ec 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -129,7 +129,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
 {
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	struct kvm_mmu_page *rev_sp;
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -150,8 +150,8 @@ static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
 		return;
 	}
 
-	rmapp = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
-	if (!*rmapp) {
+	rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
+	if (!rmap_head->val) {
 		if (!__ratelimit(&ratelimit_state))
 			return;
 		audit_printk(kvm, "no rmap for writable spte %llx\n",
@@ -192,7 +192,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	u64 *sptep;
 	struct rmap_iterator iter;
 	struct kvm_memslots *slots;
@@ -203,13 +203,14 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, sp->gfn);
-	rmapp = __gfn_to_rmap(sp->gfn, PT_PAGE_TABLE_LEVEL, slot);
+	rmap_head = __gfn_to_rmap(sp->gfn, PT_PAGE_TABLE_LEVEL, slot);
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		if (is_writable_pte(*sptep))
 			audit_printk(kvm, "shadow page has writable "
 				     "mappings: gfn %llx role %x\n",
 				     sp->gfn, sp->role.word);
+	}
 }
 
 static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-- 
2.1.0



* [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (6 preceding siblings ...)
  2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
@ 2015-11-12 11:55 ` Takuya Yoshikawa
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:55 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Make kvm_mmu_alloc_page() do just what its name says, and remove the
extra error check at its call site since the allocation cannot fail.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f4bbd..9273cd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1707,8 +1707,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-					       u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1724,8 +1723,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	 * this feature. See the comments in kvm_zap_obsolete_pages().
 	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	sp->parent_ptes.val = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
 }
@@ -2124,10 +2121,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		trace_kvm_mmu_get_page(sp, false);
 		return sp;
 	}
+
 	++vcpu->kvm->stat.mmu_cache_miss;
-	sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-	if (!sp)
-		return sp;
+
+	sp = kvm_mmu_alloc_page(vcpu, direct);
+
+	sp->parent_ptes.val = 0;
+	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link,
-- 
2.1.0



* [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (7 preceding siblings ...)
  2015-11-12 11:55 ` [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() Takuya Yoshikawa
@ 2015-11-12 11:56 ` Takuya Yoshikawa
  2015-11-12 14:27   ` Paolo Bonzini
  2015-11-18  3:32   ` Xiao Guangrong
  2015-11-12 11:57 ` [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Takuya Yoshikawa
  2015-11-12 12:08 ` [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Paolo Bonzini
  10 siblings, 2 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:56 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows it to set the parent entry so
that the new mapping will point to the returned page table.

Moving the parent_pte handling there allows us to clean up the code,
because parent_pte is passed to kvm_mmu_get_page() only for
mark_unsync() and mmu_page_add_parent_pte().
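
The resulting call pattern (a condensed sketch of __direct_map() after
this patch, taken from the diffs rather than new code) is:

	sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
			      iterator.level - 1, 1, ACC_ALL, iterator.sptep);
	link_shadow_page(vcpu, iterator.sptep, sp, true);

	/*
	 * link_shadow_page() now sets the spte, calls mark_unsync() when the
	 * child page is unsync or has unsync children, and adds sptep to
	 * sp->parent_ptes; the trailing parent_pte argument of
	 * kvm_mmu_get_page() becomes unused and is removed by patch 10.
	 */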

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 21 ++++++++-------------
 arch/x86/kvm/paging_tmpl.h |  6 ++----
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9273cd4..33fe720 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-			if (parent_pte)
-				mark_unsync(parent_pte);
 		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
-			if (parent_pte)
-				mark_unsync(parent_pte);
 		}
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
@@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	sp = kvm_mmu_alloc_page(vcpu, direct);
 
 	sp->parent_ptes.val = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 	sp->gfn = gfn;
 	sp->role = role;
@@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+			     struct kvm_mmu_page *sp, bool accessed)
 {
 	u64 spte;
 
@@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
 		spte |= shadow_accessed_mask;
 
 	mmu_spte_set(sptep, spte);
+
+	if (sp->unsync_children || sp->unsync)
+		mark_unsync(sptep);
+
+	mmu_page_add_parent_pte(vcpu, sp, sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 		mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-	mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
@@ -2738,7 +2733,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);
 
-			link_shadow_page(iterator.sptep, sp, true);
+			link_shadow_page(vcpu, iterator.sptep, sp, true);
 		}
 	}
 	return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 9d21b44..f414ca6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			goto out_gpte_changed;
 
 		if (sp)
-			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
 	for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
 				      true, direct_access, it.sptep);
-		link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+		link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	return emulate;
 
 out_gpte_changed:
-	if (sp)
-		kvm_mmu_put_page(sp, it.sptep);
 	kvm_release_pfn_clean(pfn);
 	return 0;
 }
-- 
2.1.0



* [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (8 preceding siblings ...)
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
@ 2015-11-12 11:57 ` Takuya Yoshikawa
  2015-11-12 12:08 ` [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Paolo Bonzini
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 20 +++++++-------------
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 33fe720..101e77d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2072,8 +2072,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gva_t gaddr,
 					     unsigned level,
 					     int direct,
-					     unsigned access,
-					     u64 *parent_pte)
+					     unsigned access)
 {
 	union kvm_mmu_page_role role;
 	unsigned quadrant;
@@ -2730,8 +2729,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
 			pseudo_gfn = base_addr >> PAGE_SHIFT;
 			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-					      iterator.level - 1,
-					      1, ACC_ALL, iterator.sptep);
+					      iterator.level - 1, 1, ACC_ALL);
 
 			link_shadow_page(vcpu, iterator.sptep, sp, true);
 		}
@@ -3088,8 +3086,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
-				      1, ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3101,9 +3098,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			spin_lock(&vcpu->kvm->mmu_lock);
 			make_mmu_pages_available(vcpu);
 			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
-					      i << 30,
-					      PT32_ROOT_LEVEL, 1, ACC_ALL,
-					      NULL);
+					i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
 			root = __pa(sp->spt);
 			++sp->root_count;
 			spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3140,7 +3135,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
 		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
-				      0, ACC_ALL, NULL);
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3173,9 +3168,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL, 0,
-				      ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f414ca6..ee9d211 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-					      false, access, it.sptep);
+					      false, access);
 		}
 
 		/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-				      true, direct_access, it.sptep);
+				      true, direct_access);
 		link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
-- 
2.1.0



* Re: [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (9 preceding siblings ...)
  2015-11-12 11:57 ` [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Takuya Yoshikawa
@ 2015-11-12 12:08 ` Paolo Bonzini
  10 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-12 12:08 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel, Marcelo Tosatti



On 12/11/2015 12:48, Takuya Yoshikawa wrote:
> v1->v2:
>   Patch 5 and 7 are added based on Paolo's suggestions.
>   Patch 8-10 are new.
> 
> Patch 1/2/3/4: no change.
> Patch 5: Needed a bit more work than I had expected.
> Patch 6: Removed extra comment of v1 (patch 5 made it inappropriate).
> Patch 7: As expected, many places needed to be converted.
> Patch 8: This is new, but only a small change.
> 
> Patch 9: Kind of an RFC (though I have checked it to some extent).
>   Following two places need to be carefully checked:
>   - in kvm_mmu_get_page: "if (!direct)" block after kvm_mmu_alloc_page()
>   - in FNAME(fetch): "if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))" case
> Patch 10: Trivial cleanup, assuming that patch 9 is correct.
> 
> 
> In summary: patch 1-7 is the result of updating v1 based on the suggestions.
>   Although patch 5 does not look so nice than expected, this is the most
>   conservative approach, and patch 8-10 try to alleviate the sadness.

If it works, it's actually better than what we have now.  I'll review it
in a few days.

Marcelo, can you look at this as well?  You are still king of MMU. :)

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
@ 2015-11-12 14:27   ` Paolo Bonzini
  2015-11-12 17:03     ` Paolo Bonzini
  2015-11-13  2:15     ` Takuya Yoshikawa
  2015-11-18  3:32   ` Xiao Guangrong
  1 sibling, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-12 14:27 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 12/11/2015 12:56, Takuya Yoshikawa wrote:
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 9d21b44..f414ca6 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  			goto out_gpte_changed;
>  
>  		if (sp)
> -			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
> +			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>  	}
>  

Here I think you can remove completely the

	if (sp)
		kvm_mmu_put_page(sp, it.sptep);

later in FNAME(fetch).  Apart from this nit, it's okay.

On to kvm_mmu_get_page...

        if (!direct) {
                if (rmap_write_protect(vcpu, gfn))
                        kvm_flush_remote_tlbs(vcpu->kvm);
                if (level > PT_PAGE_TABLE_LEVEL && need_sync)
                        kvm_sync_pages(vcpu, gfn);

This seems fishy.

need_sync is set if sp->unsync, but then the parents have not been
unsynced yet.

On the other hand, all calls to kvm_mmu_get_page except for the
roots are followed by link_shadow_page...  Perhaps if parent_pte != NULL
you can call link_shadow_page directly from kvm_mmu_get_page.  The call
would go before the "if (!direct)" and it would subsume all the existing
calls.

We could probably also warn if

	(parent_pte == NULL)
		!= (level == vcpu->arch.mmu.root_level)

in kvm_mmu_get_page.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 14:27   ` Paolo Bonzini
@ 2015-11-12 17:03     ` Paolo Bonzini
  2015-11-13  2:15     ` Takuya Yoshikawa
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-12 17:03 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 12/11/2015 15:27, Paolo Bonzini wrote:
> Here I think you can remove completely the
> 
> 	if (sp)
> 		kvm_mmu_put_page(sp, it.sptep);
> 
> later in FNAME(fetch).  Apart from this nit, it's okay.

Removing this is of course not possible anymore if the other suggestion
works out.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 14:27   ` Paolo Bonzini
  2015-11-12 17:03     ` Paolo Bonzini
@ 2015-11-13  2:15     ` Takuya Yoshikawa
  2015-11-13 10:51       ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-13  2:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel

On 2015/11/12 23:27, Paolo Bonzini wrote:

> On 12/11/2015 12:56, Takuya Yoshikawa wrote:
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 9d21b44..f414ca6 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>>   			goto out_gpte_changed;
>>
>>   		if (sp)
>> -			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>> +			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>>   	}
>>
>
> Here I think you can remove completely the
>
> 	if (sp)
> 		kvm_mmu_put_page(sp, it.sptep);
>
> later in FNAME(fetch).  Apart from this nit, it's okay.

Yes, that's what this patch does below:

> @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  	return emulate;
>
>  out_gpte_changed:
> -	if (sp)
> -		kvm_mmu_put_page(sp, it.sptep);
>  	kvm_release_pfn_clean(pfn);
>  	return 0;
>  }

Since this is the only user of kvm_mmu_put_page(), it also removes
the definition:

> @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
>  		mmu_page_zap_pte(kvm, sp, sp->spt + i);
>  }
>
> -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
> -{
> -	mmu_page_remove_parent_pte(sp, parent_pte);
> -}
> -
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	u64 *sptep;

Actually, I don't understand why this is named kvm_mmu_put_page() for
just removing parent_pte pointer from the sp->parent_ptes pointer chain.


> On to kvm_mmu_get_page...
>
>          if (!direct) {
>                  if (rmap_write_protect(vcpu, gfn))
>                          kvm_flush_remote_tlbs(vcpu->kvm);
>                  if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>                          kvm_sync_pages(vcpu, gfn);
>
> This seems fishy.
>
> need_sync is set if sp->unsync, but then the parents have not been
> unsynced yet.

Reaching here means that kvm_mmu_get_page() could not return sp
from inside the for_each_gfn_sp() loop above, so even without
this patch, mark_unsync() has not been called.

Here, sp holds the new page allocated by kvm_mmu_alloc_page().
One confusing thing is that hlist_add_head() right before this
"if (!direct)" line has already added the new sp to the hash
list, so it will be found by for_each_gfn_indirect_valid_sp()
in kvm_sync_pages().

Because this sp is new and sp->unsync is not set,  kvm_sync_pages()
will just skip it and look for other sp's whose ->unsync were found
to be set in the for_each_gfn_sp() loop.

I'm not 100% sure if the existence of the parent_pte pointer in the
newly created sp->parent_ptes chain alone makes any difference:
> @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	sp = kvm_mmu_alloc_page(vcpu, direct);
>
>  	sp->parent_ptes.val = 0;
> -	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>  	sp->gfn = gfn;
>  	sp->role = role;


> On the other hand, all calls to kvm_mmu_get_page except for the
> roots are followed by link_shadow_page...  Perhaps if parent_pte != NULL
> you can call link_shadow_page directly from kvm_mmu_get_page.  The call
> would go before the "if (!direct)" and it would subsume all the existing
> calls.
>
> We could probably also warn if
>
> 	(parent_pte == NULL)
> 		!= (level == vcpu->arch.mmu.root_level)
>
> in kvm_mmu_get_page.

I think we should set the spte after init_shadow_page_table(), and
to make this subsume all the existing calls, we need to change the
"return sp;" in the for_each_gfn_sp() loop to a goto statement so
that the end of this function will become something like this:

     init_shadow_page(sp);
out:
     if (parent_pte) {
         mmu_page_add_parent_pte(vcpu, sp, parent_pte);
         link_shadow_page(parent_pte, sp, accessed);
     }
     trace_kvm_mmu_get_page(sp, created);
     return sp;

So, "bool accessed" needs to be passed to kvm_mmu_get_page().
But anyway, we need to understand if mmu_page_add_parent_pte()
really needs to be placed before the "if (!direct)" block.

   Takuya



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-13  2:15     ` Takuya Yoshikawa
@ 2015-11-13 10:51       ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-13 10:51 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 13/11/2015 03:15, Takuya Yoshikawa wrote:
> Actually, I don't understand why this is named kvm_mmu_put_page() for
> just removing parent_pte pointer from the sp->parent_ptes pointer chain.

Because it undoes kvm_mmu_get_page, I guess. :)

> 
>> On to kvm_mmu_get_page...
>>
>>          if (!direct) {
>>                  if (rmap_write_protect(vcpu, gfn))
>>                          kvm_flush_remote_tlbs(vcpu->kvm);
>>                  if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>>                          kvm_sync_pages(vcpu, gfn);
>>
>> This seems fishy.
>>
>> need_sync is set if sp->unsync, but then the parents have not been
>> unsynced yet.
> 
> Reaching here means that kvm_mmu_get_page() could not return sp
> from inside the for_each_gfn_sp() loop above, so even without
> this patch, mark_unsync() has not been called.

You're right.

> Here, sp holds the new page allocated by kvm_mmu_alloc_page().
> One confusing thing is that hlist_add_head() right before this
> "if (!direct)" line has already added the new sp to the hash
> list, so it will be found by for_each_gfn_indirect_valid_sp()
> in kvm_sync_pages().
> 
> Because this sp is new and sp->unsync is not set,  kvm_sync_pages()
> will just skip it and look for other sp's whose ->unsync were found
> to be set in the for_each_gfn_sp() loop.
> 
> I'm not 100% sure if the existence of the parent_pte pointer in the
> newly created sp->parent_ptes chain alone makes any difference:

No, I don't think so.  Nothing needs the parent_ptes at this point:

- kvm_mmu_mark_parents_unsync: even in the existing code, it's called
before the new SPTE is created.

- as you said, kvm_mmu_prepare_zap_page can be called by kvm_sync_pages
but it will not operate on this page because its ->unsync is zero.

> So, "bool accessed" needs to be passed to kvm_mmu_get_page(). 

The "bool accessed" parameter is not necessary, I think.  It is only
false in the nested EPT case, and there's no reason not to set the
accessed bit *in the shadow page* if the host supports EPT
accessed/dirty bits.  I'll test and send a patch to remove the argument.

> But anyway, we need to understand if mmu_page_add_parent_pte()
> really needs to be placed before the "if (!direct)" block.

No, I don't think so anymore.

I think these patches are fine as a starting point for further cleanups;
I'll push them to kvm/queue very soon.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
@ 2015-11-13 21:47   ` Marcelo Tosatti
  2015-11-14  9:20     ` Marcelo Tosatti
  2015-11-18  3:07   ` Xiao Guangrong
  1 sibling, 1 reply; 31+ messages in thread
From: Marcelo Tosatti @ 2015-11-13 21:47 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: pbonzini, kvm, linux-kernel

On Thu, Nov 12, 2015 at 08:52:45PM +0900, Takuya Yoshikawa wrote:
> kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> nearly the same as the for_each_rmap_spte macro.  The only difference
> is that is_shadow_present_pte() checks cannot be placed there because
> kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> whose entry is not set yet.
> 
> By calling mark_unsync() separately for the parent and adding the parent
> pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> works with no problem.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>  arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
>  1 file changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e8cfdc4..1691171 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>  	}
>  }
>  
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> -{
> -	struct pte_list_desc *desc;
> -	int i;
> -
> -	if (!*pte_list)
> -		return;
> -
> -	if (!(*pte_list & 1))
> -		return fn((u64 *)*pte_list);
> -
> -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> -	while (desc) {
> -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> -			fn(desc->sptes[i]);
> -		desc = desc->more;
> -	}
> -}
> -
>  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>  				    struct kvm_memory_slot *slot)
>  {
> @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>  static void mark_unsync(u64 *spte);
>  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>  {
> -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> +	u64 *sptep;
> +	struct rmap_iterator iter;
> +
> +	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +		mark_unsync(sptep);
> +	}
>  }
>  
>  static void mark_unsync(u64 *spte)
> @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,

Faulting a spte, and one of the levels of sptes, either


		spte-1
		spte-2
		spte-3

has present bit clear. So we're searching for a guest page to shadow, with
gfn "gfn".

>  		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
>  			break;

If a shadow for gfn exists, but is unsync, sync guest-page ---to--> kvm
sptes.

> -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);

add "gfn" (actually its "struct kvm_mmu_page *sp" pointer) to
the parent.
>  		if (sp->unsync_children) {
>  			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>  			kvm_mmu_mark_parents_unsync(sp);

kvm_mmu_mark_parents_unsync relied on the links from current level all
the way to top level to mark all levels unsync, so that on guest entry,
KVM_REQ_MMU_SYNC is processed and any level is brought from guest -->
kvm pages. This now fails, because you removed "mmu_page_add_parent_pte"
(the link is not formed all the way to root).

Unless I am missing something, this is not correct.

> -		} else if (sp->unsync)
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		} else if (sp->unsync) {
>  			kvm_mmu_mark_parents_unsync(sp);
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		}
> +		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>  
>  		__clear_sp_write_flooding_count(sp);
>  		trace_kvm_mmu_get_page(sp, false);


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
@ 2015-11-13 22:08   ` Marcelo Tosatti
  2015-11-16  3:34     ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Marcelo Tosatti @ 2015-11-13 22:08 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: pbonzini, kvm, linux-kernel

On Thu, Nov 12, 2015 at 08:53:43PM +0900, Takuya Yoshikawa wrote:
> At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
> placed right after the call to detect unrelated sptes which must not be
> found in the reverse-mapping list.
> 
> Move this check in rmap_get_first/next() so that all call sites, not
> just the users of the for_each_rmap_spte() macro, will be checked the
> same way.  In addition, change the BUG_ON to WARN_ON since killing the
> whole host is the last thing that KVM should try.

It should be a BUG_ON: if KVM continues, it will corrupt (more) memory.

> One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
> rmap_get_first() to handle parent sptes.  The change will not break it
> because parent sptes are present, at least until drop_parent_pte()
> actually unlinks them, and not mmio-sptes.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>  Documentation/virtual/kvm/mmu.txt |  4 ++--
>  arch/x86/kvm/mmu.c                | 26 +++++++++++++++++---------
>  2 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index 3a4d681..daf9c0f 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -203,10 +203,10 @@ Shadow pages contain the following information:
>      page cannot be destroyed.  See role.invalid.
>    parent_ptes:
>      The reverse mapping for the pte/ptes pointing at this page's spt. If
> -    parent_ptes bit 0 is zero, only one spte points at this pages and
> +    parent_ptes bit 0 is zero, only one spte points at this page and
>      parent_ptes points at this single spte, otherwise, there exists multiple
>      sptes pointing at this page and (parent_ptes & ~0x1) points at a data
> -    structure with a list of parent_ptes.
> +    structure with a list of parent sptes.
>    unsync:
>      If true, then the translations in this page may not match the guest's
>      translation.  This is equivalent to the state of the tlb when a pte is
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 1691171..ee7b101 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1079,17 +1079,23 @@ struct rmap_iterator {
>   */
>  static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
>  {
> +	u64 *sptep;
> +
>  	if (!rmap)
>  		return NULL;
>  
>  	if (!(rmap & 1)) {
>  		iter->desc = NULL;
> -		return (u64 *)rmap;
> +		sptep = (u64 *)rmap;
> +		goto out;
>  	}
>  
>  	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
>  	iter->pos = 0;
> -	return iter->desc->sptes[iter->pos];
> +	sptep = iter->desc->sptes[iter->pos];
> +out:
> +	WARN_ON(!is_shadow_present_pte(*sptep));
> +	return sptep;
>  }
>  
>  /*
> @@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
>   */
>  static u64 *rmap_get_next(struct rmap_iterator *iter)
>  {
> +	u64 *sptep;
> +
>  	if (iter->desc) {
>  		if (iter->pos < PTE_LIST_EXT - 1) {
> -			u64 *sptep;
> -
>  			++iter->pos;
>  			sptep = iter->desc->sptes[iter->pos];
>  			if (sptep)
> -				return sptep;
> +				goto out;
>  		}
>  
>  		iter->desc = iter->desc->more;
> @@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
>  		if (iter->desc) {
>  			iter->pos = 0;
>  			/* desc->sptes[0] cannot be NULL */
> -			return iter->desc->sptes[iter->pos];
> +			sptep = iter->desc->sptes[iter->pos];
> +			goto out;
>  		}
>  	}
>  
>  	return NULL;
> +out:
> +	WARN_ON(!is_shadow_present_pte(*sptep));
> +	return sptep;
>  }
>  
>  #define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
>  	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
> -		_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
> -			_spte_ = rmap_get_next(_iter_))
> +		_spte_; _spte_ = rmap_get_next(_iter_))
>  
>  static void drop_spte(struct kvm *kvm, u64 *sptep)
>  {
> @@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>  	bool flush = false;
>  
>  	while ((sptep = rmap_get_first(*rmapp, &iter))) {
> -		BUG_ON(!(*sptep & PT_PRESENT_MASK));
>  		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
>  
>  		drop_spte(kvm, sptep);
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-13 21:47   ` Marcelo Tosatti
@ 2015-11-14  9:20     ` Marcelo Tosatti
  2015-11-16  2:51       ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Marcelo Tosatti @ 2015-11-14  9:20 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: pbonzini, kvm, linux-kernel

On Fri, Nov 13, 2015 at 07:47:28PM -0200, Marcelo Tosatti wrote:
> On Thu, Nov 12, 2015 at 08:52:45PM +0900, Takuya Yoshikawa wrote:
> > kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> > nearly the same as the for_each_rmap_spte macro.  The only difference
> > is that is_shadow_present_pte() checks cannot be placed there because
> > kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> > whose entry is not set yet.
> > 
> > By calling mark_unsync() separately for the parent and adding the parent
> > pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> > works with no problem.
> > 
> > Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> > ---
> >  arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
> >  1 file changed, 13 insertions(+), 23 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index e8cfdc4..1691171 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
> >  	}
> >  }
> >  
> > -typedef void (*pte_list_walk_fn) (u64 *spte);
> > -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> > -{
> > -	struct pte_list_desc *desc;
> > -	int i;
> > -
> > -	if (!*pte_list)
> > -		return;
> > -
> > -	if (!(*pte_list & 1))
> > -		return fn((u64 *)*pte_list);
> > -
> > -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> > -	while (desc) {
> > -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> > -			fn(desc->sptes[i]);
> > -		desc = desc->more;
> > -	}
> > -}
> > -
> >  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
> >  				    struct kvm_memory_slot *slot)
> >  {
> > @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
> >  static void mark_unsync(u64 *spte);
> >  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
> >  {
> > -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> > +	u64 *sptep;
> > +	struct rmap_iterator iter;
> > +
> > +	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> > +		mark_unsync(sptep);
> > +	}
> >  }
> >  
> >  static void mark_unsync(u64 *spte)
> > @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> 
> Faulting a spte, and one of the levels of sptes, either
> 
> 
> 		spte-1
> 		spte-2
> 		spte-3
> 
> has present bit clear. So we're searching for a guest page to shadow, with
> gfn "gfn".
> 
> >  		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
> >  			break;
> 
> If a shadow for gfn exists, but is unsync, sync guest-page ---to--> kvm
> sptes.
> 
> > -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
> 
> add "gfn" (actually its "struct kvm_mmu_page *sp" pointer) to
> the parent.
> >  		if (sp->unsync_children) {
> >  			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> >  			kvm_mmu_mark_parents_unsync(sp);
> 
> kvm_mmu_mark_parents_unsync relied on the links from current level all
> the way to top level to mark all levels unsync, so that on guest entry,
> KVM_REQ_MMU_SYNC is processed and any level is brought from guest -->
> kvm pages. This now fails, because you removed "mmu_page_add_parent_pte"
> (the link is not formed all the way to root).
> 
> Unless I am missing something, this is not correct.

The actual issue is this: a higher level page that had, under its children,
no out of sync pages, now, due to your addition, has a child that is unsync:

initial state:
	level1 

final state:

	level1 -x-> level2 -x-> level3

Where -x-> are the links created by this pagefault fixing round.

If _any_ page under you is unsync (not necessarily the ones this
pagefault is accessing), you have to mark parents unsync.
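
As a toy user-space model of this invariant (not the kernel code: the real
kvm_mmu_page keeps an rmap chain of parent sptes and an unsync_child_bitmap,
whereas this sketch uses a single parent pointer and a plain counter, and all
toy_* names are made up for illustration):

/* Toy model of the unsync propagation invariant (not kernel code). */
#include <stdbool.h>
#include <stdio.h>

struct toy_sp {
	struct toy_sp *parent;	/* one parent link instead of a parent_ptes chain  */
	bool unsync;		/* this page's sptes may be stale w.r.t. the guest */
	int unsync_children;	/* how many unsync pages exist below this one      */
};

static void toy_mark_parents_unsync(struct toy_sp *sp)
{
	struct toy_sp *p;

	for (p = sp->parent; p; p = p->parent)
		p->unsync_children++;
}

/* Linking a child that is, or contains, something unsync must propagate up. */
static void toy_link(struct toy_sp *parent, struct toy_sp *child)
{
	child->parent = parent;
	if (child->unsync || child->unsync_children)
		toy_mark_parents_unsync(child);
}

int main(void)
{
	struct toy_sp level1 = {0}, level2 = {0}, level3 = {0};

	level3.unsync = true;		/* a pre-existing unsync page     */
	toy_link(&level2, &level3);	/* level1 -x-> level2 -x-> level3 */
	toy_link(&level1, &level2);

	/* Without the propagation in toy_link(), level1 would look clean. */
	printf("level1.unsync_children = %d\n", level1.unsync_children);
	return 0;
}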


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-14  9:20     ` Marcelo Tosatti
@ 2015-11-16  2:51       ` Takuya Yoshikawa
  2015-11-17 17:58         ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-16  2:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: pbonzini, kvm, linux-kernel

On 2015/11/14 18:20, Marcelo Tosatti wrote:

> The actual issue is this: a higher level page that had, under its children,
> no out of sync pages, now, due to your addition, has a child that is unsync:
>
> initial state:
> 	level1
>
> final state:
>
> 	level1 -x-> level2 -x-> level3
>
> Where -x-> are the links created by this pagefault fixing round.
>
> If _any_ page under you is unsync (not necessarily the ones this
> pagefault is accessing), you have to mark parents unsync.

I understand this, but I don't think my patch will break this.

What kvm_mmu_mark_parents_unsync() does is:

   for each p_i in sp->parent_ptes rmap chain
     mark_unsync(p_i);

Then, mark_unsync() finds the parent sp including that p_i to
set ->unsync_child_bitmap and increment ->unsync_children if
necessary.  It may also call kvm_mmu_mark_parents_unsync()
recursively.

I understand we need to tell the parents "you have an unsync
child/descendant" until this information reaches the top level
by those recursive calls.

But since these recursive calls cannot come back to the starting sp
(the child->parent graph has no loop), each mark_unsync(p_i) will not
be affected by the other parents in that sp->parent_ptes rmap chain
from which we started the recursive calls.


As the following code shows, my patch does mark_unsync(parent_pte)
separately, and then mmu_page_add_parent_pte(vcpu, sp, parent_pte):

> -		} else if (sp->unsync)
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		} else if (sp->unsync) {
>  			kvm_mmu_mark_parents_unsync(sp);
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		}
> +		mmu_page_add_parent_pte(vcpu, sp, parent_pte);

So, as you worried, while each mark_unsync(p_i) is processed,
this parent_pte does not exist in that sp->parent_ptes rmap chain.

But as I explained above, this does not change anything about what
each mark_unsync(p_i) call does, so it keeps the original behaviour.
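
For reference, the sp->parent_ptes value that both pte_list_walk() and
for_each_rmap_spte() traverse is a tagged pointer: bit 0 clear means it
points at a single spte, bit 0 set means it points at a descriptor list.
A minimal user-space model of that encoding (simplified — the real
pte_list_desc packs PTE_LIST_EXT sptes per descriptor; all toy_* names
here are made up for illustration):

#include <stdint.h>
#include <stdio.h>

struct toy_desc {
	uint64_t *sptes[2];		/* the real desc holds PTE_LIST_EXT entries */
	struct toy_desc *more;
};

/* head == 0: empty; bit 0 clear: one spte; bit 0 set: descriptor list. */
static void toy_walk(unsigned long head, void (*fn)(uint64_t *))
{
	struct toy_desc *desc;
	int i;

	if (!head)
		return;
	if (!(head & 1)) {
		fn((uint64_t *)head);
		return;
	}
	for (desc = (struct toy_desc *)(head & ~1ul); desc; desc = desc->more)
		for (i = 0; i < 2 && desc->sptes[i]; ++i)
			fn(desc->sptes[i]);
}

static void print_spte(uint64_t *sptep)
{
	printf("spte %p = %#llx\n", (void *)sptep, (unsigned long long)*sptep);
}

int main(void)
{
	uint64_t a = 1, b = 2;
	struct toy_desc d = { .sptes = { &a, &b }, .more = NULL };

	toy_walk((unsigned long)&a, print_spte);	/* single-spte case */
	toy_walk((unsigned long)&d | 1, print_spte);	/* chained case     */
	return 0;
}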


By the way, I think "kvm_mmu_mark_parents_unsync" and "mark_unsync"
do not describe well what they actually do.  When I first saw the names,
I thought they would just set the parents' sp->unsync.

To reflect the following meaning better, it should be
propagate_unsync(_to_parents) or something:

   Tell the parents "you have an unsync child/descendant"
   until this unsync information reaches the top level


Thanks,
   Takuya



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  2015-11-13 22:08   ` Marcelo Tosatti
@ 2015-11-16  3:34     ` Takuya Yoshikawa
  0 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-16  3:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: pbonzini, kvm, linux-kernel

On 2015/11/14 7:08, Marcelo Tosatti wrote:
> On Thu, Nov 12, 2015 at 08:53:43PM +0900, Takuya Yoshikawa wrote:
>> At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
>> placed right after the call to detect unrelated sptes which must not be
>> found in the reverse-mapping list.
>>
>> Move this check in rmap_get_first/next() so that all call sites, not
>> just the users of the for_each_rmap_spte() macro, will be checked the
>> same way.  In addition, change the BUG_ON to WARN_ON since killing the
>> whole host is the last thing that KVM should try.
>
> It should be a BUG_ON: if KVM continues, it will corrupt (more) memory.

In the sense that we cannot predict what kind of corruption it will
cause, I agree with you.

But if it can only corrupt that guest's memory, it is a bit sad to
kill unrelated guests, and the host, too.  Anyway, since we cannot say
for sure what a possible bug can cause, I agree with you now.

Thanks,
   Takuya


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-16  2:51       ` Takuya Yoshikawa
@ 2015-11-17 17:58         ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-17 17:58 UTC (permalink / raw)
  To: Takuya Yoshikawa, Marcelo Tosatti; +Cc: kvm, linux-kernel



On 16/11/2015 03:51, Takuya Yoshikawa wrote:
> What kvm_mmu_mark_parents_unsync() does is:
> 
>   for each p_i in sp->parent_ptes rmap chain
>     mark_unsync(p_i);
> 
> Then, mark_unsync() finds the parent sp including that p_i to
> set ->unsync_child_bitmap and increment ->unsync_children if
> necessary.  It may also call kvm_mmu_mark_parents_unsync()
> recursively.

I agree.  sp->parent_ptes goes up one level from sp;
kvm_mmu_mark_parents_unsync(sp) visits the level above sp, while
mark_unsync(sp) visits sp and all the levels above it.

Calling mark_unsync(parent_pte) is enough to complete the visit, after
kvm_mmu_mark_parents_unsync has already processed the level above sp.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
@ 2015-11-18  2:44   ` Xiao Guangrong
  2015-11-19  0:59     ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  2:44 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:
> +			if (!ret) {
> +				clear_unsync_child_bit(sp, i);
> +				continue;
> +			} else if (ret > 0) {
>   				nr_unsync_leaf += ret;

Just a single line here, braces are unnecessary.

> -			else
> +			} else
>   				return ret;
>   		} else if (child->unsync) {
>   			nr_unsync_leaf++;
>   			if (mmu_pages_add(pvec, child, i))
>   				return -ENOSPC;
>   		} else
> -			 goto clear_child_bitmap;
> -
> -		continue;
> -
> -clear_child_bitmap:
> -		__clear_bit(i, sp->unsync_child_bitmap);
> -		sp->unsync_children--;
> -		WARN_ON((int)sp->unsync_children < 0);
> +			clear_unsync_child_bit(sp, i);
>   	}
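
The clear_unsync_child_bit() helper itself is not visible in this hunk;
judging from the three lines it replaces, it presumably looks roughly like
this (the exact form is in the patch itself):

static void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
{
	--sp->unsync_children;
	WARN_ON((int)sp->unsync_children < 0);
	__clear_bit(idx, sp->unsync_child_bitmap);
}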

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
  2015-11-13 21:47   ` Marcelo Tosatti
@ 2015-11-18  3:07   ` Xiao Guangrong
  1 sibling, 0 replies; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  3:07 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:52 PM, Takuya Yoshikawa wrote:
> kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> nearly the same as the for_each_rmap_spte macro.  The only difference
> is that is_shadow_present_pte() checks cannot be placed there because
> kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> whose entry is not set yet.
>
> By calling mark_unsync() separately for the parent and adding the parent
> pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> works with no problem.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>   arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
>   1 file changed, 13 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e8cfdc4..1691171 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>   	}
>   }
>
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> -{
> -	struct pte_list_desc *desc;
> -	int i;
> -
> -	if (!*pte_list)
> -		return;
> -
> -	if (!(*pte_list & 1))
> -		return fn((u64 *)*pte_list);
> -
> -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> -	while (desc) {
> -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> -			fn(desc->sptes[i]);
> -		desc = desc->more;
> -	}
> -}
> -
>   static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>   				    struct kvm_memory_slot *slot)
>   {
> @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>   static void mark_unsync(u64 *spte);
>   static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>   {
> -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> +	u64 *sptep;
> +	struct rmap_iterator iter;
> +
> +	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +		mark_unsync(sptep);
> +	}
>   }
>
>   static void mark_unsync(u64 *spte)
> @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
>   			break;
>
> -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>   		if (sp->unsync_children) {
>   			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>   			kvm_mmu_mark_parents_unsync(sp);

After your change above, the @sp's parents have not been changed, so there is no need to call it now.

> -		} else if (sp->unsync)
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		} else if (sp->unsync) {
>   			kvm_mmu_mark_parents_unsync(sp);

Ditto.

> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		}
> +		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>   		__clear_sp_write_flooding_count(sp);
>   		trace_kvm_mmu_get_page(sp, false);
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
@ 2015-11-18  3:21   ` Xiao Guangrong
  2015-11-18  9:09     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  3:21 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>   	 * this feature. See the comments in kvm_zap_obsolete_pages().
>   	 */
>   	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
> -	sp->parent_ptes = 0;
> +	sp->parent_ptes.val = 0;

The sp is allocated from kmem_cache_zalloc(), so explicitly initializing it to zero is
not needed.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
  2015-11-12 14:27   ` Paolo Bonzini
@ 2015-11-18  3:32   ` Xiao Guangrong
  1 sibling, 0 replies; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  3:32 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:56 PM, Takuya Yoshikawa wrote:
> Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
> argument, link_shadow_page() follows that to set the parent entry so
> that the new mapping will point to the returned page table.
>
> Moving parent_pte handling there allows to clean up the code because
> parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
> mmu_page_add_parent_pte().
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>   arch/x86/kvm/mmu.c         | 21 ++++++++-------------
>   arch/x86/kvm/paging_tmpl.h |  6 ++----
>   2 files changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9273cd4..33fe720 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   		if (sp->unsync_children) {
>   			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>   			kvm_mmu_mark_parents_unsync(sp);
> -			if (parent_pte)
> -				mark_unsync(parent_pte);
>   		} else if (sp->unsync) {
>   			kvm_mmu_mark_parents_unsync(sp);
> -			if (parent_pte)
> -				mark_unsync(parent_pte);
>   		}
> -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>   		__clear_sp_write_flooding_count(sp);
>   		trace_kvm_mmu_get_page(sp, false);
> @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   	sp = kvm_mmu_alloc_page(vcpu, direct);
>
>   	sp->parent_ptes.val = 0;
> -	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>   	sp->gfn = gfn;
>   	sp->role = role;
> @@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
>   	return __shadow_walk_next(iterator, *iterator->sptep);
>   }
>
> -static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
> +static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
> +			     struct kvm_mmu_page *sp, bool accessed)
>   {
>   	u64 spte;
>
> @@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
>   		spte |= shadow_accessed_mask;
>
>   	mmu_spte_set(sptep, spte);
> +
> +	if (sp->unsync_children || sp->unsync)
> +		mark_unsync(sptep);

Why are these needed?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-18  3:21   ` Xiao Guangrong
@ 2015-11-18  9:09     ` Paolo Bonzini
  2015-11-19  2:23       ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-18  9:09 UTC (permalink / raw)
  To: Xiao Guangrong, Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 18/11/2015 04:21, Xiao Guangrong wrote:
> 
> 
> On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
>> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page
>> *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>>        * this feature. See the comments in kvm_zap_obsolete_pages().
>>        */
>>       list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
>> -    sp->parent_ptes = 0;
>> +    sp->parent_ptes.val = 0;
> 
> The sp is allocated from kmem_cache_zalloc(), so explicitly initializing it
> to zero is not needed.

Right, but it should be a separate patch.

Takuya, since you are going to send another version of this series, can
you also:

1) move this patch either to the beginning or to the end

2) include "KVM: x86: always set accessed bit in shadow PTEs", also near
the beginning of the series?

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-18  2:44   ` Xiao Guangrong
@ 2015-11-19  0:59     ` Takuya Yoshikawa
  2015-11-19  2:46       ` Xiao Guangrong
  0 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-19  0:59 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini; +Cc: kvm, linux-kernel

On 2015/11/18 11:44, Xiao Guangrong wrote:

> On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:
>> +            if (!ret) {
>> +                clear_unsync_child_bit(sp, i);
>> +                continue;
>> +            } else if (ret > 0) {
>>                   nr_unsync_leaf += ret;
>
> Just a single line here, braces are unnecessary.
>
>> -            else
>> +            } else
>>                   return ret;

I know we can eliminate the braces, but that does not mean
we should do so: there seems to be no consensus about this
style issue and checkpatch accepts both ways.

Actually, some people prefer to put braces when one of the
if/else-if/else cases has multiple lines.  You can see
some examples in kernel/sched/core.c: see hrtick_start(),
sched_fork(), free_sched_domain().

In our case, I thought putting braces would align the else-if
and else and make the code look a bit nicer, but I know this
may be just a matter of personal feeling.

In short, unless the maintainer, Paolo for this file, has any
preference, both ways will be accepted.

   Takuya


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-18  9:09     ` Paolo Bonzini
@ 2015-11-19  2:23       ` Takuya Yoshikawa
  0 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-19  2:23 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong; +Cc: kvm, linux-kernel, mtosatti

On 2015/11/18 18:09, Paolo Bonzini wrote:

> On 18/11/2015 04:21, Xiao Guangrong wrote:

>> On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
>>> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page
>>> *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>>>         * this feature. See the comments in kvm_zap_obsolete_pages().
>>>         */
>>>        list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
>>> -    sp->parent_ptes = 0;
>>> +    sp->parent_ptes.val = 0;
>>
>> The sp is allocated from kmem_cache_zalloc(), so explicitly initializing it
>> to zero is not needed.
>
> Right, but it should be a separate patch.
>
> Takuya, since you are going to send another version of this series, can
> you also:

Yes, I'm preparing to do so.

> 1) move this patch either to the beginning or to the end
>
> 2) include "KVM: x86: always set accessed bit in shadow PTEs", also near
> the beginning of the series?

Commit 1c9a5e19b1af8a2c ("KVM: x86: MMU: always set accessed bit
in shadow PTEs") will be the first.

Then, the ordering will become something like this:

02: Encapsulate the type of rmap-chain head in a new struct
03: Remove unused parameter of __direct_map()
04: Add helper function to clear a bit in unsync child bitmap
05: Make mmu_set_spte() return emulate value
06: Remove is_rmap_spte() and use is_shadow_present_pte()

These five seem to be easy ones for you to apply: since patch 02
touches many places, it should go first to become the base of the
following work.

07: Consolidate BUG_ON checks for reverse-mapped sptes

I will change the WARN_ON to BUG_ON.  // Marcelo's comment

08: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

In this patch, I will delete the "sp->parent_ptes.val = 0;" line since
this is a problem of kvm_mmu_alloc_page(), though not a new one.
   // Xiao's comment

09: Use for_each_rmap_spte macro instead of pte_list_walk()

There is some confusion between us: Paolo and I agreed that the
patch keeps the original way and calls mark_unsync() the same way
as before, but there are still comments from Marcelo and Xiao and
those comments seem to explain the code differently.

I will check again, but I may not change this one and the following
two patches in the next version.  If we can eliminate some of the
mark_unsync() calls, that will be kind of an optimization which this
series does not intend to achieve.

Anyway, by moving the two non-trivial patches (09 and 10) here,
reviewing will become easier and you can apply the other patches
separately.

10: Move parent_pte handling from kvm_mmu_get_page()
     to link_shadow_page()
11: Remove unused parameter parent_pte from kvm_mmu_get_page()

Thanks,
   Takuya


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-19  0:59     ` Takuya Yoshikawa
@ 2015-11-19  2:46       ` Xiao Guangrong
  2015-11-19  4:02         ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-19  2:46 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/19/2015 08:59 AM, Takuya Yoshikawa wrote:
> On 2015/11/18 11:44, Xiao Guangrong wrote:
>
>> On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:
>>> +            if (!ret) {
>>> +                clear_unsync_child_bit(sp, i);
>>> +                continue;
>>> +            } else if (ret > 0) {
>>>                   nr_unsync_leaf += ret;
>>
>> Just a single line here, braces are unnecessary.
>>
>>> -            else
>>> +            } else
>>>                   return ret;
>
> I know we can eliminate the braces, but that does not mean
> we should do so: there seems to be no consensus about this
> style issue and checkpatch accepts both ways.
>
> Actually, some people prefer to put braces when one of the
> if/else-if/else cases has multiple lines.  You can see
> some examples in kernel/sched/core.c: see hrtick_start(),
> sched_fork(), free_sched_domain().
>
> In our case, I thought putting braces would align the else-if
> and else and make the code look a bit nicer, but I know this
> may be just a matter of personal feeling.
>
> In short, unless the maintainer, Paolo for this file, has any
> preference, both ways will be accepted.

The reason why I pointed this out is that it is the style documented
in Documentation/CodingStyle:
| Do not unnecessarily use braces where a single statement will do.
|
|        if (condition)
|                action();
|

Actually, Ingo Molnar hated this brace style very much and blamed
many developers who used this style (including me, which is why I was
nervous to see this style :( ).

If this style is commonly accepted now, it is worth making a patch
to update Documentation/CodingStyle.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-19  2:46       ` Xiao Guangrong
@ 2015-11-19  4:02         ` Takuya Yoshikawa
  0 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-19  4:02 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini; +Cc: kvm, linux-kernel

On 2015/11/19 11:46, Xiao Guangrong wrote:

>> Actually, some people prefer to put braces when one of the
>> if/else-if/else cases has multiple lines.  You can see
>> some examples in kernel/sched/core.c: see hrtick_start(),
>> sched_fork(), free_sched_domain().
>>
>> In our case, I thought putting braces would align the else-if
>> and else and make the code look a bit nicer, but I know this
>> may be just a matter of personal feeling.
>>
>> In short, unless the maintainer, Paolo for this file, has any
>> preference, both ways will be accepted.
>
> The reason why I pointed this out is that it is the style documented
> in Documentation/CodingStyle:
> | Do not unnecessarily use braces where a single statement will do.
> |
> |        if (condition)
> |                action();
> |

Ah, this is a different thing.  For this case, there is a consensus
and checkpatch will complain if we don't obey the rule.

What I explained was:

   if (condition) {
      line1;
      line2;  // multiple lines
   } else if (another_condition) {
      single-line-statement;  -- (*1)
   } else
      single-line-statement;  -- (*2)

For (*1) and (*2), especially for (*1), some people put braces.

> Actually, Ingo Molnar hated this brace style very much and blamed
> many developers who used this style (including me, which is why I was
> nervous to see this style :( ).

I think he likes the coding style of kernel/sched/core.c very much,
as you know.  Actually that is one reason why I took it as an example.

Let's just choose the way which Paolo prefers this time; I don't
know which is better.

Thank you,
   Takuya



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-11-19  4:02 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
2015-11-18  2:44   ` Xiao Guangrong
2015-11-19  0:59     ` Takuya Yoshikawa
2015-11-19  2:46       ` Xiao Guangrong
2015-11-19  4:02         ` Takuya Yoshikawa
2015-11-12 11:51 ` [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value Takuya Yoshikawa
2015-11-12 11:51 ` [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() Takuya Yoshikawa
2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
2015-11-13 21:47   ` Marcelo Tosatti
2015-11-14  9:20     ` Marcelo Tosatti
2015-11-16  2:51       ` Takuya Yoshikawa
2015-11-17 17:58         ` Paolo Bonzini
2015-11-18  3:07   ` Xiao Guangrong
2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
2015-11-13 22:08   ` Marcelo Tosatti
2015-11-16  3:34     ` Takuya Yoshikawa
2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
2015-11-18  3:21   ` Xiao Guangrong
2015-11-18  9:09     ` Paolo Bonzini
2015-11-19  2:23       ` Takuya Yoshikawa
2015-11-12 11:55 ` [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() Takuya Yoshikawa
2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
2015-11-12 14:27   ` Paolo Bonzini
2015-11-12 17:03     ` Paolo Bonzini
2015-11-13  2:15     ` Takuya Yoshikawa
2015-11-13 10:51       ` Paolo Bonzini
2015-11-18  3:32   ` Xiao Guangrong
2015-11-12 11:57 ` [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Takuya Yoshikawa
2015-11-12 12:08 ` [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).