* [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work
@ 2015-11-12 11:48 Takuya Yoshikawa
  2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:48 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

v1->v2:
  Patches 5 and 7 were added based on Paolo's suggestions.
  Patches 8-10 are new.

Patch 1/2/3/4: no change.
Patch 5: Needed a bit more work than I had expected.
Patch 6: Removed the extra comment from v1 (patch 5 made it inappropriate).
Patch 7: As expected, many places needed to be converted.
Patch 8: This is new, but only a small change.

Patch 9: Kind of an RFC (though I have checked it to some extent).
  The following two places need to be checked carefully:
  - in kvm_mmu_get_page: "if (!direct)" block after kvm_mmu_alloc_page()
  - in FNAME(fetch): "if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))" case
Patch 10: Trivial cleanup, assuming that patch 9 is correct.


In summary: patches 1-7 are the result of updating v1 based on the suggestions.
  Although patch 5 does not look as nice as expected, this is the most
  conservative approach, and patches 8-10 try to alleviate the sadness.

  Takuya


Takuya Yoshikawa (10):
  01:  KVM: x86: MMU: Remove unused parameter of __direct_map()
  02:  KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  03:  KVM: x86: MMU: Make mmu_set_spte() return emulate value
  04:  KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  05:  KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  06:  KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  07:  KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  08:  KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
  09:  KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  10:  KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c                | 357 ++++++++++++++++++--------------------
 arch/x86/kvm/mmu_audit.c          |  15 +-
 arch/x86/kvm/paging_tmpl.h        |  20 +--
 5 files changed, 196 insertions(+), 208 deletions(-)

-- 
2.1.0



* [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
@ 2015-11-12 11:49 ` Takuya Yoshikawa
  2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:49 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e7c2c14..c3bbc82 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int map_writable, int level, gfn_t gfn, pfn_t pfn,
-			bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+			int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-			 prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
-
 	return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable,
-			 level, gfn, pfn, prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
-- 
2.1.0



* [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
  2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
@ 2015-11-12 11:50 ` Takuya Yoshikawa
  2015-11-18  2:44   ` Xiao Guangrong
  2015-11-12 11:51 ` [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value Takuya Yoshikawa
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:50 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Both __mmu_unsync_walk() and mmu_pages_clear_parents() contain a
three-line sequence that clears a bit in the unsync child bitmap; the
former places it inside a loop and uses a few goto statements to jump
to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c3bbc82..f3120aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 	return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+	--sp->unsync_children;
+	WARN_ON((int)sp->unsync_children < 0);
+	__clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
-		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-			goto clear_child_bitmap;
+		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+			clear_unsync_child_bit(sp, i);
+			continue;
+		}
 
 		child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				return -ENOSPC;
 
 			ret = __mmu_unsync_walk(child, pvec);
-			if (!ret)
-				goto clear_child_bitmap;
-			else if (ret > 0)
+			if (!ret) {
+				clear_unsync_child_bit(sp, i);
+				continue;
+			} else if (ret > 0) {
 				nr_unsync_leaf += ret;
-			else
+			} else
 				return ret;
 		} else if (child->unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 		} else
-			 goto clear_child_bitmap;
-
-		continue;
-
-clear_child_bitmap:
-		__clear_bit(i, sp->unsync_child_bitmap);
-		sp->unsync_children--;
-		WARN_ON((int)sp->unsync_children < 0);
+			clear_unsync_child_bit(sp, i);
 	}
 
-
 	return nr_unsync_leaf;
 }
 
@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		if (!sp)
 			return;
 
-		--sp->unsync_children;
-		WARN_ON((int)sp->unsync_children < 0);
-		__clear_bit(idx, sp->unsync_child_bitmap);
+		clear_unsync_child_bit(sp, idx);
 		level++;
 	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0



* [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
  2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
  2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
@ 2015-11-12 11:51 ` Takuya Yoshikawa
  2015-11-12 11:51 ` [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() Takuya Yoshikawa
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:51 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing the emulate pointer and make mmu_set_spte() return the
emulate value instead, to clean up this complex interface.  Prefetch
functions can simply discard the return value.
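
A sketch of the interface change, for illustration only (the hunks
below are authoritative):

	/* Before: out-parameter, which may be NULL at some call sites */
	int emulate = 0;
	mmu_set_spte(vcpu, sptep, pte_access, write_fault, &emulate,
		     level, gfn, pfn, speculative, host_writable);

	/* After: a plain return value that prefetch callers can ignore */
	bool emulate = mmu_set_spte(vcpu, sptep, pte_access, write_fault,
				    level, gfn, pfn, speculative,
				    host_writable);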

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 27 ++++++++++++++-------------
 arch/x86/kvm/paging_tmpl.h | 10 +++++-----
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f3120aa..c229356 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
 	return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			 unsigned pte_access, int write_fault, int *emulate,
-			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
-			 bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
+			 int write_fault, int level, gfn_t gfn, pfn_t pfn,
+			 bool speculative, bool host_writable)
 {
 	int was_rmapped = 0;
 	int rmap_count;
+	bool emulate = false;
 
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
 	      true, host_writable)) {
 		if (write_fault)
-			*emulate = 1;
+			emulate = true;
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	}
 
-	if (unlikely(is_mmio_spte(*sptep) && emulate))
-		*emulate = 1;
+	if (unlikely(is_mmio_spte(*sptep)))
+		emulate = true;
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 
 	kvm_release_pfn_clean(pfn);
+
+	return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;
 
 	for (i = 0; i < ret; i++, gfn++, start++)
-		mmu_set_spte(vcpu, start, access, 0, NULL,
-			     sp->role.level, gfn, page_to_pfn(pages[i]),
-			     true, true);
+		mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+			     page_to_pfn(pages[i]), true, true);
 
 	return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
-			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-				     write, &emulate, level, gfn, pfn,
-				     prefault, map_writable);
+			emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+					       write, level, gfn, pfn, prefault,
+					       map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
 			break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3058a22..9d21b44 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-		     gfn, pfn, true, true);
+	mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+		     true, true);
 
 	return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
-	int top_level, emulate = 0;
+	int top_level, emulate;
 
 	direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-		     it.level, gw->gfn, pfn, prefault, map_writable);
+	emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
+			       it.level, gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
 
 	return emulate;
-- 
2.1.0



* [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2015-11-12 11:51 ` [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value Takuya Yoshikawa
@ 2015-11-12 11:51 ` Takuya Yoshikawa
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:51 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") for the bypass_guest_pf optimization, which no longer
exists.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them with no clear distinction just makes the code
confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c       | 13 ++++---------
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c229356..e8cfdc4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
 	return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-	return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
 	if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	u64 old_spte = *sptep;
 	bool ret = false;
 
-	WARN_ON(!is_rmap_spte(new_spte));
+	WARN_ON(!is_shadow_present_pte(new_spte));
 
 	if (!is_shadow_present_pte(old_spte)) {
 		mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 	else
 		old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-	if (!is_rmap_spte(old_spte))
+	if (!is_shadow_present_pte(old_spte))
 		return 0;
 
 	pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
-	if (is_rmap_spte(*sptep)) {
+	if (is_shadow_present_pte(*sptep)) {
 		/*
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 		 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	 * If the mapping has been changed, let the vcpu fault on the
 	 * same address again.
 	 */
-	if (!is_rmap_spte(spte)) {
+	if (!is_shadow_present_pte(spte)) {
 		ret = true;
 		goto exit;
 	}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-		if (!is_rmap_spte(sp->spt[i]))
+		if (!is_shadow_present_pte(sp->spt[i]))
 			continue;
 
 		inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



* [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (3 preceding siblings ...)
  2015-11-12 11:51 ` [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() Takuya Yoshikawa
@ 2015-11-12 11:52 ` Takuya Yoshikawa
  2015-11-13 21:47   ` Marcelo Tosatti
  2015-11-18  3:07   ` Xiao Guangrong
  2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:52 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
nearly the same thing as the for_each_rmap_spte macro.  The only
difference is that the is_shadow_present_pte() check cannot be placed
there, because kvm_mmu_mark_parents_unsync() can be called with a new
parent pointer whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent
pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
can be used without any problem.
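
For illustration, a condensed sketch of the resulting ordering in
kvm_mmu_get_page() (the actual hunk below is authoritative):

	if (sp->unsync_children || sp->unsync) {
		kvm_mmu_mark_parents_unsync(sp); /* walks existing chain */
		if (parent_pte)
			mark_unsync(parent_pte); /* new parent, not chained yet */
	}
	mmu_page_add_parent_pte(vcpu, sp, parent_pte); /* now it may join */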

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e8cfdc4..1691171 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
-{
-	struct pte_list_desc *desc;
-	int i;
-
-	if (!*pte_list)
-		return;
-
-	if (!(*pte_list & 1))
-		return fn((u64 *)*pte_list);
-
-	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-			fn(desc->sptes[i]);
-		desc = desc->more;
-	}
-}
-
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
@@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+		mark_unsync(sptep);
+	}
 }
 
 static void mark_unsync(u64 *spte)
@@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
 			break;
 
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-		} else if (sp->unsync)
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		}
+		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
-- 
2.1.0



* [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (4 preceding siblings ...)
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
@ 2015-11-12 11:53 ` Takuya Yoshikawa
  2015-11-13 22:08   ` Marcelo Tosatti
  2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:53 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, are checked the same
way.  In addition, change the BUG_ON to WARN_ON, since killing the
whole host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it,
because parent sptes are present at least until drop_parent_pte()
actually unlinks them, and they are never mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c                | 26 +++++++++++++++++---------
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
     page cannot be destroyed.  See role.invalid.
   parent_ptes:
     The reverse mapping for the pte/ptes pointing at this page's spt. If
-    parent_ptes bit 0 is zero, only one spte points at this pages and
+    parent_ptes bit 0 is zero, only one spte points at this page and
     parent_ptes points at this single spte, otherwise, there exists multiple
     sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-    structure with a list of parent_ptes.
+    structure with a list of parent sptes.
   unsync:
     If true, then the translations in this page may not match the guest's
     translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1691171..ee7b101 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1079,17 +1079,23 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (!rmap)
 		return NULL;
 
 	if (!(rmap & 1)) {
 		iter->desc = NULL;
-		return (u64 *)rmap;
+		sptep = (u64 *)rmap;
+		goto out;
 	}
 
 	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
 	iter->pos = 0;
-	return iter->desc->sptes[iter->pos];
+	sptep = iter->desc->sptes[iter->pos];
+out:
+	WARN_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 /*
@@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (iter->desc) {
 		if (iter->pos < PTE_LIST_EXT - 1) {
-			u64 *sptep;
-
 			++iter->pos;
 			sptep = iter->desc->sptes[iter->pos];
 			if (sptep)
-				return sptep;
+				goto out;
 		}
 
 		iter->desc = iter->desc->more;
@@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 		if (iter->desc) {
 			iter->pos = 0;
 			/* desc->sptes[0] cannot be NULL */
-			return iter->desc->sptes[iter->pos];
+			sptep = iter->desc->sptes[iter->pos];
+			goto out;
 		}
 	}
 
 	return NULL;
+out:
+	WARN_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
 	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
-		_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
-			_spte_ = rmap_get_next(_iter_))
+		_spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	bool flush = false;
 
 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
 		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
-- 
2.1.0



* [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (5 preceding siblings ...)
  2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
@ 2015-11-12 11:55 ` Takuya Yoshikawa
  2015-11-18  3:21   ` Xiao Guangrong
  2015-11-12 11:55 ` [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() Takuya Yoshikawa
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:55 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

The new struct kvm_rmap_head makes the code type-safe to some extent.
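
For reference, bit 0 of val keeps the encoding of the old unsigned long
head: if clear, val is a single spte pointer; if set, (val & ~1ul)
points to a pte_list_desc.  A minimal sketch of the decoding (the
helper names are hypothetical, for illustration only):

	/* Illustration only: decoding a kvm_rmap_head value. */
	static bool rmap_head_is_list(struct kvm_rmap_head *rmap_head)
	{
		return rmap_head->val & 1;
	}

	static struct pte_list_desc *
	rmap_head_desc(struct kvm_rmap_head *rmap_head)
	{
		return (struct pte_list_desc *)(rmap_head->val & ~1ul);
	}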

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.c              | 169 +++++++++++++++++++++-------------------
 arch/x86/kvm/mmu_audit.c        |  13 ++--
 3 files changed, 100 insertions(+), 90 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0535359..c5a0c4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -214,6 +214,10 @@ union kvm_mmu_page_role {
 	};
 };
 
+struct kvm_rmap_head {
+	unsigned long val;
+};
+
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
@@ -231,7 +235,7 @@ struct kvm_mmu_page {
 	bool unsync;
 	int root_count;          /* Currently serving as active root */
 	unsigned int unsync_children;
-	unsigned long parent_ptes;	/* Reverse mapping for parent_pte */
+	struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 
 	/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen.  */
 	unsigned long mmu_valid_gen;
@@ -604,7 +608,7 @@ struct kvm_lpage_info {
 };
 
 struct kvm_arch_memory_slot {
-	unsigned long *rmap[KVM_NR_PAGE_SIZES];
+	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ee7b101..85f4bbd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -916,24 +916,24 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
  *
  */
 static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
-			unsigned long *pte_list)
+			struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
 	int i, count = 0;
 
-	if (!*pte_list) {
+	if (!rmap_head->val) {
 		rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-		*pte_list = (unsigned long)spte;
-	} else if (!(*pte_list & 1)) {
+		rmap_head->val = (unsigned long)spte;
+	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
 		desc = mmu_alloc_pte_list_desc(vcpu);
-		desc->sptes[0] = (u64 *)*pte_list;
+		desc->sptes[0] = (u64 *)rmap_head->val;
 		desc->sptes[1] = spte;
-		*pte_list = (unsigned long)desc | 1;
+		rmap_head->val = (unsigned long)desc | 1;
 		++count;
 	} else {
 		rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 		while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
 			desc = desc->more;
 			count += PTE_LIST_EXT;
@@ -950,8 +950,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 }
 
 static void
-pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-			   int i, struct pte_list_desc *prev_desc)
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+			   struct pte_list_desc *desc, int i,
+			   struct pte_list_desc *prev_desc)
 {
 	int j;
 
@@ -962,43 +963,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
 	if (j != 0)
 		return;
 	if (!prev_desc && !desc->more)
-		*pte_list = (unsigned long)desc->sptes[0];
+		rmap_head->val = (unsigned long)desc->sptes[0];
 	else
 		if (prev_desc)
 			prev_desc->more = desc->more;
 		else
-			*pte_list = (unsigned long)desc->more | 1;
+			rmap_head->val = (unsigned long)desc->more | 1;
 	mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, unsigned long *pte_list)
+static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
 	struct pte_list_desc *prev_desc;
 	int i;
 
-	if (!*pte_list) {
+	if (!rmap_head->val) {
 		printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
 		BUG();
-	} else if (!(*pte_list & 1)) {
+	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("pte_list_remove:  %p 1->0\n", spte);
-		if ((u64 *)*pte_list != spte) {
+		if ((u64 *)rmap_head->val != spte) {
 			printk(KERN_ERR "pte_list_remove:  %p 1->BUG\n", spte);
 			BUG();
 		}
-		*pte_list = 0;
+		rmap_head->val = 0;
 	} else {
 		rmap_printk("pte_list_remove:  %p many->many\n", spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 		prev_desc = NULL;
 		while (desc) {
-			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
+			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i) {
 				if (desc->sptes[i] == spte) {
-					pte_list_desc_remove_entry(pte_list,
-							       desc, i,
-							       prev_desc);
+					pte_list_desc_remove_entry(rmap_head,
+							desc, i, prev_desc);
 					return;
 				}
+			}
 			prev_desc = desc;
 			desc = desc->more;
 		}
@@ -1007,8 +1008,8 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }
 
-static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
-				    struct kvm_memory_slot *slot)
+static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
+					   struct kvm_memory_slot *slot)
 {
 	unsigned long idx;
 
@@ -1016,10 +1017,8 @@ static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 	return &slot->arch.rmap[level - PT_PAGE_TABLE_LEVEL][idx];
 }
 
-/*
- * Take gfn and return the reverse mapping to it.
- */
-static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, struct kvm_mmu_page *sp)
+static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
+					 struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -1040,24 +1039,24 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
 	struct kvm_mmu_page *sp;
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	sp = page_header(__pa(spte));
 	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
-	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp);
-	return pte_list_add(vcpu, spte, rmapp);
+	rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
+	return pte_list_add(vcpu, spte, rmap_head);
 }
 
 static void rmap_remove(struct kvm *kvm, u64 *spte)
 {
 	struct kvm_mmu_page *sp;
 	gfn_t gfn;
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	sp = page_header(__pa(spte));
 	gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
-	rmapp = gfn_to_rmap(kvm, gfn, sp);
-	pte_list_remove(spte, rmapp);
+	rmap_head = gfn_to_rmap(kvm, gfn, sp);
+	pte_list_remove(spte, rmap_head);
 }
 
 /*
@@ -1077,20 +1076,21 @@ struct rmap_iterator {
  *
  * Returns sptep if found, NULL otherwise.
  */
-static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
+static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
+			   struct rmap_iterator *iter)
 {
 	u64 *sptep;
 
-	if (!rmap)
+	if (!rmap_head->val)
 		return NULL;
 
-	if (!(rmap & 1)) {
+	if (!(rmap_head->val & 1)) {
 		iter->desc = NULL;
-		sptep = (u64 *)rmap;
+		sptep = (u64 *)rmap_head->val;
 		goto out;
 	}
 
-	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
+	iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 	iter->pos = 0;
 	sptep = iter->desc->sptes[iter->pos];
 out:
@@ -1131,9 +1131,9 @@ out:
 	return sptep;
 }
 
-#define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
-	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
-		_spte_; _spte_ = rmap_get_next(_iter_))
+#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)		\
+	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);	\
+	     _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1191,14 +1191,15 @@ static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
+static bool __rmap_write_protect(struct kvm *kvm,
+				 struct kvm_rmap_head *rmap_head,
 				 bool pt_protect)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep)
 		flush |= spte_write_protect(kvm, sptep, pt_protect);
 
 	return flush;
@@ -1215,13 +1216,13 @@ static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep)
 		flush |= spte_clear_dirty(kvm, sptep);
 
 	return flush;
@@ -1238,13 +1239,13 @@ static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
+static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep)
 		flush |= spte_set_dirty(kvm, sptep);
 
 	return flush;
@@ -1264,12 +1265,12 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
 				     gfn_t gfn_offset, unsigned long mask)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	while (mask) {
-		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-				      PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_write_protect(kvm, rmapp, false);
+		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					  PT_PAGE_TABLE_LEVEL, slot);
+		__rmap_write_protect(kvm, rmap_head, false);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1289,12 +1290,12 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
 				     gfn_t gfn_offset, unsigned long mask)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 
 	while (mask) {
-		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-				      PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_clear_dirty(kvm, rmapp);
+		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					  PT_PAGE_TABLE_LEVEL, slot);
+		__rmap_clear_dirty(kvm, rmap_head);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1326,27 +1327,27 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	int i;
 	bool write_protected = false;
 
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
 	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		rmapp = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(vcpu->kvm, rmapp, true);
+		rmap_head = __gfn_to_rmap(gfn, i, slot);
+		write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true);
 	}
 
 	return write_protected;
 }
 
-static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	while ((sptep = rmap_get_first(*rmapp, &iter))) {
+	while ((sptep = rmap_get_first(rmap_head, &iter))) {
 		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
@@ -1356,14 +1357,14 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	return flush;
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			   struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			   unsigned long data)
 {
-	return kvm_zap_rmapp(kvm, rmapp);
+	return kvm_zap_rmapp(kvm, rmap_head);
 }
 
-static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			     struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			     unsigned long data)
 {
@@ -1378,7 +1379,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	new_pfn = pte_pfn(*ptep);
 
 restart:
-	for_each_rmap_spte(rmapp, &iter, sptep) {
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n",
 			     sptep, *sptep, gfn, level);
 
@@ -1416,11 +1417,11 @@ struct slot_rmap_walk_iterator {
 
 	/* output fields. */
 	gfn_t gfn;
-	unsigned long *rmap;
+	struct kvm_rmap_head *rmap;
 	int level;
 
 	/* private field. */
-	unsigned long *end_rmap;
+	struct kvm_rmap_head *end_rmap;
 };
 
 static void
@@ -1479,7 +1480,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 				unsigned long end,
 				unsigned long data,
 				int (*handler)(struct kvm *kvm,
-					       unsigned long *rmapp,
+					       struct kvm_rmap_head *rmap_head,
 					       struct kvm_memory_slot *slot,
 					       gfn_t gfn,
 					       int level,
@@ -1523,7 +1524,8 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 			  unsigned long data,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+			  int (*handler)(struct kvm *kvm,
+					 struct kvm_rmap_head *rmap_head,
 					 struct kvm_memory_slot *slot,
 					 gfn_t gfn, int level,
 					 unsigned long data))
@@ -1546,7 +1548,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
 }
 
-static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			 struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			 unsigned long data)
 {
@@ -1556,18 +1558,19 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	BUG_ON(!shadow_accessed_mask);
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			clear_bit((ffs(shadow_accessed_mask) - 1),
 				 (unsigned long *)sptep);
 		}
+	}
 
 	trace_kvm_age_page(gfn, level, slot, young);
 	return young;
 }
 
-static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			      struct kvm_memory_slot *slot, gfn_t gfn,
 			      int level, unsigned long data)
 {
@@ -1583,11 +1586,12 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		goto out;
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			break;
 		}
+	}
 out:
 	return young;
 }
@@ -1596,14 +1600,14 @@ out:
 
 static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	struct kvm_mmu_page *sp;
 
 	sp = page_header(__pa(spte));
 
-	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp);
+	rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
 
-	kvm_unmap_rmapp(vcpu->kvm, rmapp, NULL, gfn, sp->role.level, 0);
+	kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, 0);
 	kvm_flush_remote_tlbs(vcpu->kvm);
 }
 
@@ -1720,7 +1724,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	 * this feature. See the comments in kvm_zap_obsolete_pages().
 	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	sp->parent_ptes = 0;
+	sp->parent_ptes.val = 0;
 	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
@@ -2273,7 +2277,7 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
+	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
 		drop_parent_pte(sp, sptep);
 }
 
@@ -4485,7 +4489,7 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 }
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_level_handler) (struct kvm *kvm, unsigned long *rmap);
+typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
 /* The caller should hold mmu-lock before calling this function. */
 static bool
@@ -4579,9 +4583,10 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	spin_unlock(&kvm->mmu_lock);
 }
 
-static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
+static bool slot_rmap_write_protect(struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head)
 {
-	return __rmap_write_protect(kvm, rmapp, false);
+	return __rmap_write_protect(kvm, rmap_head, false);
 }
 
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
@@ -4617,7 +4622,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 }
 
 static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
-		unsigned long *rmapp)
+					 struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -4626,7 +4631,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	struct kvm_mmu_page *sp;
 
 restart:
-	for_each_rmap_spte(rmapp, &iter, sptep) {
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		sp = page_header(__pa(sptep));
 		pfn = spte_to_pfn(*sptep);
 
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 90ee420..1cee3ec 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -129,7 +129,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
 {
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	struct kvm_mmu_page *rev_sp;
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -150,8 +150,8 @@ static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
 		return;
 	}
 
-	rmapp = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
-	if (!*rmapp) {
+	rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
+	if (!rmap_head->val) {
 		if (!__ratelimit(&ratelimit_state))
 			return;
 		audit_printk(kvm, "no rmap for writable spte %llx\n",
@@ -192,7 +192,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	unsigned long *rmapp;
+	struct kvm_rmap_head *rmap_head;
 	u64 *sptep;
 	struct rmap_iterator iter;
 	struct kvm_memslots *slots;
@@ -203,13 +203,14 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, sp->gfn);
-	rmapp = __gfn_to_rmap(sp->gfn, PT_PAGE_TABLE_LEVEL, slot);
+	rmap_head = __gfn_to_rmap(sp->gfn, PT_PAGE_TABLE_LEVEL, slot);
 
-	for_each_rmap_spte(rmapp, &iter, sptep)
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		if (is_writable_pte(*sptep))
 			audit_printk(kvm, "shadow page has writable "
 				     "mappings: gfn %llx role %x\n",
 				     sp->gfn, sp->role.word);
+	}
 }
 
 static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-- 
2.1.0



* [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (6 preceding siblings ...)
  2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
@ 2015-11-12 11:55 ` Takuya Yoshikawa
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:55 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Make kvm_mmu_alloc_page() do just what its name says, and remove the
extra error check at its call site, since the allocation cannot fail.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f4bbd..9273cd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1707,8 +1707,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-					       u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1724,8 +1723,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	 * this feature. See the comments in kvm_zap_obsolete_pages().
 	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	sp->parent_ptes.val = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
 }
@@ -2124,10 +2121,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		trace_kvm_mmu_get_page(sp, false);
 		return sp;
 	}
+
 	++vcpu->kvm->stat.mmu_cache_miss;
-	sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-	if (!sp)
-		return sp;
+
+	sp = kvm_mmu_alloc_page(vcpu, direct);
+
+	sp->parent_ptes.val = 0;
+	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link,
-- 
2.1.0



* [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (7 preceding siblings ...)
  2015-11-12 11:55 ` [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() Takuya Yoshikawa
@ 2015-11-12 11:56 ` Takuya Yoshikawa
  2015-11-12 14:27   ` Paolo Bonzini
  2015-11-18  3:32   ` Xiao Guangrong
  2015-11-12 11:57 ` [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Takuya Yoshikawa
  2015-11-12 12:08 ` [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Paolo Bonzini
  10 siblings, 2 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:56 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving the parent_pte handling there allows the code to be cleaned up,
because parent_pte is passed to kvm_mmu_get_page() only for
mark_unsync() and mmu_page_add_parent_pte().
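
A sketch of the resulting call pattern, for illustration only (at this
point kvm_mmu_get_page() still accepts parent_pte but no longer uses it
for linking; patch 10 removes the parameter):

	sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level - 1,
			      false, access, it.sptep);
	/*
	 * link_shadow_page() now performs all the parent_pte duties:
	 * it sets the entry, marks it unsync when sp needs syncing,
	 * and adds it to sp->parent_ptes.
	 */
	link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);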

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 21 ++++++++-------------
 arch/x86/kvm/paging_tmpl.h |  6 ++----
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9273cd4..33fe720 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-			if (parent_pte)
-				mark_unsync(parent_pte);
 		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
-			if (parent_pte)
-				mark_unsync(parent_pte);
 		}
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
@@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	sp = kvm_mmu_alloc_page(vcpu, direct);
 
 	sp->parent_ptes.val = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 	sp->gfn = gfn;
 	sp->role = role;
@@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+			     struct kvm_mmu_page *sp, bool accessed)
 {
 	u64 spte;
 
@@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
 		spte |= shadow_accessed_mask;
 
 	mmu_spte_set(sptep, spte);
+
+	if (sp->unsync_children || sp->unsync)
+		mark_unsync(sptep);
+
+	mmu_page_add_parent_pte(vcpu, sp, sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 		mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-	mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
@@ -2738,7 +2733,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);
 
-			link_shadow_page(iterator.sptep, sp, true);
+			link_shadow_page(vcpu, iterator.sptep, sp, true);
 		}
 	}
 	return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 9d21b44..f414ca6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			goto out_gpte_changed;
 
 		if (sp)
-			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
 	for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
 				      true, direct_access, it.sptep);
-		link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+		link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	return emulate;
 
 out_gpte_changed:
-	if (sp)
-		kvm_mmu_put_page(sp, it.sptep);
 	kvm_release_pfn_clean(pfn);
 	return 0;
 }
-- 
2.1.0



* [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (8 preceding siblings ...)
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
@ 2015-11-12 11:57 ` Takuya Yoshikawa
  2015-11-12 12:08 ` [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Paolo Bonzini
  10 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-12 11:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 20 +++++++-------------
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 33fe720..101e77d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2072,8 +2072,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gva_t gaddr,
 					     unsigned level,
 					     int direct,
-					     unsigned access,
-					     u64 *parent_pte)
+					     unsigned access)
 {
 	union kvm_mmu_page_role role;
 	unsigned quadrant;
@@ -2730,8 +2729,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
 			pseudo_gfn = base_addr >> PAGE_SHIFT;
 			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-					      iterator.level - 1,
-					      1, ACC_ALL, iterator.sptep);
+					      iterator.level - 1, 1, ACC_ALL);
 
 			link_shadow_page(vcpu, iterator.sptep, sp, true);
 		}
@@ -3088,8 +3086,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
-				      1, ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3101,9 +3098,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			spin_lock(&vcpu->kvm->mmu_lock);
 			make_mmu_pages_available(vcpu);
 			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
-					      i << 30,
-					      PT32_ROOT_LEVEL, 1, ACC_ALL,
-					      NULL);
+					i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
 			root = __pa(sp->spt);
 			++sp->root_count;
 			spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3140,7 +3135,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
 		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
-				      0, ACC_ALL, NULL);
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3173,9 +3168,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL, 0,
-				      ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f414ca6..ee9d211 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-					      false, access, it.sptep);
+					      false, access);
 		}
 
 		/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-				      true, direct_access, it.sptep);
+				      true, direct_access);
 		link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
 	}
 
-- 
2.1.0



* Re: [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work
  2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
                   ` (9 preceding siblings ...)
  2015-11-12 11:57 ` [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Takuya Yoshikawa
@ 2015-11-12 12:08 ` Paolo Bonzini
  10 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-12 12:08 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel, Marcelo Tosatti



On 12/11/2015 12:48, Takuya Yoshikawa wrote:
> v1->v2:
>   Patch 5 and 7 are added based on Paolo's suggestions.
>   Patch 8-10 are new.
> 
> Patch 1/2/3/4: no change.
> Patch 5: Needed a bit more work than I had expected.
> Patch 6: Removed extra comment of v1 (patch 5 made it inappropriate).
> Patch 7: As expected, many places needed to be converted.
> Patch 8: This is new, but only a small change.
> 
> Patch 9: Kind of an RFC (though I have checked it to some extent).
>   Following two places need to be carefully checked:
>   - in kvm_mmu_get_page: "if (!direct)" block after kvm_mmu_alloc_page()
>   - in FNAME(fetch): "if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))" case
> Patch 10: Trivial cleanup, assuming that patch 9 is correct.
> 
> 
> In summary: patch 1-7 is the result of updating v1 based on the suggestions.
>   Although patch 5 does not look so nice than expected, this is the most
>   conservative approach, and patch 8-10 try to alleviate the sadness.

If it works, it's actually better than what we have now.  I'll review it
in a few days.

Marcelo, can you look at this as well?  You are still king of MMU. :)

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
@ 2015-11-12 14:27   ` Paolo Bonzini
  2015-11-12 17:03     ` Paolo Bonzini
  2015-11-13  2:15     ` Takuya Yoshikawa
  2015-11-18  3:32   ` Xiao Guangrong
  1 sibling, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-12 14:27 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 12/11/2015 12:56, Takuya Yoshikawa wrote:
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 9d21b44..f414ca6 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  			goto out_gpte_changed;
>  
>  		if (sp)
> -			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
> +			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>  	}
>  

Here I think you can remove completely the

	if (sp)
		kvm_mmu_put_page(sp, it.sptep);

later in FNAME(fetch).  Apart from this nit, it's okay.

On to kvm_mmu_get_page...

        if (!direct) {
                if (rmap_write_protect(vcpu, gfn))
                        kvm_flush_remote_tlbs(vcpu->kvm);
                if (level > PT_PAGE_TABLE_LEVEL && need_sync)
                        kvm_sync_pages(vcpu, gfn);

This seems fishy.

need_sync is set if sp->unsync, but then the parents have not been
unsynced yet.

On the other hand, all calls to kvm_mmu_get_page except for the
roots are followed by link_shadow_page...  Perhaps if parent_pte != NULL
you can call link_shadow_page directly from kvm_mmu_get_page.  The call
would go before the "if (!direct)" and it would subsume all the existing
calls.

We could probably also warn if

	(parent_pte == NULL)
		!= (level == vcpu->arch.mmu.root_level)

in kvm_mmu_get_page.
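
Roughly, in the allocation path, something like this (untested, just to
sketch the idea; "accessed" assumes the bit is either passed down to
kvm_mmu_get_page or simply hardcoded to true):

	sp = kvm_mmu_alloc_page(vcpu, direct);
	...
	WARN_ON((parent_pte == NULL) != (level == vcpu->arch.mmu.root_level));
	if (parent_pte)
		link_shadow_page(vcpu, parent_pte, sp, accessed);
	if (!direct) {
		if (rmap_write_protect(vcpu, gfn))
			kvm_flush_remote_tlbs(vcpu->kvm);
		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
			kvm_sync_pages(vcpu, gfn);
	}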

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 14:27   ` Paolo Bonzini
@ 2015-11-12 17:03     ` Paolo Bonzini
  2015-11-13  2:15     ` Takuya Yoshikawa
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-12 17:03 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 12/11/2015 15:27, Paolo Bonzini wrote:
> Here I think you can remove completely the
> 
> 	if (sp)
> 		kvm_mmu_put_page(sp, it.sptep);
> 
> later in FNAME(fetch).  Apart from this nit, it's okay.

Removing this is of course not possible anymore if the other suggestion
works out.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 14:27   ` Paolo Bonzini
  2015-11-12 17:03     ` Paolo Bonzini
@ 2015-11-13  2:15     ` Takuya Yoshikawa
  2015-11-13 10:51       ` Paolo Bonzini
  1 sibling, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-13  2:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel

On 2015/11/12 23:27, Paolo Bonzini wrote:

> On 12/11/2015 12:56, Takuya Yoshikawa wrote:
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 9d21b44..f414ca6 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>>   			goto out_gpte_changed;
>>
>>   		if (sp)
>> -			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>> +			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>>   	}
>>
>
> Here I think you can remove completely the
>
> 	if (sp)
> 		kvm_mmu_put_page(sp, it.sptep);
>
> later in FNAME(fetch).  Apart from this nit, it's okay.

Yes, that's what this patch does below:

> @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  	return emulate;
>
>  out_gpte_changed:
> -	if (sp)
> -		kvm_mmu_put_page(sp, it.sptep);
>  	kvm_release_pfn_clean(pfn);
>  	return 0;
>  }

Since this is the only user of kvm_mmu_put_page(), it also removes
the definition:

> @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
>  		mmu_page_zap_pte(kvm, sp, sp->spt + i);
>  }
>
> -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
> -{
> -	mmu_page_remove_parent_pte(sp, parent_pte);
> -}
> -
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	u64 *sptep;

Actually, I don't understand why this is named kvm_mmu_put_page() when it
just removes the parent_pte pointer from the sp->parent_ptes pointer chain.


> On to kvm_mmu_get_page...
>
>          if (!direct) {
>                  if (rmap_write_protect(vcpu, gfn))
>                          kvm_flush_remote_tlbs(vcpu->kvm);
>                  if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>                          kvm_sync_pages(vcpu, gfn);
>
> This seems fishy.
>
> need_sync is set if sp->unsync, but then the parents have not been
> unsynced yet.

Reaching here means that kvm_mmu_get_page() could not return sp
from inside the for_each_gfn_sp() loop above, so even without
this patch, mark_unsync() has not been called.

Here, sp holds the new page allocated by kvm_mmu_alloc_page().
One confusing thing is that hlist_add_head() right before this
"if (!direct)" line has already added the new sp to the hash
list, so it will be found by for_each_gfn_indirect_valid_sp()
in kvm_sync_pages().

Because this sp is new and sp->unsync is not set,  kvm_sync_pages()
will just skip it and look for other sp's whose ->unsync were found
to be set in the for_each_gfn_sp() loop.
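
(For reference, the relevant ordering in kvm_mmu_get_page() is roughly
the following; simplified, with locking, accounting and quadrant
handling omitted:)

	sp = kvm_mmu_alloc_page(vcpu, direct);
	sp->gfn = gfn;
	sp->role = role;
	/* sp becomes visible to hash lookups here ... */
	hlist_add_head(&sp->hash_link,
		       &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
	if (!direct) {
		if (rmap_write_protect(vcpu, gfn))
			kvm_flush_remote_tlbs(vcpu->kvm);
		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
			/* ... so this can find sp, but skips it: sp->unsync == 0 */
			kvm_sync_pages(vcpu, gfn);
	}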

I'm not 100% sure if the existence of the parent_pte pointer in the
newly created sp->parent_ptes chain alone makes any difference:
> @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	sp = kvm_mmu_alloc_page(vcpu, direct);
>
>  	sp->parent_ptes.val = 0;
> -	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>  	sp->gfn = gfn;
>  	sp->role = role;


> On the other hand, all calls to kvm_mmu_get_page except for the
> roots are followed by link_shadow_page...  Perhaps if parent_pte != NULL
> you can call link_shadow_page directly from kvm_mmu_get_page.  The call
> would go before the "if (!direct)" and it would subsume all the existing
> calls.
>
> We could probably also warn if
>
> 	(parent_pte == NULL)
> 		!= (level == vcpu->arch.mmu.root_level)
>
> in kvm_mmu_get_page.

I think we should set the spte after init_shadow_page_table(), and
to make this subsume all the existing calls, we need to change the
"return sp;" in the for_each_gfn_sp() loop to a goto statement so
that the end of this function will become something like this:

     init_shadow_page(sp);
out:
     if (parent_pte) {
         mmu_page_add_parent_pte(vcpu, sp, parent_pte);
         link_shadow_page(parent_pte, sp, accessed);
     }
     trace_kvm_mmu_get_page(sp, created);
     return sp;

So, "bool accessed" needs to be passed to kvm_mmu_get_page().
But anyway, we need to understand if mmu_page_add_parent_pte()
really needs to be placed before the "if (!direct)" block.

   Takuya



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-13  2:15     ` Takuya Yoshikawa
@ 2015-11-13 10:51       ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-13 10:51 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 13/11/2015 03:15, Takuya Yoshikawa wrote:
> Actually, I don't understand why this is named kvm_mmu_put_page() when it
> just removes the parent_pte pointer from the sp->parent_ptes pointer chain.

Because it undoes kvm_mmu_get_page, I guess. :)

> 
>> On to kvm_mmu_get_page...
>>
>>          if (!direct) {
>>                  if (rmap_write_protect(vcpu, gfn))
>>                          kvm_flush_remote_tlbs(vcpu->kvm);
>>                  if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>>                          kvm_sync_pages(vcpu, gfn);
>>
>> This seems fishy.
>>
>> need_sync is set if sp->unsync, but then the parents have not been
>> unsynced yet.
> 
> Reaching here means that kvm_mmu_get_page() could not return sp
> from inside the for_each_gfn_sp() loop above, so even without
> this patch, mark_unsync() has not been called.

You're right.

> Here, sp holds the new page allocated by kvm_mmu_alloc_page().
> One confusing thing is that hlist_add_head() right before this
> "if (!direct)" line has already added the new sp to the hash
> list, so it will be found by for_each_gfn_indirect_valid_sp()
> in kvm_sync_pages().
> 
> Because this sp is new and sp->unsync is not set,  kvm_sync_pages()
> will just skip it and look for other sp's whose ->unsync were found
> to be set in the for_each_gfn_sp() loop.
> 
> I'm not 100% sure if the existence of the parent_pte pointer in the
> newly created sp->parent_ptes chain alone makes any difference:

No, I don't think so.  Nothing needs the parent_ptes at this point:

- kvm_mmu_mark_parents_unsync, even in the existing code, it's called
before the new SPTE is created.

- as you said, kvm_mmu_prepare_zap_page can be called by kvm_sync_pages
but it will not operate on this page because its ->unsync is zero.

> So, "bool accessed" needs to be passed to kvm_mmu_get_page(). 

The "bool accessed" parameter is not necessary, I think.  It is only
false in the nested EPT case, and there's no reason not to set the
accessed bit *in the shadow page* if the host supports EPT
accessed/dirty bits.  I'll test and send a patch to remove the argument.

> But anyway, we need to understand if mmu_page_add_parent_pte()
> really needs to be placed before the "if (!direct)" block.

No, I don't think so anymore.

I think these patches are fine as a starting point for further cleanups;
I'll push them to kvm/queue very soon.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
@ 2015-11-13 21:47   ` Marcelo Tosatti
  2015-11-14  9:20     ` Marcelo Tosatti
  2015-11-18  3:07   ` Xiao Guangrong
  1 sibling, 1 reply; 31+ messages in thread
From: Marcelo Tosatti @ 2015-11-13 21:47 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: pbonzini, kvm, linux-kernel

On Thu, Nov 12, 2015 at 08:52:45PM +0900, Takuya Yoshikawa wrote:
> kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> nearly the same as the for_each_rmap_spte macro.  The only difference
> is that is_shadow_present_pte() checks cannot be placed there because
> kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> whose entry is not set yet.
> 
> By calling mark_unsync() separately for the parent and adding the parent
> pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> works with no problem.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>  arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
>  1 file changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e8cfdc4..1691171 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>  	}
>  }
>  
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> -{
> -	struct pte_list_desc *desc;
> -	int i;
> -
> -	if (!*pte_list)
> -		return;
> -
> -	if (!(*pte_list & 1))
> -		return fn((u64 *)*pte_list);
> -
> -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> -	while (desc) {
> -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> -			fn(desc->sptes[i]);
> -		desc = desc->more;
> -	}
> -}
> -
>  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>  				    struct kvm_memory_slot *slot)
>  {
> @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>  static void mark_unsync(u64 *spte);
>  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>  {
> -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> +	u64 *sptep;
> +	struct rmap_iterator iter;
> +
> +	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +		mark_unsync(sptep);
> +	}
>  }
>  
>  static void mark_unsync(u64 *spte)
> @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,

Faulting in a spte: one of the levels of sptes, either


		spte-1
		spte-2
		spte-3

has the present bit clear. So we're searching for a guest page to shadow, with
gfn "gfn".

>  		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
>  			break;

If a shadow for gfn exists, but is unsync, sync guest-page ---to--> kvm
sptes.

> -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);

add "gfn" (actually its "struct kvm_mmu_page *sp" pointer) to
the parent.
>  		if (sp->unsync_children) {
>  			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>  			kvm_mmu_mark_parents_unsync(sp);

kvm_mmu_mark_parents_unsync relied on the links from current level all
the way to top level to mark all levels unsync, so that on guest entry,
KVM_REQ_MMU_SYNC is processed and any level is brought from guest -->
kvm pages. This now fails, because you removed "mmu_page_add_parent_pte"
(the link is not formed all the way to root).

Unless i am missing something, this is not correct.

> -		} else if (sp->unsync)
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		} else if (sp->unsync) {
>  			kvm_mmu_mark_parents_unsync(sp);
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		}
> +		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>  
>  		__clear_sp_write_flooding_count(sp);
>  		trace_kvm_mmu_get_page(sp, false);


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
@ 2015-11-13 22:08   ` Marcelo Tosatti
  2015-11-16  3:34     ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Marcelo Tosatti @ 2015-11-13 22:08 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: pbonzini, kvm, linux-kernel

On Thu, Nov 12, 2015 at 08:53:43PM +0900, Takuya Yoshikawa wrote:
> At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
> placed right after the call to detect unrelated sptes which must not be
> found in the reverse-mapping list.
> 
> Move this check in rmap_get_first/next() so that all call sites, not
> just the users of the for_each_rmap_spte() macro, will be checked the
> same way.  In addition, change the BUG_ON to WARN_ON since killing the
> whole host is the last thing that KVM should try.

It should be a BUG_ON: if KVM continues, it will corrupt (more) memory.

> One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
> rmap_get_first() to handle parent sptes.  The change will not break it
> because parent sptes are present, at least until drop_parent_pte()
> actually unlinks them, and not mmio-sptes.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>  Documentation/virtual/kvm/mmu.txt |  4 ++--
>  arch/x86/kvm/mmu.c                | 26 +++++++++++++++++---------
>  2 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index 3a4d681..daf9c0f 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -203,10 +203,10 @@ Shadow pages contain the following information:
>      page cannot be destroyed.  See role.invalid.
>    parent_ptes:
>      The reverse mapping for the pte/ptes pointing at this page's spt. If
> -    parent_ptes bit 0 is zero, only one spte points at this pages and
> +    parent_ptes bit 0 is zero, only one spte points at this page and
>      parent_ptes points at this single spte, otherwise, there exists multiple
>      sptes pointing at this page and (parent_ptes & ~0x1) points at a data
> -    structure with a list of parent_ptes.
> +    structure with a list of parent sptes.
>    unsync:
>      If true, then the translations in this page may not match the guest's
>      translation.  This is equivalent to the state of the tlb when a pte is
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 1691171..ee7b101 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1079,17 +1079,23 @@ struct rmap_iterator {
>   */
>  static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
>  {
> +	u64 *sptep;
> +
>  	if (!rmap)
>  		return NULL;
>  
>  	if (!(rmap & 1)) {
>  		iter->desc = NULL;
> -		return (u64 *)rmap;
> +		sptep = (u64 *)rmap;
> +		goto out;
>  	}
>  
>  	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
>  	iter->pos = 0;
> -	return iter->desc->sptes[iter->pos];
> +	sptep = iter->desc->sptes[iter->pos];
> +out:
> +	WARN_ON(!is_shadow_present_pte(*sptep));
> +	return sptep;
>  }
>  
>  /*
> @@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
>   */
>  static u64 *rmap_get_next(struct rmap_iterator *iter)
>  {
> +	u64 *sptep;
> +
>  	if (iter->desc) {
>  		if (iter->pos < PTE_LIST_EXT - 1) {
> -			u64 *sptep;
> -
>  			++iter->pos;
>  			sptep = iter->desc->sptes[iter->pos];
>  			if (sptep)
> -				return sptep;
> +				goto out;
>  		}
>  
>  		iter->desc = iter->desc->more;
> @@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
>  		if (iter->desc) {
>  			iter->pos = 0;
>  			/* desc->sptes[0] cannot be NULL */
> -			return iter->desc->sptes[iter->pos];
> +			sptep = iter->desc->sptes[iter->pos];
> +			goto out;
>  		}
>  	}
>  
>  	return NULL;
> +out:
> +	WARN_ON(!is_shadow_present_pte(*sptep));
> +	return sptep;
>  }
>  
>  #define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
>  	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
> -		_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
> -			_spte_ = rmap_get_next(_iter_))
> +		_spte_; _spte_ = rmap_get_next(_iter_))
>  
>  static void drop_spte(struct kvm *kvm, u64 *sptep)
>  {
> @@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>  	bool flush = false;
>  
>  	while ((sptep = rmap_get_first(*rmapp, &iter))) {
> -		BUG_ON(!(*sptep & PT_PRESENT_MASK));
>  		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
>  
>  		drop_spte(kvm, sptep);
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-13 21:47   ` Marcelo Tosatti
@ 2015-11-14  9:20     ` Marcelo Tosatti
  2015-11-16  2:51       ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Marcelo Tosatti @ 2015-11-14  9:20 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: pbonzini, kvm, linux-kernel

On Fri, Nov 13, 2015 at 07:47:28PM -0200, Marcelo Tosatti wrote:
> On Thu, Nov 12, 2015 at 08:52:45PM +0900, Takuya Yoshikawa wrote:
> > kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> > nearly the same as the for_each_rmap_spte macro.  The only difference
> > is that is_shadow_present_pte() checks cannot be placed there because
> > kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> > whose entry is not set yet.
> > 
> > By calling mark_unsync() separately for the parent and adding the parent
> > pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> > works with no problem.
> > 
> > Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> > ---
> >  arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
> >  1 file changed, 13 insertions(+), 23 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index e8cfdc4..1691171 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
> >  	}
> >  }
> >  
> > -typedef void (*pte_list_walk_fn) (u64 *spte);
> > -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> > -{
> > -	struct pte_list_desc *desc;
> > -	int i;
> > -
> > -	if (!*pte_list)
> > -		return;
> > -
> > -	if (!(*pte_list & 1))
> > -		return fn((u64 *)*pte_list);
> > -
> > -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> > -	while (desc) {
> > -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> > -			fn(desc->sptes[i]);
> > -		desc = desc->more;
> > -	}
> > -}
> > -
> >  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
> >  				    struct kvm_memory_slot *slot)
> >  {
> > @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
> >  static void mark_unsync(u64 *spte);
> >  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
> >  {
> > -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> > +	u64 *sptep;
> > +	struct rmap_iterator iter;
> > +
> > +	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> > +		mark_unsync(sptep);
> > +	}
> >  }
> >  
> >  static void mark_unsync(u64 *spte)
> > @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> 
> Faulting in a spte: one of the levels of sptes, either
> 
> 
> 		spte-1
> 		spte-2
> 		spte-3
> 
> has the present bit clear. So we're searching for a guest page to shadow, with
> gfn "gfn".
> 
> >  		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
> >  			break;
> 
> If a shadow for gfn exists, but is unsync, sync guest-page ---to--> kvm
> sptes.
> 
> > -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
> 
> add "gfn" (actually its "struct kvm_mmu_page *sp" pointer) to
> the parent.
> >  		if (sp->unsync_children) {
> >  			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> >  			kvm_mmu_mark_parents_unsync(sp);
> 
> kvm_mmu_mark_parents_unsync relied on the links from current level all
> the way to top level to mark all levels unsync, so that on guest entry,
> KVM_REQ_MMU_SYNC is processed and any level is brought from guest -->
> kvm pages. This now fails, because you removed "mmu_page_add_parent_pte"
> (the link is not formed all the way to root).
> 
> Unless i am missing something, this is not correct.

The actual issue is this: a higher level page that had, under its children,
no out of sync pages now has, due to your addition, a child that is unsync:

initial state:
	level1 

final state:

	level1 -x-> level2 -x-> level3

Where -x-> are the links created by this pagefault fixing round.

If _any_ page under you is unsync (not necessarily the ones this
pagefault is accessing), you have to mark parents unsync.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-14  9:20     ` Marcelo Tosatti
@ 2015-11-16  2:51       ` Takuya Yoshikawa
  2015-11-17 17:58         ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-16  2:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: pbonzini, kvm, linux-kernel

On 2015/11/14 18:20, Marcelo Tosatti wrote:

> The actual issue is this: a higher level page that had, under its children,
> no out of sync pages now has, due to your addition, a child that is unsync:
>
> initial state:
> 	level1
>
> final state:
>
> 	level1 -x-> level2 -x-> level3
>
> Where -x-> are the links created by this pagefault fixing round.
>
> If _any_ page under you is unsync (not necessarily the ones this
> pagefault is accessing), you have to mark parents unsync.

I understand this, but I don't think my patch will break this.

What kvm_mmu_mark_parents_unsync() does is:

   for each p_i in sp->parent_ptes rmap chain
     mark_unsync(p_i);

Then, mark_unsync() finds the parent sp including that p_i to
set ->unsync_child_bitmap and increment ->unsync_children if
necessary.  It may also call kvm_mmu_mark_parents_unsync()
recursively.

I understand we need to tell the parents "you have an unsync
child/descendant" until this information reaches the top level
by those recursive calls.

But since these recursive calls cannot come back to the starting sp,
the child->parent graph has no loop: each mark_unsync(p_i) will not
be affected by the other parents in that sp->parent_ptes rmap chain,
from which we started the recursive calls.
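
(For reference, mark_unsync() is roughly the following; the bitmap test
and the unsync_children counter guarantee that each child->parent edge
is visited at most once, so the recursion terminates:)

	static void mark_unsync(u64 *spte)
	{
		struct kvm_mmu_page *sp;
		unsigned int index;

		sp = page_header(__pa(spte));	/* the sp containing this spte */
		index = spte - sp->spt;
		if (__test_and_set_bit(index, sp->unsync_child_bitmap))
			return;		/* this edge was already visited */
		if (sp->unsync_children++)
			return;		/* sp has already propagated upwards */
		kvm_mmu_mark_parents_unsync(sp);	/* continue one level up */
	}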


As the following code shows, my patch does mark_unsync(parent_pte)
separately, and then mmu_page_add_parent_pte(vcpu, sp, parent_pte):

> -		} else if (sp->unsync)
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		} else if (sp->unsync) {
>  			kvm_mmu_mark_parents_unsync(sp);
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		}
> +		mmu_page_add_parent_pte(vcpu, sp, parent_pte);

So, as you worried, while each mark_unsync(p_i) is processed,
this parent_pte does not exist in that sp->parent_ptes rmap chain.

But as I explained above, this does not change anything about what
each mark_unsync(p_i) call does, so it keeps the original behaviour.


By the way, I think "kvm_mmu_mark_parents_unsync" and "mark_unsync"
do not describe well what they actually do. When I first saw the names,
I thought they would just set the parents' sp->unsync.

To reflect the following meaning better, it should be
propagate_unsync(_to_parents) or something:

   Tell the parents "you have an unsync child/descendant"
   until this unsync information reaches the top level


Thanks,
   Takuya



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  2015-11-13 22:08   ` Marcelo Tosatti
@ 2015-11-16  3:34     ` Takuya Yoshikawa
  0 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-16  3:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: pbonzini, kvm, linux-kernel

On 2015/11/14 7:08, Marcelo Tosatti wrote:
> On Thu, Nov 12, 2015 at 08:53:43PM +0900, Takuya Yoshikawa wrote:
>> At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
>> placed right after the call to detect unrelated sptes which must not be
>> found in the reverse-mapping list.
>>
>> Move this check in rmap_get_first/next() so that all call sites, not
>> just the users of the for_each_rmap_spte() macro, will be checked the
>> same way.  In addition, change the BUG_ON to WARN_ON since killing the
>> whole host is the last thing that KVM should try.
>
> It should be a BUG_ON: if KVM continues, it will corrupt (more) memory.

In the sense that we cannot predict what kind of corruption it will
cause, I agree with you.

But if it can only corrupt that guest's memory, it is a bit sad to
kill unrelated guests, and the host, too.  Anyway, since we cannot say
for sure what a possible bug can cause, I agree with you now.

Thanks,
   Takuya


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-16  2:51       ` Takuya Yoshikawa
@ 2015-11-17 17:58         ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-17 17:58 UTC (permalink / raw)
  To: Takuya Yoshikawa, Marcelo Tosatti; +Cc: kvm, linux-kernel



On 16/11/2015 03:51, Takuya Yoshikawa wrote:
> What kvm_mmu_mark_parents_unsync() does is:
> 
>   for each p_i in sp->parent_ptes rmap chain
>     mark_unsync(p_i);
> 
> Then, mark_unsync() finds the parent sp including that p_i to
> set ->unsync_child_bitmap and increment ->unsync_children if
> necessary.  It may also call kvm_mmu_mark_parents_unsync()
> recursively.

I agree.  sp->parent_ptes goes up one level from sp;
kvm_mmu_mark_parents_unsync(sp) visits the level above sp, while
mark_unsync(sp) visits sp and all the levels above it.

Calling mark_unsync(parent_pte) is enough to complete the visit, after
kvm_mmu_mark_parents_unsync has already processed the level above sp.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
@ 2015-11-18  2:44   ` Xiao Guangrong
  2015-11-19  0:59     ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  2:44 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:
> +			if (!ret) {
> +				clear_unsync_child_bit(sp, i);
> +				continue;
> +			} else if (ret > 0) {
>   				nr_unsync_leaf += ret;

Just a single line here, braces are unnecessary.

> -			else
> +			} else
>   				return ret;
>   		} else if (child->unsync) {
>   			nr_unsync_leaf++;
>   			if (mmu_pages_add(pvec, child, i))
>   				return -ENOSPC;
>   		} else
> -			 goto clear_child_bitmap;
> -
> -		continue;
> -
> -clear_child_bitmap:
> -		__clear_bit(i, sp->unsync_child_bitmap);
> -		sp->unsync_children--;
> -		WARN_ON((int)sp->unsync_children < 0);
> +			clear_unsync_child_bit(sp, i);
>   	}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
  2015-11-13 21:47   ` Marcelo Tosatti
@ 2015-11-18  3:07   ` Xiao Guangrong
  1 sibling, 0 replies; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  3:07 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:52 PM, Takuya Yoshikawa wrote:
> kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> nearly the same as the for_each_rmap_spte macro.  The only difference
> is that is_shadow_present_pte() checks cannot be placed there because
> kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> whose entry is not set yet.
>
> By calling mark_unsync() separately for the parent and adding the parent
> pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> works with no problem.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>   arch/x86/kvm/mmu.c | 36 +++++++++++++-----------------------
>   1 file changed, 13 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e8cfdc4..1691171 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>   	}
>   }
>
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
> -{
> -	struct pte_list_desc *desc;
> -	int i;
> -
> -	if (!*pte_list)
> -		return;
> -
> -	if (!(*pte_list & 1))
> -		return fn((u64 *)*pte_list);
> -
> -	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
> -	while (desc) {
> -		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> -			fn(desc->sptes[i]);
> -		desc = desc->more;
> -	}
> -}
> -
>   static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
>   				    struct kvm_memory_slot *slot)
>   {
> @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>   static void mark_unsync(u64 *spte);
>   static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>   {
> -	pte_list_walk(&sp->parent_ptes, mark_unsync);
> +	u64 *sptep;
> +	struct rmap_iterator iter;
> +
> +	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +		mark_unsync(sptep);
> +	}
>   }
>
>   static void mark_unsync(u64 *spte)
> @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
>   			break;
>
> -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>   		if (sp->unsync_children) {
>   			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>   			kvm_mmu_mark_parents_unsync(sp);

After your change above, @sp's parents have not been changed, so there is no need to call it now.

> -		} else if (sp->unsync)
> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		} else if (sp->unsync) {
>   			kvm_mmu_mark_parents_unsync(sp);

Ditto.

> +			if (parent_pte)
> +				mark_unsync(parent_pte);
> +		}
> +		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>   		__clear_sp_write_flooding_count(sp);
>   		trace_kvm_mmu_get_page(sp, false);
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
@ 2015-11-18  3:21   ` Xiao Guangrong
  2015-11-18  9:09     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  3:21 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>   	 * this feature. See the comments in kvm_zap_obsolete_pages().
>   	 */
>   	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
> -	sp->parent_ptes = 0;
> +	sp->parent_ptes.val = 0;

The sp is allocated from kmem_cache_zalloc(), so explicitly initializing it to zero
is not needed.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
  2015-11-12 14:27   ` Paolo Bonzini
@ 2015-11-18  3:32   ` Xiao Guangrong
  1 sibling, 0 replies; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-18  3:32 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/12/2015 07:56 PM, Takuya Yoshikawa wrote:
> Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
> argument, link_shadow_page() follows that to set the parent entry so
> that the new mapping will point to the returned page table.
>
> Moving parent_pte handling there allows to clean up the code because
> parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
> mmu_page_add_parent_pte().
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> ---
>   arch/x86/kvm/mmu.c         | 21 ++++++++-------------
>   arch/x86/kvm/paging_tmpl.h |  6 ++----
>   2 files changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9273cd4..33fe720 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   		if (sp->unsync_children) {
>   			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>   			kvm_mmu_mark_parents_unsync(sp);
> -			if (parent_pte)
> -				mark_unsync(parent_pte);
>   		} else if (sp->unsync) {
>   			kvm_mmu_mark_parents_unsync(sp);
> -			if (parent_pte)
> -				mark_unsync(parent_pte);
>   		}
> -		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>   		__clear_sp_write_flooding_count(sp);
>   		trace_kvm_mmu_get_page(sp, false);
> @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   	sp = kvm_mmu_alloc_page(vcpu, direct);
>
>   	sp->parent_ptes.val = 0;
> -	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>   	sp->gfn = gfn;
>   	sp->role = role;
> @@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
>   	return __shadow_walk_next(iterator, *iterator->sptep);
>   }
>
> -static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
> +static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
> +			     struct kvm_mmu_page *sp, bool accessed)
>   {
>   	u64 spte;
>
> @@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed)
>   		spte |= shadow_accessed_mask;
>
>   	mmu_spte_set(sptep, spte);
> +
> +	if (sp->unsync_children || sp->unsync)
> +		mark_unsync(sptep);

Why are these needed?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-18  3:21   ` Xiao Guangrong
@ 2015-11-18  9:09     ` Paolo Bonzini
  2015-11-19  2:23       ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2015-11-18  9:09 UTC (permalink / raw)
  To: Xiao Guangrong, Takuya Yoshikawa; +Cc: kvm, linux-kernel



On 18/11/2015 04:21, Xiao Guangrong wrote:
> 
> 
> On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
>> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page
>> *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>>        * this feature. See the comments in kvm_zap_obsolete_pages().
>>        */
>>       list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
>> -    sp->parent_ptes = 0;
>> +    sp->parent_ptes.val = 0;
> 
> The sp is allocated from kmem_cache_zalloc(), so explicitly initializing
> it to zero is not needed.

Right, but it should be a separate patch.

Takuya, since you are going to send another version of this series, can
you also:

1) move this patch either to the beginning or to the end

2) include "KVM: x86: always set accessed bit in shadow PTEs", also near
the beginning of the series?

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-18  2:44   ` Xiao Guangrong
@ 2015-11-19  0:59     ` Takuya Yoshikawa
  2015-11-19  2:46       ` Xiao Guangrong
  0 siblings, 1 reply; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-19  0:59 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini; +Cc: kvm, linux-kernel

On 2015/11/18 11:44, Xiao Guangrong wrote:

> On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:
>> +            if (!ret) {
>> +                clear_unsync_child_bit(sp, i);
>> +                continue;
>> +            } else if (ret > 0) {
>>                   nr_unsync_leaf += ret;
>
> Just a single line here, braces are unnecessary.
>
>> -            else
>> +            } else
>>                   return ret;

I know we can eliminate the braces, but that does not mean
we should do so: there seems to be no consensus about this
style issue and checkpatch accepts both ways.

Actually, some people prefer to put braces when one of the
if/else-if/else cases has multiple lines.  You can see
some examples in kernel/sched/core.c: see hrtick_start(),
sched_fork(), free_sched_domain().

In our case, I thought putting braces would align the else-if
and else and make the code look a bit nicer, but I know this
may be just a matter of personal feeling.

In short, unless the maintainer, Paolo for this file, has any
preference, both ways will be accepted.

   Takuya


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  2015-11-18  9:09     ` Paolo Bonzini
@ 2015-11-19  2:23       ` Takuya Yoshikawa
  0 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-19  2:23 UTC (permalink / raw)
  To: Paolo Bonzini, Xiao Guangrong; +Cc: kvm, linux-kernel, mtosatti

On 2015/11/18 18:09, Paolo Bonzini wrote:

> On 18/11/2015 04:21, Xiao Guangrong wrote:

>> On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
>>> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page
>>> *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>>>         * this feature. See the comments in kvm_zap_obsolete_pages().
>>>         */
>>>        list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
>>> -    sp->parent_ptes = 0;
>>> +    sp->parent_ptes.val = 0;
>>
>> The sp is allocated from kmem_cache_zalloc(), so explicitly initializing
>> it to zero is not needed.
>
> Right, but it should be a separate patch.
>
> Takuya, since you are going to send another version of this series, can
> you also:

Yes, I'm preparing to do so.

> 1) move this patch either to the beginning or to the end
>
> 2) include "KVM: x86: always set accessed bit in shadow PTEs", also near
> the beginning of the series?

Commit 1c9a5e19b1af8a2c ("KVM: x86: MMU: always set accessed bit
in shadow PTEs") will be the first.

Then, the ordering will become something like this:

02: Encapsulate the type of rmap-chain head in a new struct
03: Remove unused parameter of __direct_map()
04: Add helper function to clear a bit in unsync child bitmap
05: Make mmu_set_spte() return emulate value
06: Remove is_rmap_spte() and use is_shadow_present_pte()

These five seem to be easy ones for you to apply: since patch 02
touches many places, it should go first to become the base of the
following work.

07: Consolidate BUG_ON checks for reverse-mapped sptes

I will change the WARN_ON to BUG_ON.  // Marcelo's comment

08: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

In this patch, I will delete the "sp->parent_ptes.val = 0;" line since
this is a problem of kvm_mmu_alloc_page(), though not a new one.
   // Xiao's comment

09: Use for_each_rmap_spte macro instead of pte_list_walk()

There is some confusion between us: Paolo and I agreed that the
patch keeps the original way and calls mark_unsync() the same way
as before, but there are still comments from Marcelo and Xiao and
those comments seem to interpret the code differently.

I will check again, but I may not change this one and the following
two patches in the next version.  If we can eliminate some of the
mark_unsync() calls, that will be kind of an optimization which this
series does not intend to achieve.

Anyway, by moving the two non-trivial patches (09 and 10) here,
reviewing will become easier and you can apply the other patches
separately.

10: Move parent_pte handling from kvm_mmu_get_page()
     to link_shadow_page()
11: Remove unused parameter parent_pte from kvm_mmu_get_page()

Thanks,
   Takuya


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-19  0:59     ` Takuya Yoshikawa
@ 2015-11-19  2:46       ` Xiao Guangrong
  2015-11-19  4:02         ` Takuya Yoshikawa
  0 siblings, 1 reply; 31+ messages in thread
From: Xiao Guangrong @ 2015-11-19  2:46 UTC (permalink / raw)
  To: Takuya Yoshikawa, pbonzini; +Cc: kvm, linux-kernel



On 11/19/2015 08:59 AM, Takuya Yoshikawa wrote:
> On 2015/11/18 11:44, Xiao Guangrong wrote:
>
>> On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:
>>> +            if (!ret) {
>>> +                clear_unsync_child_bit(sp, i);
>>> +                continue;
>>> +            } else if (ret > 0) {
>>>                   nr_unsync_leaf += ret;
>>
>> Just a single line here, braces are unnecessary.
>>
>>> -            else
>>> +            } else
>>>                   return ret;
>
> I know we can eliminate the braces, but that does not mean
> we should do so: there seems to be no consensus about this
> style issue and checkpatch accepts both ways.
>
> Actually, some people prefer to put braces when one of the
> if/else-if/else cases has multiple lines.  You can see
> some examples in kernel/sched/core.c: see hrtick_start(),
> sched_fork(), free_sched_domain().
>
> In our case, I thought putting braces would align the else-if
> and else and make the code look a bit nicer, but I know this
> may be just a matter of personal feeling.
>
> In short, unless the maintainer, Paolo for this file, has any
> preference, both ways will be accepted.

The reason why I pointed this out is that it is the style documented
in Documentation/CodingStyle:
| Do not unnecessarily use braces where a single statement will do.
|
|        if (condition)
|                action();
|

Actually, Ingo Molnar hated this brace style very much and blamed
many developers who used it (including me, which is why I was
nervous to see this style :( ).

If this style is commonly accepted now, it is worth making a patch
to update Documentation/CodingStyle.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  2015-11-19  2:46       ` Xiao Guangrong
@ 2015-11-19  4:02         ` Takuya Yoshikawa
  0 siblings, 0 replies; 31+ messages in thread
From: Takuya Yoshikawa @ 2015-11-19  4:02 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini; +Cc: kvm, linux-kernel

On 2015/11/19 11:46, Xiao Guangrong wrote:

>> Actually, some people prefer to put braces when one of the
>> if/else-if/else cases has multiple lines.  You can see
>> some examples in kernel/sched/core.c: see hrtick_start(),
>> sched_fork(), free_sched_domain().
>>
>> In our case, I thought putting braces would align the else-if
>> and else and make the code look a bit nicer, but I know this
>> may be just a matter of personal feeling.
>>
>> In short, unless the maintainer, Paolo for this file, has any
>> preference, both ways will be accepted.
>
> The reason why i pointed this out is that it is the style documented
> in Documentation/CodingStyle:
> | Do not unnecessarily use braces where a single statement will do.
> |
> |        if (condition)
> |                action();
> |

Ah, this is a different thing.  For this case, there is a consensus
and checkpatch will complain if we don't obey the rule.

What I explained was:

   if (condition) {
      line1;
      line2;  // multiple lines
   } else if (cond2) {
      single-line-statement;  -- (*1)
   } else
      single-line-statement;  -- (*2)

For (*1) and (*2), especially for (*1), some people put braces.

> Actually, Ingo Molnar hated this braces-style too much and blamed
> many developers who used this style (include me, that why i was
> nervous to see this style :( ).

I think he likes the coding style of kernel/sched/core.c very much,
as you know.  Actually that is one reason why I took it as an example.

Let's just choose the way Paolo prefers this time; I don't
know which is better.

Thank you,
   Takuya



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread (newest: ~2015-11-19  4:02 UTC)

Thread overview: 31+ messages
2015-11-12 11:48 [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Takuya Yoshikawa
2015-11-12 11:49 ` [PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map() Takuya Yoshikawa
2015-11-12 11:50 ` [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap Takuya Yoshikawa
2015-11-18  2:44   ` Xiao Guangrong
2015-11-19  0:59     ` Takuya Yoshikawa
2015-11-19  2:46       ` Xiao Guangrong
2015-11-19  4:02         ` Takuya Yoshikawa
2015-11-12 11:51 ` [PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value Takuya Yoshikawa
2015-11-12 11:51 ` [PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() Takuya Yoshikawa
2015-11-12 11:52 ` [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() Takuya Yoshikawa
2015-11-13 21:47   ` Marcelo Tosatti
2015-11-14  9:20     ` Marcelo Tosatti
2015-11-16  2:51       ` Takuya Yoshikawa
2015-11-17 17:58         ` Paolo Bonzini
2015-11-18  3:07   ` Xiao Guangrong
2015-11-12 11:53 ` [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Takuya Yoshikawa
2015-11-13 22:08   ` Marcelo Tosatti
2015-11-16  3:34     ` Takuya Yoshikawa
2015-11-12 11:55 ` [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct Takuya Yoshikawa
2015-11-18  3:21   ` Xiao Guangrong
2015-11-18  9:09     ` Paolo Bonzini
2015-11-19  2:23       ` Takuya Yoshikawa
2015-11-12 11:55 ` [PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() Takuya Yoshikawa
2015-11-12 11:56 ` [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() Takuya Yoshikawa
2015-11-12 14:27   ` Paolo Bonzini
2015-11-12 17:03     ` Paolo Bonzini
2015-11-13  2:15     ` Takuya Yoshikawa
2015-11-13 10:51       ` Paolo Bonzini
2015-11-18  3:32   ` Xiao Guangrong
2015-11-12 11:57 ` [PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Takuya Yoshikawa
2015-11-12 12:08 ` [PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work Paolo Bonzini
