* [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
@ 2016-02-24 13:17 Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 01/12] KVM: MMU: Fix ubsan warnings Paolo Bonzini
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

This series started from looking at mmu_unsync_walk while following up on
the ubsan thread.
Patches 1 and 2 are the result of the discussions in that thread.

Patches 3 to 9 do more cleanups in __kvm_sync_page and its callers.
Among other changes, they remove kvm_sync_page_transient and avoid
duplicating code between __kvm_sync_page and kvm_sync_pages.

I stopped where I had questions about the existing kvm_mmu_get_page
code (see patch 8 for the question).  However, perhaps more cleanups
are possible, also thanks to Takuya's work on that function and
link_shadow_page.

Patches 10 to 12 are just micro-optimizations.

Guangrong, it would be great if you took a look since you know this part
of KVM very well.

I have tested this series minus patch 9, and it survived installation
of various Linux and Windows guests with EPT disabled.  Of course before
committing the patches I will retest with patch 9 included.

Paolo

Paolo Bonzini (11):
  KVM: MMU: Fix ubsan warnings
  KVM: MMU: introduce kvm_mmu_flush_or_zap
  KVM: MMU: move TLB flush out of __kvm_sync_page
  KVM: MMU: use kvm_sync_page in kvm_sync_pages
  KVM: MMU: cleanup __kvm_sync_page and its callers
  KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page*
  KVM: MMU: move zap/flush to kvm_mmu_get_page
  KVM: MMU: coalesce zapping page after mmu_sync_children
  KVM: mark memory barrier with smp_mb__after_atomic
  KVM: MMU: simplify last_pte_bitmap
  KVM: MMU: micro-optimize gpte_access

Xiao Guangrong (1):
  KVM: MMU: check kvm_mmu_pages and mmu_page_path indices

 arch/x86/include/asm/kvm_host.h |   6 +-
 arch/x86/kvm/mmu.c              | 216 ++++++++++++++++++++++------------------
 arch/x86/kvm/paging_tmpl.h      |  11 +-
 virt/kvm/kvm_main.c             |   2 +-
 4 files changed, 126 insertions(+), 109 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/12] KVM: MMU: Fix ubsan warnings
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:42   ` Mike Krinkin
  2016-02-24 13:17 ` [PATCH 02/12] KVM: MMU: check kvm_mmu_pages and mmu_page_path indices Paolo Bonzini
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

kvm_mmu_pages_init is doing some really yucky stuff.  It is setting
up a sentinel for mmu_pages_clear_parents; however, because of a) the
way levels are numbered starting from 1 and b) the way mmu_page_path
sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
out of bounds.  This is harmless because the code overwrites up to the
first two elements of parents->idx and these are initialized, and
because the sentinel is not needed in this case---mmu_pages_clear_parents
exits anyway when it gets to the end of the array.  However, ubsan
complains, and everyone else should too.

This fix does three things.  First, it makes the mmu_page_path arrays
PT64_ROOT_LEVEL elements in size, so that we can write to them without
checking the level in advance.  Second, it splits the work of
kvm_mmu_pages_init between mmu_unsync_walk (which resets the struct
kvm_mmu_pages) and for_each_sp (which places the NULL sentinel at the
end of the current path).  This is okay because mmu_page_path is only
used by mmu_pages_clear_parents; mmu_pages_clear_parents itself is
called within a for_each_sp iterator, and hence always after a call to
mmu_pages_next.  Third, it changes mmu_pages_clear_parents to just use
the sentinel to stop iteration, without checking the bounds on level.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
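As an aside, here is a minimal stand-alone model of the sentinel scheme
(illustration only, not part of the patch).  struct page and struct path
below are simplified stand-ins for struct kvm_mmu_page and struct
mmu_page_path, and PT64_ROOT_LEVEL is assumed to be 4:

#include <stdio.h>

#define PT64_ROOT_LEVEL		4
#define PT_PAGE_TABLE_LEVEL	1

struct page {				/* stand-in for struct kvm_mmu_page */
	int level;
	int unsync_children;
};

struct path {				/* stand-in for struct mmu_page_path */
	struct page *parent[PT64_ROOT_LEVEL];
	unsigned int idx[PT64_ROOT_LEVEL];
};

/* Mirrors the new mmu_pages_clear_parents: walk upward from the leaf's
 * parent, stopping either at the NULL sentinel or at a page that still
 * has other unsync children; no explicit bound on level is needed.
 */
static void clear_parents(struct path *p)
{
	unsigned int level = PT_PAGE_TABLE_LEVEL - 1;
	struct page *sp;

	do {
		sp = p->parent[level];
		if (!sp)		/* sentinel written by mmu_pages_first */
			return;

		printf("clear child %u of the level-%d page\n",
		       p->idx[level], sp->level);
		sp->unsync_children--;	/* models clear_unsync_child_bit */
		level++;
	} while (!sp->unsync_children);
}

int main(void)
{
	/* One path: level-3 parent -> level-2 page (index 9) -> leaf (index 5). */
	struct page l3 = { .level = 3, .unsync_children = 1 };
	struct page l2 = { .level = 2, .unsync_children = 1 };
	struct path p = {
		.parent = { &l2, &l3, NULL },	/* NULL is the sentinel */
		.idx    = { 5, 9, 0 },
	};

	clear_parents(&p);
	return 0;
}
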
 arch/x86/kvm/mmu.c | 57 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07f4c26a10d3..4dee855897cf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1843,6 +1843,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
+	pvec->nr = 0;
 	if (!sp->unsync_children)
 		return 0;
 
@@ -1956,13 +1957,12 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 }
 
 struct mmu_page_path {
-	struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
-	unsigned int idx[PT64_ROOT_LEVEL-1];
+	struct kvm_mmu_page *parent[PT64_ROOT_LEVEL];
+	unsigned int idx[PT64_ROOT_LEVEL];
 };
 
 #define for_each_sp(pvec, sp, parents, i)			\
-		for (i = mmu_pages_next(&pvec, &parents, -1),	\
-			sp = pvec.page[i].sp;			\
+		for (i = mmu_pages_first(&pvec, &parents);	\
 			i < pvec.nr && ({ sp = pvec.page[i].sp; 1;});	\
 			i = mmu_pages_next(&pvec, &parents, i))
 
@@ -1974,19 +1974,41 @@ static int mmu_pages_next(struct kvm_mmu_pages *pvec,
 
 	for (n = i+1; n < pvec->nr; n++) {
 		struct kvm_mmu_page *sp = pvec->page[n].sp;
+		unsigned idx = pvec->page[n].idx;
+		int level = sp->role.level;
 
-		if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
-			parents->idx[0] = pvec->page[n].idx;
-			return n;
-		}
+		parents->idx[level-1] = idx;
+		if (level == PT_PAGE_TABLE_LEVEL)
+			break;
 
-		parents->parent[sp->role.level-2] = sp;
-		parents->idx[sp->role.level-1] = pvec->page[n].idx;
+		parents->parent[level-2] = sp;
 	}
 
 	return n;
 }
 
+static int mmu_pages_first(struct kvm_mmu_pages *pvec,
+			   struct mmu_page_path *parents)
+{
+	struct kvm_mmu_page *sp;
+	int level;
+
+	if (pvec->nr == 0)
+		return 0;
+
+	sp = pvec->page[0].sp;
+	level = sp->role.level;
+	WARN_ON(level == PT_PAGE_TABLE_LEVEL);
+
+	parents->parent[level-2] = sp;
+
+	/* Also set up a sentinel.  Further entries in pvec are all
+	 * children of sp, so this element is never overwritten.
+	 */
+	parents->parent[level-1] = NULL;
+	return mmu_pages_next(pvec, parents, 0);
+}
+
 static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 {
 	struct kvm_mmu_page *sp;
@@ -1994,22 +2016,13 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 
 	do {
 		unsigned int idx = parents->idx[level];
-
 		sp = parents->parent[level];
 		if (!sp)
 			return;
 
 		clear_unsync_child_bit(sp, idx);
 		level++;
-	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
-}
-
-static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
-			       struct mmu_page_path *parents,
-			       struct kvm_mmu_pages *pvec)
-{
-	parents->parent[parent->role.level-1] = NULL;
-	pvec->nr = 0;
+	} while (!sp->unsync_children);
 }
 
 static void mmu_sync_children(struct kvm_vcpu *vcpu,
@@ -2021,7 +2034,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 	struct kvm_mmu_pages pages;
 	LIST_HEAD(invalid_list);
 
-	kvm_mmu_pages_init(parent, &parents, &pages);
 	while (mmu_unsync_walk(parent, &pages)) {
 		bool protected = false;
 
@@ -2037,7 +2049,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 		}
 		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 		cond_resched_lock(&vcpu->kvm->mmu_lock);
-		kvm_mmu_pages_init(parent, &parents, &pages);
 	}
 }
 
@@ -2269,7 +2280,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 	if (parent->role.level == PT_PAGE_TABLE_LEVEL)
 		return 0;
 
-	kvm_mmu_pages_init(parent, &parents, &pages);
 	while (mmu_unsync_walk(parent, &pages)) {
 		struct kvm_mmu_page *sp;
 
@@ -2278,7 +2288,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 			mmu_pages_clear_parents(&parents);
 			zapped++;
 		}
-		kvm_mmu_pages_init(parent, &parents, &pages);
 	}
 
 	return zapped;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/12] KVM: MMU: check kvm_mmu_pages and mmu_page_path indices
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 01/12] KVM: MMU: Fix ubsan warnings Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 03/12] KVM: MMU: introduce kvm_mmu_flush_or_zap Paolo Bonzini
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Give a special invalid index to the root of the walk, so that we
can check the consistency of kvm_mmu_pages and mmu_page_path.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Extracted from a bigger patch proposed by Guangrong. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4dee855897cf..3060873d8cab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1840,6 +1840,8 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 	return nr_unsync_leaf;
 }
 
+#define INVALID_INDEX (-1)
+
 static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
@@ -1847,7 +1849,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 	if (!sp->unsync_children)
 		return 0;
 
-	mmu_pages_add(pvec, sp, 0);
+	mmu_pages_add(pvec, sp, INVALID_INDEX);
 	return __mmu_unsync_walk(sp, pvec);
 }
 
@@ -1996,6 +1998,8 @@ static int mmu_pages_first(struct kvm_mmu_pages *pvec,
 	if (pvec->nr == 0)
 		return 0;
 
+	WARN_ON(pvec->page[0].idx != INVALID_INDEX);
+
 	sp = pvec->page[0].sp;
 	level = sp->role.level;
 	WARN_ON(level == PT_PAGE_TABLE_LEVEL);
@@ -2020,6 +2024,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		if (!sp)
 			return;
 
+		WARN_ON(idx == INVALID_INDEX);
 		clear_unsync_child_bit(sp, idx);
 		level++;
 	} while (!sp->unsync_children);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/12] KVM: MMU: introduce kvm_mmu_flush_or_zap
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 01/12] KVM: MMU: Fix ubsan warnings Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 02/12] KVM: MMU: check kvm_mmu_pages and mmu_page_path indices Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 04/12] KVM: MMU: move TLB flush out of __kvm_sync_page Paolo Bonzini
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

This is a generalization of mmu_pte_write_flush_tlb that additionally
takes care of calling kvm_mmu_commit_zap_page.  The next
patches will introduce more uses.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3060873d8cab..6bf74d8d4989 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4125,11 +4125,14 @@ static bool need_remote_flush(u64 old, u64 new)
 	return (old & ~new & PT64_PERM_MASK) != 0;
 }
 
-static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page,
-				    bool remote_flush, bool local_flush)
+static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
+				 struct list_head *invalid_list,
+				 bool remote_flush, bool local_flush)
 {
-	if (zap_page)
+	if (!list_empty(invalid_list)) {
+		kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
 		return;
+	}
 
 	if (remote_flush)
 		kvm_flush_remote_tlbs(vcpu->kvm);
@@ -4256,7 +4259,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	LIST_HEAD(invalid_list);
 	u64 entry, gentry, *spte;
 	int npte;
-	bool remote_flush, local_flush, zap_page;
+	bool remote_flush, local_flush;
 	union kvm_mmu_page_role mask = { };
 
 	mask.cr0_wp = 1;
@@ -4273,7 +4276,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
 		return;
 
-	zap_page = remote_flush = local_flush = false;
+	remote_flush = local_flush = false;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
@@ -4293,8 +4296,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
 		if (detect_write_misaligned(sp, gpa, bytes) ||
 		      detect_write_flooding(sp)) {
-			zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
-						     &invalid_list);
+			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
@@ -4316,8 +4318,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 			++spte;
 		}
 	}
-	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
-	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, remote_flush, local_flush);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/12] KVM: MMU: move TLB flush out of __kvm_sync_page
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 03/12] KVM: MMU: introduce kvm_mmu_flush_or_zap Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 05/12] KVM: MMU: use kvm_sync_page in kvm_sync_pages Paolo Bonzini
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

By doing this, kvm_sync_pages can use __kvm_sync_page instead of
reinventing it.  Because of kvm_mmu_flush_or_zap, the code does not
end up being more complex than before, and more cleanups to kvm_sync_pages
will come in the next patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 53 ++++++++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6bf74d8d4989..c440a28822b7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1902,10 +1902,24 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		return 1;
 	}
 
-	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	return 0;
 }
 
+static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
+				 struct list_head *invalid_list,
+				 bool remote_flush, bool local_flush)
+{
+	if (!list_empty(invalid_list)) {
+		kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
+		return;
+	}
+
+	if (remote_flush)
+		kvm_flush_remote_tlbs(vcpu->kvm);
+	else if (local_flush)
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+}
+
 static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
 				   struct kvm_mmu_page *sp)
 {
@@ -1913,8 +1927,7 @@ static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
 	int ret;
 
 	ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
-	if (ret)
-		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, !ret);
 
 	return ret;
 }
@@ -1945,17 +1958,11 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 
 		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
 		kvm_unlink_unsync_page(vcpu->kvm, s);
-		if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
-			(vcpu->arch.mmu.sync_page(vcpu, s))) {
-			kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list);
-			continue;
-		}
-		flush = true;
+		if (!__kvm_sync_page(vcpu, s, &invalid_list, false))
+			flush = true;
 	}
 
-	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	if (flush)
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
 }
 
 struct mmu_page_path {
@@ -2041,6 +2048,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 
 	while (mmu_unsync_walk(parent, &pages)) {
 		bool protected = false;
+		bool flush = false;
 
 		for_each_sp(pages, sp, parents, i)
 			protected |= rmap_write_protect(vcpu, sp->gfn);
@@ -2049,10 +2057,12 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 			kvm_flush_remote_tlbs(vcpu->kvm);
 
 		for_each_sp(pages, sp, parents, i) {
-			kvm_sync_page(vcpu, sp, &invalid_list);
+			if (!kvm_sync_page(vcpu, sp, &invalid_list))
+				flush = true;
+
 			mmu_pages_clear_parents(&parents);
 		}
-		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
 		cond_resched_lock(&vcpu->kvm->mmu_lock);
 	}
 }
@@ -4125,21 +4135,6 @@ static bool need_remote_flush(u64 old, u64 new)
 	return (old & ~new & PT64_PERM_MASK) != 0;
 }
 
-static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
-				 struct list_head *invalid_list,
-				 bool remote_flush, bool local_flush)
-{
-	if (!list_empty(invalid_list)) {
-		kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
-		return;
-	}
-
-	if (remote_flush)
-		kvm_flush_remote_tlbs(vcpu->kvm);
-	else if (local_flush)
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-}
-
 static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
 				    const u8 *new, int *bytes)
 {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/12] KVM: MMU: use kvm_sync_page in kvm_sync_pages
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 04/12] KVM: MMU: move TLB flush out of __kvm_sync_page Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 06/12] KVM: MMU: cleanup __kvm_sync_page and its callers Paolo Bonzini
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

If the last argument is true, kvm_unlink_unsync_page is called anyway in
__kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page
itself).  Therefore, kvm_sync_pages can just call kvm_sync_page, instead
of going through kvm_unlink_unsync_page+__kvm_sync_page.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c440a28822b7..240c0dbb140f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1957,8 +1957,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 			continue;
 
 		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
-		kvm_unlink_unsync_page(vcpu->kvm, s);
-		if (!__kvm_sync_page(vcpu, s, &invalid_list, false))
+		if (!kvm_sync_page(vcpu, s, &invalid_list))
 			flush = true;
 	}
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/12] KVM: MMU: cleanup __kvm_sync_page and its callers
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 05/12] KVM: MMU: use kvm_sync_page in kvm_sync_pages Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 07/12] KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page* Paolo Bonzini
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes
things unnecessarily tricky.  If kvm_mmu_prepare_zap_page is called,
it will call kvm_unlink_unsync_page too.  So kvm_unlink_unsync_page can
be called just as well at the beginning or the end of __kvm_sync_page...
which means that we can do it in kvm_sync_page instead and remove the
parameter.

kvm_sync_page ends up being the same code that kvm_sync_pages used
to have before the previous patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 240c0dbb140f..56fa1636c0cf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1887,16 +1887,13 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
 /* @sp->gfn should be write-protected at the call site */
 static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			   struct list_head *invalid_list, bool clear_unsync)
+			   struct list_head *invalid_list)
 {
 	if (sp->role.cr4_pae != !!is_pae(vcpu)) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 		return 1;
 	}
 
-	if (clear_unsync)
-		kvm_unlink_unsync_page(vcpu->kvm, sp);
-
 	if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 		return 1;
@@ -1926,7 +1923,7 @@ static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
 	LIST_HEAD(invalid_list);
 	int ret;
 
-	ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
+	ret = __kvm_sync_page(vcpu, sp, &invalid_list);
 	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, !ret);
 
 	return ret;
@@ -1942,7 +1939,8 @@ static void mmu_audit_disable(void) { }
 static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			 struct list_head *invalid_list)
 {
-	return __kvm_sync_page(vcpu, sp, invalid_list, true);
+	kvm_unlink_unsync_page(vcpu->kvm, sp);
+	return __kvm_sync_page(vcpu, sp, invalid_list);
 }
 
 /* @gfn should be write-protected at the call site */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/12] KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page*
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 06/12] KVM: MMU: cleanup __kvm_sync_page and its callers Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page Paolo Bonzini
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Return true if the page was synced (and the TLB must be flushed)
and false if the page was zapped.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c         | 29 +++++++++++++----------------
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 56fa1636c0cf..e3215cc89d97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1886,20 +1886,20 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 		if ((_sp)->role.direct || (_sp)->role.invalid) {} else
 
 /* @sp->gfn should be write-protected at the call site */
-static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			   struct list_head *invalid_list)
+static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			    struct list_head *invalid_list)
 {
 	if (sp->role.cr4_pae != !!is_pae(vcpu)) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
-		return 1;
+		return false;
 	}
 
-	if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
+	if (vcpu->arch.mmu.sync_page(vcpu, sp) == 0) {
 		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
-		return 1;
+		return false;
 	}
 
-	return 0;
+	return true;
 }
 
 static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
@@ -1917,14 +1917,14 @@ static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 
-static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
-				   struct kvm_mmu_page *sp)
+static bool kvm_sync_page_transient(struct kvm_vcpu *vcpu,
+				    struct kvm_mmu_page *sp)
 {
 	LIST_HEAD(invalid_list);
 	int ret;
 
 	ret = __kvm_sync_page(vcpu, sp, &invalid_list);
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, !ret);
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, ret);
 
 	return ret;
 }
@@ -1936,7 +1936,7 @@ static void kvm_mmu_audit(struct kvm_vcpu *vcpu, int point) { }
 static void mmu_audit_disable(void) { }
 #endif
 
-static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			 struct list_head *invalid_list)
 {
 	kvm_unlink_unsync_page(vcpu->kvm, sp);
@@ -1955,8 +1955,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 			continue;
 
 		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
-		if (!kvm_sync_page(vcpu, s, &invalid_list))
-			flush = true;
+		flush |= kvm_sync_page(vcpu, s, &invalid_list);
 	}
 
 	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -2054,9 +2053,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 			kvm_flush_remote_tlbs(vcpu->kvm);
 
 		for_each_sp(pages, sp, parents, i) {
-			if (!kvm_sync_page(vcpu, sp, &invalid_list))
-				flush = true;
-
+			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
 			mmu_pages_clear_parents(&parents);
 		}
 		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -2115,7 +2112,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->role.word != role.word)
 			continue;
 
-		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
+		if (sp->unsync && !kvm_sync_page_transient(vcpu, sp))
 			break;
 
 		if (sp->unsync_children)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 05827ff7bd2e..2e019b103249 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -938,7 +938,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
 					       sizeof(pt_element_t)))
-			return -EINVAL;
+			return 0;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
 			vcpu->kvm->tlbs_dirty++;
@@ -970,7 +970,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			 host_writable);
 	}
 
-	return !nr_present;
+	return nr_present;
 }
 
 #undef pt_element_t
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 07/12] KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page* Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-25  7:32   ` Xiao Guangrong
  2016-02-24 13:17 ` [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children Paolo Bonzini
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

kvm_mmu_get_page is the only caller of kvm_sync_page_transient
and kvm_sync_pages.  Moving the handling of the invalid_list there
removes the need for the underdocumented kvm_sync_page_transient
function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	Guangrong, at this point I am confused about why
	kvm_sync_page_transient didn't clear sp->unsync.  Do
	you remember?  Or perhaps kvm_mmu_get_page could just
	call kvm_sync_page now?

	Also, can you explain the need_sync variable in
	kvm_mmu_get_page?

 arch/x86/kvm/mmu.c | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e3215cc89d97..725316df32ec 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1917,18 +1917,6 @@ static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 
-static bool kvm_sync_page_transient(struct kvm_vcpu *vcpu,
-				    struct kvm_mmu_page *sp)
-{
-	LIST_HEAD(invalid_list);
-	int ret;
-
-	ret = __kvm_sync_page(vcpu, sp, &invalid_list);
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, ret);
-
-	return ret;
-}
-
 #ifdef CONFIG_KVM_MMU_AUDIT
 #include "mmu_audit.c"
 #else
@@ -1944,21 +1932,21 @@ static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 }
 
 /* @gfn should be write-protected at the call site */
-static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
+static bool kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
+			   struct list_head *invalid_list)
 {
 	struct kvm_mmu_page *s;
-	LIST_HEAD(invalid_list);
-	bool flush = false;
+	bool ret = false;
 
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
 		if (!s->unsync)
 			continue;
 
 		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
-		flush |= kvm_sync_page(vcpu, s, &invalid_list);
+		ret |= kvm_sync_page(vcpu, s, invalid_list);
 	}
 
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+	return ret;
 }
 
 struct mmu_page_path {
@@ -2089,6 +2077,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	unsigned quadrant;
 	struct kvm_mmu_page *sp;
 	bool need_sync = false;
+	bool flush = false;
+	LIST_HEAD(invalid_list);
 
 	role = vcpu->arch.mmu.base_role;
 	role.level = level;
@@ -2112,8 +2103,16 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->role.word != role.word)
 			continue;
 
-		if (sp->unsync && !kvm_sync_page_transient(vcpu, sp))
-			break;
+		if (sp->unsync) {
+			/* The page is good, but __kvm_sync_page might still end
+			 * up zapping it.  If so, break in order to rebuild it.
+			 */
+			if (!__kvm_sync_page(vcpu, sp, &invalid_list))
+				break;
+
+			WARN_ON(!list_empty(&invalid_list));
+			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+		}
 
 		if (sp->unsync_children)
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
@@ -2133,13 +2132,15 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (rmap_write_protect(vcpu, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
 		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
-			kvm_sync_pages(vcpu, gfn);
+			flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
 
 		account_shadowed(vcpu->kvm, sp);
 	}
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	clear_page(sp->spt);
 	trace_kvm_mmu_get_page(sp, true);
+
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
 	return sp;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-25  2:15   ` Takuya Yoshikawa
  2016-02-24 13:17 ` [PATCH 10/12] KVM: mark memory barrier with smp_mb__after_atomic Paolo Bonzini
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Move the call to kvm_mmu_flush_or_zap outside the loop.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 725316df32ec..6d47b5c43246 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 	struct mmu_page_path parents;
 	struct kvm_mmu_pages pages;
 	LIST_HEAD(invalid_list);
+	bool flush = false;
 
 	while (mmu_unsync_walk(parent, &pages)) {
 		bool protected = false;
-		bool flush = false;
 
 		for_each_sp(pages, sp, parents, i)
 			protected |= rmap_write_protect(vcpu, sp->gfn);
 
-		if (protected)
+		if (protected) {
 			kvm_flush_remote_tlbs(vcpu->kvm);
+			flush = false;
+		}
 
 		for_each_sp(pages, sp, parents, i) {
 			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
 			mmu_pages_clear_parents(&parents);
 		}
-		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
 		cond_resched_lock(&vcpu->kvm->mmu_lock);
 	}
+
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
 }
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/12] KVM: mark memory barrier with smp_mb__after_atomic
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 11/12] KVM: MMU: simplify last_pte_bitmap Paolo Bonzini
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 39c36d4f4f5c..77a73739b08e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -171,7 +171,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 		cpu = vcpu->cpu;
 
 		/* Set ->requests bit before we read ->mode */
-		smp_mb();
+		smp_mb__after_atomic();
 
 		if (cpus != NULL && cpu != -1 && cpu != me &&
 		      kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/12] KVM: MMU: simplify last_pte_bitmap
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (9 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 10/12] KVM: mark memory barrier with smp_mb__after_atomic Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-24 13:17 ` [PATCH 12/12] KVM: MMU: micro-optimize gpte_access Paolo Bonzini
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Branch-free code is fun and everybody knows how much Avi loves it,
but last_pte_bitmap takes it a bit to the extreme.  A logical | is
still branch-free and simpler than building the whole truth table into
last_pte_bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
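As an aside, here is a stand-alone sketch of how the new bitmap is built
and consumed (illustration only, not part of the patch).  The constants
below are local copies that are assumed to match the kernel's x86 paging
definitions, with PT_PAGE_SIZE_MASK being bit 7 and levels numbered from
1 (PTE) to 4 (PML4):

#include <stdio.h>
#include <stdbool.h>

#define PT_PAGE_TABLE_LEVEL	1
#define PT_DIRECTORY_LEVEL	2
#define PT32_ROOT_LEVEL		2
#define PT32E_ROOT_LEVEL	3
#define PT64_ROOT_LEVEL		4
#define PT_PAGE_SIZE_SHIFT	7
#define PT_PAGE_SIZE_MASK	(1u << PT_PAGE_SIZE_SHIFT)

/* PT_PAGE_SIZE_MASK >> n is set in the map iff level n may hold a large page. */
static unsigned char build_map(unsigned root_level, bool pse)
{
	unsigned char map = 0;
	unsigned level;

	if (root_level >= PT32E_ROOT_LEVEL || pse) {
		if (root_level != PT32_ROOT_LEVEL)
			--root_level;	/* no large pages at the top level */
		for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level)
			map |= PT_PAGE_SIZE_MASK >> level;
	}
	return map;
}

static bool is_last_gpte(unsigned char map, unsigned level, unsigned gpte)
{
	unsigned m1, m2;

	m1 = level - (PT_PAGE_TABLE_LEVEL + 1);	/* bit 7 set iff level == 1 */
	m2 = gpte & (map << level);		/* re-aligns the map with PS  */
	return (m1 | m2) & PT_PAGE_SIZE_MASK;
}

int main(void)
{
	unsigned char m64 = build_map(PT64_ROOT_LEVEL, false);	 /* 0x30 */
	unsigned char mpae = build_map(PT32E_ROOT_LEVEL, false); /* 0x20 */

	printf("%d %d %d %d\n",
	       is_last_gpte(m64, 1, 0),			 /* 1: PTEs always terminate */
	       is_last_gpte(m64, 3, PT_PAGE_SIZE_MASK),	 /* 1: 1GB page              */
	       is_last_gpte(m64, 4, PT_PAGE_SIZE_MASK),	 /* 0: no 512GB pages        */
	       is_last_gpte(mpae, 3, PT_PAGE_SIZE_MASK)); /* 0: no 1GB pages with PAE */
	return 0;
}
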
 arch/x86/include/asm/kvm_host.h |  6 +-----
 arch/x86/kvm/mmu.c              | 40 +++++++++++++++++++++++++---------------
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b5459982433..3b4c0b98f285 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -338,11 +338,7 @@ struct kvm_mmu {
 
 	struct rsvd_bits_validate guest_rsvd_check;
 
-	/*
-	 * Bitmap: bit set = last pte in walk
-	 * index[0:1]: level (zero-based)
-	 * index[2]: pte.ps
-	 */
+	/* bit [7-n] = can have large pages at (one-based) level n */
 	u8 last_pte_bitmap;
 
 	bool nx;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6d47b5c43246..6bf8c36b681b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3566,13 +3566,21 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 	return false;
 }
 
-static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gpte)
+static inline bool is_last_gpte(struct kvm_mmu *mmu,
+				unsigned level, unsigned gpte)
 {
-	unsigned index;
+	unsigned m1, m2;
+
+	/* PT_PAGE_TABLE_LEVEL always terminates.  m1 has bit 7 set iff
+	 * level <= PT_PAGE_TABLE_LEVEL, which for our purpose means
+	 * level == PT_PAGE_TABLE_LEVEL (level is never zero).  */
+	m1 = level - (PT_PAGE_TABLE_LEVEL + 1);
 
-	index = level - 1;
-	index |= (gpte & PT_PAGE_SIZE_MASK) >> (PT_PAGE_SIZE_SHIFT - 2);
-	return mmu->last_pte_bitmap & (1 << index);
+	/* last_pte_bitmap is built so that shifting it left by level
+	 * produces a 1-bit mask for gpte's PT_PAGE_SIZE_MASK bit.
+	 */
+	m2 = gpte & (mmu->last_pte_bitmap << level);
+	return (m1 | m2) & PT_PAGE_SIZE_MASK;
 }
 
 #define PTTYPE_EPT 18 /* arbitrary */
@@ -3848,16 +3856,18 @@ static void update_last_pte_bitmap(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 {
 	u8 map;
 	unsigned level, root_level = mmu->root_level;
-	const unsigned ps_set_index = 1 << 2;  /* bit 2 of index: ps */
-
-	if (root_level == PT32E_ROOT_LEVEL)
-		--root_level;
-	/* PT_PAGE_TABLE_LEVEL always terminates */
-	map = 1 | (1 << ps_set_index);
-	for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level) {
-		if (level <= PT_PDPE_LEVEL
-		    && (mmu->root_level >= PT32E_ROOT_LEVEL || is_pse(vcpu)))
-			map |= 1 << (ps_set_index | (level - 1));
+
+	map = 0;
+	if (root_level >= PT32E_ROOT_LEVEL || is_pse(vcpu)) {
+		/* Apart from PSE, there are never large pages at the top level
+		 * (512GB for 64-bit, 1GB for PAE).
+		 */
+		if (root_level != PT32_ROOT_LEVEL)
+			--root_level;
+
+		BUILD_BUG_ON(PT_PAGE_SIZE_MASK > 255);
+		for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level)
+			map |= PT_PAGE_SIZE_MASK >> level;
 	}
 	mmu->last_pte_bitmap = map;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/12] KVM: MMU: micro-optimize gpte_access
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (10 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 11/12] KVM: MMU: simplify last_pte_bitmap Paolo Bonzini
@ 2016-02-24 13:17 ` Paolo Bonzini
  2016-02-25  8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
  2016-03-04 21:43 ` Paolo Bonzini
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Avoid AND-NOT; most x86 processors lack an instruction for it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
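As an aside, here is a stand-alone check that the XOR form computes the
same access bits as the old AND-NOT form whenever the present bit is set,
which is the case by the time FNAME(gpte_access) runs (illustration only,
not part of the patch; the constants are local copies assumed to match
the kernel's definitions):

#include <stdio.h>
#include <stdint.h>

#define PT_PRESENT_MASK		(1ull << 0)
#define PT_WRITABLE_MASK	(1ull << 1)
#define PT_USER_MASK		(1ull << 2)
#define PT64_NX_SHIFT		63
#define ACC_EXEC_MASK		1u	/* same bit as PT_PRESENT_MASK */

static unsigned old_access(uint64_t gpte)	/* the AND-NOT version */
{
	unsigned access;

	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
	access &= ~(unsigned)(gpte >> PT64_NX_SHIFT);
	return access;
}

static unsigned new_access(uint64_t gpte)	/* the XOR version */
{
	unsigned access;

	access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
	access ^= (unsigned)(gpte >> PT64_NX_SHIFT);	/* P ^ NX gives exec */
	return access;
}

int main(void)
{
	unsigned w, u, nx;

	/* Exhaust the interesting bits; P is always set at this point. */
	for (w = 0; w < 2; w++)
		for (u = 0; u < 2; u++)
			for (nx = 0; nx < 2; nx++) {
				uint64_t gpte = PT_PRESENT_MASK |
						(w ? PT_WRITABLE_MASK : 0) |
						(u ? PT_USER_MASK : 0) |
						((uint64_t)nx << PT64_NX_SHIFT);

				if (old_access(gpte) != new_access(gpte))
					printf("mismatch at %#llx\n",
					       (unsigned long long)gpte);
			}
	printf("done (any mismatch would have been reported above)\n");
	return 0;
}
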
 arch/x86/kvm/paging_tmpl.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 2e019b103249..3c77bdd443d9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -189,8 +189,11 @@ static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
 		((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
 		ACC_USER_MASK;
 #else
-	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-	access &= ~(gpte >> PT64_NX_SHIFT);
+	BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
+	BUILD_BUG_ON(ACC_EXEC_MASK != 1);
+	access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
+	/* Combine NX with P (which is set here) to get ACC_EXEC_MASK.  */
+	access ^= (gpte >> PT64_NX_SHIFT);
 #endif
 
 	return access;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/12] KVM: MMU: Fix ubsan warnings
  2016-02-24 13:17 ` [PATCH 01/12] KVM: MMU: Fix ubsan warnings Paolo Bonzini
@ 2016-02-24 13:42   ` Mike Krinkin
  2016-02-24 13:43     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Mike Krinkin @ 2016-02-24 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, yoshikawa_takuya_b1, guangrong.xiao, mtosatti

Could you also merge a fix for the following ubsan warning (I
reported this one too, but it attracted a bit less attention):

[  168.791851] ================================================================================
[  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
[  168.791866] index 4 is out of range for type 'u64 [4]'
[  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
[  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
[  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
[  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
[  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
[  168.791891] Call Trace:
[  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
[  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
[  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
[  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
[  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
[  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
[  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
[  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
[  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
[  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
[  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
[  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
[  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
[  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
[  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
[  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
[  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
[  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
[  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
[  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
[  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
[  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
[  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
[  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
[  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
[  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
[  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
[  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
[  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
[  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
[  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
[  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
[  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
[  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
[  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
[  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
[  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
[  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
[  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
[  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
[  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
[  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
[  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
[  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
[  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
[  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
[  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  168.792780] ================================================================================

This seems to be a typo, and the following fix solves the problem for me:

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c9fed9..2ce4f05 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -249,7 +249,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
                        return ret;

                kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
-               walker->ptes[level] = pte;
+               walker->ptes[level - 1] = pte;
        }
        return 0;
 }

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/12] KVM: MMU: Fix ubsan warnings
  2016-02-24 13:42   ` Mike Krinkin
@ 2016-02-24 13:43     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:43 UTC (permalink / raw)
  To: Mike Krinkin
  Cc: linux-kernel, kvm, yoshikawa_takuya_b1, guangrong.xiao, mtosatti



On 24/02/2016 14:42, Mike Krinkin wrote:
> Could you also merge a fix for the following ubsan warning (I
> reported this one too, but it attracted a bit less attention):
> 
> This seems to be a typo, and the following fix solves the problem for me:
> 
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 6c9fed9..2ce4f05 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -249,7 +249,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>                         return ret;
> 
>                 kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
> -               walker->ptes[level] = pte;
> +               walker->ptes[level - 1] = pte;
>         }
>         return 0;
>  }

This patch is correct---good catch!  Can you resend it with a
"Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>" line?

Thanks,

Paolo

> On Wed, Feb 24, 2016 at 02:17:42PM +0100, Paolo Bonzini wrote:
>> kvm_mmu_pages_init is doing some really yucky stuff.  It is setting
>> up a sentinel for mmu_page_clear_parents; however, because of a) the
>> way levels are numbered starting from 1 and b) the way mmu_page_path
>> sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
>> out of bounds.  This is harmless because the code overwrites up to the
>> first two elements of parents->idx and these are initialized, and
>> because the sentinel is not needed in this case---mmu_page_clear_parents
>> exits anyway when it gets to the end of the array.  However ubsan
>> complains, and everyone else should too.
>>
>> This fix does three things.  First it makes the mmu_page_path arrays
>> PT64_ROOT_LEVEL elements in size, so that we can write to them without
>> checking the level in advance.  Second it disintegrates kvm_mmu_pages_init
>> between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
>> for_each_sp (to place the NULL sentinel at the end of the current path).
>> This is okay because the mmu_page_path is only used in
>> mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
>> a for_each_sp iterator, and hence always after a call to mmu_pages_next.
>> Third it changes mmu_pages_clear_parents to just use the sentinel to
>> stop iteration, without checking the bounds on level.
>>
>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>> Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/mmu.c | 57 +++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 33 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 07f4c26a10d3..4dee855897cf 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1843,6 +1843,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>>  static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>>  			   struct kvm_mmu_pages *pvec)
>>  {
>> +	pvec->nr = 0;
>>  	if (!sp->unsync_children)
>>  		return 0;
>>  
>> @@ -1956,13 +1957,12 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
>>  }
>>  
>>  struct mmu_page_path {
>> -	struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
>> -	unsigned int idx[PT64_ROOT_LEVEL-1];
>> +	struct kvm_mmu_page *parent[PT64_ROOT_LEVEL];
>> +	unsigned int idx[PT64_ROOT_LEVEL];
>>  };
>>  
>>  #define for_each_sp(pvec, sp, parents, i)			\
>> -		for (i = mmu_pages_next(&pvec, &parents, -1),	\
>> -			sp = pvec.page[i].sp;			\
>> +		for (i = mmu_pages_first(&pvec, &parents);	\
>>  			i < pvec.nr && ({ sp = pvec.page[i].sp; 1;});	\
>>  			i = mmu_pages_next(&pvec, &parents, i))
>>  
>> @@ -1974,19 +1974,41 @@ static int mmu_pages_next(struct kvm_mmu_pages *pvec,
>>  
>>  	for (n = i+1; n < pvec->nr; n++) {
>>  		struct kvm_mmu_page *sp = pvec->page[n].sp;
>> +		unsigned idx = pvec->page[n].idx;
>> +		int level = sp->role.level;
>>  
>> -		if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
>> -			parents->idx[0] = pvec->page[n].idx;
>> -			return n;
>> -		}
>> +		parents->idx[level-1] = idx;
>> +		if (level == PT_PAGE_TABLE_LEVEL)
>> +			break;
>>  
>> -		parents->parent[sp->role.level-2] = sp;
>> -		parents->idx[sp->role.level-1] = pvec->page[n].idx;
>> +		parents->parent[level-2] = sp;
>>  	}
>>  
>>  	return n;
>>  }
>>  
>> +static int mmu_pages_first(struct kvm_mmu_pages *pvec,
>> +			   struct mmu_page_path *parents)
>> +{
>> +	struct kvm_mmu_page *sp;
>> +	int level;
>> +
>> +	if (pvec->nr == 0)
>> +		return 0;
>> +
>> +	sp = pvec->page[0].sp;
>> +	level = sp->role.level;
>> +	WARN_ON(level == PT_PAGE_TABLE_LEVEL);
>> +
>> +	parents->parent[level-2] = sp;
>> +
>> +	/* Also set up a sentinel.  Further entries in pvec are all
>> +	 * children of sp, so this element is never overwritten.
>> +	 */
>> +	parents->parent[level-1] = NULL;
>> +	return mmu_pages_next(pvec, parents, 0);
>> +}
>> +
>>  static void mmu_pages_clear_parents(struct mmu_page_path *parents)
>>  {
>>  	struct kvm_mmu_page *sp;
>> @@ -1994,22 +2016,13 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
>>  
>>  	do {
>>  		unsigned int idx = parents->idx[level];
>> -
>>  		sp = parents->parent[level];
>>  		if (!sp)
>>  			return;
>>  
>>  		clear_unsync_child_bit(sp, idx);
>>  		level++;
>> -	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
>> -}
>> -
>> -static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
>> -			       struct mmu_page_path *parents,
>> -			       struct kvm_mmu_pages *pvec)
>> -{
>> -	parents->parent[parent->role.level-1] = NULL;
>> -	pvec->nr = 0;
>> +	} while (!sp->unsync_children);
>>  }
>>  
>>  static void mmu_sync_children(struct kvm_vcpu *vcpu,
>> @@ -2021,7 +2034,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>>  	struct kvm_mmu_pages pages;
>>  	LIST_HEAD(invalid_list);
>>  
>> -	kvm_mmu_pages_init(parent, &parents, &pages);
>>  	while (mmu_unsync_walk(parent, &pages)) {
>>  		bool protected = false;
>>  
>> @@ -2037,7 +2049,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>>  		}
>>  		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
>>  		cond_resched_lock(&vcpu->kvm->mmu_lock);
>> -		kvm_mmu_pages_init(parent, &parents, &pages);
>>  	}
>>  }
>>  
>> @@ -2269,7 +2280,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>>  	if (parent->role.level == PT_PAGE_TABLE_LEVEL)
>>  		return 0;
>>  
>> -	kvm_mmu_pages_init(parent, &parents, &pages);
>>  	while (mmu_unsync_walk(parent, &pages)) {
>>  		struct kvm_mmu_page *sp;
>>  
>> @@ -2278,7 +2288,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>>  			mmu_pages_clear_parents(&parents);
>>  			zapped++;
>>  		}
>> -		kvm_mmu_pages_init(parent, &parents, &pages);
>>  	}
>>  
>>  	return zapped;
>> -- 
>> 1.8.3.1
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-24 13:17 ` [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children Paolo Bonzini
@ 2016-02-25  2:15   ` Takuya Yoshikawa
  2016-02-25  7:35     ` Xiao Guangrong
  2016-02-25  8:46     ` Paolo Bonzini
  0 siblings, 2 replies; 26+ messages in thread
From: Takuya Yoshikawa @ 2016-02-25  2:15 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: guangrong.xiao, mtosatti

On 2016/02/24 22:17, Paolo Bonzini wrote:
> Move the call to kvm_mmu_flush_or_zap outside the loop.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/mmu.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 725316df32ec..6d47b5c43246 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>   	struct mmu_page_path parents;
>   	struct kvm_mmu_pages pages;
>   	LIST_HEAD(invalid_list);
> +	bool flush = false;
>   
>   	while (mmu_unsync_walk(parent, &pages)) {
>   		bool protected = false;
> -		bool flush = false;
>   
>   		for_each_sp(pages, sp, parents, i)
>   			protected |= rmap_write_protect(vcpu, sp->gfn);
>   
> -		if (protected)
> +		if (protected) {
>   			kvm_flush_remote_tlbs(vcpu->kvm);
> +			flush = false;
> +		}
>   
>   		for_each_sp(pages, sp, parents, i) {
>   			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
>   			mmu_pages_clear_parents(&parents);
>   		}
> -		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>   		cond_resched_lock(&vcpu->kvm->mmu_lock);

This may release the mmu_lock before committing the zapping.
Is it safe?  If so, we may want to see the reason in the changelog.

  Takuya

>   	}
> +
> +	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>   }
>   
>   static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page
  2016-02-24 13:17 ` [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page Paolo Bonzini
@ 2016-02-25  7:32   ` Xiao Guangrong
  2016-02-25  8:48     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25  7:32 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti



On 02/24/2016 09:17 PM, Paolo Bonzini wrote:
> kvm_mmu_get_page is the only caller of kvm_sync_page_transient
> and kvm_sync_pages.  Moving the handling of the invalid_list there
> removes the need for the underdocumented kvm_sync_page_transient
> function.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> 	Guangrong, at this point I am confused about why
> 	kvm_sync_page_transient didn't clear sp->unsync.  Do
> 	you remember?  Or perhaps kvm_mmu_get_page could just
> 	call kvm_sync_page now?
>

It is an optimization to reduce write protection: changing a page from
unsync to sync requires write-protecting it and syncing all sptes
pointing to the same gfn.

However, after syncing the content of the unsync-ed spte with the guest
pte, we can reuse that spte as it is.
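
In other words (a simplified comparison, not the exact code):

	kvm_sync_page():           syncs the sptes and clears sp->unsync,
	                           so the gfn has to stay write-protected
	kvm_sync_page_transient(): syncs the sptes but leaves sp->unsync set,
	                           so no extra write protection is needed and
	                           kvm_mmu_get_page() can simply reuse the page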

> 	Also, can you explain the need_sync variable in
> 	kvm_mmu_get_page?

This is because we need to preserve the semantics of an 'unsync spte':
only sptes at the last level (level == 1) can be unsync, so when an spte
at an upper level is created we must eliminate all the unsync sptes
pointing to the same gfn.
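
Roughly, the relevant part of kvm_mmu_get_page() (quoting from memory,
not verbatim):

	for_each_gfn_sp(vcpu->kvm, sp, gfn) {
		...
		/* remember that an unsync page exists for this gfn */
		if (!need_sync && sp->unsync)
			need_sync = true;
		...
	}
	...
	if (!direct) {
		if (rmap_write_protect(vcpu, gfn))
			kvm_flush_remote_tlbs(vcpu->kvm);
		/* a new upper-level page: get rid of unsync pages for the gfn */
		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
			kvm_sync_pages(vcpu, gfn);
	}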

As you have already merged this patchset into the kvm tree, I will post
a patch adding comments for these cases to make the code more
understandable.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-25  2:15   ` Takuya Yoshikawa
@ 2016-02-25  7:35     ` Xiao Guangrong
  2016-02-25  8:49       ` Paolo Bonzini
  2016-02-25  8:46     ` Paolo Bonzini
  1 sibling, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25  7:35 UTC (permalink / raw)
  To: Takuya Yoshikawa, Paolo Bonzini, linux-kernel, kvm; +Cc: mtosatti



On 02/25/2016 10:15 AM, Takuya Yoshikawa wrote:
> On 2016/02/24 22:17, Paolo Bonzini wrote:
>> Move the call to kvm_mmu_flush_or_zap outside the loop.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>    arch/x86/kvm/mmu.c | 9 ++++++---
>>    1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 725316df32ec..6d47b5c43246 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>>    	struct mmu_page_path parents;
>>    	struct kvm_mmu_pages pages;
>>    	LIST_HEAD(invalid_list);
>> +	bool flush = false;
>>
>>    	while (mmu_unsync_walk(parent, &pages)) {
>>    		bool protected = false;
>> -		bool flush = false;
>>
>>    		for_each_sp(pages, sp, parents, i)
>>    			protected |= rmap_write_protect(vcpu, sp->gfn);
>>
>> -		if (protected)
>> +		if (protected) {
>>    			kvm_flush_remote_tlbs(vcpu->kvm);
>> +			flush = false;
>> +		}
>>
>>    		for_each_sp(pages, sp, parents, i) {
>>    			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
>>    			mmu_pages_clear_parents(&parents);
>>    		}
>> -		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>>    		cond_resched_lock(&vcpu->kvm->mmu_lock);
>
> This may release the mmu_lock before committing the zapping.
> Is it safe?  If so, we may want to see the reason in the changelog.

It is unsafe indeed, please do not do it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (11 preceding siblings ...)
  2016-02-24 13:17 ` [PATCH 12/12] KVM: MMU: micro-optimize gpte_access Paolo Bonzini
@ 2016-02-25  8:28 ` Xiao Guangrong
  2016-02-25  8:49   ` Paolo Bonzini
  2016-03-04 21:43 ` Paolo Bonzini
  13 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25  8:28 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti



On 02/24/2016 09:17 PM, Paolo Bonzini wrote:
> This series started from looking at mmu_unsync_walk for the ubsan thread.
> Patches 1 and 2 are the result of the discussions in that thread.
>
> Patches 3 to 9 do more cleanups in __kvm_sync_page and its callers.
> Among other changes, it removes kvm_sync_page_transient and avoids
> duplicate code between __kvm_sync_page and kvm_sync_pages.
>
> I stopped where I had questions about the existing kvm_mmu_get_page
> code (see patch 8 for the question).  However perhaps more cleanups
> are possible, also thanks to Takuya's work on that function and
> link_shadow_page.
>
> Patches 10 to 12 are just micro-optimizations.
>
> Guangrong, it would be great if you took a look since you know this part
> of KVM very well.

I have reviewed it and it works fine, except for the patch that lets a
TLB flush leak outside of the mmu-lock.

I will continue to simplify the unsync-sp walking path to keep
mmu_page_path smaller, and will add comments for kvm_mmu_get_page on top
of this patchset.

BTW, is there any conflict in applying my page-tracking patchset on top
of this one (I noticed you've merged this patchset to kvm/queue)? Please
tell me if it needs to be rebased.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-25  2:15   ` Takuya Yoshikawa
  2016-02-25  7:35     ` Xiao Guangrong
@ 2016-02-25  8:46     ` Paolo Bonzini
  1 sibling, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25  8:46 UTC (permalink / raw)
  To: Takuya Yoshikawa, linux-kernel, kvm; +Cc: guangrong.xiao, mtosatti



On 25/02/2016 03:15, Takuya Yoshikawa wrote:
> On 2016/02/24 22:17, Paolo Bonzini wrote:
>> Move the call to kvm_mmu_flush_or_zap outside the loop.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/mmu.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 725316df32ec..6d47b5c43246 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>>   	struct mmu_page_path parents;
>>   	struct kvm_mmu_pages pages;
>>   	LIST_HEAD(invalid_list);
>> +	bool flush = false;
>>   
>>   	while (mmu_unsync_walk(parent, &pages)) {
>>   		bool protected = false;
>> -		bool flush = false;
>>   
>>   		for_each_sp(pages, sp, parents, i)
>>   			protected |= rmap_write_protect(vcpu, sp->gfn);
>>   
>> -		if (protected)
>> +		if (protected) {
>>   			kvm_flush_remote_tlbs(vcpu->kvm);
>> +			flush = false;
>> +		}
>>   
>>   		for_each_sp(pages, sp, parents, i) {
>>   			flush |= kvm_sync_page(vcpu, sp, &invalid_list);
>>   			mmu_pages_clear_parents(&parents);
>>   		}
>> -		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>>   		cond_resched_lock(&vcpu->kvm->mmu_lock);
> 
> This may release the mmu_lock before committing the zapping.
> Is it safe?  If so, we may want to see the reason in the changelog.

It should be safe; the page is already marked as invalid and hence the
role will not match in kvm_mmu_get_page.
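
Concretely, the reason is (sketching from memory, not verbatim):

	/* kvm_mmu_prepare_zap_page() ends with */
	sp->role.invalid = 1;

	/* and the lookup loop in kvm_mmu_get_page() does */
	if (sp->role.word != role.word)	/* role includes the invalid bit */
		continue;

so a prepared-but-not-yet-committed page cannot be picked up again even
if mmu_lock is dropped in between.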

The idea is simply that committing the zap is expensive (for example it
requires a remote TLB flush) so you want to do it as rarely as possible.
 I'll note this in the commit message.

Paolo

>   Takuya
> 
>>   	}
>> +
>> +	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>>   }
>>   
>>   static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page
  2016-02-25  7:32   ` Xiao Guangrong
@ 2016-02-25  8:48     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25  8:48 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti



On 25/02/2016 08:32, Xiao Guangrong wrote:
> 
> As you have already merged this patchset to the kvm tree, i will post
> a patch to comment these cases to make the code be more understandable.

I've only merged it to kvm/queue so that it gets into all my testing
(and the buildbot's).  I won't move it to kvm/next until I get reviews.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-25  7:35     ` Xiao Guangrong
@ 2016-02-25  8:49       ` Paolo Bonzini
  2016-02-25  9:10         ` Xiao Guangrong
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25  8:49 UTC (permalink / raw)
  To: Xiao Guangrong, Takuya Yoshikawa, linux-kernel, kvm; +Cc: mtosatti



On 25/02/2016 08:35, Xiao Guangrong wrote:
>> This may release the mmu_lock before committing the zapping.
>> Is it safe?  If so, we may want to see the reason in the changelog.
> 
> It is unsafe indeed, please do not do it.

Can you explain why?  kvm_zap_obsolete_pages does the same.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
  2016-02-25  8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
@ 2016-02-25  8:49   ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25  8:49 UTC (permalink / raw)
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti



On 25/02/2016 09:28, Xiao Guangrong wrote:
> 
> BTW, is any conflict to apply my page-tracking patchset on top of this
> patchset (i noticed you've merged this patchset on kvm/queue)? Please
> tell me to rebase it if it is needed.

No, there shouldn't be any conflict.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-25  8:49       ` Paolo Bonzini
@ 2016-02-25  9:10         ` Xiao Guangrong
  2016-02-25  9:55           ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25  9:10 UTC (permalink / raw)
  To: Paolo Bonzini, Takuya Yoshikawa, linux-kernel, kvm; +Cc: mtosatti



On 02/25/2016 04:49 PM, Paolo Bonzini wrote:
>
>
> On 25/02/2016 08:35, Xiao Guangrong wrote:
>>> This may release the mmu_lock before committing the zapping.
>>> Is it safe?  If so, we may want to see the reason in the changelog.
>>
>> It is unsafe indeed, please do not do it.
>
> Can you explain why?  kvm_zap_obsolete_pages does the same.

It's not the same, please see the comment in  kvm_mmu_invalidate_zap_all_pages:
	/*
	 * Notify all vcpus to reload its shadow page table
	 * and flush TLB. Then all vcpus will switch to new
	 * shadow page table with the new mmu_valid_gen.
	 *
	 * Note: we should do this under the protection of
	 * mmu-lock, otherwise, vcpu would purge shadow page
	 * but miss tlb flush.
	 */
	kvm_reload_remote_mmus(kvm);

That means the tlb is flushed before releasing mmu-lock.

An example is rmap_write_protect(): when KVM creates a shadow page table for
the guest, it detects no spte pointing to the gfn, so the TLB is not flushed
and the guest can freely update its pte.
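
Spelling the interleaving out (a simplified sketch):

	vcpu0 (mmu_sync_children)              vcpu1 (shadow page fault)
	-------------------------              -------------------------
	kvm_mmu_prepare_zap_page()
	  drops the sptes, no TLB flush yet
	cond_resched_lock() releases mmu_lock
	                                       rmap_write_protect() finds no
	                                       spte for the gfn -> no flush
	                                       guest keeps writing through
	                                       the stale TLB entry
	kvm_mmu_flush_or_zap() flushes, but too late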

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
  2016-02-25  9:10         ` Xiao Guangrong
@ 2016-02-25  9:55           ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25  9:55 UTC (permalink / raw)
  To: Xiao Guangrong, Takuya Yoshikawa, linux-kernel, kvm; +Cc: mtosatti



On 25/02/2016 10:10, Xiao Guangrong wrote:
> 
> 
> On 02/25/2016 04:49 PM, Paolo Bonzini wrote:
>>
>>
>> On 25/02/2016 08:35, Xiao Guangrong wrote:
>>>> This may release the mmu_lock before committing the zapping.
>>>> Is it safe?  If so, we may want to see the reason in the changelog.
>>>
>>> It is unsafe indeed, please do not do it.
>>
>> Can you explain why?  kvm_zap_obsolete_pages does the same.
> 
> It's not the same, please see the comment in 
> kvm_mmu_invalidate_zap_all_pages:
>     /*
>      * Notify all vcpus to reload its shadow page table
>      * and flush TLB. Then all vcpus will switch to new
>      * shadow page table with the new mmu_valid_gen.
>      *
>      * Note: we should do this under the protection of
>      * mmu-lock, otherwise, vcpu would purge shadow page
>      * but miss tlb flush.
>      */
>     kvm_reload_remote_mmus(kvm);
> 
> That means the tlb is flushed before releasing mmu-lock.
> 
> An example is rmap_write_protect(): when KVM creates a shadow page
> table for the guest, it detects no spte pointing to the gfn, so the
> TLB is not flushed and the guest can freely update its pte.

Then I'll do a different patch that checks need_resched||spin_needbreak,
and if so does commit+cond_resched_lock.  I've removed 9/12 from
kvm/queue.  Again, sorry for giving the impression that these patches
were already final.
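
Something along these lines (an untested sketch on top of this patch, not
the final version):

	for_each_sp(pages, sp, parents, i) {
		flush |= kvm_sync_page(vcpu, sp, &invalid_list);
		mmu_pages_clear_parents(&parents);
	}
	if (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock)) {
		/* commit (and flush) before the lock can be dropped */
		kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
		flush = false;
		cond_resched_lock(&vcpu->kvm->mmu_lock);
	}

The zap is still batched across iterations, but it is always committed
before mmu_lock is released.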

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
  2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
                   ` (12 preceding siblings ...)
  2016-02-25  8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
@ 2016-03-04 21:43 ` Paolo Bonzini
  13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-04 21:43 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti



On 24/02/2016 14:17, Paolo Bonzini wrote:
> This series started from looking at mmu_unsync_walk for the ubsan thread.
> Patches 1 and 2 are the result of the discussions in that thread.
> 
> Patches 3 to 9 do more cleanups in __kvm_sync_page and its callers.
> Among other changes, it removes kvm_sync_page_transient and avoids
> duplicate code between __kvm_sync_page and kvm_sync_pages.
> 
> I stopped where I had questions about the existing kvm_mmu_get_page
> code (see patch 8 for the question).  However perhaps more cleanups
> are possible, also thanks to Takuya's work on that function and
> link_shadow_page.
> 
> Patches 10 to 12 are just micro-optimizations.
> 
> Guangrong, it would be great if you took a look since you know this part
> of KVM very well.
> 
> I have tested this series minus patch 9, and it survived installation
> of various Linux and Windows guests with EPT disabled.  Of course before
> committing the patches I will retest with patch 9 included.
> 
> Paolo
> 
> Paolo Bonzini (11):
>   KVM: MMU: Fix ubsan warnings
>   KVM: MMU: introduce kvm_mmu_flush_or_zap
>   KVM: MMU: move TLB flush out of __kvm_sync_page
>   KVM: MMU: use kvm_sync_page in kvm_sync_pages
>   KVM: MMU: cleanup __kvm_sync_page and its callers
>   KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page*
>   KVM: MMU: move zap/flush to kvm_mmu_get_page
>   KVM: MMU: coalesce zapping page after mmu_sync_children
>   KVM: mark memory barrier with smp_mb__after_atomic
>   KVM: MMU: simplify last_pte_bitmap
>   KVM: MMU: micro-optimize gpte_access
> 
> Xiao Guangrong (1):
>   KVM: MMU: check kvm_mmu_pages and mmu_page_path indices
> 
>  arch/x86/include/asm/kvm_host.h |   6 +-
>  arch/x86/kvm/mmu.c              | 216 ++++++++++++++++++++++------------------
>  arch/x86/kvm/paging_tmpl.h      |  11 +-
>  virt/kvm/kvm_main.c             |   2 +-
>  4 files changed, 126 insertions(+), 109 deletions(-)
> 

I've pushed patches 1 and 2 to kvm/next now; I'll repost the others next
Monday since there were very small conflicts with the page tracking series.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
2016-02-24 13:17 ` [PATCH 01/12] KVM: MMU: Fix ubsan warnings Paolo Bonzini
2016-02-24 13:42   ` Mike Krinkin
2016-02-24 13:43     ` Paolo Bonzini
2016-02-24 13:17 ` [PATCH 02/12] KVM: MMU: check kvm_mmu_pages and mmu_page_path indices Paolo Bonzini
2016-02-24 13:17 ` [PATCH 03/12] KVM: MMU: introduce kvm_mmu_flush_or_zap Paolo Bonzini
2016-02-24 13:17 ` [PATCH 04/12] KVM: MMU: move TLB flush out of __kvm_sync_page Paolo Bonzini
2016-02-24 13:17 ` [PATCH 05/12] KVM: MMU: use kvm_sync_page in kvm_sync_pages Paolo Bonzini
2016-02-24 13:17 ` [PATCH 06/12] KVM: MMU: cleanup __kvm_sync_page and its callers Paolo Bonzini
2016-02-24 13:17 ` [PATCH 07/12] KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page* Paolo Bonzini
2016-02-24 13:17 ` [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page Paolo Bonzini
2016-02-25  7:32   ` Xiao Guangrong
2016-02-25  8:48     ` Paolo Bonzini
2016-02-24 13:17 ` [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children Paolo Bonzini
2016-02-25  2:15   ` Takuya Yoshikawa
2016-02-25  7:35     ` Xiao Guangrong
2016-02-25  8:49       ` Paolo Bonzini
2016-02-25  9:10         ` Xiao Guangrong
2016-02-25  9:55           ` Paolo Bonzini
2016-02-25  8:46     ` Paolo Bonzini
2016-02-24 13:17 ` [PATCH 10/12] KVM: mark memory barrier with smp_mb__after_atomic Paolo Bonzini
2016-02-24 13:17 ` [PATCH 11/12] KVM: MMU: simplify last_pte_bitmap Paolo Bonzini
2016-02-24 13:17 ` [PATCH 12/12] KVM: MMU: micro-optimize gpte_access Paolo Bonzini
2016-02-25  8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
2016-02-25  8:49   ` Paolo Bonzini
2016-03-04 21:43 ` Paolo Bonzini
