linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] KVM: MTRR fixes and some cleanups
@ 2015-05-13  6:42 Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Xiao Guangrong
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

Changelog in v3:
thanks for Paolo's comment:
- do not apply for_each_rmap_spte to kvm_zap_rmapp and kvm_mmu_unlink_parents
- fix a cosmetic issue in slot_handle_level_range
- introduce PT_MAX_HUGEPAGE_LEVEL to clean up the code
- improve code Indentation

Changelog in v2:
- fix the bit description in changelog of the first patch, thanks
  David Matlack for pointing it out

all follow changes are from Paolo's comment and really appreciate it:
- reorder the whole patchset to make it is more readable
- redesign the iterator APIs
- make TLB clean if @lock_flush_tlb is true in slot_handle_level()
- make MTRR update be generic

This are some MTRR bugs if legacy IOMMU device is used on Intel's CPU:
- In current code, whenever guest MTRR registers are changed
  kvm_mmu_reset_context is called to switch to the new root shadow page
  table, however, it's useless since:
  1) the cache type is not cached into shadow page's attribute so that the
     original root shadow page will be reused

  2) the cache type is set on the last spte, that means we should sync the
     last sptes when MTRR is changed

  We can fix it by dropping all the spte in the gfn range which is
  being updated by MTRR

- some bugs are in get_mtrr_type();
  1: bit 1 of mtrr_state->enabled is corresponding bit 11 of IA32_MTRR_DEF_TYPE
     MSR which completely control MTRR's enablement that means other bits are
     ignored if it is cleared

  2: the fixed MTRR ranges are controlled by bit 0 of mtrr_state->enabled (bit
     10 of IA32_MTRR_DEF_TYPE)
  
  3: if MTRR is disabled, UC is applied to all of physical memory rather than
     mtrr_state->def_type

- we need not to reset mmu once cache policy is changed since shadow page table
  does not virtualize any cache policy

Also, these are some cleanups to make current MMU code more cleaner and help
us fixing the bug more easier. 

Xiao Guangrong (10):
  KVM: MMU: fix decoding cache type from MTRR
  KVM: MMU: introduce for_each_rmap_spte()
  KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL
  KVM: MMU: introduce for_each_slot_rmap_range
  KVM: MMU: introduce slot_handle_level_range() and its helpers
  KVM: MMU: use slot_handle_level and its helper to clean  up the code
  KVM: MMU: introduce kvm_zap_rmapp
  KVM: MMU: introduce kvm_zap_gfn_range
  KVM: MMU: fix MTRR update
  KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed

 arch/x86/kvm/mmu.c       | 409 +++++++++++++++++++++++++----------------------
 arch/x86/kvm/mmu.h       |   2 +
 arch/x86/kvm/mmu_audit.c |   4 +-
 arch/x86/kvm/x86.c       |  62 ++++++-
 4 files changed, 281 insertions(+), 196 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  8:09   ` Wanpeng Li
  2015-07-12 17:33   ` Alex Williamson
  2015-05-13  6:42 ` [PATCH v3 02/10] KVM: MMU: introduce for_each_rmap_spte() Xiao Guangrong
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

There are some bugs in current get_mtrr_type();
1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
   IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
   that means other bits are ignored if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of
   mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)

3: if MTRR is disabled, UC is applied to all of physical memory rather
   than mtrr_state->def_type

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b78e83f..d00cebd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
 static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
 			 u64 start, u64 end)
 {
-	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
-	int num_var_ranges = KVM_NR_VAR_MTRR;
+	int i, num_var_ranges = KVM_NR_VAR_MTRR;
 
-	if (!mtrr_state->enabled)
-		return 0xFF;
+	/* MTRR is completely disabled, use UC for all of physical memory. */
+	if (!(mtrr_state->enabled & 0x2))
+		return MTRR_TYPE_UNCACHABLE;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state->have_fixed && (start < 0x100000)) {
+	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
+	      (start < 0x100000)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state->enabled & 2))
-		return mtrr_state->def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 02/10] KVM: MMU: introduce for_each_rmap_spte()
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 03/10] KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL Xiao Guangrong
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

It's used to walk all the sptes on the rmap to clean up the
code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c       | 53 ++++++++++++++++--------------------------------
 arch/x86/kvm/mmu_audit.c |  4 +---
 2 files changed, 19 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d00cebd..afc8d15 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1142,6 +1142,11 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 	return NULL;
 }
 
+#define for_each_rmap_spte(_rmap_, _iter_, _spte_)			    \
+	   for (_spte_ = rmap_get_first(*_rmap_, _iter_);		    \
+		_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
+			_spte_ = rmap_get_next(_iter_))
+
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
 	if (mmu_spte_clear_track_bits(sptep))
@@ -1205,12 +1210,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		flush |= spte_write_protect(kvm, sptep, pt_protect);
-		sptep = rmap_get_next(&iter);
-	}
 
 	return flush;
 }
@@ -1232,12 +1233,8 @@ static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		flush |= spte_clear_dirty(kvm, sptep);
-		sptep = rmap_get_next(&iter);
-	}
 
 	return flush;
 }
@@ -1259,12 +1256,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
 	struct rmap_iterator iter;
 	bool flush = false;
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		flush |= spte_set_dirty(kvm, sptep);
-		sptep = rmap_get_next(&iter);
-	}
 
 	return flush;
 }
@@ -1394,8 +1387,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	WARN_ON(pte_huge(*ptep));
 	new_pfn = pte_pfn(*ptep);
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
+restart:
+	for_each_rmap_spte(rmapp, &iter, sptep) {
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n",
 			     sptep, *sptep, gfn, level);
 
@@ -1403,7 +1396,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 		if (pte_write(*ptep)) {
 			drop_spte(kvm, sptep);
-			sptep = rmap_get_first(*rmapp, &iter);
+			goto restart;
 		} else {
 			new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
 			new_spte |= (u64)new_pfn << PAGE_SHIFT;
@@ -1414,7 +1407,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 			mmu_spte_clear_track_bits(sptep);
 			mmu_spte_set(sptep, new_spte);
-			sptep = rmap_get_next(&iter);
 		}
 	}
 
@@ -1518,16 +1510,13 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	BUG_ON(!shadow_accessed_mask);
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
-
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			clear_bit((ffs(shadow_accessed_mask) - 1),
 				 (unsigned long *)sptep);
 		}
-	}
+
 	trace_kvm_age_page(gfn, level, slot, young);
 	return young;
 }
@@ -1548,15 +1537,11 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (!shadow_accessed_mask)
 		goto out;
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!is_shadow_present_pte(*sptep));
-
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			break;
 		}
-	}
 out:
 	return young;
 }
@@ -4481,9 +4466,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	pfn_t pfn;
 	struct kvm_mmu_page *sp;
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+restart:
+	for_each_rmap_spte(rmapp, &iter, sptep) {
 		sp = page_header(__pa(sptep));
 		pfn = spte_to_pfn(*sptep);
 
@@ -4498,10 +4482,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 			!kvm_is_reserved_pfn(pfn) &&
 			PageTransCompound(pfn_to_page(pfn))) {
 			drop_spte(kvm, sptep);
-			sptep = rmap_get_first(*rmapp, &iter);
 			need_tlb_flush = 1;
-		} else
-			sptep = rmap_get_next(&iter);
+			goto restart;
+		}
 	}
 
 	return need_tlb_flush;
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 9ade5cf..368d534 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -197,13 +197,11 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
 
-	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
-	     sptep = rmap_get_next(&iter)) {
+	for_each_rmap_spte(rmapp, &iter, sptep)
 		if (is_writable_pte(*sptep))
 			audit_printk(kvm, "shadow page has writable "
 				     "mappings: gfn %llx role %x\n",
 				     sp->gfn, sp->role.word);
-	}
 }
 
 static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 03/10] KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 02/10] KVM: MMU: introduce for_each_rmap_spte() Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 04/10] KVM: MMU: introduce for_each_slot_rmap_range Xiao Guangrong
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

It's used to clean up the code. Thanks for Paolo Bonzini's
suggestion

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 24 +++++++++---------------
 arch/x86/kvm/mmu.h |  1 +
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index afc8d15..e0e15b8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -811,8 +811,7 @@ static void account_shadowed(struct kvm *kvm, gfn_t gfn)
 	int i;
 
 	slot = gfn_to_memslot(kvm, gfn);
-	for (i = PT_DIRECTORY_LEVEL;
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
 		linfo->write_count += 1;
 	}
@@ -826,8 +825,7 @@ static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
 	int i;
 
 	slot = gfn_to_memslot(kvm, gfn);
-	for (i = PT_DIRECTORY_LEVEL;
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
 		linfo->write_count -= 1;
 		WARN_ON(linfo->write_count < 0);
@@ -858,8 +856,7 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
 
 	page_size = kvm_host_page_size(kvm, gfn);
 
-	for (i = PT_PAGE_TABLE_LEVEL;
-	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		if (page_size >= KVM_HPAGE_SIZE(i))
 			ret = i;
 		else
@@ -1344,8 +1341,7 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 
 	slot = gfn_to_memslot(kvm, gfn);
 
-	for (i = PT_PAGE_TABLE_LEVEL;
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		rmapp = __gfn_to_rmap(gfn, i, slot);
 		write_protected |= __rmap_write_protect(kvm, rmapp, true);
 	}
@@ -1451,7 +1447,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
 
 		for (j = PT_PAGE_TABLE_LEVEL;
-		     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+		     j <= PT_MAX_HUGEPAGE_LEVEL; ++j) {
 			unsigned long idx, idx_end;
 			unsigned long *rmapp;
 			gfn_t gfn = gfn_start;
@@ -4415,8 +4411,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 
-	for (i = PT_PAGE_TABLE_LEVEL;
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		unsigned long *rmapp;
 		unsigned long last_index, index;
 
@@ -4572,8 +4567,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 
-	for (i = PT_PAGE_TABLE_LEVEL + 1; /* skip rmap for 4K page */
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+	/* skip rmap for 4K page */
+	for (i = PT_PAGE_TABLE_LEVEL + 1; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		unsigned long *rmapp;
 		unsigned long last_index, index;
 
@@ -4610,8 +4605,7 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 
-	for (i = PT_PAGE_TABLE_LEVEL;
-	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		unsigned long *rmapp;
 		unsigned long last_index, index;
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0ada65e..72bb33f 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -43,6 +43,7 @@
 #define PT_PDPE_LEVEL 3
 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
+#define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)
 
 static inline u64 rsvd_bits(int s, int e)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 04/10] KVM: MMU: introduce for_each_slot_rmap_range
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (2 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 03/10] KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 05/10] KVM: MMU: introduce slot_handle_level_range() and its helpers Xiao Guangrong
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

It's used to abstract the code from kvm_handle_hva_range and it will be
used by later patch

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 97 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 75 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e0e15b8..df403b3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1412,6 +1412,74 @@ restart:
 	return 0;
 }
 
+struct slot_rmap_walk_iterator {
+	/* input fields. */
+	struct kvm_memory_slot *slot;
+	gfn_t start_gfn;
+	gfn_t end_gfn;
+	int start_level;
+	int end_level;
+
+	/* output fields. */
+	gfn_t gfn;
+	unsigned long *rmap;
+	int level;
+
+	/* private field. */
+	unsigned long *end_rmap;
+};
+
+static void
+rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+{
+	iterator->level = level;
+	iterator->gfn = iterator->start_gfn;
+	iterator->rmap = __gfn_to_rmap(iterator->gfn, level, iterator->slot);
+	iterator->end_rmap = __gfn_to_rmap(iterator->end_gfn, level,
+					   iterator->slot);
+}
+
+static void
+slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+		    struct kvm_memory_slot *slot, int start_level,
+		    int end_level, gfn_t start_gfn, gfn_t end_gfn)
+{
+	iterator->slot = slot;
+	iterator->start_level = start_level;
+	iterator->end_level = end_level;
+	iterator->start_gfn = start_gfn;
+	iterator->end_gfn = end_gfn;
+
+	rmap_walk_init_level(iterator, iterator->start_level);
+}
+
+static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
+{
+	return !!iterator->rmap;
+}
+
+static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+{
+	if (++iterator->rmap <= iterator->end_rmap) {
+		iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
+		return;
+	}
+
+	if (++iterator->level > iterator->end_level) {
+		iterator->rmap = NULL;
+		return;
+	}
+
+	rmap_walk_init_level(iterator, iterator->level);
+}
+
+#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_,	\
+	   _start_gfn, _end_gfn, _iter_)				\
+	for (slot_rmap_walk_init(_iter_, _slot_, _start_level_,		\
+				 _end_level_, _start_gfn, _end_gfn);	\
+	     slot_rmap_walk_okay(_iter_);				\
+	     slot_rmap_walk_next(_iter_))
+
 static int kvm_handle_hva_range(struct kvm *kvm,
 				unsigned long start,
 				unsigned long end,
@@ -1423,10 +1491,10 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 					       int level,
 					       unsigned long data))
 {
-	int j;
-	int ret = 0;
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
+	struct slot_rmap_walk_iterator iterator;
+	int ret = 0;
 
 	slots = kvm_memslots(kvm);
 
@@ -1446,26 +1514,11 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 		gfn_start = hva_to_gfn_memslot(hva_start, memslot);
 		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
 
-		for (j = PT_PAGE_TABLE_LEVEL;
-		     j <= PT_MAX_HUGEPAGE_LEVEL; ++j) {
-			unsigned long idx, idx_end;
-			unsigned long *rmapp;
-			gfn_t gfn = gfn_start;
-
-			/*
-			 * {idx(page_j) | page_j intersects with
-			 *  [hva_start, hva_end)} = {idx, idx+1, ..., idx_end}.
-			 */
-			idx = gfn_to_index(gfn_start, memslot->base_gfn, j);
-			idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, j);
-
-			rmapp = __gfn_to_rmap(gfn_start, j, memslot);
-
-			for (; idx <= idx_end;
-			       ++idx, gfn += (1UL << KVM_HPAGE_GFN_SHIFT(j)))
-				ret |= handler(kvm, rmapp++, memslot,
-					       gfn, j, data);
-		}
+		for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
+				PT_MAX_HUGEPAGE_LEVEL, gfn_start, gfn_end - 1,
+				&iterator)
+			ret |= handler(kvm, iterator.rmap, memslot,
+				       iterator.gfn, iterator.level, data);
 	}
 
 	return ret;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 05/10] KVM: MMU: introduce slot_handle_level_range() and its helpers
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (3 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 04/10] KVM: MMU: introduce for_each_slot_rmap_range Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 06/10] KVM: MMU: use slot_handle_level and its helper to clean up the code Xiao Guangrong
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

There are several places walking all rmaps for the memslot so that
introduce common functions to cleanup the code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index df403b3..62ac4e0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4453,6 +4453,75 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 	init_kvm_mmu(vcpu);
 }
 
+/* The return value indicates if tlb flush on all vcpus is needed. */
+typedef bool (*slot_level_handler) (struct kvm *kvm, unsigned long *rmap);
+
+/* The caller should hold mmu-lock before calling this function. */
+static bool
+slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
+			slot_level_handler fn, int start_level, int end_level,
+			gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
+{
+	struct slot_rmap_walk_iterator iterator;
+	bool flush = false;
+
+	for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
+			end_gfn, &iterator) {
+		if (iterator.rmap)
+			flush |= fn(kvm, iterator.rmap);
+
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+			if (flush && lock_flush_tlb) {
+				kvm_flush_remote_tlbs(kvm);
+				flush = false;
+			}
+			cond_resched_lock(&kvm->mmu_lock);
+		}
+	}
+
+	if (flush && lock_flush_tlb) {
+		kvm_flush_remote_tlbs(kvm);
+		flush = false;
+	}
+
+	return flush;
+}
+
+static bool
+slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+		  slot_level_handler fn, int start_level, int end_level,
+		  bool lock_flush_tlb)
+{
+	return slot_handle_level_range(kvm, memslot, fn, start_level,
+			end_level, memslot->base_gfn,
+			memslot->base_gfn + memslot->npages - 1,
+			lock_flush_tlb);
+}
+
+static bool
+slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+		      slot_level_handler fn, bool lock_flush_tlb)
+{
+	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
+				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
+}
+
+static bool
+slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+			slot_level_handler fn, bool lock_flush_tlb)
+{
+	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL + 1,
+				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
+}
+
+static bool
+slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
+		 slot_level_handler fn, bool lock_flush_tlb)
+{
+	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
+				 PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
+}
+
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 06/10] KVM: MMU: use slot_handle_level and its helper to clean up the code
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (4 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 05/10] KVM: MMU: introduce slot_handle_level_range() and its helpers Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 07/10] KVM: MMU: introduce kvm_zap_rmapp Xiao Guangrong
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

slot_handle_level and its helper functions are ready now, use them to
clean up the code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 128 +++++++----------------------------------------------
 1 file changed, 16 insertions(+), 112 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 62ac4e0..c059822 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4522,34 +4522,19 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
 }
 
+static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
+{
+	return __rmap_write_protect(kvm, rmapp, false);
+}
+
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot)
 {
-	gfn_t last_gfn;
-	int i;
-	bool flush = false;
-
-	last_gfn = memslot->base_gfn + memslot->npages - 1;
+	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-
-	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		unsigned long *rmapp;
-		unsigned long last_index, index;
-
-		rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
-		last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
-		for (index = 0; index <= last_index; ++index, ++rmapp) {
-			if (*rmapp)
-				flush |= __rmap_write_protect(kvm, rmapp,
-						false);
-
-			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
-				cond_resched_lock(&kvm->mmu_lock);
-		}
-	}
-
+	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
+				      false);
 	spin_unlock(&kvm->mmu_lock);
 
 	/*
@@ -4610,59 +4595,18 @@ restart:
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 			struct kvm_memory_slot *memslot)
 {
-	bool flush = false;
-	unsigned long *rmapp;
-	unsigned long last_index, index;
-
 	spin_lock(&kvm->mmu_lock);
-
-	rmapp = memslot->arch.rmap[0];
-	last_index = gfn_to_index(memslot->base_gfn + memslot->npages - 1,
-				memslot->base_gfn, PT_PAGE_TABLE_LEVEL);
-
-	for (index = 0; index <= last_index; ++index, ++rmapp) {
-		if (*rmapp)
-			flush |= kvm_mmu_zap_collapsible_spte(kvm, rmapp);
-
-		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
-			if (flush) {
-				kvm_flush_remote_tlbs(kvm);
-				flush = false;
-			}
-			cond_resched_lock(&kvm->mmu_lock);
-		}
-	}
-
-	if (flush)
-		kvm_flush_remote_tlbs(kvm);
-
+	slot_handle_leaf(kvm, memslot, kvm_mmu_zap_collapsible_spte, true);
 	spin_unlock(&kvm->mmu_lock);
 }
 
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot)
 {
-	gfn_t last_gfn;
-	unsigned long *rmapp;
-	unsigned long last_index, index;
-	bool flush = false;
-
-	last_gfn = memslot->base_gfn + memslot->npages - 1;
+	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-
-	rmapp = memslot->arch.rmap[PT_PAGE_TABLE_LEVEL - 1];
-	last_index = gfn_to_index(last_gfn, memslot->base_gfn,
-			PT_PAGE_TABLE_LEVEL);
-
-	for (index = 0; index <= last_index; ++index, ++rmapp) {
-		if (*rmapp)
-			flush |= __rmap_clear_dirty(kvm, rmapp);
-
-		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
-			cond_resched_lock(&kvm->mmu_lock);
-	}
-
+	flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
 	spin_unlock(&kvm->mmu_lock);
 
 	lockdep_assert_held(&kvm->slots_lock);
@@ -4681,31 +4625,11 @@ EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
 void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 					struct kvm_memory_slot *memslot)
 {
-	gfn_t last_gfn;
-	int i;
-	bool flush = false;
-
-	last_gfn = memslot->base_gfn + memslot->npages - 1;
+	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-
-	/* skip rmap for 4K page */
-	for (i = PT_PAGE_TABLE_LEVEL + 1; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		unsigned long *rmapp;
-		unsigned long last_index, index;
-
-		rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
-		last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
-		for (index = 0; index <= last_index; ++index, ++rmapp) {
-			if (*rmapp)
-				flush |= __rmap_write_protect(kvm, rmapp,
-						false);
-
-			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
-				cond_resched_lock(&kvm->mmu_lock);
-		}
-	}
+	flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
+					false);
 	spin_unlock(&kvm->mmu_lock);
 
 	/* see kvm_mmu_slot_remove_write_access */
@@ -4719,30 +4643,10 @@ EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);
 void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 			    struct kvm_memory_slot *memslot)
 {
-	gfn_t last_gfn;
-	int i;
-	bool flush = false;
-
-	last_gfn = memslot->base_gfn + memslot->npages - 1;
+	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-
-	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		unsigned long *rmapp;
-		unsigned long last_index, index;
-
-		rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
-		last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
-		for (index = 0; index <= last_index; ++index, ++rmapp) {
-			if (*rmapp)
-				flush |= __rmap_set_dirty(kvm, rmapp);
-
-			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
-				cond_resched_lock(&kvm->mmu_lock);
-		}
-	}
-
+	flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
 	spin_unlock(&kvm->mmu_lock);
 
 	lockdep_assert_held(&kvm->slots_lock);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 07/10] KVM: MMU: introduce kvm_zap_rmapp
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (5 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 06/10] KVM: MMU: use slot_handle_level and its helper to clean up the code Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 08/10] KVM: MMU: introduce kvm_zap_gfn_range Xiao Guangrong
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

Split kvm_unmap_rmapp and introduce kvm_zap_rmapp which will be used in the
later patch

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c059822..a990ad9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1349,24 +1349,28 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 	return write_protected;
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
-			   struct kvm_memory_slot *slot, gfn_t gfn, int level,
-			   unsigned long data)
+static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
-	int need_tlb_flush = 0;
+	bool flush = false;
 
 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
-			     sptep, *sptep, gfn, level);
+		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
-		need_tlb_flush = 1;
+		flush = true;
 	}
 
-	return need_tlb_flush;
+	return flush;
+}
+
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+			   struct kvm_memory_slot *slot, gfn_t gfn, int level,
+			   unsigned long data)
+{
+	return kvm_zap_rmapp(kvm, rmapp);
 }
 
 static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 08/10] KVM: MMU: introduce kvm_zap_gfn_range
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (6 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 07/10] KVM: MMU: introduce kvm_zap_rmapp Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  6:42 ` [PATCH v3 09/10] KVM: MMU: fix MTRR update Xiao Guangrong
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

It is used to zap all the rmaps of the specified gfn range and will
be used by the later patch

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 24 ++++++++++++++++++++++++
 arch/x86/kvm/mmu.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a990ad9..49c34e6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4526,6 +4526,30 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 				 PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
 }
 
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+
+	slots = kvm_memslots(kvm);
+
+	spin_lock(&kvm->mmu_lock);
+	kvm_for_each_memslot(memslot, slots) {
+		gfn_t start, end;
+
+		start = max(gfn_start, memslot->base_gfn);
+		end = min(gfn_end, memslot->base_gfn + memslot->npages);
+		if (start >= end)
+			continue;
+
+		slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+				PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
+				start, end - 1, true);
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+}
+
 static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
 {
 	return __rmap_write_protect(kvm, rmapp, false);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 72bb33f..398d21c 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -171,4 +171,5 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 #endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 09/10] KVM: MMU: fix MTRR update
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (7 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 08/10] KVM: MMU: introduce kvm_zap_gfn_range Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13  8:43   ` Wanpeng Li
  2015-05-13  6:42 ` [PATCH v3 10/10] KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed Xiao Guangrong
  2015-05-13 14:14 ` [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Paolo Bonzini
  10 siblings, 1 reply; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

Currently, whenever guest MTRR registers are changed
kvm_mmu_reset_context is called to switch to the new root shadow page
table, however, it's useless since:
1) the cache type is not cached into shadow page's attribute so that
   the original root shadow page will be reused

2) the cache type is set on the last spte, that means we should sync
   the last sptes when MTRR is changed

This patch fixs this issue by drop all the spte in the gfn range which
is being updated by MTRR

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/x86.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cde5d61..bbe184f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1852,6 +1852,63 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
 
+static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct mtrr_state_type *mtrr_state = &vcpu->arch.mtrr_state;
+	unsigned char mtrr_enabled = mtrr_state->enabled;
+	gfn_t start, end, mask;
+	int index;
+	bool is_fixed = true;
+
+	if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
+	      !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+		return;
+
+	if (!(mtrr_enabled & 0x2) && msr != MSR_MTRRdefType)
+		return;
+
+	switch (msr) {
+	case MSR_MTRRfix64K_00000:
+		start = 0x0;
+		end = 0x80000;
+		break;
+	case MSR_MTRRfix16K_80000:
+		start = 0x80000;
+		end = 0xa0000;
+		break;
+	case MSR_MTRRfix16K_A0000:
+		start = 0xa0000;
+		end = 0xc0000;
+		break;
+	case MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000:
+		index = msr - MSR_MTRRfix4K_C0000;
+		start = 0xc0000 + index * (32 << 10);
+		end = start + (32 << 10);
+		break;
+	case MSR_MTRRdefType:
+		is_fixed = false;
+		start = 0x0;
+		end = ~0ULL;
+		break;
+	default:
+		/* variable range MTRRs. */
+		is_fixed = false;
+		index = (msr - 0x200) / 2;
+		start = (((u64)mtrr_state->var_ranges[index].base_hi) << 32) +
+		       (mtrr_state->var_ranges[index].base_lo & PAGE_MASK);
+		mask = (((u64)mtrr_state->var_ranges[index].mask_hi) << 32) +
+		       (mtrr_state->var_ranges[index].mask_lo & PAGE_MASK);
+		mask |= ~0ULL << cpuid_maxphyaddr(vcpu);
+
+		end = ((start & mask) | ~mask) + 1;
+	}
+
+	if (is_fixed && !(mtrr_enabled & 0x1))
+		return;
+
+	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
+}
+
 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
@@ -1885,7 +1942,7 @@ static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		*pt = data;
 	}
 
-	kvm_mmu_reset_context(vcpu);
+	update_mtrr(vcpu, msr);
 	return 0;
 }
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 10/10] KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (8 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 09/10] KVM: MMU: fix MTRR update Xiao Guangrong
@ 2015-05-13  6:42 ` Xiao Guangrong
  2015-05-13 14:14 ` [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Paolo Bonzini
  10 siblings, 0 replies; 32+ messages in thread
From: Xiao Guangrong @ 2015-05-13  6:42 UTC (permalink / raw)
  To: pbonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, Xiao Guangrong

CR0.CD and CR0.NW are not used by shadow page table so that need
not adjust mmu if these two bit are changed

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/x86.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbe184f..457b908 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -572,8 +572,7 @@ out:
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	unsigned long old_cr0 = kvm_read_cr0(vcpu);
-	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP |
-				    X86_CR0_CD | X86_CR0_NW;
+	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
 
 	cr0 |= X86_CR0_ET;
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-05-13  6:42 ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Xiao Guangrong
@ 2015-05-13  8:09   ` Wanpeng Li
  2015-07-12 17:33   ` Alex Williamson
  1 sibling, 0 replies; 32+ messages in thread
From: Wanpeng Li @ 2015-05-13  8:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: pbonzini, gleb, mtosatti, kvm, linux-kernel

On Wed, May 13, 2015 at 02:42:19PM +0800, Xiao Guangrong wrote:
>There are some bugs in current get_mtrr_type();
>1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>   IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>   that means other bits are ignored if it is cleared
>
>2: the fixed MTRR ranges are controlled by bit 0 of
>   mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>
>3: if MTRR is disabled, UC is applied to all of physical memory rather
>   than mtrr_state->def_type
>
>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>

>---
> arch/x86/kvm/mmu.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>index b78e83f..d00cebd 100644
>--- a/arch/x86/kvm/mmu.c
>+++ b/arch/x86/kvm/mmu.c
>@@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
> static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
> 			 u64 start, u64 end)
> {
>-	int i;
> 	u64 base, mask;
> 	u8 prev_match, curr_match;
>-	int num_var_ranges = KVM_NR_VAR_MTRR;
>+	int i, num_var_ranges = KVM_NR_VAR_MTRR;
> 
>-	if (!mtrr_state->enabled)
>-		return 0xFF;
>+	/* MTRR is completely disabled, use UC for all of physical memory. */
>+	if (!(mtrr_state->enabled & 0x2))
>+		return MTRR_TYPE_UNCACHABLE;
> 
> 	/* Make end inclusive end, instead of exclusive */
> 	end--;
> 
> 	/* Look in fixed ranges. Just return the type as per start */
>-	if (mtrr_state->have_fixed && (start < 0x100000)) {
>+	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
>+	      (start < 0x100000)) {
> 		int idx;
> 
> 		if (start < 0x80000) {
>@@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
> 	 * Look of multiple ranges matching this address and pick type
> 	 * as per MTRR precedence
> 	 */
>-	if (!(mtrr_state->enabled & 2))
>-		return mtrr_state->def_type;
>-
> 	prev_match = 0xFF;
> 	for (i = 0; i < num_var_ranges; ++i) {
> 		unsigned short start_state, end_state;
>-- 
>2.1.0
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update
  2015-05-13  6:42 ` [PATCH v3 09/10] KVM: MMU: fix MTRR update Xiao Guangrong
@ 2015-05-13  8:43   ` Wanpeng Li
  2015-05-13 14:10     ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2015-05-13  8:43 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: pbonzini, gleb, mtosatti, kvm, linux-kernel

Hi Xiao,
On Wed, May 13, 2015 at 02:42:27PM +0800, Xiao Guangrong wrote:
>Currently, whenever guest MTRR registers are changed
>kvm_mmu_reset_context is called to switch to the new root shadow page
>table, however, it's useless since:
>1) the cache type is not cached into shadow page's attribute so that
>   the original root shadow page will be reused

kvm_mmu_reset_context
  kvm_mmu_unload
    mmu_free_roots

The original root shadow page will be freed in mmu_free_roots, where I
miss?

Another question maybe not related to this patch:

If kvm_mmu_reset_context is just called to destroy the original root 
shadow page and all the sptes will remain valid? 

Regards,
Wanpeng Li 

>
>2) the cache type is set on the last spte, that means we should sync
>   the last sptes when MTRR is changed
>
>This patch fixs this issue by drop all the spte in the gfn range which
>is being updated by MTRR
>
>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>---
> arch/x86/kvm/x86.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 58 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index cde5d61..bbe184f 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1852,6 +1852,63 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> }
> EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
> 
>+static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
>+{
>+	struct mtrr_state_type *mtrr_state = &vcpu->arch.mtrr_state;
>+	unsigned char mtrr_enabled = mtrr_state->enabled;
>+	gfn_t start, end, mask;
>+	int index;
>+	bool is_fixed = true;
>+
>+	if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
>+	      !kvm_arch_has_noncoherent_dma(vcpu->kvm))
>+		return;
>+
>+	if (!(mtrr_enabled & 0x2) && msr != MSR_MTRRdefType)
>+		return;
>+
>+	switch (msr) {
>+	case MSR_MTRRfix64K_00000:
>+		start = 0x0;
>+		end = 0x80000;
>+		break;
>+	case MSR_MTRRfix16K_80000:
>+		start = 0x80000;
>+		end = 0xa0000;
>+		break;
>+	case MSR_MTRRfix16K_A0000:
>+		start = 0xa0000;
>+		end = 0xc0000;
>+		break;
>+	case MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000:
>+		index = msr - MSR_MTRRfix4K_C0000;
>+		start = 0xc0000 + index * (32 << 10);
>+		end = start + (32 << 10);
>+		break;
>+	case MSR_MTRRdefType:
>+		is_fixed = false;
>+		start = 0x0;
>+		end = ~0ULL;
>+		break;
>+	default:
>+		/* variable range MTRRs. */
>+		is_fixed = false;
>+		index = (msr - 0x200) / 2;
>+		start = (((u64)mtrr_state->var_ranges[index].base_hi) << 32) +
>+		       (mtrr_state->var_ranges[index].base_lo & PAGE_MASK);
>+		mask = (((u64)mtrr_state->var_ranges[index].mask_hi) << 32) +
>+		       (mtrr_state->var_ranges[index].mask_lo & PAGE_MASK);
>+		mask |= ~0ULL << cpuid_maxphyaddr(vcpu);
>+
>+		end = ((start & mask) | ~mask) + 1;
>+	}
>+
>+	if (is_fixed && !(mtrr_enabled & 0x1))
>+		return;
>+
>+	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
>+}
>+
> static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> {
> 	u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
>@@ -1885,7 +1942,7 @@ static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> 		*pt = data;
> 	}
> 
>-	kvm_mmu_reset_context(vcpu);
>+	update_mtrr(vcpu, msr);
> 	return 0;
> }
> 
>-- 
>2.1.0
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update
  2015-05-13  8:43   ` Wanpeng Li
@ 2015-05-13 14:10     ` Paolo Bonzini
  2015-05-14  0:16       ` Wanpeng Li
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-05-13 14:10 UTC (permalink / raw)
  To: Wanpeng Li, Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel



On 13/05/2015 10:43, Wanpeng Li wrote:
> kvm_mmu_reset_context
>   kvm_mmu_unload
>     mmu_free_roots
> 
> The original root shadow page will be freed in mmu_free_roots, where I
> miss?
> 
> Another question maybe not related to this patch:
> 
> If kvm_mmu_reset_context is just called to destroy the original root 
> shadow page and all the sptes will remain valid? 

SPTEs are kept around and cached.  The "role" field is used as the hash
key; if the role doesn't change, SPTEs are reused, so you have to zap
the SPTEs explicitly.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 00/10] KVM: MTRR fixes and some cleanups
  2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
                   ` (9 preceding siblings ...)
  2015-05-13  6:42 ` [PATCH v3 10/10] KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed Xiao Guangrong
@ 2015-05-13 14:14 ` Paolo Bonzini
  10 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-05-13 14:14 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel



On 13/05/2015 08:42, Xiao Guangrong wrote:
> Changelog in v3:
> thanks for Paolo's comment:
> - do not apply for_each_rmap_spte to kvm_zap_rmapp and kvm_mmu_unlink_parents

Good idea applying my comment to kvm_mmu_unlink_parents as well. :)

> - fix a cosmetic issue in slot_handle_level_range
> - introduce PT_MAX_HUGEPAGE_LEVEL to clean up the code
> - improve code Indentation

I've squashed together patches 8 and 9, and will apply to kvm/queue soon.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update
  2015-05-13 14:10     ` Paolo Bonzini
@ 2015-05-14  0:16       ` Wanpeng Li
  2015-05-14  8:43         ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2015-05-14  0:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Xiao Guangrong, gleb, mtosatti, kvm, linux-kernel

On Wed, May 13, 2015 at 04:10:08PM +0200, Paolo Bonzini wrote:
>
>
>On 13/05/2015 10:43, Wanpeng Li wrote:
>> kvm_mmu_reset_context
>>   kvm_mmu_unload
>>     mmu_free_roots
>> 
>> The original root shadow page will be freed in mmu_free_roots, where I
>> miss?
>> 
>> Another question maybe not related to this patch:
>> 
>> If kvm_mmu_reset_context is just called to destroy the original root 
>> shadow page and all the sptes will remain valid? 
>
>SPTEs are kept around and cached.  The "role" field is used as the hash
>key; if the role doesn't change, SPTEs are reused, so you have to zap
>the SPTEs explicitly.

Thanks for your explanation. :)

Btw, why the patch changelog mentioned that the root shadow page will be
reused, I think it will be zapped in mmu_free_roots. 

Regards,
Wanpeng Li 

>
>Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update
  2015-05-14  0:16       ` Wanpeng Li
@ 2015-05-14  8:43         ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-05-14  8:43 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Xiao Guangrong, gleb, mtosatti, kvm, linux-kernel



On 14/05/2015 02:16, Wanpeng Li wrote:
> >SPTEs are kept around and cached.  The "role" field is used as the hash
> >key; if the role doesn't change, SPTEs are reused, so you have to zap
> >the SPTEs explicitly.
> 
> Btw, why the patch changelog mentioned that the root shadow page will be
> reused, I think it will be zapped in mmu_free_roots. 

Who has set sp->role.invalid on it?  If no one has, it's not zapped and
it will be found again in the hash table.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-05-13  6:42 ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Xiao Guangrong
  2015-05-13  8:09   ` Wanpeng Li
@ 2015-07-12 17:33   ` Alex Williamson
  2015-07-12 18:59     ` Xiao Guangrong
  2015-07-12 19:12     ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Bandan Das
  1 sibling, 2 replies; 32+ messages in thread
From: Alex Williamson @ 2015-07-12 17:33 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: pbonzini, gleb, mtosatti, kvm, linux-kernel

On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
> There are some bugs in current get_mtrr_type();
> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>    IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>    that means other bits are ignored if it is cleared
> 
> 2: the fixed MTRR ranges are controlled by bit 0 of
>    mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
> 
> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>    than mtrr_state->def_type
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/kvm/mmu.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)


I'm seeing a significant regression in boot performance on Intel
hardware with assigned devices that bisects back to this patch.  There's
a long delay with Seabios between the version splash and execution of
option ROMs, and a _very_ long delay with OVMF before the display is
initialized.  The delay is long enough that users are reporting their
previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,

Alex

> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b78e83f..d00cebd 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
>  static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>  			 u64 start, u64 end)
>  {
> -	int i;
>  	u64 base, mask;
>  	u8 prev_match, curr_match;
> -	int num_var_ranges = KVM_NR_VAR_MTRR;
> +	int i, num_var_ranges = KVM_NR_VAR_MTRR;
>  
> -	if (!mtrr_state->enabled)
> -		return 0xFF;
> +	/* MTRR is completely disabled, use UC for all of physical memory. */
> +	if (!(mtrr_state->enabled & 0x2))
> +		return MTRR_TYPE_UNCACHABLE;
>  
>  	/* Make end inclusive end, instead of exclusive */
>  	end--;
>  
>  	/* Look in fixed ranges. Just return the type as per start */
> -	if (mtrr_state->have_fixed && (start < 0x100000)) {
> +	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
> +	      (start < 0x100000)) {
>  		int idx;
>  
>  		if (start < 0x80000) {
> @@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>  	 * Look of multiple ranges matching this address and pick type
>  	 * as per MTRR precedence
>  	 */
> -	if (!(mtrr_state->enabled & 2))
> -		return mtrr_state->def_type;
> -
>  	prev_match = 0xFF;
>  	for (i = 0; i < num_var_ranges; ++i) {
>  		unsigned short start_state, end_state;




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-07-12 17:33   ` Alex Williamson
@ 2015-07-12 18:59     ` Xiao Guangrong
  2015-07-13  7:32       ` Paolo Bonzini
  2015-07-12 19:12     ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Bandan Das
  1 sibling, 1 reply; 32+ messages in thread
From: Xiao Guangrong @ 2015-07-12 18:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: pbonzini, gleb, mtosatti, kvm, linux-kernel



On 07/13/2015 01:33 AM, Alex Williamson wrote:
> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>> There are some bugs in current get_mtrr_type();
>> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>>     IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>>     that means other bits are ignored if it is cleared
>>
>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>     mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>
>> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>>     than mtrr_state->def_type
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   arch/x86/kvm/mmu.c | 14 ++++++--------
>>   1 file changed, 6 insertions(+), 8 deletions(-)
>
>
> I'm seeing a significant regression in boot performance on Intel
> hardware with assigned devices that bisects back to this patch.  There's
> a long delay with Seabios between the version splash and execution of
> option ROMs, and a _very_ long delay with OVMF before the display is
> initialized.  The delay is long enough that users are reporting their
> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>

Alex, thanks for your report. I will try to reproduce and fix it asap.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-07-12 17:33   ` Alex Williamson
  2015-07-12 18:59     ` Xiao Guangrong
@ 2015-07-12 19:12     ` Bandan Das
  1 sibling, 0 replies; 32+ messages in thread
From: Bandan Das @ 2015-07-12 19:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Xiao Guangrong, pbonzini, gleb, mtosatti, kvm, linux-kernel

Alex Williamson <alex.williamson@redhat.com> writes:

> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>> There are some bugs in current get_mtrr_type();
>> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>>    IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>>    that means other bits are ignored if it is cleared
>> 
>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>    mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>> 
>> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>>    than mtrr_state->def_type
>> 
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>  arch/x86/kvm/mmu.c | 14 ++++++--------
>>  1 file changed, 6 insertions(+), 8 deletions(-)
>
>
> I'm seeing a significant regression in boot performance on Intel
> hardware with assigned devices that bisects back to this patch.  There's
> a long delay with Seabios between the version splash and execution of
> option ROMs, and a _very_ long delay with OVMF before the display is
> initialized.  The delay is long enough that users are reporting their
> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>
> Alex
>
>> 
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index b78e83f..d00cebd 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
>>  static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>>  			 u64 start, u64 end)
>>  {
>> -	int i;
>>  	u64 base, mask;
>>  	u8 prev_match, curr_match;
>> -	int num_var_ranges = KVM_NR_VAR_MTRR;
>> +	int i, num_var_ranges = KVM_NR_VAR_MTRR;
>>  
>> -	if (!mtrr_state->enabled)
>> -		return 0xFF;
>> +	/* MTRR is completely disabled, use UC for all of physical memory. */
>> +	if (!(mtrr_state->enabled & 0x2))
>> +		return MTRR_TYPE_UNCACHABLE;

Setting this to UC if MTRR is disabled is too restrictive maybe..

Bandan

>>  	/* Make end inclusive end, instead of exclusive */
>>  	end--;
>>  
>>  	/* Look in fixed ranges. Just return the type as per start */
>> -	if (mtrr_state->have_fixed && (start < 0x100000)) {
>> +	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
>> +	      (start < 0x100000)) {
>>  		int idx;
>>  
>>  		if (start < 0x80000) {
>> @@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>>  	 * Look of multiple ranges matching this address and pick type
>>  	 * as per MTRR precedence
>>  	 */
>> -	if (!(mtrr_state->enabled & 2))
>> -		return mtrr_state->def_type;
>> -
>>  	prev_match = 0xFF;
>>  	for (i = 0; i < num_var_ranges; ++i) {
>>  		unsigned short start_state, end_state;
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-07-12 18:59     ` Xiao Guangrong
@ 2015-07-13  7:32       ` Paolo Bonzini
  2015-07-13 14:45         ` Xiao Guangrong
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-13  7:32 UTC (permalink / raw)
  To: Xiao Guangrong, Alex Williamson; +Cc: gleb, mtosatti, kvm, linux-kernel



On 12/07/2015 20:59, Xiao Guangrong wrote:
> 
> 
> On 07/13/2015 01:33 AM, Alex Williamson wrote:
>> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>>> There are some bugs in current get_mtrr_type();
>>> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>>>     IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>>>     that means other bits are ignored if it is cleared
>>>
>>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>>     mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>>
>>> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>>>     than mtrr_state->def_type
>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   arch/x86/kvm/mmu.c | 14 ++++++--------
>>>   1 file changed, 6 insertions(+), 8 deletions(-)
>>
>>
>> I'm seeing a significant regression in boot performance on Intel
>> hardware with assigned devices that bisects back to this patch.  There's
>> a long delay with Seabios between the version splash and execution of
>> option ROMs, and a _very_ long delay with OVMF before the display is
>> initialized.  The delay is long enough that users are reporting their
>> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>>
> 
> Alex, thanks for your report. I will try to reproduce and fix it asap.

The code that Bandan pointed out

+	/* MTRR is completely disabled, use UC for all of physical memory. */
+	if (!(mtrr_state->enabled & 0x2))
+		return MTRR_TYPE_UNCACHABLE;

actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
kvm_mtrr_get_guest_memory_type, 2015-06-15).  Should mtrr_default_type
actually be something like this:

static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
{
        if (mtrr_is_enabled(mtrr_state))
                return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
        else
                return MTRR_TYPE_UNCACHABLE;
}

?  Then it's easy to add a quirk that makes the default WRITEBACK until
MTRRs are enabled.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-07-13  7:32       ` Paolo Bonzini
@ 2015-07-13 14:45         ` Xiao Guangrong
  2015-07-13 15:13           ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Xiao Guangrong @ 2015-07-13 14:45 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Williamson; +Cc: gleb, mtosatti, kvm, linux-kernel


On 07/13/2015 03:32 PM, Paolo Bonzini wrote:

>>> I'm seeing a significant regression in boot performance on Intel
>>> hardware with assigned devices that bisects back to this patch.  There's
>>> a long delay with Seabios between the version splash and execution of
>>> option ROMs, and a _very_ long delay with OVMF before the display is
>>> initialized.  The delay is long enough that users are reporting their
>>> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>>>
>>
>> Alex, thanks for your report. I will try to reproduce and fix it asap.
>

I have reproduced the issue and I think Bandan and Paolo is correct.

> The code that Bandan pointed out
>
> +	/* MTRR is completely disabled, use UC for all of physical memory. */
> +	if (!(mtrr_state->enabled & 0x2))
> +		return MTRR_TYPE_UNCACHABLE;
>
> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
> kvm_mtrr_get_guest_memory_type, 2015-06-15).  Should mtrr_default_type
> actually be something like this:

:(

Based on the SDM, UC is applied to all memory rather than default-type
if MTRR is disabled.

If i changed the code to:
   	if (!(mtrr_state->enabled & 0x2))
  		return mtrr_state->def_type;
the result is the same as before.

However, fast boot came back if "return 0xFF" here. So fast boot expects
that the memory type is WB.

>
> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
> {
>          if (mtrr_is_enabled(mtrr_state))
>                  return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
>          else
>                  return MTRR_TYPE_UNCACHABLE;
> }
>
> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
> MTRRs are enabled.

It is the wrong configure in OVMF... shall we need to adjust KVM to satisfy
OVMF?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-07-13 14:45         ` Xiao Guangrong
@ 2015-07-13 15:13           ` Paolo Bonzini
  2015-07-13 15:15             ` Xiao Guangrong
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-13 15:13 UTC (permalink / raw)
  To: Xiao Guangrong, Alex Williamson; +Cc: gleb, mtosatti, kvm, linux-kernel

On 13/07/2015 16:45, Xiao Guangrong wrote:
>> +    /* MTRR is completely disabled, use UC for all of physical
>> memory. */
>> +    if (!(mtrr_state->enabled & 0x2))
>> +        return MTRR_TYPE_UNCACHABLE;
>>
>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
> 
> :(
> 
> Based on the SDM, UC is applied to all memory rather than default-type
> if MTRR is disabled.

There are two issues I think.  One is that I cannot find in the current
code that "UC is applied to all memory rather than default-type if MTRR
is disabled".  mtrr_default_type unconditionally looks at
mtrr_state->deftype.

> However, fast boot came back if "return 0xFF" here. So fast boot expects
> that the memory type is WB.

Yes.

>>
>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>> {
>>          if (mtrr_is_enabled(mtrr_state))
>>                  return mtrr_state->deftype &
>> IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>          else
>>                  return MTRR_TYPE_UNCACHABLE;
>> }
>>
>> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
>> MTRRs are enabled.
> 
> It is the wrong configure in OVMF... shall we need to adjust KVM to satisfy
> OVMF?

That's what quirks are for...  The firmware should still be fixed of course.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR
  2015-07-13 15:13           ` Paolo Bonzini
@ 2015-07-13 15:15             ` Xiao Guangrong
  2015-07-14 21:12               ` MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR] Laszlo Ersek
  0 siblings, 1 reply; 32+ messages in thread
From: Xiao Guangrong @ 2015-07-13 15:15 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Williamson; +Cc: gleb, mtosatti, kvm, linux-kernel



On 07/13/2015 11:13 PM, Paolo Bonzini wrote:
> On 13/07/2015 16:45, Xiao Guangrong wrote:
>>> +    /* MTRR is completely disabled, use UC for all of physical
>>> memory. */
>>> +    if (!(mtrr_state->enabled & 0x2))
>>> +        return MTRR_TYPE_UNCACHABLE;
>>>
>>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>>
>> :(
>>
>> Based on the SDM, UC is applied to all memory rather than default-type
>> if MTRR is disabled.
>
> There are two issues I think.  One is that I cannot find in the current
> code that "UC is applied to all memory rather than default-type if MTRR
> is disabled".  mtrr_default_type unconditionally looks at
> mtrr_state->deftype.

Yes... Will fix.

>
>> However, fast boot came back if "return 0xFF" here. So fast boot expects
>> that the memory type is WB.
>
> Yes.
>
>>>
>>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>>> {
>>>           if (mtrr_is_enabled(mtrr_state))
>>>                   return mtrr_state->deftype &
>>> IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>>           else
>>>                   return MTRR_TYPE_UNCACHABLE;
>>> }
>>>
>>> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
>>> MTRRs are enabled.
>>
>> It is the wrong configure in OVMF... shall we need to adjust KVM to satisfy
>> OVMF?
>
> That's what quirks are for...  The firmware should still be fixed of course.

I see, will do it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-13 15:15             ` Xiao Guangrong
@ 2015-07-14 21:12               ` Laszlo Ersek
  2015-07-14 21:15                 ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Laszlo Ersek @ 2015-07-14 21:12 UTC (permalink / raw)
  To: Xiao Guangrong, Paolo Bonzini, Alex Williamson
  Cc: gleb, mtosatti, kvm, linux-kernel, edk2-devel list,
	Jordan Justen (Intel address)

Cross-posting to edk2-devel.

Original sub-thread starts here:
http://thread.gmane.org/gmane.linux.kernel/1952205/focus=1994315

On 07/13/15 17:15, Xiao Guangrong wrote:
> 
> 
> On 07/13/2015 11:13 PM, Paolo Bonzini wrote:
>> On 13/07/2015 16:45, Xiao Guangrong wrote:
>>>> +    /* MTRR is completely disabled, use UC for all of physical
>>>> memory. */
>>>> +    if (!(mtrr_state->enabled & 0x2))
>>>> +        return MTRR_TYPE_UNCACHABLE;
>>>>
>>>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>>>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>>>
>>> :(
>>>
>>> Based on the SDM, UC is applied to all memory rather than default-type
>>> if MTRR is disabled.
>>
>> There are two issues I think.  One is that I cannot find in the current
>> code that "UC is applied to all memory rather than default-type if MTRR
>> is disabled".  mtrr_default_type unconditionally looks at
>> mtrr_state->deftype.
> 
> Yes... Will fix.
> 
>>
>>> However, fast boot came back if "return 0xFF" here. So fast boot expects
>>> that the memory type is WB.
>>
>> Yes.
>>
>>>>
>>>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>>>> {
>>>>           if (mtrr_is_enabled(mtrr_state))
>>>>                   return mtrr_state->deftype &
>>>> IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>>>           else
>>>>                   return MTRR_TYPE_UNCACHABLE;
>>>> }
>>>>
>>>> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
>>>> MTRRs are enabled.
>>>
>>> It is the wrong configure in OVMF... shall we need to adjust KVM to
>>> satisfy
>>> OVMF?
>>
>> That's what quirks are for...  The firmware should still be fixed of
>> course.
> 
> I see, will do it.

The long delay that Alex reported (for the case when all guest memory
was set to UC up-front) is due to the fact that the SEC phase of OVMF
decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
drivers -- and this decompression is extremely memory-intensive.

(Refer to "Firmware image structure" in the OVMF whitepaper,
<http://www.linux-kvm.org/page/OVMF>.)

Perhaps also significant, with this initial MTRR change: the x86_64
reset vector builds some page tables too. (Refer to "Select features |
X64-specific reset vector for OVMF" in the OVMF whitepaper.)

(When Jordan implemented that reset vector first, we saw similar
performance degradation on AMD hosts (albeit not due to MTRR but due to
page attributes). See
<https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
it here because it makes me appreciate the current problem report.)

Anyway, the reset vector's page table building is implemented in
"OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
(It is recommended to refer heavily to the OVMF whitepaper, or at least
to drink heavily.)

In the PEI phase, we do set up MTRRs sensibly, see
"OvmfPkg/PlatformPei/MemDetect.c", function QemuInitializeRam().
However, that's too late with regard this problem report, because PEI
modules run *from* one of the firmware volumes that SEC expands with LZMA.

Anyway, the logic in QemuInitializeRam() relies on the MtrrLib library
class, which OVMF resolves to the
"UefiCpuPkg/Library/MtrrLib/MtrrLib.inf" instance. The latter has no
client module type restriction (it's BASE), so it could be used in SEC
too, if someone were to write patches.

I'm sorry but I really can't take this on now. So, respectfully:
- please quirk it in KVM for now (SeaBIOS is also affected),
- "patches welcome" for OVMF, as always.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-14 21:12               ` MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR] Laszlo Ersek
@ 2015-07-14 21:15                 ` Paolo Bonzini
  2015-07-14 21:29                   ` Laszlo Ersek
                                     ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-07-14 21:15 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Xiao Guangrong, Alex Williamson, gleb, mtosatti, kvm,
	linux-kernel, edk2-devel list, Jordan Justen (Intel address)

> The long delay that Alex reported (for the case when all guest memory
> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> drivers -- and this decompression is extremely memory-intensive.
> 
> (When Jordan implemented that reset vector first, we saw similar
> performance degradation on AMD hosts (albeit not due to MTRR but due to
> page attributes). See
> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
> it here because it makes me appreciate the current problem report.)
> 
> Anyway, the reset vector's page table building is implemented in
> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().

Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
and set the default type to writeback.

In any case we're going to have to quirk it, because of the broken
guests in the wild.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-14 21:15                 ` Paolo Bonzini
@ 2015-07-14 21:29                   ` Laszlo Ersek
  2015-07-14 22:37                     ` Jordan Justen
  2015-07-15  0:14                   ` [edk2] " Fan, Jeff
  2015-07-15 19:30                   ` Xiao Guangrong
  2 siblings, 1 reply; 32+ messages in thread
From: Laszlo Ersek @ 2015-07-14 21:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Alex Williamson, gleb, mtosatti, kvm,
	linux-kernel, edk2-devel list, Jordan Justen (Intel address)

On 07/14/15 23:15, Paolo Bonzini wrote:
>> The long delay that Alex reported (for the case when all guest memory
>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>> drivers -- and this decompression is extremely memory-intensive.
>>
>> (When Jordan implemented that reset vector first, we saw similar
>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>> page attributes). See
>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>> it here because it makes me appreciate the current problem report.)
>>
>> Anyway, the reset vector's page table building is implemented in
>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
> 
> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?

That's an idea, yes, if someone feels sufficiently drawn to writing
assembly. Complications:
- the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
  build
- it needs to be determined what memory to cover.

> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> and set the default type to writeback.

Seems safe to me, off the top of my head (and testing could confirm /
disprove quickly).

> In any case we're going to have to quirk it, because of the broken
> guests in the wild.

Thanks.
Laszlo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-14 21:29                   ` Laszlo Ersek
@ 2015-07-14 22:37                     ` Jordan Justen
  2015-07-15  9:57                       ` Laszlo Ersek
  0 siblings, 1 reply; 32+ messages in thread
From: Jordan Justen @ 2015-07-14 22:37 UTC (permalink / raw)
  To: Laszlo Ersek, Paolo Bonzini
  Cc: Xiao Guangrong, Alex Williamson, gleb, mtosatti, kvm,
	linux-kernel, edk2-devel list

On 2015-07-14 14:29:11, Laszlo Ersek wrote:
> On 07/14/15 23:15, Paolo Bonzini wrote:
> >> The long delay that Alex reported (for the case when all guest memory
> >> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> >> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> >> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> >> drivers -- and this decompression is extremely memory-intensive.
> >>
> >> (When Jordan implemented that reset vector first, we saw similar
> >> performance degradation on AMD hosts (albeit not due to MTRR but due to
> >> page attributes). See
> >> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
> >> it here because it makes me appreciate the current problem report.)
> >>
> >> Anyway, the reset vector's page table building is implemented in
> >> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> >> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
> > 
> > Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
> 
> That's an idea, yes, if someone feels sufficiently drawn to writing
> assembly.

Maybe we can use MtrrLib in the SEC C code?

-Jordan

> Complications:
> - the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
>   build
> - it needs to be determined what memory to cover.
> 
> > I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> > and set the default type to writeback.
> 
> Seems safe to me, off the top of my head (and testing could confirm /
> disprove quickly).
> 
> > In any case we're going to have to quirk it, because of the broken
> > guests in the wild.
> 
> Thanks.
> Laszlo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [edk2] MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-14 21:15                 ` Paolo Bonzini
  2015-07-14 21:29                   ` Laszlo Ersek
@ 2015-07-15  0:14                   ` Fan, Jeff
  2015-07-15 19:30                   ` Xiao Guangrong
  2 siblings, 0 replies; 32+ messages in thread
From: Fan, Jeff @ 2015-07-15  0:14 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek
  Cc: edk2-devel list, Xiao Guangrong, kvm, gleb, linux-kernel

Actually, MMIO will be used in OVMF SEC phase if local APIC is consumed. (SecPeiDebugAgentLib will consume local APIC timer for communication between debugger TARGET/HOST).

So, I suggest to keep MTRR default value to UC and set the code range to WB, or set default value to WB and set Local APIC space range to UC. (gUefiCpuPkgTokenSpaceGuid.PcdCpuLocalApicBaseAddress|0xfee00000)

As Jordan's suggestion, you could make use of MTRR lib in UefiCpuPkg.

Jeff

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com] 
Sent: Wednesday, July 15, 2015 5:16 AM
To: Laszlo Ersek
Cc: edk2-devel list; Xiao Guangrong; kvm@vger.kernel.org; gleb@kernel.org; mtosatti@redhat.com; linux-kernel@vger.kernel.org
Subject: Re: [edk2] MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

> The long delay that Alex reported (for the case when all guest memory 
> was set to UC up-front) is due to the fact that the SEC phase of OVMF 
> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to 
> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI 
> drivers -- and this decompression is extremely memory-intensive.
> 
> (When Jordan implemented that reset vector first, we saw similar 
> performance degradation on AMD hosts (albeit not due to MTRR but due 
> to page attributes). See 
> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only 
> mentioning it here because it makes me appreciate the current problem 
> report.)
> 
> Anyway, the reset vector's page table building is implemented in 
> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC 
> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().

Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs and set the default type to writeback.

In any case we're going to have to quirk it, because of the broken guests in the wild.

Paolo

------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-14 22:37                     ` Jordan Justen
@ 2015-07-15  9:57                       ` Laszlo Ersek
  0 siblings, 0 replies; 32+ messages in thread
From: Laszlo Ersek @ 2015-07-15  9:57 UTC (permalink / raw)
  To: Jordan Justen
  Cc: Paolo Bonzini, Xiao Guangrong, Alex Williamson, gleb, mtosatti,
	kvm, linux-kernel, edk2-devel list

On 07/15/15 00:37, Jordan Justen wrote:
> On 2015-07-14 14:29:11, Laszlo Ersek wrote:
>> On 07/14/15 23:15, Paolo Bonzini wrote:
>>>> The long delay that Alex reported (for the case when all guest memory
>>>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>>>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>>>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>>>> drivers -- and this decompression is extremely memory-intensive.
>>>>
>>>> (When Jordan implemented that reset vector first, we saw similar
>>>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>>>> page attributes). See
>>>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>>>> it here because it makes me appreciate the current problem report.)
>>>>
>>>> Anyway, the reset vector's page table building is implemented in
>>>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>>>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>>>
>>> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>>
>> That's an idea, yes, if someone feels sufficiently drawn to writing
>> assembly.
> 
> Maybe we can use MtrrLib in the SEC C code?

If the page table building in the reset vector is not too costly with
UC, then yes, it should suffice if MtrrLib was put to use first just
before the decompression in SEC.

Thanks
Laszlo

> -Jordan
> 
>> Complications:
>> - the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
>>   build
>> - it needs to be determined what memory to cover.
>>
>>> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
>>> and set the default type to writeback.
>>
>> Seems safe to me, off the top of my head (and testing could confirm /
>> disprove quickly).
>>
>>> In any case we're going to have to quirk it, because of the broken
>>> guests in the wild.
>>
>> Thanks.
>> Laszlo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-14 21:15                 ` Paolo Bonzini
  2015-07-14 21:29                   ` Laszlo Ersek
  2015-07-15  0:14                   ` [edk2] " Fan, Jeff
@ 2015-07-15 19:30                   ` Xiao Guangrong
  2015-07-15 19:41                     ` Laszlo Ersek
  2 siblings, 1 reply; 32+ messages in thread
From: Xiao Guangrong @ 2015-07-15 19:30 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek
  Cc: Alex Williamson, gleb, mtosatti, kvm, linux-kernel,
	edk2-devel list, Jordan Justen (Intel address)


Hi,

I have posted the pachset to make OVMF happy and have CCed you guys,
could you please check it if it works for you?


On 07/15/2015 05:15 AM, Paolo Bonzini wrote:
>> The long delay that Alex reported (for the case when all guest memory
>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>> drivers -- and this decompression is extremely memory-intensive.
>>
>> (When Jordan implemented that reset vector first, we saw similar
>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>> page attributes). See
>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>> it here because it makes me appreciate the current problem report.)
>>
>> Anyway, the reset vector's page table building is implemented in
>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>
> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> and set the default type to writeback.
>
> In any case we're going to have to quirk it, because of the broken
> guests in the wild.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]
  2015-07-15 19:30                   ` Xiao Guangrong
@ 2015-07-15 19:41                     ` Laszlo Ersek
  0 siblings, 0 replies; 32+ messages in thread
From: Laszlo Ersek @ 2015-07-15 19:41 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Alex Williamson, gleb, mtosatti, kvm,
	linux-kernel, edk2-devel list, Jordan Justen (Intel address)

On 07/15/15 21:30, Xiao Guangrong wrote:
> 
> Hi,
> 
> I have posted the pachset to make OVMF happy and have CCed you guys,
> could you please check it if it works for you?

Sorry, I can't check it; I don't have an environment that exposes the
issue in the first place. Perhaps Alex can check it, or refer some of
the users that reported the problem to him to this patchset? (Those
users were using bleeding edge v4.2-rc1 anyway, unlike conservative me.)

Thanks!
Laszlo

> On 07/15/2015 05:15 AM, Paolo Bonzini wrote:
>>> The long delay that Alex reported (for the case when all guest memory
>>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>>> drivers -- and this decompression is extremely memory-intensive.
>>>
>>> (When Jordan implemented that reset vector first, we saw similar
>>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>>> page attributes). See
>>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>>> it here because it makes me appreciate the current problem report.)
>>>
>>> Anyway, the reset vector's page table building is implemented in
>>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>>
>> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
>> and set the default type to writeback.
>>
>> In any case we're going to have to quirk it, because of the broken
>> guests in the wild.
>>
>> Paolo
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-07-15 19:41 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13  6:42 [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Xiao Guangrong
2015-05-13  8:09   ` Wanpeng Li
2015-07-12 17:33   ` Alex Williamson
2015-07-12 18:59     ` Xiao Guangrong
2015-07-13  7:32       ` Paolo Bonzini
2015-07-13 14:45         ` Xiao Guangrong
2015-07-13 15:13           ` Paolo Bonzini
2015-07-13 15:15             ` Xiao Guangrong
2015-07-14 21:12               ` MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR] Laszlo Ersek
2015-07-14 21:15                 ` Paolo Bonzini
2015-07-14 21:29                   ` Laszlo Ersek
2015-07-14 22:37                     ` Jordan Justen
2015-07-15  9:57                       ` Laszlo Ersek
2015-07-15  0:14                   ` [edk2] " Fan, Jeff
2015-07-15 19:30                   ` Xiao Guangrong
2015-07-15 19:41                     ` Laszlo Ersek
2015-07-12 19:12     ` [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR Bandan Das
2015-05-13  6:42 ` [PATCH v3 02/10] KVM: MMU: introduce for_each_rmap_spte() Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 03/10] KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 04/10] KVM: MMU: introduce for_each_slot_rmap_range Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 05/10] KVM: MMU: introduce slot_handle_level_range() and its helpers Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 06/10] KVM: MMU: use slot_handle_level and its helper to clean up the code Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 07/10] KVM: MMU: introduce kvm_zap_rmapp Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 08/10] KVM: MMU: introduce kvm_zap_gfn_range Xiao Guangrong
2015-05-13  6:42 ` [PATCH v3 09/10] KVM: MMU: fix MTRR update Xiao Guangrong
2015-05-13  8:43   ` Wanpeng Li
2015-05-13 14:10     ` Paolo Bonzini
2015-05-14  0:16       ` Wanpeng Li
2015-05-14  8:43         ` Paolo Bonzini
2015-05-13  6:42 ` [PATCH v3 10/10] KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed Xiao Guangrong
2015-05-13 14:14 ` [PATCH v3 00/10] KVM: MTRR fixes and some cleanups Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).