* [PATCH 0/4] Fix nested EPT when A/D bits are disabled
@ 2017-07-01  0:26 Peter Feiner
  2017-07-01  0:26 ` [PATCH 1/4] x86: kvm: mmu: dead code thanks to access tracking Peter Feiner
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Peter Feiner @ 2017-07-01  0:26 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, David Matlack, Peter Feiner

This patch series fixes nested EPT when A/D bits are disabled in vmcs12 but
enabled in the kvm_intel module. Concretely, this series makes the
vmx_ept_access_test_paddr_not_present_ad_disabled kvm-unit-test pass instead
of getting stuck in an infinite loop.
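
For orientation, here is a minimal sketch (not taken from KVM or the
unit test) of the configuration that trips the bug: L1 programs a
vmcs12 EPTP with the A/D-enable bit clear while L0's kvm_intel module
was loaded with EPT A/D support. The constants mirror the SDM/vmx.h
definitions, but the helper itself is purely illustrative.

#include <stdint.h>

#define EPTP_MT_WB          6ULL          /* write-back memory type        */
#define EPTP_WALK_LEN_4     (3ULL << 3)   /* 4-level EPT page walk         */
#define EPTP_AD_ENABLE_BIT  (1ULL << 6)   /* mirrors VMX_EPT_AD_ENABLE_BIT */

/* Build an EPTP that leaves EPT A/D disabled for this MMU. */
static inline uint64_t build_eptp_without_ad(uint64_t ept_root_hpa)
{
	/* EPTP_AD_ENABLE_BIT is deliberately left clear: the fix is to
	 * make the vmcs02 respect this choice instead of silently
	 * enabling A/D on L1's behalf. */
	return EPTP_MT_WB | EPTP_WALK_LEN_4 | (ept_root_hpa & ~0xfffULL);
}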

I'm on vacation next week, so I won't see any feedback on this series until
July 10.

Thanks for reviewing!

Peter

Peter Feiner (4):
  x86: kvm: mmu: dead code thanks to access tracking
  x86: kvm: mmu: make spte mmio mask more explicit
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/mmu.c              | 151 ++++++++++++++++++++++++++--------------
 arch/x86/kvm/mmu.h              |   2 +-
 arch/x86/kvm/mmutrace.h         |   6 +-
 arch/x86/kvm/vmx.c              |  43 ++++++------
 arch/x86/kvm/x86.c              |   2 +-
 6 files changed, 125 insertions(+), 82 deletions(-)

-- 
2.13.2.725.g09c95d1e9-goog

* [PATCH 1/4] x86: kvm: mmu: dead code thanks to access tracking
  2017-07-01  0:26 [PATCH 0/4] Fix nested EPT when A/D bits are disabled Peter Feiner
@ 2017-07-01  0:26 ` Peter Feiner
  2017-07-01  0:26 ` [PATCH 2/4] x86: kvm: mmu: make spte mmio mask more explicit Peter Feiner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Peter Feiner @ 2017-07-01  0:26 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, David Matlack, Peter Feiner

The MMU always has either hardware A bits or access-tracking support,
so it's unnecessary to handle the scenario where we have neither.

Signed-off-by: Peter Feiner <pfeiner@google.com>
---
 arch/x86/kvm/mmu.c | 30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5d3376f67794..dfd4cd67e5a6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -315,12 +315,21 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 	return likely(kvm_gen == spte_gen);
 }
 
+/*
+ * Sets the shadow PTE masks used by the MMU.
+ *
+ * Assumptions:
+ *  - Setting either @accessed_mask or @dirty_mask requires setting both
+ *  - At least one of @accessed_mask or @acc_track_mask must be set
+ */
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
 		u64 acc_track_mask)
 {
 	if (acc_track_mask != 0)
 		acc_track_mask |= SPTE_SPECIAL_MASK;
+	BUG_ON(!dirty_mask != !accessed_mask);
+	BUG_ON(!accessed_mask && !acc_track_mask);
 
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
@@ -1766,18 +1775,9 @@ static int kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	/*
-	 * If there's no access bit in the secondary pte set by the hardware and
-	 * fast access tracking is also not enabled, it's up to gup-fast/gup to
-	 * set the access bit in the primary pte or in the page structure.
-	 */
-	if (!shadow_accessed_mask && !shadow_acc_track_mask)
-		goto out;
-
 	for_each_rmap_spte(rmap_head, &iter, sptep)
 		if (is_accessed_spte(*sptep))
 			return 1;
-out:
 	return 0;
 }
 
@@ -1798,18 +1798,6 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 {
-	/*
-	 * In case of absence of EPT Access and Dirty Bits supports,
-	 * emulate the accessed bit for EPT, by checking if this page has
-	 * an EPT mapping, and clearing it if it does. On the next access,
-	 * a new EPT mapping will be established.
-	 * This has some overhead, but not as much as the cost of swapping
-	 * out actively used pages or breaking up actively used hugepages.
-	 */
-	if (!shadow_accessed_mask && !shadow_acc_track_mask)
-		return kvm_handle_hva_range(kvm, start, end, 0,
-					    kvm_unmap_rmapp);
-
 	return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
 }
 
-- 
2.13.2.725.g09c95d1e9-goog

* [PATCH 2/4] x86: kvm: mmu: make spte mmio mask more explicit
  2017-07-01  0:26 [PATCH 0/4] Fix nested EPT when A/D bits are disabled Peter Feiner
  2017-07-01  0:26 ` [PATCH 1/4] x86: kvm: mmu: dead code thanks to access tracking Peter Feiner
@ 2017-07-01  0:26 ` Peter Feiner
  2017-07-01  0:26 ` [PATCH 3/4] kvm: x86: mmu: allow A/D bits to be disabled in an mmu Peter Feiner
  2017-07-01  0:26 ` [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 Peter Feiner
  3 siblings, 0 replies; 9+ messages in thread
From: Peter Feiner @ 2017-07-01  0:26 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, David Matlack, Peter Feiner

Specify both a mask (i.e., bits to consider) and a value (i.e., the
pattern of bits that indicates a special PTE) for mmio SPTEs. On
Intel, this lets us pack even more information into the
(SPTE_SPECIAL_MASK | VMX_EPT_RWX_MASK) mask we use for access
tracking, liberating all (SPTE_SPECIAL_MASK | (non-misconfigured-RWX))
values.
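
To make the mask/value distinction concrete, here is a small sketch
(illustrative bit layout, not the real SPTE encoding): a PTE is an
MMIO marker only when the bits selected by the mask match the chosen
value exactly, so every other pattern under the same mask stays free
for other uses such as access tracking.

#include <stdbool.h>
#include <stdint.h>

#define SPECIAL_BIT  (1ULL << 62)              /* stand-in for SPTE_SPECIAL_MASK */
#define RWX_BITS     0x7ULL                    /* stand-in for VMX_EPT_RWX_MASK  */
#define MMIO_MASK    (SPECIAL_BIT | RWX_BITS)
#define MMIO_VALUE   (SPECIAL_BIT | 0x6ULL)    /* write/execute misconfiguration */

static bool is_mmio(uint64_t pte)
{
	/* The old scheme tested (pte & MMIO_MASK) == MMIO_MASK, which
	 * reserves only the all-bits-set pattern; matching against a
	 * distinct value frees the remaining (SPECIAL_BIT | RWX)
	 * combinations. */
	return (pte & MMIO_MASK) == MMIO_VALUE;
}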

Signed-off-by: Peter Feiner <pfeiner@google.com>
---
 arch/x86/kvm/mmu.c | 9 ++++++---
 arch/x86/kvm/mmu.h | 2 +-
 arch/x86/kvm/vmx.c | 3 ++-
 arch/x86/kvm/x86.c | 2 +-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dfd4cd67e5a6..10b3cfc7b411 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -183,6 +183,7 @@ static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
+static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_present_mask;
 
 /*
@@ -207,8 +208,10 @@ static const u64 shadow_acc_track_saved_bits_shift = PT64_SECOND_AVAIL_BITS_SHIF
 static void mmu_spte_set(u64 *sptep, u64 spte);
 static void mmu_free_roots(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
+void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
 {
+	BUG_ON((mmio_mask & mmio_value) != mmio_value);
+	shadow_mmio_value = mmio_value | SPTE_SPECIAL_MASK;
 	shadow_mmio_mask = mmio_mask | SPTE_SPECIAL_MASK;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
@@ -270,7 +273,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 	u64 mask = generation_mmio_spte_mask(gen);
 
 	access &= ACC_WRITE_MASK | ACC_USER_MASK;
-	mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
+	mask |= shadow_mmio_value | access | gfn << PAGE_SHIFT;
 
 	trace_mark_mmio_spte(sptep, gfn, access, gen);
 	mmu_spte_set(sptep, mask);
@@ -278,7 +281,7 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 
 static bool is_mmio_spte(u64 spte)
 {
-	return (spte & shadow_mmio_mask) == shadow_mmio_mask;
+	return (spte & shadow_mmio_mask) == shadow_mmio_value;
 }
 
 static gfn_t get_mmio_spte_gfn(u64 spte)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 27975807cc64..41d362e95681 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -51,7 +51,7 @@ static inline u64 rsvd_bits(int s, int e)
 	return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
-void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
+void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value);
 
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6dec552b28f..e59b01a1d431 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5163,7 +5163,8 @@ static void ept_set_mmio_spte_mask(void)
 	 * EPT Misconfigurations can be generated if the value of bits 2:0
 	 * of an EPT paging-structure entry is 110b (write/execute).
 	 */
-	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE);
+	kvm_mmu_set_mmio_spte_mask(VMX_EPT_RWX_MASK,
+				   VMX_EPT_MISCONFIG_WX_VALUE);
 }
 
 #define VMX_XSS_EXIT_BITMAP 0
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2cd0997343c..ac6eb99b99b5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6009,7 +6009,7 @@ static void kvm_set_mmio_spte_mask(void)
 		mask &= ~1ull;
 #endif
 
-	kvm_mmu_set_mmio_spte_mask(mask);
+	kvm_mmu_set_mmio_spte_mask(mask, mask);
 }
 
 #ifdef CONFIG_X86_64
-- 
2.13.2.725.g09c95d1e9-goog

* [PATCH 3/4] kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  2017-07-01  0:26 [PATCH 0/4] Fix nested EPT when A/D bits are disabled Peter Feiner
  2017-07-01  0:26 ` [PATCH 1/4] x86: kvm: mmu: dead code thanks to access tracking Peter Feiner
  2017-07-01  0:26 ` [PATCH 2/4] x86: kvm: mmu: make spte mmio mask more explicit Peter Feiner
@ 2017-07-01  0:26 ` Peter Feiner
  2017-07-03  9:22   ` Paolo Bonzini
  2017-07-01  0:26 ` [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 Peter Feiner
  3 siblings, 1 reply; 9+ messages in thread
From: Peter Feiner @ 2017-07-01  0:26 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, David Matlack, Peter Feiner

Adds the plumbing to disable A/D bits in the MMU based on a new role
bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D
aren't available (i.e., using access tracking faults instead).

To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the
place, A/D disablement is now stored in the SPTE. This state is stored
in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access
tracking. Rather than just setting SPTE_SPECIAL_MASK when an
access-tracking SPTE is non-present, we now always set
SPTE_SPECIAL_MASK for access-tracking SPTEs.
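
A rough decision table for how SPTE_SPECIAL_MASK is interpreted after
this change (a sketch for orientation only; the bit value is a
stand-in, the helper is made up, and MMIO SPTEs are assumed to have
been filtered out separately):

#include <stdint.h>

#define SPTE_SPECIAL_BIT  (1ULL << 62)   /* stand-in for SPTE_SPECIAL_MASK */

enum spte_kind { AD_ENABLED, AD_DISABLED, ACCESS_TRACK };

static enum spte_kind classify_spte(uint64_t spte, uint64_t rwx_mask)
{
	if (!(spte & SPTE_SPECIAL_BIT))
		return AD_ENABLED;      /* normal SPTE, hardware A/D in use  */
	if (spte & rwx_mask)
		return AD_DISABLED;     /* present SPTE of an A/D-less MMU   */
	return ACCESS_TRACK;            /* non-present, access-tracking SPTE */
}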

Signed-off-by: Peter Feiner <pfeiner@google.com>
---
 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/mmu.c              | 111 +++++++++++++++++++++++++++++-----------
 arch/x86/kvm/mmutrace.h         |   6 ++-
 3 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605eb1dfb..2f19c680c3fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -254,7 +254,8 @@ union kvm_mmu_page_role {
 		unsigned cr0_wp:1;
 		unsigned smep_andnot_wp:1;
 		unsigned smap_andnot_wp:1;
-		unsigned :8;
+		unsigned ad_disabled:1;
+		unsigned :7;
 
 		/*
 		 * This is left at the top of the word so that
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10b3cfc7b411..e814fb6248ac 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -187,10 +187,9 @@ static u64 __read_mostly shadow_mmio_value;
 static u64 __read_mostly shadow_present_mask;
 
 /*
- * The mask/value to distinguish a PTE that has been marked not-present for
- * access tracking purposes.
- * The mask would be either 0 if access tracking is disabled, or
- * SPTE_SPECIAL_MASK|VMX_EPT_RWX_MASK if access tracking is enabled.
+ * SPTEs used by MMUs without A/D bits are marked with shadow_acc_track_value.
+ * Non-present SPTEs with shadow_acc_track_value set are in place for access
+ * tracking.
  */
 static u64 __read_mostly shadow_acc_track_mask;
 static const u64 shadow_acc_track_value = SPTE_SPECIAL_MASK;
@@ -216,10 +215,29 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
+static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
+{
+	return shadow_accessed_mask == 0 || sp->role.ad_disabled;
+}
+
+static inline bool spte_ad_enabled(u64 spte)
+{
+	return !(spte & shadow_acc_track_value);
+}
+
+static u64 spte_shadow_accessed_mask(u64 spte)
+{
+	return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
+}
+
+static u64 spte_shadow_dirty_mask(u64 spte)
+{
+	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
+}
+
 static inline bool is_access_track_spte(u64 spte)
 {
-	/* Always false if shadow_acc_track_mask is zero.  */
-	return (spte & shadow_acc_track_mask) == shadow_acc_track_value;
+	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
 }
 
 /*
@@ -329,10 +347,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
 		u64 acc_track_mask)
 {
-	if (acc_track_mask != 0)
-		acc_track_mask |= SPTE_SPECIAL_MASK;
 	BUG_ON(!dirty_mask != !accessed_mask);
 	BUG_ON(!accessed_mask && !acc_track_mask);
+	BUG_ON(acc_track_mask & shadow_acc_track_value);
 
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
@@ -341,7 +358,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 	shadow_x_mask = x_mask;
 	shadow_present_mask = p_mask;
 	shadow_acc_track_mask = acc_track_mask;
-	WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -561,7 +577,7 @@ static bool spte_has_volatile_bits(u64 spte)
 	    is_access_track_spte(spte))
 		return true;
 
-	if (shadow_accessed_mask) {
+	if (spte_ad_enabled(spte)) {
 		if ((spte & shadow_accessed_mask) == 0 ||
 	    	    (is_writable_pte(spte) && (spte & shadow_dirty_mask) == 0))
 			return true;
@@ -572,14 +588,17 @@ static bool spte_has_volatile_bits(u64 spte)
 
 static bool is_accessed_spte(u64 spte)
 {
-	return shadow_accessed_mask ? spte & shadow_accessed_mask
-				    : !is_access_track_spte(spte);
+	u64 accessed_mask = spte_shadow_accessed_mask(spte);
+
+	return accessed_mask ? spte & accessed_mask
+			     : !is_access_track_spte(spte);
 }
 
 static bool is_dirty_spte(u64 spte)
 {
-	return shadow_dirty_mask ? spte & shadow_dirty_mask
-				 : spte & PT_WRITABLE_MASK;
+	u64 dirty_mask = spte_shadow_dirty_mask(spte);
+
+	return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
 }
 
 /* Rules for using mmu_spte_set:
@@ -719,10 +738,10 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
 
 static u64 mark_spte_for_access_track(u64 spte)
 {
-	if (shadow_accessed_mask != 0)
+	if (spte_ad_enabled(spte))
 		return spte & ~shadow_accessed_mask;
 
-	if (shadow_acc_track_mask == 0 || is_access_track_spte(spte))
+	if (is_access_track_spte(spte))
 		return spte;
 
 	/*
@@ -741,7 +760,6 @@ static u64 mark_spte_for_access_track(u64 spte)
 	spte |= (spte & shadow_acc_track_saved_bits_mask) <<
 		shadow_acc_track_saved_bits_shift;
 	spte &= ~shadow_acc_track_mask;
-	spte |= shadow_acc_track_value;
 
 	return spte;
 }
@@ -753,6 +771,7 @@ static u64 restore_acc_track_spte(u64 spte)
 	u64 saved_bits = (spte >> shadow_acc_track_saved_bits_shift)
 			 & shadow_acc_track_saved_bits_mask;
 
+	WARN_ON_ONCE(spte_ad_enabled(spte));
 	WARN_ON_ONCE(!is_access_track_spte(spte));
 
 	new_spte &= ~shadow_acc_track_mask;
@@ -771,7 +790,7 @@ static bool mmu_spte_age(u64 *sptep)
 	if (!is_accessed_spte(spte))
 		return false;
 
-	if (shadow_accessed_mask) {
+	if (spte_ad_enabled(spte)) {
 		clear_bit((ffs(shadow_accessed_mask) - 1),
 			  (unsigned long *)sptep);
 	} else {
@@ -1402,6 +1421,22 @@ static bool spte_clear_dirty(u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
+static bool wrprot_ad_disabled_spte(u64 *sptep)
+{
+	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
+					       (unsigned long *)sptep);
+	if (was_writable)
+		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
+
+	return was_writable;
+}
+
+/*
+ * Gets the GFN ready for another round of dirty logging by clearing the
+ *	- D bit on ad-enabled SPTEs, and
+ *	- W bit on ad-disabled SPTEs.
+ * Returns true iff any D or W bits were cleared.
+ */
 static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
@@ -1409,7 +1444,10 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 	bool flush = false;
 
 	for_each_rmap_spte(rmap_head, &iter, sptep)
-		flush |= spte_clear_dirty(sptep);
+		if (spte_ad_enabled(*sptep))
+			flush |= spte_clear_dirty(sptep);
+		else
+			flush |= wrprot_ad_disabled_spte(sptep);
 
 	return flush;
 }
@@ -1432,7 +1470,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 	bool flush = false;
 
 	for_each_rmap_spte(rmap_head, &iter, sptep)
-		flush |= spte_set_dirty(sptep);
+		if (spte_ad_enabled(*sptep))
+			flush |= spte_set_dirty(sptep);
 
 	return flush;
 }
@@ -1464,7 +1503,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 }
 
 /**
- * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages
+ * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
+ * protect the page if the D-bit isn't supported.
  * @kvm: kvm instance
  * @slot: slot to clear D-bit
  * @gfn_offset: start of the BITS_PER_LONG pages we care about
@@ -2389,7 +2429,12 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
 	spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+	       shadow_user_mask | shadow_x_mask;
+
+	if (sp_ad_disabled(sp))
+		spte |= shadow_acc_track_value;
+	else
+		spte |= shadow_accessed_mask;
 
 	mmu_spte_set(sptep, spte);
 
@@ -2657,10 +2702,15 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 {
 	u64 spte = 0;
 	int ret = 0;
+	struct kvm_mmu_page *sp;
 
 	if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
 		return 0;
 
+	sp = page_header(__pa(sptep));
+	if (sp_ad_disabled(sp))
+		spte |= shadow_acc_track_value;
+
 	/*
 	 * For the EPT case, shadow_present_mask is 0 if hardware
 	 * supports exec-only page table entries.  In that case,
@@ -2669,7 +2719,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	 */
 	spte |= shadow_present_mask;
 	if (!speculative)
-		spte |= shadow_accessed_mask;
+		spte |= spte_shadow_accessed_mask(spte);
 
 	if (pte_access & ACC_EXEC_MASK)
 		spte |= shadow_x_mask;
@@ -2726,7 +2776,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	if (pte_access & ACC_WRITE_MASK) {
 		kvm_vcpu_mark_page_dirty(vcpu, gfn);
-		spte |= shadow_dirty_mask;
+		spte |= spte_shadow_dirty_mask(spte);
 	}
 
 	if (speculative)
@@ -2868,16 +2918,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 {
 	struct kvm_mmu_page *sp;
 
+	sp = page_header(__pa(sptep));
+
 	/*
-	 * Since it's no accessed bit on EPT, it's no way to
-	 * distinguish between actually accessed translations
-	 * and prefetched, so disable pte prefetch if EPT is
-	 * enabled.
+	 * Without accessed bits, there's no way to distinguish between
+	 * actually accessed translations and prefetched, so disable pte
+	 * prefetch if accessed bits aren't available.
 	 */
-	if (!shadow_accessed_mask)
+	if (sp_ad_disabled(sp))
 		return;
 
-	sp = page_header(__pa(sptep));
 	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
 		return;
 
@@ -4624,6 +4674,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	mask.smep_andnot_wp = 1;
 	mask.smap_andnot_wp = 1;
 	mask.smm = 1;
+	mask.ad_disabled = 1;
 
 	/*
 	 * If we don't have indirect shadow pages, it means no page is
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 5a24b846a1cb..8b97a6cba8d1 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -30,8 +30,9 @@
 								        \
 	role.word = __entry->role;					\
 									\
-	trace_seq_printf(p, "sp gen %lx gfn %llx %u%s q%u%s %s%s"	\
-			 " %snxe root %u %s%c",	__entry->mmu_valid_gen,	\
+	trace_seq_printf(p, "sp gen %lx gfn %llx l%u%s q%u%s %s%s"	\
+			 " %snxe %sad root %u %s%c",			\
+			 __entry->mmu_valid_gen,			\
 			 __entry->gfn, role.level,			\
 			 role.cr4_pae ? " pae" : "",			\
 			 role.quadrant,					\
@@ -39,6 +40,7 @@
 			 access_str[role.access],			\
 			 role.invalid ? " invalid" : "",		\
 			 role.nxe ? "" : "!",				\
+			 role.ad_disabled ? "!" : "",			\
 			 __entry->root_count,				\
 			 __entry->unsync ? "unsync" : "sync", 0);	\
 	saved_ptr;							\
-- 
2.13.2.725.g09c95d1e9-goog

* [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  2017-07-01  0:26 [PATCH 0/4] Fix nested EPT when A/D bits are disabled Peter Feiner
                   ` (2 preceding siblings ...)
  2017-07-01  0:26 ` [PATCH 3/4] kvm: x86: mmu: allow A/D bits to be disabled in an mmu Peter Feiner
@ 2017-07-01  0:26 ` Peter Feiner
  2017-07-01  5:29   ` Paolo Bonzini
  3 siblings, 1 reply; 9+ messages in thread
From: Peter Feiner @ 2017-07-01  0:26 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, David Matlack, Peter Feiner

EPT A/D was enabled in the vmcs02 EPTP regardless of the vmcs12's EPTP
value. The problem is that enabling A/D changes the behavior of L2's
x86 page table walks as seen by L1. With A/D enabled, x86 page table
walks are always treated as EPT writes.

The EPT A/D emulation patch (XXX) tried to work around this problem
by clearing the write bit in the exit qualification for EPT violations
triggered by page walks. However, that fixup introduced the opposite
bug: page-table walks that actually set x86 A/D bits were *missing*
the write bit in the exit qualification.

This patch fixes the problem by disabling EPT A/D in the shadow MMU
when EPT A/D is disabled in vmcs12's EPTP.
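
The rule the patch encodes when building the vmcs02 EPTP, restated as
a standalone sketch (names mirror the kernel's, but the function is
illustrative, not the patched construct_eptp() below):

#include <stdbool.h>
#include <stdint.h>

#define EPT_AD_ENABLE_BIT  (1ULL << 6)   /* mirrors VMX_EPT_AD_ENABLE_BIT */

/* A/D goes into the vmcs02 EPTP only if the host supports it and,
 * when the vCPU is in guest mode, only if L1 set it in the vmcs12. */
static uint64_t vmcs02_ad_flag(bool host_ept_ad, bool in_guest_mode,
			       uint64_t vmcs12_eptp)
{
	bool l1_wants_ad = vmcs12_eptp & EPT_AD_ENABLE_BIT;

	if (host_ept_ad && (!in_guest_mode || l1_wants_ad))
		return EPT_AD_ENABLE_BIT;
	return 0;
}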

Signed-off-by: Peter Feiner <pfeiner@google.com>
---
 arch/x86/kvm/mmu.c |  1 +
 arch/x86/kvm/vmx.c | 40 ++++++++++++++++++----------------------
 2 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e814fb6248ac..d483f81699fe 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4415,6 +4415,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->root_level = context->shadow_root_level;
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = false;
+	context->base_role.ad_disabled = !accessed_dirty;
 
 	update_permission_bitmask(vcpu, context, true);
 	update_pkru_bitmask(vcpu, context, true);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e59b01a1d431..58137690d57d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -910,8 +910,9 @@ static void nested_release_page_clean(struct page *page)
 	kvm_release_page_clean(page);
 }
 
+static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu);
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu);
-static u64 construct_eptp(unsigned long root_hpa);
+static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa);
 static bool vmx_xsaves_supported(void);
 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
 static void vmx_set_segment(struct kvm_vcpu *vcpu,
@@ -4013,7 +4014,7 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 	if (enable_ept) {
 		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 			return;
-		ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa));
+		ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
 	} else {
 		vpid_sync_context(vpid);
 	}
@@ -4188,14 +4189,15 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	vmx->emulation_required = emulation_required(vcpu);
 }
 
-static u64 construct_eptp(unsigned long root_hpa)
+static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 {
 	u64 eptp;
 
 	/* TODO write the value reading from MSR */
 	eptp = VMX_EPT_DEFAULT_MT |
 		VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
-	if (enable_ept_ad_bits)
+	if (enable_ept_ad_bits &&
+	    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
 		eptp |= VMX_EPT_AD_ENABLE_BIT;
 	eptp |= (root_hpa & PAGE_MASK);
 
@@ -4209,7 +4211,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	guest_cr3 = cr3;
 	if (enable_ept) {
-		eptp = construct_eptp(cr3);
+		eptp = construct_eptp(vcpu, cr3);
 		vmcs_write64(EPT_POINTER, eptp);
 		if (is_paging(vcpu) || is_guest_mode(vcpu))
 			guest_cr3 = kvm_read_cr3(vcpu);
@@ -6214,17 +6216,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
-	if (is_guest_mode(vcpu)
-	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
-		/*
-		 * Fix up exit_qualification according to whether guest
-		 * page table accesses are reads or writes.
-		 */
-		u64 eptp = nested_ept_get_cr3(vcpu);
-		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
-			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
-	}
-
 	/*
 	 * EPT violation happened while executing iret from NMI,
 	 * "blocked by NMI" bit has to be set before next VM entry.
@@ -6447,7 +6438,7 @@ void vmx_enable_tdp(void)
 		enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
 		0ull, VMX_EPT_EXECUTABLE_MASK,
 		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
-		enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK);
+		VMX_EPT_RWX_MASK);
 
 	ept_set_mmio_spte_mask();
 	kvm_enable_tdp();
@@ -8358,7 +8349,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	 * mode as if vcpus is in root mode, the PML buffer must has been
 	 * flushed already.
 	 */
-	if (enable_pml)
+	if (enable_pml && !is_guest_mode(vcpu))
 		vmx_flush_pml_buffer(vcpu);
 
 	/* If guest state is invalid, start emulating */
@@ -9379,6 +9370,11 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
 
+static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
+{
+	return nested_ept_get_cr3(vcpu) & VMX_EPT_AD_ENABLE_BIT;
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */
 
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -9389,18 +9385,18 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
 
 static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
-	u64 eptp;
+	bool wants_ad;
 
 	WARN_ON(mmu_is_nested(vcpu));
-	eptp = nested_ept_get_cr3(vcpu);
-	if ((eptp & VMX_EPT_AD_ENABLE_BIT) && !enable_ept_ad_bits)
+	wants_ad = nested_ept_ad_enabled(vcpu);
+	if (wants_ad && !enable_ept_ad_bits)
 		return 1;
 
 	kvm_mmu_unload(vcpu);
 	kvm_init_shadow_ept_mmu(vcpu,
 			to_vmx(vcpu)->nested.nested_vmx_ept_caps &
 			VMX_EPT_EXECUTE_ONLY_BIT,
-			eptp & VMX_EPT_AD_ENABLE_BIT);
+			wants_ad);
 	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
 	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
 	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
-- 
2.13.2.725.g09c95d1e9-goog

* Re: [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  2017-07-01  0:26 ` [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 Peter Feiner
@ 2017-07-01  5:29   ` Paolo Bonzini
  2017-07-01  7:29     ` Peter Feiner
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2017-07-01  5:29 UTC (permalink / raw)
  To: Peter Feiner; +Cc: kvm, David Matlack


> @@ -8358,7 +8349,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>  	 * mode as if vcpus is in root mode, the PML buffer must has been
>  	 * flushed already.
>  	 */
> -	if (enable_pml)
> +	if (enable_pml && !is_guest_mode(vcpu))
>  		vmx_flush_pml_buffer(vcpu);
>  
>  	/* If guest state is invalid, start emulating */

I don't understand this.  You need to flush the PML buffer if
L2 is running with EPT A/D bits enabled, don't you? Apart from
this it seems sane; I only have to look at patch 3 more carefully.

Thanks,

Paolo

* Re: [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  2017-07-01  5:29   ` Paolo Bonzini
@ 2017-07-01  7:29     ` Peter Feiner
  2017-07-01  7:59       ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Feiner @ 2017-07-01  7:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm list, David Matlack

On Fri, Jun 30, 2017 at 10:29 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> @@ -8358,7 +8349,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>        * mode as if vcpus is in root mode, the PML buffer must has been
>>        * flushed already.
>>        */
>> -     if (enable_pml)
>> +     if (enable_pml && !is_guest_mode(vcpu))
>>               vmx_flush_pml_buffer(vcpu);
>>
>>       /* If guest state is invalid, start emulating */
>
> I don't understand this.  You need to flush the PML buffer if
> L2 is running with EPT A/D bits enabled, don't you? Apart from
> this it seems sane, I only have to look at patch 3 more carefully.

You're right: this is busted. I wrote these patches before you
implemented EPT A/D nesting (i.e., PML was moot for guest mode).

I think the patch hunk can go away entirely actually. As long as PML
is enabled, it's ok to flush the buffer. The interesting case is when
the vcpu is in guest mode with EPT A/D disabled. In this case, L0's
PML isn't filled while L2 runs because EPT A/D is disabled in the
vmcs02 (thanks to this patch), so there's nothing in the buffer!

It's troubling that there's no test case covering L0's use of PML +
nesting. Stress testing live migrations of L1 hypervisors (and
implicitly their L2 guests) is one way of doing it, but it's pretty
clumsy. A tightly coupled L0 userspace, L1 and L2 guests would be the
way to go because you could just coordinate ioctls with guest memory
accesses.
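
To make that argument concrete, a hedged sketch of what a flush amounts
to (the real vmx_flush_pml_buffer() walks the hardware PML buffer via
the PML index; this only illustrates why an empty buffer makes an
unconditional flush harmless):

#include <stdint.h>

/* Sketch: replay logged guest-physical addresses into the dirty log.
 * With EPT A/D disabled in the vmcs02, the CPU never appends entries
 * while L2 runs, so nr_entries is 0 and the loop is a no-op. */
static void flush_pml_sketch(const uint64_t *pml_buf, int nr_entries,
			     void (*mark_page_dirty)(uint64_t gpa))
{
	for (int i = 0; i < nr_entries; i++)
		mark_page_dirty(pml_buf[i] & ~0xfffULL);
}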

* Re: [PATCH 4/4] x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  2017-07-01  7:29     ` Peter Feiner
@ 2017-07-01  7:59       ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2017-07-01  7:59 UTC (permalink / raw)
  To: Peter Feiner; +Cc: kvm list, David Matlack



On 01/07/2017 09:29, Peter Feiner wrote:
> You're right: this is busted. I wrote these patches before you
> implemented EPT A/D nesting (i.e., PML was moot for guest mode).
> 
> I think the patch hunk can go away entirely actually. As long as PML
> is enabled, it's ok to flush the buffer. The interesting case is when
> the vcpu is in guest mode with EPT A/D disabled. In this case, L0's
> PML isn't filled while L2 runs because EPT A/D is disabled in the
> vmcs02 (thanks to this patch), so there's nothing in the buffer!

That was my thought too.

> It's troubling that there's no test case covering L0's use of PML +
> nesting. Stress testing live migrations of L1 hypervisors (and
> implicitly their L2 guests) is one way of doing it, but it's pretty
> clumsy. A tightly coupled L0 userspace, L1 and L2 guests would be the
> way to go because you could just coordinate ioctls with guest memory
> accesses.

Indeed.  We need to do api/-style testing of nested virt.

Paolo

* Re: [PATCH 3/4] kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  2017-07-01  0:26 ` [PATCH 3/4] kvm: x86: mmu: allow A/D bits to be disabled in an mmu Peter Feiner
@ 2017-07-03  9:22   ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2017-07-03  9:22 UTC (permalink / raw)
  To: Peter Feiner, kvm; +Cc: David Matlack



On 01/07/2017 02:26, Peter Feiner wrote:
> Adds the plumbing to disable A/D bits in the MMU based on a new role
> bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D
> aren't available (i.e., using access tracking faults instead).
> 
> To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the
> place, A/D disablement is now stored in the SPTE. This state is stored
> in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access
> tracking. Rather than just setting SPTE_SPECIAL_MASK when an
> access-tracking SPTE is non-present, we now always set
> SPTE_SPECIAL_MASK for access-tracking SPTEs.

Here are the only small fixes I'd like to make (not tested yet, but I'm
trying to poke holes in my understanding of the patch):

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 481b6a9c25d5..f50d45b1e967 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -179,6 +179,10 @@ Shadow pages contain the following information:
     shadow page; it is also used to go back from a struct kvm_mmu_page
     to a memslot, through the kvm_memslots_for_spte_role macro and
     __gfn_to_memslot.
+  role.ad_disabled:
+    Is 1 if the MMU instance cannot use A/D bits.  EPT did not have A/D
+    bits before Haswell; shadow EPT page tables also cannot use A/D bits
+    if the L1 hypervisor does not enable them.
   gfn:
     Either the guest page table containing the translations shadowed by this
     page, or the base page frame for linear translations.  See role.direct.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e814fb6248ac..9aeef6ade6b4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -217,21 +217,24 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
 
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
-	return shadow_accessed_mask == 0 || sp->role.ad_disabled;
+	return sp->role.ad_disabled;
 }
 
 static inline bool spte_ad_enabled(u64 spte)
 {
+	MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
 	return !(spte & shadow_acc_track_value);
 }
 
-static u64 spte_shadow_accessed_mask(u64 spte)
+static inline u64 spte_shadow_accessed_mask(u64 spte)
 {
+	MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
 	return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
 }
 
-static u64 spte_shadow_dirty_mask(u64 spte)
+static inline u64 spte_shadow_dirty_mask(u64 spte)
 {
+	MMU_WARN_ON((spte & shadow_mmio_mask) == shadow_mmio_value);
 	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
 }
 
@@ -4328,6 +4329,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 
 	context->base_role.word = 0;
 	context->base_role.smm = is_smm(vcpu);
+	context->base_role.ad_disabled = (shadow_accessed_mask == 0);
 	context->page_fault = tdp_page_fault;
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = nonpaging_invlpg;

> Signed-off-by: Peter Feiner <pfeiner@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   3 +-
>  arch/x86/kvm/mmu.c              | 111 +++++++++++++++++++++++++++++-----------
>  arch/x86/kvm/mmutrace.h         |   6 ++-
>  3 files changed, 87 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 695605eb1dfb..2f19c680c3fd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -254,7 +254,8 @@ union kvm_mmu_page_role {
>  		unsigned cr0_wp:1;
>  		unsigned smep_andnot_wp:1;
>  		unsigned smap_andnot_wp:1;
> -		unsigned :8;
> +		unsigned ad_disabled:1;
> +		unsigned :7;
>  
>  		/*
>  		 * This is left at the top of the word so that
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 10b3cfc7b411..e814fb6248ac 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -187,10 +187,9 @@ static u64 __read_mostly shadow_mmio_value;
>  static u64 __read_mostly shadow_present_mask;
>  
>  /*
> - * The mask/value to distinguish a PTE that has been marked not-present for
> - * access tracking purposes.
> - * The mask would be either 0 if access tracking is disabled, or
> - * SPTE_SPECIAL_MASK|VMX_EPT_RWX_MASK if access tracking is enabled.
> + * SPTEs used by MMUs without A/D bits are marked with shadow_acc_track_value.
> + * Non-present SPTEs with shadow_acc_track_value set are in place for access
> + * tracking.
>   */
>  static u64 __read_mostly shadow_acc_track_mask;
>  static const u64 shadow_acc_track_value = SPTE_SPECIAL_MASK;
> @@ -216,10 +215,29 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
>  
> +static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
> +{
> +	return shadow_accessed_mask == 0 || sp->role.ad_disabled;
> +}
> +
> +static inline bool spte_ad_enabled(u64 spte)
> +{
> +	return !(spte & shadow_acc_track_value);
> +}
> +
> +static u64 spte_shadow_accessed_mask(u64 spte)
> +{
> +	return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
> +}
> +
> +static u64 spte_shadow_dirty_mask(u64 spte)
> +{
> +	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
> +}
> +
>  static inline bool is_access_track_spte(u64 spte)
>  {
> -	/* Always false if shadow_acc_track_mask is zero.  */
> -	return (spte & shadow_acc_track_mask) == shadow_acc_track_value;
> +	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
>  }
>  
>  /*
> @@ -329,10 +347,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>  		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
>  		u64 acc_track_mask)
>  {
> -	if (acc_track_mask != 0)
> -		acc_track_mask |= SPTE_SPECIAL_MASK;
>  	BUG_ON(!dirty_mask != !accessed_mask);
>  	BUG_ON(!accessed_mask && !acc_track_mask);
> +	BUG_ON(acc_track_mask & shadow_acc_track_value);
>  
>  	shadow_user_mask = user_mask;
>  	shadow_accessed_mask = accessed_mask;
> @@ -341,7 +358,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>  	shadow_x_mask = x_mask;
>  	shadow_present_mask = p_mask;
>  	shadow_acc_track_mask = acc_track_mask;
> -	WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
>  
> @@ -561,7 +577,7 @@ static bool spte_has_volatile_bits(u64 spte)
>  	    is_access_track_spte(spte))
>  		return true;
>  
> -	if (shadow_accessed_mask) {
> +	if (spte_ad_enabled(spte)) {
>  		if ((spte & shadow_accessed_mask) == 0 ||
>  	    	    (is_writable_pte(spte) && (spte & shadow_dirty_mask) == 0))
>  			return true;
> @@ -572,14 +588,17 @@ static bool spte_has_volatile_bits(u64 spte)
>  
>  static bool is_accessed_spte(u64 spte)
>  {
> -	return shadow_accessed_mask ? spte & shadow_accessed_mask
> -				    : !is_access_track_spte(spte);
> +	u64 accessed_mask = spte_shadow_accessed_mask(spte);
> +
> +	return accessed_mask ? spte & accessed_mask
> +			     : !is_access_track_spte(spte);
>  }
>  
>  static bool is_dirty_spte(u64 spte)
>  {
> -	return shadow_dirty_mask ? spte & shadow_dirty_mask
> -				 : spte & PT_WRITABLE_MASK;
> +	u64 dirty_mask = spte_shadow_dirty_mask(spte);
> +
> +	return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
>  }
>  
>  /* Rules for using mmu_spte_set:
> @@ -719,10 +738,10 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
>  
>  static u64 mark_spte_for_access_track(u64 spte)
>  {
> -	if (shadow_accessed_mask != 0)
> +	if (spte_ad_enabled(spte))
>  		return spte & ~shadow_accessed_mask;
>  
> -	if (shadow_acc_track_mask == 0 || is_access_track_spte(spte))
> +	if (is_access_track_spte(spte))
>  		return spte;
>  
>  	/*
> @@ -741,7 +760,6 @@ static u64 mark_spte_for_access_track(u64 spte)
>  	spte |= (spte & shadow_acc_track_saved_bits_mask) <<
>  		shadow_acc_track_saved_bits_shift;
>  	spte &= ~shadow_acc_track_mask;
> -	spte |= shadow_acc_track_value;
>  
>  	return spte;
>  }
> @@ -753,6 +771,7 @@ static u64 restore_acc_track_spte(u64 spte)
>  	u64 saved_bits = (spte >> shadow_acc_track_saved_bits_shift)
>  			 & shadow_acc_track_saved_bits_mask;
>  
> +	WARN_ON_ONCE(spte_ad_enabled(spte));
>  	WARN_ON_ONCE(!is_access_track_spte(spte));
>  
>  	new_spte &= ~shadow_acc_track_mask;
> @@ -771,7 +790,7 @@ static bool mmu_spte_age(u64 *sptep)
>  	if (!is_accessed_spte(spte))
>  		return false;
>  
> -	if (shadow_accessed_mask) {
> +	if (spte_ad_enabled(spte)) {
>  		clear_bit((ffs(shadow_accessed_mask) - 1),
>  			  (unsigned long *)sptep);
>  	} else {
> @@ -1402,6 +1421,22 @@ static bool spte_clear_dirty(u64 *sptep)
>  	return mmu_spte_update(sptep, spte);
>  }
>  
> +static bool wrprot_ad_disabled_spte(u64 *sptep)
> +{
> +	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
> +					       (unsigned long *)sptep);
> +	if (was_writable)
> +		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
> +
> +	return was_writable;
> +}
> +
> +/*
> + * Gets the GFN ready for another round of dirty logging by clearing the
> + *	- D bit on ad-enabled SPTEs, and
> + *	- W bit on ad-disabled SPTEs.
> + * Returns true iff any D or W bits were cleared.
> + */
>  static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
>  {
>  	u64 *sptep;
> @@ -1409,7 +1444,10 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
>  	bool flush = false;
>  
>  	for_each_rmap_spte(rmap_head, &iter, sptep)
> -		flush |= spte_clear_dirty(sptep);
> +		if (spte_ad_enabled(*sptep))
> +			flush |= spte_clear_dirty(sptep);
> +		else
> +			flush |= wrprot_ad_disabled_spte(sptep);
>  
>  	return flush;
>  }
> @@ -1432,7 +1470,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
>  	bool flush = false;
>  
>  	for_each_rmap_spte(rmap_head, &iter, sptep)
> -		flush |= spte_set_dirty(sptep);
> +		if (spte_ad_enabled(*sptep))
> +			flush |= spte_set_dirty(sptep);
>  
>  	return flush;
>  }
> @@ -1464,7 +1503,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>  }
>  
>  /**
> - * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages
> + * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
> + * protect the page if the D-bit isn't supported.
>   * @kvm: kvm instance
>   * @slot: slot to clear D-bit
>   * @gfn_offset: start of the BITS_PER_LONG pages we care about
> @@ -2389,7 +2429,12 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
>  	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
>  
>  	spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
> -	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
> +	       shadow_user_mask | shadow_x_mask;
> +
> +	if (sp_ad_disabled(sp))
> +		spte |= shadow_acc_track_value;
> +	else
> +		spte |= shadow_accessed_mask;
>  
>  	mmu_spte_set(sptep, spte);
>  
> @@ -2657,10 +2702,15 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  {
>  	u64 spte = 0;
>  	int ret = 0;
> +	struct kvm_mmu_page *sp;
>  
>  	if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
>  		return 0;
>  
> +	sp = page_header(__pa(sptep));
> +	if (sp_ad_disabled(sp))
> +		spte |= shadow_acc_track_value;
> +
>  	/*
>  	 * For the EPT case, shadow_present_mask is 0 if hardware
>  	 * supports exec-only page table entries.  In that case,
> @@ -2669,7 +2719,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	 */
>  	spte |= shadow_present_mask;
>  	if (!speculative)
> -		spte |= shadow_accessed_mask;
> +		spte |= spte_shadow_accessed_mask(spte);
>  
>  	if (pte_access & ACC_EXEC_MASK)
>  		spte |= shadow_x_mask;
> @@ -2726,7 +2776,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  	if (pte_access & ACC_WRITE_MASK) {
>  		kvm_vcpu_mark_page_dirty(vcpu, gfn);
> -		spte |= shadow_dirty_mask;
> +		spte |= spte_shadow_dirty_mask(spte);
>  	}
>  
>  	if (speculative)
> @@ -2868,16 +2918,16 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
>  {
>  	struct kvm_mmu_page *sp;
>  
> +	sp = page_header(__pa(sptep));
> +
>  	/*
> -	 * Since it's no accessed bit on EPT, it's no way to
> -	 * distinguish between actually accessed translations
> -	 * and prefetched, so disable pte prefetch if EPT is
> -	 * enabled.
> +	 * Without accessed bits, there's no way to distinguish between
> +	 * actually accessed translations and prefetched, so disable pte
> +	 * prefetch if accessed bits aren't available.
>  	 */
> -	if (!shadow_accessed_mask)
> +	if (sp_ad_disabled(sp))
>  		return;
>  
> -	sp = page_header(__pa(sptep));
>  	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
>  		return;
>  
> @@ -4624,6 +4674,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	mask.smep_andnot_wp = 1;
>  	mask.smap_andnot_wp = 1;
>  	mask.smm = 1;
> +	mask.ad_disabled = 1;
>  
>  	/*
>  	 * If we don't have indirect shadow pages, it means no page is
> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> index 5a24b846a1cb..8b97a6cba8d1 100644
> --- a/arch/x86/kvm/mmutrace.h
> +++ b/arch/x86/kvm/mmutrace.h
> @@ -30,8 +30,9 @@
>  								        \
>  	role.word = __entry->role;					\
>  									\
> -	trace_seq_printf(p, "sp gen %lx gfn %llx %u%s q%u%s %s%s"	\
> -			 " %snxe root %u %s%c",	__entry->mmu_valid_gen,	\
> +	trace_seq_printf(p, "sp gen %lx gfn %llx l%u%s q%u%s %s%s"	\
> +			 " %snxe %sad root %u %s%c",			\
> +			 __entry->mmu_valid_gen,			\
>  			 __entry->gfn, role.level,			\
>  			 role.cr4_pae ? " pae" : "",			\
>  			 role.quadrant,					\
> @@ -39,6 +40,7 @@
>  			 access_str[role.access],			\
>  			 role.invalid ? " invalid" : "",		\
>  			 role.nxe ? "" : "!",				\
> +			 role.ad_disabled ? "!" : "",			\
>  			 __entry->root_count,				\
>  			 __entry->unsync ? "unsync" : "sync", 0);	\
>  	saved_ptr;							\
> 
