* [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits
@ 2017-03-30  9:55 Paolo Bonzini
  2017-03-30  9:55 ` [PATCH 1/6] KVM: nVMX: we support 1GB EPT pages Paolo Bonzini
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

Patches 1-4 implement nested EPT A/D bits and GB pages.  As a side effect,
this fixes one vmx.flat failure on machines with EPT A/D bits.
It should be possible to implement PML on top of this with host
support for A/D bits only.

Patches 5-6 implement nested RDRAND and RDSEED exiting.

Paolo

v1->v2: simplified patch 2 further
	removed magic 0x100 from patch 4

Paolo Bonzini (6):
  KVM: nVMX: we support 1GB EPT pages
  KVM: VMX: remove bogus check for invalid EPT violation
  kvm: x86: MMU support for EPT accessed/dirty bits
  kvm: nVMX: support EPT accessed/dirty bits
  KVM: VMX: add missing exit reasons
  KVM: nVMX: support RDRAND and RDSEED exiting

 arch/x86/include/asm/kvm_host.h |  5 ++--
 arch/x86/include/asm/vmx.h      |  4 +++
 arch/x86/include/uapi/asm/vmx.h | 25 +++++++++++++------
 arch/x86/kvm/mmu.c              |  4 ++-
 arch/x86/kvm/mmu.h              |  3 ++-
 arch/x86/kvm/paging_tmpl.h      | 54 +++++++++++++++++++++++------------------
 arch/x86/kvm/vmx.c              | 54 ++++++++++++++++++++++++++---------------
 7 files changed, 95 insertions(+), 54 deletions(-)

-- 
1.8.3.1

* [PATCH 1/6] KVM: nVMX: we support 1GB EPT pages
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
@ 2017-03-30  9:55 ` Paolo Bonzini
  2017-03-30  9:55 ` [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation Paolo Bonzini
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

Large pages at the PDPE level (1GB) can be emulated by the MMU, so the
corresponding bit can be set unconditionally in the EPT capabilities MSR.
The same is true of 2MB EPT pages, though in practice all Intel processors
with EPT support those.
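
For context, these are the bits L1 sees when it reads
MSR_IA32_VMX_EPT_VPID_CAP.  A sketch of how the nested MSR read assembles
the value (the helper name here is made up; vmx.c open-codes this in its
MSR-read switch):

	static u64 nested_ept_vpid_caps(struct vcpu_vmx *vmx)
	{
		/* EPT capabilities in the low 32 bits, VPID in the high. */
		return vmx->nested.nested_vmx_ept_caps |
		       ((u64) vmx->nested.nested_vmx_vpid_caps << 32);
	}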

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40b80e8959e1..0e61b9226bf2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2759,14 +2759,14 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		vmx->nested.nested_vmx_secondary_ctls_high |=
 			SECONDARY_EXEC_ENABLE_EPT;
 		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
-			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
-			 VMX_EPT_INVEPT_BIT;
+			 VMX_EPTP_WB_BIT | VMX_EPT_INVEPT_BIT;
 		if (cpu_has_vmx_ept_execute_only())
 			vmx->nested.nested_vmx_ept_caps |=
 				VMX_EPT_EXECUTE_ONLY_BIT;
 		vmx->nested.nested_vmx_ept_caps &= vmx_capability.ept;
 		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
-			VMX_EPT_EXTENT_CONTEXT_BIT;
+			VMX_EPT_EXTENT_CONTEXT_BIT | VMX_EPT_2MB_PAGE_BIT |
+			VMX_EPT_1GB_PAGE_BIT;
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
-- 
1.8.3.1

* [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
  2017-03-30  9:55 ` [PATCH 1/6] KVM: nVMX: we support 1GB EPT pages Paolo Bonzini
@ 2017-03-30  9:55 ` Paolo Bonzini
  2017-03-30 16:30   ` Jim Mattson
                     ` (2 more replies)
  2017-03-30  9:55 ` [PATCH 3/6] kvm: x86: MMU support for EPT accessed/dirty bits Paolo Bonzini
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

handle_ept_violation is checking for "guest-linear-address invalid" +
"not a paging-structure walk".  However, _all_ EPT violations without
a valid guest linear address are paging structure walks, because those
EPT violations happen when loading the guest PDPTEs.

Therefore, the check can never be true, and even if it were, KVM doesn't
care about the guest linear address; it only uses the guest *physical*
address VMCS field.  So, remove the check altogether.
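
For reference, a sketch of the exit-qualification bits the removed check
was decoding (bit 7 = guest linear address valid, bit 8 = the access was
the final translation; per the SDM, bit 8 is reserved when bit 7 is clear).
The macro names are illustrative:

	#define EPT_VIOLATION_GLA_VALID		(1UL << 7)
	#define EPT_VIOLATION_GVA_TRANSLATED	(1UL << 8)

	static bool ept_violation_is_pt_walk(unsigned long exit_qual)
	{
		/* No linear address at all: the access was a PDPTE load. */
		if (!(exit_qual & EPT_VIOLATION_GLA_VALID))
			return true;
		/* Otherwise bit 8 distinguishes the final translation (set)
		 * from an access to a paging-structure entry (clear). */
		return !(exit_qual & EPT_VIOLATION_GVA_TRANSLATED);
	}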

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e61b9226bf2..1c372600a962 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6208,23 +6208,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	unsigned long exit_qualification;
 	gpa_t gpa;
 	u32 error_code;
-	int gla_validity;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
-	gla_validity = (exit_qualification >> 7) & 0x3;
-	if (gla_validity == 0x2) {
-		printk(KERN_ERR "EPT: Handling EPT violation failed!\n");
-		printk(KERN_ERR "EPT: GPA: 0x%lx, GVA: 0x%lx\n",
-			(long unsigned int)vmcs_read64(GUEST_PHYSICAL_ADDRESS),
-			vmcs_readl(GUEST_LINEAR_ADDRESS));
-		printk(KERN_ERR "EPT: Exit qualification is 0x%lx\n",
-			(long unsigned int)exit_qualification);
-		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
-		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_VIOLATION;
-		return 0;
-	}
-
 	/*
 	 * EPT violation happened while executing iret from NMI,
 	 * "blocked by NMI" bit has to be set before next VM entry.
-- 
1.8.3.1

* [PATCH 3/6] kvm: x86: MMU support for EPT accessed/dirty bits
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
  2017-03-30  9:55 ` [PATCH 1/6] KVM: nVMX: we support 1GB EPT pages Paolo Bonzini
  2017-03-30  9:55 ` [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation Paolo Bonzini
@ 2017-03-30  9:55 ` Paolo Bonzini
  2017-03-31 13:52   ` Radim Krčmář
  2017-03-30  9:55 ` [PATCH 4/6] kvm: nVMX: support " Paolo Bonzini
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

This prepares the MMU paging code for EPT accessed and dirty bits,
which can be enabled optionally at runtime.  Code that updates the
accessed and dirty bits will need a pointer to the struct kvm_mmu.
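
For context, paging_tmpl.h is compiled once per guest paging flavour from
mmu.c, roughly as below.  This is why PT_HAVE_ACCESSED_DIRTY takes an mmu
argument: it can expand to a compile-time constant for the 32/64-bit
instantiations while letting the EPT instantiation decide at runtime:

	/* arch/x86/kvm/mmu.c (simplified) */
	#define PTTYPE_EPT 18	/* arbitrary, just distinct from 32 and 64 */
	#define PTTYPE PTTYPE_EPT
	#include "paging_tmpl.h"
	#undef PTTYPE

	#define PTTYPE 64
	#include "paging_tmpl.h"
	#undef PTTYPE

	#define PTTYPE 32
	#include "paging_tmpl.h"
	#undef PTTYPE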

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..3e20f7b33892 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -43,6 +43,7 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
 	#ifdef CONFIG_X86_64
 	#define PT_MAX_FULL_LEVELS 4
 	#define CMPXCHG cmpxchg
@@ -64,6 +65,7 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
 	#define CMPXCHG cmpxchg
 #elif PTTYPE == PTTYPE_EPT
 	#define pt_element_t u64
@@ -78,6 +80,7 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_GUEST_DIRTY_MASK 0
 	#define PT_GUEST_DIRTY_SHIFT __using_nonexistent_pte_bit()
 	#define PT_GUEST_ACCESSED_SHIFT __using_nonexistent_pte_bit()
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) false
 	#define CMPXCHG cmpxchg64
 	#define PT_MAX_FULL_LEVELS 4
 #else
@@ -111,12 +114,13 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
 	return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
 }
 
-static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
+static inline void FNAME(protect_clean_gpte)(struct kvm_mmu *mmu, unsigned *access,
+					     unsigned gpte)
 {
 	unsigned mask;
 
 	/* dirty bit is not supported, so no need to track it */
-	if (!PT_GUEST_DIRTY_MASK)
+	if (!PT_HAVE_ACCESSED_DIRTY(mmu))
 		return;
 
 	BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
@@ -171,7 +175,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
 		goto no_present;
 
 	/* if accessed bit is not supported prefetch non accessed gpte */
-	if (PT_GUEST_ACCESSED_MASK && !(gpte & PT_GUEST_ACCESSED_MASK))
+	if (PT_HAVE_ACCESSED_DIRTY(&vcpu->arch.mmu) && !(gpte & PT_GUEST_ACCESSED_MASK))
 		goto no_present;
 
 	return false;
@@ -217,7 +221,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 	int ret;
 
 	/* dirty/accessed bits are not supported, so no need to update them */
-	if (!PT_GUEST_DIRTY_MASK)
+	if (!PT_HAVE_ACCESSED_DIRTY(mmu))
 		return 0;
 
 	for (level = walker->max_level; level >= walker->level; --level) {
@@ -287,6 +291,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn_t table_gfn;
 	unsigned index, pt_access, pte_access, accessed_dirty, pte_pkey;
 	gpa_t pte_gpa;
+	bool have_ad;
 	int offset;
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault  = access & PFERR_USER_MASK;
@@ -299,6 +304,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 retry_walk:
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
+	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
 #if PTTYPE == 64
 	if (walker->level == PT32E_ROOT_LEVEL) {
@@ -312,7 +318,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	walker->max_level = walker->level;
 	ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
 
-	accessed_dirty = PT_GUEST_ACCESSED_MASK;
+	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
 	pt_access = pte_access = ACC_ALL;
 	++walker->level;
 
@@ -394,7 +400,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
 	if (!write_fault)
-		FNAME(protect_clean_gpte)(&pte_access, pte);
+		FNAME(protect_clean_gpte)(mmu, &pte_access, pte);
 	else
 		/*
 		 * On a write fault, fold the dirty bit into accessed_dirty.
@@ -485,7 +491,7 @@ static int FNAME(walk_addr_nested)(struct guest_walker *walker,
 
 	gfn = gpte_to_gfn(gpte);
 	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
-	FNAME(protect_clean_gpte)(&pte_access, gpte);
+	FNAME(protect_clean_gpte)(&vcpu->arch.mmu, &pte_access, gpte);
 	pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 			no_dirty_log && (pte_access & ACC_WRITE_MASK));
 	if (is_error_pfn(pfn))
@@ -979,7 +985,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		gfn = gpte_to_gfn(gpte);
 		pte_access = sp->role.access;
 		pte_access &= FNAME(gpte_access)(vcpu, gpte);
-		FNAME(protect_clean_gpte)(&pte_access, gpte);
+		FNAME(protect_clean_gpte)(&vcpu->arch.mmu, &pte_access, gpte);
 
 		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access,
 		      &nr_present))
@@ -1025,3 +1031,4 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 #undef PT_GUEST_DIRTY_MASK
 #undef PT_GUEST_DIRTY_SHIFT
 #undef PT_GUEST_ACCESSED_SHIFT
+#undef PT_HAVE_ACCESSED_DIRTY
-- 
1.8.3.1

* [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
                   ` (2 preceding siblings ...)
  2017-03-30  9:55 ` [PATCH 3/6] kvm: x86: MMU support for EPT accessed/dirty bits Paolo Bonzini
@ 2017-03-30  9:55 ` Paolo Bonzini
  2017-03-31 16:24   ` Radim Krčmář
  2017-04-11 23:35   ` Bandan Das
  2017-03-30  9:55 ` [PATCH 5/6] KVM: VMX: add missing exit reasons Paolo Bonzini
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

Now use bit 6 of EPTP to optionally enable A/D bits for EPT.  Another
thing to change is that, when EPT accessed and dirty bits are not in use,
VMX treats accesses to guest paging structures as data reads.  When they
are in use (bit 6 of EPTP is set), they are treated as writes and the
corresponding EPT dirty bit is set.  The MMU didn't know this detail,
so this patch adds it.

We also have to fix up the exit qualification.  It may be wrong because
KVM sets bit 6 but the guest might not.

L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though not available
in hardware.  The problem is that guest page-table walks will be treated
as reads rather than writes, so they would not cause an EPT violation.
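
For reference, the EPTP fields assumed here, per the SDM (the patch only
tests the A/D enable bit; the helper below is an illustrative sketch):

	/* EPTP format:
	 *   bits 2:0    EPT paging-structure memory type
	 *   bits 5:3    EPT page-walk length minus 1
	 *   bit  6      enable accessed/dirty flags
	 *   bits 51:12  physical address of the EPT PML4 table
	 */
	#define VMX_EPT_AD_ENABLE_BIT	(1ull << 6)

	static bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
	{
		return nested_ept_get_cr3(vcpu) & VMX_EPT_AD_ENABLE_BIT;
	}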

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/include/asm/vmx.h      |  2 ++
 arch/x86/kvm/mmu.c              |  4 +++-
 arch/x86/kvm/mmu.h              |  3 ++-
 arch/x86/kvm/paging_tmpl.h      | 33 ++++++++++++++++-----------------
 arch/x86/kvm/vmx.c              | 33 +++++++++++++++++++++++++++++----
 6 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74ef58c8ff53..7dbb8d622683 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -343,9 +343,10 @@ struct kvm_mmu {
 	void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   u64 *spte, const void *pte);
 	hpa_t root_hpa;
-	int root_level;
-	int shadow_root_level;
 	union kvm_mmu_page_role base_role;
+	u8 root_level;
+	u8 shadow_root_level;
+	u8 ept_ad;
 	bool direct_map;
 
 	/*
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cc54b7026567..dffe8d68fb27 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -516,12 +516,14 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_READABLE_BIT	3
 #define EPT_VIOLATION_WRITABLE_BIT	4
 #define EPT_VIOLATION_EXECUTABLE_BIT	5
+#define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
 #define EPT_VIOLATION_ACC_READ		(1 << EPT_VIOLATION_ACC_READ_BIT)
 #define EPT_VIOLATION_ACC_WRITE		(1 << EPT_VIOLATION_ACC_WRITE_BIT)
 #define EPT_VIOLATION_ACC_INSTR		(1 << EPT_VIOLATION_ACC_INSTR_BIT)
 #define EPT_VIOLATION_READABLE		(1 << EPT_VIOLATION_READABLE_BIT)
 #define EPT_VIOLATION_WRITABLE		(1 << EPT_VIOLATION_WRITABLE_BIT)
 #define EPT_VIOLATION_EXECUTABLE	(1 << EPT_VIOLATION_EXECUTABLE_BIT)
+#define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
 /*
  * VM-instruction error numbers
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac7810513d0e..558676538fca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4340,7 +4340,8 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
-void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly)
+void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
+			     bool accessed_dirty)
 {
 	struct kvm_mmu *context = &vcpu->arch.mmu;
 
@@ -4349,6 +4350,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly)
 	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
 
 	context->nx = true;
+	context->ept_ad = accessed_dirty;
 	context->page_fault = ept_page_fault;
 	context->gva_to_gpa = ept_gva_to_gpa;
 	context->sync_page = ept_sync_page;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ddc56e91f2e4..d8ccb32f7308 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -74,7 +74,8 @@ enum {
 
 int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu);
-void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly);
+void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
+			     bool accessed_dirty);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3e20f7b33892..8bf829703a00 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -23,13 +23,6 @@
  * so the code in this file is compiled twice, once per pte size.
  */
 
-/*
- * This is used to catch non optimized PT_GUEST_(DIRTY|ACCESS)_SHIFT macro
- * uses for EPT without A/D paging type.
- */
-extern u64 __pure __using_nonexistent_pte_bit(void)
-	       __compiletime_error("wrong use of PT_GUEST_(DIRTY|ACCESS)_SHIFT");
-
 #if PTTYPE == 64
 	#define pt_element_t u64
 	#define guest_walker guest_walker64
@@ -39,8 +32,6 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
 	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT64_LEVEL_BITS
-	#define PT_GUEST_ACCESSED_MASK PT_ACCESSED_MASK
-	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
@@ -61,8 +52,6 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_INDEX(addr, level) PT32_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT32_LEVEL_BITS
 	#define PT_MAX_FULL_LEVELS 2
-	#define PT_GUEST_ACCESSED_MASK PT_ACCESSED_MASK
-	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
@@ -76,17 +65,18 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
 	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
 	#define PT_LEVEL_BITS PT64_LEVEL_BITS
-	#define PT_GUEST_ACCESSED_MASK 0
-	#define PT_GUEST_DIRTY_MASK 0
-	#define PT_GUEST_DIRTY_SHIFT __using_nonexistent_pte_bit()
-	#define PT_GUEST_ACCESSED_SHIFT __using_nonexistent_pte_bit()
-	#define PT_HAVE_ACCESSED_DIRTY(mmu) false
+	#define PT_GUEST_DIRTY_SHIFT 9
+	#define PT_GUEST_ACCESSED_SHIFT 8
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
 	#define CMPXCHG cmpxchg64
 	#define PT_MAX_FULL_LEVELS 4
 #else
 	#error Invalid PTTYPE value
 #endif
 
+#define PT_GUEST_DIRTY_MASK    (1 << PT_GUEST_DIRTY_SHIFT)
+#define PT_GUEST_ACCESSED_MASK (1 << PT_GUEST_ACCESSED_SHIFT)
+
 #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl)
 #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL)
 
@@ -290,6 +280,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	pt_element_t __user *uninitialized_var(ptep_user);
 	gfn_t table_gfn;
 	unsigned index, pt_access, pte_access, accessed_dirty, pte_pkey;
+	unsigned nested_access;
 	gpa_t pte_gpa;
 	bool have_ad;
 	int offset;
@@ -319,6 +310,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
 
 	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
+
+	/*
+	 * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
+	 * by the MOV to CR instruction are treated as reads and do not cause the
+	 * processor to set the dirty flag in tany EPT paging-structure entry.
+	 */
+	nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
+
 	pt_access = pte_access = ACC_ALL;
 	++walker->level;
 
@@ -338,7 +337,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
 		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
-					      PFERR_USER_MASK|PFERR_WRITE_MASK,
+					      nested_access,
 					      &walker->fault);
 
 		/*
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1c372600a962..6aaecc78dd71 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2767,6 +2767,8 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
 			VMX_EPT_EXTENT_CONTEXT_BIT | VMX_EPT_2MB_PAGE_BIT |
 			VMX_EPT_1GB_PAGE_BIT;
+	       if (enable_ept_ad_bits)
+		       vmx->nested.nested_vmx_ept_caps |= VMX_EPT_AD_BIT;
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
@@ -6211,6 +6213,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
+	if (is_guest_mode(vcpu)
+	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
+		/*
+		 * Fix up exit_qualification according to whether guest
+		 * page table accesses are reads or writes.
+		 */
+		u64 eptp = nested_ept_get_cr3(vcpu);
+		exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
+		if (eptp & VMX_EPT_AD_ENABLE_BIT)
+			exit_qualification |= EPT_VIOLATION_ACC_WRITE;
+	}
+
 	/*
 	 * EPT violation happened while executing iret from NMI,
 	 * "blocked by NMI" bit has to be set before next VM entry.
@@ -9416,17 +9430,26 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
 	return get_vmcs12(vcpu)->ept_pointer;
 }
 
-static void nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
+	u64 eptp;
+
 	WARN_ON(mmu_is_nested(vcpu));
+	eptp = nested_ept_get_cr3(vcpu);
+	if ((eptp & VMX_EPT_AD_ENABLE_BIT) && !enable_ept_ad_bits)
+		return 1;
+
+	kvm_mmu_unload(vcpu);
 	kvm_init_shadow_ept_mmu(vcpu,
 			to_vmx(vcpu)->nested.nested_vmx_ept_caps &
-			VMX_EPT_EXECUTE_ONLY_BIT);
+			VMX_EPT_EXECUTE_ONLY_BIT,
+			eptp & VMX_EPT_AD_ENABLE_BIT);
 	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
 	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
 	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
 
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
+	return 0;
 }
 
 static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
@@ -10188,8 +10211,10 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	}
 
 	if (nested_cpu_has_ept(vmcs12)) {
-		kvm_mmu_unload(vcpu);
-		nested_ept_init_mmu_context(vcpu);
+		if (nested_ept_init_mmu_context(vcpu)) {
+			*entry_failure_code = ENTRY_FAIL_DEFAULT;
+			return 1;
+		}
 	} else if (nested_cpu_has2(vmcs12,
 				   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
 		vmx_flush_tlb_ept_only(vcpu);
-- 
1.8.3.1

* [PATCH 5/6] KVM: VMX: add missing exit reasons
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
                   ` (3 preceding siblings ...)
  2017-03-30  9:55 ` [PATCH 4/6] kvm: nVMX: support " Paolo Bonzini
@ 2017-03-30  9:55 ` Paolo Bonzini
  2017-03-30  9:55 ` [PATCH 6/6] KVM: nVMX: support RDRAND and RDSEED exiting Paolo Bonzini
  2017-03-31 11:13 ` [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
  6 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

In order to simplify adding exit reasons in the future,
the array of exit reason names is now also sorted by
exit reason code.
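
These strings are consumed by the kvm_exit tracepoint, which maps the raw
exit reason to a name in arch/x86/kvm/trace.h with something like

	__print_symbolic(exit_reason, VMX_EXIT_REASONS)

so a missing entry shows up as a bare number in the trace output.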

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/uapi/asm/vmx.h | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 14458658e988..690a2dcf4078 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -76,7 +76,11 @@
 #define EXIT_REASON_WBINVD              54
 #define EXIT_REASON_XSETBV              55
 #define EXIT_REASON_APIC_WRITE          56
+#define EXIT_REASON_RDRAND              57
 #define EXIT_REASON_INVPCID             58
+#define EXIT_REASON_VMFUNC              59
+#define EXIT_REASON_ENCLS               60
+#define EXIT_REASON_RDSEED              61
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
@@ -90,6 +94,7 @@
 	{ EXIT_REASON_TASK_SWITCH,           "TASK_SWITCH" }, \
 	{ EXIT_REASON_CPUID,                 "CPUID" }, \
 	{ EXIT_REASON_HLT,                   "HLT" }, \
+	{ EXIT_REASON_INVD,                  "INVD" }, \
 	{ EXIT_REASON_INVLPG,                "INVLPG" }, \
 	{ EXIT_REASON_RDPMC,                 "RDPMC" }, \
 	{ EXIT_REASON_RDTSC,                 "RDTSC" }, \
@@ -108,6 +113,8 @@
 	{ EXIT_REASON_IO_INSTRUCTION,        "IO_INSTRUCTION" }, \
 	{ EXIT_REASON_MSR_READ,              "MSR_READ" }, \
 	{ EXIT_REASON_MSR_WRITE,             "MSR_WRITE" }, \
+	{ EXIT_REASON_INVALID_STATE,         "INVALID_STATE" }, \
+	{ EXIT_REASON_MSR_LOAD_FAIL,         "MSR_LOAD_FAIL" }, \
 	{ EXIT_REASON_MWAIT_INSTRUCTION,     "MWAIT_INSTRUCTION" }, \
 	{ EXIT_REASON_MONITOR_TRAP_FLAG,     "MONITOR_TRAP_FLAG" }, \
 	{ EXIT_REASON_MONITOR_INSTRUCTION,   "MONITOR_INSTRUCTION" }, \
@@ -115,20 +122,24 @@
 	{ EXIT_REASON_MCE_DURING_VMENTRY,    "MCE_DURING_VMENTRY" }, \
 	{ EXIT_REASON_TPR_BELOW_THRESHOLD,   "TPR_BELOW_THRESHOLD" }, \
 	{ EXIT_REASON_APIC_ACCESS,           "APIC_ACCESS" }, \
-	{ EXIT_REASON_GDTR_IDTR,	     "GDTR_IDTR" }, \
-	{ EXIT_REASON_LDTR_TR,		     "LDTR_TR" }, \
+	{ EXIT_REASON_EOI_INDUCED,           "EOI_INDUCED" }, \
+	{ EXIT_REASON_GDTR_IDTR,             "GDTR_IDTR" }, \
+	{ EXIT_REASON_LDTR_TR,               "LDTR_TR" }, \
 	{ EXIT_REASON_EPT_VIOLATION,         "EPT_VIOLATION" }, \
 	{ EXIT_REASON_EPT_MISCONFIG,         "EPT_MISCONFIG" }, \
 	{ EXIT_REASON_INVEPT,                "INVEPT" }, \
+	{ EXIT_REASON_RDTSCP,                "RDTSCP" }, \
 	{ EXIT_REASON_PREEMPTION_TIMER,      "PREEMPTION_TIMER" }, \
+	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
 	{ EXIT_REASON_WBINVD,                "WBINVD" }, \
+	{ EXIT_REASON_XSETBV,                "XSETBV" }, \
 	{ EXIT_REASON_APIC_WRITE,            "APIC_WRITE" }, \
-	{ EXIT_REASON_EOI_INDUCED,           "EOI_INDUCED" }, \
-	{ EXIT_REASON_INVALID_STATE,         "INVALID_STATE" }, \
-	{ EXIT_REASON_MSR_LOAD_FAIL,         "MSR_LOAD_FAIL" }, \
-	{ EXIT_REASON_INVD,                  "INVD" }, \
-	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
+	{ EXIT_REASON_RDRAND,                "RDRAND" }, \
 	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
+	{ EXIT_REASON_VMFUNC,                "VMFUNC" }, \
+	{ EXIT_REASON_ENCLS,                 "ENCLS" }, \
+	{ EXIT_REASON_RDSEED,                "RDSEED" }, \
+	{ EXIT_REASON_PML_FULL,              "PML_FULL" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
 	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
 
-- 
1.8.3.1

* [PATCH 6/6] KVM: nVMX: support RDRAND and RDSEED exiting
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
                   ` (4 preceding siblings ...)
  2017-03-30  9:55 ` [PATCH 5/6] KVM: VMX: add missing exit reasons Paolo Bonzini
@ 2017-03-30  9:55 ` Paolo Bonzini
  2017-03-30 16:54   ` Jim Mattson
  2017-03-31 11:13 ` [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
  6 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-30  9:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: david

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/vmx.h | 2 ++
 arch/x86/kvm/vmx.c         | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index dffe8d68fb27..35cd06f636ab 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -70,8 +70,10 @@
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
+#define SECONDARY_EXEC_RDRAND			0x00000800
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
 #define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
+#define SECONDARY_EXEC_RDSEED			0x00010000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_XSAVES			0x00100000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6aaecc78dd71..efc3a73c2109 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2745,6 +2745,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		vmx->nested.nested_vmx_secondary_ctls_high);
 	vmx->nested.nested_vmx_secondary_ctls_low = 0;
 	vmx->nested.nested_vmx_secondary_ctls_high &=
+		SECONDARY_EXEC_RDRAND | SECONDARY_EXEC_RDSEED |
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 		SECONDARY_EXEC_RDTSCP |
 		SECONDARY_EXEC_DESC |
@@ -8083,6 +8084,10 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		return nested_cpu_has(vmcs12, CPU_BASED_INVLPG_EXITING);
 	case EXIT_REASON_RDPMC:
 		return nested_cpu_has(vmcs12, CPU_BASED_RDPMC_EXITING);
+	case EXIT_REASON_RDRAND:
+		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_RDRAND);
+	case EXIT_REASON_RDSEED:
+		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_RDSEED);
 	case EXIT_REASON_RDTSC: case EXIT_REASON_RDTSCP:
 		return nested_cpu_has(vmcs12, CPU_BASED_RDTSC_EXITING);
 	case EXIT_REASON_VMCALL: case EXIT_REASON_VMCLEAR:
-- 
1.8.3.1

* Re: [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation
  2017-03-30  9:55 ` [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation Paolo Bonzini
@ 2017-03-30 16:30   ` Jim Mattson
  2017-04-03 11:17   ` David Hildenbrand
  2017-04-12 20:00   ` David Hildenbrand
  2 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2017-03-30 16:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: LKML, kvm list, David Hildenbrand

On Thu, Mar 30, 2017 at 2:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> handle_ept_violation is checking for "guest-linear-address invalid" +
> "not a paging-structure walk".  However, _all_ EPT violations without
> a valid guest linear address are paging structure walks, because those
> EPT violations happen when loading the guest PDPTEs.
>
> Therefore, the check can never be true, and even if it were, KVM doesn't
> care about the guest linear address; it only uses the guest *physical*
> address VMCS field.  So, remove the check altogether.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The check can never be true because the SDM says so explicitly: Bit 8
is "Reserved if bit 7 is 0 (cleared to 0)."

Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/vmx.c | 14 --------------
>  1 file changed, 14 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0e61b9226bf2..1c372600a962 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6208,23 +6208,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>         unsigned long exit_qualification;
>         gpa_t gpa;
>         u32 error_code;
> -       int gla_validity;
>
>         exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>
> -       gla_validity = (exit_qualification >> 7) & 0x3;
> -       if (gla_validity == 0x2) {
> -               printk(KERN_ERR "EPT: Handling EPT violation failed!\n");
> -               printk(KERN_ERR "EPT: GPA: 0x%lx, GVA: 0x%lx\n",
> -                       (long unsigned int)vmcs_read64(GUEST_PHYSICAL_ADDRESS),
> -                       vmcs_readl(GUEST_LINEAR_ADDRESS));
> -               printk(KERN_ERR "EPT: Exit qualification is 0x%lx\n",
> -                       (long unsigned int)exit_qualification);
> -               vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> -               vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_VIOLATION;
> -               return 0;
> -       }
> -
>         /*
>          * EPT violation happened while executing iret from NMI,
>          * "blocked by NMI" bit has to be set before next VM entry.
> --
> 1.8.3.1
>
>

* Re: [PATCH 6/6] KVM: nVMX: support RDRAND and RDSEED exiting
  2017-03-30  9:55 ` [PATCH 6/6] KVM: nVMX: support RDRAND and RDSEED exiting Paolo Bonzini
@ 2017-03-30 16:54   ` Jim Mattson
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2017-03-30 16:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: LKML, kvm list, David Hildenbrand

On Thu, Mar 30, 2017 at 2:55 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/include/asm/vmx.h | 2 ++
>  arch/x86/kvm/vmx.c         | 5 +++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index dffe8d68fb27..35cd06f636ab 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -70,8 +70,10 @@
>  #define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100
>  #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
>  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING      0x00000400
> +#define SECONDARY_EXEC_RDRAND                  0x00000800
>  #define SECONDARY_EXEC_ENABLE_INVPCID          0x00001000
>  #define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
> +#define SECONDARY_EXEC_RDSEED                  0x00010000
>  #define SECONDARY_EXEC_ENABLE_PML               0x00020000
>  #define SECONDARY_EXEC_XSAVES                  0x00100000
>  #define SECONDARY_EXEC_TSC_SCALING              0x02000000
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6aaecc78dd71..efc3a73c2109 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2745,6 +2745,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>                 vmx->nested.nested_vmx_secondary_ctls_high);
>         vmx->nested.nested_vmx_secondary_ctls_low = 0;
>         vmx->nested.nested_vmx_secondary_ctls_high &=
> +               SECONDARY_EXEC_RDRAND | SECONDARY_EXEC_RDSEED |
>                 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>                 SECONDARY_EXEC_RDTSCP |
>                 SECONDARY_EXEC_DESC |
> @@ -8083,6 +8084,10 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>                 return nested_cpu_has(vmcs12, CPU_BASED_INVLPG_EXITING);
>         case EXIT_REASON_RDPMC:
>                 return nested_cpu_has(vmcs12, CPU_BASED_RDPMC_EXITING);
> +       case EXIT_REASON_RDRAND:
> +               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_RDRAND);
> +       case EXIT_REASON_RDSEED:
> +               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_RDSEED);
>         case EXIT_REASON_RDTSC: case EXIT_REASON_RDTSCP:
>                 return nested_cpu_has(vmcs12, CPU_BASED_RDTSC_EXITING);
>         case EXIT_REASON_VMCALL: case EXIT_REASON_VMCLEAR:
> --
> 1.8.3.1
>

* Re: [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits
  2017-03-30  9:55 [PATCH v2 0/6] KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits Paolo Bonzini
                   ` (5 preceding siblings ...)
  2017-03-30  9:55 ` [PATCH 6/6] KVM: nVMX: support RDRAND and RDSEED exiting Paolo Bonzini
@ 2017-03-31 11:13 ` Paolo Bonzini
  6 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-31 11:13 UTC (permalink / raw)
  To: linux-kernel, kvm



On 30/03/2017 11:55, Paolo Bonzini wrote:
> Patches 1-4 implement nested EPT A/D bits and GB pages.  As a side effect,
> this fixes one vmx.flat failure on machines with EPT A/D bits.
> It should be possible to implement PML on top of this with host
> support for A/D bits only.
> 
> Patches 5-6 implement nested RDRAND and RDSEED exiting.
> 
> Paolo
> 
> v1->v2: simplified patch 2 further
> 	removed magic 0x100 from patch 4
> 
> Paolo Bonzini (6):
>   KVM: nVMX: we support 1GB EPT pages
>   KVM: VMX: remove bogus check for invalid EPT violation
>   kvm: x86: MMU support for EPT accessed/dirty bits
>   kvm: nVMX: support EPT accessed/dirty bits
>   KVM: VMX: add missing exit reasons
>   KVM: nVMX: support RDRAND and RDSEED exiting
> 
>  arch/x86/include/asm/kvm_host.h |  5 ++--
>  arch/x86/include/asm/vmx.h      |  4 +++
>  arch/x86/include/uapi/asm/vmx.h | 25 +++++++++++++------
>  arch/x86/kvm/mmu.c              |  4 ++-
>  arch/x86/kvm/mmu.h              |  3 ++-
>  arch/x86/kvm/paging_tmpl.h      | 54 +++++++++++++++++++++++------------------
>  arch/x86/kvm/vmx.c              | 54 ++++++++++++++++++++++++++---------------
>  7 files changed, 95 insertions(+), 54 deletions(-)
> 

I'm merging patches 1, 2, 5 and 6.

Paolo

* Re: [PATCH 3/6] kvm: x86: MMU support for EPT accessed/dirty bits
  2017-03-30  9:55 ` [PATCH 3/6] kvm: x86: MMU support for EPT accessed/dirty bits Paolo Bonzini
@ 2017-03-31 13:52   ` Radim Krčmář
  0 siblings, 0 replies; 20+ messages in thread
From: Radim Krčmář @ 2017-03-31 13:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, david

2017-03-30 11:55+0200, Paolo Bonzini:
> This prepares the MMU paging code for EPT accessed and dirty bits,
> which can be enabled optionally at runtime.  Code that updates the
> accessed and dirty bits will need a pointer to the struct kvm_mmu.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>

* Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-03-30  9:55 ` [PATCH 4/6] kvm: nVMX: support " Paolo Bonzini
@ 2017-03-31 16:24   ` Radim Krčmář
  2017-03-31 16:26     ` Paolo Bonzini
  2017-04-11 23:35   ` Bandan Das
  1 sibling, 1 reply; 20+ messages in thread
From: Radim Krčmář @ 2017-03-31 16:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, david

2017-03-30 11:55+0200, Paolo Bonzini:
> Now use bit 6 of EPTP to optionally enable A/D bits for EPT.  Another
> thing to change is that, when EPT accessed and dirty bits are not in use,
> VMX treats accesses to guest paging structures as data reads.  When they
> are in use (bit 6 of EPTP is set), they are treated as writes and the
> corresponding EPT dirty bit is set.  The MMU didn't know this detail,
> so this patch adds it.
> 
> We also have to fix up the exit qualification.  It may be wrong because
> KVM sets bit 6 but the guest might not.
> 
> L1 emulates EPT A/D bits using write permissions, so in principle it may
> be possible for EPT A/D bits to be used by L1 even though not available
> in hardware.  The problem is that guest page-table walks will be treated
> as reads rather than writes, so they would not cause an EPT violation.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> @@ -319,6 +310,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
>  
>  	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
> +
> +	/*
> +	 * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
> +	 * by the MOV to CR instruction are treated as reads and do not cause the
> +	 * processor to set the dirty flag in tany EPT paging-structure entry.
                                              ^
                                               typo
> +	 */
> +	nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
> +

This special case should be fairly safe if I understand the consequences
correctly,

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -6211,6 +6213,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> +	if (is_guest_mode(vcpu)
> +	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
> +		/*
> +		 * Fix up exit_qualification according to whether guest
> +		 * page table accesses are reads or writes.
> +		 */
> +		u64 eptp = nested_ept_get_cr3(vcpu);
> +		exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
> +		if (eptp & VMX_EPT_AD_ENABLE_BIT)
> +			exit_qualification |= EPT_VIOLATION_ACC_WRITE;

I think this would be better without unconditional clearing

		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;

* Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-03-31 16:24   ` Radim Krčmář
@ 2017-03-31 16:26     ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-31 16:26 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, david



On 31/03/2017 18:24, Radim Krčmář wrote:
>> +	if (is_guest_mode(vcpu)
>> +	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
>> +		/*
>> +		 * Fix up exit_qualification according to whether guest
>> +		 * page table accesses are reads or writes.
>> +		 */
>> +		u64 eptp = nested_ept_get_cr3(vcpu);
>> +		exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
>> +		if (eptp & VMX_EPT_AD_ENABLE_BIT)
>> +			exit_qualification |= EPT_VIOLATION_ACC_WRITE;
> I think this would be better without unconditional clearing
> 
> 		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
> 			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;

Yeah, this is a remnant of my (failed) attempt at emulating A/D bits
when the processor doesn't support them.  It worked, but it's not
compliant enough to include in the final series.

As for the two nits you found, shall I repost or are you okay with
fixing it yourself?

Paolo

* Re: [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation
  2017-03-30  9:55 ` [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation Paolo Bonzini
  2017-03-30 16:30   ` Jim Mattson
@ 2017-04-03 11:17   ` David Hildenbrand
  2017-04-12 20:00   ` David Hildenbrand
  2 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2017-04-03 11:17 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm

On 30.03.2017 11:55, Paolo Bonzini wrote:
> handle_ept_violation is checking for "guest-linear-address invalid" +
> "not a paging-structure walk".  However, _all_ EPT violations without
> a valid guest linear address are paging structure walks, because those
> EPT violations happen when loading the guest PDPTEs.
> 
> Therefore, the check can never be true, and even if it were, KVM doesn't
> care about the guest linear address; it only uses the guest *physical*
> address VMCS field.  So, remove the check altogether.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0e61b9226bf2..1c372600a962 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6208,23 +6208,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	unsigned long exit_qualification;
>  	gpa_t gpa;
>  	u32 error_code;
> -	int gla_validity;
>  
>  	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>  
> -	gla_validity = (exit_qualification >> 7) & 0x3;
> -	if (gla_validity == 0x2) {
> -		printk(KERN_ERR "EPT: Handling EPT violation failed!\n");
> -		printk(KERN_ERR "EPT: GPA: 0x%lx, GVA: 0x%lx\n",
> -			(long unsigned int)vmcs_read64(GUEST_PHYSICAL_ADDRESS),
> -			vmcs_readl(GUEST_LINEAR_ADDRESS));
> -		printk(KERN_ERR "EPT: Exit qualification is 0x%lx\n",
> -			(long unsigned int)exit_qualification);
> -		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> -		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_VIOLATION;
> -		return 0;
> -	}
> -
>  	/*
>  	 * EPT violation happened while executing iret from NMI,
>  	 * "blocked by NMI" bit has to be set before next VM entry.
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David

* Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-03-30  9:55 ` [PATCH 4/6] kvm: nVMX: support " Paolo Bonzini
  2017-03-31 16:24   ` Radim Krčmář
@ 2017-04-11 23:35   ` Bandan Das
  2017-04-11 23:54     ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Bandan Das @ 2017-04-11 23:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, david

Paolo Bonzini <pbonzini@redhat.com> writes:
...
>  	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
> +
> +	/*
> +	 * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
> +	 * by the MOV to CR instruction are treated as reads and do not cause the
> +	 * processor to set the dirty flag in tany EPT paging-structure entry.
> +	 */

Minor typo: "in any EPT paging-structure entry".

> +	nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
> +
>  	pt_access = pte_access = ACC_ALL;
>  	++walker->level;
>  
> @@ -338,7 +337,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  		walker->pte_gpa[walker->level - 1] = pte_gpa;
>  
>  		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> -					      PFERR_USER_MASK|PFERR_WRITE_MASK,
> +					      nested_access,
>  					      &walker->fault);

I can't seem to understand the significance of this change (or for that matter
what was before this change).

mmu->translate_gpa() just returns gfn_to_gpa(table_gfn), right?

Bandan

>  		/*
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1c372600a962..6aaecc78dd71 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2767,6 +2767,8 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
>  			VMX_EPT_EXTENT_CONTEXT_BIT | VMX_EPT_2MB_PAGE_BIT |
>  			VMX_EPT_1GB_PAGE_BIT;
> +	       if (enable_ept_ad_bits)
> +		       vmx->nested.nested_vmx_ept_caps |= VMX_EPT_AD_BIT;
>  	} else
>  		vmx->nested.nested_vmx_ept_caps = 0;
>  
> @@ -6211,6 +6213,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  
>  	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>  
> +	if (is_guest_mode(vcpu)
> +	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
> +		/*
> +		 * Fix up exit_qualification according to whether guest
> +		 * page table accesses are reads or writes.
> +		 */
> +		u64 eptp = nested_ept_get_cr3(vcpu);
> +		exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
> +		if (eptp & VMX_EPT_AD_ENABLE_BIT)
> +			exit_qualification |= EPT_VIOLATION_ACC_WRITE;
> +	}
> +
>  	/*
>  	 * EPT violation happened while executing iret from NMI,
>  	 * "blocked by NMI" bit has to be set before next VM entry.
> @@ -9416,17 +9430,26 @@ static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
>  	return get_vmcs12(vcpu)->ept_pointer;
>  }
>  
> -static void nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
> +static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
>  {
> +	u64 eptp;
> +
>  	WARN_ON(mmu_is_nested(vcpu));
> +	eptp = nested_ept_get_cr3(vcpu);
> +	if ((eptp & VMX_EPT_AD_ENABLE_BIT) && !enable_ept_ad_bits)
> +		return 1;
> +
> +	kvm_mmu_unload(vcpu);
>  	kvm_init_shadow_ept_mmu(vcpu,
>  			to_vmx(vcpu)->nested.nested_vmx_ept_caps &
> -			VMX_EPT_EXECUTE_ONLY_BIT);
> +			VMX_EPT_EXECUTE_ONLY_BIT,
> +			eptp & VMX_EPT_AD_ENABLE_BIT);
>  	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
>  	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
>  	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
>  
>  	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
> +	return 0;
>  }
>  
>  static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
> @@ -10188,8 +10211,10 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	}
>  
>  	if (nested_cpu_has_ept(vmcs12)) {
> -		kvm_mmu_unload(vcpu);
> -		nested_ept_init_mmu_context(vcpu);
> +		if (nested_ept_init_mmu_context(vcpu)) {
> +			*entry_failure_code = ENTRY_FAIL_DEFAULT;
> +			return 1;
> +		}
>  	} else if (nested_cpu_has2(vmcs12,
>  				   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
>  		vmx_flush_tlb_ept_only(vcpu);

* Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-04-11 23:35   ` Bandan Das
@ 2017-04-11 23:54     ` Paolo Bonzini
  2017-04-12 23:02       ` Bandan Das
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2017-04-11 23:54 UTC (permalink / raw)
  To: Bandan Das; +Cc: linux-kernel, kvm, david



----- Original Message -----
> From: "Bandan Das" <bsd@redhat.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, david@redhat.com
> Sent: Wednesday, April 12, 2017 7:35:16 AM
> Subject: Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> ...
> >  	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
> > +
> > +	/*
> > +	 * FIXME: on Intel processors, loads of the PDPTE registers for PAE
> > paging
> > +	 * by the MOV to CR instruction are treated as reads and do not cause the
> > +	 * processor to set the dirty flag in tany EPT paging-structure entry.
> > +	 */
> 
> Minor typo: "in any EPT paging-structure entry".
> 
> > +	nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
> > +
> >  	pt_access = pte_access = ACC_ALL;
> >  	++walker->level;
> >  
> > @@ -338,7 +337,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker
> > *walker,
> >  		walker->pte_gpa[walker->level - 1] = pte_gpa;
> >  
> >  		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> > -					      PFERR_USER_MASK|PFERR_WRITE_MASK,
> > +					      nested_access,
> >  					      &walker->fault);
> 
> I can't seem to understand the significance of this change (or for that
> matter what was before this change).
> 
> mmu->translate_gpa() just returns gfn_to_gpa(table_gfn), right?

For EPT it is, you're right it's fishy.  The "nested_access" should be
computed in translate_nested_gpa, which is where kvm->arch.nested_mmu
(non-EPT) requests to access kvm->arch.mmu (EPT).

In practice we need to define a new function
vcpu->arch.mmu.gva_to_gpa_nested that computes the nested_access
and calls vcpu->arch.mmu.gva_to_gpa.

Thanks,

Paolo

* Re: [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation
  2017-03-30  9:55 ` [PATCH 2/6] KVM: VMX: remove bogus check for invalid EPT violation Paolo Bonzini
  2017-03-30 16:30   ` Jim Mattson
  2017-04-03 11:17   ` David Hildenbrand
@ 2017-04-12 20:00   ` David Hildenbrand
  2 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2017-04-12 20:00 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm

On 30.03.2017 11:55, Paolo Bonzini wrote:
> handle_ept_violation is checking for "guest-linear-address invalid" +
> "not a paging-structure walk".  However, _all_ EPT violations without
> a valid guest linear address are paging structure walks, because those
> EPT violations happen when loading the guest PDPTEs.
> 
> Therefore, the check can never be true, and even if it were, KVM doesn't
> care about the guest linear address; it only uses the guest *physical*
> address VMCS field.  So, remove the check altogether.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David

* Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-04-11 23:54     ` Paolo Bonzini
@ 2017-04-12 23:02       ` Bandan Das
  2017-04-14  5:17         ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Bandan Das @ 2017-04-12 23:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, david

Paolo Bonzini <pbonzini@redhat.com> writes:

> ----- Original Message -----
>> From: "Bandan Das" <bsd@redhat.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, david@redhat.com
>> Sent: Wednesday, April 12, 2017 7:35:16 AM
>> Subject: Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
>> 
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> ...
>> >  	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
>> > +
>> > +	/*
>> > +	 * FIXME: on Intel processors, loads of the PDPTE registers for PAE
>> > paging
>> > +	 * by the MOV to CR instruction are treated as reads and do not cause the
>> > +	 * processor to set the dirty flag in tany EPT paging-structure entry.
>> > +	 */
>> 
>> Minor typo: "in any EPT paging-structure entry".
>> 
>> > +	nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
>> > +
>> >  	pt_access = pte_access = ACC_ALL;
>> >  	++walker->level;
>> >  
>> > @@ -338,7 +337,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker
>> > *walker,
>> >  		walker->pte_gpa[walker->level - 1] = pte_gpa;
>> >  
>> >  		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
>> > -					      PFERR_USER_MASK|PFERR_WRITE_MASK,
>> > +					      nested_access,
>> >  					      &walker->fault);
>> 
>> I can't seem to understand the significance of this change (or for that
>> matter what was before this change).
>> 
>> mmu->translate_gpa() just returns gfn_to_gpa(table_gfn), right?
>
> For EPT it is, you're right it's fishy.  The "nested_access" should be
> computed in translate_nested_gpa, which is where kvm->arch.nested_mmu
> (non-EPT) requests to access kvm->arch.mmu (EPT).

Thanks for the clarification. Is that the case when L1 runs L2 without
EPT? I can't figure out the case where translate_nested_gpa will actually
be called. FNAME(walk_addr_nested) calls walk_addr_generic
with &vcpu->arch.nested_mmu and init_kvm_nested_mmu() sets gva_to_gpa()
with the appropriate "_nested" functions. But the gva_to_gpa() pointers
don't seem to get invoked at all for the nested case.

BTW, just noticed that setting PFERR_USER_MASK is redundant since
translate_nested_gpa does it too.
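
For reference, translate_nested_gpa in arch/x86/kvm/x86.c is roughly:

	static gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa,
					  u32 access,
					  struct x86_exception *exception)
	{
		gpa_t t_gpa;

		BUG_ON(!mmu_is_nested(vcpu));

		/* NPT walks are always user-mode walks */
		access |= PFERR_USER_MASK;
		t_gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gpa, access, exception);

		return t_gpa;
	}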

Bandan

> In practice we need to define a new function
> vcpu->arch.mmu.gva_to_gpa_nested that computes the nested_access
> and calls vcpu->arch.mmu.gva_to_gpa.
>
> Thanks,
>
> Paolo

* Re: [PATCH 4/6] kvm: nVMX: support EPT accessed/dirty bits
  2017-04-12 23:02       ` Bandan Das
@ 2017-04-14  5:17         ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-04-14  5:17 UTC (permalink / raw)
  To: Bandan Das; +Cc: linux-kernel, kvm, david



On 13/04/2017 07:02, Bandan Das wrote:
>> For EPT it is, you're right it's fishy.  The "nested_access" should be
>> computed in translate_nested_gpa, which is where kvm->arch.nested_mmu
>> (non-EPT) requests to access kvm->arch.mmu (EPT).
>
> Thanks for the clarification. Is it the case when L1 runs L2 without
> EPT ? I can't figure out the case where translate_nested_gpa will actually
> be called.

It happens when L2 instructions are emulated by L0, for example when L1
is passing through I/O ports to L2 and L2 runs an "insb" instruction.  I
think this case is not covered by vmx.flat.

Paolo

> FNAME(walk_addr_nested) calls walk_addr_generic
> with &vcpu->arch.nested_mmu and init_kvm_nested_mmu() sets gva_to_gpa()
> with the appropriate "_nested" functions. But the gva_to_gpa() pointers
> don't seem to get invoked at all for the nested case.
> 
> BTW, just noticed that setting PFERR_USER_MASK is redundant since
> translate_nested_gpa does it too.

* [PATCH 3/6] kvm: x86: MMU support for EPT accessed/dirty bits
  2017-03-08 18:03 [PATCH " Paolo Bonzini
@ 2017-03-08 18:03 ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2017-03-08 18:03 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: bdas, dmatlack

This prepares the MMU paging code for EPT accessed and dirty bits,
which can be enabled optionally at runtime.  Code that updates the
accessed and dirty bits will need a pointer to the struct kvm_mmu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/paging_tmpl.h | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..3e20f7b33892 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -43,6 +43,7 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
 	#ifdef CONFIG_X86_64
 	#define PT_MAX_FULL_LEVELS 4
 	#define CMPXCHG cmpxchg
@@ -64,6 +65,7 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_GUEST_DIRTY_MASK PT_DIRTY_MASK
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
 	#define CMPXCHG cmpxchg
 #elif PTTYPE == PTTYPE_EPT
 	#define pt_element_t u64
@@ -78,6 +80,7 @@ extern u64 __pure __using_nonexistent_pte_bit(void)
 	#define PT_GUEST_DIRTY_MASK 0
 	#define PT_GUEST_DIRTY_SHIFT __using_nonexistent_pte_bit()
 	#define PT_GUEST_ACCESSED_SHIFT __using_nonexistent_pte_bit()
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) false
 	#define CMPXCHG cmpxchg64
 	#define PT_MAX_FULL_LEVELS 4
 #else
@@ -111,12 +114,13 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
 	return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
 }
 
-static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
+static inline void FNAME(protect_clean_gpte)(struct kvm_mmu *mmu, unsigned *access,
+					     unsigned gpte)
 {
 	unsigned mask;
 
 	/* dirty bit is not supported, so no need to track it */
-	if (!PT_GUEST_DIRTY_MASK)
+	if (!PT_HAVE_ACCESSED_DIRTY(mmu))
 		return;
 
 	BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
@@ -171,7 +175,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
 		goto no_present;
 
 	/* if accessed bit is not supported prefetch non accessed gpte */
-	if (PT_GUEST_ACCESSED_MASK && !(gpte & PT_GUEST_ACCESSED_MASK))
+	if (PT_HAVE_ACCESSED_DIRTY(&vcpu->arch.mmu) && !(gpte & PT_GUEST_ACCESSED_MASK))
 		goto no_present;
 
 	return false;
@@ -217,7 +221,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 	int ret;
 
 	/* dirty/accessed bits are not supported, so no need to update them */
-	if (!PT_GUEST_DIRTY_MASK)
+	if (!PT_HAVE_ACCESSED_DIRTY(mmu))
 		return 0;
 
 	for (level = walker->max_level; level >= walker->level; --level) {
@@ -287,6 +291,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn_t table_gfn;
 	unsigned index, pt_access, pte_access, accessed_dirty, pte_pkey;
 	gpa_t pte_gpa;
+	bool have_ad;
 	int offset;
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault  = access & PFERR_USER_MASK;
@@ -299,6 +304,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 retry_walk:
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
+	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
 #if PTTYPE == 64
 	if (walker->level == PT32E_ROOT_LEVEL) {
@@ -312,7 +318,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	walker->max_level = walker->level;
 	ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
 
-	accessed_dirty = PT_GUEST_ACCESSED_MASK;
+	accessed_dirty = have_ad ? PT_GUEST_ACCESSED_MASK : 0;
 	pt_access = pte_access = ACC_ALL;
 	++walker->level;
 
@@ -394,7 +400,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
 	if (!write_fault)
-		FNAME(protect_clean_gpte)(&pte_access, pte);
+		FNAME(protect_clean_gpte)(mmu, &pte_access, pte);
 	else
 		/*
 		 * On a write fault, fold the dirty bit into accessed_dirty.
@@ -485,7 +491,7 @@ static int FNAME(walk_addr_nested)(struct guest_walker *walker,
 
 	gfn = gpte_to_gfn(gpte);
 	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
-	FNAME(protect_clean_gpte)(&pte_access, gpte);
+	FNAME(protect_clean_gpte)(&vcpu->arch.mmu, &pte_access, gpte);
 	pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 			no_dirty_log && (pte_access & ACC_WRITE_MASK));
 	if (is_error_pfn(pfn))
@@ -979,7 +985,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		gfn = gpte_to_gfn(gpte);
 		pte_access = sp->role.access;
 		pte_access &= FNAME(gpte_access)(vcpu, gpte);
-		FNAME(protect_clean_gpte)(&pte_access, gpte);
+		FNAME(protect_clean_gpte)(&vcpu->arch.mmu, &pte_access, gpte);
 
 		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access,
 		      &nr_present))
@@ -1025,3 +1031,4 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 #undef PT_GUEST_DIRTY_MASK
 #undef PT_GUEST_DIRTY_SHIFT
 #undef PT_GUEST_ACCESSED_SHIFT
+#undef PT_HAVE_ACCESSED_DIRTY
-- 
1.8.3.1
