* [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
@ 2016-07-12 22:18 Bandan Das
  2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

v1 of this series posted at https://lkml.org/lkml/2016/6/28/7

Changes since v1:
 - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
   Reasoning provided in commit message.
 - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
   3/5 from v1 is now 2/5. Introduce shadow_present_mask that
   signifies whether ept execute only is supported. Add/remove some
   comments as suggested in v1.
 - 3/5 : 4/5 from v1 is now 3/5.
 - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
   support ept execute only.
 - 5/5 : No change
 
These patches are based on reviews of my RFC
http://www.spinics.net/lists/kvm/msg134440.html

Changes since RFC:
 - Remove shadow_xonly_valid, it's not needed
 - Remove checks from is_shadow_present_pte()
 - In reset_tdp_shadow_zero_bits_mask, pass correct execonly to __reset_rsvds_bits_mask_ept
 - Reuse shadow_user_mask in set_spte()
 - Remove is_present_gpte() and inline the operation at the two call sites

I spoke to Paolo about this a while back and decided to post this as an
RFC while thinking about adding some unit tests.

Background: ESX refuses to run as L1 if support for EPT execute only isn't
found. I am not really sure whether it actually uses the feature, since just
advertising the bit seems to work, but adding the necessary plumbing seemed
like a good idea.

Xiao, I took the liberty of adding you based on "git blame" :)

Thanks in advance.

Bandan Das (5):
  mmu: extend the is_present check to 32 bits
  mmu: don't set the present bit unconditionally
  mmu: remove is_present_gpte()
  mmu: change unconditional setting of the u bit in fault bitmap
  vmx: advertise support for ept execute only

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.c              | 26 ++++++++++++++++++--------
 arch/x86/kvm/mmu.h              |  5 -----
 arch/x86/kvm/paging_tmpl.h      | 10 ++++++++--
 arch/x86/kvm/vmx.c              | 10 ++++++++--
 arch/x86/kvm/x86.c              |  6 +++---
 6 files changed, 38 insertions(+), 21 deletions(-)

-- 
2.5.5


* [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
  2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
  2016-09-01 14:20   ` Li, Liang Z
  2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

This is safe because this function is called
on host-controlled page tables, and non-present/non-MMIO
sptes never use bits 1..31. For the EPT case, this
ensures that entries where only the execute bit is set
are marked valid.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index def97b3..87b62dc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -304,7 +304,7 @@ static int is_nx(struct kvm_vcpu *vcpu)
 
 static int is_shadow_present_pte(u64 pte)
 {
-	return pte & PT_PRESENT_MASK && !is_mmio_spte(pte);
+	return (pte & 0xFFFFFFFFull) && !is_mmio_spte(pte);
 }
 
 static int is_large_pte(u64 pte)
-- 
2.5.5
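
[ A quick illustration of what the new check changes. This is a standalone
  sketch, not kernel code: the helper names are made up, and the
  is_mmio_spte() part of the real function is omitted. An EPT execute-only
  SPTE has only the X bit (bit 2) set, so the old PT_PRESENT_MASK test
  reported it as not present, while the 32-bit test does not. ]

#include <stdint.h>
#include <stdio.h>

#define PT_PRESENT_MASK (1ull << 0)

/* old behaviour: present only if bit 0 (R, for EPT) is set */
static int old_is_present(uint64_t pte)
{
	return !!(pte & PT_PRESENT_MASK);
}

/* new behaviour: present if any of bits 0..31 is set */
static int new_is_present(uint64_t pte)
{
	return !!(pte & 0xFFFFFFFFull);
}

int main(void)
{
	uint64_t exec_only_spte = 1ull << 2;	/* X=1, W=0, R=0 */

	/* prints "old=0 new=1" */
	printf("old=%d new=%d\n", old_is_present(exec_only_spte),
	       new_is_present(exec_only_spte));
	return 0;
}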


* [PATCH v2 2/5] mmu: don't set the present bit unconditionally
  2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
  2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
  2016-07-13  8:10   ` Paolo Bonzini
  2016-07-12 22:18 ` [PATCH v2 3/5] mmu: remove is_present_gpte() Bandan Das
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

To support execute-only mappings on behalf of L1
hypervisors, teach set_spte() to honor L1's valid XWR
bits. This is done only if the host supports EPT
execute-only. Reuse ACC_USER_MASK to signify whether the
L1 hypervisor has the R bit set. Add a new variable,
"shadow_present_mask", that is set for the non-EPT case
and preserves the existing behavior there.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.c              | 15 ++++++++++++---
 arch/x86/kvm/paging_tmpl.h      |  8 +++++++-
 arch/x86/kvm/vmx.c              |  7 +++++--
 arch/x86/kvm/x86.c              |  4 ++--
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69e62862..c0acc66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1026,7 +1026,7 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu);
 void kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask);
+		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 87b62dc..ae80aa4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -175,6 +175,7 @@ static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
+static u64 __read_mostly shadow_present_mask;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
 static void mmu_free_roots(struct kvm_vcpu *vcpu);
@@ -282,13 +283,14 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 }
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask)
+		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask)
 {
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
 	shadow_dirty_mask = dirty_mask;
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
+	shadow_present_mask = p_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -2515,13 +2517,20 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    gfn_t gfn, kvm_pfn_t pfn, bool speculative,
 		    bool can_unsync, bool host_writable)
 {
-	u64 spte;
+	u64 spte = 0;
 	int ret = 0;
 
 	if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
 		return 0;
 
-	spte = PT_PRESENT_MASK;
+	/*
+	 * In the non-EPT case, execonly is not valid and so
+	 * the following line is equivalent to spte |= PT_PRESENT_MASK.
+	 * For the EPT case, shadow_present_mask is 0 if hardware
+	 * supports it and we honor whatever way the guest set it.
+	 * See: FNAME(gpte_access) in paging_tmpl.h
+	 */
+	spte |= shadow_present_mask;
 	if (!speculative)
 		spte |= shadow_accessed_mask;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bc019f7..f2741db 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -181,13 +181,19 @@ no_present:
 	return true;
 }
 
+/*
+ * For PTTYPE_EPT, a page table can be executable but not readable
+ * on supported processors. Therefore, set_spte does not automatically
+ * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
+ * to signify readability since it isn't used in the EPT case
+ */
 static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
 {
 	unsigned access;
 #if PTTYPE == PTTYPE_EPT
 	access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
 		((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
-		ACC_USER_MASK;
+		((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
 #else
 	BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
 	BUILD_BUG_ON(ACC_EXEC_MASK != 1);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 64a79f2..f73b5dc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6366,10 +6366,13 @@ static __init int hardware_setup(void)
 	vmx_disable_intercept_msr_write_x2apic(0x83f);
 
 	if (enable_ept) {
-		kvm_mmu_set_mask_ptes(0ull,
+		kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK,
 			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
 			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
-			0ull, VMX_EPT_EXECUTABLE_MASK);
+			0ull, VMX_EPT_EXECUTABLE_MASK,
+			cpu_has_vmx_ept_execute_only() ?
+				      0ull : PT_PRESENT_MASK);
+		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
 		ept_set_mmio_spte_mask();
 		kvm_enable_tdp();
 	} else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7da5dd2..b15f214 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5867,8 +5867,8 @@ int kvm_arch_init(void *opaque)
 	kvm_x86_ops = ops;
 
 	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
-			PT_DIRTY_MASK, PT64_NX_MASK, 0);
-
+			PT_DIRTY_MASK, PT64_NX_MASK, 0,
+			PT_PRESENT_MASK);
 	kvm_timer_init();
 
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);
-- 
2.5.5
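
[ For reference, a standalone sketch of the EPT branch of FNAME(gpte_access)
  after this patch; ept_gpte_access is a made-up name and the mask values
  simply mirror the kernel's definitions. The point is that an execute-only
  guest EPT PTE no longer gets ACC_USER_MASK forced on, and with
  shadow_present_mask == 0 set_spte() can then install an execute-only
  shadow PTE. ]

#define VMX_EPT_READABLE_MASK   0x1ull
#define VMX_EPT_WRITABLE_MASK   0x2ull
#define VMX_EPT_EXECUTABLE_MASK 0x4ull

#define ACC_EXEC_MASK  1u	/* equals PT_PRESENT_MASK, see the BUILD_BUG_ON above */
#define ACC_WRITE_MASK 2u	/* PT_WRITABLE_MASK */
#define ACC_USER_MASK  4u	/* PT_USER_MASK, repurposed to mean "readable" */

static unsigned ept_gpte_access(unsigned long long gpte)
{
	return ((gpte & VMX_EPT_WRITABLE_MASK)   ? ACC_WRITE_MASK : 0) |
	       ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK  : 0) |
	       ((gpte & VMX_EPT_READABLE_MASK)   ? ACC_USER_MASK  : 0);
}

/* e.g. ept_gpte_access(0x4) == ACC_EXEC_MASK for an execute-only gpte */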


* [PATCH v2 3/5] mmu: remove is_present_gpte()
  2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
  2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
  2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
  2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

We have two versions of the above function.
To prevent confusion and bugs in the future, remove
the non-FNAME version entirely and replace all calls
with the actual check.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/mmu.c         | 2 +-
 arch/x86/kvm/mmu.h         | 5 -----
 arch/x86/kvm/paging_tmpl.h | 2 +-
 arch/x86/kvm/x86.c         | 2 +-
 4 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ae80aa4..c364dcb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3198,7 +3198,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		MMU_WARN_ON(VALID_PAGE(root));
 		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
 			pdptr = vcpu->arch.mmu.get_pdptr(vcpu, i);
-			if (!is_present_gpte(pdptr)) {
+			if (!(pdptr & PT_PRESENT_MASK)) {
 				vcpu->arch.mmu.pae_root[i] = 0;
 				continue;
 			}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 66b33b9..ddc56e9 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -93,11 +93,6 @@ static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 	return kvm_mmu_load(vcpu);
 }
 
-static inline int is_present_gpte(unsigned long pte)
-{
-	return pte & PT_PRESENT_MASK;
-}
-
 /*
  * Currently, we have two sorts of write-protection, a) the first one
  * write-protects guest page to sync the guest modification, b) another one is
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f2741db..a011054 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -131,7 +131,7 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
 static inline int FNAME(is_present_gpte)(unsigned long pte)
 {
 #if PTTYPE != PTTYPE_EPT
-	return is_present_gpte(pte);
+	return pte & PT_PRESENT_MASK;
 #else
 	return pte & 7;
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b15f214..2e8618f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -537,7 +537,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
 		goto out;
 	}
 	for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
-		if (is_present_gpte(pdpte[i]) &&
+		if ((pdpte[i] & PT_PRESENT_MASK) &&
 		    (pdpte[i] &
 		     vcpu->arch.mmu.guest_rsvd_check.rsvd_bits_mask[0][2])) {
 			ret = 0;
-- 
2.5.5


* [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap
  2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
                   ` (2 preceding siblings ...)
  2016-07-12 22:18 ` [PATCH v2 3/5] mmu: remove is_present_gpte() Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
  2016-07-13  8:23   ` Paolo Bonzini
  2016-07-12 22:18 ` [PATCH v2 5/5] vmx: advertise support for ept execute only Bandan Das
  2016-07-13  9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
  5 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

For the nested EPT case, we assume that the read bit (u) is
always set since we used to unconditionally set it in set_spte().
Modify it to only be set when host ept execute only support
isn't present.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/mmu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c364dcb..566eea5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3923,9 +3923,10 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 				 *   clearer.
 				 */
 				smap = cr4_smap && u && !uf && !ff;
-			} else
-				/* Not really needed: no U/S accesses on ept  */
-				u = 1;
+			} else {
+				if (shadow_present_mask)
+					u = 1;
+			}
 
 			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
 				(smapf && smap);
-- 
2.5.5


* [PATCH v2 5/5] vmx: advertise support for ept execute only
  2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
                   ` (3 preceding siblings ...)
  2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
  2016-07-13  9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
  5 siblings, 0 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

The MMU now knows about execute-only mappings, so
advertise the feature to L1 hypervisors.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/vmx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f73b5dc..2c49355f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2717,6 +2717,9 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
 			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
 			 VMX_EPT_INVEPT_BIT;
+		if (cpu_has_vmx_ept_execute_only())
+			vmx->nested.nested_vmx_ept_caps |=
+				VMX_EPT_EXECUTE_ONLY_BIT;
 		vmx->nested.nested_vmx_ept_caps &= vmx_capability.ept;
 		/*
 		 * For nested guests, we don't do anything specific
-- 
2.5.5


* Re: [PATCH v2 2/5] mmu: don't set the present bit unconditionally
  2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
@ 2016-07-13  8:10   ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13  8:10 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 00:18, Bandan Das wrote:
> +	/*
> +	 * In the non-EPT case, execonly is not valid and so
> +	 * the following line is equivalent to spte |= PT_PRESENT_MASK.

I think the comment need not mention non-EPT.

> +	 * For the EPT case, shadow_present_mask is 0 if hardware
> +	 * supports it and we honor whatever way the guest set it.

if hardware supports exec-only page table entries.

> +	 * See: FNAME(gpte_access) in paging_tmpl.h
> +	 */
> +	spte |= shadow_present_mask;
>  	if (!speculative)
>  		spte |= shadow_accessed_mask;
>  
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index bc019f7..f2741db 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -181,13 +181,19 @@ no_present:
>  	return true;
>  }
>  
> +/*
> + * For PTTYPE_EPT, a page table can be executable but not readable
> + * on supported processors. Therefore, set_spte does not automatically
> + * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
> + * to signify readability since it isn't used in the EPT case
> + */
>  static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
>  {
>  	unsigned access;
>  #if PTTYPE == PTTYPE_EPT
>  	access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
>  		((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
> -		ACC_USER_MASK;
> +		((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
>  #else
>  	BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
>  	BUILD_BUG_ON(ACC_EXEC_MASK != 1);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 64a79f2..f73b5dc 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6366,10 +6366,13 @@ static __init int hardware_setup(void)
>  	vmx_disable_intercept_msr_write_x2apic(0x83f);
>  
>  	if (enable_ept) {
> -		kvm_mmu_set_mask_ptes(0ull,
> +		kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK,
>  			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
>  			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
> -			0ull, VMX_EPT_EXECUTABLE_MASK);
> +			0ull, VMX_EPT_EXECUTABLE_MASK,
> +			cpu_has_vmx_ept_execute_only() ?
> +				      0ull : PT_PRESENT_MASK);

Better to use VMX_EPT_READABLE_MASK here, which removes the need for the
BUILD_BUG_ON.

I can do these changes myself.

Paolo

> +		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);


* Re: [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap
  2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
@ 2016-07-13  8:23   ` Paolo Bonzini
  2016-07-13  9:06     ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13  8:23 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 00:18, Bandan Das wrote:
> For the nested EPT case, we assume that the read bit (u) is
> always set since we used to unconditionally set it in set_spte().
> Modify it to only be set when host ept execute only support
> isn't present.
> 
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/mmu.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c364dcb..566eea5 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3923,9 +3923,10 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  				 *   clearer.
>  				 */
>  				smap = cr4_smap && u && !uf && !ff;
> -			} else
> -				/* Not really needed: no U/S accesses on ept  */
> -				u = 1;
> +			} else {
> +				if (shadow_present_mask)
> +					u = 1;
> +			}
>  
>  			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>  				(smapf && smap);
> 

This is needed too, in order to look up with the correct uf:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 003618e324ce..941d345ebd2c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6014,12 +6014,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	trace_kvm_page_fault(gpa, exit_qualification);

-	/* It is a write fault? */
+	/* it is a read fault? */
+	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
+	/* it is a write fault? */
 	error_code = exit_qualification & PFERR_WRITE_MASK;
 	/* It is a fetch fault? */
 	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
 	/* ept page table is present? */
-	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
+	error_code |= (exit_qualification & 0x38) != 0;

 	vcpu->arch.exit_qualification = exit_qualification;


Otherwise you always lookup the permission bitmap for an uf=0 case.
This was the meaning of the "Not really needed" comment.

Paolo
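
[ To make the bit arithmetic in the hunk above easier to follow, here is a
  standalone sketch of the resulting error_code construction;
  ept_violation_error_code is a made-up helper name. It is written with |=
  throughout, since the write-fault line of the diff above still uses '=' and
  is corrected to '|=' later in the thread. Exit qualification bits 0..2
  describe the access (read/write/fetch) and bits 3..5 the XWR permissions of
  the faulting EPT entry; the PFERR_* values mirror the x86 page fault error
  code layout, with the "user" bit repurposed to mean "read" for EPT. ]

#define PFERR_PRESENT_MASK (1ul << 0)
#define PFERR_WRITE_MASK   (1ul << 1)
#define PFERR_USER_MASK    (1ul << 2)
#define PFERR_FETCH_MASK   (1ul << 4)

static unsigned long ept_violation_error_code(unsigned long exit_qualification)
{
	unsigned long error_code;

	/* read access: exit qualification bit 0 -> error code bit 2 */
	error_code  = (exit_qualification << 2) & PFERR_USER_MASK;
	/* write access: bit 1 -> bit 1 */
	error_code |= exit_qualification & PFERR_WRITE_MASK;
	/* instruction fetch: bit 2 -> bit 4 */
	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
	/* any of the XWR permission bits 3..5 set: the entry was present */
	error_code |= (exit_qualification & 0x38) != 0;

	return error_code;
}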


* Re: [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap
  2016-07-13  8:23   ` Paolo Bonzini
@ 2016-07-13  9:06     ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13  9:06 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 10:23, Paolo Bonzini wrote:
> 
> 
> On 13/07/2016 00:18, Bandan Das wrote:
>> For the nested EPT case, we assume that the read bit (u) is
>> always set since we used to unconditionally set it in set_spte().
>> Modify it to only be set when host ept execute only support
>> isn't present.
>>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>>  arch/x86/kvm/mmu.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index c364dcb..566eea5 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3923,9 +3923,10 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>>  				 *   clearer.
>>  				 */
>>  				smap = cr4_smap && u && !uf && !ff;
>> -			} else
>> -				/* Not really needed: no U/S accesses on ept  */
>> -				u = 1;
>> +			} else {
>> +				if (shadow_present_mask)
>> +					u = 1;

... in fact I think this should be removed completely.

If shadow_present_mask is set, the R bit will always be set in a present
PTE and FNAME(gpte_access) will change that to ACC_USER_MASK---which in
turn means that u=1.

Paolo

>> +			}
>>  
>>  			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>>  				(smapf && smap);
>>
> 
> This is needed too, in order to look up with the correct uf:
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 003618e324ce..941d345ebd2c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6014,12 +6014,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>  	trace_kvm_page_fault(gpa, exit_qualification);
> 
> -	/* It is a write fault? */
> +	/* it is a read fault? */
> +	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> +	/* it is a write fault? */
>  	error_code = exit_qualification & PFERR_WRITE_MASK;
>  	/* It is a fetch fault? */
>  	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>  	/* ept page table is present? */
> -	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> +	error_code |= (exit_qualification & 0x38) != 0;
> 
>  	vcpu->arch.exit_qualification = exit_qualification;
> 
> 
> Otherwise you always lookup the permission bitmap for an uf=0 case.
> This was the meaning of the "Not really needed" comment.
> 
> Paolo
> 


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
                   ` (4 preceding siblings ...)
  2016-07-12 22:18 ` [PATCH v2 5/5] vmx: advertise support for ept execute only Bandan Das
@ 2016-07-13  9:20 ` Paolo Bonzini
  2016-07-13  9:49   ` Fam Zheng
  2016-07-13 15:06   ` Bandan Das
  5 siblings, 2 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13  9:20 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 00:18, Bandan Das wrote:
> v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
> 
> Changes since v1:
>  - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
>    Reasoning provided in commit message.
>  - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
>    3/5 from v1 is now 2/5. Introduce shadow_present_mask that
>    signifies whether ept execute only is supported. Add/remove some
>    comments as suggested in v1.
>  - 3/5 : 4/5 from v1 is now 3/5.
>  - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
>    support ept execute only.
>  - 5/5 : No change

These are the diffs I have after review, do they look okay?

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 190c0559c221..bd2535fdb9eb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		return 0;
 
 	/*
-	 * In the non-EPT case, execonly is not valid and so
-	 * the following line is equivalent to spte |= PT_PRESENT_MASK.
 	 * For the EPT case, shadow_present_mask is 0 if hardware
-	 * supports it and we honor whatever way the guest set it.
-	 * See: FNAME(gpte_access) in paging_tmpl.h
+	 * supports exec-only page table entries.  In that case,
+	 * ACC_USER_MASK and shadow_user_mask are used to represent
+	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
 	 */
 	spte |= shadow_present_mask;
 	if (!speculative)
@@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 				 *   clearer.
 				 */
 				smap = cr4_smap && u && !uf && !ff;
-			} else {
-				if (shadow_present_mask)
-					u = 1;
 			}
 
 			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 576c47cda1a3..dfef081e76c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	trace_kvm_page_fault(gpa, exit_qualification);
 
-	/* It is a write fault? */
+	/* it is a read fault? */
+	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
+	/* it is a write fault? */
 	error_code = exit_qualification & PFERR_WRITE_MASK;
 	/* It is a fetch fault? */
 	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
 	/* ept page table is present? */
-	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
+	error_code |= (exit_qualification & 0x38) != 0;
 
 	vcpu->arch.exit_qualification = exit_qualification;
 
@@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
 			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
 			0ull, VMX_EPT_EXECUTABLE_MASK,
 			cpu_has_vmx_ept_execute_only() ?
-				      0ull : PT_PRESENT_MASK);
-		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
+				      0ull : VMX_EPT_READABLE_MASK);
 		ept_set_mmio_spte_mask();
 		kvm_enable_tdp();
 	} else



* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-13  9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
@ 2016-07-13  9:49   ` Fam Zheng
  2016-07-13 10:03     ` Paolo Bonzini
  2016-07-13 15:06   ` Bandan Das
  1 sibling, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2016-07-13  9:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Bandan Das, kvm, guangrong.xiao, kernellwp, linux-kernel

On Wed, 07/13 11:20, Paolo Bonzini wrote:
> 
> 
> On 13/07/2016 00:18, Bandan Das wrote:
> > v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
> > 
> > Changes since v1:
> >  - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
> >    Reasoning provided in commit message.
> >  - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
> >    3/5 from v1 is now 2/5. Introduce shadow_present_mask that
> >    signifies whether ept execute only is supported. Add/remove some
> >    comments as suggested in v1.
> >  - 3/5 : 4/5 from v1 is now 3/5.
> >  - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
> >    support ept execute only.
> >  - 5/5 : No change
> 
> These are the diffs I have after review, do they look okay?
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 190c0559c221..bd2535fdb9eb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		return 0;
>  
>  	/*
> -	 * In the non-EPT case, execonly is not valid and so
> -	 * the following line is equivalent to spte |= PT_PRESENT_MASK.
>  	 * For the EPT case, shadow_present_mask is 0 if hardware
> -	 * supports it and we honor whatever way the guest set it.
> -	 * See: FNAME(gpte_access) in paging_tmpl.h
> +	 * supports exec-only page table entries.  In that case,
> +	 * ACC_USER_MASK and shadow_user_mask are used to represent
> +	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
>  	 */
>  	spte |= shadow_present_mask;
>  	if (!speculative)
> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  				 *   clearer.
>  				 */
>  				smap = cr4_smap && u && !uf && !ff;
> -			} else {
> -				if (shadow_present_mask)
> -					u = 1;
>  			}
>  
>  			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 576c47cda1a3..dfef081e76c0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>  	trace_kvm_page_fault(gpa, exit_qualification);
>  
> -	/* It is a write fault? */
> +	/* it is a read fault? */
> +	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> +	/* it is a write fault? */
>  	error_code = exit_qualification & PFERR_WRITE_MASK;

Did you mean s/=/|=/ for this line?

Fam

>  	/* It is a fetch fault? */
>  	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>  	/* ept page table is present? */
> -	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> +	error_code |= (exit_qualification & 0x38) != 0;
>  
>  	vcpu->arch.exit_qualification = exit_qualification;
>  
> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
>  			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
>  			0ull, VMX_EPT_EXECUTABLE_MASK,
>  			cpu_has_vmx_ept_execute_only() ?
> -				      0ull : PT_PRESENT_MASK);
> -		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
> +				      0ull : VMX_EPT_READABLE_MASK);
>  		ept_set_mmio_spte_mask();
>  		kvm_enable_tdp();
>  	} else
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-13  9:49   ` Fam Zheng
@ 2016-07-13 10:03     ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 10:03 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Bandan Das, kvm, guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 11:49, Fam Zheng wrote:
> > +	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> > +	/* it is a write fault? */
> >  	error_code = exit_qualification & PFERR_WRITE_MASK;
>
> Did you mean s/=/|=/ for this line?

Ugh, yes.


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-13  9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
  2016-07-13  9:49   ` Fam Zheng
@ 2016-07-13 15:06   ` Bandan Das
  2016-07-13 15:27     ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-13 15:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 13/07/2016 00:18, Bandan Das wrote:
>> v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
>> 
>> Changes since v1:
>>  - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
>>    Reasoning provided in commit message.
>>  - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
>>    3/5 from v1 is now 2/5. Introduce shadow_present_mask that
>>    signifies whether ept execute only is supported. Add/remove some
>>    comments as suggested in v1.
>>  - 3/5 : 4/5 from v1 is now 3/5.
>>  - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
>>    support ept execute only.
>>  - 5/5 : No change
>
> These are the diffs I have after review, do they look okay?
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 190c0559c221..bd2535fdb9eb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		return 0;
>  
>  	/*
> -	 * In the non-EPT case, execonly is not valid and so
> -	 * the following line is equivalent to spte |= PT_PRESENT_MASK.
>  	 * For the EPT case, shadow_present_mask is 0 if hardware
> -	 * supports it and we honor whatever way the guest set it.
> -	 * See: FNAME(gpte_access) in paging_tmpl.h
> +	 * supports exec-only page table entries.  In that case,
> +	 * ACC_USER_MASK and shadow_user_mask are used to represent
> +	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
>  	 */

I would still prefer a note about the non-EPT case, makes it easy to
understand.

>  	spte |= shadow_present_mask;
>  	if (!speculative)
> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  				 *   clearer.
>  				 */
>  				smap = cr4_smap && u && !uf && !ff;
> -			} else {
> -				if (shadow_present_mask)
> -					u = 1;
>  			}
>  
>  			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 576c47cda1a3..dfef081e76c0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>  	trace_kvm_page_fault(gpa, exit_qualification);
>  
> -	/* It is a write fault? */
> +	/* it is a read fault? */
> +	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> +	/* it is a write fault? */
>  	error_code = exit_qualification & PFERR_WRITE_MASK;
>  	/* It is a fetch fault? */
>  	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>  	/* ept page table is present? */
> -	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> +	error_code |= (exit_qualification & 0x38) != 0;
>

Thank you for the thorough review here. I missed that we didn't set the read bit
at all. I am still a little unclear how permission_fault works though...


>  	vcpu->arch.exit_qualification = exit_qualification;
>  
> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
>  			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
>  			0ull, VMX_EPT_EXECUTABLE_MASK,
>  			cpu_has_vmx_ept_execute_only() ?
> -				      0ull : PT_PRESENT_MASK);
> -		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
> +				      0ull : VMX_EPT_READABLE_MASK);

I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
is an assumption all throughout. I wanted to use this section to catch mismatches.

Bandan

>  		ept_set_mmio_spte_mask();
>  		kvm_enable_tdp();
>  	} else


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-13 15:06   ` Bandan Das
@ 2016-07-13 15:27     ` Paolo Bonzini
  2016-07-13 15:47       ` Bandan Das
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 15:27 UTC (permalink / raw)
  To: Bandan Das; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 17:06, Bandan Das wrote:
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 190c0559c221..bd2535fdb9eb 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>  		return 0;
>>  
>>  	/*
>> -	 * In the non-EPT case, execonly is not valid and so
>> -	 * the following line is equivalent to spte |= PT_PRESENT_MASK.
>>  	 * For the EPT case, shadow_present_mask is 0 if hardware
>> -	 * supports it and we honor whatever way the guest set it.
>> -	 * See: FNAME(gpte_access) in paging_tmpl.h
>> +	 * supports exec-only page table entries.  In that case,
>> +	 * ACC_USER_MASK and shadow_user_mask are used to represent
>> +	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
>>  	 */
> 
> I would still prefer a note about the non-EPT case, makes it easy to
> understand.

I can add "shadow_present_mask is PT_PRESENT_MASK in the non-EPT case"
but it's a bit of a tautology.

>>  	spte |= shadow_present_mask;
>>  	if (!speculative)
>> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>>  				 *   clearer.
>>  				 */
>>  				smap = cr4_smap && u && !uf && !ff;
>> -			} else {
>> -				if (shadow_present_mask)
>> -					u = 1;
>>  			}
>>  
>>  			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 576c47cda1a3..dfef081e76c0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>  	trace_kvm_page_fault(gpa, exit_qualification);
>>  
>> -	/* It is a write fault? */
>> +	/* it is a read fault? */
>> +	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
>> +	/* it is a write fault? */
>>  	error_code = exit_qualification & PFERR_WRITE_MASK;
>>  	/* It is a fetch fault? */
>>  	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>>  	/* ept page table is present? */
>> -	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
>> +	error_code |= (exit_qualification & 0x38) != 0;
>>
> 
> Thank you for the thorough review here. I missed that we didn't set the read bit
> at all. I am still a little unclear how permission_fault works though...
> 
>>  	vcpu->arch.exit_qualification = exit_qualification;
>>  
>> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
>>  			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
>>  			0ull, VMX_EPT_EXECUTABLE_MASK,
>>  			cpu_has_vmx_ept_execute_only() ?
>> -				      0ull : PT_PRESENT_MASK);
>> -		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
>> +				      0ull : VMX_EPT_READABLE_MASK);
> 
> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
> is an assumption all throughout. I wanted to use this section to catch mismatches.

I think there's no such assumption anymore, actually.  Can you double
check?  If there are any, that's where the BUILD_BUG_ON should be.

Paolo


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-13 15:27     ` Paolo Bonzini
@ 2016-07-13 15:47       ` Bandan Das
  2016-07-14  6:56         ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-13 15:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 13/07/2016 17:06, Bandan Das wrote:
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 190c0559c221..bd2535fdb9eb 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>>  		return 0;
>>>  
>>>  	/*
>>> -	 * In the non-EPT case, execonly is not valid and so
>>> -	 * the following line is equivalent to spte |= PT_PRESENT_MASK.
>>>  	 * For the EPT case, shadow_present_mask is 0 if hardware
>>> -	 * supports it and we honor whatever way the guest set it.
>>> -	 * See: FNAME(gpte_access) in paging_tmpl.h
>>> +	 * supports exec-only page table entries.  In that case,
>>> +	 * ACC_USER_MASK and shadow_user_mask are used to represent
>>> +	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
>>>  	 */
>> 
>> I would still prefer a note about the non-EPT case, makes it easy to
>> understand.
>
> I can add "shadow_present_mask is PT_PRESENT_MASK in the non-EPT case"
> but it's a bit of a tautology.

shadow_present_mask actually signifies different things for the EPT and
non-EPT cases, and it doesn't hurt to mention it. But I get your point;
maybe it's self-explanatory.

>>>  	spte |= shadow_present_mask;
>>>  	if (!speculative)
>>> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>>>  				 *   clearer.
>>>  				 */
>>>  				smap = cr4_smap && u && !uf && !ff;
>>> -			} else {
>>> -				if (shadow_present_mask)
>>> -					u = 1;
>>>  			}
>>>  
>>>  			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 576c47cda1a3..dfef081e76c0 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>>  	trace_kvm_page_fault(gpa, exit_qualification);
>>>  
>>> -	/* It is a write fault? */
>>> +	/* it is a read fault? */
>>> +	error_code = (exit_qualification << 2) & PFERR_USER_MASK;
>>> +	/* it is a write fault? */
>>>  	error_code = exit_qualification & PFERR_WRITE_MASK;
>>>  	/* It is a fetch fault? */
>>>  	error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>>>  	/* ept page table is present? */
>>> -	error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
>>> +	error_code |= (exit_qualification & 0x38) != 0;
>>>
>> 
>> Thank you for the thorough review here. I missed that we didn't set the read bit
>> at all. I am still a little unclear how permission_fault works though...
>> 
>>>  	vcpu->arch.exit_qualification = exit_qualification;
>>>  
>>> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
>>>  			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
>>>  			0ull, VMX_EPT_EXECUTABLE_MASK,
>>>  			cpu_has_vmx_ept_execute_only() ?
>>> -				      0ull : PT_PRESENT_MASK);
>>> -		BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
>>> +				      0ull : VMX_EPT_READABLE_MASK);
>> 
>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>
> I think there's no such assumption anymore, actually.  Can you double
> check?  If there are any, that's where the BUILD_BUG_ON should be.

What I meant is how they are the same bit.  is_shadow_present_pte() is probably one
and another one is link_shadow_page() which already has a BUILD_BUG_ON().

> Paolo


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-13 15:47       ` Bandan Das
@ 2016-07-14  6:56         ` Paolo Bonzini
  2016-07-14 17:38           ` Bandan Das
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-14  6:56 UTC (permalink / raw)
  To: Bandan Das; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel



On 13/07/2016 17:47, Bandan Das wrote:
>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>
>> I think there's no such assumption anymore, actually.  Can you double
>> check?  If there are any, that's where the BUILD_BUG_ON should be.
> 
> What I meant is how they are the same bit.  is_shadow_present_pte() is probably one
> and another one is link_shadow_page() which already has a BUILD_BUG_ON().

You're right about link_shadow_page()!  We probably should change the
PT_PRESENT_MASK to shadow_present_mask there (and then readability in
the EPT execonly case is still provided by shadow_user_mask).

For is_shadow_present_pte() you have removed it in patch 1 though.

Paolo


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-14  6:56         ` Paolo Bonzini
@ 2016-07-14 17:38           ` Bandan Das
  2016-07-14 18:29             ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-14 17:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 13/07/2016 17:47, Bandan Das wrote:
>>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>>
>>> I think there's no such assumption anymore, actually.  Can you double
>>> check?  If there are any, that's where the BUILD_BUG_ON should be.
>> 
>> What I meant is how they are the same bit.  is_shadow_present_pte() is probably one
>> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
>
> You're right about link_shadow_page()!  We probably should change the
> PT_PRESENT_MASK to shadow_present_mask there (and then readability in
> the EPT execonly case is still provided by shadow_user_mask).

Makes sense. Would you like a new version with that added or can that be a
separate patch ?

> For is_shadow_present_pte() you have removed it in patch 1 though.

Right. But the assumption is still that is_shadow_present_pte() works because
EPT_READABLE and PT_PRESENT are the same.

> Paolo


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-14 17:38           ` Bandan Das
@ 2016-07-14 18:29             ` Paolo Bonzini
  2016-07-14 18:52               ` Bandan Das
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-14 18:29 UTC (permalink / raw)
  To: Bandan Das; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel



On 14/07/2016 19:38, Bandan Das wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 13/07/2016 17:47, Bandan Das wrote:
>>>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>>>
>>>> I think there's no such assumption anymore, actually.  Can you double
>>>> check?  If there are any, that's where the BUILD_BUG_ON should be.
>>>
>>> What I meant is how they are the same bit.  is_shadow_present_pte() is probably one
>>> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
>>
>> You're right about link_shadow_page()!  We probably should change the
>> PT_PRESENT_MASK to shadow_present_mask there (and then readability in
>> the EPT execonly case is still provided by shadow_user_mask).
> 
> Makes sense. Would you like a new version with that added or can that be a
> separate patch ?

I've already done it and pushed it to kvm/next. :)

>> For is_shadow_present_pte() you have removed it in patch 1 though.
> 
> Right. But the assumption is still that is_shadow_present_pte() works because
> EPT_READABLE and PT_PRESENT are the same.

is_shadow_present_pte() tests 0xFFFFFFFF, so it does not depend on bit 0
alone, for neither EPT nor "normal" page tables.

Paolo


* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
  2016-07-14 18:29             ` Paolo Bonzini
@ 2016-07-14 18:52               ` Bandan Das
  0 siblings, 0 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-14 18:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 14/07/2016 19:38, Bandan Das wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>>> On 13/07/2016 17:47, Bandan Das wrote:
>>>>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>>>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>>>>
>>>>> I think there's no such assumption anymore, actually.  Can you double
>>>>> check?  If there are any, that's where the BUILD_BUG_ON should be.
>>>>
>>>> What I meant is how they are the same bit.  is_shadow_present_pte() is probably one
>>>> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
>>>
>>> You're right about link_shadow_page()!  We probably should change the
>>> PT_PRESENT_MASK to shadow_present_mask there (and then readability in
>>> the EPT execonly case is still provided by shadow_user_mask).
>> 
>> Makes sense. Would you like a new version with that added or can that be a
>> separate patch ?
>
> I've already done it and pushed it to kvm/next. :)

Ah, thank you!

>>> For is_shadow_present_pte() you have removed it in patch 1 though.
>> 
>> Right. But the assumption is still that is_shadow_present_pte() works because
>> EPT_READABLE and PT_PRESENT are the same.
>
> is_shadow_present_pte() tests 0xFFFFFFFF, so it does not depend on bit 0
> alone, for neither EPT nor "normal" page tables.

Yeah... Let me rephrase: is_shadow_present_pte() works because the assumption
is that both of the bits are in the first 32 bits :) You proved me wrong though;
this assumption does not mean a BUILD_BUG_ON for the equality condition is
required here.

Bandan

> Paolo


* RE: [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
  2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
@ 2016-09-01 14:20   ` Li, Liang Z
  2016-09-01 15:16     ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Liang Z @ 2016-09-01 14:20 UTC (permalink / raw)
  To: Bandan Das, kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel

The Intel SDM doesn't describe whether the A bit will be set or not when the CPU accesses a non-present EPT page table entry.
Even if this patch works on current CPUs, it's not good to make such an assumption.

Should we revert it?

Thanks!
Liang


> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> On Behalf Of Bandan Das
> Sent: Wednesday, July 13, 2016 6:19 AM
> To: kvm@vger.kernel.org
> Cc: pbonzini@redhat.com; guangrong.xiao@linux.intel.com;
> kernellwp@gmail.com; linux-kernel@vger.kernel.org
> Subject: [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
> 
> This is safe because this function is called on host-controlled page tables,
> and non-present/non-MMIO sptes never use bits 1..31. For the EPT case, this
> ensures that entries where only the execute bit is set are marked valid.
> 
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index
> def97b3..87b62dc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -304,7 +304,7 @@ static int is_nx(struct kvm_vcpu *vcpu)
> 
>  static int is_shadow_present_pte(u64 pte)  {
> -	return pte & PT_PRESENT_MASK && !is_mmio_spte(pte);
> +	return (pte & 0xFFFFFFFFull) && !is_mmio_spte(pte);
>  }
> 
>  static int is_large_pte(u64 pte)
> --
> 2.5.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body of
> a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
  2016-09-01 14:20   ` Li, Liang Z
@ 2016-09-01 15:16     ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-09-01 15:16 UTC (permalink / raw)
  To: Li, Liang Z, Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel



On 01/09/2016 16:20, Li, Liang Z wrote:
> The Intel SDM doesn't describe whether the A bit will be set or not when
> the CPU accesses a non-present EPT page table entry

Bits in a non-present page table entry are entirely for use by the OS.
The processor will never touch them.  This includes both P=0 in non-EPT
page tables and XWR=000 in EPT page tables.

Paolo


