* [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
@ 2016-07-12 22:18 Bandan Das
2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
` (5 more replies)
0 siblings, 6 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
Changes since v1:
- 1/5 : modify is_shadow_present_pte to check against 0xffffffff
Reasoning provided in commit message.
- 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
3/5 from v1 is now 2/5. Introduce shadow_present_mask that
signifies whether ept execute only is supported. Add/remove some
comments as suggested in v1.
- 3/5 : 4/5 from v1 is now 3/5.
- 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
support ept execute only.
- 5/5 : No change
These patches are based on reviews to my RFC
http://www.spinics.net/lists/kvm/msg134440.html
Changes since RFC:
- Remove shadow_xonly_valid, it's not needed
- Remove checks from is_shadow_present_pte()
- In reset_tdp_shadow_zero_bits_mask, pass correct execonly to __reset_rsvds_bits_mask_ept
- Reuse shadow_user_mask in set_spte()
- Remove is_present_gpte() and inline the operation at the two call sites
I spoke to Paolo about this a while back and thought to post this as
RFC while I am thinking of adding some unit tests.
Background: ESX refuses to run as L1 if support for EPT execute only isn't
found. I am not really sure if it uses it for anything since just advertising
the bits seems to work but adding the necessary plumbing seemed like a good idea.
Xiao, I took the liberty of adding you based on "git blame" :)
Thanks in advance.
Bandan Das (5):
mmu: extend the is_present check to 32 bits
mmu: don't set the present bit unconditionally
mmu: remove is_present_gpte()
mmu: change unconditional setting of the u bit in fault bitmap
vmx: advertise support for ept execute only
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 26 ++++++++++++++++++--------
arch/x86/kvm/mmu.h | 5 -----
arch/x86/kvm/paging_tmpl.h | 10 ++++++++--
arch/x86/kvm/vmx.c | 10 ++++++++--
arch/x86/kvm/x86.c | 6 +++---
6 files changed, 38 insertions(+), 21 deletions(-)
--
2.5.5
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
2016-09-01 14:20 ` Li, Liang Z
2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
` (4 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
This is safe because this function is called
on host controlled page table and non-present/non-MMIO
sptes never use bits 1..31. For the EPT case, this
ensures that cases where only the execute bit is set
is marked valid.
Signed-off-by: Bandan Das <bsd@redhat.com>
---
arch/x86/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index def97b3..87b62dc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -304,7 +304,7 @@ static int is_nx(struct kvm_vcpu *vcpu)
static int is_shadow_present_pte(u64 pte)
{
- return pte & PT_PRESENT_MASK && !is_mmio_spte(pte);
+ return (pte & 0xFFFFFFFFull) && !is_mmio_spte(pte);
}
static int is_large_pte(u64 pte)
--
2.5.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 2/5] mmu: don't set the present bit unconditionally
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
2016-07-13 8:10 ` Paolo Bonzini
2016-07-12 22:18 ` [PATCH v2 3/5] mmu: remove is_present_gpte() Bandan Das
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
To support execute only mappings on behalf of L1
hypervisors, we teach set_spte() to honor L1's valid XWR
bits. This is only if host supports EPT execute only. Reuse
ACC_USER_MASK to signify if the L1 hypervisor has the R bit
set. Add a new variable "shadow_present_mask" that is
set for non EPT cases and preserves the existing behavior
for those cases.
Signed-off-by: Bandan Das <bsd@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 15 ++++++++++++---
arch/x86/kvm/paging_tmpl.h | 8 +++++++-
arch/x86/kvm/vmx.c | 7 +++++--
arch/x86/kvm/x86.c | 4 ++--
5 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69e62862..c0acc66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1026,7 +1026,7 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu);
void kvm_mmu_init_vm(struct kvm *kvm);
void kvm_mmu_uninit_vm(struct kvm *kvm);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
- u64 dirty_mask, u64 nx_mask, u64 x_mask);
+ u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask);
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 87b62dc..ae80aa4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -175,6 +175,7 @@ static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
static u64 __read_mostly shadow_mmio_mask;
+static u64 __read_mostly shadow_present_mask;
static void mmu_spte_set(u64 *sptep, u64 spte);
static void mmu_free_roots(struct kvm_vcpu *vcpu);
@@ -282,13 +283,14 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
}
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
- u64 dirty_mask, u64 nx_mask, u64 x_mask)
+ u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask)
{
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
shadow_dirty_mask = dirty_mask;
shadow_nx_mask = nx_mask;
shadow_x_mask = x_mask;
+ shadow_present_mask = p_mask;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
@@ -2515,13 +2517,20 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
gfn_t gfn, kvm_pfn_t pfn, bool speculative,
bool can_unsync, bool host_writable)
{
- u64 spte;
+ u64 spte = 0;
int ret = 0;
if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
return 0;
- spte = PT_PRESENT_MASK;
+ /*
+ * In the non-EPT case, execonly is not valid and so
+ * the following line is equivalent to spte |= PT_PRESENT_MASK.
+ * For the EPT case, shadow_present_mask is 0 if hardware
+ * supports it and we honor whatever way the guest set it.
+ * See: FNAME(gpte_access) in paging_tmpl.h
+ */
+ spte |= shadow_present_mask;
if (!speculative)
spte |= shadow_accessed_mask;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bc019f7..f2741db 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -181,13 +181,19 @@ no_present:
return true;
}
+/*
+ * For PTTYPE_EPT, a page table can be executable but not readable
+ * on supported processors. Therefore, set_spte does not automatically
+ * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
+ * to signify readability since it isn't used in the EPT case
+ */
static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
{
unsigned access;
#if PTTYPE == PTTYPE_EPT
access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
- ACC_USER_MASK;
+ ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
#else
BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
BUILD_BUG_ON(ACC_EXEC_MASK != 1);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 64a79f2..f73b5dc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6366,10 +6366,13 @@ static __init int hardware_setup(void)
vmx_disable_intercept_msr_write_x2apic(0x83f);
if (enable_ept) {
- kvm_mmu_set_mask_ptes(0ull,
+ kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK,
(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
- 0ull, VMX_EPT_EXECUTABLE_MASK);
+ 0ull, VMX_EPT_EXECUTABLE_MASK,
+ cpu_has_vmx_ept_execute_only() ?
+ 0ull : PT_PRESENT_MASK);
+ BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
ept_set_mmio_spte_mask();
kvm_enable_tdp();
} else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7da5dd2..b15f214 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5867,8 +5867,8 @@ int kvm_arch_init(void *opaque)
kvm_x86_ops = ops;
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
- PT_DIRTY_MASK, PT64_NX_MASK, 0);
-
+ PT_DIRTY_MASK, PT64_NX_MASK, 0,
+ PT_PRESENT_MASK);
kvm_timer_init();
perf_register_guest_info_callbacks(&kvm_guest_cbs);
--
2.5.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 3/5] mmu: remove is_present_gpte()
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
` (2 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
We have two versions of the above function.
To prevent confusion and bugs in the future, remove
the non-FNAME version entirely and replace all calls
with the actual check.
Signed-off-by: Bandan Das <bsd@redhat.com>
---
arch/x86/kvm/mmu.c | 2 +-
arch/x86/kvm/mmu.h | 5 -----
arch/x86/kvm/paging_tmpl.h | 2 +-
arch/x86/kvm/x86.c | 2 +-
4 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ae80aa4..c364dcb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3198,7 +3198,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
MMU_WARN_ON(VALID_PAGE(root));
if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
pdptr = vcpu->arch.mmu.get_pdptr(vcpu, i);
- if (!is_present_gpte(pdptr)) {
+ if (!(pdptr & PT_PRESENT_MASK)) {
vcpu->arch.mmu.pae_root[i] = 0;
continue;
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 66b33b9..ddc56e9 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -93,11 +93,6 @@ static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
return kvm_mmu_load(vcpu);
}
-static inline int is_present_gpte(unsigned long pte)
-{
- return pte & PT_PRESENT_MASK;
-}
-
/*
* Currently, we have two sorts of write-protection, a) the first one
* write-protects guest page to sync the guest modification, b) another one is
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f2741db..a011054 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -131,7 +131,7 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
static inline int FNAME(is_present_gpte)(unsigned long pte)
{
#if PTTYPE != PTTYPE_EPT
- return is_present_gpte(pte);
+ return pte & PT_PRESENT_MASK;
#else
return pte & 7;
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b15f214..2e8618f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -537,7 +537,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
goto out;
}
for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
- if (is_present_gpte(pdpte[i]) &&
+ if ((pdpte[i] & PT_PRESENT_MASK) &&
(pdpte[i] &
vcpu->arch.mmu.guest_rsvd_check.rsvd_bits_mask[0][2])) {
ret = 0;
--
2.5.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
` (2 preceding siblings ...)
2016-07-12 22:18 ` [PATCH v2 3/5] mmu: remove is_present_gpte() Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
2016-07-13 8:23 ` Paolo Bonzini
2016-07-12 22:18 ` [PATCH v2 5/5] vmx: advertise support for ept execute only Bandan Das
2016-07-13 9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
5 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
For the nested EPT case, we assume that the read bit (u) is
always set since we used to unconditionally set it in set_spte().
Modify it to only be set when host ept execute only support
isn't present.
Signed-off-by: Bandan Das <bsd@redhat.com>
---
arch/x86/kvm/mmu.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c364dcb..566eea5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3923,9 +3923,10 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
* clearer.
*/
smap = cr4_smap && u && !uf && !ff;
- } else
- /* Not really needed: no U/S accesses on ept */
- u = 1;
+ } else {
+ if (shadow_present_mask)
+ u = 1;
+ }
fault = (ff && !x) || (uf && !u) || (wf && !w) ||
(smapf && smap);
--
2.5.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 5/5] vmx: advertise support for ept execute only
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
` (3 preceding siblings ...)
2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
@ 2016-07-12 22:18 ` Bandan Das
2016-07-13 9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
5 siblings, 0 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-12 22:18 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
MMU now knows about execute only mappings, so
advertise the feature to L1 hypervisors
Signed-off-by: Bandan Das <bsd@redhat.com>
---
arch/x86/kvm/vmx.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f73b5dc..2c49355f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2717,6 +2717,9 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
VMX_EPT_INVEPT_BIT;
+ if (cpu_has_vmx_ept_execute_only())
+ vmx->nested.nested_vmx_ept_caps |=
+ VMX_EPT_EXECUTE_ONLY_BIT;
vmx->nested.nested_vmx_ept_caps &= vmx_capability.ept;
/*
* For nested guests, we don't do anything specific
--
2.5.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/5] mmu: don't set the present bit unconditionally
2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
@ 2016-07-13 8:10 ` Paolo Bonzini
0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 8:10 UTC (permalink / raw)
To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 00:18, Bandan Das wrote:
> + /*
> + * In the non-EPT case, execonly is not valid and so
> + * the following line is equivalent to spte |= PT_PRESENT_MASK.
I think the comment need not mention non-EPT.
> + * For the EPT case, shadow_present_mask is 0 if hardware
> + * supports it and we honor whatever way the guest set it.
if hardware supports exec-only page table entries.
> + * See: FNAME(gpte_access) in paging_tmpl.h
> + */
> + spte |= shadow_present_mask;
> if (!speculative)
> spte |= shadow_accessed_mask;
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index bc019f7..f2741db 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -181,13 +181,19 @@ no_present:
> return true;
> }
>
> +/*
> + * For PTTYPE_EPT, a page table can be executable but not readable
> + * on supported processors. Therefore, set_spte does not automatically
> + * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
> + * to signify readability since it isn't used in the EPT case
> + */
> static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
> {
> unsigned access;
> #if PTTYPE == PTTYPE_EPT
> access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
> ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
> - ACC_USER_MASK;
> + ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
> #else
> BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
> BUILD_BUG_ON(ACC_EXEC_MASK != 1);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 64a79f2..f73b5dc 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6366,10 +6366,13 @@ static __init int hardware_setup(void)
> vmx_disable_intercept_msr_write_x2apic(0x83f);
>
> if (enable_ept) {
> - kvm_mmu_set_mask_ptes(0ull,
> + kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK,
> (enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
> (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
> - 0ull, VMX_EPT_EXECUTABLE_MASK);
> + 0ull, VMX_EPT_EXECUTABLE_MASK,
> + cpu_has_vmx_ept_execute_only() ?
> + 0ull : PT_PRESENT_MASK);
Better to use VMX_EPT_READABLE_MASK here, which removes the need for the
BUILD_BUG_ON.
I can do these changes myself.
Paolo
> + BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap
2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
@ 2016-07-13 8:23 ` Paolo Bonzini
2016-07-13 9:06 ` Paolo Bonzini
0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 8:23 UTC (permalink / raw)
To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 00:18, Bandan Das wrote:
> For the nested EPT case, we assume that the read bit (u) is
> always set since we used to unconditionally set it in set_spte().
> Modify it to only be set when host ept execute only support
> isn't present.
>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
> arch/x86/kvm/mmu.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c364dcb..566eea5 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3923,9 +3923,10 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
> * clearer.
> */
> smap = cr4_smap && u && !uf && !ff;
> - } else
> - /* Not really needed: no U/S accesses on ept */
> - u = 1;
> + } else {
> + if (shadow_present_mask)
> + u = 1;
> + }
>
> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
> (smapf && smap);
>
This is needed too, in order to look up with the correct uf:
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 003618e324ce..941d345ebd2c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6014,12 +6014,14 @@ static int handle_ept_violation(struct kvm_vcpu
*vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
trace_kvm_page_fault(gpa, exit_qualification);
- /* It is a write fault? */
+ /* it is a read fault? */
+ error_code = (exit_qualification << 2) & PFERR_USER_MASK;
+ /* it is a write fault? */
error_code = exit_qualification & PFERR_WRITE_MASK;
/* It is a fetch fault? */
error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
/* ept page table is present? */
- error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
+ error_code |= (exit_qualification & 0x38) != 0;
vcpu->arch.exit_qualification = exit_qualification;
Otherwise you always lookup the permission bitmap for an uf=0 case.
This was the meaning of the "Not really needed" comment.
Paolo
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap
2016-07-13 8:23 ` Paolo Bonzini
@ 2016-07-13 9:06 ` Paolo Bonzini
0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 9:06 UTC (permalink / raw)
To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 10:23, Paolo Bonzini wrote:
>
>
> On 13/07/2016 00:18, Bandan Das wrote:
>> For the nested EPT case, we assume that the read bit (u) is
>> always set since we used to unconditionally set it in set_spte().
>> Modify it to only be set when host ept execute only support
>> isn't present.
>>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>> arch/x86/kvm/mmu.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index c364dcb..566eea5 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3923,9 +3923,10 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>> * clearer.
>> */
>> smap = cr4_smap && u && !uf && !ff;
>> - } else
>> - /* Not really needed: no U/S accesses on ept */
>> - u = 1;
>> + } else {
>> + if (shadow_present_mask)
>> + u = 1;
... in fact I think this should be removed completely.
If shadow_present_mask is set, the R bit will always be set in a present
PTE and FNAME(gpte_access) will change that to ACC_USER_MASK---which in
turn means that u=1.
Paolo
>> + }
>>
>> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>> (smapf && smap);
>>
>
> This is needed too, in order to look up with the correct uf:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 003618e324ce..941d345ebd2c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6014,12 +6014,14 @@ static int handle_ept_violation(struct kvm_vcpu
> *vcpu)
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> trace_kvm_page_fault(gpa, exit_qualification);
>
> - /* It is a write fault? */
> + /* it is a read fault? */
> + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> + /* it is a write fault? */
> error_code = exit_qualification & PFERR_WRITE_MASK;
> /* It is a fetch fault? */
> error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
> /* ept page table is present? */
> - error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> + error_code |= (exit_qualification & 0x38) != 0;
>
> vcpu->arch.exit_qualification = exit_qualification;
>
>
> Otherwise you always lookup the permission bitmap for an uf=0 case.
> This was the meaning of the "Not really needed" comment.
>
> Paolo
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
` (4 preceding siblings ...)
2016-07-12 22:18 ` [PATCH v2 5/5] vmx: advertise support for ept execute only Bandan Das
@ 2016-07-13 9:20 ` Paolo Bonzini
2016-07-13 9:49 ` Fam Zheng
2016-07-13 15:06 ` Bandan Das
5 siblings, 2 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 9:20 UTC (permalink / raw)
To: Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 00:18, Bandan Das wrote:
> v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
>
> Changes since v1:
> - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
> Reasoning provided in commit message.
> - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
> 3/5 from v1 is now 2/5. Introduce shadow_present_mask that
> signifies whether ept execute only is supported. Add/remove some
> comments as suggested in v1.
> - 3/5 : 4/5 from v1 is now 3/5.
> - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
> support ept execute only.
> - 5/5 : No change
These are the diffs I have after review, do they look okay?
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 190c0559c221..bd2535fdb9eb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
return 0;
/*
- * In the non-EPT case, execonly is not valid and so
- * the following line is equivalent to spte |= PT_PRESENT_MASK.
* For the EPT case, shadow_present_mask is 0 if hardware
- * supports it and we honor whatever way the guest set it.
- * See: FNAME(gpte_access) in paging_tmpl.h
+ * supports exec-only page table entries. In that case,
+ * ACC_USER_MASK and shadow_user_mask are used to represent
+ * read access. See FNAME(gpte_access) in paging_tmpl.h.
*/
spte |= shadow_present_mask;
if (!speculative)
@@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
* clearer.
*/
smap = cr4_smap && u && !uf && !ff;
- } else {
- if (shadow_present_mask)
- u = 1;
}
fault = (ff && !x) || (uf && !u) || (wf && !w) ||
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 576c47cda1a3..dfef081e76c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
trace_kvm_page_fault(gpa, exit_qualification);
- /* It is a write fault? */
+ /* it is a read fault? */
+ error_code = (exit_qualification << 2) & PFERR_USER_MASK;
+ /* it is a write fault? */
error_code = exit_qualification & PFERR_WRITE_MASK;
/* It is a fetch fault? */
error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
/* ept page table is present? */
- error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
+ error_code |= (exit_qualification & 0x38) != 0;
vcpu->arch.exit_qualification = exit_qualification;
@@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
0ull, VMX_EPT_EXECUTABLE_MASK,
cpu_has_vmx_ept_execute_only() ?
- 0ull : PT_PRESENT_MASK);
- BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
+ 0ull : VMX_EPT_READABLE_MASK);
ept_set_mmio_spte_mask();
kvm_enable_tdp();
} else
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-13 9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
@ 2016-07-13 9:49 ` Fam Zheng
2016-07-13 10:03 ` Paolo Bonzini
2016-07-13 15:06 ` Bandan Das
1 sibling, 1 reply; 21+ messages in thread
From: Fam Zheng @ 2016-07-13 9:49 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Bandan Das, kvm, guangrong.xiao, kernellwp, linux-kernel
On Wed, 07/13 11:20, Paolo Bonzini wrote:
>
>
> On 13/07/2016 00:18, Bandan Das wrote:
> > v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
> >
> > Changes since v1:
> > - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
> > Reasoning provided in commit message.
> > - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
> > 3/5 from v1 is now 2/5. Introduce shadow_present_mask that
> > signifies whether ept execute only is supported. Add/remove some
> > comments as suggested in v1.
> > - 3/5 : 4/5 from v1 is now 3/5.
> > - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
> > support ept execute only.
> > - 5/5 : No change
>
> These are the diffs I have after review, do they look okay?
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 190c0559c221..bd2535fdb9eb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> return 0;
>
> /*
> - * In the non-EPT case, execonly is not valid and so
> - * the following line is equivalent to spte |= PT_PRESENT_MASK.
> * For the EPT case, shadow_present_mask is 0 if hardware
> - * supports it and we honor whatever way the guest set it.
> - * See: FNAME(gpte_access) in paging_tmpl.h
> + * supports exec-only page table entries. In that case,
> + * ACC_USER_MASK and shadow_user_mask are used to represent
> + * read access. See FNAME(gpte_access) in paging_tmpl.h.
> */
> spte |= shadow_present_mask;
> if (!speculative)
> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
> * clearer.
> */
> smap = cr4_smap && u && !uf && !ff;
> - } else {
> - if (shadow_present_mask)
> - u = 1;
> }
>
> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 576c47cda1a3..dfef081e76c0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> trace_kvm_page_fault(gpa, exit_qualification);
>
> - /* It is a write fault? */
> + /* it is a read fault? */
> + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> + /* it is a write fault? */
> error_code = exit_qualification & PFERR_WRITE_MASK;
Did you mean s/=/|=/ for this line?
Fam
> /* It is a fetch fault? */
> error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
> /* ept page table is present? */
> - error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> + error_code |= (exit_qualification & 0x38) != 0;
>
> vcpu->arch.exit_qualification = exit_qualification;
>
> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
> (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
> 0ull, VMX_EPT_EXECUTABLE_MASK,
> cpu_has_vmx_ept_execute_only() ?
> - 0ull : PT_PRESENT_MASK);
> - BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
> + 0ull : VMX_EPT_READABLE_MASK);
> ept_set_mmio_spte_mask();
> kvm_enable_tdp();
> } else
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-13 9:49 ` Fam Zheng
@ 2016-07-13 10:03 ` Paolo Bonzini
0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 10:03 UTC (permalink / raw)
To: Fam Zheng; +Cc: Bandan Das, kvm, guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 11:49, Fam Zheng wrote:
> > + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> > + /* it is a write fault? */
> > error_code = exit_qualification & PFERR_WRITE_MASK;
>
> Did you mean s/=/|=/ for this line?
Ugh, yes.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-13 9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
2016-07-13 9:49 ` Fam Zheng
@ 2016-07-13 15:06 ` Bandan Das
2016-07-13 15:27 ` Paolo Bonzini
1 sibling, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-13 15:06 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 13/07/2016 00:18, Bandan Das wrote:
>> v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
>>
>> Changes since v1:
>> - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
>> Reasoning provided in commit message.
>> - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
>> 3/5 from v1 is now 2/5. Introduce shadow_present_mask that
>> signifies whether ept execute only is supported. Add/remove some
>> comments as suggested in v1.
>> - 3/5 : 4/5 from v1 is now 3/5.
>> - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
>> support ept execute only.
>> - 5/5 : No change
>
> These are the diffs I have after review, do they look okay?
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 190c0559c221..bd2535fdb9eb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> return 0;
>
> /*
> - * In the non-EPT case, execonly is not valid and so
> - * the following line is equivalent to spte |= PT_PRESENT_MASK.
> * For the EPT case, shadow_present_mask is 0 if hardware
> - * supports it and we honor whatever way the guest set it.
> - * See: FNAME(gpte_access) in paging_tmpl.h
> + * supports exec-only page table entries. In that case,
> + * ACC_USER_MASK and shadow_user_mask are used to represent
> + * read access. See FNAME(gpte_access) in paging_tmpl.h.
> */
I would still prefer a note about the non-EPT case, makes it easy to
understand.
> spte |= shadow_present_mask;
> if (!speculative)
> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
> * clearer.
> */
> smap = cr4_smap && u && !uf && !ff;
> - } else {
> - if (shadow_present_mask)
> - u = 1;
> }
>
> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 576c47cda1a3..dfef081e76c0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> trace_kvm_page_fault(gpa, exit_qualification);
>
> - /* It is a write fault? */
> + /* it is a read fault? */
> + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> + /* it is a write fault? */
> error_code = exit_qualification & PFERR_WRITE_MASK;
> /* It is a fetch fault? */
> error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
> /* ept page table is present? */
> - error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> + error_code |= (exit_qualification & 0x38) != 0;
>
Thank you for the thorough review here. I missed that we didn't set the read bit
at all. I am still a little unclear how permission_fault works though...
> vcpu->arch.exit_qualification = exit_qualification;
>
> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
> (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
> 0ull, VMX_EPT_EXECUTABLE_MASK,
> cpu_has_vmx_ept_execute_only() ?
> - 0ull : PT_PRESENT_MASK);
> - BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
> + 0ull : VMX_EPT_READABLE_MASK);
I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
is an assumption all throughout. I wanted to use this section to catch mismatches.
Bandan
> ept_set_mmio_spte_mask();
> kvm_enable_tdp();
> } else
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-13 15:06 ` Bandan Das
@ 2016-07-13 15:27 ` Paolo Bonzini
2016-07-13 15:47 ` Bandan Das
0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-13 15:27 UTC (permalink / raw)
To: Bandan Das; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 17:06, Bandan Das wrote:
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 190c0559c221..bd2535fdb9eb 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>> return 0;
>>
>> /*
>> - * In the non-EPT case, execonly is not valid and so
>> - * the following line is equivalent to spte |= PT_PRESENT_MASK.
>> * For the EPT case, shadow_present_mask is 0 if hardware
>> - * supports it and we honor whatever way the guest set it.
>> - * See: FNAME(gpte_access) in paging_tmpl.h
>> + * supports exec-only page table entries. In that case,
>> + * ACC_USER_MASK and shadow_user_mask are used to represent
>> + * read access. See FNAME(gpte_access) in paging_tmpl.h.
>> */
>
> I would still prefer a note about the non-EPT case, makes it easy to
> understand.
I can add "shadow_present_mask is PT_PRESENT_MASK in the non-EPT case"
but it's a bit of a tautology.
>> spte |= shadow_present_mask;
>> if (!speculative)
>> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>> * clearer.
>> */
>> smap = cr4_smap && u && !uf && !ff;
>> - } else {
>> - if (shadow_present_mask)
>> - u = 1;
>> }
>>
>> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 576c47cda1a3..dfef081e76c0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>> trace_kvm_page_fault(gpa, exit_qualification);
>>
>> - /* It is a write fault? */
>> + /* it is a read fault? */
>> + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
>> + /* it is a write fault? */
>> error_code = exit_qualification & PFERR_WRITE_MASK;
>> /* It is a fetch fault? */
>> error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>> /* ept page table is present? */
>> - error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
>> + error_code |= (exit_qualification & 0x38) != 0;
>>
>
> Thank you for the thorough review here. I missed that we didn't set the read bit
> at all. I am still a little unclear how permission_fault works though...
>
>> vcpu->arch.exit_qualification = exit_qualification;
>>
>> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
>> (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
>> 0ull, VMX_EPT_EXECUTABLE_MASK,
>> cpu_has_vmx_ept_execute_only() ?
>> - 0ull : PT_PRESENT_MASK);
>> - BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
>> + 0ull : VMX_EPT_READABLE_MASK);
>
> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
> is an assumption all throughout. I wanted to use this section to catch mismatches.
I think there's no such assumption anymore, actually. Can you double
check? If there are any, that's where the BUILD_BUG_ON should be.
Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-13 15:27 ` Paolo Bonzini
@ 2016-07-13 15:47 ` Bandan Das
2016-07-14 6:56 ` Paolo Bonzini
0 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-13 15:47 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 13/07/2016 17:06, Bandan Das wrote:
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 190c0559c221..bd2535fdb9eb 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>> return 0;
>>>
>>> /*
>>> - * In the non-EPT case, execonly is not valid and so
>>> - * the following line is equivalent to spte |= PT_PRESENT_MASK.
>>> * For the EPT case, shadow_present_mask is 0 if hardware
>>> - * supports it and we honor whatever way the guest set it.
>>> - * See: FNAME(gpte_access) in paging_tmpl.h
>>> + * supports exec-only page table entries. In that case,
>>> + * ACC_USER_MASK and shadow_user_mask are used to represent
>>> + * read access. See FNAME(gpte_access) in paging_tmpl.h.
>>> */
>>
>> I would still prefer a note about the non-EPT case, makes it easy to
>> understand.
>
> I can add "shadow_present_mask is PT_PRESENT_MASK in the non-EPT case"
> but it's a bit of a tautology.
shadow_present_mask actually signifies different things for ept/non-ept
cases and it doesn't hurt to mention it. But I get your point, maybe,
it's self-explanatory.
>>> spte |= shadow_present_mask;
>>> if (!speculative)
>>> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>>> * clearer.
>>> */
>>> smap = cr4_smap && u && !uf && !ff;
>>> - } else {
>>> - if (shadow_present_mask)
>>> - u = 1;
>>> }
>>>
>>> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 576c47cda1a3..dfef081e76c0 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>>> trace_kvm_page_fault(gpa, exit_qualification);
>>>
>>> - /* It is a write fault? */
>>> + /* it is a read fault? */
>>> + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
>>> + /* it is a write fault? */
>>> error_code = exit_qualification & PFERR_WRITE_MASK;
>>> /* It is a fetch fault? */
>>> error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
>>> /* ept page table is present? */
>>> - error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
>>> + error_code |= (exit_qualification & 0x38) != 0;
>>>
>>
>> Thank you for the thorough review here. I missed that we didn't set the read bit
>> at all. I am still a little unclear how permission_fault works though...
>>
>>> vcpu->arch.exit_qualification = exit_qualification;
>>>
>>> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
>>> (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
>>> 0ull, VMX_EPT_EXECUTABLE_MASK,
>>> cpu_has_vmx_ept_execute_only() ?
>>> - 0ull : PT_PRESENT_MASK);
>>> - BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
>>> + 0ull : VMX_EPT_READABLE_MASK);
>>
>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>
> I think there's no such assumption anymore, actually. Can you double
> check? If there are any, that's where the BUILD_BUG_ON should be.
What I meant is how they are the same bit. is_shadow_present_pte() is probably one
and another one is link_shadow_page() which already has a BUILD_BUG_ON().
> Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-13 15:47 ` Bandan Das
@ 2016-07-14 6:56 ` Paolo Bonzini
2016-07-14 17:38 ` Bandan Das
0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-14 6:56 UTC (permalink / raw)
To: Bandan Das; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
On 13/07/2016 17:47, Bandan Das wrote:
>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>
>> I think there's no such assumption anymore, actually. Can you double
>> check? If there are any, that's where the BUILD_BUG_ON should be.
>
> What I meant is how they are the same bit. is_shadow_present_pte() is probably one
> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
You're right about link_shadow_page()! We probably should change the
PT_PRESENT_MASK to shadow_present_mask there (and then readability in
the EPT execonly case is still provided by shadow_user_mask).
For is_shadow_present_pte() you have removed it in patch 1 though.
Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-14 6:56 ` Paolo Bonzini
@ 2016-07-14 17:38 ` Bandan Das
2016-07-14 18:29 ` Paolo Bonzini
0 siblings, 1 reply; 21+ messages in thread
From: Bandan Das @ 2016-07-14 17:38 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 13/07/2016 17:47, Bandan Das wrote:
>>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>>
>>> I think there's no such assumption anymore, actually. Can you double
>>> check? If there are any, that's where the BUILD_BUG_ON should be.
>>
>> What I meant is how they are the same bit. is_shadow_present_pte() is probably one
>> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
>
> You're right about link_shadow_page()! We probably should change the
> PT_PRESENT_MASK to shadow_present_mask there (and then readability in
> the EPT execonly case is still provided by shadow_user_mask).
Makes sense. Would you like a new version with that added or can that be a
separate patch ?
> For is_shadow_present_pte() you have removed it in patch 1 though.
Right. But the assumption is still that is_shadow_present_pte() works because
EPT_READABLE and PT_PRESENT are the same.
> Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-14 17:38 ` Bandan Das
@ 2016-07-14 18:29 ` Paolo Bonzini
2016-07-14 18:52 ` Bandan Das
0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2016-07-14 18:29 UTC (permalink / raw)
To: Bandan Das; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
On 14/07/2016 19:38, Bandan Das wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 13/07/2016 17:47, Bandan Das wrote:
>>>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>>>
>>>> I think there's no such assumption anymore, actually. Can you double
>>>> check? If there are any, that's where the BUILD_BUG_ON should be.
>>>
>>> What I meant is how they are the same bit. is_shadow_present_pte() is probably one
>>> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
>>
>> You're right about link_shadow_page()! We probably should change the
>> PT_PRESENT_MASK to shadow_present_mask there (and then readability in
>> the EPT execonly case is still provided by shadow_user_mask).
>
> Makes sense. Would you like a new version with that added or can that be a
> separate patch ?
I've already done it and pushed it to kvm/next. :)
>> For is_shadow_present_pte() you have removed it in patch 1 though.
>
> Right. But the assumption is still that is_shadow_present_pte() works because
> EPT_READABLE and PT_PRESENT are the same.
is_shadow_present_pte() tests 0xFFFFFFFF, so it does not depend on bit 0
alone, for neither EPT nor "normal" page tables.
Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors
2016-07-14 18:29 ` Paolo Bonzini
@ 2016-07-14 18:52 ` Bandan Das
0 siblings, 0 replies; 21+ messages in thread
From: Bandan Das @ 2016-07-14 18:52 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, guangrong.xiao, kernellwp, linux-kernel
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 14/07/2016 19:38, Bandan Das wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> On 13/07/2016 17:47, Bandan Das wrote:
>>>>>> I wanted to keep it the former way because "PT_PRESENT_MASK is equal to VMX_EPT_READABLE_MASK"
>>>>>> is an assumption all throughout. I wanted to use this section to catch mismatches.
>>>>>
>>>>> I think there's no such assumption anymore, actually. Can you double
>>>>> check? If there are any, that's where the BUILD_BUG_ON should be.
>>>>
>>>> What I meant is how they are the same bit. is_shadow_present_pte() is probably one
>>>> and another one is link_shadow_page() which already has a BUILD_BUG_ON().
>>>
>>> You're right about link_shadow_page()! We probably should change the
>>> PT_PRESENT_MASK to shadow_present_mask there (and then readability in
>>> the EPT execonly case is still provided by shadow_user_mask).
>>
>> Makes sense. Would you like a new version with that added or can that be a
>> separate patch ?
>
> I've already done it and pushed it to kvm/next. :)
Ah, thank you!
>>> For is_shadow_present_pte() you have removed it in patch 1 though.
>>
>> Right. But the assumption is still that is_shadow_present_pte() works because
>> EPT_READABLE and PT_PRESENT are the same.
>
> is_shadow_present_pte() tests 0xFFFFFFFF, so it does not depend on bit 0
> alone, for neither EPT nor "normal" page tables.
Yeah... Let me rephrase, is_shadow_present_pte works because the assumption
is that both of the bits are in the first 32 bits :) You proved me wrong though,
this assumption does not mean a BUILD_BUG for the equal condition is required here.
Bandan
> Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
@ 2016-09-01 14:20 ` Li, Liang Z
2016-09-01 15:16 ` Paolo Bonzini
0 siblings, 1 reply; 21+ messages in thread
From: Li, Liang Z @ 2016-09-01 14:20 UTC (permalink / raw)
To: Bandan Das, kvm; +Cc: pbonzini, guangrong.xiao, kernellwp, linux-kernel
Intel SDM doesn't describe whether the A bit will be set or not when CPU accesses a no present EPT page table entry?
even this patch works for the current CPU, it's not good to make such an assumption.
Should we revert it?
Thanks!
Liang
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> On Behalf Of Bandan Das
> Sent: Wednesday, July 13, 2016 6:19 AM
> To: kvm@vger.kernel.org
> Cc: pbonzini@redhat.com; guangrong.xiao@linux.intel.com;
> kernellwp@gmail.com; linux-kernel@vger.kernel.org
> Subject: [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
>
> This is safe because this function is called on host controlled page table and
> non-present/non-MMIO sptes never use bits 1..31. For the EPT case, this
> ensures that cases where only the execute bit is set is marked valid.
>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
> arch/x86/kvm/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index
> def97b3..87b62dc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -304,7 +304,7 @@ static int is_nx(struct kvm_vcpu *vcpu)
>
> static int is_shadow_present_pte(u64 pte) {
> - return pte & PT_PRESENT_MASK && !is_mmio_spte(pte);
> + return (pte & 0xFFFFFFFFull) && !is_mmio_spte(pte);
> }
>
> static int is_large_pte(u64 pte)
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body of
> a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/5] mmu: extend the is_present check to 32 bits
2016-09-01 14:20 ` Li, Liang Z
@ 2016-09-01 15:16 ` Paolo Bonzini
0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2016-09-01 15:16 UTC (permalink / raw)
To: Li, Liang Z, Bandan Das, kvm; +Cc: guangrong.xiao, kernellwp, linux-kernel
On 01/09/2016 16:20, Li, Liang Z wrote:
> Intel SDM doesn't describe whether the A bit will be set or not when
> CPU accesses a no present EPT page table entry
Bits in a non-present page table entry are entirely for use by the OS.
The processor will never touch it. This includes both P=0 in non-EPT
page tables and XWR=000 in EPT page tables.
Paolo
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2016-09-01 15:16 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-12 22:18 [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Bandan Das
2016-07-12 22:18 ` [PATCH v2 1/5] mmu: extend the is_present check to 32 bits Bandan Das
2016-09-01 14:20 ` Li, Liang Z
2016-09-01 15:16 ` Paolo Bonzini
2016-07-12 22:18 ` [PATCH v2 2/5] mmu: don't set the present bit unconditionally Bandan Das
2016-07-13 8:10 ` Paolo Bonzini
2016-07-12 22:18 ` [PATCH v2 3/5] mmu: remove is_present_gpte() Bandan Das
2016-07-12 22:18 ` [PATCH v2 4/5] mmu: change unconditional setting of the u bit in fault bitmap Bandan Das
2016-07-13 8:23 ` Paolo Bonzini
2016-07-13 9:06 ` Paolo Bonzini
2016-07-12 22:18 ` [PATCH v2 5/5] vmx: advertise support for ept execute only Bandan Das
2016-07-13 9:20 ` [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors Paolo Bonzini
2016-07-13 9:49 ` Fam Zheng
2016-07-13 10:03 ` Paolo Bonzini
2016-07-13 15:06 ` Bandan Das
2016-07-13 15:27 ` Paolo Bonzini
2016-07-13 15:47 ` Bandan Das
2016-07-14 6:56 ` Paolo Bonzini
2016-07-14 17:38 ` Bandan Das
2016-07-14 18:29 ` Paolo Bonzini
2016-07-14 18:52 ` Bandan Das
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).