* [RFC PATCH 0/2] KVM: MMU: return page fault error code from permission_fault
@ 2016-03-08 11:45 Paolo Bonzini
  2016-03-08 11:45 ` [RFC PATCH 1/2] KVM: MMU: precompute page fault error code Paolo Bonzini
  2016-03-08 11:45 ` [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault Paolo Bonzini
  0 siblings, 2 replies; 7+ messages in thread
From: Paolo Bonzini @ 2016-03-08 11:45 UTC
  To: linux-kernel, kvm; +Cc: guangrong.xiao, huaitong.han

Intel's new PKU extension introduces a bit of the page fault error code
that must be dynamically computed after the PTE lookup (unlike I/D, U/S
and W/R).  To ease this, these two patches make permission_fault return
the page fault error code.

Right now it only adds the P bit to the input parameter "pfec", but PKU
can change that.
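
To illustrate the new contract, here is a minimal stand-alone model (a
sketch, not kernel code: the helper name is made up and the PFERR
values simply mirror the x86 PFEC bit layout):

  #include <stdio.h>
  #include <stdint.h>

  #define PFERR_PRESENT_MASK (1U << 0)
  #define PFERR_WRITE_MASK   (1U << 1)

  /* -(bit & 1) is 0 or ~0, so the AND yields either 0 (no fault)
   * or the full error code with the P bit added. */
  static uint8_t permission_fault_model(unsigned int bit, unsigned int pfec)
  {
          pfec |= PFERR_PRESENT_MASK;
          return -(bit & 1) & pfec;
  }

  int main(void)
  {
          printf("%#x\n", permission_fault_model(1, PFERR_WRITE_MASK)); /* 0x3 */
          printf("%#x\n", permission_fault_model(0, PFERR_WRITE_MASK)); /* 0 */
          return 0;
  }

Patch 2 computes the same thing branchlessly from mmu->permissions.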

Paolo

Paolo Bonzini (2):
  KVM: MMU: precompute page fault error code
  KVM: MMU: return page fault error code from permission_fault

 arch/x86/kvm/mmu.c         |  2 +-
 arch/x86/kvm/mmu.h         | 15 ++++++++++-----
 arch/x86/kvm/paging_tmpl.h | 31 +++++++++++++++++--------------
 3 files changed, 28 insertions(+), 20 deletions(-)

-- 
1.8.3.1

* [RFC PATCH 1/2] KVM: MMU: precompute page fault error code
  2016-03-08 11:45 [RFC PATCH 0/2] KVM: MMU: return page fault error code from permission_fault Paolo Bonzini
@ 2016-03-08 11:45 ` Paolo Bonzini
  2016-03-10 14:01   ` Xiao Guangrong
  2016-03-08 11:45 ` [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault Paolo Bonzini
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2016-03-08 11:45 UTC
  To: linux-kernel, kvm; +Cc: guangrong.xiao, huaitong.han

For the next patch, we will want to filter PFERR_FETCH_MASK away early,
and not pass it to permission_fault if neither NX nor SMEP is enabled.
Prepare for the change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c         |  2 +-
 arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2463de0b935c..e57f7be061e3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 			u = bit & ACC_USER_MASK;
 
 			if (!ept) {
-				/* Not really needed: !nx will cause pte.nx to fault */
+				/* Not really needed: if !nx, ff will always be zero */
 				x |= !mmu->nx;
 				/* Allow supervisor writes if !cr0.wp */
 				w |= !is_write_protection(vcpu) && !uf;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6013f3685ef4..285858d3223b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gpa_t pte_gpa;
 	int offset;
 	const int write_fault = access & PFERR_WRITE_MASK;
-	const int user_fault  = access & PFERR_USER_MASK;
-	const int fetch_fault = access & PFERR_FETCH_MASK;
-	u16 errcode = 0;
+	u16 errcode;
 	gpa_t real_gpa;
 	gfn_t gfn;
 
 	trace_kvm_mmu_pagetable_walk(addr, access);
+
+	/*
+	 * Do not modify PFERR_FETCH_MASK in access.  It is used later in the call to
+	 * mmu->translate_gpa and, when nested virtualization is in use, the X or NX
+	 * bit of nested page tables always applies---even if NX and SMEP are disabled
+	 * in the guest.
+	 *
+	 * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
+	 */
+	errcode = access;
+	if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+		errcode &= ~PFERR_FETCH_MASK;
+
 retry_walk:
 	walker->level = mmu->root_level;
 	pte           = mmu->get_cr3(vcpu);
@@ -389,9 +400,7 @@ retry_walk:
 
 	if (unlikely(!accessed_dirty)) {
 		ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
-		if (unlikely(ret < 0))
-			goto error;
-		else if (ret)
+		if (ret > 0)
 			goto retry_walk;
 	}
 
@@ -402,11 +411,6 @@ retry_walk:
 	return 1;
 
 error:
-	errcode |= write_fault | user_fault;
-	if (fetch_fault && (mmu->nx ||
-			    kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
-		errcode |= PFERR_FETCH_MASK;
-
 	walker->fault.vector = PF_VECTOR;
 	walker->fault.error_code_valid = true;
 	walker->fault.error_code = errcode;
-- 
1.8.3.1

* [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault
  2016-03-08 11:45 [RFC PATCH 0/2] KVM: MMU: return page fault error code from permission_fault Paolo Bonzini
  2016-03-08 11:45 ` [RFC PATCH 1/2] KVM: MMU: precompute page fault error code Paolo Bonzini
@ 2016-03-08 11:45 ` Paolo Bonzini
  2016-03-10 14:03   ` Xiao Guangrong
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2016-03-08 11:45 UTC
  To: linux-kernel, kvm; +Cc: guangrong.xiao, huaitong.han

Intel's new PKU extension introduces a bit of the page fault error code
that must be dynamically computed after the PTE lookup (unlike I/D, U/S
and W/R).  To ease this, these two patches make permission_fault return
the page fault error code.

Right now it only adds the P bit to the input parameter "pfec", but PKU
can change that.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.h         | 15 ++++++++++-----
 arch/x86/kvm/paging_tmpl.h |  5 ++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 58fe98a0a526..8b2b3dfca1ae 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -141,11 +141,15 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Will a fault with a given page-fault error code (pfec) cause a permission
- * fault with the given access (in ACC_* format)?
+ * Check if a given access (described through the I/D, W/R and U/S bits of a
+ * page fault error code pfec) causes a permission fault with the given PTE
+ * access rights (in ACC_* format).
+ *
+ * Return zero if the access does not fault; return the page fault error code
+ * if the access faults.
  */
-static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				    unsigned pte_access, unsigned pfec)
+static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				  unsigned pte_access, unsigned pfec)
 {
 	int cpl = kvm_x86_ops->get_cpl(vcpu);
 	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
@@ -169,7 +173,8 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 
 	WARN_ON(pfec & PFERR_RSVD_MASK);
 
-	return (mmu->permissions[index] >> pte_access) & 1;
+	pfec |= PFERR_PRESENT_MASK;
+	return -((mmu->permissions[index] >> pte_access) & 1) & pfec;
 }
 
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 285858d3223b..4a6866dab848 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -370,10 +370,9 @@ retry_walk:
 		walker->ptes[walker->level - 1] = pte;
 	} while (!is_last_gpte(mmu, walker->level, pte));
 
-	if (unlikely(permission_fault(vcpu, mmu, pte_access, access))) {
-		errcode |= PFERR_PRESENT_MASK;
+	errcode = permission_fault(vcpu, mmu, pte_access, errcode);
+	if (unlikely(errcode))
 		goto error;
-	}
 
 	gfn = gpte_to_gfn_lvl(pte, walker->level);
 	gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT;
-- 
1.8.3.1

* Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code
  2016-03-08 11:45 ` [RFC PATCH 1/2] KVM: MMU: precompute page fault error code Paolo Bonzini
@ 2016-03-10 14:01   ` Xiao Guangrong
  2016-03-10 14:07     ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Xiao Guangrong @ 2016-03-10 14:01 UTC
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: huaitong.han



On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
> For the next patch, we will want to filter PFERR_FETCH_MASK away early,
> and not pass it to permission_fault if neither NX nor SMEP is enabled.
> Prepare for the change.

Why is it needed? It is much easier to drop PFEC.F in
update_permission_bitmask().

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/mmu.c         |  2 +-
>   arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
>   2 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2463de0b935c..e57f7be061e3 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>   			u = bit & ACC_USER_MASK;
>
>   			if (!ept) {
> -				/* Not really needed: !nx will cause pte.nx to fault */
> +				/* Not really needed: if !nx, ff will always be zero */
>   				x |= !mmu->nx;
>   				/* Allow supervisor writes if !cr0.wp */
>   				w |= !is_write_protection(vcpu) && !uf;
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 6013f3685ef4..285858d3223b 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>   	gpa_t pte_gpa;
>   	int offset;
>   	const int write_fault = access & PFERR_WRITE_MASK;
> -	const int user_fault  = access & PFERR_USER_MASK;
> -	const int fetch_fault = access & PFERR_FETCH_MASK;
> -	u16 errcode = 0;
> +	u16 errcode;
>   	gpa_t real_gpa;
>   	gfn_t gfn;
>
>   	trace_kvm_mmu_pagetable_walk(addr, access);
> +
> +	/*
> +	 * Do not modify PFERR_FETCH_MASK in access.  It is used later in the call to
> +	 * mmu->translate_gpa and, when nested virtualization is in use, the X or NX
> +	 * bit of nested page tables always applies---even if NX and SMEP are disabled
> +	 * in the guest.
> +	 *
> +	 * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
> +	 */
> +	errcode = access;
> +	if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
> +		errcode &= ~PFERR_FETCH_MASK;
> +

PFEC.P might be set since it is copied from @access.
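
If that is unintended, one option (just a sketch) is to mask it off
when errcode is first built:

	errcode = access & ~PFERR_PRESENT_MASK;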

And the workload is moved from the rare path to the common path...

>   retry_walk:
>   	walker->level = mmu->root_level;
>   	pte           = mmu->get_cr3(vcpu);
> @@ -389,9 +400,7 @@ retry_walk:
>
>   	if (unlikely(!accessed_dirty)) {
>   		ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
> -		if (unlikely(ret < 0))
> -			goto error;
> -		else if (ret)
> +		if (ret > 0)
>   			goto retry_walk;

So it returns normally even if update_accessed_dirty_bits() fails.
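
Something like the dropped check still seems to be needed, e.g. (a
sketch that simply restores it):

	if (unlikely(ret < 0))
		goto error;
	else if (ret)
		goto retry_walk;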

* Re: [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault
  2016-03-08 11:45 ` [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault Paolo Bonzini
@ 2016-03-10 14:03   ` Xiao Guangrong
  2016-03-10 14:04     ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Xiao Guangrong @ 2016-03-10 14:03 UTC
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: huaitong.han



On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
> Intel's new PKU extension introduces a bit of the page fault error code
> that must be dynamically computed after the PTE lookup (unlike I/D, U/S
> and W/R).  To ease this, these two patches make permission_fault return
> the page fault error code.
>
> Right now it only adds the P bit to the input parameter "pfec", but PKU
> can change that.

Yep, I got the same idea when I reviewed the pkey patchset. This patch
looks good to me.

Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

* Re: [RFC PATCH 2/2] KVM: MMU: return page fault error code from permission_fault
  2016-03-10 14:03   ` Xiao Guangrong
@ 2016-03-10 14:04     ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2016-03-10 14:04 UTC
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: huaitong.han



On 10/03/2016 15:03, Xiao Guangrong wrote:
> 
> 
> On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
>> Intel's new PKU extension introduces a bit of the page fault error code
>> that must be dynamically computed after the PTE lookup (unlike I/D, U/S
>> and W/R).  To ease this, these two patches make permission_fault return
>> the page fault error code.
>>
>> Right now it only adds the P bit to the input parameter "pfec", but PKU
>> can change that.
> 
> Yep, I got the same idea when I reviewed the pkey patchset. This patch
> looks good to me.
> 
> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Great, feel free to pick it up in the pkey patchset.  I'm replying to
1/2 now.

Paolo

* Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code
  2016-03-10 14:01   ` Xiao Guangrong
@ 2016-03-10 14:07     ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2016-03-10 14:07 UTC
  To: Xiao Guangrong, linux-kernel, kvm; +Cc: huaitong.han



On 10/03/2016 15:01, Xiao Guangrong wrote:
> 
> 
> On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
>> For the next patch, we will want to filter PFERR_FETCH_MASK away early,
>> and not pass it to permission_fault if neither NX nor SMEP is enabled.
>> Prepare for the change.
> 
> Why it is needed? It is much easier to drop PFEC.F in
> update_permission_bitmask().

It's just how I structured the patches.  It's unrelated to
update_permission_bitmask.

I wanted permission_fault to return a fault code, and while it's easy to
add bits (such as PFERR_PK_MASK), it's harder to remove bits from the
PFEC that permission_fault receives.  So I'm doing it here.

Feel free to instead drop PFEC.F in permission_fault() or even add a new
member to kvm_mmu with the valid bits of a PFEC.  This is just an RFC
for you and Huaitong to adapt in the PK patchset.  Sometimes things are
easier to describe in patches than in English. :)
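
A rough sketch of that last idea, in plain user-space C just to show
the shape (mmu_model and pfec_valid are made-up names):

  #include <stdint.h>

  #define PFERR_FETCH_MASK (1U << 4)

  struct mmu_model {
          uint16_t pfec_valid;    /* PFEC bits meaningful for this mode */
  };

  /* Would be recomputed whenever NX or SMEP changes, e.g. next to
   * update_permission_bitmask. */
  static void update_pfec_valid(struct mmu_model *mmu, int nx, int smep)
  {
          mmu->pfec_valid = (uint16_t)~0U;
          if (!nx && !smep)
                  mmu->pfec_valid &= ~PFERR_FETCH_MASK;
  }

The walk would then begin with "errcode = access & mmu->pfec_valid;"
instead of testing NX and SMEP on every fault.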

Paolo

>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/mmu.c         |  2 +-
>>   arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
>>   2 files changed, 16 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2463de0b935c..e57f7be061e3 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct
>> kvm_vcpu *vcpu,
>>               u = bit & ACC_USER_MASK;
>>
>>               if (!ept) {
>> -                /* Not really needed: !nx will cause pte.nx to fault */
>> +                /* Not really needed: if !nx, ff will always be zero */
>>                   x |= !mmu->nx;
>>                   /* Allow supervisor writes if !cr0.wp */
>>                   w |= !is_write_protection(vcpu) && !uf;
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 6013f3685ef4..285858d3223b 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct
>> guest_walker *walker,
>>       gpa_t pte_gpa;
>>       int offset;
>>       const int write_fault = access & PFERR_WRITE_MASK;
>> -    const int user_fault  = access & PFERR_USER_MASK;
>> -    const int fetch_fault = access & PFERR_FETCH_MASK;
>> -    u16 errcode = 0;
>> +    u16 errcode;
>>       gpa_t real_gpa;
>>       gfn_t gfn;
>>
>>       trace_kvm_mmu_pagetable_walk(addr, access);
>> +
>> +    /*
>> +     * Do not modify PFERR_FETCH_MASK in access.  It is used later in
>> the call to
>> +     * mmu->translate_gpa and, when nested virtualization is in use,
>> the X or NX
>> +     * bit of nested page tables always applies---even if NX and SMEP
>> are disabled
>> +     * in the guest.
>> +     *
>> +     * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
>> +     */
>> +    errcode = access;
>> +    if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
>> +        errcode &= ~PFERR_FETCH_MASK;
>> +
> 
> PFEC.P might be set since it is copied from @access.
> 
> And the workload is moved from rare path to the common path...
> 
>>   retry_walk:
>>       walker->level = mmu->root_level;
>>       pte           = mmu->get_cr3(vcpu);
>> @@ -389,9 +400,7 @@ retry_walk:
>>
>>       if (unlikely(!accessed_dirty)) {
>>           ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker,
>> write_fault);
>> -        if (unlikely(ret < 0))
>> -            goto error;
>> -        else if (ret)
>> +        if (ret > 0)
>>               goto retry_walk;
> 
> So it returns normally even if update_accessed_dirty_bits() fails.
