linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: mmu: allow page tables to be in read-only slots
@ 2013-08-30 12:41 Paolo Bonzini
  2013-09-01  9:17 ` Gleb Natapov
  2013-09-02  9:20 ` Xiao Guangrong
  0 siblings, 2 replies; 17+ messages in thread
From: Paolo Bonzini @ 2013-08-30 12:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, glin, agraf, brogers, afaerber, lnussel, edk2-devel, stable,
	gleb, Xiao Guangrong

Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set.  Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.

Cc: stable@vger.kernel.org
Cc: gleb@redhat.com
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	CCing to stable@ since the regression was introduced with
	support for readonly memory slots.

 arch/x86/kvm/paging_tmpl.h |  7 ++++++-
 include/linux/kvm_host.h   |  1 +
 virt/kvm/kvm_main.c        | 14 +++++++++-----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0433301..dadc5c0 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -99,6 +99,7 @@ struct guest_walker {
 	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
 	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
 	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
+	bool pte_writable[PT_MAX_FULL_LEVELS];
 	unsigned pt_access;
 	unsigned pte_access;
 	gfn_t gfn;
@@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 		if (pte == orig_pte)
 			continue;
 
+		if (unlikely(!walker->pte_writable[level - 1]))
+			return -EACCES;
+
 		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
 		if (ret)
 			return ret;
@@ -309,7 +313,8 @@ retry_walk:
 			goto error;
 		real_gfn = gpa_to_gfn(real_gfn);
 
-		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
+		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
+					    &walker->pte_writable[walker->level - 1]);
 		if (unlikely(kvm_is_error_hva(host_addr)))
 			goto error;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ca645a0..22f9cdf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -533,6 +533,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
+unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f7e4334..418d037 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1078,11 +1078,15 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
 /*
- * The hva returned by this function is only allowed to be read.
- * It should pair with kvm_read_hva() or kvm_read_hva_atomic().
+ * If writable is set to false, the hva returned by this function is only
+ * allowed to be read.
  */
-static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
+unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable)
 {
+	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+	if (writable)
+		*writable = !memslot_is_readonly(slot);
+
 	return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false);
 }
 
@@ -1450,7 +1454,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 	int r;
 	unsigned long addr;
 
-	addr = gfn_to_hva_read(kvm, gfn);
+	addr = gfn_to_hva_read(kvm, gfn, NULL);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
 	r = kvm_read_hva(data, (void __user *)addr + offset, len);
@@ -1488,7 +1492,7 @@ int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	int offset = offset_in_page(gpa);
 
-	addr = gfn_to_hva_read(kvm, gfn);
+	addr = gfn_to_hva_read(kvm, gfn, NULL);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
 	pagefault_disable();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-08-30 12:41 [PATCH] KVM: mmu: allow page tables to be in read-only slots Paolo Bonzini
@ 2013-09-01  9:17 ` Gleb Natapov
  2013-09-02  9:42   ` Xiao Guangrong
  2013-10-14 15:28   ` Paolo Bonzini
  2013-09-02  9:20 ` Xiao Guangrong
  1 sibling, 2 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-09-01  9:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, glin, agraf, brogers, afaerber, lnussel,
	edk2-devel, stable, Xiao Guangrong

On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
> Page tables in a read-only memory slot will currently cause a triple
> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> 
> OVMF uses such a page table; however, real hardware seems to be fine with
> that as long as the accessed/dirty bits are set.  Save whether the slot
> is readonly, and later check it when updating the accessed and dirty bits.
> 
The fix looks OK to me, but some comment below.

> Cc: stable@vger.kernel.org
> Cc: gleb@redhat.com
> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> 	CCing to stable@ since the regression was introduced with
> 	support for readonly memory slots.
> 
>  arch/x86/kvm/paging_tmpl.h |  7 ++++++-
>  include/linux/kvm_host.h   |  1 +
>  virt/kvm/kvm_main.c        | 14 +++++++++-----
>  3 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 0433301..dadc5c0 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -99,6 +99,7 @@ struct guest_walker {
>  	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
> +	bool pte_writable[PT_MAX_FULL_LEVELS];
>  	unsigned pt_access;
>  	unsigned pte_access;
>  	gfn_t gfn;
> @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>  		if (pte == orig_pte)
>  			continue;
>  
> +		if (unlikely(!walker->pte_writable[level - 1]))
> +			return -EACCES;
> +
>  		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
>  		if (ret)
>  			return ret;
> @@ -309,7 +313,8 @@ retry_walk:
>  			goto error;
>  		real_gfn = gpa_to_gfn(real_gfn);
>  
> -		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
> +		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
> +					    &walker->pte_writable[walker->level - 1]);
The use of gfn_to_hva_read is misleading. The code can still write into
gfn. Lets rename gfn_to_hva_read to gfn_to_hva_prot() and gfn_to_hva()
to gfn_to_hva_write().

This makes me think are there other places where gfn_to_hva() was
used, but gfn_to_hva_prot() should have been?
 - kvm_host_page_size() looks incorrect. We never use huge page to map
   read only memory slots currently.
 - kvm_handle_bad_page() also looks incorrect and may cause incorrect
   address to be reported to userspace.
 - kvm_setup_async_pf() also incorrect. Makes all page fault on read
   only slot to be sync.
 - kvm_vm_fault() one looks OK since function assumes write only slots,
   but it is obsolete and should be deleted anyway.

Others in generic and x86 code looks OK, somebody need to check ppc and
arm code.


>  		if (unlikely(kvm_is_error_hva(host_addr)))
>  			goto error;
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ca645a0..22f9cdf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -533,6 +533,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
>  
>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
>  unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
> +unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable);
>  unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
>  void kvm_release_page_clean(struct page *page);
>  void kvm_release_page_dirty(struct page *page);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f7e4334..418d037 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1078,11 +1078,15 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
>  EXPORT_SYMBOL_GPL(gfn_to_hva);
>  
>  /*
> - * The hva returned by this function is only allowed to be read.
> - * It should pair with kvm_read_hva() or kvm_read_hva_atomic().
> + * If writable is set to false, the hva returned by this function is only
> + * allowed to be read.
>   */
> -static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
> +unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable)
>  {
> +	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> +	if (writable)
> +		*writable = !memslot_is_readonly(slot);
> +
>  	return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false);
>  }
>  
> @@ -1450,7 +1454,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
>  	int r;
>  	unsigned long addr;
>  
> -	addr = gfn_to_hva_read(kvm, gfn);
> +	addr = gfn_to_hva_read(kvm, gfn, NULL);
>  	if (kvm_is_error_hva(addr))
>  		return -EFAULT;
>  	r = kvm_read_hva(data, (void __user *)addr + offset, len);
> @@ -1488,7 +1492,7 @@ int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
>  	gfn_t gfn = gpa >> PAGE_SHIFT;
>  	int offset = offset_in_page(gpa);
>  
> -	addr = gfn_to_hva_read(kvm, gfn);
> +	addr = gfn_to_hva_read(kvm, gfn, NULL);
>  	if (kvm_is_error_hva(addr))
>  		return -EFAULT;
>  	pagefault_disable();
> -- 
> 1.8.3.1

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-08-30 12:41 [PATCH] KVM: mmu: allow page tables to be in read-only slots Paolo Bonzini
  2013-09-01  9:17 ` Gleb Natapov
@ 2013-09-02  9:20 ` Xiao Guangrong
  2013-09-02  9:25   ` Gleb Natapov
  1 sibling, 1 reply; 17+ messages in thread
From: Xiao Guangrong @ 2013-09-02  9:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, glin, agraf, brogers, afaerber, lnussel,
	edk2-devel, stable, gleb

On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
> Page tables in a read-only memory slot will currently cause a triple
> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> 
> OVMF uses such a page table; however, real hardware seems to be fine with
> that as long as the accessed/dirty bits are set.  Save whether the slot
> is readonly, and later check it when updating the accessed and dirty bits.

Paolo, do you know why OVMF is using readonly memory like this?

AFAIK, The fault trigged by this kind of access can hardly be fixed by
userspace since the fault is trigged by pagetable walking not by the current
instruction. Do you have any idea to let uerspace emulate it properly?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02  9:20 ` Xiao Guangrong
@ 2013-09-02  9:25   ` Gleb Natapov
  2013-09-02 10:00     ` Xiao Guangrong
  2013-09-02 16:00     ` Paolo Bonzini
  0 siblings, 2 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02  9:25 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
> > Page tables in a read-only memory slot will currently cause a triple
> > fault because the page walker uses gfn_to_hva and it fails on such a slot.
> > 
> > OVMF uses such a page table; however, real hardware seems to be fine with
> > that as long as the accessed/dirty bits are set.  Save whether the slot
> > is readonly, and later check it when updating the accessed and dirty bits.
> 
> Paolo, do you know why OVMF is using readonly memory like this?
> 
Just a guess, but perhaps they want to move to paging mode as early as
possible even before memory controller is fully initialized.

> AFAIK, The fault trigged by this kind of access can hardly be fixed by
> userspace since the fault is trigged by pagetable walking not by the current
> instruction. Do you have any idea to let uerspace emulate it properly?
Not sure what userspace you mean here, but there shouldn't be a fault in the
first place if ROM page tables have access/dirty bit set and they do.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-01  9:17 ` Gleb Natapov
@ 2013-09-02  9:42   ` Xiao Guangrong
  2013-09-02  9:49     ` Gleb Natapov
  2013-10-14 15:28   ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: Xiao Guangrong @ 2013-09-02  9:42 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On 09/01/2013 05:17 PM, Gleb Natapov wrote:
> On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
>> Page tables in a read-only memory slot will currently cause a triple
>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
>>
>> OVMF uses such a page table; however, real hardware seems to be fine with
>> that as long as the accessed/dirty bits are set.  Save whether the slot
>> is readonly, and later check it when updating the accessed and dirty bits.
>>
> The fix looks OK to me, but some comment below.
> 
>> Cc: stable@vger.kernel.org
>> Cc: gleb@redhat.com
>> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> 	CCing to stable@ since the regression was introduced with
>> 	support for readonly memory slots.
>>
>>  arch/x86/kvm/paging_tmpl.h |  7 ++++++-
>>  include/linux/kvm_host.h   |  1 +
>>  virt/kvm/kvm_main.c        | 14 +++++++++-----
>>  3 files changed, 16 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 0433301..dadc5c0 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -99,6 +99,7 @@ struct guest_walker {
>>  	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
>>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
>>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
>> +	bool pte_writable[PT_MAX_FULL_LEVELS];
>>  	unsigned pt_access;
>>  	unsigned pte_access;
>>  	gfn_t gfn;
>> @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>>  		if (pte == orig_pte)
>>  			continue;
>>  
>> +		if (unlikely(!walker->pte_writable[level - 1]))
>> +			return -EACCES;
>> +
>>  		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
>>  		if (ret)
>>  			return ret;
>> @@ -309,7 +313,8 @@ retry_walk:
>>  			goto error;
>>  		real_gfn = gpa_to_gfn(real_gfn);
>>  
>> -		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
>> +		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
>> +					    &walker->pte_writable[walker->level - 1]);
> The use of gfn_to_hva_read is misleading. The code can still write into
> gfn. Lets rename gfn_to_hva_read to gfn_to_hva_prot() and gfn_to_hva()
> to gfn_to_hva_write().

Yes. I agreed.

> 
> This makes me think are there other places where gfn_to_hva() was
> used, but gfn_to_hva_prot() should have been?
>  - kvm_host_page_size() looks incorrect. We never use huge page to map
>    read only memory slots currently.

It only checks whether gfn have been mapped, I think we can use
gfn_to_hva_read() instead, the real permission will be checked when we translate
the gfn to pfn.

>  - kvm_handle_bad_page() also looks incorrect and may cause incorrect
>    address to be reported to userspace.

I have no idea on this point. kvm_handle_bad_page() is called when it failed to
translate the target gfn to pfn, then the emulator can detect the error on target gfn
properly. no? Or i misunderstood your meaning?

>  - kvm_setup_async_pf() also incorrect. Makes all page fault on read
>    only slot to be sync.
>  - kvm_vm_fault() one looks OK since function assumes write only slots,
>    but it is obsolete and should be deleted anyway.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02  9:42   ` Xiao Guangrong
@ 2013-09-02  9:49     ` Gleb Natapov
  2013-09-02 10:05       ` Xiao Guangrong
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02  9:49 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 05:42:25PM +0800, Xiao Guangrong wrote:
> On 09/01/2013 05:17 PM, Gleb Natapov wrote:
> > On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
> >> Page tables in a read-only memory slot will currently cause a triple
> >> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> >>
> >> OVMF uses such a page table; however, real hardware seems to be fine with
> >> that as long as the accessed/dirty bits are set.  Save whether the slot
> >> is readonly, and later check it when updating the accessed and dirty bits.
> >>
> > The fix looks OK to me, but some comment below.
> > 
> >> Cc: stable@vger.kernel.org
> >> Cc: gleb@redhat.com
> >> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >> 	CCing to stable@ since the regression was introduced with
> >> 	support for readonly memory slots.
> >>
> >>  arch/x86/kvm/paging_tmpl.h |  7 ++++++-
> >>  include/linux/kvm_host.h   |  1 +
> >>  virt/kvm/kvm_main.c        | 14 +++++++++-----
> >>  3 files changed, 16 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> >> index 0433301..dadc5c0 100644
> >> --- a/arch/x86/kvm/paging_tmpl.h
> >> +++ b/arch/x86/kvm/paging_tmpl.h
> >> @@ -99,6 +99,7 @@ struct guest_walker {
> >>  	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
> >>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
> >>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
> >> +	bool pte_writable[PT_MAX_FULL_LEVELS];
> >>  	unsigned pt_access;
> >>  	unsigned pte_access;
> >>  	gfn_t gfn;
> >> @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
> >>  		if (pte == orig_pte)
> >>  			continue;
> >>  
> >> +		if (unlikely(!walker->pte_writable[level - 1]))
> >> +			return -EACCES;
> >> +
> >>  		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
> >>  		if (ret)
> >>  			return ret;
> >> @@ -309,7 +313,8 @@ retry_walk:
> >>  			goto error;
> >>  		real_gfn = gpa_to_gfn(real_gfn);
> >>  
> >> -		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
> >> +		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
> >> +					    &walker->pte_writable[walker->level - 1]);
> > The use of gfn_to_hva_read is misleading. The code can still write into
> > gfn. Lets rename gfn_to_hva_read to gfn_to_hva_prot() and gfn_to_hva()
> > to gfn_to_hva_write().
> 
> Yes. I agreed.
> 
> > 
> > This makes me think are there other places where gfn_to_hva() was
> > used, but gfn_to_hva_prot() should have been?
> >  - kvm_host_page_size() looks incorrect. We never use huge page to map
> >    read only memory slots currently.
> 
> It only checks whether gfn have been mapped, I think we can use
> gfn_to_hva_read() instead, the real permission will be checked when we translate
> the gfn to pfn.
> 
Yes, all the cases I listed should be changed to use function that looks
at both regular and RO slots.

> >  - kvm_handle_bad_page() also looks incorrect and may cause incorrect
> >    address to be reported to userspace.
> 
> I have no idea on this point. kvm_handle_bad_page() is called when it failed to
> translate the target gfn to pfn, then the emulator can detect the error on target gfn
> properly. no? Or i misunderstood your meaning?
> 
I am talking about the following code:

        if (pfn == KVM_PFN_ERR_HWPOISON) {
                kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
                return 0;
        }

pfn will be KVM_PFN_ERR_HWPOISON gfn is backed by faulty memory, we need
to report the liner address of the faulty memory to a userspace here,
but if gfn is in a RO slot gfn_to_hva() will not return correct address
here.

> >  - kvm_setup_async_pf() also incorrect. Makes all page fault on read
> >    only slot to be sync.
> >  - kvm_vm_fault() one looks OK since function assumes write only slots,
> >    but it is obsolete and should be deleted anyway.
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02  9:25   ` Gleb Natapov
@ 2013-09-02 10:00     ` Xiao Guangrong
  2013-09-02 10:07       ` Gleb Natapov
  2013-09-02 16:00     ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: Xiao Guangrong @ 2013-09-02 10:00 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On 09/02/2013 05:25 PM, Gleb Natapov wrote:
> On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
>> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
>>> Page tables in a read-only memory slot will currently cause a triple
>>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
>>>
>>> OVMF uses such a page table; however, real hardware seems to be fine with
>>> that as long as the accessed/dirty bits are set.  Save whether the slot
>>> is readonly, and later check it when updating the accessed and dirty bits.
>>
>> Paolo, do you know why OVMF is using readonly memory like this?
>>
> Just a guess, but perhaps they want to move to paging mode as early as
> possible even before memory controller is fully initialized.
> 
>> AFAIK, The fault trigged by this kind of access can hardly be fixed by
>> userspace since the fault is trigged by pagetable walking not by the current
>> instruction. Do you have any idea to let uerspace emulate it properly?
> Not sure what userspace you mean here, but there shouldn't be a fault in the

I just wonder how to fix this kind of fault. The current patch returns -EACCES
but that will crash the guest. I think we'd better let userspace to fix this
error (let userspace set the D/A bit.)

> first place if ROM page tables have access/dirty bit set and they do.

Yes, so we can not call x86_emulate_instruction() to fix this fault (that function
emulates the access on the first place). Need directly return a MMIO-exit to
userpsace when met this fault? What happen if this fault on pagetable-walking
is trigged in x86_emulate_instruction().?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02  9:49     ` Gleb Natapov
@ 2013-09-02 10:05       ` Xiao Guangrong
  2013-09-02 10:11         ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Xiao Guangrong @ 2013-09-02 10:05 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On 09/02/2013 05:49 PM, Gleb Natapov wrote:
> On Mon, Sep 02, 2013 at 05:42:25PM +0800, Xiao Guangrong wrote:
>> On 09/01/2013 05:17 PM, Gleb Natapov wrote:
>>> On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
>>>> Page tables in a read-only memory slot will currently cause a triple
>>>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
>>>>
>>>> OVMF uses such a page table; however, real hardware seems to be fine with
>>>> that as long as the accessed/dirty bits are set.  Save whether the slot
>>>> is readonly, and later check it when updating the accessed and dirty bits.
>>>>
>>> The fix looks OK to me, but some comment below.
>>>
>>>> Cc: stable@vger.kernel.org
>>>> Cc: gleb@redhat.com
>>>> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> ---
>>>> 	CCing to stable@ since the regression was introduced with
>>>> 	support for readonly memory slots.
>>>>
>>>>  arch/x86/kvm/paging_tmpl.h |  7 ++++++-
>>>>  include/linux/kvm_host.h   |  1 +
>>>>  virt/kvm/kvm_main.c        | 14 +++++++++-----
>>>>  3 files changed, 16 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>>>> index 0433301..dadc5c0 100644
>>>> --- a/arch/x86/kvm/paging_tmpl.h
>>>> +++ b/arch/x86/kvm/paging_tmpl.h
>>>> @@ -99,6 +99,7 @@ struct guest_walker {
>>>>  	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
>>>>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
>>>>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
>>>> +	bool pte_writable[PT_MAX_FULL_LEVELS];
>>>>  	unsigned pt_access;
>>>>  	unsigned pte_access;
>>>>  	gfn_t gfn;
>>>> @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>>>>  		if (pte == orig_pte)
>>>>  			continue;
>>>>  
>>>> +		if (unlikely(!walker->pte_writable[level - 1]))
>>>> +			return -EACCES;
>>>> +
>>>>  		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
>>>>  		if (ret)
>>>>  			return ret;
>>>> @@ -309,7 +313,8 @@ retry_walk:
>>>>  			goto error;
>>>>  		real_gfn = gpa_to_gfn(real_gfn);
>>>>  
>>>> -		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
>>>> +		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
>>>> +					    &walker->pte_writable[walker->level - 1]);
>>> The use of gfn_to_hva_read is misleading. The code can still write into
>>> gfn. Lets rename gfn_to_hva_read to gfn_to_hva_prot() and gfn_to_hva()
>>> to gfn_to_hva_write().
>>
>> Yes. I agreed.
>>
>>>
>>> This makes me think are there other places where gfn_to_hva() was
>>> used, but gfn_to_hva_prot() should have been?
>>>  - kvm_host_page_size() looks incorrect. We never use huge page to map
>>>    read only memory slots currently.
>>
>> It only checks whether gfn have been mapped, I think we can use
>> gfn_to_hva_read() instead, the real permission will be checked when we translate
>> the gfn to pfn.
>>
> Yes, all the cases I listed should be changed to use function that looks
> at both regular and RO slots.
> 
>>>  - kvm_handle_bad_page() also looks incorrect and may cause incorrect
>>>    address to be reported to userspace.
>>
>> I have no idea on this point. kvm_handle_bad_page() is called when it failed to
>> translate the target gfn to pfn, then the emulator can detect the error on target gfn
>> properly. no? Or i misunderstood your meaning?
>>
> I am talking about the following code:
> 
>         if (pfn == KVM_PFN_ERR_HWPOISON) {
>                 kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
>                 return 0;
>         }
> 
> pfn will be KVM_PFN_ERR_HWPOISON gfn is backed by faulty memory, we need
> to report the liner address of the faulty memory to a userspace here,
> but if gfn is in a RO slot gfn_to_hva() will not return correct address
> here.

Got it, thanks for your explanation.

BTW, if you and Paolo are busy on other things, i am happy to fix these issues. :)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 10:00     ` Xiao Guangrong
@ 2013-09-02 10:07       ` Gleb Natapov
  2013-09-02 15:56         ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02 10:07 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 06:00:39PM +0800, Xiao Guangrong wrote:
> On 09/02/2013 05:25 PM, Gleb Natapov wrote:
> > On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
> >> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
> >>> Page tables in a read-only memory slot will currently cause a triple
> >>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> >>>
> >>> OVMF uses such a page table; however, real hardware seems to be fine with
> >>> that as long as the accessed/dirty bits are set.  Save whether the slot
> >>> is readonly, and later check it when updating the accessed and dirty bits.
> >>
> >> Paolo, do you know why OVMF is using readonly memory like this?
> >>
> > Just a guess, but perhaps they want to move to paging mode as early as
> > possible even before memory controller is fully initialized.
> > 
> >> AFAIK, The fault trigged by this kind of access can hardly be fixed by
> >> userspace since the fault is trigged by pagetable walking not by the current
> >> instruction. Do you have any idea to let uerspace emulate it properly?
> > Not sure what userspace you mean here, but there shouldn't be a fault in the
> 
> I just wonder how to fix this kind of fault. The current patch returns -EACCES
> but that will crash the guest. I think we'd better let userspace to fix this
> error (let userspace set the D/A bit.)
> 
Ugh, this is not good. Missed that. Don't know what real HW will do
here, but the easy thing for us to do would be to just return success.

> > first place if ROM page tables have access/dirty bit set and they do.
> 
> Yes, so we can not call x86_emulate_instruction() to fix this fault (that function
> emulates the access on the first place). Need directly return a MMIO-exit to
> userpsace when met this fault? What happen if this fault on pagetable-walking
> is trigged in x86_emulate_instruction().?
I think we should not return MMIO-exit to userspace. Either ignore write attempt
or kill a guest.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 10:05       ` Xiao Guangrong
@ 2013-09-02 10:11         ` Gleb Natapov
  2013-09-02 15:58           ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02 10:11 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, linux-kernel, kvm, glin, agraf, brogers, afaerber,
	lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 06:05:10PM +0800, Xiao Guangrong wrote:
> On 09/02/2013 05:49 PM, Gleb Natapov wrote:
> > On Mon, Sep 02, 2013 at 05:42:25PM +0800, Xiao Guangrong wrote:
> >> On 09/01/2013 05:17 PM, Gleb Natapov wrote:
> >>> On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
> >>>> Page tables in a read-only memory slot will currently cause a triple
> >>>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> >>>>
> >>>> OVMF uses such a page table; however, real hardware seems to be fine with
> >>>> that as long as the accessed/dirty bits are set.  Save whether the slot
> >>>> is readonly, and later check it when updating the accessed and dirty bits.
> >>>>
> >>> The fix looks OK to me, but some comment below.
> >>>
> >>>> Cc: stable@vger.kernel.org
> >>>> Cc: gleb@redhat.com
> >>>> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> >>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>> ---
> >>>> 	CCing to stable@ since the regression was introduced with
> >>>> 	support for readonly memory slots.
> >>>>
> >>>>  arch/x86/kvm/paging_tmpl.h |  7 ++++++-
> >>>>  include/linux/kvm_host.h   |  1 +
> >>>>  virt/kvm/kvm_main.c        | 14 +++++++++-----
> >>>>  3 files changed, 16 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> >>>> index 0433301..dadc5c0 100644
> >>>> --- a/arch/x86/kvm/paging_tmpl.h
> >>>> +++ b/arch/x86/kvm/paging_tmpl.h
> >>>> @@ -99,6 +99,7 @@ struct guest_walker {
> >>>>  	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
> >>>>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
> >>>>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
> >>>> +	bool pte_writable[PT_MAX_FULL_LEVELS];
> >>>>  	unsigned pt_access;
> >>>>  	unsigned pte_access;
> >>>>  	gfn_t gfn;
> >>>> @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
> >>>>  		if (pte == orig_pte)
> >>>>  			continue;
> >>>>  
> >>>> +		if (unlikely(!walker->pte_writable[level - 1]))
> >>>> +			return -EACCES;
> >>>> +
> >>>>  		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
> >>>>  		if (ret)
> >>>>  			return ret;
> >>>> @@ -309,7 +313,8 @@ retry_walk:
> >>>>  			goto error;
> >>>>  		real_gfn = gpa_to_gfn(real_gfn);
> >>>>  
> >>>> -		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
> >>>> +		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
> >>>> +					    &walker->pte_writable[walker->level - 1]);
> >>> The use of gfn_to_hva_read is misleading. The code can still write into
> >>> gfn. Lets rename gfn_to_hva_read to gfn_to_hva_prot() and gfn_to_hva()
> >>> to gfn_to_hva_write().
> >>
> >> Yes. I agreed.
> >>
> >>>
> >>> This makes me think are there other places where gfn_to_hva() was
> >>> used, but gfn_to_hva_prot() should have been?
> >>>  - kvm_host_page_size() looks incorrect. We never use huge page to map
> >>>    read only memory slots currently.
> >>
> >> It only checks whether gfn have been mapped, I think we can use
> >> gfn_to_hva_read() instead, the real permission will be checked when we translate
> >> the gfn to pfn.
> >>
> > Yes, all the cases I listed should be changed to use function that looks
> > at both regular and RO slots.
> > 
> >>>  - kvm_handle_bad_page() also looks incorrect and may cause incorrect
> >>>    address to be reported to userspace.
> >>
> >> I have no idea on this point. kvm_handle_bad_page() is called when it failed to
> >> translate the target gfn to pfn, then the emulator can detect the error on target gfn
> >> properly. no? Or i misunderstood your meaning?
> >>
> > I am talking about the following code:
> > 
> >         if (pfn == KVM_PFN_ERR_HWPOISON) {
> >                 kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
> >                 return 0;
> >         }
> > 
> > pfn will be KVM_PFN_ERR_HWPOISON gfn is backed by faulty memory, we need
> > to report the liner address of the faulty memory to a userspace here,
> > but if gfn is in a RO slot gfn_to_hva() will not return correct address
> > here.
> 
> Got it, thanks for your explanation.
> 
> BTW, if you and Paolo are busy on other things, i am happy to fix these issues. :)
I am busy with reviews mostly :). If you are not to busy with lockless
write protection then fine with me. Lest wait for Paolo's input on
proposed API though.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 10:07       ` Gleb Natapov
@ 2013-09-02 15:56         ` Paolo Bonzini
  2013-09-02 16:26           ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2013-09-02 15:56 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Xiao Guangrong, linux-kernel, kvm, glin, agraf, brogers,
	afaerber, lnussel, edk2-devel, stable

Il 02/09/2013 12:07, Gleb Natapov ha scritto:
> On Mon, Sep 02, 2013 at 06:00:39PM +0800, Xiao Guangrong wrote:
>> On 09/02/2013 05:25 PM, Gleb Natapov wrote:
>>> On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
>>>> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
>>>>> Page tables in a read-only memory slot will currently cause a triple
>>>>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
>>>>>
>>>>> OVMF uses such a page table; however, real hardware seems to be fine with
>>>>> that as long as the accessed/dirty bits are set.  Save whether the slot
>>>>> is readonly, and later check it when updating the accessed and dirty bits.
>>>>
>>>> Paolo, do you know why OVMF is using readonly memory like this?
>>>>
>>> Just a guess, but perhaps they want to move to paging mode as early as
>>> possible even before memory controller is fully initialized.
>>>
>>>> AFAIK, The fault trigged by this kind of access can hardly be fixed by
>>>> userspace since the fault is trigged by pagetable walking not by the current
>>>> instruction. Do you have any idea to let uerspace emulate it properly?
>>> Not sure what userspace you mean here, but there shouldn't be a fault in the
>>
>> I just wonder how to fix this kind of fault. The current patch returns -EACCES
>> but that will crash the guest. I think we'd better let userspace to fix this
>> error (let userspace set the D/A bit.)
>>
> Ugh, this is not good. Missed that. Don't know what real HW will do
> here, but the easy thing for us to do would be to just return success.

Real hardware would just do a memory write.  What happens depends on
what is on the bus, i.e. on what the ROM is used for.

QEMU uses read-only slots for two things: actual read-only memory where
writes go to the bitbucket, and "ROMD" memory where writes are treated
as MMIO.

So, in the first case we would ignore the write.  In the second we would
do an MMIO exit to userspace.  But ignoring the write isn't always
correct, and doing an MMIO exit is complicated, so I would just kill the
guest.

EPT will probably write to the read-only slots without thinking much
about it.

My patch injects a page fault, which is very likely to escalate to a
triple fault.  This is probably never what you want---on the other hand,
I wasn't sure what level of accuracy we want in this case, given that
EPT does it wrong too.

Paolo

>>> first place if ROM page tables have access/dirty bit set and they do.
>>
>> Yes, so we can not call x86_emulate_instruction() to fix this fault (that function
>> emulates the access on the first place). Need directly return a MMIO-exit to
>> userpsace when met this fault? What happen if this fault on pagetable-walking
>> is trigged in x86_emulate_instruction().?
> I think we should not return MMIO-exit to userspace. Either ignore write attempt
> or kill a guest.
> 
> --
> 			Gleb.
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 10:11         ` Gleb Natapov
@ 2013-09-02 15:58           ` Paolo Bonzini
  2013-09-02 16:27             ` Gleb Natapov
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2013-09-02 15:58 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Xiao Guangrong, linux-kernel, kvm, glin, agraf, brogers,
	afaerber, lnussel, edk2-devel, stable

Il 02/09/2013 12:11, Gleb Natapov ha scritto:
>> > 
>> > Got it, thanks for your explanation.
>> > 
>> > BTW, if you and Paolo are busy on other things, i am happy to fix these issues. :)
> I am busy with reviews mostly :). If you are not to busy with lockless
> write protection then fine with me. Lest wait for Paolo's input on
> proposed API though.

No problem, I'm happy to do this kind of exercise.  I like the API you
proposed.  I can do this change on top of the API change.  However,
since this bug is "in the wild" it may be applicable to stable@ as well.
 If you agree, I will also post v2 of this patch for stable kernels as
soon as we agree on the desired faulting behavior.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02  9:25   ` Gleb Natapov
  2013-09-02 10:00     ` Xiao Guangrong
@ 2013-09-02 16:00     ` Paolo Bonzini
  2013-09-02 16:16       ` Gleb Natapov
  1 sibling, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2013-09-02 16:00 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Xiao Guangrong, linux-kernel, kvm, glin, agraf, brogers,
	afaerber, lnussel, edk2-devel, stable

Il 02/09/2013 11:25, Gleb Natapov ha scritto:
> On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
>> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
>>> Page tables in a read-only memory slot will currently cause a triple
>>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
>>>
>>> OVMF uses such a page table; however, real hardware seems to be fine with
>>> that as long as the accessed/dirty bits are set.  Save whether the slot
>>> is readonly, and later check it when updating the accessed and dirty bits.
>>
>> Paolo, do you know why OVMF is using readonly memory like this?
>>
> Just a guess, but perhaps they want to move to paging mode as early as
> possible even before memory controller is fully initialized.

More precisely they want to move to 64-bit mode as early as possible,
and that requires paging.

>> AFAIK, The fault trigged by this kind of access can hardly be fixed by
>> userspace since the fault is trigged by pagetable walking not by the current
>> instruction. Do you have any idea to let uerspace emulate it properly?
>
> Not sure what userspace you mean here, but there shouldn't be a fault in the
> first place if ROM page tables have access/dirty bit set and they do.

Yep.  Actually they do not set dirty on the PD/PDP/PML4, only on the
bottom level (which is also fine).

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 16:00     ` Paolo Bonzini
@ 2013-09-02 16:16       ` Gleb Natapov
  0 siblings, 0 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02 16:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, linux-kernel, kvm, glin, agraf, brogers,
	afaerber, lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 06:00:08PM +0200, Paolo Bonzini wrote:
> Il 02/09/2013 11:25, Gleb Natapov ha scritto:
> > On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
> >> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
> >>> Page tables in a read-only memory slot will currently cause a triple
> >>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> >>>
> >>> OVMF uses such a page table; however, real hardware seems to be fine with
> >>> that as long as the accessed/dirty bits are set.  Save whether the slot
> >>> is readonly, and later check it when updating the accessed and dirty bits.
> >>
> >> Paolo, do you know why OVMF is using readonly memory like this?
> >>
> > Just a guess, but perhaps they want to move to paging mode as early as
> > possible even before memory controller is fully initialized.
> 
> More precisely they want to move to 64-bit mode as early as possible,
> and that requires paging.
> 
> >> AFAIK, The fault trigged by this kind of access can hardly be fixed by
> >> userspace since the fault is trigged by pagetable walking not by the current
> >> instruction. Do you have any idea to let uerspace emulate it properly?
> >
> > Not sure what userspace you mean here, but there shouldn't be a fault in the
> > first place if ROM page tables have access/dirty bit set and they do.
> 
> Yep.  Actually they do not set dirty on the PD/PDP/PML4, only on the
> bottom level (which is also fine).
> 
They do it for a good reason, dirty bit exists only on the bottom level :)

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 15:56         ` Paolo Bonzini
@ 2013-09-02 16:26           ` Gleb Natapov
  0 siblings, 0 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02 16:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, linux-kernel, kvm, glin, agraf, brogers,
	afaerber, lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 05:56:07PM +0200, Paolo Bonzini wrote:
> Il 02/09/2013 12:07, Gleb Natapov ha scritto:
> > On Mon, Sep 02, 2013 at 06:00:39PM +0800, Xiao Guangrong wrote:
> >> On 09/02/2013 05:25 PM, Gleb Natapov wrote:
> >>> On Mon, Sep 02, 2013 at 05:20:15PM +0800, Xiao Guangrong wrote:
> >>>> On 08/30/2013 08:41 PM, Paolo Bonzini wrote:
> >>>>> Page tables in a read-only memory slot will currently cause a triple
> >>>>> fault because the page walker uses gfn_to_hva and it fails on such a slot.
> >>>>>
> >>>>> OVMF uses such a page table; however, real hardware seems to be fine with
> >>>>> that as long as the accessed/dirty bits are set.  Save whether the slot
> >>>>> is readonly, and later check it when updating the accessed and dirty bits.
> >>>>
> >>>> Paolo, do you know why OVMF is using readonly memory like this?
> >>>>
> >>> Just a guess, but perhaps they want to move to paging mode as early as
> >>> possible even before memory controller is fully initialized.
> >>>
> >>>> AFAIK, The fault trigged by this kind of access can hardly be fixed by
> >>>> userspace since the fault is trigged by pagetable walking not by the current
> >>>> instruction. Do you have any idea to let uerspace emulate it properly?
> >>> Not sure what userspace you mean here, but there shouldn't be a fault in the
> >>
> >> I just wonder how to fix this kind of fault. The current patch returns -EACCES
> >> but that will crash the guest. I think we'd better let userspace to fix this
> >> error (let userspace set the D/A bit.)
> >>
> > Ugh, this is not good. Missed that. Don't know what real HW will do
> > here, but the easy thing for us to do would be to just return success.
> 
> Real hardware would just do a memory write.  What happens depends on
> what is on the bus, i.e. on what the ROM is used for.
> 
> QEMU uses read-only slots for two things: actual read-only memory where
> writes go to the bitbucket, and "ROMD" memory where writes are treated
> as MMIO.
> 
> So, in the first case we would ignore the write.  In the second we would
> do an MMIO exit to userspace.  But ignoring the write isn't always
> correct, and doing an MMIO exit is complicated, so I would just kill the
> guest.
> 
> EPT will probably write to the read-only slots without thinking much
> about it.
> 
> My patch injects a page fault, which is very likely to escalate to a
> triple fault.  This is probably never what you want---on the other hand,
> I wasn't sure what level of accuracy we want in this case, given that
> EPT does it wrong too.
> 
Just ignoring write into RO slot is OK. We do not handle page tables in
MMIO memory, so handling page tables in ROMD memory shouldn't be a
priority either. 

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-02 15:58           ` Paolo Bonzini
@ 2013-09-02 16:27             ` Gleb Natapov
  0 siblings, 0 replies; 17+ messages in thread
From: Gleb Natapov @ 2013-09-02 16:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, linux-kernel, kvm, glin, agraf, brogers,
	afaerber, lnussel, edk2-devel, stable

On Mon, Sep 02, 2013 at 05:58:41PM +0200, Paolo Bonzini wrote:
> Il 02/09/2013 12:11, Gleb Natapov ha scritto:
> >> > 
> >> > Got it, thanks for your explanation.
> >> > 
> >> > BTW, if you and Paolo are busy on other things, i am happy to fix these issues. :)
> > I am busy with reviews mostly :). If you are not to busy with lockless
> > write protection then fine with me. Lest wait for Paolo's input on
> > proposed API though.
> 
> No problem, I'm happy to do this kind of exercise.  I like the API you
> proposed.  I can do this change on top of the API change.  However,
> since this bug is "in the wild" it may be applicable to stable@ as well.
>  If you agree, I will also post v2 of this patch for stable kernels as
> soon as we agree on the desired faulting behavior.
> 
I am OK with having minimal patch for backporting to stable and doing
API changes + fixes for other call sites on top.

--
			Gleb.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
  2013-09-01  9:17 ` Gleb Natapov
  2013-09-02  9:42   ` Xiao Guangrong
@ 2013-10-14 15:28   ` Paolo Bonzini
  1 sibling, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2013-10-14 15:28 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: linux-kernel, kvm, glin, agraf, brogers, afaerber, lnussel,
	edk2-devel, stable, Xiao Guangrong

Il 01/09/2013 11:17, Gleb Natapov ha scritto:
> This makes me think are there other places where gfn_to_hva() was
> used, but gfn_to_hva_prot() should have been?
>  - kvm_host_page_size() looks incorrect. We never use huge page to map
>    read only memory slots currently.
>  - kvm_handle_bad_page() also looks incorrect and may cause incorrect
>    address to be reported to userspace.
>  - kvm_setup_async_pf() also incorrect. Makes all page fault on read
>    only slot to be sync.
>  - kvm_vm_fault() one looks OK since function assumes write only slots,
>    but it is obsolete and should be deleted anyway.
> 
> Others in generic and x86 code looks OK, somebody need to check ppc and
> arm code.

PPC is a mess, for s390 I couldn't find a spec.  ARM needs to be fixed.
 Patches coming...

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-10-14 15:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-30 12:41 [PATCH] KVM: mmu: allow page tables to be in read-only slots Paolo Bonzini
2013-09-01  9:17 ` Gleb Natapov
2013-09-02  9:42   ` Xiao Guangrong
2013-09-02  9:49     ` Gleb Natapov
2013-09-02 10:05       ` Xiao Guangrong
2013-09-02 10:11         ` Gleb Natapov
2013-09-02 15:58           ` Paolo Bonzini
2013-09-02 16:27             ` Gleb Natapov
2013-10-14 15:28   ` Paolo Bonzini
2013-09-02  9:20 ` Xiao Guangrong
2013-09-02  9:25   ` Gleb Natapov
2013-09-02 10:00     ` Xiao Guangrong
2013-09-02 10:07       ` Gleb Natapov
2013-09-02 15:56         ` Paolo Bonzini
2013-09-02 16:26           ` Gleb Natapov
2013-09-02 16:00     ` Paolo Bonzini
2013-09-02 16:16       ` Gleb Natapov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).