* [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
@ 2017-04-05 13:07 Filippo Sironi
  2017-04-06 14:22 ` Radim Krčmář
  2017-04-13 15:20 ` Filippo Sironi
  0 siblings, 2 replies; 6+ messages in thread
From: Filippo Sironi @ 2017-04-05 13:07 UTC (permalink / raw)
  To: sironi; +Cc: Filippo Sironi, Anthony Liguori, kvm, linux-kernel

cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
pages and the respective struct pages for mapping in the kernel virtual
address space.
This doesn't work if get_user_pages_fast() is invoked with a userspace
virtual address that's backed by PFNs outside of kernel reach (e.g.,
when limiting the kernel memory with mem= in the command line and using
/dev/mem to map memory).

If get_user_pages_fast() fails, look up the VMA that backs the userspace
virtual address, compute the PFN and the physical address, and map it in
the kernel virtual address space with memremap().

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/paging_tmpl.h | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..ab4d6617238c 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	struct page *page;
 
 	npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
-	/* Check if the user is doing something meaningless. */
-	if (unlikely(npages != 1))
-		return -EFAULT;
-
-	table = kmap_atomic(page);
-	ret = CMPXCHG(&table[index], orig_pte, new_pte);
-	kunmap_atomic(table);
-
-	kvm_release_page_dirty(page);
+	if (likely(npages == 1)) {
+		table = kmap_atomic(page);
+		ret = CMPXCHG(&table[index], orig_pte, new_pte);
+		kunmap_atomic(table);
+
+		kvm_release_page_dirty(page);
+	} else {
+		struct vm_area_struct *vma;
+		unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
+		unsigned long pfn;
+		unsigned long paddr;
+
+		down_read(&current->mm->mmap_sem);
+		vma = find_vma_intersection(current->mm, vaddr,
+					    vaddr + PAGE_SIZE);
+		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
+			up_read(&current->mm->mmap_sem);
+			return -EFAULT;
+		}
+		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+		paddr = pfn << PAGE_SHIFT;
+		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
+		if (!table) {
+			up_read(&current->mm->mmap_sem);
+			return -EFAULT;
+		}
+		ret = CMPXCHG(&table[index], orig_pte, new_pte);
+		memunmap(table);
+		up_read(&current->mm->mmap_sem);
+	}
 
 	return (ret != orig_pte);
 }
-- 
2.7.4


* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
  2017-04-05 13:07 [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs Filippo Sironi
@ 2017-04-06 14:22 ` Radim Krčmář
  2017-04-12 13:16   ` Sironi, Filippo
  2017-04-13 15:20 ` Filippo Sironi
  1 sibling, 1 reply; 6+ messages in thread
From: Radim Krčmář @ 2017-04-06 14:22 UTC (permalink / raw)
  To: Filippo Sironi; +Cc: sironi, Anthony Liguori, kvm, linux-kernel

2017-04-05 15:07+0200, Filippo Sironi:
> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
> pages and the respective struct pages for mapping in the kernel virtual
> address space.
> This doesn't work if get_user_pages_fast() is invoked with a userspace
> virtual address that's backed by PFNs outside of kernel reach (e.g.,
> when limiting the kernel memory with mem= in the command line and using
> /dev/mem to map memory).
> 
> If get_user_pages_fast() fails, look up the VMA that backs the userspace
> virtual address, compute the PFN and the physical address, and map it in
> the kernel virtual address space with memremap().

What is the reason for a configuration that voluntarily restricts access
to memory that it needs?

> Signed-off-by: Filippo Sironi <sironi@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> @@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	struct page *page;
>  
>  	npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
> -	/* Check if the user is doing something meaningless. */
> -	if (unlikely(npages != 1))
> -		return -EFAULT;
> -
> -	table = kmap_atomic(page);
> -	ret = CMPXCHG(&table[index], orig_pte, new_pte);
> -	kunmap_atomic(table);
> -
> -	kvm_release_page_dirty(page);
> +	if (likely(npages == 1)) {
> +		table = kmap_atomic(page);
> +		ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +		kunmap_atomic(table);
> +
> +		kvm_release_page_dirty(page);
> +	} else {
> +		struct vm_area_struct *vma;
> +		unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
> +		unsigned long pfn;
> +		unsigned long paddr;
> +
> +		down_read(&current->mm->mmap_sem);
> +		vma = find_vma_intersection(current->mm, vaddr,
> +					    vaddr + PAGE_SIZE);

Hm, with the argument order like this, we check that

  vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start

but shouldn't we actually check that the whole page is there, i.e.

  vaddr + PAGE_SIZE < vma->vm_end && vaddr > vma->vm_start

?
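
(For reference, the helper in include/linux/mm.h is, roughly and from
memory, so the exact wording may differ:

  static inline struct vm_area_struct *
  find_vma_intersection(struct mm_struct *mm, unsigned long start_addr,
                        unsigned long end_addr)
  {
          /* first vma with vm_end > start_addr ... */
          struct vm_area_struct *vma = find_vma(mm, start_addr);

          /* ... kept only if end_addr > vm_start */
          if (vma && end_addr <= vma->vm_start)
                  vma = NULL;
          return vma;
  }

hence the two conditions above.)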

Thanks.

> +		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
> +			up_read(&current->mm->mmap_sem);
> +			return -EFAULT;
> +		}
> +		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +		paddr = pfn << PAGE_SHIFT;
> +		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);

(I don't understand why there isn't a wrapper for this ...
 Looks like we're doing something unexpected.)

> +		if (!table) {
> +			up_read(&current->mm->mmap_sem);
> +			return -EFAULT;
> +		}
> +		ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +		memunmap(table);
> +		up_read(&current->mm->mmap_sem);
> +	}
>  
>  	return (ret != orig_pte);
>  }
> -- 
> 2.7.4
> 


* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
  2017-04-06 14:22 ` Radim Krčmář
@ 2017-04-12 13:16   ` Sironi, Filippo
  2017-04-17 11:26     ` Xiao Guangrong
  0 siblings, 1 reply; 6+ messages in thread
From: Sironi, Filippo @ 2017-04-12 13:16 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Sironi, Filippo, Liguori, Anthony, kvm, linux-kernel

Thanks for taking the time and sorry for the delay.

> On 6. Apr 2017, at 16:22, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> 2017-04-05 15:07+0200, Filippo Sironi:
>> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
>> pages and the respective struct pages for mapping in the kernel virtual
>> address space.
>> This doesn't work if get_user_pages_fast() is invoked with a userspace
>> virtual address that's backed by PFNs outside of kernel reach (e.g.,
>> when limiting the kernel memory with mem= in the command line and using
>> /dev/mem to map memory).
>> 
>> If get_user_pages_fast() fails, look up the VMA that backs the userspace
>> virtual address, compute the PFN and the physical address, and map it in
>> the kernel virtual address space with memremap().
> 
> What is the reason for a configuration that voluntarily restricts access
> to memory that it needs?

By using /dev/mem to provide VM memory, one can avoid the overhead of allocating struct page(s) for all of that memory, which is wasteful when a server is used entirely for hosting VMs.
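
For illustration, a rough sketch of how such a /dev/mem-backed mapping
might be set up; the path, flags, and function name are only illustrative
and this is not an excerpt of our actual VMM:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /*
     * Back guest RAM with physical memory that the kernel was told to
     * skip via mem=.  /dev/mem's mmap installs a pfn-based (VM_PFNMAP)
     * mapping, so there are no struct pages behind these addresses and
     * get_user_pages_fast() on them fails.
     */
    static void *map_guest_ram(off_t phys_base, size_t size)
    {
            void *mem;
            int fd = open("/dev/mem", O_RDWR);

            if (fd < 0)
                    return MAP_FAILED;
            mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, phys_base);
            close(fd);      /* the mapping outlives the file descriptor */
            return mem;
    }

The pointer returned by mmap() is what ends up as the memslot's
userspace_addr via KVM_SET_USER_MEMORY_REGION, which is how ptep_user can
end up pointing into a VM_PFNMAP region in the first place.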

>> Signed-off-by: Filippo Sironi <sironi@amazon.de>
>> Cc: Anthony Liguori <aliguori@amazon.com>
>> Cc: kvm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> @@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>> 	struct page *page;
>> 
>> 	npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
>> -	/* Check if the user is doing something meaningless. */
>> -	if (unlikely(npages != 1))
>> -		return -EFAULT;
>> -
>> -	table = kmap_atomic(page);
>> -	ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> -	kunmap_atomic(table);
>> -
>> -	kvm_release_page_dirty(page);
>> +	if (likely(npages == 1)) {
>> +		table = kmap_atomic(page);
>> +		ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> +		kunmap_atomic(table);
>> +
>> +		kvm_release_page_dirty(page);
>> +	} else {
>> +		struct vm_area_struct *vma;
>> +		unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
>> +		unsigned long pfn;
>> +		unsigned long paddr;
>> +
>> +		down_read(&current->mm->mmap_sem);
>> +		vma = find_vma_intersection(current->mm, vaddr,
>> +					    vaddr + PAGE_SIZE);
> 
> Hm, with the argument order like this, we check that
> 
>  vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start
> 
> but shouldn't we actually check that the whole page is there, i.e.
> 
>  vaddr + PAGE_SIZE < vma->vm_end && vaddr > vma->vm_start
> 
> ?
> 
> Thanks.

Hm, right now we check for the following:

    vaddr >= vma->vm_start && vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start

Given that vaddr is PAGE_SIZE aligned (and vma->vm_end is page-aligned as well), vaddr < vma->vm_end already guarantees that vaddr + PAGE_SIZE <= vma->vm_end.
This seems more complex than necessary. I believe that:

    vma = find_vma(current->mm, vaddr);

should be enough.

>> +		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
>> +			up_read(&current->mm->mmap_sem);
>> +			return -EFAULT;
>> +		}
>> +		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> +		paddr = pfn << PAGE_SHIFT;
>> +		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
> 
> (I don't understand why there isn't a wrapper for this ...
> Looks like we're doing something unexpected.)

Do you mean a wrapper for getting the pfn/paddr?
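
If so, a hedged guess rather than something I have tested: follow_pfn() in
mm/memory.c walks the page tables of a VM_IO/VM_PFNMAP vma and hands back
the pfn, so the vm_pgoff arithmetic could become something like

    unsigned long pfn;

    if (follow_pfn(vma, vaddr, &pfn)) {
            up_read(&current->mm->mmap_sem);
            return -EFAULT;
    }
    paddr = pfn << PAGE_SHIFT;

though I haven't checked whether that is actually preferable here.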

>> +		if (!table) {
>> +			up_read(&current->mm->mmap_sem);
>> +			return -EFAULT;
>> +		}
>> +		ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> +		memunmap(table);
>> +		up_read(&current->mm->mmap_sem);
>> +	}
>> 
>> 	return (ret != orig_pte);
>> }
>> -- 
>> 2.7.4

I'll submit a v2 soon.



* [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
  2017-04-05 13:07 [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs Filippo Sironi
  2017-04-06 14:22 ` Radim Krčmář
@ 2017-04-13 15:20 ` Filippo Sironi
  1 sibling, 0 replies; 6+ messages in thread
From: Filippo Sironi @ 2017-04-13 15:20 UTC (permalink / raw)
  To: kvm; +Cc: Filippo Sironi, Anthony Liguori, linux-kernel

cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
pages and the respective struct pages for mapping in the kernel virtual
address space.
This doesn't work if get_user_pages_fast() is invoked with a userspace
virtual address that's backed by PFNs outside of kernel reach (e.g.,
when limiting the kernel memory with mem= in the command line and using
/dev/mem to map memory).

If get_user_pages_fast() fails, look up the VMA that backs the userspace
virtual address, compute the PFN and the physical address, and map it in
the kernel virtual address space with memremap().

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/paging_tmpl.h | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..b3d7a117179d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -147,15 +147,39 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	struct page *page;
 
 	npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
-	/* Check if the user is doing something meaningless. */
-	if (unlikely(npages != 1))
-		return -EFAULT;
-
-	table = kmap_atomic(page);
-	ret = CMPXCHG(&table[index], orig_pte, new_pte);
-	kunmap_atomic(table);
-
-	kvm_release_page_dirty(page);
+	if (likely(npages == 1)) {
+		table = kmap_atomic(page);
+		ret = CMPXCHG(&table[index], orig_pte, new_pte);
+		kunmap_atomic(table);
+
+		kvm_release_page_dirty(page);
+	} else {
+		struct vm_area_struct *vma;
+		unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
+		unsigned long pfn;
+		unsigned long paddr;
+
+		down_read(&current->mm->mmap_sem);
+		/*
+		 * vaddr is page-aligned, if a vma exists, it must cover
+		 * ptep_user
+		 */
+		vma = find_vma(current->mm, vaddr);
+		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
+			up_read(&current->mm->mmap_sem);
+			return -EFAULT;
+		}
+		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+		paddr = pfn << PAGE_SHIFT;
+		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
+		if (!table) {
+			up_read(&current->mm->mmap_sem);
+			return -EFAULT;
+		}
+		ret = CMPXCHG(&table[index], orig_pte, new_pte);
+		memunmap(table);
+		up_read(&current->mm->mmap_sem);
+	}
 
 	return (ret != orig_pte);
 }
-- 
2.7.4


* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
  2017-04-12 13:16   ` Sironi, Filippo
@ 2017-04-17 11:26     ` Xiao Guangrong
  2017-04-18  8:12       ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Xiao Guangrong @ 2017-04-17 11:26 UTC (permalink / raw)
  To: Sironi, Filippo, Radim Krčmář
  Cc: Liguori, Anthony, kvm, linux-kernel



On 04/12/2017 09:16 PM, Sironi, Filippo wrote:
> Thanks for taking the time and sorry for the delay.
>
>> On 6. Apr 2017, at 16:22, Radim Krčmář <rkrcmar@redhat.com> wrote:
>>
>> 2017-04-05 15:07+0200, Filippo Sironi:
>>> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
>>> pages and the respective struct pages for mapping in the kernel virtual
>>> address space.
>>> This doesn't work if get_user_pages_fast() is invoked with a userspace
>>> virtual address that's backed by PFNs outside of kernel reach (e.g.,
>>> when limiting the kernel memory with mem= in the command line and using
>>> /dev/mem to map memory).
>>>
>>> If get_user_pages_fast() fails, look up the VMA that backs the userspace
>>> virtual address, compute the PFN and the physical address, and map it in
>>> the kernel virtual address space with memremap().
>>
>> What is the reason for a configuration that voluntarily restricts access
>> to memory that it needs?
>
> By using /dev/mem to provide VM memory, one can avoid the overhead of allocating struct page(s) for all of that memory, which is wasteful when a server is used entirely for hosting VMs.
>

Sounds reasonable; however, it is incomplete so far, as some code paths
still do not support memory that is not struct-page backed, e.g.,
emulator_cmpxchg_emulated().

I would suggest unifying the code introduced in this patch with the
existing hva_to_pfn(); we could also introduce a common API, maybe named
kvm_map_hva(), to improve the caller sides.
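
Something along those lines, purely as a sketch; neither the struct nor
the functions below exist in KVM today and the names are only meant to
illustrate the idea:

    /* Hypothetical API, for illustration only. */
    struct kvm_host_map {
            struct page *page;  /* set when the hva is struct-page backed */
            void *kaddr;        /* kernel mapping of the guest page */
    };

    int kvm_map_hva(unsigned long hva, struct kvm_host_map *map);
    void kvm_unmap_hva_map(struct kvm_host_map *map, bool dirty);

kvm_map_hva() would try get_user_pages_fast() + kmap() first and fall back
to the vma walk + memremap() from this patch, so callers like
cmpxchg_gpte() and emulator_cmpxchg_emulated() would not have to open-code
either path.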


BTW, I do not know why we used kmap_atomic() rather than kmap(); the
path of cmpxchg_gpte() is sleepable anyway.

Thanks!


* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
  2017-04-17 11:26     ` Xiao Guangrong
@ 2017-04-18  8:12       ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2017-04-18  8:12 UTC (permalink / raw)
  To: Xiao Guangrong, Sironi, Filippo, Radim Krčmář
  Cc: Liguori, Anthony, kvm, linux-kernel



On 17/04/2017 13:26, Xiao Guangrong wrote:
>>
> 
> Sounds reasonable; however, it is incomplete so far, as some code paths
> still do not support memory that is not struct-page backed, e.g.,
> emulator_cmpxchg_emulated().
> 
> I would suggest unifying the code introduced in this patch with the
> existing hva_to_pfn(); we could also introduce a common API, maybe named
> kvm_map_hva(), to improve the caller sides.

I agree with Guangrong's suggestion.

Paolo

> BTW, I do not know why we used kmap_atomic() rather than kmap(); the
> path of cmpxchg_gpte() is sleepable anyway.

