* [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
@ 2017-04-05 13:07 Filippo Sironi
2017-04-06 14:22 ` Radim Krčmář
2017-04-13 15:20 ` Filippo Sironi
0 siblings, 2 replies; 6+ messages in thread
From: Filippo Sironi @ 2017-04-05 13:07 UTC (permalink / raw)
To: sironi; +Cc: Filippo Sironi, Anthony Liguori, kvm, linux-kernel
cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
pages and the respective struct pages for mapping in the kernel virtual
address space.
This doesn't work if get_user_pages_fast() is invoked with a userspace
virtual address that's backed by PFNs outside of kernel reach (e.g.,
when limiting the kernel memory with mem= in the command line and using
/dev/mem to map memory).
If get_user_pages_fast() fails, look up the VMA that backs the userspace
virtual address, compute the PFN and the physical address, and map it in
the kernel virtual address space with memremap().
Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
arch/x86/kvm/paging_tmpl.h | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..ab4d6617238c 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
struct page *page;
npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
- /* Check if the user is doing something meaningless. */
- if (unlikely(npages != 1))
- return -EFAULT;
-
- table = kmap_atomic(page);
- ret = CMPXCHG(&table[index], orig_pte, new_pte);
- kunmap_atomic(table);
-
- kvm_release_page_dirty(page);
+ if (likely(npages == 1)) {
+ table = kmap_atomic(page);
+ ret = CMPXCHG(&table[index], orig_pte, new_pte);
+ kunmap_atomic(table);
+
+ kvm_release_page_dirty(page);
+ } else {
+ struct vm_area_struct *vma;
+ unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
+ unsigned long pfn;
+ unsigned long paddr;
+
+ down_read(¤t->mm->mmap_sem);
+ vma = find_vma_intersection(current->mm, vaddr,
+ vaddr + PAGE_SIZE);
+ if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
+ up_read(¤t->mm->mmap_sem);
+ return -EFAULT;
+ }
+ pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+ paddr = pfn << PAGE_SHIFT;
+ table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
+ if (!table) {
+ up_read(¤t->mm->mmap_sem);
+ return -EFAULT;
+ }
+ ret = CMPXCHG(&table[index], orig_pte, new_pte);
+ memunmap(table);
+ up_read(¤t->mm->mmap_sem);
+ }
return (ret != orig_pte);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
2017-04-05 13:07 [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs Filippo Sironi
@ 2017-04-06 14:22 ` Radim Krčmář
2017-04-12 13:16 ` Sironi, Filippo
2017-04-13 15:20 ` Filippo Sironi
1 sibling, 1 reply; 6+ messages in thread
From: Radim Krčmář @ 2017-04-06 14:22 UTC (permalink / raw)
To: Filippo Sironi; +Cc: sironi, Anthony Liguori, kvm, linux-kernel
2017-04-05 15:07+0200, Filippo Sironi:
> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
> pages and the respective struct pages for mapping in the kernel virtual
> address space.
> This doesn't work if get_user_pages_fast() is invoked with a userspace
> virtual address that's backed by PFNs outside of kernel reach (e.g.,
> when limiting the kernel memory with mem= in the command line and using
> /dev/mem to map memory).
>
> If get_user_pages_fast() fails, look up the VMA that backs the userspace
> virtual address, compute the PFN and the physical address, and map it in
> the kernel virtual address space with memremap().
What is the reason for a configuration that voluntarily restricts access
to memory that it needs?
> Signed-off-by: Filippo Sironi <sironi@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> @@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> struct page *page;
>
> npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
> - /* Check if the user is doing something meaningless. */
> - if (unlikely(npages != 1))
> - return -EFAULT;
> -
> - table = kmap_atomic(page);
> - ret = CMPXCHG(&table[index], orig_pte, new_pte);
> - kunmap_atomic(table);
> -
> - kvm_release_page_dirty(page);
> + if (likely(npages == 1)) {
> + table = kmap_atomic(page);
> + ret = CMPXCHG(&table[index], orig_pte, new_pte);
> + kunmap_atomic(table);
> +
> + kvm_release_page_dirty(page);
> + } else {
> + struct vm_area_struct *vma;
> + unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
> + unsigned long pfn;
> + unsigned long paddr;
> +
> + down_read(¤t->mm->mmap_sem);
> + vma = find_vma_intersection(current->mm, vaddr,
> + vaddr + PAGE_SIZE);
Hm, with the argument order like this, we check that
vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start
but shouldn't we actually check that the whole page is there, i.e.
vaddr + PAGE_SIZE < vma->vm_end && vaddr > vma->vm_start
?
Thanks.
> + if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
> + up_read(¤t->mm->mmap_sem);
> + return -EFAULT;
> + }
> + pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> + paddr = pfn << PAGE_SHIFT;
> + table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
(I don't undestand why there isn't a wrapper for this ...
Looks like we're doing something unexpected.)
> + if (!table) {
> + up_read(¤t->mm->mmap_sem);
> + return -EFAULT;
> + }
> + ret = CMPXCHG(&table[index], orig_pte, new_pte);
> + memunmap(table);
> + up_read(¤t->mm->mmap_sem);
> + }
>
> return (ret != orig_pte);
> }
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
2017-04-06 14:22 ` Radim Krčmář
@ 2017-04-12 13:16 ` Sironi, Filippo
2017-04-17 11:26 ` Xiao Guangrong
0 siblings, 1 reply; 6+ messages in thread
From: Sironi, Filippo @ 2017-04-12 13:16 UTC (permalink / raw)
To: Radim Krčmář
Cc: Sironi, Filippo, Liguori, Anthony, kvm, linux-kernel
Thanks for taking the time and sorry for the delay.
> On 6. Apr 2017, at 16:22, Radim Krčmář <rkrcmar@redhat.com> wrote:
>
> 2017-04-05 15:07+0200, Filippo Sironi:
>> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
>> pages and the respective struct pages for mapping in the kernel virtual
>> address space.
>> This doesn't work if get_user_pages_fast() is invoked with a userspace
>> virtual address that's backed by PFNs outside of kernel reach (e.g.,
>> when limiting the kernel memory with mem= in the command line and using
>> /dev/mem to map memory).
>>
>> If get_user_pages_fast() fails, look up the VMA that backs the userspace
>> virtual address, compute the PFN and the physical address, and map it in
>> the kernel virtual address space with memremap().
>
> What is the reason for a configuration that voluntarily restricts access
> to memory that it needs?
By using /dev/mem to provide VM memory, one can avoid the overhead of allocating struct page(s) for the whole memory, which is wasteful when using a server entirely for hosting VMs.
>> Signed-off-by: Filippo Sironi <sironi@amazon.de>
>> Cc: Anthony Liguori <aliguori@amazon.com>
>> Cc: kvm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> @@ -147,15 +147,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>> struct page *page;
>>
>> npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
>> - /* Check if the user is doing something meaningless. */
>> - if (unlikely(npages != 1))
>> - return -EFAULT;
>> -
>> - table = kmap_atomic(page);
>> - ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> - kunmap_atomic(table);
>> -
>> - kvm_release_page_dirty(page);
>> + if (likely(npages == 1)) {
>> + table = kmap_atomic(page);
>> + ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> + kunmap_atomic(table);
>> +
>> + kvm_release_page_dirty(page);
>> + } else {
>> + struct vm_area_struct *vma;
>> + unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
>> + unsigned long pfn;
>> + unsigned long paddr;
>> +
>> + down_read(¤t->mm->mmap_sem);
>> + vma = find_vma_intersection(current->mm, vaddr,
>> + vaddr + PAGE_SIZE);
>
> Hm, with the argument order like this, we check that
>
> vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start
>
> but shouldn't we actually check that the whole page is there, i.e.
>
> vaddr + PAGE_SIZE < vma->vm_end && vaddr > vma->vm_start
>
> ?
>
> Thanks.
Hm, right now we check for the following:
vaddr >= vma->vm_start && vaddr < vma->vm_end && vaddr + PAGE_SIZE > vma->vm_start
given that vaddr is PAGE_SIZE aligned, we're guaranteed that vaddr + PAGE_SIZE <= vma->vm_end.
This seems more complex than necessary. I believe that:
vma = find_vma(current->mm, vaddr);
should be enough.
>> + if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
>> + up_read(¤t->mm->mmap_sem);
>> + return -EFAULT;
>> + }
>> + pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> + paddr = pfn << PAGE_SHIFT;
>> + table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
>
> (I don't undestand why there isn't a wrapper for this ...
> Looks like we're doing something unexpected.)
Do you mean a wrapper for getting the pfn/paddr?
>> + if (!table) {
>> + up_read(¤t->mm->mmap_sem);
>> + return -EFAULT;
>> + }
>> + ret = CMPXCHG(&table[index], orig_pte, new_pte);
>> + memunmap(table);
>> + up_read(¤t->mm->mmap_sem);
>> + }
>>
>> return (ret != orig_pte);
>> }
>> --
>> 2.7.4
I'll submit a v2 version soon.
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
2017-04-05 13:07 [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs Filippo Sironi
2017-04-06 14:22 ` Radim Krčmář
@ 2017-04-13 15:20 ` Filippo Sironi
1 sibling, 0 replies; 6+ messages in thread
From: Filippo Sironi @ 2017-04-13 15:20 UTC (permalink / raw)
To: kvm; +Cc: Filippo Sironi, Anthony Liguori, linux-kernel
cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
pages and the respective struct pages for mapping in the kernel virtual
address space.
This doesn't work if get_user_pages_fast() is invoked with a userspace
virtual address that's backed by PFNs outside of kernel reach (e.g.,
when limiting the kernel memory with mem= in the command line and using
/dev/mem to map memory).
If get_user_pages_fast() fails, look up the VMA that backs the userspace
virtual address, compute the PFN and the physical address, and map it in
the kernel virtual address space with memremap().
Signed-off-by: Filippo Sironi <sironi@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
arch/x86/kvm/paging_tmpl.h | 42 +++++++++++++++++++++++++++++++++---------
1 file changed, 33 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a01105485315..b3d7a117179d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -147,15 +147,39 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
struct page *page;
npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
- /* Check if the user is doing something meaningless. */
- if (unlikely(npages != 1))
- return -EFAULT;
-
- table = kmap_atomic(page);
- ret = CMPXCHG(&table[index], orig_pte, new_pte);
- kunmap_atomic(table);
-
- kvm_release_page_dirty(page);
+ if (likely(npages == 1)) {
+ table = kmap_atomic(page);
+ ret = CMPXCHG(&table[index], orig_pte, new_pte);
+ kunmap_atomic(table);
+
+ kvm_release_page_dirty(page);
+ } else {
+ struct vm_area_struct *vma;
+ unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
+ unsigned long pfn;
+ unsigned long paddr;
+
+ down_read(¤t->mm->mmap_sem);
+ /*
+ * vaddr is page-aligned, if a vma exists, it must cover
+ * ptep_user
+ */
+ vma = find_vma(current->mm, vaddr);
+ if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
+ up_read(¤t->mm->mmap_sem);
+ return -EFAULT;
+ }
+ pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+ paddr = pfn << PAGE_SHIFT;
+ table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
+ if (!table) {
+ up_read(¤t->mm->mmap_sem);
+ return -EFAULT;
+ }
+ ret = CMPXCHG(&table[index], orig_pte, new_pte);
+ memunmap(table);
+ up_read(¤t->mm->mmap_sem);
+ }
return (ret != orig_pte);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
2017-04-12 13:16 ` Sironi, Filippo
@ 2017-04-17 11:26 ` Xiao Guangrong
2017-04-18 8:12 ` Paolo Bonzini
0 siblings, 1 reply; 6+ messages in thread
From: Xiao Guangrong @ 2017-04-17 11:26 UTC (permalink / raw)
To: Sironi, Filippo, Radim Krčmář
Cc: Liguori, Anthony, kvm, linux-kernel
On 04/12/2017 09:16 PM, Sironi, Filippo wrote:
> Thanks for taking the time and sorry for the delay.
>
>> On 6. Apr 2017, at 16:22, Radim Krčmář <rkrcmar@redhat.com> wrote:
>>
>> 2017-04-05 15:07+0200, Filippo Sironi:
>>> cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of
>>> pages and the respective struct pages for mapping in the kernel virtual
>>> address space.
>>> This doesn't work if get_user_pages_fast() is invoked with a userspace
>>> virtual address that's backed by PFNs outside of kernel reach (e.g.,
>>> when limiting the kernel memory with mem= in the command line and using
>>> /dev/mem to map memory).
>>>
>>> If get_user_pages_fast() fails, look up the VMA that backs the userspace
>>> virtual address, compute the PFN and the physical address, and map it in
>>> the kernel virtual address space with memremap().
>>
>> What is the reason for a configuration that voluntarily restricts access
>> to memory that it needs?
>
> By using /dev/mem to provide VM memory, one can avoid the overhead of allocating struct page(s) for the whole memory, which is wasteful when using a server entirely for hosting VMs.
>
Sounds reasonable, however it is incomplete so far as there are some
code paths still do not support non-page backend memory, e.g,
emulator_cmpxchg_emulated().
I would suggest to unify the code introduced in this patch with existing
hva_to_pfn(), also we can introduce a common API, maybe named
kvm_map_hva(), to improve the caller sides.
BTW, i do not know why we used kmap_atomic() rather than kmap(), the
path of cmpxchg_gpte() is sleep-able anyway.
Thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs
2017-04-17 11:26 ` Xiao Guangrong
@ 2017-04-18 8:12 ` Paolo Bonzini
0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2017-04-18 8:12 UTC (permalink / raw)
To: Xiao Guangrong, Sironi, Filippo, Radim Krčmář
Cc: Liguori, Anthony, kvm, linux-kernel
On 17/04/2017 13:26, Xiao Guangrong wrote:
>>
>
> Sounds reasonable, however it is incomplete so far as there are some
> code paths still do not support non-page backend memory, e.g,
> emulator_cmpxchg_emulated().
>
> I would suggest to unify the code introduced in this patch with existing
> hva_to_pfn(), also we can introduce a common API, maybe named
> kvm_map_hva(), to improve the caller sides.
I agree with Guangrong's suggestion.
Paolo
> BTW, i do not know why we used kmap_atomic() rather than kmap(), the
> path of cmpxchg_gpte() is sleep-able anyway.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-04-18 8:13 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-05 13:07 [PATCH] x86, kvm: Handle PFNs outside of kernel reach when touching GPTEs Filippo Sironi
2017-04-06 14:22 ` Radim Krčmář
2017-04-12 13:16 ` Sironi, Filippo
2017-04-17 11:26 ` Xiao Guangrong
2017-04-18 8:12 ` Paolo Bonzini
2017-04-13 15:20 ` Filippo Sironi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.