linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH 0/6] KVM: mm: fd-based approach for supporting KVM guest private memory
@ 2021-11-11 14:13 Chao Peng
From: Chao Peng @ 2021-11-11 14:13 UTC
  To: kvm, linux-kernel, linux-mm, linux-fsdevel, qemu-devel
  Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Hugh Dickins, Jeff Layton, J . Bruce Fields,
	Andrew Morton, Yu Zhang, Chao Peng, Kirill A . Shutemov, luto,
	john.ji, susie.li, jun.nakajima, dave.hansen, ak, david

This RFC series tries to implement the fd-based KVM guest private memory
proposal described at [1].

We have already had some offline discussions on this series, which
resulted in a different design proposal from Paolo. This thread includes
both the original RFC patch series for proposal [1] and a summary of the
new proposal from Paolo so that we can continue the discussion.

To understand the patches and the new proposal, reading the original
proposal [1] first is highly recommended.
 

Patch Description
=================
The patches include a private memory implementation in the memfd/shmem
backing store and KVM support for private memory slots, as well as their
counterpart in QEMU.

Patch1:     kernel part shmem/memfd support
Patch2-6:   KVM part
Patch7-13:  QEMU part

QEMU Usage:
-machine private-memory-backend=ram1 \
-object memory-backend-memfd,id=ram1,size=5G,guest_private=on,seal=off


New Proposal
============
Below is a summary of the changes for the new proposal that was discussed
in the offline thread.

In general, this new proposal reuses the concept of an fd-based guest
memory backing store described in [1], but it coordinates the private and
shared parts within one single memslot instead of introducing a dedicated
private memslot.

- memslot extension
The new proposal suggests adding the private fd and the offset to the
existing 'shared' memslot so that both private and shared memory can live
in one single memslot. A page in the memslot is either private or shared.
A page is private only when it is allocated in the private fd; in all
other cases it is treated as shared, which includes pages already mapped
as shared as well as pages that have not been mapped yet.
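
As a rough illustration of the memslot extension, here is a minimal
userspace C sketch. The struct layout, the field names and the helper
gfn_to_private_offset() are hypothetical and are not the uAPI from this
series; they only show how a private fd plus offset could sit next to the
existing shared mapping in one slot, assuming 4 KiB pages.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical layout, NOT the actual KVM memslot or uAPI structure. */
struct memslot_sketch {
	uint64_t base_gfn;       /* first guest frame number in the slot */
	uint64_t npages;         /* number of pages covered by the slot */
	uint64_t userspace_addr; /* existing shared mapping (hva) */
	int      private_fd;     /* new: fd of the private backing store */
	uint64_t private_offset; /* new: where the slot starts in private_fd */
};

/*
 * Translate a gfn into a byte offset within the private fd.
 * Returns false when the gfn lies outside the slot.
 */
static bool gfn_to_private_offset(const struct memslot_sketch *slot,
				  uint64_t gfn, uint64_t *offset)
{
	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return false;

	*offset = slot->private_offset +
		  ((gfn - slot->base_gfn) << SKETCH_PAGE_SHIFT);
	return true;
}
```

Whether the page at that offset is private would then depend only on
whether it is allocated in the private fd.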

- private memory map/unmap
Userspace performs the map/unmap operations by calling fallocate() on the
private fd.
  - map: default fallocate() with mode=0.
  - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
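
A minimal sketch of the two userspace operations above, using a plain
memfd as a stand-in for the private fd. The helper names are made up for
illustration, and the F_SEAL_GUEST seal from patch 1 is deliberately not
applied because it is not in mainline headers. Note that fallocate(2)
requires FALLOC_FL_PUNCH_HOLE to be ORed with FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE
#include <fcntl.h>	/* fallocate(), FALLOC_FL_* */
#include <sys/mman.h>	/* memfd_create(), MFD_CLOEXEC */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd standing in for the private fd. */
static int private_fd_create(void)
{
	/* The RFC's F_SEAL_GUEST seal is intentionally omitted here. */
	return memfd_create("guest-private-sketch", MFD_CLOEXEC);
}

/* map: default fallocate() with mode=0 allocates the range. */
static int private_map(int fd, off_t offset, off_t len)
{
	return fallocate(fd, 0, offset, len);
}

/* unmap: punch a hole; KEEP_SIZE is mandatory together with PUNCH_HOLE. */
static int private_unmap(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}
```

Both helpers return 0 on success and -1 with errno set on failure, as
fallocate() does.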

There would be two new callbacks, registered by KVM and called by the
memory backing store during the above map/unmap operations:
  - map(inode, offset, size): the memory backing store tells the related
    KVM memslot to do a shared->private conversion.
  - unmap(inode, offset, size): the memory backing store tells the related
    KVM memslot to do a private->shared conversion.

The memory backing store also needs to provide a new callback that lets
KVM query whether a page is already allocated in the private fd, so that
KVM can tell whether the page is private or not.
  - page_allocated(inode, offset): for shmem this would simply return
    pagecache_get_page().
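
The callback relationship could be modelled roughly as below. This is a
toy userspace model, not kernel code: the struct names, the per-page flag
array and the counting callbacks (standing in for KVM) are all
hypothetical, and ranges are assumed to be page aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NPAGES    64u

/* Callbacks registered by KVM and invoked by the backing store. */
struct guest_ops_sketch {
	void (*map)(void *inode, uint64_t offset, uint64_t size);   /* shared->private */
	void (*unmap)(void *inode, uint64_t offset, uint64_t size); /* private->shared */
};

/* Toy backing store: one flag per page records whether it is allocated. */
struct backing_store_sketch {
	bool allocated[SKETCH_NPAGES];
	const struct guest_ops_sketch *ops;
};

/* Stand-in for KVM: just count how often each callback fires. */
static unsigned int map_calls, unmap_calls;
static void count_map(void *i, uint64_t o, uint64_t s)   { (void)i; (void)o; (void)s; map_calls++; }
static void count_unmap(void *i, uint64_t o, uint64_t s) { (void)i; (void)o; (void)s; unmap_calls++; }
static const struct guest_ops_sketch counting_ops = { count_map, count_unmap };

/* Query used by KVM: is the page at this offset allocated (i.e. private)? */
static bool store_page_allocated(const struct backing_store_sketch *bs,
				 uint64_t offset)
{
	uint64_t idx = offset / SKETCH_PAGE_SIZE;

	return idx < SKETCH_NPAGES && bs->allocated[idx];
}

static void store_set_range(struct backing_store_sketch *bs,
			    uint64_t offset, uint64_t size, bool val)
{
	uint64_t first = offset / SKETCH_PAGE_SIZE;
	uint64_t end = (offset + size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	for (uint64_t i = first; i < end && i < SKETCH_NPAGES; i++)
		bs->allocated[i] = val;
}

/* fallocate(mode=0) path: allocate, then notify KVM via map(). */
static void store_fallocate(struct backing_store_sketch *bs,
			    uint64_t offset, uint64_t size)
{
	store_set_range(bs, offset, size, true);
	bs->ops->map(bs, offset, size);
}

/* FALLOC_FL_PUNCH_HOLE path: free, then notify KVM via unmap(). */
static void store_punch_hole(struct backing_store_sketch *bs,
			     uint64_t offset, uint64_t size)
{
	store_set_range(bs, offset, size, false);
	bs->ops->unmap(bs, offset, size);
}
```

The ordering of allocation and notification here is an arbitrary choice
made for the sketch.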

There are two places in KVM that can exit to userspace to trigger a
private/shared conversion:
  - explicit conversion: happens when the guest calls into KVM to
    explicitly map a range (as private or shared); KVM then exits to
    userspace to do the above map/unmap operations.
  - implicit conversion: happens in the KVM page fault handler.
    * If the fault is due to a private memory access, cause a userspace
      exit for a shared->private conversion request when page_allocated()
      returns false; otherwise map the page directly without a userspace
      exit.
    * If the fault is due to a shared memory access, cause a userspace
      exit for a private->shared conversion request when page_allocated()
      returns true; otherwise map the page directly without a userspace
      exit.
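
The implicit-conversion rules above reduce to a small decision table. A
hedged sketch follows; the enum and function names are invented for
illustration and do not come from the patches.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the page fault handler's decision. */
enum fault_action_sketch {
	FAULT_MAP_PRIVATE,             /* map directly, no userspace exit */
	FAULT_MAP_SHARED,              /* map directly, no userspace exit */
	FAULT_EXIT_CONVERT_TO_PRIVATE, /* exit: request shared->private */
	FAULT_EXIT_CONVERT_TO_SHARED,  /* exit: request private->shared */
};

/*
 * access_is_private: the faulting access targets private memory.
 * page_is_allocated: result of the backing store's page_allocated() query.
 */
static enum fault_action_sketch
resolve_fault_sketch(bool access_is_private, bool page_is_allocated)
{
	if (access_is_private)
		return page_is_allocated ? FAULT_MAP_PRIVATE
					 : FAULT_EXIT_CONVERT_TO_PRIVATE;

	return page_is_allocated ? FAULT_EXIT_CONVERT_TO_SHARED
				 : FAULT_MAP_SHARED;
}
```

A userspace exit is needed only when the access type and the allocation
state of the page disagree.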
 
An example flow:

  guest                     Linux                userspace
  ------------------------- -------------------- -----------------------
                                                 ioctl(KVM_RUN)
  access private memory
         '--- EPT violation --.
                              v
                            userspace exit
                                 '------------------.
                                                    v
                                                 munmap shared memfd
                                                 fallocate private memfd
                                 .------------------'
                                 v
                            fallocate()
                              call guest_ops
                                 unmap shared PTE
                                 map private PTE
                              ...
                                                 ioctl(KVM_RUN)

Compared to the original proposal:
 - no need to introduce KVM memslot hole punching API,
 - would avoid potential memslot performance/scalability/fragmentation issues,
 - may also reduce userspace complexity,
 - but requires additional callbacks between KVM and memory backing
   store.

[1] https://lkml.kernel.org/kvm/51a6f74f-6c05-74b9-3fd7-b7cd900fb8cc@redhat.com/t/

Thanks,
Chao
---
Chao Peng (6):
  mm: Add F_SEAL_GUEST to shmem/memfd
  kvm: x86: Introduce guest private memory address space to memslot
  kvm: x86: add private_ops to memslot
  kvm: x86: implement private_ops for memfd backing store
  kvm: x86: add KVM_EXIT_MEMORY_ERROR exit
  KVM: add KVM_SPLIT_MEMORY_REGION

 Documentation/virt/kvm/api.rst  |   1 +
 arch/x86/include/asm/kvm_host.h |   5 +-
 arch/x86/include/uapi/asm/kvm.h |   4 +
 arch/x86/kvm/Makefile           |   2 +-
 arch/x86/kvm/memfd.c            |  63 +++++++++++
 arch/x86/kvm/mmu/mmu.c          |  69 ++++++++++--
 arch/x86/kvm/mmu/paging_tmpl.h  |   3 +-
 arch/x86/kvm/x86.c              |   3 +-
 include/linux/kvm_host.h        |  41 ++++++-
 include/linux/memfd.h           |  22 ++++
 include/linux/shmem_fs.h        |   9 ++
 include/uapi/linux/fcntl.h      |   1 +
 include/uapi/linux/kvm.h        |  34 ++++++
 mm/memfd.c                      |  34 +++++-
 mm/shmem.c                      | 127 +++++++++++++++++++++-
 virt/kvm/kvm_main.c             | 185 +++++++++++++++++++++++++++++++-
 16 files changed, 581 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kvm/memfd.c

-- 
2.17.1


Thread overview: 18+ messages
2021-11-11 14:13 [RFC PATCH 0/6] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2021-11-11 14:13 ` [RFC PATCH 1/6] mm: Add F_SEAL_GUEST to shmem/memfd Chao Peng
2021-11-12 19:28   ` Kirill A. Shutemov
2021-11-11 14:13 ` [RFC PATCH 2/6] kvm: x86: Introduce guest private memory address space to memslot Chao Peng
2021-11-11 14:13 ` [RFC PATCH 3/6] kvm: x86: add private_ops " Chao Peng
2021-11-11 14:13 ` [RFC PATCH 4/6] kvm: x86: implement private_ops for memfd backing store Chao Peng
2021-11-11 14:13 ` [RFC PATCH 5/6] kvm: x86: add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2021-11-11 15:08   ` Mika Penttilä
2021-11-12  5:50     ` Chao Peng
2021-11-11 14:13 ` [RFC PATCH 6/6] KVM: add KVM_SPLIT_MEMORY_REGION Chao Peng
2021-11-11 14:13 ` [RFC PATCH 07/13] linux-headers: Update Chao Peng
2021-11-11 14:13 ` [RFC PATCH 08/13] hostmem: Add guest private memory to memory backend Chao Peng
2021-11-11 14:13 ` [RFC PATCH 09/13] qmp: Include "guest-private" property for memory backends Chao Peng
2021-11-11 14:13 ` [RFC PATCH 10/13] softmmu/physmem: Add private memory address space Chao Peng
2022-01-18  9:41   ` Philippe Mathieu-Daudé
2021-11-11 14:13 ` [RFC PATCH 11/13] kvm: register private memory slots Chao Peng
2021-11-11 14:13 ` [RFC PATCH 12/13] kvm: handle private to shared memory conversion Chao Peng
2021-11-11 14:13 ` [RFC PATCH 13/13] machine: Add 'private-memory-backend' property Chao Peng
