linux-mm.kvack.org archive mirror
* [RFC V2 00/37] Enhance memory utilization with DMEMFS
@ 2020-12-07 11:30 yulei.kernel
From: yulei.kernel @ 2020-12-07 11:30 UTC
  To: linux-mm, akpm, linux-fsdevel, kvm, linux-kernel,
	naoya.horiguchi, viro, pbonzini
  Cc: joao.m.martins, rdunlap, sean.j.christopherson,
	xiaoguangrong.eric, kernellwp, lihaiwei.kernel, Yulei Zhang

From: Yulei Zhang <yuleixzhang@tencent.com>

In the current system, each physical memory page is associated with
a page structure ('struct page') that tracks the usage of that page.
As memory usage grows rapidly in cloud environments, we find that
the memory consumed by these page structures becomes increasingly
significant. Is it possible to reclaim this memory and make it
reusable?

This patchset proposes a way to save this extra memory through a
new virtual filesystem, dmemfs.

Dmemfs (Direct Memory filesystem) is a filesystem backed by device
memory or reserved memory. This kind of memory is special because it
is not managed by the kernel and, most importantly, it has no
'struct page'. Therefore we can use the memory saved on the host
system to support more tenants in our cloud service.

As the figure below shows, we use the kernel boot parameter 'dmem='
to reserve system memory when the host boots up. The remaining
system memory is still managed by the kernel's memory management
and is associated with 'struct page', while the reserved memory is
managed by dmem and assigned to guest systems. The details can be
found in Documentation/admin-guide/kernel-parameters.txt.
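Since the reservation is driven entirely by the boot command line,
here is a rough user-space sketch of how a value such as
"dmem=4G@2G" might be parsed. It mirrors a subset of the kernel's
memparse() behaviour (a number plus an optional K/M/G/T suffix). The
"size@start" syntax is an assumption borrowed from existing
parameters such as memmap= and crashkernel=; the real format is
whatever the patchset documents in kernel-parameters.txt, and
struct dmem_range / parse_dmem_arg() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical range requested on the command line. */
struct dmem_range {
	uint64_t size;
	uint64_t start;
};

/*
 * Parse a number (decimal, or hex with a 0x prefix) with an optional
 * binary suffix K, M, G or T, in the spirit of the kernel's memparse().
 * Returns false if no digits were found.
 */
static bool parse_size(const char *s, uint64_t *out, const char **rest)
{
	char *end;
	uint64_t v = strtoull(s, &end, 0);

	if (end == s)
		return false;
	switch (*end) {
	case 'T': case 't':
		v <<= 10;
		/* fall through */
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;		/* consume the suffix character */
		break;
	default:
		break;
	}
	*out = v;
	*rest = end;
	return true;
}

/* Parse an assumed "size@start" value such as "4G@2G". */
static bool parse_dmem_arg(const char *arg, struct dmem_range *r)
{
	const char *p;

	assert(arg != NULL && r != NULL);
	if (!parse_size(arg, &r->size, &p) || r->size == 0 || *p != '@')
		return false;
	if (!parse_size(p + 1, &r->start, &p) || *p != '\0')
		return false;
	return true;
}
```

With this sketch, "4G@2G" yields a size of 0x100000000 starting at
0x80000000. In the kernel, such parsing would presumably live in an
early_param() handler so the range can be set aside before the page
allocator takes ownership of it.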

   +------------------+--------------------------------------+
   |  system memory   |     memory for guest system          | 
   +------------------+--------------------------------------+
    |                                   |
    v                                   |
struct page                             |
    |                                   |
    v                                   v
    system mem management             dmem  

During operation, dmemfs handles the requests to allocate and free
the reserved memory on each NUMA node. User space applications can
access the memory through the mmap interface, and kernel modules
such as KVM and VFIO can pin it through follow_pfn() and
get_user_pages() at different page size granularities.

          +-----------+  +-----------+
          |   QEMU    |  |  dpdk etc.|      user
          +-----+-----+  +-----------+
  +-----------|------\------------------------------+
  |           |       v                    kernel   |
  |           |     +-------+  +-------+            |
  |           |     |  KVM  |  | vfio  |            |
  |           |     +-------+  +-------+            |
  |           |         |          |                |
  |      +----v---------v----------v------+         |
  |      |                                |         |
  |      |             Dmemfs             |         |
  |      |                                |         |
  |      +--------------------------------+         |
  +-----------/-----------------------\-------------+
             /                         \
     +------v-----+                +----v-------+
     |   node 0   |                |   node 1   |
     +------------+                +------------+
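To make the per-node allocation in the figure concrete, here is a
toy model: each node owns one contiguous reserved PFN range tracked
by a 64-bit bitmap, and an allocation tries the requested node first
and then falls back to the others. The structure, the names and the
fallback policy are illustrative assumptions chosen for this sketch;
they are not the patchset's dmem_bitmap_alloc()/dmem_bitmap_free()
or its mempolicy handling. The model is single-threaded and hands
out one 4K page at a time.

```c
#include <assert.h>
#include <stdint.h>

/* Pages per node in this toy model; must match the bitmap width below. */
#define POOL_PAGES 64

/* One reserved, contiguous PFN range per NUMA node. */
struct node_pool {
	uint64_t base_pfn;	/* first PFN of the node's reserved range */
	uint64_t used;		/* bit i set => base_pfn + i is allocated */
};

/*
 * Allocate one page, trying @node first and then the other nodes in
 * order. Returns the PFN, or 0 if every pool is full (PFN 0 is assumed
 * never to be part of a reserved range).
 */
static uint64_t pool_alloc(struct node_pool *pools, int nr_nodes, int node)
{
	assert(node >= 0 && node < nr_nodes);
	for (int i = 0; i < nr_nodes; i++) {
		struct node_pool *p = &pools[(node + i) % nr_nodes];

		for (unsigned int bit = 0; bit < POOL_PAGES; bit++) {
			if (!(p->used & (1ULL << bit))) {
				p->used |= 1ULL << bit;
				return p->base_pfn + bit;
			}
		}
	}
	return 0;
}

/* Return a page to whichever node's pool owns it. */
static void pool_free(struct node_pool *pools, int nr_nodes, uint64_t pfn)
{
	for (int i = 0; i < nr_nodes; i++) {
		struct node_pool *p = &pools[i];

		if (pfn >= p->base_pfn && pfn - p->base_pfn < POOL_PAGES) {
			uint64_t mask = 1ULL << (pfn - p->base_pfn);

			assert(p->used & mask);	/* catch double free */
			p->used &= ~mask;
			return;
		}
	}
	assert(!"pfn does not belong to any dmem pool");
}
```

A kernel version would size the bitmap to the reserved range, scan it
with helpers such as find_first_zero_bit(), and take a lock around
the update.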

In theory, dropping 'struct page' saves 64 bytes for each 4K
physical page (1/64 of the memory), so 320G of guest memory saves
5G of physical memory in total.
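The saving is plain arithmetic: 320 GiB / 4 KiB = 83,886,080 pages,
and 83,886,080 x 64 bytes = 5 GiB. The helper below just encodes that
calculation. The 64-byte figure is the usual sizeof(struct page) on
x86_64 and is treated as an assumption here; the result also ignores
any per-page state dmem itself keeps (V2 adds a refcount per dmem
page via dregion->memmap), so it is an upper bound.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K			(4ULL << 10)
#define SZ_1G			(1ULL << 30)
/* Typical sizeof(struct page) on x86_64; an assumption, not a constant. */
#define STRUCT_PAGE_BYTES	64ULL

/* Bytes of struct page needed to describe @mem_bytes of 4 KiB pages. */
static uint64_t struct_page_overhead(uint64_t mem_bytes)
{
	assert(mem_bytes % SZ_4K == 0);
	return mem_bytes / SZ_4K * STRUCT_PAGE_BYTES;
}
```

For the example above, struct_page_overhead(320 * SZ_1G) returns
5 * SZ_1G, i.e. 5,368,709,120 bytes.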

Detailed usage of dmemfs is described in
Documentation/filesystems/dmemfs.rst.
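As a rough illustration of the user-space side, the sketch below maps
a file with mmap() the way a VMM such as QEMU might back guest RAM
with a file on a dmemfs mount. The mount point /mnt/dmem and the file
name are hypothetical, and the code assumes dmemfs lets ftruncate()
set the size of a new file; the length presumably also has to be a
multiple of the page size the mount was configured with. The
authoritative steps are in dmemfs.rst. Because only standard POSIX
calls are used, the same function also works on an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or reuse) the file at @path, size it to @len bytes and map it
 * shared, as a VMM could do with a dmemfs file such as the hypothetical
 * /mnt/dmem/vm0. Returns NULL on any failure.
 */
static void *map_guest_memory(const char *path, size_t len)
{
	void *addr;
	int fd;

	assert(path != NULL && len > 0);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping holds its own file reference */
	if (addr == MAP_FAILED)
		return NULL;
	/* Touch every page up front so later accesses find them populated. */
	memset(addr, 0, len);
	return addr;
}
```

Calling map_guest_memory("/mnt/dmem/vm0", 1UL << 30) would then hand
back 1G of guest memory that KVM or VFIO can later resolve and pin.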

V1->V2:
* Rebase the code onto kernel version 5.10.0-rc3.
* Introduce dregion->memmap for dmem to add a _refcount for each
  dmem page.
* Enable record_steal_time for dmem before entering the guest system.
* Adjust page walking for dmem.
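The dregion->memmap change gives each dmem page a reference count
without bringing back a full struct page. The model below shows the
idea under stated assumptions: a per-region array of atomic counters
indexed by (pfn - base_pfn), with get/put helpers of the kind that
VFIO pinning needs. struct dregion_model and the dregion_* helpers
are hypothetical and do not reproduce the patch's data layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy dmem region: one atomic refcount per page instead of a struct page. */
struct dregion_model {
	uint64_t base_pfn;
	uint64_t nr_pages;
	atomic_int *memmap;	/* memmap[i] counts references to base_pfn + i */
};

static int dregion_init(struct dregion_model *r, uint64_t base_pfn,
			uint64_t nr_pages)
{
	r->base_pfn = base_pfn;
	r->nr_pages = nr_pages;
	r->memmap = malloc(nr_pages * sizeof(*r->memmap));
	if (!r->memmap)
		return -1;
	for (uint64_t i = 0; i < nr_pages; i++)
		atomic_init(&r->memmap[i], 0);
	return 0;
}

/* Locate the refcount slot for @pfn; the pfn must lie inside the region. */
static atomic_int *dregion_ref(struct dregion_model *r, uint64_t pfn)
{
	assert(pfn >= r->base_pfn && pfn - r->base_pfn < r->nr_pages);
	return &r->memmap[pfn - r->base_pfn];
}

/* Take a reference (e.g. when a page is pinned); returns the new count. */
static int dregion_get(struct dregion_model *r, uint64_t pfn)
{
	return atomic_fetch_add(dregion_ref(r, pfn), 1) + 1;
}

/* Drop a reference; returns true when the page has no users left. */
static bool dregion_put(struct dregion_model *r, uint64_t pfn)
{
	int old = atomic_fetch_sub(dregion_ref(r, pfn), 1);

	assert(old > 0);	/* catch unbalanced puts */
	return old == 1;
}

static void dregion_destroy(struct dregion_model *r)
{
	free(r->memmap);
	r->memmap = NULL;
}
```

The per-page cost here is one atomic counter rather than a whole
struct page, which is why the saving estimated above is an upper
bound rather than an exact figure.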

Yulei Zhang (37):
  fs: introduce dmemfs module
  mm: support direct memory reservation
  dmem: implement dmem memory management
  dmem: let pat recognize dmem
  dmemfs: support mmap for dmemfs
  dmemfs: support truncating inode down
  dmem: trace core functions
  dmem: show some statistic in debugfs
  dmemfs: support remote access
  dmemfs: introduce max_alloc_try_dpages parameter
  mm: export mempolicy interfaces to serve dmem allocator
  dmem: introduce mempolicy support
  mm, dmem: introduce PFN_DMEM and pfn_t_dmem
  mm, dmem: differentiate dmem-pmd and thp-pmd
  mm: add pmd_special() check for pmd_trans_huge_lock()
  dmemfs: introduce ->split() to dmemfs_vm_ops
  mm, dmemfs: support unmap_page_range() for dmemfs pmd
  mm: follow_pmd_mask() for dmem huge pmd
  mm: gup_huge_pmd() for dmem huge pmd
  mm: support dmem huge pmd for vmf_insert_pfn_pmd()
  mm: support dmem huge pmd for follow_pfn()
  kvm, x86: Distinguish dmemfs page from mmio page
  kvm, x86: introduce VM_DMEM for syscall support usage
  dmemfs: support hugepage for dmemfs
  mm, x86, dmem: fix estimation of reserved page for vaddr_get_pfn()
  mm, dmem: introduce pud_special() for dmem huge pud support
  mm: add pud_special() check to support dmem huge pud
  mm, dmemfs: support huge_fault() for dmemfs
  mm: add follow_pte_pud() to support huge pud look up
  dmem: introduce dmem_bitmap_alloc() and dmem_bitmap_free()
  dmem: introduce mce handler
  mm, dmemfs: register and handle the dmem mce
  kvm, x86: enable record_steal_time for dmem
  dmem: add dmem unit tests
  mm, dmem: introduce dregion->memmap for dmem
  vfio: support dmempage refcount for vfio
  Add documentation for dmemfs

 Documentation/admin-guide/kernel-parameters.txt |   38 +
 Documentation/filesystems/dmemfs.rst            |   58 ++
 Documentation/filesystems/index.rst             |    1 +
 arch/x86/Kconfig                                |    1 +
 arch/x86/include/asm/pgtable.h                  |   32 +-
 arch/x86/include/asm/pgtable_types.h            |   13 +-
 arch/x86/kernel/setup.c                         |    3 +
 arch/x86/kvm/mmu/mmu.c                          |    1 +
 arch/x86/mm/pat/memtype.c                       |   21 +
 drivers/vfio/vfio_iommu_type1.c                 |   13 +-
 fs/Kconfig                                      |    1 +
 fs/Makefile                                     |    1 +
 fs/dmemfs/Kconfig                               |   16 +
 fs/dmemfs/Makefile                              |    8 +
 fs/dmemfs/inode.c                               | 1060 ++++++++++++++++++++
 fs/dmemfs/trace.h                               |   54 +
 fs/inode.c                                      |    6 +
 include/linux/dmem.h                            |   54 +
 include/linux/fs.h                              |    1 +
 include/linux/huge_mm.h                         |    5 +-
 include/linux/mempolicy.h                       |    3 +
 include/linux/mm.h                              |    9 +
 include/linux/pfn_t.h                           |   17 +-
 include/linux/pgtable.h                         |   22 +
 include/trace/events/dmem.h                     |   85 ++
 include/uapi/linux/magic.h                      |    1 +
 mm/Kconfig                                      |   19 +
 mm/Makefile                                     |    1 +
 mm/dmem.c                                       | 1196 +++++++++++++++++++++++
 mm/dmem_reserve.c                               |  303 ++++++
 mm/gup.c                                        |  101 +-
 mm/huge_memory.c                                |   19 +-
 mm/memory-failure.c                             |   70 +-
 mm/memory.c                                     |   74 +-
 mm/mempolicy.c                                  |    4 +-
 mm/mincore.c                                    |    8 +-
 mm/mprotect.c                                   |    7 +-
 mm/mremap.c                                     |    3 +
 mm/pagewalk.c                                   |    4 +-
 tools/testing/dmem/Kbuild                       |    1 +
 tools/testing/dmem/Makefile                     |   10 +
 tools/testing/dmem/dmem-test.c                  |  184 ++++
 virt/kvm/kvm_main.c                             |   13 +-
 43 files changed, 3483 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/dmemfs.rst
 create mode 100644 fs/dmemfs/Kconfig
 create mode 100644 fs/dmemfs/Makefile
 create mode 100644 fs/dmemfs/inode.c
 create mode 100644 fs/dmemfs/trace.h
 create mode 100644 include/linux/dmem.h
 create mode 100644 include/trace/events/dmem.h
 create mode 100644 mm/dmem.c
 create mode 100644 mm/dmem_reserve.c
 create mode 100644 tools/testing/dmem/Kbuild
 create mode 100644 tools/testing/dmem/Makefile
 create mode 100644 tools/testing/dmem/dmem-test.c

-- 
1.8.3.1



Thread overview: 41+ messages
2020-12-07 11:30 [RFC V2 00/37] Enhance memory utilization with DMEMFS yulei.kernel
2020-12-07 11:30 ` [RFC V2 01/37] fs: introduce dmemfs module yulei.kernel
2020-12-07 11:30 ` [RFC V2 02/37] mm: support direct memory reservation yulei.kernel
2020-12-07 11:30 ` [RFC V2 03/37] dmem: implement dmem memory management yulei.kernel
2020-12-07 11:30 ` [RFC V2 04/37] dmem: let pat recognize dmem yulei.kernel
2020-12-07 11:30 ` [RFC V2 05/37] dmemfs: support mmap for dmemfs yulei.kernel
2020-12-07 11:30 ` [RFC V2 06/37] dmemfs: support truncating inode down yulei.kernel
2020-12-07 11:31 ` [RFC V2 07/37] dmem: trace core functions yulei.kernel
2020-12-07 11:31 ` [RFC V2 08/37] dmem: show some statistic in debugfs yulei.kernel
2020-12-07 11:31 ` [RFC V2 09/37] dmemfs: support remote access yulei.kernel
2020-12-07 11:31 ` [RFC V2 10/37] dmemfs: introduce max_alloc_try_dpages parameter yulei.kernel
2020-12-07 11:31 ` [RFC V2 11/37] mm: export mempolicy interfaces to serve dmem allocator yulei.kernel
2020-12-07 11:31 ` [RFC V2 12/37] dmem: introduce mempolicy support yulei.kernel
2020-12-07 11:31 ` [RFC V2 13/37] mm, dmem: introduce PFN_DMEM and pfn_t_dmem yulei.kernel
2020-12-07 11:31 ` [RFC V2 14/37] mm, dmem: differentiate dmem-pmd and thp-pmd yulei.kernel
2020-12-07 11:31 ` [RFC V2 15/37] mm: add pmd_special() check for pmd_trans_huge_lock() yulei.kernel
2020-12-07 11:31 ` [RFC V2 16/37] dmemfs: introduce ->split() to dmemfs_vm_ops yulei.kernel
2020-12-07 11:31 ` [RFC V2 17/37] mm, dmemfs: support unmap_page_range() for dmemfs pmd yulei.kernel
2020-12-07 11:31 ` [RFC V2 18/37] mm: follow_pmd_mask() for dmem huge pmd yulei.kernel
2020-12-07 11:31 ` [RFC V2 19/37] mm: gup_huge_pmd() " yulei.kernel
2020-12-07 11:31 ` [RFC V2 20/37] mm: support dmem huge pmd for vmf_insert_pfn_pmd() yulei.kernel
2020-12-07 11:31 ` [RFC V2 21/37] mm: support dmem huge pmd for follow_pfn() yulei.kernel
2020-12-07 11:31 ` [RFC V2 22/37] kvm, x86: Distinguish dmemfs page from mmio page yulei.kernel
2020-12-07 11:31 ` [RFC V2 23/37] kvm, x86: introduce VM_DMEM for syscall support usage yulei.kernel
2020-12-07 11:31 ` [RFC V2 24/37] dmemfs: support hugepage for dmemfs yulei.kernel
2020-12-07 11:31 ` [RFC V2 25/37] mm, x86, dmem: fix estimation of reserved page for vaddr_get_pfn() yulei.kernel
2020-12-07 11:31 ` [RFC V2 26/37] mm, dmem: introduce pud_special() for dmem huge pud support yulei.kernel
2020-12-07 11:31 ` [RFC V2 27/37] mm: add pud_special() check to support dmem huge pud yulei.kernel
2020-12-07 11:31 ` [RFC V2 28/37] mm, dmemfs: support huge_fault() for dmemfs yulei.kernel
2020-12-07 11:31 ` [RFC V2 29/37] mm: add follow_pte_pud() to support huge pud look up yulei.kernel
2020-12-07 11:31 ` [RFC V2 30/37] dmem: introduce dmem_bitmap_alloc() and dmem_bitmap_free() yulei.kernel
2020-12-07 11:31 ` [RFC V2 31/37] dmem: introduce mce handler yulei.kernel
2020-12-07 11:31 ` [RFC V2 32/37] mm, dmemfs: register and handle the dmem mce yulei.kernel
2020-12-07 11:31 ` [RFC V2 33/37] kvm, x86: enable record_steal_time for dmem yulei.kernel
2020-12-07 11:31 ` [RFC V2 34/37] dmem: add dmem unit tests yulei.kernel
2020-12-07 11:31 ` [RFC V2 35/37] mm, dmem: introduce dregion->memmap for dmem yulei.kernel
2020-12-07 11:31 ` [RFC V2 36/37] vfio: support dmempage refcount for vfio yulei.kernel
2020-12-07 11:31 ` [RFC V2 37/37] Add documentation for dmemfs yulei.kernel
2020-12-24 18:27   ` Randy Dunlap
2020-12-07 12:02 ` [RFC V2 00/37] Enhance memory utilization with DMEMFS David Hildenbrand
2020-12-07 19:32   ` Dan Williams
