From: Blake Caldwell <blake.caldwell@colorado.edu>
To: blake.caldwell@colorado.edu
Cc: rppt@linux.vnet.ibm.com, xemul@virtuozzo.com,
akpm@linux-foundation.org, mike.kravetz@oracle.com,
kirill.shutemov@linux.intel.com, linux-mm@kvack.org,
aarcange@redhat.com
Subject: [PATCH 0/4] RFC: userfaultfd remap
Date: Sat, 12 Jan 2019 00:36:25 +0000 [thread overview]
Message-ID: <cover.1547251023.git.blake.caldwell@colorado.edu> (raw)
Hello,
Since userfaultfd remap functionality was first proposed by Andrea
Arcangeli [1], a new use case has been demonstrated for removing pages
from the userfaultfd registered region. FluidMem [2] is a system for
expanding or limiting the resident size of a VM using a remote key-value
store as backing storage instead of swap space. It runs on the hypervisor
and uses userfaultfd to manage the memory regions malloc'd by qemu.
Since FluidMem maintains a constant resident size using an LRU list, it
must evict pages to the remote key-value store to make room for pages that
were just faulted in. This requires UFFDIO_REMAP to remove pages from the
uncooperative userspace page fault handler.
The VM shadow page tables must be kept in sync after a remapping, so
mmu_notifier_invalidate_range_(start/end) calls are made as necessary.
FluiMem enables page fault latencies to a remote key-value store that are
as fast as swap backed by DRAM (/dev/pmem0) and 77% faster than swap with a
SSD drive. pmbench [3] was used to measure page fault latencies with a 4 GB
working set size, within a VM using 1 GB DRAM (20% local):
FluidMem (RAMCloud): 24.87 microseconds
Swap (pmem DRAM): 26.34 microseconds
Swap (NVMe over Fabrics): 41.73 microseconds
Swap (SSD): 106.56 microseconds
For real applications FluidMem has an additional benefit of allowing
unused kernel pages to be removed from DRAM and stored in backend storage,
making room for additional application pages to be kept in local DRAM.
The useful memory capacity for the VM is increased.
The main complexity of this code is found in rmap, where it overwrites the
page->index when it moves the page to a different vma with different
vma->vm_pgoff. Overwriting page->index requires the rmap change and it's
only possible when the page_mapcount is 1.
Changes since [1]:
- Changed the direction supported by UFFDIO_REMAP to the OUT direction
needed by FluidMem. The IN direction is not necessary, as UFFDIO_COPY
should be used instead because it doesn't require a TLB flush.
- Code has been kept up-to-date by Andrea in branch userfault from [4].
[1] https://lkml.org/lkml/2015/3/5/576
[2] Caldwell, Blake, Youngbin Im, Sangtae Ha, Richard Han, and
Eric Keller. "FluidMem: Memory as a Service for the Datacenter."
arXiv preprint arXiv:1707.07780 (2017).
https://github.com/blakecaldwell/fluidmem
[3] Yang, Jisoo, and Julian Seymour. "Pmbench: A Micro-Benchmark for
Profiling Paging Performance on a System with Low-Latency SSDs."
Information Technology-New Generations. Springer, Cham, 2018. 627-633.
https://bitbucket.org/jisooy/pmbench/src
[4] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
Andrea Arcangeli (3):
userfaultfd: UFFDIO_REMAP: rmap preparation
userfaultfd: UFFDIO_REMAP uABI
userfaultfd: UFFDIO_REMAP
Blake Caldwell (1):
userfaultfd: change the direction for UFFDIO_REMAP to out
Documentation/admin-guide/mm/userfaultfd.rst | 10 +
fs/userfaultfd.c | 49 +++
include/linux/userfaultfd_k.h | 17 +
include/uapi/linux/userfaultfd.h | 25 +-
mm/huge_memory.c | 117 ++++++
mm/khugepaged.c | 3 +
mm/rmap.c | 13 +
mm/userfaultfd.c | 536 +++++++++++++++++++++++++++
8 files changed, 769 insertions(+), 1 deletion(-)
--
1.8.3.1
next reply other threads:[~2019-01-12 0:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-12 0:36 Blake Caldwell [this message]
2019-01-12 0:36 ` [PATCH 1/4] userfaultfd: UFFDIO_REMAP: rmap preparation Blake Caldwell
2019-01-12 0:36 ` [PATCH 2/4] userfaultfd: UFFDIO_REMAP uABI Blake Caldwell
2019-01-12 0:36 ` [PATCH 3/4] userfaultfd: UFFDIO_REMAP Blake Caldwell
2019-01-12 0:36 ` [PATCH 4/4] userfaultfd: change the direction for UFFDIO_REMAP to out Blake Caldwell
2019-01-20 21:07 ` Mike Rapoport
2019-01-24 23:36 ` Blake Caldwell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1547251023.git.blake.caldwell@colorado.edu \
--to=blake.caldwell@colorado.edu \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=xemul@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).