From: Nadav Amit <nadav.amit@gmail.com>
To: linux-fsdevel@vger.kernel.org
Cc: Nadav Amit <namit@vmware.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Jens Axboe <axboe@kernel.dk>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [RFC PATCH 00/13] fs/userfaultfd: support iouring and polling
Date: Sat, 28 Nov 2020 16:45:35 -0800
Message-ID: <20201129004548.1619714-1-namit@vmware.com>

From: Nadav Amit <namit@vmware.com>

While the overhead of userfaultfd is usually reasonable, it can still
be prohibitive for low-latency backing storage, such as RDMA,
persistent memory or in-memory compression. In such cases the cost of
scheduling and of entering/exiting the kernel becomes dominant.

The natural solution for this problem is to use iouring with
userfaultfd. But, besides exposing one bug, this by itself does not
provide sufficient performance improvement, and the use of ioctls for
zero/copy limits the use of iouring to synchronous "reads" (reporting
of faults/events). This patch-set provides four solutions for this
overhead:

1. Userfaultfd "polling" mode, in which the faulting thread polls after
reporting the fault instead of being de-scheduled. This fits cases in
which the handler is expected to poll for page-faults on a different
thread.

2. Asynchronous-reads, in which the faulting thread reports page-faults
(and other events) directly to the userspace handler thread. For this
purpose, asynchronous read completions are introduced.

3. Write interface, which provides similar services to the zero/copy
ioctls. This allows the use of iouring for zero/copy without changing
the iouring code or making it userfaultfd-aware. The low bits of the
"position" are used to encode the requested operation
(zero/copy/wp/etc).

4. Async-writes, in which the zero/copy is performed by the faulting
thread instead of the iouring thread. This reduces caching effects: the
data is likely to be used by the faulting thread, and find_vma() cannot
use its cache on the iouring worker.
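
To illustrate the position encoding mentioned in (3), here is a minimal
userspace sketch. It is not taken from the patches: the EX_* constants,
the opcode values and the helper names are hypothetical, and a 4KB page
size is assumed. The only idea it relies on is that the destination
address is page-aligned, so its low bits are free to carry the
requested operation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical values for illustration only; the cover letter does not
 * give the actual encoding used by the write interface.
 */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	((uint64_t)1 << EX_PAGE_SHIFT)
#define EX_OP_MASK	(EX_PAGE_SIZE - 1)

/* Assumed opcodes; the real ones are defined by the patches. */
enum ex_uffd_op {
	EX_OP_COPY = 0,
	EX_OP_ZERO = 1,
	EX_OP_WP   = 2,
};

/* Pack a page-aligned destination address and an opcode into one value. */
static inline uint64_t ex_encode_pos(uint64_t dst_addr, enum ex_uffd_op op)
{
	/* The low bits must be clear, otherwise they would alias the opcode. */
	assert((dst_addr & EX_OP_MASK) == 0);
	return dst_addr | (uint64_t)op;
}

/* Recover the destination address by masking out the opcode bits. */
static inline uint64_t ex_pos_addr(uint64_t pos)
{
	return pos & ~EX_OP_MASK;
}

/* Recover the opcode from the low bits. */
static inline enum ex_uffd_op ex_pos_op(uint64_t pos)
{
	return (enum ex_uffd_op)(pos & EX_OP_MASK);
}
```

Under these assumptions, a handler would pass the value returned by
ex_encode_pos() as the file offset of a positional write (or of an
iouring write request) on the userfaultfd file descriptor, with the
source data as the write buffer.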

I will provide benchmark results later, but initial results show that
these patches reduce the overhead of handling a user page-fault by
over 50%.

The patches require a bit more cleanup but seem to pass the tests.

Note that the first three patches are bug fixes. I did not Cc them to
stable yet.

Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org

Nadav Amit (13):
  fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE
  fs/userfaultfd: fix wrong file usage with iouring
  selftests/vm/userfaultfd: wake after copy failure
  fs/userfaultfd: simplify locks in userfaultfd_ctx_read
  fs/userfaultfd: introduce UFFD_FEATURE_POLL
  iov_iter: support atomic copy_page_from_iter_iovec()
  fs/userfaultfd: support read_iter to use io_uring
  fs/userfaultfd: complete reads asynchronously
  fs/userfaultfd: use iov_iter for copy/zero
  fs/userfaultfd: add write_iter() interface
  fs/userfaultfd: complete write asynchronously
  fs/userfaultfd: kmem-cache for wait-queue objects
  selftests/vm/userfaultfd: iouring and polling tests

 fs/userfaultfd.c                         | 740 ++++++++++++++++----
 include/linux/hugetlb.h                  |   4 +-
 include/linux/mm.h                       |   6 +-
 include/linux/shmem_fs.h                 |   2 +-
 include/linux/uio.h                      |   3 +
 include/linux/userfaultfd_k.h            |  10 +-
 include/uapi/linux/userfaultfd.h         |  21 +-
 lib/iov_iter.c                           |  23 +-
 mm/hugetlb.c                             |  12 +-
 mm/memory.c                              |  36 +-
 mm/shmem.c                               |  17 +-
 mm/userfaultfd.c                         |  96 ++-
 tools/testing/selftests/vm/Makefile      |   2 +-
 tools/testing/selftests/vm/userfaultfd.c | 835 +++++++++++++++++++++--
 14 files changed, 1506 insertions(+), 301 deletions(-)

-- 
2.25.1


Thread overview: 25+ messages
2020-11-29  0:45 Nadav Amit [this message]
2020-11-29  0:45 ` [RFC PATCH 01/13] fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE Nadav Amit
2020-12-01 21:22   ` Mike Kravetz
2020-12-21 19:01     ` Peter Xu
2020-11-29  0:45 ` [RFC PATCH 02/13] fs/userfaultfd: fix wrong file usage with iouring Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 03/13] selftests/vm/userfaultfd: wake after copy failure Nadav Amit
2020-12-21 19:28   ` Peter Xu
2020-12-21 19:51     ` Nadav Amit
2020-12-21 20:52       ` Peter Xu
2020-12-21 20:54         ` Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 04/13] fs/userfaultfd: simplify locks in userfaultfd_ctx_read Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 05/13] fs/userfaultfd: introduce UFFD_FEATURE_POLL Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 06/13] iov_iter: support atomic copy_page_from_iter_iovec() Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 07/13] fs/userfaultfd: support read_iter to use io_uring Nadav Amit
2020-11-30 18:20   ` Jens Axboe
2020-11-30 19:23     ` Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 08/13] fs/userfaultfd: complete reads asynchronously Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 09/13] fs/userfaultfd: use iov_iter for copy/zero Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 10/13] fs/userfaultfd: add write_iter() interface Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 11/13] fs/userfaultfd: complete write asynchronously Nadav Amit
2020-12-02  7:12   ` Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 12/13] fs/userfaultfd: kmem-cache for wait-queue objects Nadav Amit
2020-11-30 19:51   ` Nadav Amit
2020-12-03  5:19   ` [fs/userfaultfd] fec9227821: will-it-scale.per_process_ops -5.5% regression kernel test robot
2020-11-29  0:45 ` [RFC PATCH 13/13] selftests/vm/userfaultfd: iouring and polling tests Nadav Amit
