All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: linux-fsdevel@vger.kernel.org
Cc: Nadav Amit <namit@vmware.com>, Jens Axboe <axboe@kernel.dk>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [RFC PATCH 05/13] fs/userfaultfd: introduce UFFD_FEATURE_POLL
Date: Sat, 28 Nov 2020 16:45:40 -0800	[thread overview]
Message-ID: <20201129004548.1619714-6-namit@vmware.com> (raw)
In-Reply-To: <20201129004548.1619714-1-namit@vmware.com>

From: Nadav Amit <namit@vmware.com>

Add a feature UFFD_FEATURE_POLL that makes the faulting thread spin
while waiting for the page-fault to be handled.

Users of this feature should be wise by setting the page-fault handling
thread on another physical CPU and to potentially ensure that there are
available cores to run the handler, as otherwise they will see
performance degradation.

We can later enhance it by setting one or two timeouts: one timeout
until the page-fault is handled and another until the handler was
woken.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 fs/userfaultfd.c                 | 24 ++++++++++++++++++++----
 include/uapi/linux/userfaultfd.h |  9 ++++++++-
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fedf7c1615d5..b6a04e526025 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -122,7 +122,9 @@ static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
 	if (len && (start > uwq->msg.arg.pagefault.address ||
 		    start + len <= uwq->msg.arg.pagefault.address))
 		goto out;
-	WRITE_ONCE(uwq->waken, true);
+
+	smp_store_mb(uwq->waken, true);
+
 	/*
 	 * The Program-Order guarantees provided by the scheduler
 	 * ensure uwq->waken is visible before the task is woken.
@@ -377,6 +379,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	bool must_wait;
 	long blocking_state;
+	bool poll;
 
 	/*
 	 * We don't do userfault handling for the final child pid update.
@@ -410,6 +413,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
 
+	poll = ctx->features & UFFD_FEATURE_POLL;
+
 	/*
 	 * If it's already released don't get it. This avoids to loop
 	 * in __get_user_pages if userfaultfd_release waits on the
@@ -495,7 +500,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 * following the spin_unlock to happen before the list_add in
 	 * __add_wait_queue.
 	 */
-	set_current_state(blocking_state);
+
+	if (!poll)
+		set_current_state(blocking_state);
+
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
 	if (!is_vm_hugetlb_page(vmf->vma))
@@ -509,10 +517,18 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {
 		wake_up_poll(&ctx->fd_wqh, EPOLLIN);
-		schedule();
+		if (poll) {
+			while (!READ_ONCE(uwq.waken) && !READ_ONCE(ctx->released) &&
+			       !signal_pending(current)) {
+				cpu_relax();
+				cond_resched();
+			}
+		} else
+			schedule();
 	}
 
-	__set_current_state(TASK_RUNNING);
+	if (!poll)
+		__set_current_state(TASK_RUNNING);
 
 	/*
 	 * Here we race with the list_del; list_add in
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..4eeba4235afe 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -27,7 +27,9 @@
 			   UFFD_FEATURE_MISSING_HUGETLBFS |	\
 			   UFFD_FEATURE_MISSING_SHMEM |		\
 			   UFFD_FEATURE_SIGBUS |		\
-			   UFFD_FEATURE_THREAD_ID)
+			   UFFD_FEATURE_THREAD_ID |		\
+			   UFFD_FEATURE_POLL)
+
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -171,6 +173,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
 	 * be returned, if feature is not requested 0 will be returned.
+	 *
+	 * UFFD_FEATURE_POLL polls upon page-fault if the feature is requested
+	 * instead of descheduling. This feature should only be enabled for
+	 * low-latency handlers and when CPUs are not overcomitted.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -181,6 +187,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
 #define UFFD_FEATURE_SIGBUS			(1<<7)
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
+#define UFFD_FEATURE_POLL			(1<<9)
 	__u64 features;
 
 	__u64 ioctls;
-- 
2.25.1


  parent reply	other threads:[~2020-11-29  0:51 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-29  0:45 [RFC PATCH 00/13] fs/userfaultfd: support iouring and polling Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 01/13] fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE Nadav Amit
2020-12-01 21:22   ` Mike Kravetz
2020-12-21 19:01     ` Peter Xu
2020-11-29  0:45 ` [RFC PATCH 02/13] fs/userfaultfd: fix wrong file usage with iouring Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 03/13] selftests/vm/userfaultfd: wake after copy failure Nadav Amit
2020-12-21 19:28   ` Peter Xu
2020-12-21 19:51     ` Nadav Amit
2020-12-21 20:52       ` Peter Xu
2020-12-21 20:54         ` Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 04/13] fs/userfaultfd: simplify locks in userfaultfd_ctx_read Nadav Amit
2020-11-29  0:45 ` Nadav Amit [this message]
2020-11-29  0:45 ` [RFC PATCH 06/13] iov_iter: support atomic copy_page_from_iter_iovec() Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 07/13] fs/userfaultfd: support read_iter to use io_uring Nadav Amit
2020-11-30 18:20   ` Jens Axboe
2020-11-30 19:23     ` Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 08/13] fs/userfaultfd: complete reads asynchronously Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 09/13] fs/userfaultfd: use iov_iter for copy/zero Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 10/13] fs/userfaultfd: add write_iter() interface Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 11/13] fs/userfaultfd: complete write asynchronously Nadav Amit
2020-12-02  7:12   ` Nadav Amit
2020-11-29  0:45 ` [RFC PATCH 12/13] fs/userfaultfd: kmem-cache for wait-queue objects Nadav Amit
2020-11-30 19:51   ` Nadav Amit
2020-12-03  5:19   ` [fs/userfaultfd] fec9227821: will-it-scale.per_process_ops -5.5% regression kernel test robot
2020-12-03  5:19     ` kernel test robot
2020-11-29  0:45 ` [RFC PATCH 13/13] selftests/vm/userfaultfd: iouring and polling tests Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201129004548.1619714-6-namit@vmware.com \
    --to=nadav.amit@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.