All of lore.kernel.org
 help / color / mirror / Atom feed
From: Axel Rasmussen <axelrasmussen@google.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Charan Teja Reddy <charante@codeaurora.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Dmitry V . Levin" <ldv@altlinux.org>,
	Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>,
	Hugh Dickins <hughd@google.com>, Jan Kara <jack@suse.cz>,
	Jonathan Corbet <corbet@lwn.net>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@kernel.org>, Nadav Amit <namit@vmware.com>,
	Peter Xu <peterx@redhat.com>, Shuah Khan <shuah@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>, zhangyi <yi.zhang@huawei.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: [PATCH v2 4/6] userfaultfd: update documentation to describe /dev/userfaultfd
Date: Fri, 22 Apr 2022 14:29:43 -0700	[thread overview]
Message-ID: <20220422212945.2227722-5-axelrasmussen@google.com> (raw)
In-Reply-To: <20220422212945.2227722-1-axelrasmussen@google.com>

Explain the different ways to create a new userfaultfd, and how access
control works for each way.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 Documentation/admin-guide/mm/userfaultfd.rst | 38 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..4c079b5377d4 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -39,7 +42,7 @@ Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +53,37 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two mechanisms to create a userfaultfd. There are various ways to
+restrict this too, since userfaultfds which handle kernel page faults have
+historically been a useful tool for exploiting the kernel.
+
+The first is the userfaultfd(2) syscall. Access to this is controlled in several
+ways:
+
+- By default, the userfaultfd will be able to handle kernel page faults. This
+  can be disabled by passing in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 0, then the caller must *either* have
+  CAP_SYS_PTRACE, or pass in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 1, then no particular privilege is needed to
+  use this syscall, even if UFFD_USER_MODE_ONLY is *not* set.
+
+Alternatively, userfaultfds can be created by opening /dev/userfaultfd, and
+issuing a USERFAULTFD_IOC_NEW ioctl to this device. Access to this device is
+controlled via normal filesystem permissions (user/group/mode for example) - no
+additional permission (capability/sysctl) is needed to be able to handle kernel
+faults this way. This is useful because it allows e.g. a specific user or group
+to be able to create kernel-fault-handling userfaultfds, without allowing it
+more broadly, or granting more privileges in addition to that particular ability
+(CAP_SYS_PTRACE). In other words, it allows permissions to be minimized.
+
+Initializing up a userfaultfd
+------------------------
+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..8682d5fbc8ea 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -880,6 +880,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+An alternative to this sysctl / the userfaultfd(2) syscall is to create
+userfaultfds via /dev/userfaultfd. See
+Documentation/admin-guide/mm/userfaultfd.rst.
 
 user_reserve_kbytes
 ===================
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog


  parent reply	other threads:[~2022-04-22 22:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-22 21:29 [PATCH v2 0/6] userfaultfd: add /dev/userfaultfd for fine grained access control Axel Rasmussen
2022-04-22 21:29 ` [PATCH v2 1/6] selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh Axel Rasmussen
2022-04-26 16:06   ` Shuah Khan
2022-04-26 20:33   ` Peter Xu
2022-04-22 21:29 ` [PATCH v2 2/6] userfaultfd: add /dev/userfaultfd for fine grained access control Axel Rasmussen
2022-04-25 20:32   ` Dmitry V. Levin
2022-04-26 16:00     ` Axel Rasmussen
2022-04-26 17:13       ` Arnd Bergmann
2022-04-26 20:32   ` Peter Xu
2022-04-26 21:33     ` Axel Rasmussen
2022-04-22 21:29 ` [PATCH v2 3/6] userfaultfd: selftests: modify selftest to use /dev/userfaultfd Axel Rasmussen
2022-04-26 16:16   ` Shuah Khan
2022-05-19 17:56     ` Axel Rasmussen
2022-04-22 21:29 ` Axel Rasmussen [this message]
2022-04-26 16:46   ` [PATCH v2 4/6] userfaultfd: update documentation to describe /dev/userfaultfd Shuah Khan
2022-05-19 18:58     ` Axel Rasmussen
2022-04-22 21:29 ` [PATCH v2 5/6] userfaultfd: selftests: make /dev/userfaultfd testing configurable Axel Rasmussen
2022-04-26 16:56   ` Shuah Khan
2022-05-19 19:13     ` Axel Rasmussen
2022-04-22 21:29 ` [PATCH v2 6/6] selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh Axel Rasmussen
2022-04-26 17:34   ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220422212945.2227722-5-axelrasmussen@google.com \
    --to=axelrasmussen@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=charante@codeaurora.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=glebfm@altlinux.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=ldv@altlinux.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=rppt@kernel.org \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.