All of lore.kernel.org
 help / color / mirror / Atom feed
From: Axel Rasmussen <axelrasmussen@google.com>
To: "Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Chinwen Chang" <chinwen.chang@mediatek.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Ingo Molnar" <mingo@redhat.com>, "Jann Horn" <jannh@google.com>,
	"Jerome Glisse" <jglisse@redhat.com>,
	"Lokesh Gidra" <lokeshgidra@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Peter Xu" <peterx@redhat.com>, "Shaohua Li" <shli@fb.com>,
	"Shawn Anastasio" <shawn@anastas.io>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Steven Price" <steven.price@arm.com>,
	"Vlastimil Babka" <vbabka@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Adam Ruprecht <ruprecht@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Cannon Matthews <cannonmatthews@google.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Oliver Upton <oupton@google.com>
Subject: [PATCH 8/9] userfaultfd: update documentation to describe minor fault handling
Date: Fri, 15 Jan 2021 11:04:50 -0800	[thread overview]
Message-ID: <20210115190451.3135416-9-axelrasmussen@google.com> (raw)
In-Reply-To: <20210115190451.3135416-1-axelrasmussen@google.com>

Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used
to intercept and resolve minor faults. Make it clear that COPY and
ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR
faults.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 Documentation/admin-guide/mm/userfaultfd.rst | 105 +++++++++++--------
 1 file changed, 64 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 65eefa66c0ba..67f2c68e65a2 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,36 +63,36 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_FAULT_HUGETLBFS`` indicates that the kernel
+  supports ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs
+  virtual memory areas.
+
+The userland application should set the feature flags it intends to use
+when envoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,44 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which of these are used depends on the kind of fault:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, a new page has to be
+  provided. This can be done with either ``UFFDIO_COPY`` or
+  ``UFFDIO_ZEROPAGE``. The default (non-userfaultfd) behavior would be to
+  provide a zero page, but in userfaultfd this is left up to userspace.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, an existing page already
+  exists. Userspace needs to ensure its contents are correct (if it needs
+  to be modified, by writing directly to the non-userfaultfd-registered
+  side of shared memory), and then issue ``UFFDIO_CONTINUE`` to resolve
+  the fault.
 
 Notes:
 
-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd.  You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.
 
 - None of the page-delivering ioctls default to the range that you
   registered with.  You must fill in all fields for the appropriate
@@ -122,9 +145,9 @@ Notes:
 
 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd.  You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``.  Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd.  You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.
 
 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``).  This can happen, e.g. when ranges
-- 
2.30.0.284.gd98b1dd5eaa7-goog


  parent reply	other threads:[~2021-01-15 19:06 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-15 19:04 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
2021-01-15 19:04 ` Axel Rasmussen
2021-01-15 19:04 ` [PATCH 1/9] hugetlb: Pass vma into huge_pte_alloc() Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-16  0:42   ` kernel test robot
2021-01-16  0:42     ` kernel test robot
2021-01-15 19:04 ` [PATCH 2/9] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-15 23:36   ` kernel test robot
2021-01-15 23:36     ` kernel test robot
2021-01-21 18:52   ` Peter Xu
2021-01-15 19:04 ` [PATCH 3/9] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-15 19:04 ` [PATCH 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-15 22:37   ` kernel test robot
2021-01-15 22:37     ` kernel test robot
2021-01-15 19:04 ` [PATCH 5/9] userfaultfd: add minor fault registration mode Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-21 18:49   ` Peter Xu
2021-01-15 19:04 ` [PATCH 6/9] userfaultfd: disable huge PMD sharing for MINOR registered VMAs Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-21 18:59   ` Peter Xu
2021-01-15 19:04 ` [PATCH 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-21 22:46   ` Peter Xu
2021-01-21 23:46     ` Axel Rasmussen
2021-01-15 19:04 ` Axel Rasmussen [this message]
2021-01-15 19:04   ` [PATCH 8/9] userfaultfd: update documentation to describe minor fault handling Axel Rasmussen
2021-01-15 19:04 ` [PATCH 9/9] userfaultfd/selftests: add test exercising " Axel Rasmussen
2021-01-15 19:04   ` Axel Rasmussen
2021-01-21 19:12 ` [PATCH 0/9] userfaultfd: add " Peter Xu
2021-01-21 22:13   ` Axel Rasmussen
2021-01-21 22:37     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210115190451.3135416-9-axelrasmussen@google.com \
    --to=axelrasmussen@google.com \
    --cc=aarcange@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=cannonmatthews@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=chinwen.chang@mediatek.com \
    --cc=dgilbert@redhat.com \
    --cc=jannh@google.com \
    --cc=jglisse@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=oupton@google.com \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=ruprecht@google.com \
    --cc=shawn@anastas.io \
    --cc=shli@fb.com \
    --cc=steven.price@arm.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walken@google.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.