From: John Hubbard <jhubbard@nvidia.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Christoph Hellwig" <hch@infradead.org>,
"Dan Williams" <dan.j.williams@intel.com>,
"Dave Chinner" <david@fromorbit.com>,
"Ira Weiny" <ira.weiny@intel.com>, "Jan Kara" <jack@suse.cz>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-rdma@vger.kernel.org, "John Hubbard" <jhubbard@nvidia.com>,
"Michal Hocko" <mhocko@kernel.org>
Subject: [PATCH 3/4] mm/gup: introduce FOLL_PIN flag for get_user_pages()
Date: Tue, 20 Aug 2019 21:03:54 -0700 [thread overview]
Message-ID: <20190821040355.19566-3-jhubbard@nvidia.com> (raw)
In-Reply-To: <20190821040355.19566-1-jhubbard@nvidia.com>
FOLL_PIN is set by callers of vaddr_pin_pages(). This is different
than FOLL_LONGTERM, because even short term page pins need a new kind
of tracking, if those pinned pages' data is going to potentially
be modified.
This situation is described in more detail in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").
FOLL_PIN is added now, rather than waiting until there is code that
takes action based on FOLL_PIN. That's because having FOLL_PIN in
the code helps to highlight the differences between:
a) get_user_pages(): soon to be deprecated. Used to pin pages,
but without awareness of file systems that might use those
pages,
b) The original vaddr_pin_pages(): intended only for
FOLL_LONGTERM and DAX use cases. This assumes direct IO
and therefore is not applicable the most of the other
callers of get_user_pages(), and
Also add fairly extensive documentation of the meaning and use
of both FOLL_PIN and FOLL_LONGTERM.
Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases
in this documentation. (I've reworded it and expanded on it slightly.)
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
include/linux/mm.h | 56 +++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 6 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc675e94ddf8..6e7de424bf5e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2644,6 +2644,8 @@ static inline vm_fault_t vmf_error(int err)
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
unsigned int foll_flags);
+/* Flags for follow_page(), get_user_pages ("GUP"), and vaddr_pin_pages(): */
+
#define FOLL_WRITE 0x01 /* check pte is writable */
#define FOLL_TOUCH 0x02 /* mark page accessed */
#define FOLL_GET 0x04 /* do get_page on page */
@@ -2663,13 +2665,15 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
#define FOLL_ANON 0x8000 /* don't do file mappings */
#define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */
#define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */
+#define FOLL_PIN 0x40000 /* pages must be released via put_user_page() */
/*
- * NOTE on FOLL_LONGTERM:
+ * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
+ * other. Here is what they mean, and how to use them:
*
* FOLL_LONGTERM indicates that the page will be held for an indefinite time
- * period _often_ under userspace control. This is contrasted with
- * iov_iter_get_pages() where usages which are transient.
+ * period _often_ under userspace control. This is in contrast to
+ * iov_iter_get_pages(), where usages which are transient.
*
* FIXME: For pages which are part of a filesystem, mappings are subject to the
* lifetime enforced by the filesystem and we need guarantees that longterm
@@ -2684,11 +2688,51 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
* Currently only get_user_pages() and get_user_pages_fast() support this flag
* and calls to get_user_pages_[un]locked are specifically not allowed. This
* is due to an incompatibility with the FS DAX check and
- * FAULT_FLAG_ALLOW_RETRY
+ * FAULT_FLAG_ALLOW_RETRY.
*
- * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
- * that region. And so CMA attempts to migrate the page before pinning when
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region. And so, CMA attempts to migrate the page before pinning, when
* FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity is
+ * potentially changing the pages' data. FOLL_PIN pages must be released,
+ * ultimately, by a call to put_user_page(). Typically that will be via one of
+ * the vaddr_unpin_pages() variants.
+ *
+ * FIXME: note that this special tracking is not in place yet. However, the
+ * pages should still be released by put_user_page().
+ *
+ * When and where to use each flag:
+ *
+ * CASE 1: Direct IO (DIO). There are GUP references to pages that are serving
+ * as DIO buffers. These buffers are needed for a relatively short time (so they
+ * are not "long term"). No special synchronization with page_mkclean() or
+ * munmap() is provided. Therefore, flags to set at the call site are:
+ *
+ * FOLL_PIN
+ *
+ * CASE 2: RDMA. There are GUP references to pages that are serving as DMA
+ * buffers. These buffers are needed for a long time ("long term"). No special
+ * synchronization with page_mkclean() or munmap() is provided. Therefore, flags
+ * to set at the call site are:
+ *
+ * FOLL_PIN | FOLL_LONGTERM
+ *
+ * There is also a special case when the pages are DAX pages: in addition to the
+ * above flags, the caller needs a file lease. This is provided via the struct
+ * vaddr_pin argument to vaddr_pin_pages().
+ *
+ * CASE 3: ODP (Mellanox/Infiniband On Demand Paging: the hardware supports
+ * replayable page faulting). There are GUP references to pages serving as DMA
+ * buffers. For ODP, MMU notifiers are used to synchronize with page_mkclean()
+ * and munmap(). Therefore, normal GUP calls are sufficient, so neither flag
+ * needs to be set.
+ *
+ * CASE 4: pinning for struct page manipulation only. Here, normal GUP calls are
+ * sufficient, so neither flag needs to be set.
*/
static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
--
2.22.1
next prev parent reply other threads:[~2019-08-21 4:04 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-21 4:03 [PATCH 1/4] checkpatch: revert broken NOTIFIER_HEAD check John Hubbard
2019-08-21 4:03 ` [PATCH 2/4] For Ira: tiny formatting tweak to kerneldoc John Hubbard
2019-08-21 4:03 ` John Hubbard [this message]
2019-08-21 4:03 ` [PATCH 4/4] mm/gup: introduce vaddr_pin_pages_remote(), and invoke it John Hubbard
2019-08-21 4:05 ` disregard: [PATCH 1/4] checkpatch: revert broken NOTIFIER_HEAD check John Hubbard
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190821040355.19566-3-jhubbard@nvidia.com \
--to=jhubbard@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=ira.weiny@intel.com \
--cc=jack@suse.cz \
--cc=jgg@ziepe.ca \
--cc=jglisse@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mhocko@kernel.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).