From: John Hubbard <jhubbard@nvidia.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Christoph Hellwig" <hch@infradead.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Dave Chinner" <david@fromorbit.com>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Jan Kara" <jack@suse.cz>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	linux-rdma@vger.kernel.org,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Michal Hocko" <mhocko@kernel.org>
Subject: [PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()
Date: Tue, 20 Aug 2019 21:07:26 -0700
Message-ID: <20190821040727.19650-3-jhubbard@nvidia.com>
In-Reply-To: <20190821040727.19650-1-jhubbard@nvidia.com>

As explained in the newly added documentation for FOLL_PIN and
FOLL_LONGTERM, in every case where vaddr_pin_pages() is required,
FOLL_PIN must be set. That reason, plus a desire to keep FOLL_PIN an
internal (to get_user_pages() and follow_page()) detail, is why
vaddr_pin_pages() sets FOLL_PIN.

FOLL_LONGTERM, on the other hand, is only set in *some* cases, but not
all. For that reason, this patch moves the setting of FOLL_LONGTERM out
to the caller.

Also add fairly extensive documentation of the meaning and use of both
FOLL_PIN and FOLL_LONGTERM.

Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in
this documentation. (I've reworded it and expanded on it slightly.)

The motivation behind moving away from "bare" get_user_pages() calls is
described in more detail in commit fc1d8e7cca2d ("mm: introduce
put_user_page*(), placeholder versions").

Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/infiniband/core/umem.c |  1 +
 include/linux/mm.h             | 56 ++++++++++++++++++++++++++++++----
 mm/gup.c                       |  2 +-
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e69eecb0023f..d84f1bfb8d21 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -300,6 +300,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 
 	while (npages) {
 		down_read(&mm->mmap_sem);
+		gup_flags |= FOLL_LONGTERM;
 		ret = vaddr_pin_pages(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc675e94ddf8..6e7de424bf5e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2644,6 +2644,8 @@ static inline vm_fault_t vmf_error(int err)
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 			 unsigned int foll_flags);
 
+/* Flags for follow_page(), get_user_pages ("GUP"), and vaddr_pin_pages(): */
+
 #define FOLL_WRITE	0x01	/* check pte is writable */
 #define FOLL_TOUCH	0x02	/* mark page accessed */
 #define FOLL_GET	0x04	/* do get_page on page */
@@ -2663,13 +2665,15 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
+#define FOLL_PIN	0x40000	/* pages must be released via put_user_page() */
 
 /*
- * NOTE on FOLL_LONGTERM:
+ * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
+ * other. Here is what they mean, and how to use them:
  *
  * FOLL_LONGTERM indicates that the page will be held for an indefinite time
- * period _often_ under userspace control. This is contrasted with
- * iov_iter_get_pages() where usages which are transient.
+ * period _often_ under userspace control. This is in contrast to
+ * iov_iter_get_pages(), where usages are transient.
  *
  * FIXME: For pages which are part of a filesystem, mappings are subject to the
  * lifetime enforced by the filesystem and we need guarantees that longterm
@@ -2684,11 +2688,51 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
  * Currently only get_user_pages() and get_user_pages_fast() support this flag
  * and calls to get_user_pages_[un]locked are specifically not allowed. This
  * is due to an incompatibility with the FS DAX check and
- * FAULT_FLAG_ALLOW_RETRY
+ * FAULT_FLAG_ALLOW_RETRY.
  *
- * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
- * that region. And so CMA attempts to migrate the page before pinning when
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region. And so, CMA attempts to migrate the page before pinning, when
  * FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity is
+ * potentially changing the pages' data. FOLL_PIN pages must be released,
+ * ultimately, by a call to put_user_page(). Typically that will be via one of
+ * the vaddr_unpin_pages() variants.
+ *
+ * FIXME: note that this special tracking is not in place yet. However, the
+ * pages should still be released by put_user_page().
+ *
+ * When and where to use each flag:
+ *
+ * CASE 1: Direct IO (DIO). There are GUP references to pages that are serving
+ * as DIO buffers. These buffers are needed for a relatively short time (so they
+ * are not "long term"). No special synchronization with page_mkclean() or
+ * munmap() is provided. Therefore, flags to set at the call site are:
+ *
+ *     FOLL_PIN
+ *
+ * CASE 2: RDMA. There are GUP references to pages that are serving as DMA
+ * buffers. These buffers are needed for a long time ("long term"). No special
+ * synchronization with page_mkclean() or munmap() is provided. Therefore, flags
+ * to set at the call site are:
+ *
+ *     FOLL_PIN | FOLL_LONGTERM
+ *
+ * There is also a special case when the pages are DAX pages: in addition to the
+ * above flags, the caller needs a file lease. This is provided via the struct
+ * vaddr_pin argument to vaddr_pin_pages().
+ *
+ * CASE 3: ODP (Mellanox/Infiniband On Demand Paging: the hardware supports
+ * replayable page faulting). There are GUP references to pages serving as DMA
+ * buffers. For ODP, MMU notifiers are used to synchronize with page_mkclean()
+ * and munmap(). Therefore, normal GUP calls are sufficient, so neither flag
+ * needs to be set.
+ *
+ * CASE 4: pinning for struct page manipulation only. Here, normal GUP calls are
+ * sufficient, so neither flag needs to be set.
  */
 
 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
diff --git a/mm/gup.c b/mm/gup.c
index e49096d012ea..ba316d960d7a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2490,7 +2490,7 @@ long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
 {
 	long ret;
 
-	gup_flags |= FOLL_LONGTERM;
+	gup_flags |= FOLL_PIN;
 
 	if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner))
 		return -EINVAL;
-- 
2.22.1
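
For readers who want the four documented cases in executable form, here is
a small userspace sketch. It is not kernel code and is not part of the
patch. Only the two flag values are taken from the mm.h hunk above. The
enum, gup_flags_for_case(), sketch_vaddr_pin_flags() and
check_flag_examples() are hypothetical names made up for illustration. The
second helper only mimics the one line this patch changes in
vaddr_pin_pages() (OR-ing in FOLL_PIN); it does no pinning.

```c
#include <assert.h>

/* Flag values copied from the include/linux/mm.h hunk in this patch. */
#define FOLL_LONGTERM	0x10000u
#define FOLL_PIN	0x40000u

/* Hypothetical names for the four documented cases (not a kernel API). */
enum gup_use_case {
	GUP_CASE_DIO,		/* CASE 1: Direct IO, short-lived buffers */
	GUP_CASE_RDMA,		/* CASE 2: RDMA, long-lived DMA buffers */
	GUP_CASE_ODP,		/* CASE 3: ODP, synchronized by MMU notifiers */
	GUP_CASE_PAGE_ONLY,	/* CASE 4: struct page manipulation only */
};

/* Hypothetical helper: the flags the documentation lists for each case. */
static unsigned int gup_flags_for_case(enum gup_use_case c)
{
	switch (c) {
	case GUP_CASE_DIO:
		return FOLL_PIN;
	case GUP_CASE_RDMA:
		return FOLL_PIN | FOLL_LONGTERM;
	case GUP_CASE_ODP:
	case GUP_CASE_PAGE_ONLY:
		return 0;	/* normal GUP calls, neither flag */
	}
	return 0;
}

/*
 * Hypothetical stand-in for the flag handling in vaddr_pin_pages() after
 * this patch: FOLL_PIN is always added, and FOLL_LONGTERM is present only
 * if the caller passed it in.
 */
static unsigned int sketch_vaddr_pin_flags(unsigned int gup_flags)
{
	return gup_flags | FOLL_PIN;
}

/* Self-check tying the two helpers together. */
static void check_flag_examples(void)
{
	/* An RDMA-style caller sets FOLL_LONGTERM itself, as ib_umem_get() does. */
	assert(sketch_vaddr_pin_flags(FOLL_LONGTERM) ==
	       gup_flags_for_case(GUP_CASE_RDMA));

	/* A DIO-style caller passes no extra flag and still ends up with FOLL_PIN. */
	assert(sketch_vaddr_pin_flags(0) == gup_flags_for_case(GUP_CASE_DIO));
}
```

Calling check_flag_examples() from a test program exercises both paths. The
split shown here matches the commit message: the flag needed in every
vaddr_pin_pages() case stays inside the helper, and the flag needed only
in some cases is left to the caller.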