From: ira.weiny@intel.com
To: Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>, Theodore Ts'o <tytso@mit.edu>, Jeff Layton <jlayton@kernel.org>, Dave Chinner <david@fromorbit.com>
Cc: linux-nvdimm@lists.01.org, John Hubbard <jhubbard@nvidia.com>, linux-kernel@vger.kernel.org, Matthew Wilcox <willy@infradead.org>, linux-xfs@vger.kernel.org, linux-mm@kvack.org, Jérôme Glisse <jglisse@redhat.com>, linux-fsdevel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>, linux-ext4@vger.kernel.org
Subject: [PATCH RFC 04/10] mm/gup: Ensure F_LAYOUT lease is held prior to GUP'ing pages
Date: Wed, 5 Jun 2019 18:45:37 -0700
Message-ID: <20190606014544.8339-5-ira.weiny@intel.com>
In-Reply-To: <20190606014544.8339-1-ira.weiny@intel.com>

From: Ira Weiny <ira.weiny@intel.com>

On FS DAX files, users must inform the file system that they intend to
take long-term GUP pins on file pages; failure to do so should result
in an error.

Ensure that an F_LAYOUT lease exists at the time the GUP call is made.
If one does not, return -EPERM.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 fs/locks.c         | 43 +++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |  2 ++
 mm/gup.c           | 25 +++++++++++++++++++++++++
 mm/huge_memory.c   | 12 ++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index de9761c068de..43f5dc97652c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2945,3 +2945,46 @@ static int __init filelock_init(void)
 	return 0;
 }
 core_initcall(filelock_init);
+
+/**
+ * mapping_inode_has_layout() - check for a LAYOUT lease on a page's file
+ * @page: page we are trying to GUP
+ *
+ * This should only be called on DAX pages.  Pages mapped through FS DAX do
+ * not use the page cache.  As a result, the user must take a LAYOUT lease on
+ * the file before such pages can be pinned for long-term use.  The lease is
+ * the user's opt-in to truncation operations failing for the duration of the
+ * pin.
+ *
+ * Return: true if the page's file has a LAYOUT lease associated with it.
+ */
+bool mapping_inode_has_layout(struct page *page)
+{
+	bool ret = false;
+	struct inode *inode;
+	struct file_lock *fl;
+	struct file_lock_context *ctx;
+
+	if (WARN_ON(!page) ||
+	    WARN_ON(PageAnon(page)) ||
+	    WARN_ON(!page->mapping) ||
+	    WARN_ON(!page->mapping->host))
+		return false;
+
+	inode = page->mapping->host;
+
+	ctx = locks_get_lock_context(inode, F_RDLCK);
+	if (!ctx)
+		return false;
+	spin_lock(&ctx->flc_lock);
+	list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
+		if (fl->fl_flags & FL_LAYOUT) {
+			ret = true;
+			break;
+		}
+	}
+	spin_unlock(&ctx->flc_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mapping_inode_has_layout);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc373a9b69fc..432b004b920c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1630,6 +1630,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
 
+bool mapping_inode_has_layout(struct page *page);
+
 /* Container for pinned pfns / pages */
 struct frame_vector {
 	unsigned int nr_allocated;	/* Number of frames we have space for */
diff --git a/mm/gup.c b/mm/gup.c
index 26a7a3a3a657..d06cc5b14c0b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -361,6 +361,13 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = pte_page(pte);
 		else
 			goto no_page;
+
+		if (unlikely(flags & FOLL_LONGTERM) &&
+		    (*pgmap)->type == MEMORY_DEVICE_FS_DAX &&
+		    !mapping_inode_has_layout(page)) {
+			page = ERR_PTR(-EPERM);
+			goto out;
+		}
 	} else if (unlikely(!page)) {
 		if (flags & FOLL_DUMP) {
 			/* Avoid special (like zero) pages in core dumps */
@@ -1905,6 +1912,16 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		if (pte_devmap(pte) &&
+		    unlikely(flags & FOLL_LONGTERM) &&
+		    pgmap->type == MEMORY_DEVICE_FS_DAX &&
+		    !mapping_inode_has_layout(head)) {
+			mod_node_page_state(page_pgdat(head),
+					    NR_GUP_FAST_PAGE_BACKOFFS, 1);
+			put_user_page(head);
+			goto pte_unmap;
+		}
+
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
@@ -1955,6 +1972,14 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
+
+		if (unlikely(flags & FOLL_LONGTERM) &&
+		    pgmap->type == MEMORY_DEVICE_FS_DAX &&
+		    !mapping_inode_has_layout(page)) {
+			undo_dev_pagemap(nr, nr_start, pages);
+			return 0;
+		}
+
 		if (try_get_gup_pin_page(page, NR_GUP_FAST_PAGES_REQUESTED)) {
 			undo_dev_pagemap(nr, nr_start, pages);
 			return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bb7fd7fa6f77..cdc213e50902 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -950,6 +950,12 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
+
+	if (unlikely(flags & FOLL_LONGTERM) &&
+	    (*pgmap)->type == MEMORY_DEVICE_FS_DAX &&
+	    !mapping_inode_has_layout(page))
+		return ERR_PTR(-EPERM);
+
 	if (unlikely(!try_get_gup_pin_page(page,
 					   NR_GUP_SLOW_PAGES_REQUESTED)))
 		page = ERR_PTR(-ENOMEM);
@@ -1092,6 +1098,12 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
+
+	if (unlikely(flags & FOLL_LONGTERM) &&
+	    (*pgmap)->type == MEMORY_DEVICE_FS_DAX &&
+	    !mapping_inode_has_layout(page))
+		return ERR_PTR(-EPERM);
+
 	if (unlikely(!try_get_gup_pin_page(page,
 					   NR_GUP_SLOW_PAGES_REQUESTED)))
 		page = ERR_PTR(-ENOMEM);
-- 
2.20.1
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm