From: Ira Weiny <ira.weiny@intel.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Dave Chinner" <david@fromorbit.com>,
	"David Airlie" <airlied@linux.ie>,
	"David S . Miller" <davem@davemloft.net>,
	"Jan Kara" <jack@suse.cz>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>
Subject: Re: [PATCH v5 09/24] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM
Date: Fri, 15 Nov 2019 10:06:32 -0800
Message-ID: <20191115180631.GA23832@iweiny-DESK2.sc.intel.com>
In-Reply-To: <20191115055340.1825745-10-jhubbard@nvidia.com>

On Thu, Nov 14, 2019 at 09:53:25PM -0800, John Hubbard wrote:
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
>
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
>
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
>
> Also, remove an unnecessary pair of calls that were releasing and
> reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> just in order to call page_to_pfn().
>
> Also, move the DAX check ("if a VMA is DAX, don't allow long term
> pinning") from the VFIO call site, all the way into the internals
> of get_user_pages_remote() and __gup_longterm_locked(). That is:
> get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> calls check_dax_vmas(). It's lightly explained in the comments as well.
>
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> and to Dan Williams for helping clarify the DAX refactoring.
>
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 30 +++++-------------------------
>  mm/gup.c                        | 27 ++++++++++++++++++++++-----
>  2 files changed, 27 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..c7a111ad9975 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  {
>  	struct page *page[1];
>  	struct vm_area_struct *vma;
> -	struct vm_area_struct *vmas[1];
>  	unsigned int flags = 0;
>  	int ret;
>
> @@ -348,33 +347,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  		flags |= FOLL_WRITE;
>
>  	down_read(&mm->mmap_sem);
> -	if (mm == current->mm) {
> -		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> -				     vmas);
> -	} else {
> -		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -					    vmas, NULL);
> -		/*
> -		 * The lifetime of a vaddr_get_pfn() page pin is
> -		 * userspace-controlled. In the fs-dax case this could
> -		 * lead to indefinite stalls in filesystem operations.
> -		 * Disallow attempts to pin fs-dax pages via this
> -		 * interface.
> -		 */
> -		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> -			ret = -EOPNOTSUPP;
> -			put_page(page[0]);
> -		}
> -	}
> -	up_read(&mm->mmap_sem);
> -
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> +				    page, NULL, NULL);
>  	if (ret == 1) {
>  		*pfn = page_to_pfn(page[0]);
> -		return 0;
> +		ret = 0;
> +		goto done;
>  	}
>
> -	down_read(&mm->mmap_sem);
> -
>  	vaddr = untagged_addr(vaddr);
>
>  	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> @@ -384,7 +364,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  		if (is_invalid_reserved_pfn(*pfn))
>  			ret = 0;
>  	}
> -
> +done:
>  	up_read(&mm->mmap_sem);
>  	return ret;
>  }
> diff --git a/mm/gup.c b/mm/gup.c
> index b859bd4da4d7..6cf613bfe7dc 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,6 +29,13 @@ struct follow_page_context {
>  	unsigned int page_mask;
>  };
>
> +static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
> +						  struct mm_struct *mm,
> +						  unsigned long start,
> +						  unsigned long nr_pages,
> +						  struct page **pages,
> +						  struct vm_area_struct **vmas,
> +						  unsigned int flags);
>  /*
>   * Return the compound head page with ref appropriately incremented,
>   * or NULL if that failed.
> @@ -1167,13 +1174,23 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>  		struct vm_area_struct **vmas, int *locked)
>  {
>  	/*
> -	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
> +	 * Parts of FOLL_LONGTERM behavior are incompatible with
>  	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
> -	 * vmas. As there are no users of this flag in this call we simply
> -	 * disallow this option for now.
> +	 * vmas. However, this only comes up if locked is set, and there are
> +	 * callers that do request FOLL_LONGTERM, but do not set locked. So,
> +	 * allow what we can.
>  	 */
> -	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
> -		return -EINVAL;
> +	if (gup_flags & FOLL_LONGTERM) {
> +		if (WARN_ON_ONCE(locked))
> +			return -EINVAL;
> +		/*
> +		 * This will check the vmas (even if our vmas arg is NULL)
> +		 * and return -ENOTSUPP if DAX isn't allowed in this case:
> +		 */
> +		return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
> +					     vmas, gup_flags | FOLL_TOUCH |
> +					     FOLL_REMOTE);
> +	}
>
>  	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
>  				       locked,
> --
> 2.24.0
>
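
For readers following the thread, the flag handling described in the quoted commit message can be modeled in userspace C. This is only a rough sketch under stated assumptions, not kernel code: the model_* names, the MODEL_FOLL_* flag values, and struct model_vma are all hypothetical and exist only for illustration. The sketch returns -EOPNOTSUPP for the fs-dax case, which is the value used by the VFIO code that the patch removes; the new comment in gup.c says -ENOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; the kernel defines its own. */
#define MODEL_FOLL_WRITE    0x01u
#define MODEL_FOLL_LONGTERM 0x02u

/* Hypothetical stand-in for a VMA: only the fs-dax property matters here. */
struct model_vma {
	bool is_fsdax;
};

/*
 * Model of the check before the patch: every FOLL_LONGTERM caller of the
 * remote variant was rejected, regardless of the "locked" argument.
 * Returns 1 (one page "pinned") or a negative errno.
 */
long model_gup_remote_old(const struct model_vma *vma,
			  unsigned int gup_flags, const int *locked)
{
	(void)vma;
	(void)locked;
	if (gup_flags & MODEL_FOLL_LONGTERM)
		return -EINVAL;
	return 1;
}

/*
 * Model of the check after the patch: FOLL_LONGTERM is allowed as long as
 * the caller does not pass "locked", and the fs-dax check moves inside.
 * Returns 1 (one page "pinned") or a negative errno.
 */
long model_gup_remote(const struct model_vma *vma,
		      unsigned int gup_flags, const int *locked)
{
	if (gup_flags & MODEL_FOLL_LONGTERM) {
		/* FOLL_LONGTERM combined with "locked" is still rejected. */
		if (locked)
			return -EINVAL;
		/* Long-term pins of fs-dax mappings are refused. */
		if (vma->is_fsdax)
			return -EOPNOTSUPP;
		return 1;
	}
	/* Without FOLL_LONGTERM the fs-dax check does not apply. */
	return 1;
}
```

In this model, a caller that passes MODEL_FOLL_LONGTERM with a NULL locked pointer on a non-DAX mapping gets a page back, which corresponds to the case the VFIO caller relies on after the patch. The same call against the old model fails with -EINVAL.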