linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] introduce get_user_pages_longterm()
@ 2017-11-14 19:56 Dan Williams
  2017-11-14 19:56 ` [PATCH v2 1/4] mm: introduce get_user_pages_longterm Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Dan Williams @ 2017-11-14 19:56 UTC (permalink / raw)
  To: akpm
  Cc: Inki Dae, Jan Kara, Joonyoung Shim, linux-nvdimm, linux-rdma,
	linux-kernel, Seung-Woo Kim, Jeff Moyer, stable, Hal Rosenstock,
	Jason Gunthorpe, linux-mm, Doug Ledford, Mel Gorman,
	Ross Zwisler, Kyungmin Park, Sean Hefty, Mauro Carvalho Chehab,
	Christoph Hellwig, Vlastimil Babka, linux-media

Changes since v1 [1]:
* Cleanup local 'vmas' argument (Christoph)
* Replace inline IS_ENABLED(CONFIG_FS_DAX) in C code with ifdef
  versions of get_user_pages_longterm() for the FS_DAX on/off cases
  (Christoph)
* Add a new patch for the get_vaddr_frames() case, this impacts users
  like V4L2, and the Exynos driver.
* Collect Christoph's reviewed-by for the rdma change

[1]: https://lwn.net/Articles/738323/

---

Andrew,

Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).

In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.

Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.

---

Dan Williams (4):
      mm: introduce get_user_pages_longterm
      mm: fail get_vaddr_frames() for filesystem-dax mappings
      [media] v4l2: disable filesystem-dax mapping support
      IB/core: disable memory registration of fileystem-dax vmas


 drivers/infiniband/core/umem.c            |    2 -
 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +-
 include/linux/fs.h                        |   14 ++++++
 include/linux/mm.h                        |   13 ++++++
 mm/frame_vector.c                         |    4 ++
 mm/gup.c                                  |   64 +++++++++++++++++++++++++++++
 6 files changed, 99 insertions(+), 3 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/4] mm: introduce get_user_pages_longterm
  2017-11-14 19:56 [PATCH v2 0/4] introduce get_user_pages_longterm() Dan Williams
@ 2017-11-14 19:56 ` Dan Williams
  2017-11-21 23:05   ` Andrew Morton
  2017-11-21 23:11   ` Andrew Morton
  2017-11-14 19:56 ` [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Dan Williams @ 2017-11-14 19:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, Christoph Hellwig, stable, linux-nvdimm

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).

Cc: <stable@vger.kernel.org>
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fs.h |   14 +++++++++++
 include/linux/mm.h |   13 +++++++++++
 mm/gup.c           |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 57added3201d..0bacccc9461e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3175,6 +3175,20 @@ static inline bool vma_is_dax(struct vm_area_struct *vma)
 	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
 }
 
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+	struct inode *inode;
+
+	if (!vma->vm_file)
+		return false;
+	if (!vma_is_dax(vma))
+		return false;
+	inode = file_inode(vma->vm_file);
+	if (inode->i_mode == S_IFCHR)
+		return false; /* device-dax */
+	return true;
+}
+
 static inline int iocb_flags(struct file *file)
 {
 	int res = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8d9f52a84f77..34ff8830f717 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1369,6 +1369,19 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+			    unsigned int gup_flags, struct page **pages,
+			    struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+		unsigned long nr_pages, unsigned int gup_flags,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 
diff --git a/mm/gup.c b/mm/gup.c
index b2b4d4263768..ad9d13987f14 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+		unsigned int gup_flags, struct page **pages,
+		struct vm_area_struct **vmas_arg)
+{
+	struct vm_area_struct **vmas = vmas_arg;
+	struct vm_area_struct *vma_prev = NULL;
+	long rc, i;
+
+	if (!pages)
+		return -EINVAL;
+
+	if (!vmas) {
+		vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
+				GFP_KERNEL);
+		if (!vmas)
+			return -ENOMEM;
+	}
+
+	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+	for (i = 0; i < rc; i++) {
+		struct vm_area_struct *vma = vmas[i];
+
+		if (vma == vma_prev)
+			continue;
+
+		vma_prev = vma;
+
+		if (vma_is_fsdax(vma))
+			break;
+	}
+
+	/*
+	 * Either get_user_pages() failed, or the vma validation
+	 * succeeded, in either case we don't need to put_page() before
+	 * returning.
+	 */
+	if (i >= rc)
+		goto out;
+
+	for (i = 0; i < rc; i++)
+		put_page(pages[i]);
+	rc = -EOPNOTSUPP;
+out:
+	if (vmas != vmas_arg)
+		kfree(vmas);
+	return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
 /**
  * populate_vma_page_range() -  populate a range of pages in the vma.
  * @vma:   target vma

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings
  2017-11-14 19:56 [PATCH v2 0/4] introduce get_user_pages_longterm() Dan Williams
  2017-11-14 19:56 ` [PATCH v2 1/4] mm: introduce get_user_pages_longterm Dan Williams
@ 2017-11-14 19:56 ` Dan Williams
  2017-11-27 16:15   ` Jan Kara
  2017-11-14 19:56 ` [PATCH v2 3/4] [media] v4l2: disable filesystem-dax mapping support Dan Williams
  2017-11-14 19:56 ` [PATCH v2 4/4] IB/core: disable memory registration of fileystem-dax vmas Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2017-11-14 19:56 UTC (permalink / raw)
  To: akpm
  Cc: Jan Kara, Joonyoung Shim, linux-nvdimm, Seung-Woo Kim,
	linux-kernel, stable, Inki Dae, linux-mm, Kyungmin Park,
	Mel Gorman, Mauro Carvalho Chehab, Vlastimil Babka, linux-media

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow V4L2, Exynos, and other frame vector users to create
long standing / irrevocable memory registrations against filesytem-dax
vmas.

Cc: Inki Dae <inki.dae@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-media@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/frame_vector.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 72ebec18629c..d2fdbeaadc8b 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -52,6 +52,10 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 		ret = -EFAULT;
 		goto out;
 	}
+
+	if (vma_is_fsdax(vma))
+		return -EOPNOTSUPP;
+
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
 		vec->got_ref = true;
 		vec->is_pfns = false;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/4] [media] v4l2: disable filesystem-dax mapping support
  2017-11-14 19:56 [PATCH v2 0/4] introduce get_user_pages_longterm() Dan Williams
  2017-11-14 19:56 ` [PATCH v2 1/4] mm: introduce get_user_pages_longterm Dan Williams
  2017-11-14 19:56 ` [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings Dan Williams
@ 2017-11-14 19:56 ` Dan Williams
  2017-11-27 16:17   ` Jan Kara
  2017-11-14 19:56 ` [PATCH v2 4/4] IB/core: disable memory registration of fileystem-dax vmas Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2017-11-14 19:56 UTC (permalink / raw)
  To: akpm
  Cc: Jan Kara, linux-nvdimm, linux-kernel, stable, linux-mm,
	Mauro Carvalho Chehab, linux-media

V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.

If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.

Reported-by: Jan Kara <jack@suse.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-media@vger.kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 0b5c43f7e020..f412429cf5ba 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
 			     flags, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
-		dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+			dma->nr_pages);
 		return err < 0 ? err : -EINVAL;
 	}
 	return 0;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/4] IB/core: disable memory registration of fileystem-dax vmas
  2017-11-14 19:56 [PATCH v2 0/4] introduce get_user_pages_longterm() Dan Williams
                   ` (2 preceding siblings ...)
  2017-11-14 19:56 ` [PATCH v2 3/4] [media] v4l2: disable filesystem-dax mapping support Dan Williams
@ 2017-11-14 19:56 ` Dan Williams
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2017-11-14 19:56 UTC (permalink / raw)
  To: akpm
  Cc: linux-nvdimm, linux-rdma, linux-kernel, Jeff Moyer, stable,
	Christoph Hellwig, Jason Gunthorpe, linux-mm, Doug Ledford,
	Ross Zwisler, Sean Hefty, Hal Rosenstock

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.

Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: <linux-rdma@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/infiniband/core/umem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 21e60b1e2ff4..130606c3b07c 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	sg_list_start = umem->sg_head.sgl;
 
 	while (npages) {
-		ret = get_user_pages(cur_base,
+		ret = get_user_pages_longterm(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
 				     gup_flags, page_list, vma_list);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/4] mm: introduce get_user_pages_longterm
  2017-11-14 19:56 ` [PATCH v2 1/4] mm: introduce get_user_pages_longterm Dan Williams
@ 2017-11-21 23:05   ` Andrew Morton
  2017-11-21 23:26     ` Dan Williams
  2017-11-21 23:11   ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2017-11-21 23:05 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-mm, linux-kernel, Christoph Hellwig, stable, linux-nvdimm

On Tue, 14 Nov 2017 11:56:34 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow long standing memory registrations against
> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> explicitly allowed.
> 
> This is temporary until a "memory registration with layout-lease"
> mechanism can be implemented for the affected sub-systems (RDMA and
> V4L2).

Sounds like that will be unpleasant.  Do we really need it to be that
complex?  Can we get away with simply failing the get_user_pages()
request?  Or are there significant usecases for RDMA and V4L to play
with DAX memory?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/4] mm: introduce get_user_pages_longterm
  2017-11-14 19:56 ` [PATCH v2 1/4] mm: introduce get_user_pages_longterm Dan Williams
  2017-11-21 23:05   ` Andrew Morton
@ 2017-11-21 23:11   ` Andrew Morton
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2017-11-21 23:11 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-mm, linux-kernel, Christoph Hellwig, stable, linux-nvdimm

On Tue, 14 Nov 2017 11:56:34 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow long standing memory registrations against
> filesytem-dax vmas. Device-dax vmas do not have this problem and are
> explicitly allowed.
> 
> This is temporary until a "memory registration with layout-lease"
> mechanism can be implemented for the affected sub-systems (RDMA and
> V4L2).
> 
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
>  }
>  EXPORT_SYMBOL(get_user_pages);
>  
> +#ifdef CONFIG_FS_DAX
> +/*
> + * This is the same as get_user_pages() in that it assumes we are
> + * operating on the current task's mm, but it goes further to validate
> + * that the vmas associated with the address range are suitable for
> + * longterm elevated page reference counts. For example, filesystem-dax
> + * mappings are subject to the lifetime enforced by the filesystem and
> + * we need guarantees that longterm users like RDMA and V4L2 only
> + * establish mappings that have a kernel enforced revocation mechanism.
> + *
> + * "longterm" == userspace controlled elevated page count lifetime.
> + * Contrast this to iov_iter_get_pages() usages which are transient.
> + */
> +long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> +		unsigned int gup_flags, struct page **pages,
> +		struct vm_area_struct **vmas_arg)
> +{
> +	struct vm_area_struct **vmas = vmas_arg;
> +	struct vm_area_struct *vma_prev = NULL;
> +	long rc, i;
> +
> +	if (!pages)
> +		return -EINVAL;
> +
> +	if (!vmas) {
> +		vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
> +				GFP_KERNEL);
> +		if (!vmas)
> +			return -ENOMEM;
> +	}
>
> ...
>

I'll do this:

--- a/mm/gup.c~mm-introduce-get_user_pages_longterm-fix
+++ a/mm/gup.c
@@ -1120,8 +1120,8 @@ long get_user_pages_longterm(unsigned lo
 		return -EINVAL;
 
 	if (!vmas) {
-		vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
-				GFP_KERNEL);
+		vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+			       GFP_KERNEL);
 		if (!vmas)
 			return -ENOMEM;
 	}
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/4] mm: introduce get_user_pages_longterm
  2017-11-21 23:05   ` Andrew Morton
@ 2017-11-21 23:26     ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2017-11-21 23:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux MM, linux-kernel, Christoph Hellwig, stable, linux-nvdimm

On Tue, Nov 21, 2017 at 3:05 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Tue, 14 Nov 2017 11:56:34 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Until there is a solution to the dma-to-dax vs truncate problem it is
> > not safe to allow long standing memory registrations against
> > filesytem-dax vmas. Device-dax vmas do not have this problem and are
> > explicitly allowed.
> >
> > This is temporary until a "memory registration with layout-lease"
> > mechanism can be implemented for the affected sub-systems (RDMA and
> > V4L2).
>
> Sounds like that will be unpleasant.  Do we really need it to be that
> complex?  Can we get away with simply failing the get_user_pages()
> request?  Or are there significant usecases for RDMA and V4L to play
> with DAX memory?

V4L plus DAX is indeed dubious, but RDMA to persistent memory is
something the RDMA community is interested in supporting [1].

[1]: http://www.snia.org/sites/default/files/SDC15_presentations/persistant_mem/ChetDouglas_RDMA_with_PM.pdf

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings
  2017-11-14 19:56 ` [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings Dan Williams
@ 2017-11-27 16:15   ` Jan Kara
  2017-11-27 16:19     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2017-11-27 16:15 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Jan Kara, Joonyoung Shim, linux-nvdimm, Seung-Woo Kim,
	linux-kernel, stable, Inki Dae, linux-mm, Kyungmin Park,
	Mel Gorman, Mauro Carvalho Chehab, Vlastimil Babka, linux-media

On Tue 14-11-17 11:56:39, Dan Williams wrote:
> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow V4L2, Exynos, and other frame vector users to create
> long standing / irrevocable memory registrations against filesytem-dax
> vmas.
> 
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: Jan Kara <jack@suse.cz>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <stable@vger.kernel.org>
> Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Makes sense. I'd just note that in principle get_vaddr_frames() is no more
long-term than get_user_pages(). It is just so that all the users of
get_vaddr_frames() currently want a long-term reference. Maybe could you
add here also a comment that the vma_is_fsdax() check is there because all
users of this function want a long term page reference? With that you can
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/frame_vector.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index 72ebec18629c..d2fdbeaadc8b 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -52,6 +52,10 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>  		ret = -EFAULT;
>  		goto out;
>  	}
> +
> +	if (vma_is_fsdax(vma))
> +		return -EOPNOTSUPP;
> +
>  	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
>  		vec->got_ref = true;
>  		vec->is_pfns = false;
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] [media] v4l2: disable filesystem-dax mapping support
  2017-11-14 19:56 ` [PATCH v2 3/4] [media] v4l2: disable filesystem-dax mapping support Dan Williams
@ 2017-11-27 16:17   ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2017-11-27 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Jan Kara, linux-nvdimm, linux-kernel, stable, linux-mm,
	Mauro Carvalho Chehab, linux-media

On Tue 14-11-17 11:56:45, Dan Williams wrote:
> V4L2 memory registrations are incompatible with filesystem-dax that
> needs the ability to revoke dma access to a mapping at will, or
> otherwise allow the kernel to wait for completion of DMA. The
> filesystem-dax implementation breaks the traditional solution of
> truncate of active file backed mappings since there is no page-cache
> page we can orphan to sustain ongoing DMA.
> 
> If v4l2 wants to support long lived DMA mappings it needs to arrange to
> hold a file lease or use some other mechanism so that the kernel can
> coordinate revoking DMA access when the filesystem needs to truncate
> mappings.
> 
> Reported-by: Jan Kara <jack@suse.cz>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: <stable@vger.kernel.org>
> Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> index 0b5c43f7e020..f412429cf5ba 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> @@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
>  	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
>  		data, size, dma->nr_pages);
>  
> -	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
> +	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
>  			     flags, dma->pages, NULL);
>  
>  	if (err != dma->nr_pages) {
>  		dma->nr_pages = (err >= 0) ? err : 0;
> -		dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
> +		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
> +			dma->nr_pages);
>  		return err < 0 ? err : -EINVAL;
>  	}
>  	return 0;
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings
  2017-11-27 16:15   ` Jan Kara
@ 2017-11-27 16:19     ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2017-11-27 16:19 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Joonyoung Shim, linux-nvdimm, Seung-Woo Kim,
	linux-kernel, stable, Inki Dae, Linux MM, Kyungmin Park,
	Mel Gorman, Mauro Carvalho Chehab, Vlastimil Babka,
	Linux-media@vger.kernel.org

On Mon, Nov 27, 2017 at 8:15 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 14-11-17 11:56:39, Dan Williams wrote:
>> Until there is a solution to the dma-to-dax vs truncate problem it is
>> not safe to allow V4L2, Exynos, and other frame vector users to create
>> long standing / irrevocable memory registrations against filesytem-dax
>> vmas.
>>
>> Cc: Inki Dae <inki.dae@samsung.com>
>> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
>> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
>> Cc: Kyungmin Park <kyungmin.park@samsung.com>
>> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
>> Cc: linux-media@vger.kernel.org
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: <stable@vger.kernel.org>
>> Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Makes sense. I'd just note that in principle get_vaddr_frames() is no more
> long-term than get_user_pages(). It is just so that all the users of
> get_vaddr_frames() currently want a long-term reference. Maybe could you
> add here also a comment that the vma_is_fsdax() check is there because all
> users of this function want a long term page reference? With that you can
> add:

Ok, will do.

> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-11-27 16:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-14 19:56 [PATCH v2 0/4] introduce get_user_pages_longterm() Dan Williams
2017-11-14 19:56 ` [PATCH v2 1/4] mm: introduce get_user_pages_longterm Dan Williams
2017-11-21 23:05   ` Andrew Morton
2017-11-21 23:26     ` Dan Williams
2017-11-21 23:11   ` Andrew Morton
2017-11-14 19:56 ` [PATCH v2 2/4] mm: fail get_vaddr_frames() for filesystem-dax mappings Dan Williams
2017-11-27 16:15   ` Jan Kara
2017-11-27 16:19     ` Dan Williams
2017-11-14 19:56 ` [PATCH v2 3/4] [media] v4l2: disable filesystem-dax mapping support Dan Williams
2017-11-27 16:17   ` Jan Kara
2017-11-14 19:56 ` [PATCH v2 4/4] IB/core: disable memory registration of fileystem-dax vmas Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).