* [PATCH 0/3] introduce get_user_pages_longterm()
@ 2017-11-07  0:57 ` Dan Williams
  0 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2017-11-07  0:57 UTC (permalink / raw)
  To: akpm
  Cc: Sean Hefty, Jan Kara, linux-rdma, linux-kernel, Doug Ledford,
	stable, Hal Rosenstock, Jason Gunthorpe, linux-mm, Jeff Moyer,
	Ross Zwisler, Mauro Carvalho Chehab, Christoph Hellwig,
	linux-media

Andrew,

Here is a new get_user_pages API for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).

In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
the filesystem-dax case, where the filesystem is in control of the
lifetime of the block / page and needs reasonable limits on how long it
can wait for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.

Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.

---

Dan Williams (3):
      mm: introduce get_user_pages_longterm
      IB/core: disable memory registration of filesystem-dax vmas
      [media] v4l2: disable filesystem-dax mapping support


 drivers/infiniband/core/umem.c            |    2 -
 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +-
 include/linux/mm.h                        |    3 +
 mm/gup.c                                  |   75 +++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+), 3 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/3] mm: introduce get_user_pages_longterm
  2017-11-07  0:57 ` Dan Williams
@ 2017-11-07  0:57   ` Dan Williams
  -1 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2017-11-07  0:57 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, Christoph Hellwig, stable, linux-kernel

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long-standing memory registrations against
filesystem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).

Cc: <stable@vger.kernel.org>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mm.h |    3 ++
 mm/gup.c           |   75 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8d9f52a84f77..0ffe93072abf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1365,6 +1365,9 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas);
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+			    unsigned int gup_flags, struct page **pages,
+			    struct vm_area_struct **vmas);
 long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index b2b4d4263768..6c913731acad 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1095,6 +1095,81 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+		unsigned int gup_flags, struct page **pages,
+		struct vm_area_struct **vmas)
+{
+	struct vm_area_struct **__vmas = vmas;
+	struct vm_area_struct *vma_prev = NULL;
+	long rc, i;
+
+	if (!pages)
+		return -EINVAL;
+
+	if (!vmas && IS_ENABLED(CONFIG_FS_DAX)) {
+		__vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+				GFP_KERNEL);
+		if (!__vmas)
+			return -ENOMEM;
+	}
+
+	rc = get_user_pages(start, nr_pages, gup_flags, pages, __vmas);
+
+	/* skip scan for fs-dax vmas if they are compile time disabled */
+	if (!IS_ENABLED(CONFIG_FS_DAX))
+		goto out;
+
+	for (i = 0; i < rc; i++) {
+		struct inode *inode;
+		struct vm_area_struct *vma = __vmas[i];
+
+		if (vma == vma_prev)
+			continue;
+		vma_prev = vma;
+
+		if (!vma_is_dax(vma))
+			continue;
+		inode = file_inode(vma->vm_file);
+
+		/* device-dax is safe for longterm... */
+		inode = file_inode(vma->vm_file);
+		if (S_ISCHR(inode->i_mode))
+			continue;
+
+		/* ...filesystem-dax is not. */
+		break;
+	}
+
+	/*
+	 * Either get_user_pages() failed, or the vma validation
+	 * succeeded, in either case we don't need to put_page() before
+	 * returning.
+	 */
+	if (i >= rc)
+		goto out;
+
+	for (i = 0; i < rc; i++)
+		put_page(pages[i]);
+	rc = -EOPNOTSUPP;
+out:
+	if (vmas != __vmas)
+		kfree(__vmas);
+	return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+
 /**
  * populate_vma_page_range() -  populate a range of pages in the vma.
  * @vma:   target vma

* [PATCH 2/3] IB/core: disable memory registration of filesystem-dax vmas
  2017-11-07  0:57 ` Dan Williams
@ 2017-11-07  0:57   ` Dan Williams
  -1 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2017-11-07  0:57 UTC (permalink / raw)
  To: akpm
  Cc: linux-rdma, linux-kernel, Jeff Moyer, stable, Christoph Hellwig,
	Jason Gunthorpe, linux-mm, Doug Ledford, Ross Zwisler,
	Sean Hefty, Hal Rosenstock

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long-standing memory registrations
against filesystem-dax vmas.

Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: <linux-rdma@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/infiniband/core/umem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 21e60b1e2ff4..130606c3b07c 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	sg_list_start = umem->sg_head.sgl;
 
 	while (npages) {
-		ret = get_user_pages(cur_base,
+		ret = get_user_pages_longterm(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
 				     gup_flags, page_list, vma_list);

* [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
  2017-11-07  0:57 ` Dan Williams
@ 2017-11-07  0:57   ` Dan Williams
  -1 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2017-11-07  0:57 UTC (permalink / raw)
  To: akpm
  Cc: Jan Kara, linux-kernel, stable, linux-mm, Mauro Carvalho Chehab,
	linux-media

V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.

If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.

Reported-by: Jan Kara <jack@suse.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-media@vger.kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 0b5c43f7e020..f412429cf5ba 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
 			     flags, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
-		dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+			dma->nr_pages);
 		return err < 0 ? err : -EINVAL;
 	}
 	return 0;

* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
  2017-11-07  0:57   ` Dan Williams
@ 2017-11-07  8:33     ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 21+ messages in thread
From: Mauro Carvalho Chehab @ 2017-11-07  8:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, Jan Kara, linux-kernel, stable, linux-mm,
	Mauro Carvalho Chehab, linux-media

On Mon, 06 Nov 2017 16:57:28 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> V4L2 memory registrations are incompatible with filesystem-dax that
> needs the ability to revoke dma access to a mapping at will, or
> otherwise allow the kernel to wait for completion of DMA. The
> filesystem-dax implementation breaks the traditional solution of
> truncate of active file backed mappings since there is no page-cache
> page we can orphan to sustain ongoing DMA.
> 
> If v4l2 wants to support long lived DMA mappings it needs to arrange to
> hold a file lease or use some other mechanism so that the kernel can
> coordinate revoking DMA access when the filesystem needs to truncate
> mappings.


Not sure if I understand your comment here... what happens
if FS_DAX is enabled? The new err = get_user_pages_longterm()
would cause DMA allocation to fail? If so, that doesn't sound
right. Instead, mm should somehow mark this mapping to be out
of FS_DAX control range.

Also, it is not only videobuf-dma-sg.c that does long lived
DMA mappings. VB2 also does that (and videobuf-vmalloc).

Regards,
Mauro


> 
> Reported-by: Jan Kara <jack@suse.cz>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> index 0b5c43f7e020..f412429cf5ba 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> @@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
>  	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
>  		data, size, dma->nr_pages);
>  
> -	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
> +	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
>  			     flags, dma->pages, NULL);
>  
>  	if (err != dma->nr_pages) {
>  		dma->nr_pages = (err >= 0) ? err : 0;
> -		dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
> +		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
> +			dma->nr_pages);
>  		return err < 0 ? err : -EINVAL;
>  	}
>  	return 0;
> 



Thanks,
Mauro

* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
  2017-11-07  8:33     ` Mauro Carvalho Chehab
@ 2017-11-07 17:43       ` Dan Williams
  -1 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2017-11-07 17:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Andrew Morton, Jan Kara, linux-kernel, stable, Linux MM,
	Mauro Carvalho Chehab, Linux-media@vger.kernel.org

On Tue, Nov 7, 2017 at 12:33 AM, Mauro Carvalho Chehab
<mchehab@s-opensource.com> wrote:
> On Mon, 06 Nov 2017 16:57:28 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
>> V4L2 memory registrations are incompatible with filesystem-dax that
>> needs the ability to revoke dma access to a mapping at will, or
>> otherwise allow the kernel to wait for completion of DMA. The
>> filesystem-dax implementation breaks the traditional solution of
>> truncate of active file backed mappings since there is no page-cache
>> page we can orphan to sustain ongoing DMA.
>>
>> If v4l2 wants to support long lived DMA mappings it needs to arrange to
>> hold a file lease or use some other mechanism so that the kernel can
>> coordinate revoking DMA access when the filesystem needs to truncate
>> mappings.
>
>
> Not sure if I understand your comment here... what happens
> if FS_DAX is enabled? The new err = get_user_pages_longterm()
> would cause DMA allocation to fail?

Correct, any attempt to specify a filesystem-dax mapping range to
get_user_pages_longterm will fail with EOPNOTSUPP. In the future we
want to add something like a 'struct file_lock *' argument to
get_user_pages_longterm so that the kernel has a handle to revoke
access to the returned pages. Once we have a safe way for the kernel
to undo elevated page counts we can stop failing the longterm vs
filesystem-dax case.

Here is more background on why _longterm gup is a problem for filesystem-dax:

    https://lwn.net/Articles/737273/

> If so, that doesn't sound
> right. Instead, mm should somehow mark this mapping to be out
> of FS_DAX control range.

DAX is currently a global setting for the entire backing device of the
filesystem, so any mapping of any file when the "-o dax" mount option
is set is in the "FS_DAX control range". In other words, there's
currently no way to prevent FS_DAX mappings from being exposed to V4L2
outside of CONFIG_FS_DAX=n.

> Also, it is not only videobuf-dma-sg.c that does long lived
> DMA mappings. VB2 also does that (and videobuf-vmalloc).

Without having looked at the code, videobuf-vmalloc sounds like it
should be ok if the kernel is allocating memory separate from a
file-backed DAX mapping. Where is the VB2 get_user_pages call?
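
To illustrate what the EOPNOTSUPP failure described above looks like from the caller's side, here is a small userspace model of the result check that the videobuf-dma-sg hunk applies to the return value. It is a sketch only: model_check_pin_result() and its parameters are hypothetical names chosen for illustration, not kernel interfaces.

```c
#include <errno.h>

/*
 * Models the caller-side check on the return value of the pinning
 * call. 'ret' is what the call returned, 'want' is the number of
 * pages requested. '*pinned' is set to the number of pages the
 * caller still holds and must release later.
 */
static int model_check_pin_result(long ret, long want, long *pinned)
{
	if (ret == want) {
		*pinned = want;
		return 0;
	}

	/* a negative return means no pages are held at all */
	*pinned = ret >= 0 ? ret : 0;

	/* propagate real errors; a short pin is reported as -EINVAL */
	return ret < 0 ? (int)ret : -EINVAL;
}
```

Under this model a filesystem-dax range surfaces as -EOPNOTSUPP with zero pages held, while a short but otherwise successful pin is still reported as -EINVAL and leaves the caller responsible for the pages it did get.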

* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
  2017-11-07 17:43       ` Dan Williams
@ 2017-11-07 20:39         ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 21+ messages in thread
From: Mauro Carvalho Chehab @ 2017-11-07 20:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, Jan Kara, linux-kernel, stable, Linux MM,
	Mauro Carvalho Chehab, Linux-media@vger.kernel.org

Em Tue, 7 Nov 2017 09:43:41 -0800
Dan Williams <dan.j.williams@intel.com> escreveu:

> On Tue, Nov 7, 2017 at 12:33 AM, Mauro Carvalho Chehab
> <mchehab@s-opensource.com> wrote:
> > Em Mon, 06 Nov 2017 16:57:28 -0800
> > Dan Williams <dan.j.williams@intel.com> escreveu:
> >  
> >> V4L2 memory registrations are incompatible with filesystem-dax that
> >> needs the ability to revoke dma access to a mapping at will, or
> >> otherwise allow the kernel to wait for completion of DMA. The
> >> filesystem-dax implementation breaks the traditional solution of
> >> truncate of active file backed mappings since there is no page-cache
> >> page we can orphan to sustain ongoing DMA.
> >>
> >> If v4l2 wants to support long lived DMA mappings it needs to arrange to
> >> hold a file lease or use some other mechanism so that the kernel can
> >> coordinate revoking DMA access when the filesystem needs to truncate
> >> mappings.  
> >
> >
> > Not sure if I understand this your comment here... what happens
> > if FS_DAX is enabled? The new err = get_user_pages_longterm()
> > would cause DMA allocation to fail?  
> 
> Correct, any attempt to specify a filesystem-dax mapping range to
> get_user_pages_longterm will fail with EOPNOTSUPP. In the future we
> want to add something like a 'struct file_lock *' argument to
> get_user_pages_longterm so that the kernel has a handle to revoke
> access to the returned pages. Once we have a safe way for the kernel
> to undo elevated page counts we can stop failing the longterm vs
> filesystem-dax case.

Argh! Perhaps we should make it depend on BROKEN while not fixed :-/

> Here is more background on why _longterm gup is a problem for filesystem-dax:
> 
>     https://lwn.net/Articles/737273/
> 
> > If so, that doesn't sound
> > right. Instead, mm should somehow mark this mapping to be out
> > of FS_DAX control range.  
> 
> DAX is currently global setting for the entire backing device of the
> filesystem, so any mapping of any file when the "-o dax" mount option
> is set is in the "FS_DAX control range". In other words there's
> currently no way to prevent FS_DAX mappings from being exposed to V4L2
> outside of CONFIG_FS_DAX=n.

Grrr...

> > Also, it is not only videobuf-dma-sg.c that does long lived
> > DMA mappings. VB2 also does that (and videobuf-vmalloc).  
> 
> Without finding the code videobuf-vmalloc sounds like it should be ok
> if the kernel is allocating memory separate from a file-backed DAX
> mapping.

videobuf-vmalloc does DMA mapping for pages allocated via vmalloc(),
via vmalloc_user()/remap_vmalloc_range().

There aren't many drivers using VB1 anymore, but a change to VB2
will likely break support for almost all webcams if fs DAX is
in use.

> Where is the VB2 get_user_pages call?

Before changeset 3336c24f25ec, the logic for get_user_pages() was
in drivers/media/v4l2-core/videobuf2-dma-sg.c. Now, the logic
it uses is inside mm/frame_vector.c.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support
  2017-11-07 20:39         ` Mauro Carvalho Chehab
@ 2017-11-08  0:13           ` Dan Williams
  -1 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2017-11-08  0:13 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Andrew Morton, Jan Kara, linux-kernel, stable, Linux MM,
	Mauro Carvalho Chehab, Linux-media@vger.kernel.org

On Tue, Nov 7, 2017 at 12:39 PM, Mauro Carvalho Chehab
<mchehab@s-opensource.com> wrote:
> Em Tue, 7 Nov 2017 09:43:41 -0800
> Dan Williams <dan.j.williams@intel.com> escreveu:
>
>> On Tue, Nov 7, 2017 at 12:33 AM, Mauro Carvalho Chehab
>> <mchehab@s-opensource.com> wrote:
>> > Em Mon, 06 Nov 2017 16:57:28 -0800
>> > Dan Williams <dan.j.williams@intel.com> escreveu:
>> >
>> >> V4L2 memory registrations are incompatible with filesystem-dax that
>> >> needs the ability to revoke dma access to a mapping at will, or
>> >> otherwise allow the kernel to wait for completion of DMA. The
>> >> filesystem-dax implementation breaks the traditional solution of
>> >> truncate of active file backed mappings since there is no page-cache
>> >> page we can orphan to sustain ongoing DMA.
>> >>
>> >> If v4l2 wants to support long lived DMA mappings it needs to arrange to
>> >> hold a file lease or use some other mechanism so that the kernel can
>> >> coordinate revoking DMA access when the filesystem needs to truncate
>> >> mappings.
>> >
>> >
>> > Not sure if I understand this your comment here... what happens
>> > if FS_DAX is enabled? The new err = get_user_pages_longterm()
>> > would cause DMA allocation to fail?
>>
>> Correct, any attempt to specify a filesystem-dax mapping range to
>> get_user_pages_longterm will fail with EOPNOTSUPP. In the future we
>> want to add something like a 'struct file_lock *' argument to
>> get_user_pages_longterm so that the kernel has a handle to revoke
>> access to the returned pages. Once we have a safe way for the kernel
>> to undo elevated page counts we can stop failing the longterm vs
>> filesystem-dax case.
>
> Argh! Perhaps we should make it depend on BROKEN while not fixed :-/

Small consolation, but we do warn that filesystem-dax is still
considered experimental when mounting a filesystem with "-o dax".

>> Here is more background on why _longterm gup is a problem for filesystem-dax:
>>
>>     https://lwn.net/Articles/737273/
>>
>> > If so, that doesn't sound
>> > right. Instead, mm should somehow mark this mapping to be out
>> > of FS_DAX control range.
>>
>> DAX is currently global setting for the entire backing device of the
>> filesystem, so any mapping of any file when the "-o dax" mount option
>> is set is in the "FS_DAX control range". In other words there's
>> currently no way to prevent FS_DAX mappings from being exposed to V4L2
>> outside of CONFIG_FS_DAX=n.
>
> Grrr...
>
>> > Also, it is not only videobuf-dma-sg.c that does long lived
>> > DMA mappings. VB2 also does that (and videobuf-vmalloc).
>>
>> Without finding the code videobuf-vmalloc sounds like it should be ok
>> if the kernel is allocating memory separate from a file-backed DAX
>> mapping.
>
> videobuf-vmalloc does DMA mapping for pages allocated via vmalloc(),
> via vmalloc_user()/remap_vmalloc_range().

Ok, that's completely safe since filesystem-dax mappings are not
involved in a vmalloc backed virtual address range.

> There aren't many drivers using VB1 anymore, but a change to VB2
> will likely break support for almost all webcams if fs DAX is
> in use.

Yes, unless / until we can switch userspace to using a new memory
registration api that includes a way for the kernel to revoke access
to a dax mapping. Another mitigation is following through on support
for moving dax support from a global mount flag to a per-inode flag to
at least prevent dax from leaking to use cases that need explicit
coordination.

>> Where is the VB2 get_user_pages call?
>
> Before changeset 3336c24f25ec, the logic for get_user_pages() was
> in drivers/media/v4l2-core/videobuf2-dma-sg.c. Now, the logic
> it uses is inside mm/frame_vector.c.

Ok, I'll take a look.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] mm: introduce get_user_pages_longterm
  2017-11-07  0:57   ` Dan Williams
@ 2017-11-10  9:01     ` Christoph Hellwig
  -1 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2017-11-10  9:01 UTC (permalink / raw)
  To: Dan Williams; +Cc: akpm, linux-mm, Christoph Hellwig, stable, linux-kernel

> +long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> +		unsigned int gup_flags, struct page **pages,
> +		struct vm_area_struct **vmas)
> +{
> +	struct vm_area_struct **__vmas = vmas;

How about calling the vma argument vma_arg, and the one actually used
vma, to make things a little more readable?

> +	struct vm_area_struct *vma_prev = NULL;
> +	long rc, i;
> +
> +	if (!pages)
> +		return -EINVAL;
> +
> +	if (!vmas && IS_ENABLED(CONFIG_FS_DAX)) {
> +		__vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
> +				GFP_KERNEL);
> +		if (!__vmas)
> +			return -ENOMEM;
> +	}
> +
> +	rc = get_user_pages(start, nr_pages, gup_flags, pages, __vmas);
> +
> +	/* skip scan for fs-dax vmas if they are compile time disabled */
> +	if (!IS_ENABLED(CONFIG_FS_DAX))
> +		goto out;

Instead of all this IS_ENABLED magic I'd recommend just conditionally
compiling this function and defining it to get_user_pages in the header
if FS_DAX is disabled.

Otherwise this looks fine to me.
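
For readers following along, here is a rough userspace sketch of the
shape being suggested: the longterm variant is only compiled when
fs-dax support is configured, and otherwise the name is simply defined
to get_user_pages. Everything below is a mock for illustration, not the
patch itself. MOCK_FS_DAX stands in for CONFIG_FS_DAX, the is_fsdax
field, mock_pages and mock_owner are hypothetical, and the scan loop is
an assumption based on the cover letter's statement that fs-dax ranges
fail with EOPNOTSUPP (that part of the hunk is not quoted above).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_FS_DAX; build with -DMOCK_FS_DAX=0 to disable. */
#ifndef MOCK_FS_DAX
#define MOCK_FS_DAX 1
#endif

/* Minimal mock types; the real kernel structures look nothing like this. */
struct vm_area_struct { int is_fsdax; };
struct page { int refcount; };

#define MOCK_NR 4
static struct vm_area_struct anon_vma = { 0 }, dax_vma = { 1 };
static struct page mock_pages[MOCK_NR];
/* Pages 0-1 belong to an ordinary mapping, pages 2-3 to an fs-dax mapping. */
static struct vm_area_struct *mock_owner[MOCK_NR] = {
	&anon_vma, &anon_vma, &dax_vma, &dax_vma
};

/* Mock of get_user_pages(): "start" is an index into mock_pages. */
static long get_user_pages(unsigned long start, unsigned long nr_pages,
		unsigned int gup_flags, struct page **pages,
		struct vm_area_struct **vmas)
{
	unsigned long i;

	(void)gup_flags;
	for (i = 0; i < nr_pages && start + i < MOCK_NR; i++) {
		pages[i] = &mock_pages[start + i];
		pages[i]->refcount++;		/* take the page reference */
		if (vmas)
			vmas[i] = mock_owner[start + i];
	}
	return (long)i;
}

static void put_page(struct page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

#if MOCK_FS_DAX
static long get_user_pages_longterm(unsigned long start,
		unsigned long nr_pages, unsigned int gup_flags,
		struct page **pages, struct vm_area_struct **vmas_arg)
{
	struct vm_area_struct **vmas = vmas_arg;
	struct vm_area_struct *vma_prev = NULL;
	long rc, i;

	if (!pages)
		return -EINVAL;

	/* The scan needs the vmas even if the caller did not ask for them. */
	if (!vmas) {
		vmas = calloc(nr_pages, sizeof(*vmas));
		if (!vmas)
			return -ENOMEM;
	}

	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);

	/* Look for any fs-dax vma, checking each distinct vma only once. */
	for (i = 0; i < rc; i++) {
		if (vmas[i] == vma_prev)
			continue;
		vma_prev = vmas[i];
		if (vma_prev->is_fsdax)
			break;
	}

	/* Found one: drop every reference taken and refuse the request. */
	if (i < rc) {
		for (i = 0; i < rc; i++)
			put_page(pages[i]);
		rc = -EOPNOTSUPP;
	}

	if (vmas != vmas_arg)
		free(vmas);
	return rc;
}
#else
/* Without fs-dax there is nothing to scan for, so alias the plain call. */
#define get_user_pages_longterm get_user_pages
#endif
```

In this mock, a request covering pages 0-1 is pinned as usual, while a
request covering pages 2-3 is refused with -EOPNOTSUPP and leaves no
references behind. With MOCK_FS_DAX set to 0 both requests go straight
to get_user_pages, which is the point of the header define: callers
need no #ifdef of their own.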

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] IB/core: disable memory registration of fileystem-dax vmas
  2017-11-07  0:57   ` Dan Williams
  (?)
@ 2017-11-10  9:01       ` Christoph Hellwig
  -1 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2017-11-10  9:01 UTC (permalink / raw)
  To: Dan Williams
  Cc: akpm, linux-rdma, linux-kernel, Jeff Moyer, stable,
	Christoph Hellwig, Jason Gunthorpe, linux-mm, Doug Ledford,
	Ross Zwisler, Sean Hefty, Hal Rosenstock

On Mon, Nov 06, 2017 at 04:57:21PM -0800, Dan Williams wrote:
> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow RDMA to create long standing memory registrations
> against filesytem-dax vmas.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-11-10  9:01 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-07  0:57 [PATCH 0/3] introduce get_user_pages_longterm() Dan Williams
2017-11-07  0:57 ` Dan Williams
2017-11-07  0:57 ` [PATCH 1/3] mm: introduce get_user_pages_longterm Dan Williams
2017-11-07  0:57   ` Dan Williams
2017-11-10  9:01   ` Christoph Hellwig
2017-11-10  9:01     ` Christoph Hellwig
2017-11-07  0:57 ` [PATCH 2/3] IB/core: disable memory registration of fileystem-dax vmas Dan Williams
2017-11-07  0:57   ` Dan Williams
     [not found]   ` <151001624138.16354.16836728315400060928.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-11-10  9:01     ` Christoph Hellwig
2017-11-10  9:01       ` Christoph Hellwig
2017-11-10  9:01       ` Christoph Hellwig
2017-11-07  0:57 ` [PATCH 3/3] [media] v4l2: disable filesystem-dax mapping support Dan Williams
2017-11-07  0:57   ` Dan Williams
2017-11-07  8:33   ` Mauro Carvalho Chehab
2017-11-07  8:33     ` Mauro Carvalho Chehab
2017-11-07 17:43     ` Dan Williams
2017-11-07 17:43       ` Dan Williams
2017-11-07 20:39       ` Mauro Carvalho Chehab
2017-11-07 20:39         ` Mauro Carvalho Chehab
2017-11-08  0:13         ` Dan Williams
2017-11-08  0:13           ` Dan Williams
