* [PATCH 00/13] follow_pfn and other iomap races
From: Daniel Vetter @ 2020-10-07 16:44 UTC
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter

Hi all,

This developed from a discussion with Jason, starting with some patches
touching get_vaddr_frames() that I typed up.

The problem is that, way back, VM_IO | VM_PFNMAP mappings were pretty
static, so just following the ptes to derive a pfn and then using that pfn
somewhere else was ok.

But we're no longer in such a world: these mappings can be changed or
revoked after the pfn has been looked up, so there are tons of little races
and some fundamental problems.
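To make the race concrete, here is a minimal userspace model, not kernel
code: struct toy_pte, toy_follow_pfn() and toy_zap() are hypothetical
stand-ins for a pte, follow_pfn() and a driver revoking a VM_PFNMAP
mapping. The interleaving is scripted so the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy pte: a present bit plus the pfn it maps. */
struct toy_pte {
	bool present;
	unsigned long pfn;
};

/* Stand-in for follow_pfn(): copy the pfn out and keep no reference. */
static int toy_follow_pfn(const struct toy_pte *pte, unsigned long *pfn)
{
	if (!pte->present)
		return -1;
	*pfn = pte->pfn;
	return 0;
}

/* Stand-in for a driver revoking the mapping (e.g. on hot-unplug). */
static void toy_zap(struct toy_pte *pte)
{
	pte->present = false;
}

/* Returns true if the caller ends up holding a pfn the vma no longer maps. */
static bool toy_racy_user(struct toy_pte *pte)
{
	unsigned long pfn;

	if (toy_follow_pfn(pte, &pfn))
		return false;
	assert(pfn == pte->pfn);	/* consistent at lookup time... */

	/* Scripted interleaving: in reality this revoke runs on another CPU. */
	toy_zap(pte);

	/* ...but the pfn about to be used (e.g. for DMA) is now stale. */
	return !pte->present || pte->pfn != pfn;
}
```

Nothing in the lookup pins the mapping, so the window between
toy_follow_pfn() and the use of the pfn is exactly where the real races sit.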

This series is an attempt to at least scope the problem. It covers all the
issues I've found after quite a bit of code reading all over the tree:
- the first part tries to move mm/frame_vector.c away, since it's
  fundamentally unsafe: for VM_IO | VM_PFNMAP vmas, get_vaddr_frames()
  hands out raw pfns without holding any reference on them
- two patches to close follow_pfn races by holding the page table (pt) locks
- two PCI patches where I spotted inconsistencies between the 3 different
  ways userspace can map PCI BARs
- and finally some patches to mark up the remaining unsafe follow_pfn users
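Roughly, "holding pt locks" means doing the lookup and the use of the pfn
inside the same critical section that a revoke also has to take. A minimal
single-threaded model of that shape, assuming a toy lock flag in place of
the real page table spinlock; struct toy_pt, toy_access_locked() and
record_pfn() are hypothetical and do not reproduce the actual patches:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy pte guarded by a toy "pt lock" (single-threaded model). */
struct toy_pt {
	bool locked;
	bool present;
	unsigned long pfn;
};

static void toy_lock(struct toy_pt *pt)
{
	assert(!pt->locked);
	pt->locked = true;
}

static void toy_unlock(struct toy_pt *pt)
{
	assert(pt->locked);
	pt->locked = false;
}

/* A revoke must take the same lock as every reader of the pte. */
static void toy_zap(struct toy_pt *pt)
{
	toy_lock(pt);
	pt->present = false;
	toy_unlock(pt);
}

/*
 * Look up the pfn *and* use it while the lock is held, so the mapping
 * cannot be zapped in between. Returns -1 if nothing is mapped.
 */
static int toy_access_locked(struct toy_pt *pt,
			     void (*use)(unsigned long pfn, void *arg),
			     void *arg)
{
	int ret = -1;

	toy_lock(pt);
	if (pt->present) {
		use(pt->pfn, arg);
		ret = 0;
	}
	toy_unlock(pt);
	return ret;
}

/* Example consumer: records the pfn it was handed. */
static void record_pfn(unsigned long pfn, void *arg)
{
	*(unsigned long *)arg = pfn;
}
```

After a revoke, the locked accessor fails cleanly instead of handing out a
stale pfn; the cost is that the consumer runs under the lock and must not
sleep there.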

No testing beyond "it compiles"; this is very much an RFC to figure out
whether this makes sense, whether it's a real problem, and how to fix this
up properly.

Cheers, Daniel

Daniel Vetter (13):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: close race in generic_access_phys
  s390/pci: Remove races against pte updates
  PCI: obey iomem restrictions for procfs mmap
  PCI: revoke mappings like devmem
  mm: add unsafe_follow_pfn
  media/videbuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe

 arch/s390/pci/pci_mmio.c                      | 98 +++++++++++--------
 drivers/char/mem.c                            | 16 ++-
 drivers/gpu/drm/exynos/Kconfig                |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       | 49 +++++-----
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    | 40 +++-----
 drivers/media/platform/omap/Kconfig           |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig               |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c       | 52 +++++-----
 drivers/pci/mmap.c                            |  3 +
 drivers/pci/proc.c                            |  5 +
 drivers/vfio/vfio_iommu_type1.c               |  4 +-
 include/linux/ioport.h                        |  2 +
 include/linux/mm.h                            | 47 +--------
 include/media/videobuf2-core.h                | 42 ++++++++
 mm/Kconfig                                    |  3 -
 mm/Makefile                                   |  1 -
 mm/memory.c                                   | 76 +++++++++++++-
 mm/nommu.c                                    | 17 ++++
 security/Kconfig                              | 13 +++
 23 files changed, 296 insertions(+), 182 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (90%)

-- 
2.28.0



* [PATCH 01/13] drm/exynos: Stop using frame_vector helpers
From: Daniel Vetter @ 2020-10-07 16:44 UTC
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Inki Dae, Joonyoung Shim, Seung-Woo Kim, Kyungmin Park,
	Kukjin Kim, Krzysztof Kozlowski, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

All we need is a pages array, and pin_user_pages_fast() can give us that
directly. Plus, this avoids the entire raw pfn side of get_vaddr_frames().
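The pages array in the diff is sized from the page-aligned range that the
userptr buffer touches (offset, end and npages in g2d_userptr_get_dma_addr()).
A minimal arithmetic sketch, assuming 4 KiB pages and assuming
start = userptr & PAGE_MASK (that line is outside the hunk shown); the TOY_*
macros and toy_userptr_range() are hypothetical userspace stand-ins for the
kernel's PAGE_* macros:

```c
#include <assert.h>

/* Assumed 4 KiB pages; the kernel's PAGE_SIZE is per-architecture. */
#define TOY_PAGE_SHIFT		12UL
#define TOY_PAGE_SIZE		(1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK		(~(TOY_PAGE_SIZE - 1))
#define TOY_PAGE_ALIGN(x)	(((x) + TOY_PAGE_SIZE - 1) & TOY_PAGE_MASK)

struct toy_range {
	unsigned long start;	/* first page-aligned address */
	unsigned long offset;	/* offset of userptr into the first page */
	unsigned long npages;	/* entries needed in the pages array */
};

static struct toy_range toy_userptr_range(unsigned long userptr,
					   unsigned long size)
{
	struct toy_range r;
	unsigned long end = TOY_PAGE_ALIGN(userptr + size);

	r.start = userptr & TOY_PAGE_MASK;
	r.offset = userptr & ~TOY_PAGE_MASK;
	r.npages = (end - r.start) >> TOY_PAGE_SHIFT;
	return r;
}
```

For example, a 0x20-byte buffer at 0x1ff0 straddles a page boundary, so it
needs two entries (start 0x1000, offset 0xff0) even though it is far smaller
than a page.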

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/gpu/drm/exynos/Kconfig          |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++++++++++-------------
 2 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
 config DRM_EXYNOS_G2D
 	bool "G2D"
 	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
-	select FRAME_VECTOR
 	help
 	  Choose this option if you want to use Exynos G2D for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..c83f6faac9de 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
 	dma_addr_t		dma_addr;
 	unsigned long		userptr;
 	unsigned long		size;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	atomic_t		refcount;
 	bool			in_pool;
@@ -378,7 +379,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 					bool force)
 {
 	struct g2d_cmdlist_userptr *g2d_userptr = obj;
-	struct page **pages;
+	int i;
 
 	if (!obj)
 		return;
@@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
 			  DMA_BIDIRECTIONAL, 0);
 
-	pages = frame_vector_pages(g2d_userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
+	for (i = 0; i < g2d_userptr->npages; i++)
+		set_page_dirty_lock(g2d_userptr->pages[i]);
 
-		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(g2d_userptr->vec);
-	frame_vector_destroy(g2d_userptr->vec);
+	unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
+	kvfree(g2d_userptr->pages);
 
 	if (!g2d_userptr->out_of_list)
 		list_del_init(&g2d_userptr->list);
@@ -474,35 +471,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 	offset = userptr & ~PAGE_MASK;
 	end = PAGE_ALIGN(userptr + size);
 	npages = (end - start) >> PAGE_SHIFT;
-	g2d_userptr->vec = frame_vector_create(npages);
-	if (!g2d_userptr->vec) {
+	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+					    GFP_KERNEL);
+	if (!g2d_userptr->pages) {
 		ret = -ENOMEM;
 		goto err_free;
 	}
 
-	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-		g2d_userptr->vec);
+	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
 			      "failed to get user pages from userptr.\n");
 		if (ret < 0)
-			goto err_destroy_framevec;
-		ret = -EFAULT;
-		goto err_put_framevec;
-	}
-	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+			goto err_destroy_pages;
+		npages = ret;
 		ret = -EFAULT;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
+	g2d_userptr->npages = npages;
 
 	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 	if (!sgt) {
 		ret = -ENOMEM;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
 
 	ret = sg_alloc_table_from_pages(sgt,
-					frame_vector_pages(g2d_userptr->vec),
+					g2d_userptr->pages,
 					npages, offset, size, GFP_KERNEL);
 	if (ret < 0) {
 		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
@@ -538,11 +534,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 err_free_sgt:
 	kfree(sgt);
 
-err_put_framevec:
-	put_vaddr_frames(g2d_userptr->vec);
+err_unpin_pages:
+	unpin_user_pages(g2d_userptr->pages, npages);
 
-err_destroy_framevec:
-	frame_vector_destroy(g2d_userptr->vec);
+err_destroy_pages:
+	kvfree(g2d_userptr->pages);
 
 err_free:
 	kfree(g2d_userptr);
-- 
2.28.0
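The reworked error path in g2d_userptr_get_dma_addr() has to cope with
pin_user_pages_fast() pinning fewer pages than requested: it sets
npages = ret so that err_unpin_pages releases exactly the pages that were
pinned, and only then frees the array. A minimal userspace model of that
unwind order; toy_pin(), toy_unpin() and toy_get() are hypothetical
stand-ins, not kernel APIs, and -12/-14 merely mirror -ENOMEM/-EFAULT:

```c
#include <assert.h>
#include <stdlib.h>

/* Pin up to `avail` of `n` requested pages; a negative avail is an error. */
static int toy_pin(int n, int avail, int *pinned)
{
	int got;

	if (avail < 0)
		return avail;
	got = n < avail ? n : avail;
	*pinned += got;
	return got;
}

static void toy_unpin(int n, int *pinned)
{
	*pinned -= n;
}

/* Same goto ladder as the patch: unpin what was pinned, then free. */
static int toy_get(int npages, int avail, int *pinned, void ***out)
{
	void **pages = calloc((size_t)npages, sizeof(*pages));
	int ret;

	if (!pages)
		return -12;		/* -ENOMEM */

	ret = toy_pin(npages, avail, pinned);
	if (ret != npages) {
		if (ret < 0)
			goto err_free;	/* nothing pinned: just free the array */
		npages = ret;		/* short pin: only unpin what we got */
		ret = -14;		/* -EFAULT */
		goto err_unpin;
	}
	*out = pages;
	return 0;

err_unpin:
	toy_unpin(npages, pinned);
err_free:
	free(pages);
	return ret;
}
```

Unpinning the full requested count on a short pin would drop pins that were
never taken, which is why the count is rewritten before the jump.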



* [PATCH 02/13] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
From: Daniel Vetter @ 2020-10-07 16:44 UTC
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Inki Dae, Joonyoung Shim, Seung-Woo Kim, Kyungmin Park,
	Kukjin Kim, Krzysztof Kozlowski, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

The Exynos g2d interface is very unusual, but it looks like the userptr
objects are persistent, i.e. their pages stay pinned after the ioctl
returns. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index c83f6faac9de..514fd000feb1 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -478,7 +478,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 		goto err_free;
 	}
 
-	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	ret = pin_user_pages_fast(start, npages,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
-- 
2.28.0



* [PATCH 03/13] misc/habana: Stop using frame_vector helpers
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Dan Williams, Oded Gabbay, Omer Shpigelman, Ofir Bitton,
	Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman, Pawel Piskorski

All we need is a pages array, and pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
---
 drivers/misc/habanalabs/Kconfig             |  1 -
 drivers/misc/habanalabs/common/habanalabs.h |  3 +-
 drivers/misc/habanalabs/common/memory.c     | 51 +++++++++------------
 3 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
 config HABANA_AI
 	tristate "HabanaAI accelerators (habanalabs)"
 	depends on PCI && HAS_IOMEM
-	select FRAME_VECTOR
 	select DMA_SHARED_BUFFER
 	select GENERIC_ALLOCATOR
 	select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..c1b3ad613b15 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -881,7 +881,8 @@ struct hl_ctx_mgr {
 struct hl_userptr {
 	enum vm_type_t		vm_type; /* must be first */
 	struct list_head	job_node;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	enum dma_data_direction dir;
 	struct list_head	debugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..ef89cfa2f95a 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -EFAULT;
 	}
 
-	userptr->vec = frame_vector_create(npages);
-	if (!userptr->vec) {
+	userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+					GFP_KERNEL);
+	if (!userptr->pages) {
 		dev_err(hdev->dev, "Failed to create frame vector\n");
 		return -ENOMEM;
 	}
 
-	rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-				userptr->vec);
+	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				 userptr->pages);
 
 	if (rc != npages) {
 		dev_err(hdev->dev,
 			"Failed to map host memory, user ptr probably wrong\n");
 		if (rc < 0)
-			goto destroy_framevec;
+			goto destroy_pages;
+		npages = rc;
 		rc = -EFAULT;
-		goto put_framevec;
-	}
-
-	if (frame_vector_to_pages(userptr->vec) < 0) {
-		dev_err(hdev->dev,
-			"Failed to translate frame vector to pages\n");
-		rc = -EFAULT;
-		goto put_framevec;
+		goto put_pages;
 	}
+	userptr->npages = npages;
 
 	rc = sg_alloc_table_from_pages(userptr->sgt,
-					frame_vector_pages(userptr->vec),
-					npages, offset, size, GFP_ATOMIC);
+				       userptr->pages,
+				       npages, offset, size, GFP_ATOMIC);
 	if (rc < 0) {
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto put_framevec;
+		goto put_pages;
 	}
 
 	return 0;
 
-put_framevec:
-	put_vaddr_frames(userptr->vec);
-destroy_framevec:
-	frame_vector_destroy(userptr->vec);
+put_pages:
+	unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+	kvfree(userptr->pages);
 	return rc;
 }
 
@@ -1405,7 +1401,7 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
  */
 void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 {
-	struct page **pages;
+	int i;
 
 	hl_debugfs_remove_userptr(hdev, userptr);
 
@@ -1414,15 +1410,10 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 							userptr->sgt->nents,
 							userptr->dir);
 
-	pages = frame_vector_pages(userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(userptr->vec);
-	frame_vector_destroy(userptr->vec);
+	for (i = 0; i < userptr->npages; i++)
+		set_page_dirty_lock(userptr->pages[i]);
+	unpin_user_pages(userptr->pages, userptr->npages);
+	kvfree(userptr->pages);
 
 	list_del(&userptr->job_node);
 
-- 
2.28.0
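
The unwind order in get_user_memory() above depends on pin_user_pages_fast() being allowed to pin fewer pages than requested. After a short pin, only the first rc entries of the pages array are valid, so npages is clamped to rc before unpin_user_pages() runs. What follows is a userspace sketch of that control flow under stated assumptions: model_page, model_pin(), model_unpin() and model_get_user_memory() are hypothetical stand-ins written for this illustration, not kernel APIs. model_pin() only approximates the documented return convention of pin_user_pages_fast(), which returns the number of pages pinned, or a negative errno if none were.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: only tracks its pin count. */
struct model_page { int pincount; };

/*
 * Hypothetical model of pin_user_pages_fast(): pins the first
 * min(nr, avail) pages of backing and returns how many were pinned,
 * or -EFAULT if none could be.
 */
static long model_pin(struct model_page *backing, long avail, long nr,
		      struct model_page **pages)
{
	long n = nr < avail ? nr : avail;

	if (n <= 0)
		return -EFAULT;
	for (long i = 0; i < n; i++) {
		backing[i].pincount++;
		pages[i] = &backing[i];
	}
	return n;
}

/* Model of unpin_user_pages(): every slot it sees must be pinned. */
static void model_unpin(struct model_page **pages, long npages)
{
	for (long i = 0; i < npages; i++) {
		assert(pages[i] && pages[i]->pincount > 0);
		pages[i]->pincount--;
	}
}

/* Mirrors the allocate, pin and unwind order of get_user_memory(). */
static long model_get_user_memory(struct model_page *backing, long avail,
				  long npages, struct model_page ***out)
{
	struct model_page **pages = calloc((size_t)npages, sizeof(*pages));
	long rc;

	if (!pages)
		return -ENOMEM;

	rc = model_pin(backing, avail, npages, pages);
	if (rc != npages) {
		if (rc < 0)
			goto destroy_pages;	/* nothing pinned: just free */
		npages = rc;			/* unpin only what was pinned */
		rc = -EFAULT;
		goto put_pages;
	}
	*out = pages;
	return 0;

put_pages:
	model_unpin(pages, npages);
destroy_pages:
	free(pages);
	return rc;
}
```

The assertion in model_unpin() fires if the unwind ever touches a slot that was never filled, which is what unpinning the original, unclamped npages after a short pin would do.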



* [PATCH 04/13] misc/habana: Use FOLL_LONGTERM for userptr
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Dan Williams, Oded Gabbay, Omer Shpigelman, Ofir Bitton,
	Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman, Pawel Piskorski

These userptr pins are persistent, not just held for the duration of a
DMA operation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
---
 drivers/misc/habanalabs/common/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index ef89cfa2f95a..94bef8faa82a 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -ENOMEM;
 	}
 
-	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	rc = pin_user_pages_fast(start, npages,
+				 FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				 userptr->pages);
 
 	if (rc != npages) {
-- 
2.28.0



* [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

This is used by media/videobuf2 for persistent DMA mappings, not just
for a single DMA operation that is then freed again, so it needs
FOLL_LONGTERM.

Unfortunately pin_user_pages_locked() currently doesn't support
FOLL_LONGTERM due to locking issues. Rework the code to pull the
pin_user_pages() path out of the mmap_sem critical section, as
suggested by Jason.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 mm/frame_vector.c | 36 +++++++++++-------------------------
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..39db520a51dc 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
-	int locked;
 
 	if (nr_frames == 0)
 		return 0;
@@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
+	ret = pin_user_pages_fast(start, nr_frames,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  (struct page **)(vec->ptrs));
+	if (ret > 0) {
+		vec->got_ref = true;
+		vec->is_pfns = false;
+		goto out_unlocked;
+	}
+
 	mmap_read_lock(mm);
-	locked = 1;
 	vma = find_vma_intersection(mm, start, start + 1);
 	if (!vma) {
 		ret = -EFAULT;
 		goto out;
 	}
 
-	/*
-	 * While get_vaddr_frames() could be used for transient (kernel
-	 * controlled lifetime) pinning of memory pages all current
-	 * users establish long term (userspace controlled lifetime)
-	 * page pinning. Treat get_vaddr_frames() like
-	 * get_user_pages_longterm() and disallow it for filesystem-dax
-	 * mappings.
-	 */
-	if (vma_is_fsdax(vma)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
-		vec->got_ref = true;
-		vec->is_pfns = false;
-		ret = pin_user_pages_locked(start, nr_frames,
-			gup_flags, (struct page **)(vec->ptrs), &locked);
-		goto out;
-	}
-
 	vec->got_ref = false;
 	vec->is_pfns = true;
 	do {
@@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 		vma = find_vma_intersection(mm, start, start + 1);
 	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
 out:
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+out_unlocked:
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
-- 
2.28.0
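
The reworked get_vaddr_frames() tries pin_user_pages_fast() without mmap_lock first and only takes the lock for the VM_IO | VM_PFNMAP fallback; the new out_unlocked label lets the fast path skip mmap_read_unlock(). The sketch below models just that control flow in userspace. Everything in it (model_mm, model_vma, MODEL_VM_PFNMAP, model_pin_fast() and the page-granular addresses) is hypothetical, and lock_held is a plain counter standing in for the mmap read lock. One deliberate choice: the fallback counts frames in a fresh counter that starts at zero. Assuming the elided loop body still counts frames in ret the way the pre-patch code did (where ret was 0 on entry), the hunk above appears to enter that loop with ret still holding pin_user_pages_fast()'s negative return, so it may need the same reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VM_IO | VM_PFNMAP. */
#define MODEL_VM_PFNMAP 0x1u

/* Addresses are page indices; a vma covers [start, end). */
struct model_vma { unsigned long start, end; unsigned int flags; };

struct model_mm {
	struct model_vma *vmas;
	int nr_vmas;
	int lock_held;	/* models mmap_read_lock()/mmap_read_unlock() */
};

struct model_vec { unsigned int nr_frames; bool got_ref, is_pfns; };

static struct model_vma *model_find_vma(struct model_mm *mm, unsigned long addr)
{
	for (int i = 0; i < mm->nr_vmas; i++)
		if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Hypothetical fast pin: only works on ordinary memory and stops at
 * the end of the vma containing start. */
static long model_pin_fast(struct model_mm *mm, unsigned long start,
			   unsigned int nr)
{
	struct model_vma *vma = model_find_vma(mm, start);
	unsigned long avail;

	if (!vma || (vma->flags & MODEL_VM_PFNMAP))
		return -EFAULT;
	avail = vma->end - start;
	return (long)(nr < avail ? nr : avail);
}

static long model_get_vaddr_frames(struct model_mm *mm, unsigned long start,
				   unsigned int nr_frames,
				   struct model_vec *vec)
{
	struct model_vma *vma;
	unsigned int got = 0;
	long ret;

	if (nr_frames == 0)
		return 0;

	/* 1. Try to pin real pages without taking the lock. */
	ret = model_pin_fast(mm, start, nr_frames);
	if (ret > 0) {
		vec->got_ref = true;
		vec->is_pfns = false;
		goto out_unlocked;
	}

	/* 2. Fallback: walk consecutive PFNMAP vmas under the lock. */
	mm->lock_held++;
	vma = model_find_vma(mm, start);
	if (!vma) {
		ret = -EFAULT;
		goto out;
	}
	vec->got_ref = false;
	vec->is_pfns = true;
	while (vma && (vma->flags & MODEL_VM_PFNMAP) && got < nr_frames) {
		while (start < vma->end && got < nr_frames) {
			start++;	/* one "pfn" per page */
			got++;
		}
		vma = model_find_vma(mm, start);
	}
	ret = got;	/* counted from zero, not from the fast-path error */
out:
	mm->lock_held--;
out_unlocked:
	if (!ret)
		ret = -EFAULT;
	if (ret > 0)
		vec->nr_frames = (unsigned int)ret;
	return ret;
}
```

In this model the lock counter returns to zero on all three exits (fast-path success, pfn fallback, and no vma at all), which is the invariant the out/out_unlocked split is meant to keep.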



* [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
@ 2020-10-07 16:44   ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Pawel Osciak, kvm,
	Jason Gunthorpe, Daniel Vetter, Mauro Carvalho Chehab,
	Jérôme Glisse, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

This is used by media/videobuf2 for persistent dma mappings, not just
for a single dma operation that is freed again right afterwards, so it
needs FOLL_LONGTERM.

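Unfortunately pin_user_pages_locked() doesn't support FOLL_LONGTERM
due to locking issues. As suggested by Jason, rework the code to pull
the pin_user_pages path out of the mmap_sem critical section: first try
pin_user_pages_fast() without holding the lock, and only if that fails
take mmap_read_lock() and fall back to the follow_pfn() walk used for
VM_IO | VM_PFNMAP mappings.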
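
The reworked control flow can be modelled in userspace roughly as below.
This is a sketch under stated assumptions, not the kernel code: a pthread
mutex stands in for the mmap lock, and pin_fast_sketch(),
walk_pfns_sketch() and struct vec_sketch are hypothetical stand-ins for
pin_user_pages_fast(), the follow_pfn() loop and struct frame_vector. What
it shows is that the lock is only taken on the fallback path, and that the
fast-path exit jumps past the unlock through a separate out_unlocked label.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-in for mm->mmap_lock (not the kernel primitive). */
static pthread_mutex_t mmap_lock = PTHREAD_MUTEX_INITIALIZER;
static bool mmap_locked; /* lets callers check the lock is balanced */

/* Hypothetical, trimmed-down analogue of struct frame_vector. */
struct vec_sketch {
	bool got_ref;	/* pages were pinned with a reference */
	bool is_pfns;	/* array would hold raw pfns, not struct pages */
	int nr_frames;
};

/* Hypothetical: models pin_user_pages_fast() succeeding for normal
 * page-backed memory and failing for VM_IO | VM_PFNMAP mappings. */
static int pin_fast_sketch(bool page_backed, int nr)
{
	return page_backed ? nr : -EFAULT;
}

/* Hypothetical: models the follow_pfn() loop, which needs the lock. */
static int walk_pfns_sketch(int nr)
{
	assert(mmap_locked);
	return nr;
}

int get_frames_sketch(bool page_backed, bool have_vma, int nr,
		      struct vec_sketch *vec)
{
	/* Fast path first, with no lock held, so FOLL_LONGTERM is usable. */
	int ret = pin_fast_sketch(page_backed, nr);

	if (ret > 0) {
		vec->got_ref = true;
		vec->is_pfns = false;
		goto out_unlocked;	/* never locked, so skip the unlock */
	}

	/* Fallback: take the lock only around the pfn walk. */
	pthread_mutex_lock(&mmap_lock);
	mmap_locked = true;
	if (!have_vma) {
		ret = -EFAULT;
		goto out;
	}
	vec->got_ref = false;
	vec->is_pfns = true;
	ret = walk_pfns_sketch(nr);
out:
	mmap_locked = false;
	pthread_mutex_unlock(&mmap_lock);
out_unlocked:
	if (!ret)
		ret = -EFAULT;
	if (ret > 0)
		vec->nr_frames = ret;
	return ret;
}
```

With page_backed set, get_frames_sketch() returns nr and never touches
the lock; with it cleared, the count comes back from the locked walk with
is_pfns set, and a missing vma yields -EFAULT with the lock released again.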

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 mm/frame_vector.c | 36 +++++++++++-------------------------
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..39db520a51dc 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
-	int locked;
 
 	if (nr_frames == 0)
 		return 0;
@@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
+	ret = pin_user_pages_fast(start, nr_frames,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  (struct page **)(vec->ptrs));
+	if (ret > 0) {
+		vec->got_ref = true;
+		vec->is_pfns = false;
+		goto out_unlocked;
+	}
+
 	mmap_read_lock(mm);
-	locked = 1;
 	vma = find_vma_intersection(mm, start, start + 1);
 	if (!vma) {
 		ret = -EFAULT;
 		goto out;
 	}
 
-	/*
-	 * While get_vaddr_frames() could be used for transient (kernel
-	 * controlled lifetime) pinning of memory pages all current
-	 * users establish long term (userspace controlled lifetime)
-	 * page pinning. Treat get_vaddr_frames() like
-	 * get_user_pages_longterm() and disallow it for filesystem-dax
-	 * mappings.
-	 */
-	if (vma_is_fsdax(vma)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
-		vec->got_ref = true;
-		vec->is_pfns = false;
-		ret = pin_user_pages_locked(start, nr_frames,
-			gup_flags, (struct page **)(vec->ptrs), &locked);
-		goto out;
-	}
-
 	vec->got_ref = false;
 	vec->is_pfns = true;
 	do {
@@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 		vma = find_vma_intersection(mm, start, start + 1);
 	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
 out:
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+out_unlocked:
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 166+ messages in thread

* [PATCH 06/13] media: videobuf2: Move frame_vector into media subsystem
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

videobuf2 is the only user of frame_vector, so move it there. This also
garbage collects the CONFIG_FRAME_VECTOR symbol from the tree (only one
other place used it: the omap media driver still selected it in its
Kconfig, despite not using it).

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    |  2 +
 drivers/media/platform/omap/Kconfig           |  1 -
 include/linux/mm.h                            | 42 -------------------
 include/media/videobuf2-core.h                | 42 +++++++++++++++++++
 mm/Kconfig                                    |  3 --
 mm/Makefile                                   |  1 -
 8 files changed, 45 insertions(+), 48 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
 
 config VIDEOBUF2_MEMOPS
 	tristate
-	select FRAME_VECTOR
 
 config VIDEOBUF2_DMA_CONTIG
 	tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o
 
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index 39db520a51dc..b95f4f371681 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
 #include <linux/pagemap.h>
 #include <linux/sched.h>
 
+#include <media/videobuf2-core.h>
+
 /**
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start:	starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
 	depends on VIDEO_V4L2
 	select VIDEOBUF2_DMA_CONTIG
 	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
-	select FRAME_VECTOR
 	help
 	  V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
-/* Container for pinned pfns / pages */
-struct frame_vector {
-	unsigned int nr_allocated;	/* Number of frames we have space for */
-	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
-	bool got_ref;		/* Did we pin pages by getting page ref? */
-	bool is_pfns;		/* Does array contain pages or pfns? */
-	void *ptrs[];		/* Array of pinned pfns / pages. Use
-				 * pfns_vector_pages() or pfns_vector_pfns()
-				 * for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-		     unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
-	return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
-	if (vec->is_pfns) {
-		int err = frame_vector_to_pages(vec);
-
-		if (err)
-			return ERR_PTR(err);
-	}
-	return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
-	if (!vec->is_pfns)
-		frame_vector_to_pfns(vec);
-	return (unsigned long *)(vec->ptrs);
-}
-
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bbb3f26fbde9..a2e75ca0334f 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
  */
 unsigned int vb2_request_buffer_cnt(struct media_request *req);
 
+/* Container for pinned pfns / pages in frame_vector.c */
+struct frame_vector {
+	unsigned int nr_allocated;	/* Number of frames we have space for */
+	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
+	bool got_ref;		/* Did we pin pages by getting page ref? */
+	bool is_pfns;		/* Does array contain pages or pfns? */
+	void *ptrs[];		/* Array of pinned pfns / pages. Use
+				 * frame_vector_pages() or frame_vector_pfns()
+				 * for access */
+};
+
+struct frame_vector *frame_vector_create(unsigned int nr_frames);
+void frame_vector_destroy(struct frame_vector *vec);
+int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
+		     unsigned int gup_flags, struct frame_vector *vec);
+void put_vaddr_frames(struct frame_vector *vec);
+int frame_vector_to_pages(struct frame_vector *vec);
+void frame_vector_to_pfns(struct frame_vector *vec);
+
+static inline unsigned int frame_vector_count(struct frame_vector *vec)
+{
+	return vec->nr_frames;
+}
+
+static inline struct page **frame_vector_pages(struct frame_vector *vec)
+{
+	if (vec->is_pfns) {
+		int err = frame_vector_to_pages(vec);
+
+		if (err)
+			return ERR_PTR(err);
+	}
+	return (struct page **)(vec->ptrs);
+}
+
+static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
+{
+	if (!vec->is_pfns)
+		frame_vector_to_pfns(vec);
+	return (unsigned long *)(vec->ptrs);
+}
+
 #endif /* _MEDIA_VIDEOBUF2_CORE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..da6c943fe9f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,9 +815,6 @@ config DEVICE_PRIVATE
 	  memory; i.e., memory that is only accessible from the device (or
 	  group of devices). You likely also want to select HMM_MIRROR.
 
-config FRAME_VECTOR
-	bool
-
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..a025fd6c6afd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
-obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
-- 
2.28.0


^ permalink raw reply	[flat|nested] 166+ messages in thread

* [PATCH 07/13] mm: close race in generic_access_phys
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Dan Williams,
	Kees Cook, Rik van Riel, Benjamin Herrensmidt, Dave Airlie,
	Hugh Dickins, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Daniel Vetter

Way back it was a reasonable assumption that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carveouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (or anything else allocated with __GFP_MOVABLE)

- even /dev/mem now invalidates mappings when a driver claims that
  iomem region, if CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

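Accessing pfns obtained from ptes without holding the page table lock
is therefore no longer safe, since the pte can be zapped and the pfn
reused at any time. Fix this.

Since ioremap might need to manipulate pagetables too, we can't hold the
pt lock across it. Instead, snapshot the pte, drop the lock, ioremap,
then retake the lock and retry if the pte changed in the meantime.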
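
The snapshot/recheck/retry pattern can be sketched in userspace as
follows. All names here are hypothetical: a pthread mutex models the page
table lock, struct fake_pte models a single pte, picking a slot in a
static array models ioremap_prot(), and race_hook models another thread
moving the buffer while the lock is dropped. It illustrates the retry
logic only and is not meant to mirror mm/memory.c line by line.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a single pte: which frame it maps, if any. */
struct fake_pte {
	unsigned long pfn;
	bool present;
};

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER; /* "pt lock" */
static struct fake_pte entry = { .pfn = 0x1000, .present = true };
static unsigned char frames[2][16]; /* two fake "physical" frames */

/* Hypothetical hook run while the lock is dropped, modelling a
 * concurrent unmap_mapping_range() plus a move of the buffer. */
void (*race_hook)(void);

void move_buffer(void)
{
	pthread_mutex_lock(&ptl);
	entry.pfn = 0x1001;
	pthread_mutex_unlock(&ptl);
}

static bool same_pte(struct fake_pte a, struct fake_pte b)
{
	return a.pfn == b.pfn && a.present == b.present;
}

int access_sketch(void *buf, size_t len, int *retries)
{
	struct fake_pte snap;
	unsigned char *maddr;

	assert(len <= sizeof(frames[0]));
	*retries = 0;
retry:
	pthread_mutex_lock(&ptl);
	snap = entry;			/* snapshot under the lock */
	pthread_mutex_unlock(&ptl);
	if (!snap.present)
		return -EINVAL;

	/* Stands in for ioremap_prot(), which may sleep: no lock held. */
	maddr = frames[snap.pfn & 1];
	if (race_hook) {
		void (*hook)(void) = race_hook;

		race_hook = NULL;	/* fire the simulated race once */
		hook();
	}

	pthread_mutex_lock(&ptl);
	if (!same_pte(snap, entry)) {
		/* pte changed while unlocked: drop the mapping, start over. */
		pthread_mutex_unlock(&ptl);
		(*retries)++;
		goto retry;
	}
	memcpy(buf, maddr, len);	/* pte cannot change while locked */
	pthread_mutex_unlock(&ptl);
	return (int)len;
}
```

With race_hook set to move_buffer(), the first pass sees the entry move
from pfn 0x1000 to 0x1001, retries once, and then copies from the second
frame; the access itself only happens once the entry has been re-read
under the lock and found unchanged.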

While at it, also add kerneldoc and improve the comment for the
vm_operations_struct.access callback. It's for accessing the memory, not
for moving it from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/mm.h |  3 ++-
 mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index acd60fbf1a5a..2a16631c1fda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -566,7 +566,8 @@ struct vm_operations_struct {
 	vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
-	 * for use by special VMAs that can switch between memory and hardware
+	 * for use by special VMAs. See also generic_access_phys() for a generic
+	 * implementation useful for any iomem mapping.
 	 */
 	int (*access)(struct vm_area_struct *vma, unsigned long addr,
 		      void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..8d467e23b44e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
 	return ret;
 }
 
+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace address, not relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 			void *buf, int len, int write)
 {
 	resource_size_t phys_addr;
 	unsigned long prot = 0;
 	void __iomem *maddr;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
 	int offset = addr & (PAGE_SIZE-1);
+	int ret = -EINVAL;
+
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		return -EINVAL;
+
+retry:
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		return -EINVAL;
+	pte = *ptep;
+	pte_unmap_unlock(ptep, ptl);
 
-	if (follow_phys(vma, addr, write, &prot, &phys_addr))
+	prot = pgprot_val(pte_pgprot(pte));
+	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+	if ((write & FOLL_WRITE) && !pte_write(pte))
 		return -EINVAL;
 
 	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
 	if (!maddr)
 		return -ENOMEM;
 
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		goto out_unmap;
+
+	if (!pte_same(pte, *ptep)) {
+		pte_unmap_unlock(ptep, ptl);
+		iounmap(maddr);
+
+		goto retry;
+	}
+
 	if (write)
 		memcpy_toio(maddr + offset, buf, len);
 	else
 		memcpy_fromio(buf, maddr + offset, len);
+	ret = len;
+	pte_unmap_unlock(ptep, ptl);
+out_unmap:
 	iounmap(maddr);
 
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
-- 
2.28.0


^ permalink raw reply	[flat|nested] 166+ messages in thread

* [PATCH 08/13] s390/pci: Remove races against pte updates
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Dan Williams, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Niklas Schnelle,
	Gerald Schaefer

Way back it was a reasonable assumption that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carveouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with GFP_MOVABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio doesn't seem to do anything nefarious with
locks, we just need to open code get_pfn() and follow_pfn() and make
sure we drop the locks only after we're done. The write function also
needs its copy_from_user() call moved before the locks are taken, since
we can't take userspace faults while holding the mmap sem.
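The ordering this patch establishes in s390_pci_mmio_write() (copy the user buffer first, then take the locks, validate, do the MMIO copy, and only then unlock) can be sketched as a userspace model. Everything here is hypothetical: model_vma, user_copy(), io_copy(), model_mmio_write() and MODEL_IOMAP_BASE stand in for the vma, copy_from_user(), zpci_memcpy_toio(), the syscall and ZPCI_IOMAP_ADDR_BASE, and a single flag represents both the mmap lock and the pte lock. The 64-byte limit only mirrors the on-stack local_buf; the real syscall falls back to kmalloc() for longer writes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ZPCI_IOMAP_ADDR_BASE. */
#define MODEL_IOMAP_BASE 0x1000UL

struct model_vma {
	int writable;		/* models VM_WRITE */
	int is_io;		/* models VM_IO | VM_PFNMAP */
	unsigned long io_addr;	/* translation the "pte" currently holds */
};

static int mmap_locked;			/* models mmap lock + pte lock */
static unsigned char device[64];	/* fake MMIO window */

/* Stand-in for copy_from_user(): may fault, so never call it locked. */
static void user_copy(void *dst, const void *src, size_t len)
{
	assert(!mmap_locked);
	memcpy(dst, src, len);
}

/* Stand-in for zpci_memcpy_toio(): needs the translation pinned. */
static void io_copy(unsigned long io_addr, const void *src, size_t len)
{
	assert(mmap_locked);
	memcpy(device + (io_addr - MODEL_IOMAP_BASE), src, len);
}

static long model_mmio_write(const struct model_vma *vma,
			     const void *ubuf, size_t len)
{
	unsigned char buf[64];
	long ret;

	if (len == 0 || len > sizeof(buf))
		return -EINVAL;

	/* 1. Bounce the user data before any lock is taken. */
	user_copy(buf, ubuf, len);

	/* 2. Take the locks, then validate the mapping under them. */
	mmap_locked = 1;
	ret = -EACCES;
	if (!vma->writable || !vma->is_io)
		goto out_unlock;
	ret = -EFAULT;
	if (vma->io_addr < MODEL_IOMAP_BASE ||
	    vma->io_addr - MODEL_IOMAP_BASE + len > sizeof(device))
		goto out_unlock;

	/* 3. Access the device while the translation cannot change. */
	io_copy(vma->io_addr, buf, len);
	ret = 0;
out_unlock:
	/* 4. Drop the locks only once the access is finished. */
	mmap_locked = 0;
	return ret;
}
```

Setting ret immediately before each check, rather than relying on the value left by an earlier successful step, is what makes the low-address case report -EFAULT instead of 0.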

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
---
 arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..4d194cb09372 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
 	return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-		    unsigned long *pfn)
-{
-	struct vm_area_struct *vma;
-	long ret;
-
-	mmap_read_lock(current->mm);
-	ret = -EINVAL;
-	vma = find_vma(current->mm, user_addr);
-	if (!vma)
-		goto out;
-	ret = -EACCES;
-	if (!(vma->vm_flags & access))
-		goto out;
-	ret = follow_pfn(vma, user_addr, pfn);
-out:
-	mmap_read_unlock(current->mm);
-	return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 		const void __user *, user_buffer, size_t, length)
 {
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	 * We only support write access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	} else
 		buf = local_buf;
 
-	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	ret = -EFAULT;
+	if (copy_from_user(buf, user_buffer, length))
+		goto out_free;
+
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+		goto out_unlock_mmap;
+	ret = -EFAULT;
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
 			(mmio_addr & ~PAGE_MASK));
 
-	ret = -EFAULT;
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-		goto out;
-
-	if (copy_from_user(buf, user_buffer, length))
-		goto out;
+		goto out_unlock_pt;
 
 	ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+out_free:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	 * We only support read access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 		buf = local_buf;
 	}
 
-	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_READ))
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+			(mmio_addr & ~PAGE_MASK));
 
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
 		ret = -EFAULT;
-		goto out;
+		goto out_unlock_pt;
 	}
 	ret = zpci_memcpy_fromio(buf, io_addr, length);
-	if (ret)
-		goto out;
-	if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+
+	if (!ret && copy_to_user(user_buffer, buf, length))
 		ret = -EFAULT;
 
-out:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 166+ messages in thread

* [PATCH 08/13] s390/pci: Remove races against pte updates
@ 2020-10-07 16:44   ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, Niklas Schnelle, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Gerald Schaefer, Andrew Morton, linux-arm-kernel,
	linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we've done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
---
 arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..4d194cb09372 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
 	return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-		    unsigned long *pfn)
-{
-	struct vm_area_struct *vma;
-	long ret;
-
-	mmap_read_lock(current->mm);
-	ret = -EINVAL;
-	vma = find_vma(current->mm, user_addr);
-	if (!vma)
-		goto out;
-	ret = -EACCES;
-	if (!(vma->vm_flags & access))
-		goto out;
-	ret = follow_pfn(vma, user_addr, pfn);
-out:
-	mmap_read_unlock(current->mm);
-	return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 		const void __user *, user_buffer, size_t, length)
 {
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	 * We only support write access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	} else
 		buf = local_buf;
 
-	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	ret = -EFAULT;
+	if (copy_from_user(buf, user_buffer, length))
+		goto out_free;
+
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
 			(mmio_addr & ~PAGE_MASK));
 
-	ret = -EFAULT;
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-		goto out;
-
-	if (copy_from_user(buf, user_buffer, length))
-		goto out;
+		goto out_unlock_pt;
 
 	ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+out_free:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	 * We only support read access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 		buf = local_buf;
 	}
 
-	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_READ))
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+			(mmio_addr & ~PAGE_MASK));
 
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
 		ret = -EFAULT;
-		goto out;
+		goto out_unlock_pt;
 	}
 	ret = zpci_memcpy_fromio(buf, io_addr, length);
-	if (ret)
-		goto out;
-	if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+
+	if (!ret && copy_to_user(user_buffer, buf, length))
 		ret = -EFAULT;
 
-out:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

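The locking order that both syscalls now follow can be exercised outside the kernel with a small single-threaded model. This is only a sketch under stated assumptions. Every `model_*` name, flag value and the 64-byte "BAR" are hypothetical stand-ins, not kernel APIs. The two "locks" only assert correct pairing. `memcpy()` plays the part of `copy_from_user()`, `copy_to_user()` and the `zpci_memcpy_*io()` helpers. The model shows three things: the user copy happens outside the locks, the pfn is turned into an io address and used only while the page table lock is held, and reads are gated on VM_READ rather than VM_WRITE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_MASK  (~((1UL << MODEL_PAGE_SHIFT) - 1))
#define MODEL_VM_READ    0x1UL
#define MODEL_VM_WRITE   0x2UL
#define MODEL_VM_IO      0x4UL
#define MODEL_VM_PFNMAP  0x8UL
#define MODEL_IOMAP_BASE 0x100000UL	/* plays ZPCI_IOMAP_ADDR_BASE */

/* Single-threaded "lock" that only asserts correct take/drop pairing. */
struct model_lock { int held; };
static void lock_take(struct model_lock *l) { assert(!l->held); l->held = 1; }
static void lock_drop(struct model_lock *l) { assert(l->held); l->held = 0; }

struct model_vma {
	unsigned long flags;
	unsigned long pfn;	/* the single "pte" backing this mapping */
	struct model_lock ptl;	/* plays the page table lock */
};

static struct model_lock model_mmap_lock;	/* plays mmap_lock */
static unsigned char model_bar[64];		/* pretend MMIO space */

/* Mirrors the vm_flags checks; need is MODEL_VM_READ or MODEL_VM_WRITE. */
static long model_check_vma(const struct model_vma *vma, unsigned long need)
{
	if (!vma)
		return -EINVAL;
	if (!(vma->flags & need) ||
	    !(vma->flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)))
		return -EACCES;
	return 0;
}

/* Must be called with vma->ptl held: the pfn is only valid under it. */
static long model_resolve(const struct model_vma *vma, unsigned long addr,
			  size_t len, size_t *off)
{
	unsigned long io_addr = (vma->pfn << MODEL_PAGE_SHIFT) |
				(addr & ~MODEL_PAGE_MASK);

	assert(vma->ptl.held);
	if (io_addr < MODEL_IOMAP_BASE)
		return -EFAULT;
	*off = addr & ~MODEL_PAGE_MASK;
	return *off + len > sizeof(model_bar) ? -EINVAL : 0;
}

long model_mmio_write(struct model_vma *vma, unsigned long addr,
		      const void *user_buf, size_t len)
{
	unsigned char buf[sizeof(model_bar)];
	size_t off;
	long ret;

	if (len > sizeof(buf))
		return -EINVAL;
	memcpy(buf, user_buf, len);	/* copy_from_user() before any lock */

	lock_take(&model_mmap_lock);
	ret = model_check_vma(vma, MODEL_VM_WRITE);
	if (ret)
		goto out_unlock_mmap;
	lock_take(&vma->ptl);		/* lookup and access under ptl */
	ret = model_resolve(vma, addr, len, &off);
	if (!ret)
		memcpy(model_bar + off, buf, len);
	lock_drop(&vma->ptl);
out_unlock_mmap:
	lock_drop(&model_mmap_lock);
	return ret;
}

long model_mmio_read(struct model_vma *vma, unsigned long addr,
		     void *user_buf, size_t len)
{
	unsigned char buf[sizeof(model_bar)];
	size_t off;
	long ret;

	if (len > sizeof(buf))
		return -EINVAL;
	lock_take(&model_mmap_lock);
	ret = model_check_vma(vma, MODEL_VM_READ);	/* VM_READ for reads */
	if (ret)
		goto out_unlock_mmap;
	lock_take(&vma->ptl);
	ret = model_resolve(vma, addr, len, &off);
	if (!ret)
		memcpy(buf, model_bar + off, len);
	lock_drop(&vma->ptl);
out_unlock_mmap:
	lock_drop(&model_mmap_lock);
	if (!ret)
		memcpy(user_buf, buf, len);	/* copy_to_user() after unlocking */
	return ret;
}
```

Keeping the user copies outside the locked region matters in the real code because copy_from_user() and copy_to_user() can fault and sleep, and sleeping is not allowed while the pte spinlock is held.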

* [PATCH 09/13] PCI: obey iomem restrictions for procfs mmap
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci

There are three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two of them check against
iomem_is_exclusive(); proc never did. With CONFIG_IO_STRICT_DEVMEM
this starts to matter, since we don't want random userspace to have
access to PCI BARs while a driver is loaded and using them.

Fix this.

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
-- 
2.28.0


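The new check in proc_bus_pci_mmap() relies on C operator precedence. Because `&` binds tighter than `&&`, the condition reads as `(flags & IORESOURCE_MEM) && iomem_is_exclusive(start)`, so I/O-port BARs skip the exclusivity test entirely. The toy model below illustrates this. It assumes hypothetical flag values and uses a hard-coded table of claimed ranges in place of the real iomem resource tree.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IORESOURCE_IO and IORESOURCE_MEM. */
#define MODEL_RES_IO  0x100UL
#define MODEL_RES_MEM 0x200UL

/* Toy table of iomem ranges a driver has claimed (inclusive bounds). */
static const struct { unsigned long start, end; } model_claimed[] = {
	{ 0xfe000000UL, 0xfe00ffffUL },
};

/* Toy iomem_is_exclusive(): is addr inside a claimed range? */
static bool model_iomem_is_exclusive(unsigned long addr)
{
	for (size_t i = 0; i < sizeof(model_claimed) / sizeof(model_claimed[0]); i++)
		if (addr >= model_claimed[i].start && addr <= model_claimed[i].end)
			return true;
	return false;
}

/* Mirrors the added check: parsed as (flags & MEM) && exclusive(start),
 * so I/O port BARs never consult the iomem table. */
int model_proc_mmap_check(unsigned long flags, unsigned long start)
{
	if (flags & MODEL_RES_MEM && model_iomem_is_exclusive(start))
		return -EINVAL;
	return 0;
}
```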

* [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci

Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region"), /dev/mem zaps ptes when the kernel requests exclusive
access to an iomem region. With CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

However, there are two more ways to access PCI BARs: sysfs and proc
mmap support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with a consistent vma->vm_pgoff. ->vm_pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
at ->open time, but that's a bit tricky here with all the entry points
and arch code. So instead, create a fake file and adjust vma->vm_file.

Note this only works for ARCH_GENERIC_PCI_MMAP_RESOURCE. But that
seems to be a subset of the architectures that support STRICT_DEVMEM,
so we should be good.

The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_SYS_RAWIO. But I think that makes some sense compared
to /dev/mem and proc, where one file gives you access to everything and
no ownership applies.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/char/mem.c     | 16 +++++++++++++++-
 drivers/pci/mmap.c     |  3 +++
 include/linux/ioport.h |  2 ++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5e58a326d4ee 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -810,6 +810,7 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
 }
 
 static struct inode *devmem_inode;
+static struct vfsmount *devmem_vfs_mount;
 
 #ifdef CONFIG_IO_STRICT_DEVMEM
 void revoke_devmem(struct resource *res)
@@ -843,6 +844,20 @@ void revoke_devmem(struct resource *res)
 
 	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
 }
+
+struct file *devmem_getfile(void)
+{
+	struct file *file;
+
+	file = alloc_file_pseudo(devmem_inode, devmem_vfs_mount, "devmem",
+				 O_RDWR, &kmem_fops);
+	if (IS_ERR(file))
+		return NULL;
+
+	file->f_mapping = devmem_inode->i_mapping;
+
+	return file;
+}
 #endif
 
 static int open_port(struct inode *inode, struct file *filp)
@@ -1010,7 +1025,6 @@ static struct file_system_type devmem_fs_type = {
 
 static int devmem_init_inode(void)
 {
-	static struct vfsmount *devmem_vfs_mount;
 	static int devmem_fs_cnt;
 	struct inode *inode;
 	int rc;
diff --git a/drivers/pci/mmap.c b/drivers/pci/mmap.c
index b8c9011987f4..63786cc9c746 100644
--- a/drivers/pci/mmap.c
+++ b/drivers/pci/mmap.c
@@ -7,6 +7,7 @@
  * Author: David Woodhouse <dwmw2@infradead.org>
  */
 
+#include <linux/file.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/pci.h>
@@ -64,6 +65,8 @@ int pci_mmap_resource_range(struct pci_dev *pdev, int bar,
 		vma->vm_pgoff += (pci_resource_start(pdev, bar) >> PAGE_SHIFT);
 
 	vma->vm_ops = &pci_phys_vm_ops;
+	fput(vma->vm_file);
+	vma->vm_file = devmem_getfile();
 
 	return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 				  vma->vm_end - vma->vm_start,
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb..83238cba19fe 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -304,8 +304,10 @@ struct resource *request_free_mem_region(struct resource *base,
 
 #ifdef CONFIG_IO_STRICT_DEVMEM
 void revoke_devmem(struct resource *res);
+struct file *devmem_getfile(void);
 #else
 static inline void revoke_devmem(struct resource *res) { };
+static inline struct file *devmem_getfile(void) { return NULL; };
 #endif
 
 #endif /* __ASSEMBLY__ */
-- 
2.28.0


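A toy model of revoke_devmem() shows why linking the vma into devmem's address_space is enough. revoke_devmem() calls unmap_mapping_range() on devmem's mapping with byte offsets equal to physical addresses. That call only reaches vmas that share this address_space and whose vm_pgoff equals the physical page number. The structures below are hypothetical simplifications: a fixed array replaces the i_mmap interval tree, and a `present` flag replaces real ptes.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12

/* Hypothetical toy vma: a run of pages starting at pgoff. */
struct model_linked_vma {
	unsigned long pgoff;	/* == physical address >> PAGE_SHIFT */
	unsigned long npages;
	int present;		/* stands in for populated ptes */
};

/* Hypothetical toy address_space: just the vmas linked into it. */
struct model_address_space {
	struct model_linked_vma *vmas[8];
	size_t nr;
};

void model_link(struct model_address_space *as, struct model_linked_vma *vma)
{
	assert(as->nr < sizeof(as->vmas) / sizeof(as->vmas[0]));
	as->vmas[as->nr++] = vma;
}

/* Toy unmap_mapping_range(): zap every vma in 'as' that overlaps the
 * byte range [start, start + len). Vmas linked into other address
 * spaces are never visited. */
void model_unmap_mapping_range(struct model_address_space *as,
			       unsigned long start, unsigned long len)
{
	unsigned long first = start >> MODEL_PAGE_SHIFT;
	unsigned long last = (start + len - 1) >> MODEL_PAGE_SHIFT;

	for (size_t i = 0; i < as->nr; i++) {
		struct model_linked_vma *vma = as->vmas[i];

		if (vma->pgoff <= last && vma->pgoff + vma->npages - 1 >= first)
			vma->present = 0;
	}
}
```

In the usage below, a mapping that still points at its own file's address_space survives the revoke. This mirrors the proc/sysfs hole described above.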

* [PATCH 11/13] mm: add unsafe_follow_pfn
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

Way back it was a reasonable assumption that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carveouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region and CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there are some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now, annotate these as unsafe and splat appropriately.

This patch adds unsafe_follow_pfn(), which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 8d467e23b44e..8db7ad1c261c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() and hold the pagetable lock for the duration of the
+ * access. Any caller that cannot follow these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Return: zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Return: zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe manner. Examples include v4l userptr for zero-copy
+	  buffer sharing.
+
+	  If this option is switched on, such access is rejected. Only disable
+	  this option when you must run userspace which requires such access.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.28.0

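The updated follow_pfn() kernel-doc asks callers to track pte updates with an mmu notifier. In the kernel, mmu_interval_notifier provides this as a sequence count that is read with mmu_interval_read_begin() and checked with mmu_interval_read_retry(). The single-threaded toy below imitates that idea with hypothetical `model_*` helpers to show the race from the commit message: a pfn read before a buffer move is silently stale unless the sequence is rechecked. As a simplification, it leaves out the driver lock that real users must share with their invalidate callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not kernel structures. */
struct model_pte { unsigned long pfn; };
struct model_notifier { unsigned long seq; };

/* Like mmu_interval_read_begin(): snapshot the invalidation sequence. */
static unsigned long model_read_begin(const struct model_notifier *n)
{
	return n->seq;
}

/* Like mmu_interval_read_retry(): true if an invalidation ran since begin. */
static bool model_read_retry(const struct model_notifier *n, unsigned long seq)
{
	return n->seq != seq;
}

/* A driver moving the buffer: invalidate subscribers, then repoint the pte. */
static void model_move_buffer(struct model_pte *pte, struct model_notifier *n,
			      unsigned long new_pfn)
{
	n->seq++;
	pte->pfn = new_pfn;
}

/* Hook that runs "concurrently" between lookup and use. */
typedef void (*model_race_fn)(void *arg);

/* follow_pfn()-style lookup with no tracking: may return a stale pfn. */
unsigned long model_lookup_unchecked(struct model_pte *pte,
				     model_race_fn race, void *arg)
{
	unsigned long pfn = pte->pfn;

	if (race)
		race(arg);
	return pfn;
}

/* Lookup that rechecks the sequence and retries after an invalidation. */
unsigned long model_lookup_checked(struct model_pte *pte,
				   struct model_notifier *n,
				   model_race_fn race, void *arg, int *retries)
{
	for (;;) {
		unsigned long seq = model_read_begin(n);
		unsigned long pfn = pte->pfn;

		if (race) {
			race(arg);
			race = NULL;	/* only one move in this toy */
		}
		if (!model_read_retry(n, seq))
			return pfn;
		(*retries)++;
	}
}

/* Example race: move the buffer to pfn 0x500. */
struct model_race_ctx { struct model_pte *pte; struct model_notifier *n; };

void model_move_to_0x500(void *arg)
{
	struct model_race_ctx *c = arg;

	model_move_buffer(c->pte, c->n, 0x500);
}
```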

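The Kconfig semantics are easy to invert. With STRICT_FOLLOW_PFN=y, unsafe_follow_pfn() refuses outright. Otherwise it warns once, taints the kernel, and falls back to follow_pfn(). The sketch below models that behaviour under several assumptions: a runtime `strict` argument stands in for the compile-time `#ifdef`, hypothetical flags replace WARN_ONCE() and add_taint(), and a toy lookup replaces the real page-table walk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state standing in for WARN_ONCE() and add_taint(). */
static bool model_warned;
static bool model_tainted;

/* Toy follow_pfn(): pretends every address is mapped 1:1 with 4 KiB pages. */
static int model_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	*pfn = addr >> 12;
	return 0;
}

/* strict == true plays CONFIG_STRICT_FOLLOW_PFN=y. */
int model_unsafe_follow_pfn(bool strict, unsigned long addr,
			    unsigned long *pfn)
{
	if (strict) {
		fprintf(stderr, "unsafe follow_pfn usage rejected\n");
		return -EINVAL;
	}
	if (!model_warned) {
		model_warned = true;
		fprintf(stderr, "unsafe follow_pfn usage\n");
	}
	model_tainted = true;
	return model_follow_pfn(addr, pfn);
}
```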
^ permalink raw reply	[flat|nested] 166+ messages in thread

* [PATCH 11/13] mm: add unsafe_follow_pfn
@ 2020-10-07 16:44   ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 8d467e23b44e..8db7ad1c261c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() and hold the pagetable lock for the duration of
+ * the access. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Return: zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Return: zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe manner. Examples include v4l userptr for zero-copy
+	  buffer sharing.
+
+	  If this option is switched on, such access is rejected. Only disable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [PATCH 12/13] media/videbuf1|2: Mark follow_pfn usage as unsafe
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Laurent Dufour,
	Vlastimil Babka, Daniel Jordan, Michel Lespinasse

The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index b95f4f371681..d56eb6258f09 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -71,7 +71,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 		unsigned long *nums = frame_vector_pfns(vec);
 
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-			err = follow_pfn(vma, start, &nums[ret]);
+			err = unsafe_follow_pfn(vma, start, &nums[ret]);
 			if (err) {
 				if (ret == 0)
 					ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	user_address = untagged_baddr;
 
 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
-		ret = follow_pfn(vma, user_address, &this_pfn);
+		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
 		if (ret)
 			break;
 
-- 
2.28.0


* [PATCH 13/13] vfio/type1: Mark follow_pfn as unsafe
  2020-10-07 16:44 ` Daniel Vetter
  (?)
@ 2020-10-07 16:44   ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 16:44 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Alex Williamson,
	Cornelia Huck

The code seems to stuff these pfns into iommu ptes (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

The real fix is to wire up an mmu_notifier ... somehow. That probably
means any invalidate is a fatal fault for this vfio device, but that
shouldn't ever happen if userspace is reasonable.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
 	int ret;
 
-	ret = follow_pfn(vma, vaddr, pfn);
+	ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	if (ret) {
 		bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 		if (ret)
 			return ret;
 
-		ret = follow_pfn(vma, vaddr, pfn);
+		ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	}
 
 	return ret;
-- 
2.28.0


* Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 16:53     ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 16:53 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote:
>  
> -	/*
> -	 * While get_vaddr_frames() could be used for transient (kernel
> -	 * controlled lifetime) pinning of memory pages all current
> -	 * users establish long term (userspace controlled lifetime)
> -	 * page pinning. Treat get_vaddr_frames() like
> -	 * get_user_pages_longterm() and disallow it for filesystem-dax
> -	 * mappings.
> -	 */
> -	if (vma_is_fsdax(vma)) {
> -		ret = -EOPNOTSUPP;
> -		goto out;
> -	}
> -
> -	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> -		vec->got_ref = true;
> -		vec->is_pfns = false;
> -		ret = pin_user_pages_locked(start, nr_frames,
> -			gup_flags, (struct page **)(vec->ptrs), &locked);
> -		goto out;
> -	}

The vm_flags still need to be checked before going into the while
loop. If the break is taken, nothing would check vm_flags.

Jason

* Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-07 16:53     ` Jason Gunthorpe
  (?)
@ 2020-10-07 17:12       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 17:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Tomasz Figa, Mauro Carvalho Chehab, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Dan Williams

On Wed, Oct 7, 2020 at 6:53 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote:
> >
> > -     /*
> > -      * While get_vaddr_frames() could be used for transient (kernel
> > -      * controlled lifetime) pinning of memory pages all current
> > -      * users establish long term (userspace controlled lifetime)
> > -      * page pinning. Treat get_vaddr_frames() like
> > -      * get_user_pages_longterm() and disallow it for filesystem-dax
> > -      * mappings.
> > -      */
> > -     if (vma_is_fsdax(vma)) {
> > -             ret = -EOPNOTSUPP;
> > -             goto out;
> > -     }
> > -
> > -     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > -             vec->got_ref = true;
> > -             vec->is_pfns = false;
> > -             ret = pin_user_pages_locked(start, nr_frames,
> > -                     gup_flags, (struct page **)(vec->ptrs), &locked);
> > -             goto out;
> > -     }
>
> The vm_flags still need to be checked before going into the while
> loop. If the break is taken then nothing would check vm_flags

Hm, right, that's a bit inconsistent. follow_pfn also checks for this,
so I think we can just ditch this entirely, both here and in the do {}
while () check, simplifying the latter to just while (vma). Or, probably
better, just make it a real loop with less confusing control flow.
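Modelled in userspace, the restructuring proposed here might look like the sketch below, under the assumption that the follow_pfn() stand-in is the only place the VM_IO | VM_PFNMAP check happens. Every name (model_vma, model_find_vma(), model_follow_pfn(), collect_pfns()) and every flag value is hypothetical, mmap locking is omitted, and this is not the code of the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model flag values; the real VM_IO/VM_PFNMAP bits live in <linux/mm.h>. */
#define MODEL_VM_IO      0x1UL
#define MODEL_VM_PFNMAP  0x2UL
#define MODEL_PAGE_SHIFT 12

struct model_vma {
	unsigned long vm_start, vm_end, vm_flags, vm_pgoff;
	const struct model_vma *next;	/* next VMA in address order */
};

/* Stand-in for follow_pfn(): it rejects non-IO/PFNMAP VMAs by itself. */
static int model_follow_pfn(const struct model_vma *vma, unsigned long addr,
			    unsigned long *pfn)
{
	if (!(vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)))
		return -EINVAL;
	*pfn = vma->vm_pgoff + ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT);
	return 0;
}

/* Stand-in for find_vma(): first VMA that ends above addr. */
static const struct model_vma *model_find_vma(const struct model_vma *vma,
					      unsigned long addr)
{
	while (vma && vma->vm_end <= addr)
		vma = vma->next;
	return vma;
}

/*
 * A plain loop: one pfn per page, stop at the first failure. Returns the
 * number of pfns collected, or the error if none were collected.
 */
static int collect_pfns(const struct model_vma *vmas, unsigned long start,
			unsigned int nr_frames, unsigned long *pfns)
{
	unsigned int i;
	int err = 0;

	for (i = 0; i < nr_frames; i++) {
		unsigned long addr = start + ((unsigned long)i << MODEL_PAGE_SHIFT);
		const struct model_vma *vma = model_find_vma(vmas, addr);

		if (!vma || addr < vma->vm_start) {
			err = -EFAULT;
			break;
		}
		err = model_follow_pfn(vma, addr, &pfns[i]);
		if (err)
			break;
	}
	return i ? (int)i : err;
}
```

Because the check sits inside the helper, a VMA without those flags simply ends the walk with the helper's error, so no path through the loop can skip it.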

Or would you prefer I keep this and touch the code less?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 07/13] mm: close race in generic_access_phys
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 17:27     ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 17:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Dan Williams,
	Kees Cook, Rik van Riel, Benjamin Herrenschmidt, Dave Airlie,
	Hugh Dickins, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Daniel Vetter

On Wed, Oct 07, 2020 at 06:44:20PM +0200, Daniel Vetter wrote:
> Way back it was a reasonable assumption that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
>   ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carveouts to
>   cma regions. This means if we miss the unmap, the pfn might contain
>   pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
>   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
>   ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea. Fix this.
> 
> Since ioremap might need to manipulate pagetables too, we need to drop
> the pt lock and have a retry loop in case we raced.
> 
> While at it, also add kerneldoc and improve the comment for the
> vma_ops->access function. It's for accessing, not for moving the
> memory from iomem to system memory, as the old comment seemed to
> suggest.
> 
> References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
> Cc: Dave Airlie <airlied@linux.ie>
> Cc: Hugh Dickins <hugh@veritas.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  include/linux/mm.h |  3 ++-
>  mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 3 deletions(-)
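The drop-the-lock-and-retry scheme described in this commit message can be illustrated with a small single-threaded model. The patch body is not quoted in this message, so the sketch below is not its code: model_pte, access_with_retry() and the race hooks are hypothetical, the page-table lock is only indicated in comments, and returning -EINVAL for a missing pte is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pte {
	unsigned long pfn;
	bool present;
};

/* Called while the (modelled) pt lock is dropped; simulates a concurrent
 * unmap_mapping_range() or buffer move. */
typedef void (*race_hook_t)(struct model_pte *pte, int attempt);

static int access_with_retry(struct model_pte *pte, race_hook_t hook,
			     int *attempts)
{
	*attempts = 0;
	for (;;) {
		unsigned long pfn;

		++*attempts;
		/* pt lock held: sample the pte */
		if (!pte->present)
			return -EINVAL;
		pfn = pte->pfn;

		/* pt lock dropped: ioremap(pfn) may sleep and touch page tables */
		if (hook)
			hook(pte, *attempts);

		/* pt lock retaken: only use the mapping if the pte is unchanged */
		if (pte->present && pte->pfn == pfn)
			return 0;	/* access, unlock, iounmap */

		/* raced with an unmap or a move: iounmap and try again */
	}
}

/* Example race injectors for the model. */
static void move_once(struct model_pte *pte, int attempt)
{
	if (attempt == 1)
		pte->pfn += 0x10;
}

static void revoke_once(struct model_pte *pte, int attempt)
{
	if (attempt == 1)
		pte->present = false;
}
```

With move_once() the first attempt sees a different pfn after retaking the lock, and the loop succeeds on the second attempt; with revoke_once() the second attempt finds no pte and gives up.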

This does seem to solve the race with revoke_devmem(), but it is really ugly.

It would be much nicer to wrap a rwsem around this access and the unmap.

Any place using it has a nice linear translation from vm_pgoff to pfn,
so I don't think there is such a good reason to use follow_pte in
the first place.

i.e. why not make the helper this:

 int generic_access_phys(unsigned long pfn, unsigned long pgprot,
      void *buf, size_t len, bool write)

Then something like /dev/mem would compute the pfn and take the lock:

static int dev_access(struct vm_area_struct *vma, unsigned long addr,
                      void *buf, int len, int write)
{
     phys_addr_t cpu_addr = ((phys_addr_t)vma->vm_pgoff << PAGE_SHIFT) +
                            (addr - vma->vm_start);
     int ret;

     /* FIXME: Has to be over each page of len */
     if (!devmem_is_allowed_access(PHYS_PFN(cpu_addr)))
           return -EPERM;

     /* the revoke/unmap side would take mem_sem for write */
     down_read(&mem_sem);
     ret = generic_access_phys(PHYS_PFN(cpu_addr),
                               pgprot_val(vma->vm_page_prot),
                               buf, len, write);
     up_read(&mem_sem);
     return ret;
}
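The linear translation mentioned here is plain arithmetic. A userspace sketch assuming 4 KiB pages (the helper names are hypothetical; the kernel's PHYS_PFN() is the same right shift by PAGE_SHIFT) that computes the pfn exactly once and keeps the in-page offset separately, since a pfn alone does not carry it:

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define MODEL_PAGE_MASK  ((1ULL << MODEL_PAGE_SHIFT) - 1)

/* Physical address behind user address addr in a linear mapping such as
 * /dev/mem, where the mapping starts at physical page vm_pgoff. */
static unsigned long long linear_phys(unsigned long vm_start,
				      unsigned long vm_pgoff,
				      unsigned long addr)
{
	return ((unsigned long long)vm_pgoff << MODEL_PAGE_SHIFT) +
	       (addr - vm_start);
}

/* Same shift as PHYS_PFN(); apply it exactly once. */
static unsigned long phys_to_pfn(unsigned long long phys)
{
	return (unsigned long)(phys >> MODEL_PAGE_SHIFT);
}

/* Offset within the page, which the pfn alone does not encode. */
static unsigned long offset_in_page_of(unsigned long long phys)
{
	return (unsigned long)(phys & MODEL_PAGE_MASK);
}
```

For vm_start = 0x10000, vm_pgoff = 0xfe000 and addr = 0x12345 this yields physical address 0xfe002345, pfn 0xfe002 and offset 0x345; dividing by 4096 before the shift would give 0xfe instead.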

The other cases looked simpler because they don't revoke; there the
mmap_sem alone should be enough protection, and they would just need to
provide the linear translation to a pfn.

What do you think?

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-07 17:12       ` Daniel Vetter
  (?)
@ 2020-10-07 17:33         ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 17:33 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Tomasz Figa, Mauro Carvalho Chehab, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Dan Williams

On Wed, Oct 07, 2020 at 07:12:24PM +0200, Daniel Vetter wrote:
> On Wed, Oct 7, 2020 at 6:53 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote:
> > >
> > > -     /*
> > > -      * While get_vaddr_frames() could be used for transient (kernel
> > > -      * controlled lifetime) pinning of memory pages all current
> > > -      * users establish long term (userspace controlled lifetime)
> > > -      * page pinning. Treat get_vaddr_frames() like
> > > -      * get_user_pages_longterm() and disallow it for filesystem-dax
> > > -      * mappings.
> > > -      */
> > > -     if (vma_is_fsdax(vma)) {
> > > -             ret = -EOPNOTSUPP;
> > > -             goto out;
> > > -     }
> > > -
> > > -     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > > -             vec->got_ref = true;
> > > -             vec->is_pfns = false;
> > > -             ret = pin_user_pages_locked(start, nr_frames,
> > > -                     gup_flags, (struct page **)(vec->ptrs), &locked);
> > > -             goto out;
> > > -     }
> >
> > The vm_flags still need to be checked before going into the while
> > loop. If the break is taken then nothing would check vm_flags
> 
> Hm right that's a bit inconsistent. follow_pfn also checks for this,
> so I think we can just ditch this entirely both here and in the do {}
> while () check, simplifying the latter to just while (vma). Well, just
> make it a real loop with less confusing control flow probably.

It does read very poorly with the redundant check, especially since I
keep forgetting follow_pfn does it too :\

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 11/13] mm: add unsafe_follow_pfn
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 17:36     ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 17:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> Way back it was a reasonable assumption that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carveouts to
> cma regions. This means if we miss the unmap, the pfn might contain
> pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there are some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)

Makes sense to me.

I wonder if we could change the original follow_pfn to require the
ptep and then lockdep_assert_held() it against the page table lock?
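A single-threaded model of that suggestion: the caller passes the pte together with the lock protecting it, and the helper asserts the lock is held, as lockdep_assert_held() would. The types and function names are hypothetical, and assert() is only a rough stand-in for lockdep.

```c
#include <assert.h>
#include <stdbool.h>

/* Model page-table lock that records whether it is held, the way lockdep
 * tracks lock ownership. */
struct model_ptl {
	bool held;
};

struct model_pte {
	unsigned long pfn;
	struct model_ptl *ptl;	/* lock that protects this pte */
};

static void model_ptl_lock(struct model_ptl *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_ptl_unlock(struct model_ptl *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical follow_pfn() variant that takes the pte: the assert plays the
 * role of lockdep_assert_held(), so the pfn is only read while the pte
 * cannot change under the caller. */
static unsigned long model_follow_pfn_locked(const struct model_pte *pte)
{
	assert(pte->ptl->held);
	return pte->pfn;
}
```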

> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");

Wonder if we can print something useful here, like the current
PID/process name?

> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU

I would probably invert this to CONFIG_ALLOW_UNSAFE_FOLLOW_PFN,
default n.
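A userspace model of the inverted option combined with the PID/process-name message suggested above. ALLOW_UNSAFE_FOLLOW_PFN is a stand-in macro for the proposed CONFIG_ALLOW_UNSAFE_FOLLOW_PFN (in the kernel this would more likely be an IS_ENABLED() check), comm and pid stand in for current->comm and task_pid_nr(current), and the -EINVAL return is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Inverted option: unsafe use is rejected unless explicitly allowed
 * (default n). Build with -DALLOW_UNSAFE_FOLLOW_PFN=1 to opt in. */
#ifndef ALLOW_UNSAFE_FOLLOW_PFN
#define ALLOW_UNSAFE_FOLLOW_PFN 0
#endif

/* Stand-in for follow_pfn(): a fixed linear translation, 4 KiB pages. */
static int model_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	*pfn = addr >> 12;
	return 0;
}

/* comm and pid identify the offending process in the rejection message. */
static int model_unsafe_follow_pfn(const char *comm, int pid,
				   unsigned long addr, unsigned long *pfn)
{
	if (!ALLOW_UNSAFE_FOLLOW_PFN) {
		fprintf(stderr,
			"%s[%d]: unsafe follow_pfn usage rejected, see CONFIG_ALLOW_UNSAFE_FOLLOW_PFN\n",
			comm, pid);
		return -EINVAL;
	}
	return model_follow_pfn(addr, pfn);
}
```

Defaulting to rejection means a kernel built without thinking about the option gets the safe behavior, and only configurations that knowingly need the old v4l/vfio paths opt back in.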

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 11/13] mm: add unsafe_follow_pfn
@ 2020-10-07 17:36     ` Jason Gunthorpe
  0 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 17:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	John Hubbard, LKML, DRI Development, linux-mm,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Andrew Morton, linux-arm-kernel, linux-media

On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there's some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)

Makes sense to me.

I wonder if we could change the original follow_pfn to require the
ptep and then lockdep_assert_held() it against the page table lock?
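
For illustration only, a minimal userspace model of that suggestion. It assumes a simplified page table lock that records whether it is held; model_ptl, model_pte and model_follow_pfn_locked are hypothetical names, and the assert stands in for what lockdep_assert_held() would check on the real ptl:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page table lock that remembers whether it
 * is held, which is the property lockdep_assert_held() checks. */
struct model_ptl {
	bool held;
};

/* Simplified pte: just the pfn and a present bit. */
struct model_pte {
	unsigned long pfn;
	bool present;
};

static void model_ptl_lock(struct model_ptl *ptl)
{
	assert(!ptl->held);
	ptl->held = true;
}

static void model_ptl_unlock(struct model_ptl *ptl)
{
	assert(ptl->held);
	ptl->held = false;
}

/* Userspace substitute for lockdep_assert_held(). */
#define model_assert_held(ptl) assert((ptl)->held)

/* Proposed shape: the caller hands in the pte together with its lock, and
 * the lookup refuses to run unless that lock is actually held. */
static int model_follow_pfn_locked(const struct model_pte *pte,
				   struct model_ptl *ptl, unsigned long *pfn)
{
	model_assert_held(ptl);
	if (!pte->present)
		return -EINVAL;
	*pfn = pte->pfn;
	return 0;
}
```

Note that such a check only proves the lock is held at lookup time; it cannot catch a caller that keeps using the pfn after dropping the lock.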

> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");

Wonder if we can print something useful here, like the current
PID/process name?

> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU

I would probably invert this to CONFIG_ALLOW_UNSAFE_FOLLOW_PFN,
default n.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 13/13] vfio/type1: Mark follow_pfn as unsafe
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 17:39     ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 17:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Alex Williamson,
	Cornelia Huck

On Wed, Oct 07, 2020 at 06:44:26PM +0200, Daniel Vetter wrote:
> The code seems to stuff these pfns into iommu pts (or something like
> that, I didn't follow), but there's no mmu_notifier to ensure that
> access is synchronized with pte updates.
> 
> Hence mark these as unsafe. This means that with
> CONFIG_STRICT_FOLLOW_PFN, these will be rejected.
> 
> Real fix is to wire up an mmu_notifier ... somehow. Probably means any
> invalidate is a fatal fault for this vfio device, but then this
> shouldn't ever happen if userspace is reasonable.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: kvm@vger.kernel.org
> ---
>  drivers/vfio/vfio_iommu_type1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5fbf0c1f7433..a4d53f3d0a35 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>  {
>  	int ret;
>  
> -	ret = follow_pfn(vma, vaddr, pfn);
> +	ret = unsafe_follow_pfn(vma, vaddr, pfn);
>  	if (ret) {
>  		bool unlocked = false;
>  
> @@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>  		if (ret)
>  			return ret;
>  
> -		ret = follow_pfn(vma, vaddr, pfn);
> +		ret = unsafe_follow_pfn(vma, vaddr, pfn);
>  	}

This is actually being commonly used, so it needs fixing.

When I last talked to Alex about this, we had worked out a patch series
that adds a test on vm_ops to check that the vma came from vfio in the
first place. The VMAs created by VFIO are 'safe' as their PTEs are never
changed.
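
A rough userspace model of that vm_ops test, under stated assumptions: the structs are cut-down stand-ins for vm_area_struct and vm_operations_struct, and model_vfio_mmap_ops, model_vfio_follow_pfn and the -EFAULT return value are hypothetical, not the actual series:

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for struct vm_operations_struct and
 * struct vm_area_struct; only the fields this check needs. */
struct model_vm_ops {
	const char *name;
};

struct model_vma {
	const struct model_vm_ops *vm_ops;
	unsigned long pfn; /* what the pte currently points at */
};

/* Hypothetical ops table owned by VFIO; only its address matters. */
static const struct model_vm_ops model_vfio_mmap_ops = { "vfio" };

/* Trust the pfn only for VMAs that VFIO itself created, because VFIO never
 * changes those PTEs behind the user's back. Anything else is refused. */
static int model_vfio_follow_pfn(const struct model_vma *vma,
				 unsigned long *pfn)
{
	if (vma->vm_ops != &model_vfio_mmap_ops)
		return -EFAULT;
	*pfn = vma->pfn;
	return 0;
}
```

Comparing the ops pointer is cheap and needs no notifier, but it only works because the owning driver guarantees the PTEs stay fixed.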

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 07/13] mm: close race in generic_access_phys
  2020-10-07 17:27     ` Jason Gunthorpe
  (?)
@ 2020-10-07 18:01       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 18:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Dan Williams, Kees Cook, Rik van Riel,
	Benjamin Herrensmidt, Dave Airlie, Hugh Dickins, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Daniel Vetter

On Wed, Oct 7, 2020 at 7:27 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 06:44:20PM +0200, Daniel Vetter wrote:
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> >   ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carveouts to
> >   cma regions. This means if we miss the unmap the pfn might contain
> >   pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> >   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> >   ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> >
> > Since ioremap might need to manipulate pagetables too we need to drop
> > the pt lock and have a retry loop if we raced.
> >
> > While at it, also add kerneldoc and improve the comment for the
> > vma_ops->access function. It's for accessing, not for moving the
> > memory from iomem to system memory, as the old comment seemed to
> > suggest.
> >
> > References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
> > Cc: Dave Airlie <airlied@linux.ie>
> > Cc: Hugh Dickins <hugh@veritas.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  include/linux/mm.h |  3 ++-
> >  mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 44 insertions(+), 3 deletions(-)
>
> This does seem to solve the race with revoke_devmem(), but it is really ugly.
>
> It would be much nicer to wrap a rwsem around this access and the unmap.
>
> Any place using it has a nice linear translation from vm_off to pfn,
> so I don't think there is such a good reason to use follow_pte in
> the first place.
>
> i.e. why not make the helper this:
>
>  int generic_access_phys(unsigned long pfn, unsigned long pgprot,
>       void *buf, size_t len, bool write)
>
> Then something like /dev/mem would compute the pfn and take the lock:
>
> static int dev_access(struct vm_area_struct *vma, unsigned long addr,
>                       void *buf, int len, int write)
> {
>      phys_addr_t cpu_addr = ((phys_addr_t)vma->vm_pgoff << PAGE_SHIFT) +
>                             (addr - vma->vm_start);
>      int ret;
>
>      /* FIXME: Has to be over each page of len */
>      if (!devmem_is_allowed_access(PHYS_PFN(cpu_addr)))
>            return -EPERM;
>
>      down_read(&mem_sem);
>      ret = generic_access_phys(PHYS_PFN(cpu_addr),
>                                pgprot_val(vma->vm_page_prot),
>                                buf, len, write);
>      up_read(&mem_sem);
>      return ret;
> }
>
> The other cases looked simpler because they don't revoke; there the
> mmap_sem alone should be enough protection, and they would just need
> to provide the linear translation to pfn.
>
> What do you think?

I think it'd fix the bug, until someone wires ->access up for
drivers/gpu, or the next subsystem. This is also just for ptrace, so
we really don't care if we stall the vm badly or do other silly
things. So I figured the somewhat ugly but fully generic solution is
the better one, so that people who want to be able to ptrace
read/write their iomem mmaps can just sprinkle this wherever they
like.
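
A single-threaded userspace model of the retry loop described in the quoted patch (drop the pt lock for ioremap, retake it, recheck the pte, retry on a race). The pte, the backing array and the racer callbacks are all hypothetical stand-ins, and the real patch may structure this differently:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified pte: the pfn it points at and whether it is present. */
struct model_pte {
	unsigned long pfn;
	bool present;
};

/* Called while the (modelled) pt lock is dropped, to simulate a racing
 * unmap or move, e.g. revoke_devmem() or a GPU buffer eviction. */
typedef void (*model_race_fn)(struct model_pte *pte, void *arg);

/* Hypothetical backing "iomem": one byte per pfn. */
static unsigned char model_iomem[16];

/* Snapshot the pte, drop the lock for the ioremap() step, then retake it
 * and only perform the access if the pte still points at the same pfn.
 * Otherwise the mapping raced and the whole lookup is retried. */
static int model_access_phys(struct model_pte *pte, unsigned char *out,
			     model_race_fn race, void *arg, int *retries)
{
	unsigned long pfn;

	*retries = 0;
	for (;;) {
		/* (modelled) pt lock held: read the pte */
		if (!pte->present || pte->pfn >= sizeof(model_iomem))
			return -EINVAL;
		pfn = pte->pfn;
		/* pt lock dropped: ioremap() runs here, the pte may change */
		if (race)
			race(pte, arg);
		/* pt lock retaken: recheck before touching the memory */
		if (pte->present && pte->pfn == pfn) {
			*out = model_iomem[pfn];
			return 0;
		}
		(*retries)++;
	}
}

/* Example racers: move the mapping to pfn 3, or unmap it, exactly once. */
static void model_move_once(struct model_pte *pte, void *arg)
{
	int *fired = arg;

	if (!*fired) {
		*fired = 1;
		pte->pfn = 3;
	}
}

static void model_unmap_once(struct model_pte *pte, void *arg)
{
	int *fired = arg;

	if (!*fired) {
		*fired = 1;
		pte->present = false;
	}
}
```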

But yeah, if we go with the most minimal fix, i.e. only trying to fix the
current users, then your approach should work and is simpler. But it
leaves the door open for future problems.
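
For comparison, a userspace model of the rwsem scheme from the quoted reply: the ->access path checks and reads under the read side, while claiming the region takes the write side. model_rwsem is a hypothetical single-threaded stand-in for the kernel's rw_semaphore (down_read(), up_write() and friends); trylock is used only so the exclusion is observable without threads:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Single-threaded stand-in for a rw_semaphore. */
struct model_rwsem {
	int readers;
	bool writer;
};

static bool model_down_read_trylock(struct model_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void model_up_read(struct model_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool model_down_write_trylock(struct model_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

static void model_up_write(struct model_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Hypothetical /dev/mem-like state: one region a driver may claim. */
struct model_devmem {
	struct model_rwsem sem;
	bool claimed;          /* set once a driver claims the region */
	unsigned char data[4]; /* stands in for the iomem behind the mmap */
};

/* ->access path: permission check and read happen under the read side,
 * so a concurrent claim cannot slip in between them. */
static int model_dev_access(struct model_devmem *m, unsigned long off,
			    unsigned char *out)
{
	int ret = 0;

	if (off >= sizeof(m->data))
		return -EINVAL;
	if (!model_down_read_trylock(&m->sem))
		return -EAGAIN; /* the kernel version would sleep in down_read() */
	if (m->claimed)
		ret = -EPERM;
	else
		*out = m->data[off];
	model_up_read(&m->sem);
	return ret;
}

/* Revoke side: claim under the write side (real code would also zap the
 * user mappings before releasing it). */
static bool model_claim_begin(struct model_devmem *m)
{
	if (!model_down_write_trylock(&m->sem))
		return false;
	m->claimed = true;
	return true;
}

static void model_claim_end(struct model_devmem *m)
{
	model_up_write(&m->sem);
}
```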
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 11/13] mm: add unsafe_follow_pfn
  2020-10-07 17:36     ` Jason Gunthorpe
  (?)
@ 2020-10-07 18:10       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 18:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara

On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carveouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> >
> > Unfortunately there are some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> >
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: kvm@vger.kernel.org
> > ---
> >  include/linux/mm.h |  2 ++
> >  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
> >  mm/nommu.c         | 17 +++++++++++++++++
> >  security/Kconfig   | 13 +++++++++++++
> >  4 files changed, 63 insertions(+), 1 deletion(-)
>
> Makes sense to me.
>
> I wonder if we could change the original follow_pfn to require the
> ptep and then lockdep_assert_held() it against the page table lock?

The safe variant with the pagetable lock is follow_pte_pmd. The only
way to make follow_pfn safe is if you have an mmu notifier and
corresponding retry logic. That is not covered by lockdep (it would
splat if we annotate the retry side), so I'm not sure how you'd check
for that?

Checking for ptep lock doesn't work here, since the one leftover safe
user of this (kvm) doesn't need that at all, because it has the mmu
notifier.

Also follow_pte_pmd will splat with lockdep if you get it wrong, since
the function returns with the right ptlock held. If you forget to
unlock it, lockdep will complain.

So I think we're as good as it gets, since I really have no idea how
to make sure follow_pfn callers do have an mmu notifier registered.

> > +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> > +     unsigned long *pfn)
> > +{
> > +#ifdef CONFIG_STRICT_FOLLOW_PFN
> > +     pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>
> Wonder if we can print something useful here, like the current
> PID/process name?

Yeah adding comm/pid here makes sense.
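A userspace sketch of what a rejection message carrying comm/pid might look like. In the kernel this would presumably be something like pr_info("%s[%d]: ...", current->comm, task_pid_nr(current)); here comm and pid are passed in, the STRICT_FOLLOW_PFN macro stands in for the Kconfig symbol, and unsafe_follow_pfn_check() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for CONFIG_STRICT_FOLLOW_PFN so the sketch builds in
 * userspace; in the kernel this is the Kconfig symbol from the patch. */
#ifndef STRICT_FOLLOW_PFN
#define STRICT_FOLLOW_PFN 1
#endif

/* Hypothetical helper: build the rejection message with the caller's
 * comm and pid. Returns -EINVAL if the unsafe lookup is rejected, 0 if it
 * would be allowed to go ahead. */
static int unsafe_follow_pfn_check(const char *comm, int pid,
				   char *msg, size_t len)
{
	assert(len > 0);
	if (STRICT_FOLLOW_PFN) {
		snprintf(msg, len,
			 "%s[%d]: unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN",
			 comm, pid);
		return -EINVAL;
	}
	msg[0] = '\0';
	return 0;
}
```

With comm "qemu" and pid 1234 the message starts with "qemu[1234]:", which makes it easy to tell from the log which process hit the rejected path.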

> > diff --git a/security/Kconfig b/security/Kconfig
> > index 7561f6f99f1d..48945402e103 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
> >         If you wish for all usermode helper programs to be disabled,
> >         specify an empty string here (i.e. "").
> >
> > +config STRICT_FOLLOW_PFN
> > +     bool "Disable unsafe use of follow_pfn"
> > +     depends on MMU
>
> I would probably invert this to CONFIG_ALLOW_UNSAFE_FOLLOW_PFN,
> default n.

I've followed the few other CONFIG_STRICT_FOO options I've seen, which
are all explicit opt-ins and default to "do not break uapi, damn the
(security) bugs", which I think is how this should be done. It is in
the security section though, so hopefully competent distros will
enable all of this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 13/13] vfio/type1: Mark follow_pfn as unsafe
  2020-10-07 17:39     ` Jason Gunthorpe
  (?)
@ 2020-10-07 18:14       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 18:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Alex Williamson, Cornelia Huck

On Wed, Oct 7, 2020 at 7:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 06:44:26PM +0200, Daniel Vetter wrote:
> > The code seems to stuff these pfns into iommu pts (or something like
> > that, I didn't follow), but there's no mmu_notifier to ensure that
> > access is synchronized with pte updates.
> >
> > Hence mark these as unsafe. This means that with
> > CONFIG_STRICT_FOLLOW_PFN, these will be rejected.
> >
> > Real fix is to wire up an mmu_notifier ... somehow. Probably means any
> > invalidate is a fatal fault for this vfio device, but then this
> > shouldn't ever happen if userspace is reasonable.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: kvm@vger.kernel.org
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 5fbf0c1f7433..a4d53f3d0a35 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
> >  {
> >       int ret;
> >
> > -     ret = follow_pfn(vma, vaddr, pfn);
> > +     ret = unsafe_follow_pfn(vma, vaddr, pfn);
> >       if (ret) {
> >               bool unlocked = false;
> >
> > @@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
> >               if (ret)
> >                       return ret;
> >
> > -             ret = follow_pfn(vma, vaddr, pfn);
> > +             ret = unsafe_follow_pfn(vma, vaddr, pfn);
> >       }
>
> This is actually being commonly used, so it needs fixing.
>
> When I last talked to Alex about this, we had worked out a patch series
> that adds a test on vm_ops that the vma came from vfio in the first
> place. The VMAs created by VFIO are 'safe' as the PTEs are never changed.

Hm, but wouldn't that need the semi-nasty vma_open trick to make sure
the vma doesn't disappear prematurely? Or is the idea to look up the
underlying vfio object and refcount that directly?
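A rough sketch of the two ideas in this exchange, assuming heavily simplified stand-in types: identify a vfio-created vma by comparing its vm_ops pointer, then take a reference on the underlying object rather than relying on the vma staying around. All names here (`struct vma`, `struct vfio_obj`, vfio_vm_ops, vfio_obj_get_from_vma()) are hypothetical and are not the mm or vfio API; a real version would do the check under mmap_lock and use kref/refcount_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct vm_operations { int dummy; };
struct vfio_obj { int refcount; unsigned long base_pfn; };
struct vma {
	const struct vm_operations *vm_ops;
	void *vm_private_data;
};

/* Assumed: the vm_ops table vfio installs on its own mmaps. Comparing
 * against it is the "test on vm_ops" idea from the thread. */
static const struct vm_operations vfio_vm_ops = { 0 };

static bool vma_is_vfio(const struct vma *vma)
{
	return vma->vm_ops == &vfio_vm_ops;
}

/* Instead of trusting the vma (which can go away once the mmap lock is
 * dropped), look up the underlying object and take a reference on it. */
static struct vfio_obj *vfio_obj_get_from_vma(struct vma *vma)
{
	struct vfio_obj *obj;

	if (!vma_is_vfio(vma))
		return NULL; /* not ours: would fall back to the unsafe path */
	obj = vma->vm_private_data;
	obj->refcount++;     /* a real version would use kref/refcount_t */
	return obj;
}

static void vfio_obj_put(struct vfio_obj *obj)
{
	assert(obj->refcount > 0);
	obj->refcount--;
}
```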
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 18:41     ` Bjorn Helgaas
  -1 siblings, 0 replies; 166+ messages in thread
From: Bjorn Helgaas @ 2020-10-07 18:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci

Capitalize subject, like other patches in this series and previous
drivers/pci history.

On Wed, Oct 07, 2020 at 06:44:23PM +0200, Daniel Vetter wrote:
> Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> the region") /dev/kmem zaps ptes when the kernel requests exclusive
> access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> the default for all driver uses.
> 
> Except there are two more ways to access pci bars: sysfs and proc mmap
> support. Let's plug that hole.

s/pci/PCI/ in commit logs and comments.

> For revoke_devmem() to work we need to link our vma into the same
> address_space, with consistent vma->vm_pgoff. ->pgoff is already
> adjusted, because that's how (io_)remap_pfn_range works, but for the
> mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> at ->open time, but that's a bit tricky here with all the entry points
> and arch code. So instead create a fake file and adjust vma->vm_file.
> 
> Note this only works for ARCH_GENERIC_PCI_MMAP_RESOURCE. But that
> seems to be a subset of the architectures supporting STRICT_DEVMEM, so we
> should be good.
> 
> The only difference in access checks left is that sysfs pci mmap does
> not check for CAP_SYS_RAWIO. But I think that makes some sense compared to
> /dev/mem and proc, where one file gives you access to everything and
> no ownership applies.
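A toy model of why the patch needs the vma linked into the devmem address_space: a revoke walks only the mappings of one address_space (as unmap_mapping_range() walks inode->i_mapping), so a mapping of the same BAR hooked into a different address_space is never zapped. The types and revoke_range() are hypothetical, and offsets are in pages for simplicity, whereas unmap_mapping_range() takes byte offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address_space and the mappings linked to it. */
struct address_space { int id; };

struct mapping {
	const struct address_space *as;
	unsigned long pgoff;  /* first page, like vma->vm_pgoff */
	unsigned long npages;
	bool live;
};

/* Zap every live mapping of @as overlapping [pgoff, pgoff + npages).
 * Mappings linked into any other address_space are left untouched. */
static size_t revoke_range(struct mapping *maps, size_t n,
			   const struct address_space *as,
			   unsigned long pgoff, unsigned long npages)
{
	size_t zapped = 0;

	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct mapping *m = &maps[i];

		if (!m->live || m->as != as)
			continue;
		if (m->pgoff < pgoff + npages && pgoff < m->pgoff + m->npages) {
			m->live = false;
			zapped++;
		}
	}
	return zapped;
}
```

In this model, a sysfs or proc mapping of a BAR that still points at its own file's address_space survives a devmem revoke, which is the hole the patch closes by swapping in a file whose f_mapping is the devmem one.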

> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -810,6 +810,7 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
>  }
>  
>  static struct inode *devmem_inode;
> +static struct vfsmount *devmem_vfs_mount;
>  
>  #ifdef CONFIG_IO_STRICT_DEVMEM
>  void revoke_devmem(struct resource *res)
> @@ -843,6 +844,20 @@ void revoke_devmem(struct resource *res)
>  
>  	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
>  }
> +
> +struct file *devmem_getfile(void)
> +{
> +	struct file *file;
> +
> +	file = alloc_file_pseudo(devmem_inode, devmem_vfs_mount, "devmem",
> +				 O_RDWR, &kmem_fops);
> +	if (IS_ERR(file))
> +		return NULL;
> +
> +	file->f_mapping = devmem_indoe->i_mapping;

"devmem_indoe"?  Obviously not compiled, I guess?

> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -304,8 +304,10 @@ struct resource *request_free_mem_region(struct resource *base,
>  
>  #ifdef CONFIG_IO_STRICT_DEVMEM
>  void revoke_devmem(struct resource *res);
> +struct file *devm_getfile(void);
>  #else
>  static inline void revoke_devmem(struct resource *res) { };
> +static inline struct file *devmem_getfile(void) { return NULL; };

I guess these names are supposed to match?

>  #endif
>  
>  #endif /* __ASSEMBLY__ */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 09/13] PCI: obey iomem restrictions for procfs mmap
  2020-10-07 16:44   ` Daniel Vetter
  (?)
  (?)
@ 2020-10-07 18:46     ` Bjorn Helgaas
  -1 siblings, 0 replies; 166+ messages in thread
From: Bjorn Helgaas @ 2020-10-07 18:46 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci

On Wed, Oct 07, 2020 at 06:44:22PM +0200, Daniel Vetter wrote:
> There are three ways to access pci bars from userspace: /dev/mem, sysfs
> files, and the old proc interface. Two check against
> iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> this starts to matter, since we don't want random userspace having
> access to pci bars while a driver is loaded and using it.
> 
> Fix this.

Please mention *how* you're fixing this.  I know you can sort of
deduce it from the first paragraph, but it's easy to save readers the
trouble.

s/pci/PCI/
s/bars/BARs/
Capitalize subject to match other patches.

> References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/pci/proc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index d35186b01d98..3a2f90beb4cb 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
>  		else
>  			return -EINVAL;
>  	}
> +
> +	if (dev->resource[i].flags & IORESOURCE_MEM &&
> +	    iomem_is_exclusive(dev->resource[i].start))
> +		return -EINVAL;
> +
>  	ret = pci_mmap_page_range(dev, i, vma,
>  				  fpriv->mmap_state, write_combine);
>  	if (ret < 0)
> -- 
> 2.28.0
> 
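
A minimal userspace model of the check this patch adds, to show the intended behaviour in isolation. It is not kernel code: struct model_resource, MODEL_IORESOURCE_MEM, model_iomem_is_exclusive() and model_proc_mmap_allowed() are hypothetical stand-ins, and the real iomem_is_exclusive() walks the iomem resource tree rather than a flat array. The rule it sketches: a memory BAR whose start address lies in an exclusively claimed range is refused (the patch returns -EINVAL), while I/O BARs and idle ranges are let through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct resource and flags. */
#define MODEL_IORESOURCE_MEM 0x1u

struct model_resource {
	uint64_t start;
	uint64_t end;          /* inclusive, like struct resource */
	unsigned int flags;
	bool exclusive;        /* set when a driver claimed the range exclusively */
};

/* Model of iomem_is_exclusive(): is addr inside an exclusively claimed range? */
static bool model_iomem_is_exclusive(const struct model_resource *res,
				     size_t nres, uint64_t addr)
{
	for (size_t i = 0; i < nres; i++) {
		assert(res[i].start <= res[i].end);
		if (res[i].exclusive && addr >= res[i].start && addr <= res[i].end)
			return true;
	}
	return false;
}

/* Model of the check added to proc_bus_pci_mmap(): refuse the mmap if a
 * memory BAR starts inside an exclusively claimed iomem range. */
static bool model_proc_mmap_allowed(const struct model_resource *bar,
				    const struct model_resource *iomem,
				    size_t niomem)
{
	if ((bar->flags & MODEL_IORESOURCE_MEM) &&
	    model_iomem_is_exclusive(iomem, niomem, bar->start))
		return false;
	return true;
}
```

The same predicate is what /dev/mem and sysfs already consult, which is the point of the patch: all three paths should agree.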

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 13/13] vfio/type1: Mark follow_pfn as unsafe
  2020-10-07 18:14       ` Daniel Vetter
  (?)
@ 2020-10-07 18:47         ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 18:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Alex Williamson, Cornelia Huck

On Wed, Oct 07, 2020 at 08:14:06PM +0200, Daniel Vetter wrote:

> Hm, but wouldn't that need the semi-nasty vma_open trick to make sure
> that the vma doesn't disappear prematurely? Or is the idea to look up the
> underlying vfio object, and refcount that directly?

Ah, the patches Alex was working on had the refcount, I think; it does
need coordination across multiple VFIO instances, IIRC.

At least a simple check would guarantee we only have exposed PCI BAR
pages, which is not as bad security-wise as the other stuff.
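
One possible reading of the "simple check" above, sketched as a userspace model under stated assumptions: before trusting a pfn obtained from a page-table walk, confirm it falls inside one of the device's BARs. struct model_bar and model_pfn_is_bar_page() are hypothetical and not taken from Alex's patches; the check does not close the race, it only limits what a stale pfn can point at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one BAR, in page-frame units. */
struct model_bar {
	uint64_t start_pfn;
	uint64_t npages;
};

/* Accept a pfn from a (racy) page-table walk only if it lies inside one of
 * this device's BARs, so a stale pfn is confined to BAR pages rather than
 * arbitrary, possibly reused, system memory. */
static bool model_pfn_is_bar_page(const struct model_bar *bars, size_t nbars,
				  uint64_t pfn)
{
	for (size_t i = 0; i < nbars; i++) {
		assert(bars[i].npages > 0);
		if (pfn >= bars[i].start_pfn &&
		    pfn - bars[i].start_pfn < bars[i].npages)
			return true;
	}
	return false;
}
```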

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 11/13] mm: add unsafe_follow_pfn
  2020-10-07 18:10       ` Daniel Vetter
  (?)
@ 2020-10-07 19:00         ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 19:00 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara

On Wed, Oct 07, 2020 at 08:10:34PM +0200, Daniel Vetter wrote:
> On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> > > Way back it was a reasonable assumption that iomem mappings never
> > > change the pfn range they point at. But this has changed:
> > >
> > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > ptes with unmap_mapping_range when buffers get moved
> > >
> > > - contiguous dma allocations have moved from dedicated carveouts to
> > > cma regions. This means if we miss the unmap the pfn might contain
> > > pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
> > >
> > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > >
> > > Accessing pfns obtained from ptes without holding all the locks is
> > > therefore no longer a good idea.
> > >
> > > Unfortunately there are some users where this is not fixable (like v4l
> > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > iommu). For now annotate these as unsafe and splat appropriately.
> > >
> > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > roll out to all appropriate places.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: kvm@vger.kernel.org
> > >  include/linux/mm.h |  2 ++
> > >  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
> > >  mm/nommu.c         | 17 +++++++++++++++++
> > >  security/Kconfig   | 13 +++++++++++++
> > >  4 files changed, 63 insertions(+), 1 deletion(-)
> >
> > Makes sense to me.
> >
> > I wonder if we could change the original follow_pfn to require the
> > ptep and then lockdep_assert_held() it against the page table lock?
> 
> The safe variant with the pagetable lock is follow_pte_pmd. The only
> way to make follow_pfn safe is if you have an mmu notifier and
> corresponding retry logic. That is not covered by lockdep (it would
> splat if we annotate the retry side), so I'm not sure how you'd check
> for that?

Right OK.

> Checking for ptep lock doesn't work here, since the one leftover safe
> user of this (kvm) doesn't need that at all, because it has the mmu
> notifier.

Ah, so a better name and/or function kdoc for follow_pfn is probably a
good idea in this patch as well.

> So I think we're as good as it gets, since I really have no idea how
> to make sure follow_pfn callers do have an mmu notifier registered.

Yah, can't be done. Most mmu notifier users should be using
hmm_range_fault anyhow, kvm is really very special here.
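
The "mmu notifier and corresponding retry logic" mentioned above can be sketched as a sequence-count pattern, similar in spirit to what hmm_range_fault() users do with mmu_interval_read_begin()/mmu_interval_read_retry(). Everything below is a hypothetical single-threaded model (struct model_mirror and the model_* helpers are invented for illustration): a looked-up pfn is only committed if no invalidation bumped the sequence count in between; otherwise the lookup is redone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one mirrored mapping: a pfn plus an invalidation
 * sequence count that the notifier callback bumps on every change. */
struct model_mirror {
	uint64_t pfn;          /* what the CPU page table currently points at */
	unsigned long seq;     /* bumped by the invalidate callback */
};

/* Modelled mmu notifier callback: the range was zapped or moved. */
static void model_invalidate(struct model_mirror *m, uint64_t new_pfn)
{
	m->seq++;
	m->pfn = new_pfn;
}

static unsigned long model_read_begin(const struct model_mirror *m)
{
	return m->seq;
}

static bool model_read_retry(const struct model_mirror *m, unsigned long seq)
{
	return m->seq != seq;
}

/* Hypothetical hook standing in for an invalidation that lands between
 * the lookup and the commit. */
typedef void (*model_race_fn)(struct model_mirror *m, int attempt);

/* Example race: invalidate once, during the first attempt only. */
static void model_race_once(struct model_mirror *m, int attempt)
{
	if (attempt == 1)
		model_invalidate(m, 200);
}

/* Look up the pfn, then only commit it (e.g. into a secondary MMU) if no
 * invalidation happened in between. In a real driver the retry check and
 * the commit are done under a lock the invalidate callback also takes. */
static uint64_t model_map_pfn(struct model_mirror *m, model_race_fn race,
			      int *attempts)
{
	assert(attempts != NULL);
	for (*attempts = 1;; (*attempts)++) {
		unsigned long seq = model_read_begin(m);
		uint64_t pfn = m->pfn;      /* the follow_pfn()-style lookup */

		if (race)
			race(m, *attempts);
		if (model_read_retry(m, seq))
			continue;           /* pfn may be stale: start over */
		return pfn;
	}
}
```

Without a race the first lookup is committed; with model_race_once() the stale pfn is discarded and the second attempt returns the new one.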
 
> I've followed the few other CONFIG_STRICT_FOO I've seen, which are all
> explicit enables and default to "do not break uapi, damn the
> (security) bugs". Which is I think how this should be done. It is in
> the security section though, so hopefully competent distros will
> enable this all.

I thought the strict ones were more general and less clear security
worries, not bugs like this.

This is "allow a user-triggerable use-after-free bug to exist in the
kernel"
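
The trade-off in this exchange, a default-off strict mode versus treating the unsafe path as a bug, can be sketched as a wrapper that refuses in strict mode and otherwise warns once and proceeds. This is a userspace model only: model_strict_mode is a hypothetical runtime stand-in for whatever Kconfig option the series ends up adding, and model_follow_pfn() is a fake lookup, not the kernel's follow_pfn().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical runtime stand-in for an opt-in strict Kconfig switch;
 * off by default so existing uapi keeps working. */
static bool model_strict_mode;
static bool model_warned;

/* Fake pfn lookup: pretend 4 KiB pages and an identity mapping. */
static int model_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	*pfn = addr >> 12;
	return 0;
}

/* Sketch of an "unsafe" wrapper: refuse in strict mode, otherwise warn
 * once (a WARN_ONCE()-style splat in the kernel) and fall through. */
static int model_unsafe_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	assert(pfn != NULL);
	if (model_strict_mode)
		return -EINVAL;
	if (!model_warned) {
		model_warned = true;
		fprintf(stderr, "unsafe follow_pfn usage\n");
	}
	return model_follow_pfn(addr, pfn);
}
```

Jason's objection amounts to asking which default model_strict_mode should have when the non-strict path is a user-triggerable use-after-free.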

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 18:41     ` Bjorn Helgaas
  (?)
@ 2020-10-07 19:24       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 19:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Kees Cook,
	Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, Linux PCI

On Wed, Oct 7, 2020 at 8:41 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> Capitalize subject, like other patches in this series and previous
> drivers/pci history.
>
> On Wed, Oct 07, 2020 at 06:44:23PM +0200, Daniel Vetter wrote:
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > the default for all driver uses.
> >
> > Except there are two more ways to access pci bars: sysfs and proc mmap
> > support. Let's plug that hole.
>
> s/pci/PCI/ in commit logs and comments.
>
> > For revoke_devmem() to work we need to link our vma into the same
> > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > at ->open time, but that's a bit tricky here with all the entry points
> > and arch code. So instead create a fake file and adjust vma->vm_file.
> >
> > Note this only works for ARCH_GENERIC_PCI_MMAP_RESOURCE. But that
> > seems to be a subset of architectures that support STRICT_DEVMEM, so we
> > should be good.
> >
> > The only difference in access checks left is that sysfs pci mmap does
> > not check for CAP_SYS_RAWIO. But I think that makes some sense compared to
> > /dev/mem and proc, where one file gives you access to everything and
> > no ownership applies.
>
> > --- a/drivers/char/mem.c
> > +++ b/drivers/char/mem.c
> > @@ -810,6 +810,7 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
> >  }
> >
> >  static struct inode *devmem_inode;
> > +static struct vfsmount *devmem_vfs_mount;
> >
> >  #ifdef CONFIG_IO_STRICT_DEVMEM
> >  void revoke_devmem(struct resource *res)
> > @@ -843,6 +844,20 @@ void revoke_devmem(struct resource *res)
> >
> >       unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
> >  }
> > +
> > +struct file *devmem_getfile(void)
> > +{
> > +     struct file *file;
> > +
> > +     file = alloc_file_pseudo(devmem_inode, devmem_vfs_mount, "devmem",
> > +                              O_RDWR, &kmem_fops);
> > +     if (IS_ERR(file))
> > +             return NULL;
> > +
> > +     file->f_mapping = devmem_indoe->i_mapping;
>
> "devmem_indoe"?  Obviously not compiled, I guess?

Yeah, apologies, I forgot to compile this with CONFIG_IO_STRICT_DEVMEM
set. The entire series is more of an RFC about the overall problem, really; I
also need to figure out how to even test this somehow. I guess there's
nothing really ready-made here?
-Daniel

> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -304,8 +304,10 @@ struct resource *request_free_mem_region(struct resource *base,
> >
> >  #ifdef CONFIG_IO_STRICT_DEVMEM
> >  void revoke_devmem(struct resource *res);
> > +struct file *devm_getfile(void);
> >  #else
> >  static inline void revoke_devmem(struct resource *res) { };
> > +static inline struct file *devmem_getfile(void) { return NULL; };
>
> I guess these names are supposed to match?
>
> >  #endif
> >
> >  #endif /* __ASSEMBLY__ */
> > --
> > 2.28.0
> >
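
Why the vma has to be linked into the devmem address_space can be shown with a small userspace model: revocation only finds mappings linked into the address space it walks, so a sysfs or proc mapping left on its own file's mapping would survive revoke_devmem(). struct model_vma, struct model_mapping and the model_* helpers are hypothetical stand-ins for vm_area_struct, address_space and unmap_mapping_range(); phys_start plays the role of vm_pgoff, which for /dev/mem is the physical page offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_VMAS 8

/* Hypothetical userspace mapping: a physical range plus a "present" bit
 * standing in for the installed ptes. */
struct model_vma {
	uint64_t phys_start;   /* role of vm_pgoff for /dev/mem */
	uint64_t len;
	bool present;
};

/* Hypothetical address_space: revocation can only find vmas linked here. */
struct model_mapping {
	struct model_vma *vmas[MODEL_MAX_VMAS];
	size_t nr;
};

static void model_link_vma(struct model_mapping *as, struct model_vma *vma)
{
	assert(as->nr < MODEL_MAX_VMAS);
	as->vmas[as->nr++] = vma;
	vma->present = true;
}

/* Model of unmap_mapping_range() as used by revoke_devmem(): zap every
 * linked vma that overlaps [start, start + len). */
static void model_revoke(struct model_mapping *as, uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < as->nr; i++) {
		struct model_vma *v = as->vmas[i];

		if (v->phys_start < start + len && start < v->phys_start + v->len)
			v->present = false;
	}
}
```

A mapping of the same BAR that stays on a private mapping keeps its ptes after the revoke, which is the hole the fake devmem file is meant to close.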



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 19:33     ` Dan Williams
  -1 siblings, 0 replies; 166+ messages in thread
From: Dan Williams @ 2020-10-07 19:33 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, Linux-media@vger.kernel.org, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci

On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> the region") /dev/kmem zaps ptes when the kernel requests exclusive
> access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> the default for all driver uses.
>
> Except there's two more ways to access pci bars: sysfs and proc mmap
> support. Let's plug that hole.

Ooh, yes, let's.

>
> For revoke_devmem() to work we need to link our vma into the same
> address_space, with consistent vma->vm_pgoff. ->pgoff is already
> adjusted, because that's how (io_)remap_pfn_range works, but for the
> mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> at ->open time, but that's a bit tricky here with all the entry points
> and arch code. So instead create a fake file and adjust vma->vm_file.

I don't think you want to share the devmem inode for this. This should
be based off the sysfs inode, of which I believe there is already only
one instance per resource. In contrast, /dev/mem can have multiple
inodes because anyone can just mknod a new character device file; the
same problem does not exist for sysfs.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 11/13] mm: add unsafe_follow_pfn
  2020-10-07 19:00         ` Jason Gunthorpe
  (?)
@ 2020-10-07 19:38           ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 19:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara

On Wed, Oct 7, 2020 at 9:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 08:10:34PM +0200, Daniel Vetter wrote:
> > On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> > > > Way back it was a reasonable assumption that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carveouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Jan Kara <jack@suse.cz>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: kvm@vger.kernel.org
> > > >  include/linux/mm.h |  2 ++
> > > >  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
> > > >  mm/nommu.c         | 17 +++++++++++++++++
> > > >  security/Kconfig   | 13 +++++++++++++
> > > >  4 files changed, 63 insertions(+), 1 deletion(-)
> > >
> > > Makes sense to me.
> > >
> > > I wonder if we could change the original follow_pfn to require the
> > > ptep and then lockdep_assert_held() it against the page table lock?
> >
> > The safe variant with the pagetable lock is follow_pte_pmd. The only
> > way to make follow_pfn safe is if you have an mmu notifier and
> > corresponding retry logic. That is not covered by lockdep (it would
> > splat if we annotate the retry side), so I'm not sure how you'd check
> > for that?
>
> Right OK.
>
> > Checking for ptep lock doesn't work here, since the one leftover safe
> > user of this (kvm) doesn't need that at all, because it has the mmu
> > notifier.
>
> Ah, so a better name and/or function kdoc for follow_pfn is probably a
> good idea in this patch as well.

I already changed it to mention that you need an mmu notifier, and
that follow_pte_pmd (safe, under the page table lock) or
unsafe_follow_pfn are the alternatives. Do you want more or something
else here?

Note that I left the kerneldoc for the nommu.c case unchanged, since
without an mmu all bets are off anyway.

> > So I think we're as good as it gets, since I really have no idea how
> > to make sure follow_pfn callers do have an mmu notifier registered.
>
> Yah, can't be done. Most mmu notifier users should be using
> hmm_range_fault anyhow, kvm is really very special here.

We could pass an mmu notifier to follow_pfn and check that it has a
registration for vma->vm_mm, but that feels like overkill when kvm is
the only legit user for this.
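The "mmu notifier plus retry logic" requirement can be sketched as a sequence-count check. This is a toy, single-threaded userspace model. It is only loosely patterned on the read_begin/read_retry style of the kernel's mmu_interval_notifier helpers (mmu_interval_read_begin() / mmu_interval_read_retry()), and all toy_* names are hypothetical. The point it shows: a pfn looked up without locks may only be used if no invalidation ran since the lookup started; otherwise the caller must look it up again. Real users also do the final check and the use under a lock that the invalidation callback takes, which this model omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical names, not the kernel mmu_notifier API). */
struct toy_mm {
	unsigned long invalidate_seq;	/* bumped by every invalidation */
	unsigned long pte_pfn;		/* what the "page table" points at */
};

/* Invalidation: move the pte to a new pfn and bump the sequence. */
static void toy_invalidate(struct toy_mm *mm, unsigned long new_pfn)
{
	mm->pte_pfn = new_pfn;
	mm->invalidate_seq++;
}

static unsigned long toy_read_begin(const struct toy_mm *mm)
{
	return mm->invalidate_seq;
}

static bool toy_read_retry(const struct toy_mm *mm, unsigned long seq)
{
	return mm->invalidate_seq != seq;
}

/*
 * follow_pfn-style lookup made safe by retrying: if an invalidation
 * happened between read_begin and the check, the pfn may be stale, so
 * look it up again. inject_race simulates one racing invalidation
 * right after the first unlocked lookup.
 */
static unsigned long toy_get_pfn(struct toy_mm *mm, bool inject_race,
				 unsigned long racing_pfn, int *tries)
{
	unsigned long seq, pfn;

	*tries = 0;
	do {
		seq = toy_read_begin(mm);
		pfn = mm->pte_pfn;		/* the unlocked "follow_pfn" */
		(*tries)++;
		if (inject_race && *tries == 1)
			toy_invalidate(mm, racing_pfn);
	} while (toy_read_retry(mm, seq));

	return pfn;
}
```

Without a race the first lookup is used. With one, the stale pfn is discarded and the second attempt returns the new pfn. Lockdep has no way to see that a caller follows this protocol, which is why the check cannot be enforced inside follow_pfn itself.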

> > I've followed the few other CONFIG_STRICT_FOO I've seen, which are all
> > explicit enables and default to "do not break uapi, damn the
> > (security) bugs". Which is I think how this should be done. It is in
> > the security section though, so hopefully competent distros will
> > enable this all.
>
> I thought the strict ones were more general and less clear security
> worries, not bugs like this.
>
> This is "allow a user triggerable use after free bug to exist in the
> kernel"

Since at most you get at __GFP_MOVABLE memory, I'm not sure you can use
this to compromise the kernel. The best way is probably to somehow get
a gpu pagetable into your pfn and then use that to access arbitrary
memory, but there's still an iommu. I think exploiting this is going to
be very tricky, and pretty much has to be device- or driver-specific.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 19:33     ` Dan Williams
  (?)
@ 2020-10-07 19:47       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 19:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, Linux-media@vger.kernel.org, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Wed, Oct 7, 2020 at 9:33 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > the default for all driver uses.
> >
> > Except there's two more ways to access pci bars: sysfs and proc mmap
> > support. Let's plug that hole.
>
> Ooh, yes, lets.
>
> > For revoke_devmem() to work we need to link our vma into the same
> > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > at ->open time, but that's a bit tricky here with all the entry points
> > and arch code. So instead create a fake file and adjust vma->vm_file.
>
> I don't think you want to share the devmem inode for this, this should
> be based off the sysfs inode which I believe there is already only one
> instance per resource. In contrast /dev/mem can have multiple inodes
> because anyone can just mknod a new character device file, the same
> problem does not exist for sysfs.

But then I need to find the right one, plus I also need to find the
right one for the procfs side. That gets messy, and I already have no
idea how to really test this. A shared address_space is the same trick
we're using in drm (where we have multiple things all pointing to the
same underlying resources through different files), and it gets the
job done. That's why I figured the shared address_space is the cleaner
solution: unmap_mapping_range then takes care of iterating over all
vmas for us. I guess I could reimplement that logic with our own
locking and everything in revoke_devmem, but that feels a bit silly.
On the other hand, it would also solve the problem of having multiple
different mknods of /dev/kmem with different address_spaces behind
them. Also, because of how remap_pfn_range works, all these vmas
already use the same pgoff anyway.
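The shared address_space argument can be sketched with a toy userspace model. The toy_* names are hypothetical; this is not kernel code. Vmas created through /dev/mem, sysfs and procfs are all linked into one address_space. A single range revoke then zaps every vma that overlaps the claimed range, which only works because all vmas agree on what a pgoff means. The model uses page offsets. The real unmap_mapping_range() takes a byte offset and length, as in the quoted revoke_devmem() call with res->start and resource_size(res).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VMAS 8

/* Toy vma: covers nr_pages pages starting at page offset vm_pgoff. */
struct toy_vma {
	unsigned long vm_pgoff;
	unsigned long nr_pages;
	bool zapped;
};

/* One shared address_space that vmas from every entry point join. */
struct toy_address_space {
	struct toy_vma *vmas[TOY_MAX_VMAS];
	size_t nr;
};

static void toy_link_vma(struct toy_address_space *as, struct toy_vma *vma)
{
	assert(as->nr < TOY_MAX_VMAS);
	as->vmas[as->nr++] = vma;
}

/*
 * Similar in spirit to unmap_mapping_range(): zap every linked vma whose
 * [vm_pgoff, vm_pgoff + nr_pages) overlaps [first, first + count).
 */
static void toy_unmap_range(struct toy_address_space *as,
			    unsigned long first, unsigned long count)
{
	for (size_t i = 0; i < as->nr; i++) {
		struct toy_vma *v = as->vmas[i];

		if (v->vm_pgoff < first + count &&
		    first < v->vm_pgoff + v->nr_pages)
			v->zapped = true;
	}
}
```

For example, with a /dev/mem vma at pgoff 0x100 (4 pages), a sysfs vma at 0x102 (2 pages) and a procfs vma at 0x200 (1 page), revoking pages 0x101-0x102 zaps the first two and leaves the third alone. The caller never has to know which file each vma came from.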
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 01/13] drm/exynos: Stop using frame_vector helpers
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 20:32     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 20:32 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Inki Dae,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Kukjin Kim,
	Krzysztof Kozlowski, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> All we need is a pages array, and pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> ---
>   drivers/gpu/drm/exynos/Kconfig          |  1 -
>   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++++++++++-------------
>   2 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
> index 6417f374b923..43257ef3c09d 100644
> --- a/drivers/gpu/drm/exynos/Kconfig
> +++ b/drivers/gpu/drm/exynos/Kconfig
> @@ -88,7 +88,6 @@ comment "Sub-drivers"
>   config DRM_EXYNOS_G2D
>   	bool "G2D"
>   	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
> -	select FRAME_VECTOR
>   	help
>   	  Choose this option if you want to use Exynos G2D for DRM.
>   
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index 967a5cdc120e..c83f6faac9de 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
>   	dma_addr_t		dma_addr;
>   	unsigned long		userptr;
>   	unsigned long		size;
> -	struct frame_vector	*vec;
> +	struct page		**pages;
> +	unsigned int		npages;
>   	struct sg_table		*sgt;
>   	atomic_t		refcount;
>   	bool			in_pool;
> @@ -378,7 +379,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
>   					bool force)
>   {
>   	struct g2d_cmdlist_userptr *g2d_userptr = obj;
> -	struct page **pages;
> +	int i;

The above line can also be deleted; see below.

>   
>   	if (!obj)
>   		return;
> @@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
>   	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
>   			  DMA_BIDIRECTIONAL, 0);
>   
> -	pages = frame_vector_pages(g2d_userptr->vec);
> -	if (!IS_ERR(pages)) {
> -		int i;
> +	for (i = 0; i < g2d_userptr->npages; i++)
> +		set_page_dirty_lock(g2d_userptr->pages[i]);
>   
> -		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
> -			set_page_dirty_lock(pages[i]);
> -	}
> -	put_vaddr_frames(g2d_userptr->vec);
> -	frame_vector_destroy(g2d_userptr->vec);
> +	unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
> +	kvfree(g2d_userptr->pages);

You can avoid writing your own loop, and just simplify the whole thing down to
two lines:

	unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
				    true);
	kvfree(g2d_userptr->pages);


>   
>   	if (!g2d_userptr->out_of_list)
>   		list_del_init(&g2d_userptr->list);
> @@ -474,35 +471,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
>   	offset = userptr & ~PAGE_MASK;
>   	end = PAGE_ALIGN(userptr + size);
>   	npages = (end - start) >> PAGE_SHIFT;
> -	g2d_userptr->vec = frame_vector_create(npages);
> -	if (!g2d_userptr->vec) {
> +	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
> +					    GFP_KERNEL);
> +	if (!g2d_userptr->pages) {
>   		ret = -ENOMEM;
>   		goto err_free;
>   	}
>   
> -	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -		g2d_userptr->vec);
> +	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +				  g2d_userptr->pages);
>   	if (ret != npages) {
>   		DRM_DEV_ERROR(g2d->dev,
>   			      "failed to get user pages from userptr.\n");
>   		if (ret < 0)
> -			goto err_destroy_framevec;
> -		ret = -EFAULT;
> -		goto err_put_framevec;
> -	}
> -	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
> +			goto err_destroy_pages;
> +		npages = ret;
>   		ret = -EFAULT;
> -		goto err_put_framevec;
> +		goto err_unpin_pages;
>   	}
> +	g2d_userptr->npages = npages;
>   
>   	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
>   	if (!sgt) {
>   		ret = -ENOMEM;
> -		goto err_put_framevec;
> +		goto err_unpin_pages;
>   	}
>   
>   	ret = sg_alloc_table_from_pages(sgt,
> -					frame_vector_pages(g2d_userptr->vec),
> +					g2d_userptr->pages,
>   					npages, offset, size, GFP_KERNEL);
>   	if (ret < 0) {
>   		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
> @@ -538,11 +534,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
>   err_free_sgt:
>   	kfree(sgt);
>   
> -err_put_framevec:
> -	put_vaddr_frames(g2d_userptr->vec);
> +err_unpin_pages:
> +	unpin_user_pages(g2d_userptr->pages, npages);
>   
> -err_destroy_framevec:
> -	frame_vector_destroy(g2d_userptr->vec);
> +err_destroy_pages:
> +	kvfree(g2d_userptr->pages);
>   
>   err_free:
>   	kfree(g2d_userptr);
> 
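A userspace model of the error unwinding in the hunk above (not kernel code; model_pin_pages() is a hypothetical stand-in for pin_user_pages_fast(), and it "faults" at a chosen index). The property it illustrates is that a short pin returns a positive count below npages with only that many pages pinned, so the unwind must unpin exactly ret pages. That is why the patch sets npages = ret before jumping to err_unpin_pages:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct page; only tracks the pin count. */
struct model_page {
	int pincount;
};

/* Hypothetical model of pin_user_pages_fast(): pins pages in order but
 * stops at index fail_at, so it may pin fewer than npages. */
static int model_pin_pages(struct model_page *pages, int npages, int fail_at)
{
	int i;

	for (i = 0; i < npages && i != fail_at; i++)
		pages[i].pincount++;
	if (i == 0 && npages > 0)
		return -EFAULT;	/* nothing pinned: report an error */
	return i;		/* possibly fewer than npages */
}

static void model_unpin_pages(struct model_page *pages, int npages)
{
	for (int i = 0; i < npages; i++) {
		assert(pages[i].pincount > 0);
		pages[i].pincount--;
	}
}

/* Mirrors the patch's all-or-nothing handling of a short pin. */
static int model_get_userptr(struct model_page *pages, int npages, int fail_at)
{
	int ret = model_pin_pages(pages, npages, fail_at);

	if (ret != npages) {
		if (ret < 0)
			return ret;	/* nothing pinned, nothing to undo */
		npages = ret;		/* only the first ret pages are pinned */
		model_unpin_pages(pages, npages);
		return -EFAULT;
	}
	return 0;
}
```

After a failed call every pin count is back to zero; unpinning the full original npages instead would trip the assertion in model_unpin_pages().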

The rest all looks good; you've avoided the usual API pitfalls. :)

thanks,
-- 
John Hubbard
NVIDIA

* Re: [PATCH 03/13] misc/habana: Stop using frame_vector helpers
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 20:38     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 20:38 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams, Oded Gabbay,
	Omer Shpigelman, Ofir Bitton, Tomer Tayar, Moti Haimovski,
	Greg Kroah-Hartman, Pawel Piskorski

On 10/7/20 9:44 AM, Daniel Vetter wrote:
...
> @@ -1414,15 +1410,10 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>   							userptr->sgt->nents,
>   							userptr->dir);
>   
> -	pages = frame_vector_pages(userptr->vec);
> -	if (!IS_ERR(pages)) {
> -		int i;
> -
> -		for (i = 0; i < frame_vector_count(userptr->vec); i++)
> -			set_page_dirty_lock(pages[i]);
> -	}
> -	put_vaddr_frames(userptr->vec);
> -	frame_vector_destroy(userptr->vec);
> +	for (i = 0; i < userptr->npages; i++)
> +		set_page_dirty_lock(userptr->pages[i]);
> +	unpin_user_pages(userptr->pages, userptr->npages);
> +	kvfree(userptr->pages);

Same thing here as in patch 1: you can further simplify by using
unpin_user_pages_dirty_lock().

>   
>   	list_del(&userptr->job_node);
>   
> 

thanks,
-- 
John Hubbard
NVIDIA

* Re: [PATCH 02/13] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 20:43     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 20:43 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Inki Dae,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Kukjin Kim,
	Krzysztof Kozlowski, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> The exynos g2d interface is very unusual, but it looks like the
> userptr objects are persistent. Hence they need FOLL_LONGTERM.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> ---
>   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index c83f6faac9de..514fd000feb1 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -478,7 +478,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
>   		goto err_free;
>   	}
>   
> -	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +	ret = pin_user_pages_fast(start, npages,
> +				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
>   				  g2d_userptr->pages);
>   	if (ret != npages) {
>   		DRM_DEV_ERROR(g2d->dev,
> 

Looks good from a pin_user_pages_fast() point of view. I'm of course not an Exynos
developer, so we still need a look from one of those, ideally, but:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

* Re: [PATCH 02/13] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
@ 2020-10-07 20:43     ` John Hubbard
  0 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 20:43 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Joonyoung Shim, kvm,
	Jason Gunthorpe, Seung-Woo Kim, Jérôme Glisse,
	Krzysztof Kozlowski, linux-mm, Kyungmin Park, Kukjin Kim,
	Daniel Vetter, Andrew Morton, Dan Williams, linux-arm-kernel,
	linux-media

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> The exynos g2d interface is very unusual, but it looks like the
> userptr objects are persistent. Hence they need FOLL_LONGTERM.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> ---
>   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index c83f6faac9de..514fd000feb1 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -478,7 +478,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
>   		goto err_free;
>   	}
>   
> -	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +	ret = pin_user_pages_fast(start, npages,
> +				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
>   				  g2d_userptr->pages);
>   	if (ret != npages) {
>   		DRM_DEV_ERROR(g2d->dev,
> 

Looks good from a pin_user_pages_fast() point of view. I'm of course not a exynos
developer, so we still need a look from one of those, ideally, but:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 04/13] misc/habana: Use FOLL_LONGTERM for userptr
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 20:46     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 20:46 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams, Oded Gabbay,
	Omer Shpigelman, Ofir Bitton, Tomer Tayar, Moti Haimovski,
	Greg Kroah-Hartman, Pawel Piskorski

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> These are persistent, not just for the duration of a dma operation.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Oded Gabbay <oded.gabbay@gmail.com>
> Cc: Omer Shpigelman <oshpigelman@habana.ai>
> Cc: Ofir Bitton <obitton@habana.ai>
> Cc: Tomer Tayar <ttayar@habana.ai>
> Cc: Moti Haimovski <mhaimovski@habana.ai>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pawel Piskorski <ppiskorski@habana.ai>
> ---
>   drivers/misc/habanalabs/common/memory.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index ef89cfa2f95a..94bef8faa82a 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
>   		return -ENOMEM;
>   	}
>   
> -	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +	rc = pin_user_pages_fast(start, npages,
> +				 FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
>   				 userptr->pages);
>   
>   	if (rc != npages) {
> 

Again, this is from a pin_user_pages_fast() point of view, and I'm not at all familiar
with the habana driver (but its use of this really does seem clearly _LONGTERM!):

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA
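
The `if (rc != npages)` check in the hunk exists because pin_user_pages_fast() can pin fewer pages than requested. Below is a minimal userspace sketch of an all-or-nothing wrapper, assuming the caller must release a partial pin before failing. `struct model_pins`, `model_pin()`, `model_unpin()` and `model_get_user_memory()` are hypothetical stand-ins for pin_user_pages_fast() and unpin_user_pages(). The habana cleanup path itself is not part of the quoted hunk, so this models the pattern only.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_PAGES 16

/* Hypothetical pin bookkeeping; stands in for kernel page pinning. */
struct model_pins {
	int pinned[MODEL_MAX_PAGES];  /* 1 while page i holds a pin */
	int available;                /* how many pages the "fast pin" can get */
};

/* Like pin_user_pages_fast(): may pin fewer pages than requested. */
static int model_pin(struct model_pins *p, int npages)
{
	int i, got;

	if (npages > MODEL_MAX_PAGES)
		return -EINVAL;
	got = npages < p->available ? npages : p->available;
	if (got <= 0)
		return -EFAULT;   /* first page failed: error, nothing pinned */
	for (i = 0; i < got; i++)
		p->pinned[i] = 1;
	return got;           /* may be short of npages */
}

static void model_unpin(struct model_pins *p, int npages)
{
	int i;

	for (i = 0; i < npages; i++)
		p->pinned[i] = 0;
}

/* All-or-nothing wrapper: on a short pin, release what was pinned. */
static int model_get_user_memory(struct model_pins *p, int npages)
{
	int rc = model_pin(p, npages);

	if (rc != npages) {
		if (rc < 0)
			return rc;          /* error: nothing to release */
		model_unpin(p, rc);     /* drop the partial set */
		return -EFAULT;
	}
	return 0;
}
```

Adding FOLL_LONGTERM does not change this contract; it only adds more reasons (for example non-struct-page memory) for the pin to come back short or fail.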

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 21:13     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 21:13 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> This is used by media/videobuf2 for persistent dma mappings, not just
> for a single dma operation and then freed again, so needs
> FOLL_LONGTERM.
> 
> Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> locking issues. Rework the code to pull the pup path out from the
> mmap_sem critical section as suggested by Jason.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> ---
>   mm/frame_vector.c | 36 +++++++++++-------------------------
>   1 file changed, 11 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index 10f82d5643b6..39db520a51dc 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>   	struct vm_area_struct *vma;
>   	int ret = 0;
>   	int err;
> -	int locked;
>   
>   	if (nr_frames == 0)
>   		return 0;
> @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>   
>   	start = untagged_addr(start);
>   
> +	ret = pin_user_pages_fast(start, nr_frames,
> +				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> +				  (struct page **)(vec->ptrs));
> +	if (ret > 0) {
> +		vec->got_ref = true;
> +		vec->is_pfns = false;
> +		goto out_unlocked;
> +	}

This part looks good, and changing to _fast is a potential performance improvement,
too.

> +
>   	mmap_read_lock(mm);
> -	locked = 1;
>   	vma = find_vma_intersection(mm, start, start + 1);
>   	if (!vma) {
>   		ret = -EFAULT;
>   		goto out;
>   	}
>   
> -	/*
> -	 * While get_vaddr_frames() could be used for transient (kernel
> -	 * controlled lifetime) pinning of memory pages all current
> -	 * users establish long term (userspace controlled lifetime)
> -	 * page pinning. Treat get_vaddr_frames() like
> -	 * get_user_pages_longterm() and disallow it for filesystem-dax
> -	 * mappings.
> -	 */
> -	if (vma_is_fsdax(vma)) {
> -		ret = -EOPNOTSUPP;
> -		goto out;
> -	}

Are you sure we don't need to check vma_is_fsdax() anymore?

> -
> -	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> -		vec->got_ref = true;
> -		vec->is_pfns = false;
> -		ret = pin_user_pages_locked(start, nr_frames,
> -			gup_flags, (struct page **)(vec->ptrs), &locked);
> -		goto out;
> -	}
> -
>   	vec->got_ref = false;
>   	vec->is_pfns = true;
>   	do {
> @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>   		vma = find_vma_intersection(mm, start, start + 1);
>   	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
>   out:
> -	if (locked)
> -		mmap_read_unlock(mm);
> +	mmap_read_unlock(mm);
> +out_unlocked:
>   	if (!ret)
>   		ret = -EFAULT;
>   	if (ret > 0)
> 

All of the error handling still looks accurate there.

thanks,
-- 
John Hubbard
NVIDIA
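
The reworked get_vaddr_frames() has two exits: `out` for paths that took mmap_read_lock(), and `out_unlocked` for the lockless fast-pin success. Below is a simplified userspace model of that lock handling, assuming only the control flow matters. `struct model_mm`, `struct model_vec` and `model_get_vaddr_frames()` are hypothetical, and the follow_pfn() walk is collapsed into one precomputed result (the model simply takes the walk's result rather than reproducing the real loop).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not real kernel APIs. */
struct model_mm {
	int lock_depth;       /* > 0 while the mmap read lock is "held" */
	int fast_pin_result;  /* what the lockless fast pin returns */
	int pfn_walk_result;  /* frames found by the VM_IO | VM_PFNMAP walk */
	bool have_vma;        /* whether a vma covers the start address */
};

struct model_vec {
	bool got_ref;
	bool is_pfns;
};

static void model_read_lock(struct model_mm *mm)
{
	mm->lock_depth++;
}

static void model_read_unlock(struct model_mm *mm)
{
	assert(mm->lock_depth > 0);  /* catches an unlock without a lock */
	mm->lock_depth--;
}

static int model_get_vaddr_frames(struct model_mm *mm, struct model_vec *vec)
{
	int ret;

	/* 1. Lockless long-term pin first. */
	ret = mm->fast_pin_result;
	if (ret > 0) {
		vec->got_ref = true;
		vec->is_pfns = false;
		goto out_unlocked;   /* lock was never taken */
	}

	/* 2. Fall back to the pfn walk under the read lock. */
	model_read_lock(mm);
	if (!mm->have_vma) {
		ret = -EFAULT;
		goto out;
	}
	vec->got_ref = false;
	vec->is_pfns = true;
	ret = mm->pfn_walk_result;
out:
	model_read_unlock(mm);
out_unlocked:
	if (!ret)
		ret = -EFAULT;       /* nothing pinned or found */
	return ret;
}
```

In every path the lock depth returns to zero, which is the property the `locked` variable used to track and which the two labels now encode structurally.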

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-07 21:13     ` John Hubbard
  (?)
@ 2020-10-07 21:30       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 21:30 UTC (permalink / raw)
  To: John Hubbard, Dan Williams
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, Jérôme Glisse,
	Jan Kara

On Wed, Oct 7, 2020 at 11:13 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > This is used by media/videobuf2 for persistent dma mappings, not just
> > for a single dma operation and then freed again, so needs
> > FOLL_LONGTERM.
> >
> > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> > locking issues. Rework the code to pull the pup path out from the
> > mmap_sem critical section as suggested by Jason.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > ---
> >   mm/frame_vector.c | 36 +++++++++++-------------------------
> >   1 file changed, 11 insertions(+), 25 deletions(-)
> >
> > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > index 10f82d5643b6..39db520a51dc 100644
> > --- a/mm/frame_vector.c
> > +++ b/mm/frame_vector.c
> > @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >       struct vm_area_struct *vma;
> >       int ret = 0;
> >       int err;
> > -     int locked;
> >
> >       if (nr_frames == 0)
> >               return 0;
> > @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >
> >       start = untagged_addr(start);
> >
> > +     ret = pin_user_pages_fast(start, nr_frames,
> > +                               FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> > +                               (struct page **)(vec->ptrs));
> > +     if (ret > 0) {
> > +             vec->got_ref = true;
> > +             vec->is_pfns = false;
> > +             goto out_unlocked;
> > +     }
>
> This part looks good, and changing to _fast is a potential performance improvement,
> too.
>
> > +
> >       mmap_read_lock(mm);
> > -     locked = 1;
> >       vma = find_vma_intersection(mm, start, start + 1);
> >       if (!vma) {
> >               ret = -EFAULT;
> >               goto out;
> >       }
> >
> > -     /*
> > -      * While get_vaddr_frames() could be used for transient (kernel
> > -      * controlled lifetime) pinning of memory pages all current
> > -      * users establish long term (userspace controlled lifetime)
> > -      * page pinning. Treat get_vaddr_frames() like
> > -      * get_user_pages_longterm() and disallow it for filesystem-dax
> > -      * mappings.
> > -      */
> > -     if (vma_is_fsdax(vma)) {
> > -             ret = -EOPNOTSUPP;
> > -             goto out;
> > -     }
>
> Are you sure we don't need to check vma_is_fsdax() anymore?

Since FOLL_LONGTERM already checks for this, can only return struct
page-backed memory, and explicitly excludes VM_IO | VM_PFNMAP, I was
assuming the check is not needed for the follow_pfn path. The
get_user_pages_locked() call used back then didn't have the same check,
which is why it was added (and FOLL_LONGTERM still doesn't work for the
_locked versions, as you pointed out in the last round of this discussion).

But now that you're asking, I have no idea whether an fsdax vma can also
be of VM_IO | VM_PFNMAP type. I'm not seeing that set anywhere in
fs/dax.c, but that says nothing :-)

Dan, you added this check originally: do we need it for VM_SPECIAL vmas too?

Thanks, Daniel

>
> > -
> > -     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > -             vec->got_ref = true;
> > -             vec->is_pfns = false;
> > -             ret = pin_user_pages_locked(start, nr_frames,
> > -                     gup_flags, (struct page **)(vec->ptrs), &locked);
> > -             goto out;
> > -     }
> > -
> >       vec->got_ref = false;
> >       vec->is_pfns = true;
> >       do {
> > @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >               vma = find_vma_intersection(mm, start, start + 1);
> >       } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> >   out:
> > -     if (locked)
> > -             mmap_read_unlock(mm);
> > +     mmap_read_unlock(mm);
> > +out_unlocked:
> >       if (!ret)
> >               ret = -EFAULT;
> >       if (ret > 0)
> >
>
> All of the error handling still looks accurate there.
>
> thanks,
> --
> John Hubbard
> NVIDIA



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
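
The argument above can be summarised as a classification of vmas by path. The sketch below is a userspace model with hypothetical `MODEL_VM_*` bits (not the kernel's VM_* values) and an `is_fsdax` field standing in for vma_is_fsdax(). It encodes only the assumptions stated in the reply: a FOLL_LONGTERM pin refuses fsdax and VM_IO | VM_PFNMAP vmas, and follow_pfn() only resolves VM_IO | VM_PFNMAP vmas. The last test case is the open question. If an fsdax vma could carry VM_PFNMAP, it would reach the pfn walk with no fsdax check. The model does not say whether such a vma exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's VM_* values. */
#define MODEL_VM_IO     0x1u
#define MODEL_VM_PFNMAP 0x2u

struct model_vma {
	unsigned int flags;
	bool is_fsdax;        /* stands in for vma_is_fsdax() */
};

enum frame_path {
	PATH_LONGTERM_PIN,    /* struct page backed, pinned with FOLL_LONGTERM */
	PATH_PFN_WALK,        /* VM_IO | VM_PFNMAP, resolved via follow_pfn() */
	PATH_REJECTED,        /* neither path accepts it */
};

/* Encodes the reply's assumptions; after the patch, no fsdax check
 * runs on the pfn-walk path. */
static enum frame_path model_classify(const struct model_vma *vma)
{
	bool special = vma->flags & (MODEL_VM_IO | MODEL_VM_PFNMAP);

	if (!special && !vma->is_fsdax)
		return PATH_LONGTERM_PIN;
	if (special)
		return PATH_PFN_WALK;
	return PATH_REJECTED;
}
```

The removed vma_is_fsdax() check only matters for the `special && is_fsdax` combination, which is exactly what the question to Dan is about.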

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM
@ 2020-10-07 21:30       ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 21:30 UTC (permalink / raw)
  To: John Hubbard, Dan Williams
  Cc: Jérôme Glisse, linux-s390, linux-samsung-soc, Jan Kara,
	Pawel Osciak, kvm, Jason Gunthorpe, Mauro Carvalho Chehab, LKML,
	DRI Development, Tomasz Figa, Linux MM, Kyungmin Park,
	Daniel Vetter, Andrew Morton, Marek Szyprowski, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Oct 7, 2020 at 11:13 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > This is used by media/videobuf2 for persistent dma mappings, not just
> > for a single dma operation and then freed again, so needs
> > FOLL_LONGTERM.
> >
> > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> > locking issues. Rework the code to pull the pup path out from the
> > mmap_sem critical section as suggested by Jason.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > ---
> >   mm/frame_vector.c | 36 +++++++++++-------------------------
> >   1 file changed, 11 insertions(+), 25 deletions(-)
> >
> > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > index 10f82d5643b6..39db520a51dc 100644
> > --- a/mm/frame_vector.c
> > +++ b/mm/frame_vector.c
> > @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >       struct vm_area_struct *vma;
> >       int ret = 0;
> >       int err;
> > -     int locked;
> >
> >       if (nr_frames == 0)
> >               return 0;
> > @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >
> >       start = untagged_addr(start);
> >
> > +     ret = pin_user_pages_fast(start, nr_frames,
> > +                               FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> > +                               (struct page **)(vec->ptrs));
> > +     if (ret > 0) {
> > +             vec->got_ref = true;
> > +             vec->is_pfns = false;
> > +             goto out_unlocked;
> > +     }
>
> This part looks good, and changing to _fast is a potential performance improvement,
> too.
>
> > +
> >       mmap_read_lock(mm);
> > -     locked = 1;
> >       vma = find_vma_intersection(mm, start, start + 1);
> >       if (!vma) {
> >               ret = -EFAULT;
> >               goto out;
> >       }
> >
> > -     /*
> > -      * While get_vaddr_frames() could be used for transient (kernel
> > -      * controlled lifetime) pinning of memory pages all current
> > -      * users establish long term (userspace controlled lifetime)
> > -      * page pinning. Treat get_vaddr_frames() like
> > -      * get_user_pages_longterm() and disallow it for filesystem-dax
> > -      * mappings.
> > -      */
> > -     if (vma_is_fsdax(vma)) {
> > -             ret = -EOPNOTSUPP;
> > -             goto out;
> > -     }
>
> Are you sure we don't need to check vma_is_fsdax() anymore?

Since FOLL_LONGTERM checks for this, can only return struct page backed
memory, and explicitly excludes VM_IO | VM_PFNMAP, I was assuming this
check is not needed for the follow_pfn path. The get_user_pages_locked()
call this code used back then didn't have the same check, which is why it
was added (and FOLL_LONGTERM still doesn't work for the _locked versions,
as you pointed out in the last round of this discussion).

But now that you're asking, I have no idea whether an fsdax vma can also
be of VM_IO | VM_PFNMAP type. I'm not seeing that set anywhere in
fs/dax.c, but that says nothing :-)

Dan, you added this check originally: do we need it for VM_SPECIAL vmas too?

Thanks, Daniel

>
> > -
> > -     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > -             vec->got_ref = true;
> > -             vec->is_pfns = false;
> > -             ret = pin_user_pages_locked(start, nr_frames,
> > -                     gup_flags, (struct page **)(vec->ptrs), &locked);
> > -             goto out;
> > -     }
> > -
> >       vec->got_ref = false;
> >       vec->is_pfns = true;
> >       do {
> > @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >               vma = find_vma_intersection(mm, start, start + 1);
> >       } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> >   out:
> > -     if (locked)
> > -             mmap_read_unlock(mm);
> > +     mmap_read_unlock(mm);
> > +out_unlocked:
> >       if (!ret)
> >               ret = -EFAULT;
> >       if (ret > 0)
> >
>
> All of the error handling still looks accurate there.
>
> thanks,
> --
> John Hubbard
> NVIDIA



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 01/13] drm/exynos: Stop using frame_vector helpers
  2020-10-07 20:32     ` John Hubbard
  (?)
@ 2020-10-07 21:32       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 21:32 UTC (permalink / raw)
  To: John Hubbard
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Inki Dae,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Kukjin Kim,
	Krzysztof Kozlowski, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On Wed, Oct 7, 2020 at 10:33 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > All we need is a pages array, and pin_user_pages_fast can give us that
> > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Inki Dae <inki.dae@samsung.com>
> > Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> > Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Kukjin Kim <kgene@kernel.org>
> > Cc: Krzysztof Kozlowski <krzk@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > ---
> >   drivers/gpu/drm/exynos/Kconfig          |  1 -
> >   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++++++++++-------------
> >   2 files changed, 22 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
> > index 6417f374b923..43257ef3c09d 100644
> > --- a/drivers/gpu/drm/exynos/Kconfig
> > +++ b/drivers/gpu/drm/exynos/Kconfig
> > @@ -88,7 +88,6 @@ comment "Sub-drivers"
> >   config DRM_EXYNOS_G2D
> >       bool "G2D"
> >       depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
> > -     select FRAME_VECTOR
> >       help
> >         Choose this option if you want to use Exynos G2D for DRM.
> >
> > diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> > index 967a5cdc120e..c83f6faac9de 100644
> > --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> > +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> > @@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
> >       dma_addr_t              dma_addr;
> >       unsigned long           userptr;
> >       unsigned long           size;
> > -     struct frame_vector     *vec;
> > +     struct page             **pages;
> > +     unsigned int            npages;
> >       struct sg_table         *sgt;
> >       atomic_t                refcount;
> >       bool                    in_pool;
> > @@ -378,7 +379,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
> >                                       bool force)
> >   {
> >       struct g2d_cmdlist_userptr *g2d_userptr = obj;
> > -     struct page **pages;
> > +     int i;
>
> The above line can also be deleted, see below.
>
> >
> >       if (!obj)
> >               return;
> > @@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
> >       dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
> >                         DMA_BIDIRECTIONAL, 0);
> >
> > -     pages = frame_vector_pages(g2d_userptr->vec);
> > -     if (!IS_ERR(pages)) {
> > -             int i;
> > +     for (i = 0; i < g2d_userptr->npages; i++)
> > +             set_page_dirty_lock(g2d_userptr->pages[i]);
> >
> > -             for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
> > -                     set_page_dirty_lock(pages[i]);
> > -     }
> > -     put_vaddr_frames(g2d_userptr->vec);
> > -     frame_vector_destroy(g2d_userptr->vec);
> > +     unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
> > +     kvfree(g2d_userptr->pages);
>
> You can avoid writing your own loop, and just simplify the whole thing down to
> two lines:
>
>         unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
>                                     true);
>         kvfree(g2d_userptr->pages);

Oh nice, this is neat. I'll also roll it out in the habanalabs patch,
that has the same thing. Well almost, it only uses set_page_dirty, not
the _lock variant. But I have no idea whether that matters or not?
-Daniel
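A userspace sketch of the pin/unpin lifecycle discussed above, under stated assumptions: fake_page, fake_pin() and fake_unpin_dirty() are hypothetical stand-ins for struct page, pin_user_pages_fast() and unpin_user_pages_dirty_lock(pages, npages, make_dirty), and pin_all_or_nothing() is an illustrative helper, not code from the patch. It shows the two points from the review: on a short pin only the `ret` pages that were actually pinned are released (the patch's `npages = ret` before `goto err_unpin_pages`), and on release a single dirty-and-unpin call replaces the open-coded set_page_dirty_lock() loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct page. */
struct fake_page {
	int pincount;
	bool dirty;
};

/* Models pin_user_pages_fast(): pins up to 'avail' of 'npages' pages. */
static int fake_pin(struct fake_page *backing, int avail, int npages,
		    struct fake_page **pages)
{
	int i, n = avail < npages ? avail : npages;

	if (n == 0)
		return -EFAULT;	/* nothing pinned: report an error, not 0 */
	for (i = 0; i < n; i++) {
		backing[i].pincount++;
		pages[i] = &backing[i];
	}
	return n;
}

/* Models unpin_user_pages_dirty_lock(pages, npages, make_dirty). */
static void fake_unpin_dirty(struct fake_page **pages, int npages,
			     bool make_dirty)
{
	int i;

	for (i = 0; i < npages; i++) {
		assert(pages[i]->pincount > 0);
		if (make_dirty && !pages[i]->dirty)
			pages[i]->dirty = true;	/* set_page_dirty_lock() in the kernel */
		pages[i]->pincount--;
	}
}

/* Pin exactly npages or nothing; on a short pin, undo only what was pinned. */
static int pin_all_or_nothing(struct fake_page *backing, int avail, int npages,
			      struct fake_page ***out)
{
	struct fake_page **pages = calloc(npages, sizeof(*pages));
	int ret;

	if (!pages)
		return -ENOMEM;
	ret = fake_pin(backing, avail, npages, pages);
	if (ret != npages) {
		if (ret > 0)
			fake_unpin_dirty(pages, ret, false);	/* not npages! */
		free(pages);
		return ret < 0 ? ret : -EFAULT;
	}
	*out = pages;
	return npages;
}
```

Passing `ret` rather than the requested count to the unpin call is the detail that matters: unpinning pages that were never pinned would underflow their pin counts, which the assert in fake_unpin_dirty() would catch.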

>
>
> >
> >       if (!g2d_userptr->out_of_list)
> >               list_del_init(&g2d_userptr->list);
> > @@ -474,35 +471,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
> >       offset = userptr & ~PAGE_MASK;
> >       end = PAGE_ALIGN(userptr + size);
> >       npages = (end - start) >> PAGE_SHIFT;
> > -     g2d_userptr->vec = frame_vector_create(npages);
> > -     if (!g2d_userptr->vec) {
> > +     g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
> > +                                         GFP_KERNEL);
> > +     if (!g2d_userptr->pages) {
> >               ret = -ENOMEM;
> >               goto err_free;
> >       }
> >
> > -     ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > -             g2d_userptr->vec);
> > +     ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > +                               g2d_userptr->pages);
> >       if (ret != npages) {
> >               DRM_DEV_ERROR(g2d->dev,
> >                             "failed to get user pages from userptr.\n");
> >               if (ret < 0)
> > -                     goto err_destroy_framevec;
> > -             ret = -EFAULT;
> > -             goto err_put_framevec;
> > -     }
> > -     if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
> > +                     goto err_destroy_pages;
> > +             npages = ret;
> >               ret = -EFAULT;
> > -             goto err_put_framevec;
> > +             goto err_unpin_pages;
> >       }
> > +     g2d_userptr->npages = npages;
> >
> >       sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> >       if (!sgt) {
> >               ret = -ENOMEM;
> > -             goto err_put_framevec;
> > +             goto err_unpin_pages;
> >       }
> >
> >       ret = sg_alloc_table_from_pages(sgt,
> > -                                     frame_vector_pages(g2d_userptr->vec),
> > +                                     g2d_userptr->pages,
> >                                       npages, offset, size, GFP_KERNEL);
> >       if (ret < 0) {
> >               DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
> > @@ -538,11 +534,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
> >   err_free_sgt:
> >       kfree(sgt);
> >
> > -err_put_framevec:
> > -     put_vaddr_frames(g2d_userptr->vec);
> > +err_unpin_pages:
> > +     unpin_user_pages(g2d_userptr->pages, npages);
> >
> > -err_destroy_framevec:
> > -     frame_vector_destroy(g2d_userptr->vec);
> > +err_destroy_pages:
> > +     kvfree(g2d_userptr->pages);
> >
> >   err_free:
> >       kfree(g2d_userptr);
> >
>
> The rest all looks good, you've avoided the usual API pitfalls. :)
>
> thanks,
> --
> John Hubbard
> NVIDIA



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 01/13] drm/exynos: Stop using frame_vector helpers
  2020-10-07 21:32       ` Daniel Vetter
  (?)
@ 2020-10-07 21:36         ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 21:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Inki Dae,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Kukjin Kim,
	Krzysztof Kozlowski, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/7/20 2:32 PM, Daniel Vetter wrote:
> On Wed, Oct 7, 2020 at 10:33 PM John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 10/7/20 9:44 AM, Daniel Vetter wrote:
...
>>> @@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
>>>        dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
>>>                          DMA_BIDIRECTIONAL, 0);
>>>
>>> -     pages = frame_vector_pages(g2d_userptr->vec);
>>> -     if (!IS_ERR(pages)) {
>>> -             int i;
>>> +     for (i = 0; i < g2d_userptr->npages; i++)
>>> +             set_page_dirty_lock(g2d_userptr->pages[i]);
>>>
>>> -             for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
>>> -                     set_page_dirty_lock(pages[i]);
>>> -     }
>>> -     put_vaddr_frames(g2d_userptr->vec);
>>> -     frame_vector_destroy(g2d_userptr->vec);
>>> +     unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
>>> +     kvfree(g2d_userptr->pages);
>>
>> You can avoid writing your own loop, and just simplify the whole thing down to
>> two lines:
>>
>>          unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
>>                                      true);
>>          kvfree(g2d_userptr->pages);
> 
> Oh nice, this is neat. I'll also roll it out in the habanalabs patch,
> that has the same thing. Well almost, it only uses set_page_dirty, not
> the _lock variant. But I have no idea whether that matters or not?


It matters. And invariably, call sites that use set_page_dirty() instead
of set_page_dirty_lock() were already wrong.  Which is why I never had to
provide anything like "unpin_user_pages_dirty (not locked)".

Although in habanalabs case, I just reviewed patch 3 and I think they *were*
correctly using set_page_dirty_lock()...
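
The cleanup suggestion compares two ways of releasing the pages. One is the open-coded sequence: call set_page_dirty_lock() on each page, then call unpin_user_pages(). The other is a single unpin_user_pages_dirty_lock(pages, npages, true) call. Both should leave the pages dirty and unpinned.

The sketch below is only a userspace model. It assumes a toy page that has just a dirty flag and a pin count. Every model_* name and cleanup_open_coded() is a hypothetical stand-in, not a kernel API. The combined helper only roughly follows the kernel's behaviour of skipping pages that are already dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy page: NOT struct page, just the two bits of state we need. */
struct model_page {
	bool dirty;    /* stands in for PG_dirty */
	int pincount;  /* stands in for the FOLL_PIN reference */
};

void model_set_page_dirty_lock(struct model_page *page)
{
	page->dirty = true;
}

void model_unpin_user_page(struct model_page *page)
{
	assert(page->pincount > 0); /* unpinning an unpinned page is a bug */
	page->pincount--;
}

void model_unpin_user_pages(struct model_page **pages, unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		model_unpin_user_page(pages[i]);
}

/* Open-coded cleanup, shaped like the loop in the patch under review. */
void cleanup_open_coded(struct model_page **pages, unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		model_set_page_dirty_lock(pages[i]);
	model_unpin_user_pages(pages, npages);
}

/* Combined helper, loosely modelled on unpin_user_pages_dirty_lock(). */
void model_unpin_user_pages_dirty_lock(struct model_page **pages,
				       unsigned long npages, bool make_dirty)
{
	if (!make_dirty) {
		model_unpin_user_pages(pages, npages);
		return;
	}
	for (unsigned long i = 0; i < npages; i++) {
		if (!pages[i]->dirty)	/* skip pages that are already dirty */
			model_set_page_dirty_lock(pages[i]);
		model_unpin_user_page(pages[i]);
	}
}
```

Besides saving a loop, the combined helper keeps dirtying and unpinning in one place, so a caller cannot drop one half by accident.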

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 01/13] drm/exynos: Stop using frame_vector helpers
  2020-10-07 21:36         ` John Hubbard
  (?)
@ 2020-10-07 21:50           ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-07 21:50 UTC (permalink / raw)
  To: John Hubbard
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Inki Dae,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Kukjin Kim,
	Krzysztof Kozlowski, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On Wed, Oct 7, 2020 at 11:37 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/7/20 2:32 PM, Daniel Vetter wrote:
> > On Wed, Oct 7, 2020 at 10:33 PM John Hubbard <jhubbard@nvidia.com> wrote:
> >>
> >> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> ...
> >>> @@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
> >>>        dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
> >>>                          DMA_BIDIRECTIONAL, 0);
> >>>
> >>> -     pages = frame_vector_pages(g2d_userptr->vec);
> >>> -     if (!IS_ERR(pages)) {
> >>> -             int i;
> >>> +     for (i = 0; i < g2d_userptr->npages; i++)
> >>> +             set_page_dirty_lock(g2d_userptr->pages[i]);
> >>>
> >>> -             for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
> >>> -                     set_page_dirty_lock(pages[i]);
> >>> -     }
> >>> -     put_vaddr_frames(g2d_userptr->vec);
> >>> -     frame_vector_destroy(g2d_userptr->vec);
> >>> +     unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
> >>> +     kvfree(g2d_userptr->pages);
> >>
> >> You can avoid writing your own loop, and just simplify the whole thing down to
> >> two lines:
> >>
> >>          unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
> >>                                      true);
> >>          kvfree(g2d_userptr->pages);
> >
> > Oh nice, this is neat. I'll also roll it out in the habanalabs patch,
> > that has the same thing. Well almost, it only uses set_page_dirty, not
> > the _lock variant. But I have no idea whether that matters or not?
>
>
> It matters. And invariably, call sites that use set_page_dirty() instead
> of set_page_dirty_lock() were already wrong.  Which is why I never had to
> provide anything like "unpin_user_pages_dirty (not locked)".
>
> Although in habanalabs case, I just reviewed patch 3 and I think they *were*
> correctly using set_page_dirty_lock()...

Yeah, I mixed that up with some other code I read; habanalabs is using
_lock. I have seen a pile of gup/pup code, though, that only uses
set_page_dirty(). And looking around, I did not really parse the comment
above set_page_dirty(). I guess just using the _lock variant shouldn't
hurt too much. I did find a comment in the infiniband umem notifier code,
though, saying that it's sometimes called with the page locked and
sometimes not, so life is complicated there. But I didn't understand how
it avoids races.
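
The difference being discussed can be sketched as follows. set_page_dirty_lock() wraps set_page_dirty() in lock_page()/unlock_page(), so it is only usable when the caller does not already hold the page lock. Plain set_page_dirty() instead relies on the caller to hold the page lock or otherwise keep page->mapping stable.

This is a single-threaded userspace model that assumes a page reduced to two flags. The model_* names are hypothetical. model_dirty_page() is a made-up helper showing why a path entered both with and without the page locked has to pick the variant explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a lock flag and a dirty flag. */
struct model_page {
	bool locked;  /* stands in for PG_locked (a sleeping lock in the kernel) */
	bool dirty;   /* stands in for PG_dirty */
};

void model_lock_page(struct model_page *p)
{
	/* The page lock is not recursive: re-locking would deadlock. */
	assert(!p->locked);
	p->locked = true;
}

void model_unlock_page(struct model_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/* Plain variant: the caller must keep the page's mapping stable. */
void model_set_page_dirty(struct model_page *p)
{
	p->dirty = true;
}

/* _lock variant: takes and drops the page lock around the plain call. */
void model_set_page_dirty_lock(struct model_page *p)
{
	model_lock_page(p);
	model_set_page_dirty(p);
	model_unlock_page(p);
}

/* Hypothetical helper for a path that may or may not hold the page lock. */
void model_dirty_page(struct model_page *p, bool caller_holds_lock)
{
	if (caller_holds_lock)
		model_set_page_dirty(p);
	else
		model_set_page_dirty_lock(p);
}
```
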
-Daniel


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 06/13] media: videobuf2: Move frame_vector into media subsystem
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-07 22:18     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 22:18 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---

Failed to spot any problems here. :)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

>   drivers/media/common/videobuf2/Kconfig        |  1 -
>   drivers/media/common/videobuf2/Makefile       |  1 +
>   .../media/common/videobuf2}/frame_vector.c    |  2 +
>   drivers/media/platform/omap/Kconfig           |  1 -
>   include/linux/mm.h                            | 42 -------------------
>   include/media/videobuf2-core.h                | 42 +++++++++++++++++++
>   mm/Kconfig                                    |  3 --
>   mm/Makefile                                   |  1 -
>   8 files changed, 45 insertions(+), 48 deletions(-)
>   rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> 
> diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>   
>   config VIDEOBUF2_MEMOPS
>   	tristate
> -	select FRAME_VECTOR
>   
>   config VIDEOBUF2_DMA_CONTIG
>   	tristate
> diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>   
>   ifeq ($(CONFIG_TRACEPOINTS),y)
>     videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index 39db520a51dc..b95f4f371681 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>   #include <linux/pagemap.h>
>   #include <linux/sched.h>
>   
> +#include <media/videobuf2-core.h>
> +
>   /**
>    * get_vaddr_frames() - map virtual addresses to pfns
>    * @start:	starting user address
> diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>   	depends on VIDEO_V4L2
>   	select VIDEOBUF2_DMA_CONTIG
>   	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> -	select FRAME_VECTOR
>   	help
>   	  V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
>   int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>   			struct task_struct *task, bool bypass_rlim);
>   
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> -	unsigned int nr_allocated;	/* Number of frames we have space for */
> -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> -	bool got_ref;		/* Did we pin pages by getting page ref? */
> -	bool is_pfns;		/* Does array contain pages or pfns? */
> -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> -				 * pfns_vector_pages() or pfns_vector_pfns()
> -				 * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -		     unsigned int gup_flags, struct frame_vector *vec);
> -void put_vaddr_frames(struct frame_vector *vec);
> -int frame_vector_to_pages(struct frame_vector *vec);
> -void frame_vector_to_pfns(struct frame_vector *vec);
> -
> -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> -{
> -	return vec->nr_frames;
> -}
> -
> -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> -{
> -	if (vec->is_pfns) {
> -		int err = frame_vector_to_pages(vec);
> -
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	return (struct page **)(vec->ptrs);
> -}
> -
> -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> -{
> -	if (!vec->is_pfns)
> -		frame_vector_to_pfns(vec);
> -	return (unsigned long *)(vec->ptrs);
> -}
> -
>   struct kvec;
>   int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>   			struct page **pages);
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index bbb3f26fbde9..a2e75ca0334f 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
>    */
>   unsigned int vb2_request_buffer_cnt(struct media_request *req);
>   
> +/* Container for pinned pfns / pages in frame_vector.c */
> +struct frame_vector {
> +	unsigned int nr_allocated;	/* Number of frames we have space for */
> +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> +	bool got_ref;		/* Did we pin pages by getting page ref? */
> +	bool is_pfns;		/* Does array contain pages or pfns? */
> +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> +				 * pfns_vector_pages() or pfns_vector_pfns()
> +				 * for access */
> +};
> +
> +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> +void frame_vector_destroy(struct frame_vector *vec);
> +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> +		     unsigned int gup_flags, struct frame_vector *vec);
> +void put_vaddr_frames(struct frame_vector *vec);
> +int frame_vector_to_pages(struct frame_vector *vec);
> +void frame_vector_to_pfns(struct frame_vector *vec);
> +
> +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> +{
> +	return vec->nr_frames;
> +}
> +
> +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> +{
> +	if (vec->is_pfns) {
> +		int err = frame_vector_to_pages(vec);
> +
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +	return (struct page **)(vec->ptrs);
> +}
> +
> +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> +{
> +	if (!vec->is_pfns)
> +		frame_vector_to_pfns(vec);
> +	return (unsigned long *)(vec->ptrs);
> +}
> +
>   #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6c974888f86f..da6c943fe9f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
>   	  memory; i.e., memory that is only accessible from the device (or
>   	  group of devices). You likely also want to select HMM_MIRROR.
>   
> -config FRAME_VECTOR
> -	bool
> -
>   config ARCH_USES_HIGH_VMA_FLAGS
>   	bool
>   config ARCH_HAS_PKEYS
> diff --git a/mm/Makefile b/mm/Makefile
> index d5649f1c12c0..a025fd6c6afd 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>   obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
>   obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>   obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>   obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>   obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>   obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 
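
The header moved by this patch stores either pfns or struct page pointers in one array and converts between them lazily. frame_vector_pages() calls frame_vector_to_pages() when is_pfns is set, and frame_vector_pfns() calls frame_vector_to_pfns() otherwise. The struct comment mentions pfns_vector_pages()/pfns_vector_pfns(), but the accessors the header actually defines are frame_vector_pages() and frame_vector_pfns().

Below is a userspace model of that pattern. It assumes a fake fixed-size memmap in which pfn N's page is model_memmap[N], and all model_* names are hypothetical. It uses a union for the array elements rather than the kernel's cast of void *ptrs[], to avoid type punning outside the kernel. The model validates every pfn before rewriting any entry, so a failed conversion leaves the vector unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_NR_PFNS 16 /* hypothetical size of the fake memmap */

struct model_page { unsigned char unused; };
static struct model_page model_memmap[MODEL_NR_PFNS];

union model_frame {
	unsigned long pfn;
	struct model_page *page;
};

struct model_frame_vector {
	unsigned int nr_allocated;  /* frames we have space for */
	unsigned int nr_frames;     /* frames stored in frames[] */
	bool is_pfns;               /* does frames[] hold pfns or pages? */
	union model_frame frames[]; /* flexible array member */
};

struct model_frame_vector *model_frame_vector_create(unsigned int nr)
{
	struct model_frame_vector *vec =
		malloc(sizeof(*vec) + nr * sizeof(vec->frames[0]));

	if (vec) {
		vec->nr_allocated = nr;
		vec->nr_frames = 0;
		vec->is_pfns = true;
	}
	return vec;
}

static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_NR_PFNS;
}

/* pfns -> pages in place; validate everything first so failure changes nothing. */
int model_frame_vector_to_pages(struct model_frame_vector *vec)
{
	if (!vec->is_pfns)
		return 0;
	for (unsigned int i = 0; i < vec->nr_frames; i++)
		if (!model_pfn_valid(vec->frames[i].pfn))
			return -EINVAL;
	for (unsigned int i = 0; i < vec->nr_frames; i++)
		vec->frames[i].page = &model_memmap[vec->frames[i].pfn];
	vec->is_pfns = false;
	return 0;
}

/* pages -> pfns in place; cannot fail. */
void model_frame_vector_to_pfns(struct model_frame_vector *vec)
{
	if (vec->is_pfns)
		return;
	for (unsigned int i = 0; i < vec->nr_frames; i++)
		vec->frames[i].pfn =
			(unsigned long)(vec->frames[i].page - model_memmap);
	vec->is_pfns = true;
}

/* Like frame_vector_pages(): convert lazily, then hand back the array. */
union model_frame *model_frame_vector_pages(struct model_frame_vector *vec)
{
	if (vec->is_pfns && model_frame_vector_to_pages(vec))
		return NULL; /* the kernel accessor returns ERR_PTR(err) instead */
	return vec->frames;
}

/* Like frame_vector_pfns(). */
union model_frame *model_frame_vector_pfns(struct model_frame_vector *vec)
{
	if (!vec->is_pfns)
		model_frame_vector_to_pfns(vec);
	return vec->frames;
}
```
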


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 06/13] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-07 22:18     ` John Hubbard
  0 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-07 22:18 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Pawel Osciak, kvm,
	Jason Gunthorpe, Mauro Carvalho Chehab, Jérôme Glisse,
	Tomasz Figa, linux-mm, Kyungmin Park, Daniel Vetter,
	Andrew Morton, Marek Szyprowski, Dan Williams, linux-arm-kernel,
	linux-media

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---

Failed to spot any problems here. :)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

>   drivers/media/common/videobuf2/Kconfig        |  1 -
>   drivers/media/common/videobuf2/Makefile       |  1 +
>   .../media/common/videobuf2}/frame_vector.c    |  2 +
>   drivers/media/platform/omap/Kconfig           |  1 -
>   include/linux/mm.h                            | 42 -------------------
>   include/media/videobuf2-core.h                | 42 +++++++++++++++++++
>   mm/Kconfig                                    |  3 --
>   mm/Makefile                                   |  1 -
>   8 files changed, 45 insertions(+), 48 deletions(-)
>   rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> 
> diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>   
>   config VIDEOBUF2_MEMOPS
>   	tristate
> -	select FRAME_VECTOR
>   
>   config VIDEOBUF2_DMA_CONTIG
>   	tristate
> diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>   
>   ifeq ($(CONFIG_TRACEPOINTS),y)
>     videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index 39db520a51dc..b95f4f371681 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>   #include <linux/pagemap.h>
>   #include <linux/sched.h>
>   
> +#include <media/videobuf2-core.h>
> +
>   /**
>    * get_vaddr_frames() - map virtual addresses to pfns
>    * @start:	starting user address
> diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>   	depends on VIDEO_V4L2
>   	select VIDEOBUF2_DMA_CONTIG
>   	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> -	select FRAME_VECTOR
>   	help
>   	  V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
>   int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>   			struct task_struct *task, bool bypass_rlim);
>   
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> -	unsigned int nr_allocated;	/* Number of frames we have space for */
> -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> -	bool got_ref;		/* Did we pin pages by getting page ref? */
> -	bool is_pfns;		/* Does array contain pages or pfns? */
> -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> -				 * pfns_vector_pages() or pfns_vector_pfns()
> -				 * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -		     unsigned int gup_flags, struct frame_vector *vec);
> -void put_vaddr_frames(struct frame_vector *vec);
> -int frame_vector_to_pages(struct frame_vector *vec);
> -void frame_vector_to_pfns(struct frame_vector *vec);
> -
> -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> -{
> -	return vec->nr_frames;
> -}
> -
> -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> -{
> -	if (vec->is_pfns) {
> -		int err = frame_vector_to_pages(vec);
> -
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	return (struct page **)(vec->ptrs);
> -}
> -
> -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> -{
> -	if (!vec->is_pfns)
> -		frame_vector_to_pfns(vec);
> -	return (unsigned long *)(vec->ptrs);
> -}
> -
>   struct kvec;
>   int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>   			struct page **pages);
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index bbb3f26fbde9..a2e75ca0334f 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
>    */
>   unsigned int vb2_request_buffer_cnt(struct media_request *req);
>   
> +/* Container for pinned pfns / pages in frame_vector.c */
> +struct frame_vector {
> +	unsigned int nr_allocated;	/* Number of frames we have space for */
> +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> +	bool got_ref;		/* Did we pin pages by getting page ref? */
> +	bool is_pfns;		/* Does array contain pages or pfns? */
> +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> +				 * pfns_vector_pages() or pfns_vector_pfns()
> +				 * for access */
> +};
> +
> +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> +void frame_vector_destroy(struct frame_vector *vec);
> +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> +		     unsigned int gup_flags, struct frame_vector *vec);
> +void put_vaddr_frames(struct frame_vector *vec);
> +int frame_vector_to_pages(struct frame_vector *vec);
> +void frame_vector_to_pfns(struct frame_vector *vec);
> +
> +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> +{
> +	return vec->nr_frames;
> +}
> +
> +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> +{
> +	if (vec->is_pfns) {
> +		int err = frame_vector_to_pages(vec);
> +
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +	return (struct page **)(vec->ptrs);
> +}
> +
> +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> +{
> +	if (!vec->is_pfns)
> +		frame_vector_to_pfns(vec);
> +	return (unsigned long *)(vec->ptrs);
> +}
> +
>   #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6c974888f86f..da6c943fe9f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
>   	  memory; i.e., memory that is only accessible from the device (or
>   	  group of devices). You likely also want to select HMM_MIRROR.
>   
> -config FRAME_VECTOR
> -	bool
> -
>   config ARCH_USES_HIGH_VMA_FLAGS
>   	bool
>   config ARCH_HAS_PKEYS
> diff --git a/mm/Makefile b/mm/Makefile
> index d5649f1c12c0..a025fd6c6afd 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>   obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
>   obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>   obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>   obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>   obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>   obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 19:47       ` Daniel Vetter
  (?)
@ 2020-10-07 22:23         ` Dan Williams
  -1 siblings, 0 replies; 166+ messages in thread
From: Dan Williams @ 2020-10-07 22:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, Linux-media@vger.kernel.org, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Wed, Oct 7, 2020 at 12:49 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Wed, Oct 7, 2020 at 9:33 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > the default for all driver uses.
> > >
> > > Except there are two more ways to access pci bars: sysfs and proc mmap
> > > support. Let's plug that hole.
> >
> > Ooh, yes, let's.
> >
> > > For revoke_devmem() to work we need to link our vma into the same
> > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > at ->open time, but that's a bit tricky here with all the entry points
> > > and arch code. So instead create a fake file and adjust vma->vm_file.
> >
> > I don't think you want to share the devmem inode for this, this should
> > be based off the sysfs inode which I believe there is already only one
> > instance per resource. In contrast /dev/mem can have multiple inodes
> > because anyone can just mknod a new character device file, the same
> > problem does not exist for sysfs.
>
> But then I need to find the right one, plus I also need to find the
> right one for the procfs side. That gets messy, and I already have no
> idea how to really test this. Shared address_space is the same trick
> we're using in drm (where we have multiple things all pointing to the
> same underlying resources, through different files), and it gets the
> job done. So that's why I figured the shared address_space is the
> cleaner solution since then unmap_mapping_range takes care of
> iterating over all vma for us. I guess I could reimplement that logic
> with our own locking and everything in revoke_devmem, but feels a bit
> silly. But it would also solve the problem of having multiple
> different mknod of /dev/mem with different address_space behind them.
> Also because of how remap_pfn_range works, all these vma do use the
> same pgoff already anyway.

True, remap_pfn_range() makes sure that ->pgoff is an absolute
physical address offset for all use cases. So you might be able to
just point proc_bus_pci_open() at the shared devmem address space. For
sysfs it's messier. I think you would need to somehow get the inode
from kernfs_fop_open() to adjust its address space, but only if the
bin_file will ultimately be used for PCI memory.
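A minimal userspace model of the shared-address_space argument above, assuming (as stated in this thread) that every vma mapping a BAR, whether created through /dev/mem, sysfs or procfs, carries the absolute physical pfn in vm_pgoff. All names here (struct model_vma, struct model_mapping, model_mmap(), model_unmap_range()) are hypothetical and are not kernel APIs; a linear scan stands in for the i_mmap interval tree that unmap_mapping_range() walks in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only: the types and functions below are hypothetical
 * stand-ins for a shared address_space and unmap_mapping_range().
 */
#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_VMAS 8

struct model_vma {
	unsigned long vm_start, vm_end;	/* user virtual range */
	unsigned long vm_pgoff;		/* assumed: absolute physical pfn */
	const char *via;		/* "/dev/mem", "sysfs" or "procfs" */
	bool zapped;			/* have the ptes been revoked? */
};

/* One address_space shared by every file that can map the BAR. */
struct model_mapping {
	struct model_vma vmas[MODEL_MAX_VMAS];
	size_t nr;
};

/* Record a new mapping of len bytes of physical memory at phys. */
static struct model_vma *model_mmap(struct model_mapping *m, const char *via,
				    unsigned long start, unsigned long len,
				    unsigned long phys)
{
	struct model_vma *vma;

	assert(m->nr < MODEL_MAX_VMAS);
	assert(len > 0 && (len & ((1UL << MODEL_PAGE_SHIFT) - 1)) == 0);
	vma = &m->vmas[m->nr++];
	vma->vm_start = start;
	vma->vm_end = start + len;
	vma->vm_pgoff = phys >> MODEL_PAGE_SHIFT;
	vma->via = via;
	vma->zapped = false;
	return vma;
}

/*
 * Zap every vma whose pfn range overlaps [first_pfn, first_pfn + nr_pages).
 * Returns how many vmas were hit, regardless of which file created them.
 */
static size_t model_unmap_range(struct model_mapping *m,
				unsigned long first_pfn, unsigned long nr_pages)
{
	size_t i, hits = 0;
	unsigned long last_pfn = first_pfn + nr_pages;	/* exclusive */

	for (i = 0; i < m->nr; i++) {
		struct model_vma *vma = &m->vmas[i];
		unsigned long vma_pages =
			(vma->vm_end - vma->vm_start) >> MODEL_PAGE_SHIFT;
		unsigned long vma_last = vma->vm_pgoff + vma_pages; /* exclusive */

		if (vma->vm_pgoff < last_pfn && first_pfn < vma_last) {
			vma->zapped = true;
			hits++;
		}
	}
	return hits;
}
```

Because the lookup is keyed only by pfn, one call reaches the /dev/mem, sysfs and procfs mappings alike, which is the property the shared address_space relies on. If each file instead had its own struct model_mapping (the multiple-mknod situation described above for /dev/mem), a revoke issued against one would leave vmas recorded in the others untouched.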

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 22:23         ` Dan Williams
  (?)
@ 2020-10-07 22:29           ` Dan Williams
  -1 siblings, 0 replies; 166+ messages in thread
From: Dan Williams @ 2020-10-07 22:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, Linux-media@vger.kernel.org, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Wed, Oct 7, 2020 at 3:23 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Oct 7, 2020 at 12:49 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > On Wed, Oct 7, 2020 at 9:33 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > >
> > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > the default for all driver uses.
> > > >
> > > > Except there are two more ways to access pci bars: sysfs and proc mmap
> > > > support. Let's plug that hole.
> > >
> > > Ooh, yes, let's.
> > >
> > > > For revoke_devmem() to work we need to link our vma into the same
> > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > >
> > > I don't think you want to share the devmem inode for this, this should
> > > be based off the sysfs inode which I believe there is already only one
> > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > because anyone can just mknod a new character device file, the same
> > > problem does not exist for sysfs.
> >
> > But then I need to find the right one, plus I also need to find the
> > right one for the procfs side. That gets messy, and I already have no
> > idea how to really test this. Shared address_space is the same trick
> > we're using in drm (where we have multiple things all pointing to the
> > same underlying resources, through different files), and it gets the
> > job done. So that's why I figured the shared address_space is the
> > cleaner solution since then unmap_mapping_range takes care of
> > iterating over all vma for us. I guess I could reimplement that logic
> > with our own locking and everything in revoke_devmem, but feels a bit
> > silly. But it would also solve the problem of having multiple
> > different mknod of /dev/mem with different address_space behind them.
> > Also because of how remap_pfn_range works, all these vma do use the
> > same pgoff already anyway.
>
> True, remap_pfn_range() makes sure that ->pgoff is an absolute
> physical address offset for all use cases. So you might be able to
> just point proc_bus_pci_open() at the shared devmem address space. For
> sysfs it's messier. I think you would need to somehow get the inode
> from kernfs_fop_open() to adjust its address space, but only if the
> bin_file will ultimately be used for PCI memory.

To me this seems like a new sysfs_create_bin_file() flavor that
registers the file with the common devmem address_space.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 07/13] mm: close race in generic_access_phys
  2020-10-07 18:01       ` Daniel Vetter
  (?)
@ 2020-10-07 23:21         ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 23:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Dan Williams, Kees Cook, Rik van Riel,
	Benjamin Herrensmidt, Dave Airlie, Hugh Dickins, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Daniel Vetter

On Wed, Oct 07, 2020 at 08:01:42PM +0200, Daniel Vetter wrote:
> I think it'd fix the bug, until someone wires ->access up for
> drivers/gpu, or the next subsystem. This is also just for ptrace, so
> we really don't care when we stall the vm badly and other silly
> things. So I figured the somewhat ugly, but full generic solution is
> the better one, so that people who want to be able to ptrace
> read/write their iomem mmaps can just sprinkle this wherever they feel
> like.
> 
> But yeah if we go with most minimal fix, i.e. only trying to fix the
> current users, then your thing should work and is simpler. But it
> leaves the door open for future problems.

The only other idea I had was to make the 'vma of __iomem memory' a
fully generic utility that completely takes over the vm_ops.

We did something like this in RDMA; what I found was that even just
implementing mmap() using the kernel helpers turned out to be pretty
tricky, and many drivers got it wrong in small ways.

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 19:33     ` Dan Williams
  (?)
@ 2020-10-07 23:24       ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-07 23:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Daniel Vetter, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc, Linux-media@vger.kernel.org,
	linux-s390, Daniel Vetter, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci

On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > the default for all driver uses.
> >
> > Except there's two more ways to access pci bars: sysfs and proc mmap
> > support. Let's plug that hole.
> 
> Ooh, yes, let's.
> 
> >
> > For revoke_devmem() to work we need to link our vma into the same
> > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > at ->open time, but that's a bit tricky here with all the entry points
> > and arch code. So instead create a fake file and adjust vma->vm_file.
> 
> I don't think you want to share the devmem inode for this, this should
> be based off the sysfs inode which I believe there is already only one
> instance per resource. In contrast /dev/mem can have multiple inodes
> because anyone can just mknod a new character device file, the same
> problem does not exist for sysfs.

The inode does not come from the filesystem; char/mem.c creates a
singular anon inode in devmem_init_inode().

Seems OK to use this more widely, but it feels a bit weird for it to
live in char/mem.c.

This is what got me thinking that maybe this needs to be somewhat
bigger generic infrastructure, e.g. enter this scheme from fops mmap
and have everything else live in mm/user_iomem.c.

Jason

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 07/13] mm: close race in generic_access_phys
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-08  0:44     ` John Hubbard
  -1 siblings, 0 replies; 166+ messages in thread
From: John Hubbard @ 2020-10-08  0:44 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Jason Gunthorpe, Dan Williams, Kees Cook,
	Rik van Riel, Benjamin Herrensmidt, Dave Airlie, Hugh Dickins,
	Andrew Morton, Jérôme Glisse, Jan Kara, Daniel Vetter

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
>    ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to

s/carvetouts/carveouts/

>    cma regions. This means if we miss the unmap the pfn might contain
>    pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
>    iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
>    ("/dev/mem: Revoke mappings when a driver claims the region")

Thanks for putting these references into the log, it's very helpful.
...
> diff --git a/mm/memory.c b/mm/memory.c
> index fcfc4ca36eba..8d467e23b44e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
>   	return ret;
>   }
>   
> +/**
> + * generic_access_phys - generic implementation for iomem mmap access
> + * @vma: the vma to access
> + * @addr: userspace addres, not relative offset within @vma
> + * @buf: buffer to read/write
> + * @len: length of transfer
> + * @write: set to FOLL_WRITE when writing, otherwise reading
> + *
> + * This is a generic implementation for &vm_operations_struct.access for an
> + * iomem mapping. This callback is used by access_process_vm() when the @vma is
> + * not page based.
> + */
>   int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
>   			void *buf, int len, int write)
>   {
>   	resource_size_t phys_addr;
>   	unsigned long prot = 0;
>   	void __iomem *maddr;
> +	pte_t *ptep, pte;
> +	spinlock_t *ptl;
>   	int offset = addr & (PAGE_SIZE-1);
> +	int ret = -EINVAL;
> +
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		return -EINVAL;
> +
> +retry:
> +	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> +		return -EINVAL;
> +	pte = *ptep;
> +	pte_unmap_unlock(ptep, ptl);
>   
> -	if (follow_phys(vma, addr, write, &prot, &phys_addr))
> +	prot = pgprot_val(pte_pgprot(pte));
> +	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
> +
> +	if ((write & FOLL_WRITE) && !pte_write(pte))
>   		return -EINVAL;
>   
>   	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
>   	if (!maddr)
>   		return -ENOMEM;
>   
> +	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> +		goto out_unmap;
> +
> +	if (pte_same(pte, *ptep)) {


The ioremap area is something I'm sorta new to, so a newbie question:
is it possible for the same pte to already be there, ever? If so, we
be stuck in an infinite loop here.  I'm sure that's not the case, but
it's not yet obvious to me why it's impossible. Resource reservations
maybe?


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 07/13] mm: close race in generic_access_phys
  2020-10-08  0:44     ` John Hubbard
  (?)
@ 2020-10-08  7:23       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08  7:23 UTC (permalink / raw)
  To: John Hubbard
  Cc: DRI Development, LKML, kvm, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Jason Gunthorpe, Dan Williams, Kees Cook,
	Rik van Riel, Benjamin Herrensmidt, Dave Airlie, Hugh Dickins,
	Andrew Morton, Jérôme Glisse, Jan Kara, Daniel Vetter

On Thu, Oct 8, 2020 at 2:44 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> >    ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carvetouts to
>
> s/carvetouts/carveouts/
>
> >    cma regions. This means if we miss the unmap the pfn might contain
> >    pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> >    iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> >    ("/dev/mem: Revoke mappings when a driver claims the region")
>
> Thanks for putting these references into the log, it's very helpful.
> ...
> > diff --git a/mm/memory.c b/mm/memory.c
> > index fcfc4ca36eba..8d467e23b44e 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
> >       return ret;
> >   }
> >
> > +/**
> > + * generic_access_phys - generic implementation for iomem mmap access
> > + * @vma: the vma to access
> > + * @addr: userspace addres, not relative offset within @vma
> > + * @buf: buffer to read/write
> > + * @len: length of transfer
> > + * @write: set to FOLL_WRITE when writing, otherwise reading
> > + *
> > + * This is a generic implementation for &vm_operations_struct.access for an
> > + * iomem mapping. This callback is used by access_process_vm() when the @vma is
> > + * not page based.
> > + */
> >   int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> >                       void *buf, int len, int write)
> >   {
> >       resource_size_t phys_addr;
> >       unsigned long prot = 0;
> >       void __iomem *maddr;
> > +     pte_t *ptep, pte;
> > +     spinlock_t *ptl;
> >       int offset = addr & (PAGE_SIZE-1);
> > +     int ret = -EINVAL;
> > +
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             return -EINVAL;
> > +
> > +retry:
> > +     if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> > +             return -EINVAL;
> > +     pte = *ptep;
> > +     pte_unmap_unlock(ptep, ptl);
> >
> > -     if (follow_phys(vma, addr, write, &prot, &phys_addr))
> > +     prot = pgprot_val(pte_pgprot(pte));
> > +     phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
> > +
> > +     if ((write & FOLL_WRITE) && !pte_write(pte))
> >               return -EINVAL;
> >
> >       maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
> >       if (!maddr)
> >               return -ENOMEM;
> >
> > +     if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> > +             goto out_unmap;
> > +
> > +     if (pte_same(pte, *ptep)) {
>
>
> The ioremap area is something I'm sorta new to, so a newbie question:
> is it possible for the same pte to already be there, ever? If so, we
> be stuck in an infinite loop here.  I'm sure that's not the case, but
> it's not yet obvious to me why it's impossible. Resource reservations
> maybe?

It's just buggy: it should be !pte_same. And I guess I need to figure
out how to test this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
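The corrected revalidation described above (retry on !pte_same, otherwise do the access with the page table lock still held) can be modelled in userspace. This is a minimal single-threaded sketch under stated assumptions. The model_* names, the 64-bit "pte" encoding and the race hook are all hypothetical stand-ins, not kernel API. The hook marks the window where the real code has dropped the ptl and a concurrent unmap could run. With the inverted test from the quoted patch (retrying while the entries are the same), a stable mapping would never leave the loop, which is the hang John asked about.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model: a "pte" is pfn << MODEL_PFN_SHIFT plus a
 * write bit; 0 means "not mapped". None of this is kernel API. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_WRITE  0x1ULL
#define MODEL_PFN_SHIFT  12
#define MODEL_NR_PFNS    4
#define MODEL_PAGE_BYTES 16

static model_pte_t model_pte;   /* the live entry a concurrent unmap may change */
static unsigned char model_phys[MODEL_NR_PFNS][MODEL_PAGE_BYTES];

/* Runs in the window where the real code has dropped the ptl. */
static void (*model_race_hook)(void);

static int model_access(size_t offset, void *buf, size_t len, bool write)
{
	if (offset + len > MODEL_PAGE_BYTES)
		return -EINVAL;

	for (;;) {
		/* Snapshot: follow_pte(); pte = *ptep; pte_unmap_unlock(). */
		model_pte_t snap = model_pte;

		if (snap == 0)
			return -EINVAL;         /* nothing mapped */
		if (write && !(snap & MODEL_PTE_WRITE))
			return -EINVAL;         /* read-only entry */

		uint64_t pfn = snap >> MODEL_PFN_SHIFT;
		assert(pfn < MODEL_NR_PFNS);    /* model invariant */
		unsigned char *maddr = model_phys[pfn]; /* stands in for ioremap_prot() */

		if (model_race_hook)
			model_race_hook();

		/* Revalidate: retry if the entry changed (!pte_same). In the real
		 * code the ptl is held from this check through the copy, and the
		 * stale ioremap is torn down before retrying. */
		if (model_pte != snap)
			continue;

		if (write)
			memcpy(maddr + offset, buf, len);
		else
			memcpy(buf, maddr + offset, len);
		return (int)len;
	}
}

/* Example hook: simulate one buffer move from its current pfn to pfn 2. */
static int model_moves;

static void model_move_once(void)
{
	if (model_moves++ == 0)
		model_pte = (2ULL << MODEL_PFN_SHIFT) | MODEL_PTE_WRITE;
}
```

With model_race_hook set to model_move_once, a read that first snapshots pfn 1 notices the change, retries once, and returns the contents of pfn 2. This is the "buffer got moved under us" case from the commit message.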

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 23:24       ` Jason Gunthorpe
  (?)
@ 2020-10-08  7:31         ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08  7:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc, Linux-media@vger.kernel.org,
	linux-s390, Daniel Vetter, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Thu, Oct 8, 2020 at 1:24 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > the default for all driver uses.
> > >
> > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > support. Let's plug that hole.
> >
> > Ooh, yes, lets.
> >
> > >
> > > For revoke_devmem() to work we need to link our vma into the same
> > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > at ->open time, but that's a bit tricky here with all the entry points
> > > and arch code. So instead create a fake file and adjust vma->vm_file.
> >
> > I don't think you want to share the devmem inode for this, this should
> > be based off the sysfs inode which I believe there is already only one
> > instance per resource. In contrast /dev/mem can have multiple inodes
> > because anyone can just mknod a new character device file, the same
> > problem does not exist for sysfs.
>
> The inode does not come from the filesystem char/mem.c creates a
> singular anon inode in devmem_init_inode()
>
> Seems OK to use this more widely, but it feels a bit weird to live in
> char/memory.c.
>
> This is what got me thinking maybe this needs to be a bit bigger
> generic infrastructure - eg enter this scheme from fops mmap and
> everything else is in mm/user_iomem.c

Yeah, moving it to iomem and renaming it to use an iomem_ prefix
instead of devmem sounds like a good idea.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 23:24       ` Jason Gunthorpe
  (?)
@ 2020-10-08  7:49         ` Dan Williams
  -1 siblings, 0 replies; 166+ messages in thread
From: Dan Williams @ 2020-10-08  7:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc, Linux-media@vger.kernel.org,
	linux-s390, Daniel Vetter, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Wed, Oct 7, 2020 at 4:25 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > the default for all driver uses.
> > >
> > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > support. Let's plug that hole.
> >
> > Ooh, yes, lets.
> >
> > >
> > > For revoke_devmem() to work we need to link our vma into the same
> > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > at ->open time, but that's a bit tricky here with all the entry points
> > > and arch code. So instead create a fake file and adjust vma->vm_file.
> >
> > I don't think you want to share the devmem inode for this, this should
> > be based off the sysfs inode which I believe there is already only one
> > instance per resource. In contrast /dev/mem can have multiple inodes
> > because anyone can just mknod a new character device file, the same
> > problem does not exist for sysfs.
>
> The inode does not come from the filesystem char/mem.c creates a
> singular anon inode in devmem_init_inode()

That's not quite right. An inode does come from the filesystem; I just
arranged for that inode's i_mapping to be set to a common instance.

> Seems OK to use this more widely, but it feels a bit weird to live in
> char/memory.c.

Sure, now that more users have arrived it should move somewhere common.

> This is what got me thinking maybe this needs to be a bit bigger
> generic infrastructure - eg enter this scheme from fops mmap and
> everything else is in mm/user_iomem.c

It still requires every file that can map physical memory to have its
->open fop do

       inode->i_mapping = devmem_inode->i_mapping;
       filp->f_mapping = inode->i_mapping;

I don't see how you can centralize that part.
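The toy model below shows why that two-line ->open assignment matters. Revocation only reaches vmas whose file points at the shared address_space. All types and functions here (mock_address_space, open_shared(), mmap_file(), revoke_range()) are hypothetical simplifications. In the kernel, the mapping's i_mmap tree, walked by unmap_mapping_range(), plays the role of the vmas array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMAS 8

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct mock_vma {
	unsigned long pgoff;   /* remap_pfn_range() makes this the pfn */
	unsigned long pages;
	bool zapped;
};

struct mock_address_space {
	struct mock_vma *vmas[MAX_VMAS];   /* stands in for the i_mmap tree */
	size_t nr;
};

struct mock_file {
	struct mock_address_space *f_mapping;
};

/* Models an ->open that redirects the file to the shared mapping, i.e.
 *   inode->i_mapping = devmem_inode->i_mapping;
 *   filp->f_mapping = inode->i_mapping;
 */
static void open_shared(struct mock_file *f, struct mock_address_space *shared)
{
	f->f_mapping = shared;
}

/* Models mmap linking the new vma into file->f_mapping. */
static bool mmap_file(struct mock_file *f, struct mock_vma *vma)
{
	struct mock_address_space *m = f->f_mapping;

	assert(m != NULL);   /* the file must have been "opened" */
	if (m->nr == MAX_VMAS)
		return false;
	m->vmas[m->nr++] = vma;
	return true;
}

/* Models unmap_mapping_range(): zap every vma in ONE mapping that overlaps
 * [pgoff, pgoff + pages). Returns how many vmas were newly zapped. */
static size_t revoke_range(struct mock_address_space *m,
			   unsigned long pgoff, unsigned long pages)
{
	size_t zapped = 0;

	for (size_t i = 0; i < m->nr; i++) {
		struct mock_vma *v = m->vmas[i];
		bool overlaps = v->pgoff < pgoff + pages &&
				pgoff < v->pgoff + v->pages;

		if (overlaps && !v->zapped) {
			v->zapped = true;
			zapped++;
		}
	}
	return zapped;
}
```

Consider three files that map the same pages. Two were redirected to the shared mapping and one kept its own. A single revoke zaps the two redirected mappings, while the third survives. That surviving mapping is the hole the patch tries to close for sysfs and procfs.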

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-07 22:29           ` Dan Williams
  (?)
@ 2020-10-08  8:09             ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08  8:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, Linux-media@vger.kernel.org, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Thu, Oct 8, 2020 at 12:29 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Oct 7, 2020 at 3:23 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Wed, Oct 7, 2020 at 12:49 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > On Wed, Oct 7, 2020 at 9:33 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > >
> > > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > > > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > > the default for all driver uses.
> > > > >
> > > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > > support. Let's plug that hole.
> > > >
> > > > Ooh, yes, lets.
> > > >
> > > > > For revoke_devmem() to work we need to link our vma into the same
> > > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > > >
> > > > I don't think you want to share the devmem inode for this, this should
> > > > be based off the sysfs inode which I believe there is already only one
> > > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > > because anyone can just mknod a new character device file, the same
> > > > problem does not exist for sysfs.
> > >
> > > But then I need to find the right one, plus I also need to find the
> > > right one for the procfs side. That gets messy, and I already have no
> > > idea how to really test this. Shared address_space is the same trick
> > > we're using in drm (where we have multiple things all pointing to the
> > > same underlying resources, through different files), and it gets the
> > > job done. So that's why I figured the shared address_space is the
> > > cleaner solution since then unmap_mapping_range takes care of
> > > iterating over all vma for us. I guess I could reimplement that logic
> > > with our own locking and everything in revoke_devmem, but feels a bit
> > > silly. But it would also solve the problem of having mutliple
> > > different mknod of /dev/kmem with different address_space behind them.
> > > Also because of how remap_pfn_range works, all these vma do use the
> > > same pgoff already anyway.
> >
> > True, remap_pfn_range() makes sure that ->pgoff is an absolute
> > physical address offset for all use cases. So you might be able to
> > just point proc_bus_pci_open() at the shared devmem address space. For
> > sysfs it's messier. I think you would need to somehow get the inode
> > from kernfs_fop_open() to adjust its address space, but only if the
> > bin_file will ultimately be used for PCI memory.

Just read the code a bit more, and for proc it's impossible. There's
only a single file, and before you mmap it you have to call a few
ioctls to select the PCI resource on the device that you want to
mmap. That includes legacy ioport resources, and at least for now those
don't get revoked (maybe they should, but I'm only looking at iomem
here). Setting the mapping too early in ->open means that on
architectures which can mmap ioports (not many, but powerpc is among
them) we'd shoot down those mmaps too.

Looking at the code, there's the generic implementation, which
consults pci_iobar_pfn(), and the only other implementation (sparc)
looks similar: both separate iomem from ioport through different
pfns. So I think this should indeed work.

> To me this seems like a new sysfs_create_bin_file() flavor that
> registers the file with the common devmem address_space.

Hm, I think we could just add an i_mapping member to bin_attributes
and let the normal open code set that up for us. That should work;
mmapable binary sysfs files are already a similar special case.
-Daniel




--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
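remap_pfn_range() stores the starting pfn in vm_pgoff, so a vma's pgoff range is its pfn range. Revoking a physical resource therefore reduces to a page-range overlap test. The sketch below shows that arithmetic and assumes 4 KiB pages. resource_to_pages(), ranges_overlap() and all example addresses, including the separate "ioport" window, are made up for illustration. Only the idea that ioport mmaps use distinct pfns comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* A range of page frame numbers, which is also a vm_pgoff range for
 * vmas set up with remap_pfn_range(). */
struct pfn_range {
	uint64_t first;
	uint64_t count;
};

/* Pages touched by the physical byte range [start, start + size). */
static struct pfn_range resource_to_pages(uint64_t start, uint64_t size)
{
	assert(size > 0);
	uint64_t first = start >> MOCK_PAGE_SHIFT;
	uint64_t last = (start + size - 1) >> MOCK_PAGE_SHIFT;

	return (struct pfn_range){ .first = first, .count = last - first + 1 };
}

/* Would revoking range r have to zap a vma covering range v? */
static bool ranges_overlap(struct pfn_range r, struct pfn_range v)
{
	return r.first < v.first + v.count && v.first < r.first + r.count;
}
```

Take a hypothetical 64 KiB BAR at 0xfe000000. It covers pfns 0xfe000 to 0xfe00f, so a vma mapping two pages at offset 0x4000 overlaps it and would be zapped. A vma whose pfns lie in a separate ioport window does not overlap and would survive, even if both vmas hang off the same shared mapping.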

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
@ 2020-10-08  8:09             ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08  8:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Linux PCI, Bjorn Helgaas, Daniel Vetter,
	Andrew Morton, Linux ARM, Linux-media@vger.kernel.org

On Thu, Oct 8, 2020 at 12:29 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Oct 7, 2020 at 3:23 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Wed, Oct 7, 2020 at 12:49 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > On Wed, Oct 7, 2020 at 9:33 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > >
> > > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > > > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > > the default for all driver uses.
> > > > >
> > > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > > support. Let's plug that hole.
> > > >
> > > > Ooh, yes, lets.
> > > >
> > > > > For revoke_devmem() to work we need to link our vma into the same
> > > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > > >
> > > > I don't think you want to share the devmem inode for this, this should
> > > > be based off the sysfs inode which I believe there is already only one
> > > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > > because anyone can just mknod a new character device file, the same
> > > > problem does not exist for sysfs.
> > >
> > > But then I need to find the right one, plus I also need to find the
> > > right one for the procfs side. That gets messy, and I already have no
> > > idea how to really test this. Shared address_space is the same trick
> > > we're using in drm (where we have multiple things all pointing to the
> > > same underlying resources, through different files), and it gets the
> > > job done. So that's why I figured the shared address_space is the
> > > cleaner solution since then unmap_mapping_range takes care of
> > > iterating over all vma for us. I guess I could reimplement that logic
> > > with our own locking and everything in revoke_devmem, but feels a bit
> > > silly. But it would also solve the problem of having multiple
> > > different mknod of /dev/mem with different address_space behind them.
> > > Also because of how remap_pfn_range works, all these vma do use the
> > > same pgoff already anyway.
> >
> > True, remap_pfn_range() makes sure that ->pgoff is an absolute
> > physical address offset for all use cases. So you might be able to
> > just point proc_bus_pci_open() at the shared devmem address space. For
> > sysfs it's messier. I think you would need to somehow get the inode
> > from kernfs_fop_open() to adjust its address space, but only if the
> > bin_file will ultimately be used for PCI memory.

Just read the code a bit more, and for proc it's impossible. There's
only a single file, and before you mmap it you have to call a few
ioctls to select the right PCI resource on that device you want to
mmap. That includes legacy ioport stuff, and at least for now those
mappings don't get revoked (maybe they should, but I'm only looking at
iomem here). Setting the mapping too early in ->open means that on
architectures which can mmap ioport ranges (not many, but powerpc is
among them) we'd shoot down those mmaps too.

Looking at the code, there's the generic implementation, which consults
pci_iobar_pfn(). The only other implementation, for sparc, looks
similar: they separate iomem vs ioport through different pfns. So I
think this should indeed work.
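Why different pfns are enough: assuming, as Dan notes above, that vm_pgoff holds the absolute pfn for these mappings, a range-based revoke in a shared address_space only touches vmas whose pfns overlap the revoked range. A minimal userspace model of that overlap test follows; all names and pfn values are hypothetical, and in the kernel the walk would be done by unmap_mapping_range() over the mapping's i_mmap tree rather than over an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a vma linked into one shared address_space. */
struct model_vma {
	unsigned long vm_pgoff; /* first pfn mapped (assumed absolute) */
	unsigned long npages;   /* number of pages mapped */
	bool present;           /* false once the ptes have been zapped */
};

/*
 * Zap every vma that overlaps [first_pfn, first_pfn + npages).
 * Returns how many vmas were zapped by this call.
 */
static size_t model_revoke_range(struct model_vma *vmas, size_t nr,
				 unsigned long first_pfn, unsigned long npages)
{
	size_t zapped = 0;

	assert(vmas != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		struct model_vma *v = &vmas[i];
		bool overlaps = v->vm_pgoff < first_pfn + npages &&
				first_pfn < v->vm_pgoff + v->npages;

		if (v->present && overlaps) {
			v->present = false;
			zapped++;
		}
	}
	return zapped;
}
```

With an iomem vma and an ioport vma in the same list, revoking the iomem range leaves the ioport mapping alone, which is the property the argument above depends on.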

> To me this seems like a new sysfs_create_bin_file() flavor that
> registers the file with the common devmem address_space.

Hm, I think we could just add an i_mapping member to struct bin_attribute
and let the normal open code set that up for us. That should work;
mmapable binary sysfs files are already a similar special case.
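The bin_attribute idea, sketched as a userspace model: an attribute optionally carries a shared mapping, and the generic open path points file->f_mapping at it before any mmap can happen. The struct and function names are hypothetical and much simpler than the real struct bin_attribute and kernfs open code; this only illustrates the control flow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct model_address_space { int id; };

struct model_bin_attr {
	const char *name;
	struct model_address_space *mapping; /* NULL: keep the file's own mapping */
};

struct model_file {
	struct model_address_space *f_mapping;
};

/*
 * Model of the generic open path: if the attribute asks for a shared
 * mapping, use it as this file's f_mapping; otherwise keep our own.
 */
static void model_bin_open(struct model_file *file,
			   struct model_address_space *own,
			   const struct model_bin_attr *attr)
{
	assert(file != NULL && own != NULL && attr != NULL);
	file->f_mapping = attr->mapping ? attr->mapping : own;
}
```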
-Daniel




--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-08  7:49         ` Dan Williams
  (?)
@ 2020-10-08  8:13           ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08  8:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc, Linux-media@vger.kernel.org,
	linux-s390, Daniel Vetter, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Thu, Oct 8, 2020 at 9:50 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Oct 7, 2020 at 4:25 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > >
> > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > the default for all driver uses.
> > > >
> > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > support. Let's plug that hole.
> > >
> > > Ooh, yes, lets.
> > >
> > > >
> > > > For revoke_devmem() to work we need to link our vma into the same
> > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > >
> > > I don't think you want to share the devmem inode for this, this should
> > > be based off the sysfs inode which I believe there is already only one
> > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > because anyone can just mknod a new character device file, the same
> > > problem does not exist for sysfs.
> >
> > The inode does not come from the filesystem char/mem.c creates a
> > singular anon inode in devmem_init_inode()
>
> That's not quite right, An inode does come from the filesystem I just
> arranged for that inode's i_mapping to be set to a common instance.
>
> > Seems OK to use this more widely, but it feels a bit weird to live in
> > char/mem.c.
>
> Sure, now that more users have arrived it should move somewhere common.
>
> > This is what got me thinking maybe this needs to be a bit bigger
> > generic infrastructure - eg enter this scheme from fops mmap and
> > everything else is in mm/user_iomem.c
>
> It still requires every file that can map physical memory to have its
> ->open fop do
>
>        inode->i_mapping = devmem_inode->i_mapping;
>        filp->f_mapping = inode->i_mapping;
>
> I don't see how you can centralize that part.

btw, why are you setting inode->i_mapping? The inode is already
published, changing that looks risky. And I don't think it's needed,
vma_link() only looks at filp->f_mapping, and in our drm_open() we
only set that one.
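The point that only filp->f_mapping matters can be illustrated with a small userspace model, assuming (as stated above) that vma_link() attaches the vma to vma->vm_file->f_mapping and never reads inode->i_mapping. Two files with different inodes but the same f_mapping get revoked together, while the inodes' own mappings are left untouched. All names are hypothetical, and the i_mmap interval tree is reduced to a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VMAS 8

/* Hypothetical stand-ins, not the kernel's structures. */
struct model_vma { bool present; };

struct model_address_space {
	struct model_vma *vmas[MODEL_MAX_VMAS]; /* stands in for i_mmap */
	size_t nr;
};

struct model_inode { struct model_address_space *i_mapping; };

struct model_file {
	struct model_inode *inode;
	struct model_address_space *f_mapping;
};

/* Like vma_link() in this model: only file->f_mapping is consulted. */
static void model_vma_link(struct model_file *file, struct model_vma *vma)
{
	struct model_address_space *m = file->f_mapping;

	assert(m->nr < MODEL_MAX_VMAS);
	vma->present = true;
	m->vmas[m->nr++] = vma;
}

/* Zap every vma linked into the mapping (whole-range revoke). */
static void model_revoke_all(struct model_address_space *m)
{
	for (size_t i = 0; i < m->nr; i++)
		m->vmas[i]->present = false;
}
```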
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-08  8:13           ` Daniel Vetter
  (?)
@ 2020-10-08  8:35             ` Dan Williams
  -1 siblings, 0 replies; 166+ messages in thread
From: Dan Williams @ 2020-10-08  8:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc, Linux-media@vger.kernel.org,
	linux-s390, Daniel Vetter, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Thu, Oct 8, 2020 at 1:13 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Thu, Oct 8, 2020 at 9:50 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Wed, Oct 7, 2020 at 4:25 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> > > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > >
> > > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > > the default for all driver uses.
> > > > >
> > > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > > support. Let's plug that hole.
> > > >
> > > > Ooh, yes, lets.
> > > >
> > > > >
> > > > > For revoke_devmem() to work we need to link our vma into the same
> > > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > > >
> > > > I don't think you want to share the devmem inode for this, this should
> > > > be based off the sysfs inode which I believe there is already only one
> > > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > > because anyone can just mknod a new character device file, the same
> > > > problem does not exist for sysfs.
> > >
> > > The inode does not come from the filesystem char/mem.c creates a
> > > singular anon inode in devmem_init_inode()
> >
> > That's not quite right, An inode does come from the filesystem I just
> > arranged for that inode's i_mapping to be set to a common instance.
> >
> > > Seems OK to use this more widely, but it feels a bit weird to live in
> > > char/mem.c.
> >
> > Sure, now that more users have arrived it should move somewhere common.
> >
> > > This is what got me thinking maybe this needs to be a bit bigger
> > > generic infrastructure - eg enter this scheme from fops mmap and
> > > everything else is in mm/user_iomem.c
> >
> > It still requires every file that can map physical memory to have its
> > ->open fop do
> >
> >        inode->i_mapping = devmem_inode->i_mapping;
> >        filp->f_mapping = inode->i_mapping;
> >
> > I don't see how you can centralize that part.
>
> btw, why are you setting inode->i_mapping? The inode is already
> published, changing that looks risky. And I don't think it's needed,
> vma_link() only looks at filp->f_mapping, and in our drm_open() we
> only set that one.

I think you're right that it is unnecessary for devmem, but I don't think
it's dangerous to do it from the very first open, before anything is
using the address space. It's copy-paste from what all the other
"shared address space" implementers do, for example block devices in
bd_acquire(). However, the rationale for block devices doing it is so
that page cache pages can be associated with the address space in the
absence of an f_mapping. Without filesystem page writeback to
coordinate, I don't see any devmem code paths that would operate on
inode->i_mapping.

* Re: [PATCH 10/13] PCI: revoke mappings like devmem
  2020-10-08  7:49         ` Dan Williams
  (?)
@ 2020-10-08 12:41           ` Jason Gunthorpe
  -1 siblings, 0 replies; 166+ messages in thread
From: Jason Gunthorpe @ 2020-10-08 12:41 UTC (permalink / raw)
  To: Dan Williams
  Cc: Daniel Vetter, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc, Linux-media@vger.kernel.org,
	linux-s390, Daniel Vetter, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	Linux PCI

On Thu, Oct 08, 2020 at 12:49:54AM -0700, Dan Williams wrote:

> > This is what got me thinking maybe this needs to be a bit bigger
> > generic infrastructure - eg enter this scheme from fops mmap and
> > everything else is in mm/user_iomem.c
> 
> It still requires every file that can map physical memory to have its
> ->open fop do

Common infrastructure would have to create a dummy struct file at mmap
time with the global inode and attach that to the VMA.
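A minimal model of that alternative, assuming a single global address_space reachable through one dummy file: the mmap handler swaps vma->vm_file for the dummy file, so the vma gets linked into the global mapping while the file userspace opened is never modified. Names are hypothetical, and the reference counting a real implementation would need when replacing vm_file (get_file()/fput()) is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct model_address_space { int id; };

struct model_file { struct model_address_space *f_mapping; };

struct model_vma { struct model_file *vm_file; };

/* One global mapping and one dummy file whose f_mapping points at it. */
static struct model_address_space model_iomem_mapping = { 42 };
static struct model_file model_iomem_file = { &model_iomem_mapping };

/*
 * Called from an mmap handler before the vma is linked: redirect
 * vm_file so linking uses the global mapping. Refcounting omitted.
 */
static void model_attach_iomem_file(struct model_vma *vma)
{
	assert(vma != NULL);
	vma->vm_file = &model_iomem_file;
}
```

The design trade-off versus setting f_mapping in ->open is that each driver only has to call one helper from its mmap path, at the cost of the vma no longer pointing at the file userspace actually opened.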

Jason

* Re: [PATCH 08/13] s390/pci: Remove races against pte updates
  2020-10-07 16:44   ` Daniel Vetter
  (?)
@ 2020-10-08 16:44     ` Gerald Schaefer
  -1 siblings, 0 replies; 166+ messages in thread
From: Gerald Schaefer @ 2020-10-08 16:44 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Dan Williams, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Niklas Schnelle

On Wed,  7 Oct 2020 18:44:21 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> Way back it was a reasonable assumption that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carveouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with __GFP_MOVABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea. Fix this.
> 
> Since zpci_memcpy_from|toio seems to not do anything nefarious with
> locks we just need to open code get_pfn and follow_pfn and make sure
> we drop the locks only after we've done. The write function also needs
> the copy_from_user move, since we can't take userspace faults while
> holding the mmap sem.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> ---
>  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
>  1 file changed, 57 insertions(+), 41 deletions(-)

Looks good, thanks. Also survived some basic function tests. Only a
minor nitpick, see below.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>

> 
> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> index 401cf670a243..4d194cb09372 100644
> --- a/arch/s390/pci/pci_mmio.c
> +++ b/arch/s390/pci/pci_mmio.c
> @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
>  	return rc;
>  }
>  
> -static long get_pfn(unsigned long user_addr, unsigned long access,
> -		    unsigned long *pfn)
> -{
> -	struct vm_area_struct *vma;
> -	long ret;
> -
> -	mmap_read_lock(current->mm);
> -	ret = -EINVAL;
> -	vma = find_vma(current->mm, user_addr);
> -	if (!vma)
> -		goto out;
> -	ret = -EACCES;
> -	if (!(vma->vm_flags & access))
> -		goto out;
> -	ret = follow_pfn(vma, user_addr, pfn);
> -out:
> -	mmap_read_unlock(current->mm);
> -	return ret;
> -}
> -
>  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  		const void __user *, user_buffer, size_t, length)
>  {
>  	u8 local_buf[64];
>  	void __iomem *io_addr;
>  	void *buf;
> -	unsigned long pfn;
> +	struct vm_area_struct *vma;
> +	pte_t *ptep;
> +	spinlock_t *ptl;
>  	long ret;
>  
>  	if (!zpci_is_enabled())
> @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  	 * We only support write access to MIO capable devices if we are on
>  	 * a MIO enabled system. Otherwise we would have to check for every
>  	 * address if it is a special ZPCI_ADDR and would have to do
> -	 * a get_pfn() which we don't need for MIO capable devices.  Currently
> +	 * a pfn lookup which we don't need for MIO capable devices.  Currently
>  	 * ISM devices are the only devices without MIO support and there is no
>  	 * known need for accessing these from userspace.
>  	 */
> @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  	} else
>  		buf = local_buf;
>  
> -	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
> +	ret = -EFAULT;
> +	if (copy_from_user(buf, user_buffer, length))
> +		goto out_free;
> +
> +	mmap_read_lock(current->mm);
> +	ret = -EINVAL;
> +	vma = find_vma(current->mm, mmio_addr);
> +	if (!vma)
> +		goto out_unlock_mmap;
> +	ret = -EACCES;
> +	if (!(vma->vm_flags & VM_WRITE))
> +		goto out_unlock_mmap;
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		goto out_unlock_mmap;

That check for VM_IO | VM_PFNMAP was previously hidden inside follow_pfn(),
and that would have returned -EINVAL in this case. With your change, we
now return -EACCES. Not sure how important that is, but it feels wrong.
Maybe move the VM_IO | VM_PFNMAP check up, before the ret = -EACCES?
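
A minimal user-space sketch of the check order suggested above, written in C to match the patch. The flag values mirror include/linux/mm.h. `struct model_vma` and `check_mmio_vma()` are hypothetical stand-ins, not kernel code. The access bit is a parameter, so one helper covers both syscalls: VM_WRITE for the write path and VM_READ for the read path, as in the removed get_pfn(mmio_addr, VM_READ, &pfn) call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values as in include/linux/mm.h; this is a user-space model only. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_PFNMAP 0x00000400UL
#define VM_IO     0x00004000UL

/* Hypothetical stand-in for struct vm_area_struct; only vm_flags matters here. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Returns 0 if the VMA may be used for the MMIO syscall, else a negative errno.
 * access is VM_WRITE for the write syscall and VM_READ for the read syscall.
 */
static long check_mmio_vma(const struct model_vma *vma, unsigned long access)
{
	assert(access == VM_READ || access == VM_WRITE);

	/* No VMA at mmio_addr: -EINVAL, as before the patch. */
	if (!vma)
		return -EINVAL;
	/* Not an I/O or pfn mapping: -EINVAL, which is what follow_pfn() returned. */
	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
		return -EINVAL;
	/* The mapping exists but lacks the requested permission. */
	if (!(vma->vm_flags & access))
		return -EACCES;
	return 0;
}
```

There is one corner case. The removed get_pfn() tested the access bit before follow_pfn() checked VM_IO | VM_PFNMAP. As a result, a VMA that is neither an I/O mapping nor has the requested permission used to return -EACCES, but with the order above it returns -EINVAL. To keep the old errno in every case, test the access bit first and set ret = -EINVAL just before the VM_IO | VM_PFNMAP test.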

[...]
> @@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
>  		buf = local_buf;
>  	}
>  
> -	ret = get_pfn(mmio_addr, VM_READ, &pfn);
> +	mmap_read_lock(current->mm);
> +	ret = -EINVAL;
> +	vma = find_vma(current->mm, mmio_addr);
> +	if (!vma)
> +		goto out_unlock_mmap;
> +	ret = -EACCES;
> +	if (!(vma->vm_flags & VM_WRITE))
> +		goto out_unlock_mmap;
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		goto out_unlock_mmap;

Same here with VM_IO | VM_PFNMAP and -EINVAL.
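
The write hunk above moves copy_from_user() ahead of mmap_read_lock(). The patch description also says the locks are dropped only after the pfn has been used. A single-threaded user-space model can encode that ordering as assertions, though it cannot reproduce the actual race. Everything here is hypothetical: `struct model_mm`, `model_mmio_write()`, `model_revoke()` and the `model_iomem` array. In the model, `mmap_locked` stands in for mmap_read_lock() and `ptl_locked` for the page-table lock (the `ptl` declared in the hunk). `model_revoke()` stands in for an invalidation such as unmap_mapping_range().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-threaded model; none of these names are kernel APIs. */
struct model_mm {
	int mmap_locked;   /* models mmap_read_lock(current->mm) being held */
	int ptl_locked;    /* models the page-table lock being held */
	int pte_present;   /* cleared when the mapping is revoked */
	size_t pfn;        /* pfn the pte currently points at */
};

/* Stand-in for device memory, indexed by pfn. */
static unsigned char model_iomem[4][64];

/* Like copy_from_user(), this may fault, so no mm lock may be held. */
static void model_copy_from_user(const struct model_mm *mm, void *dst,
				 const void *src, size_t len)
{
	assert(!mm->mmap_locked && !mm->ptl_locked);
	memcpy(dst, src, len);
}

/* A revoke must take the pte lock itself, so it can never run while a
 * reader still holds it. */
static void model_revoke(struct model_mm *mm)
{
	assert(!mm->ptl_locked);
	mm->pte_present = 0;
}

static long model_mmio_write(struct model_mm *mm, const void *user_buf,
			     size_t len)
{
	unsigned char buf[64];

	if (len > sizeof(buf))
		return -EINVAL;

	/* 1. Copy the user data (which may fault) before taking any lock. */
	model_copy_from_user(mm, buf, user_buf, len);

	/* 2. Take the mmap lock, then look up the pte; a missing pte fails
	 *    without the pte lock being taken. */
	mm->mmap_locked = 1;
	if (!mm->pte_present) {
		mm->mmap_locked = 0;
		return -EINVAL;
	}
	mm->ptl_locked = 1;

	/* 3. Use the pfn only while the pte lock still pins the translation. */
	assert(mm->pfn < sizeof(model_iomem) / sizeof(model_iomem[0]));
	memcpy(model_iomem[mm->pfn], buf, len);

	/* 4. Drop the locks only after the access is done. */
	mm->ptl_locked = 0;
	mm->mmap_locked = 0;
	return 0;
}
```

A revoke has to take the same pte lock, so holding that lock across the copy prevents the pfn from being torn down underneath the access. The old get_pfn() approach looked up the pfn, dropped the locks, and only then did the access, which leaves exactly that window open.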

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 08/13] s390/pci: Remove races against pte updates
  2020-10-08 16:44     ` Gerald Schaefer
  (?)
@ 2020-10-08 17:16       ` Daniel Vetter
  -1 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08 17:16 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Dan Williams,
	Kees Cook, Andrew Morton, John Hubbard, Jérôme Glisse,
	Jan Kara, Niklas Schnelle

On Thu, Oct 8, 2020 at 6:44 PM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Wed,  7 Oct 2020 18:44:21 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carveouts to
> > cma regions. This means if we miss the unmap, the pfn might contain
> > pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> >
> > Since zpci_memcpy_from|toio seems to not do anything nefarious with
> > locks we just need to open code get_pfn and follow_pfn and make sure
> > we drop the locks only after we're done. The write function also needs
> > the copy_from_user move, since we can't take userspace faults while
> > holding the mmap sem.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > ---
> >  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
> >  1 file changed, 57 insertions(+), 41 deletions(-)
>
> Looks good, thanks. Also survived some basic function test. Only some
> minor nitpick, see below.
>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>
> >
> > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > index 401cf670a243..4d194cb09372 100644
> > --- a/arch/s390/pci/pci_mmio.c
> > +++ b/arch/s390/pci/pci_mmio.c
> > @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
> >       return rc;
> >  }
> >
> > -static long get_pfn(unsigned long user_addr, unsigned long access,
> > -                 unsigned long *pfn)
> > -{
> > -     struct vm_area_struct *vma;
> > -     long ret;
> > -
> > -     mmap_read_lock(current->mm);
> > -     ret = -EINVAL;
> > -     vma = find_vma(current->mm, user_addr);
> > -     if (!vma)
> > -             goto out;
> > -     ret = -EACCES;
> > -     if (!(vma->vm_flags & access))
> > -             goto out;
> > -     ret = follow_pfn(vma, user_addr, pfn);
> > -out:
> > -     mmap_read_unlock(current->mm);
> > -     return ret;
> > -}
> > -
> >  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >               const void __user *, user_buffer, size_t, length)
> >  {
> >       u8 local_buf[64];
> >       void __iomem *io_addr;
> >       void *buf;
> > -     unsigned long pfn;
> > +     struct vm_area_struct *vma;
> > +     pte_t *ptep;
> > +     spinlock_t *ptl;
> >       long ret;
> >
> >       if (!zpci_is_enabled())
> > @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >        * We only support write access to MIO capable devices if we are on
> >        * a MIO enabled system. Otherwise we would have to check for every
> >        * address if it is a special ZPCI_ADDR and would have to do
> > -      * a get_pfn() which we don't need for MIO capable devices.  Currently
> > +      * a pfn lookup which we don't need for MIO capable devices.  Currently
> >        * ISM devices are the only devices without MIO support and there is no
> >        * known need for accessing these from userspace.
> >        */
> > @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >       } else
> >               buf = local_buf;
> >
> > -     ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
> > +     ret = -EFAULT;
> > +     if (copy_from_user(buf, user_buffer, length))
> > +             goto out_free;
> > +
> > +     mmap_read_lock(current->mm);
> > +     ret = -EINVAL;
> > +     vma = find_vma(current->mm, mmio_addr);
> > +     if (!vma)
> > +             goto out_unlock_mmap;
> > +     ret = -EACCES;
> > +     if (!(vma->vm_flags & VM_WRITE))
> > +             goto out_unlock_mmap;
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             goto out_unlock_mmap;
>
> That check for VM_IO | VM_PFNMAP was previously hidden inside follow_pfn(),
> and that would have returned -EINVAL in this case. With your change, we
> now return -EACCES. Not sure how important that is, but it feels wrong.
> Maybe move the VM_IO | VM_PFNMAP check up, before the ret = -EACCES?

I tried to keep the errno unchanged, but fumbled this. Will fix in the
next round, thanks a lot for reviewing and testing.

For merging, I think this patch is best taken through the s390 tree,
since it doesn't depend on any of the others in this series.

Thanks, Daniel

>
> [...]
> > @@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> >               buf = local_buf;
> >       }
> >
> > -     ret = get_pfn(mmio_addr, VM_READ, &pfn);
> > +     mmap_read_lock(current->mm);
> > +     ret = -EINVAL;
> > +     vma = find_vma(current->mm, mmio_addr);
> > +     if (!vma)
> > +             goto out_unlock_mmap;
> > +     ret = -EACCES;
> > +     if (!(vma->vm_flags & VM_WRITE))
> > +             goto out_unlock_mmap;
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             goto out_unlock_mmap;
>
> Same here with VM_IO | VM_PFNMAP and -EINVAL.



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 08/13] s390/pci: Remove races against pte updates
@ 2020-10-08 17:16       ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08 17:16 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, Niklas Schnelle, LKML,
	DRI Development, Linux MM, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Oct 8, 2020 at 6:44 PM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Wed,  7 Oct 2020 18:44:21 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> >
> > Since zpci_memcpy_from|toio seems to not do anything nefarious with
> > locks we just need to open code get_pfn and follow_pfn and make sure
> > we drop the locks only after we've done. The write function also needs
> > the copy_from_user move, since we can't take userspace faults while
> > holding the mmap sem.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > ---
> >  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
> >  1 file changed, 57 insertions(+), 41 deletions(-)
>
> Looks good, thanks. Also survived some basic function test. Only some
> minor nitpick, see below.
>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>
> >
> > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > index 401cf670a243..4d194cb09372 100644
> > --- a/arch/s390/pci/pci_mmio.c
> > +++ b/arch/s390/pci/pci_mmio.c
> > @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
> >       return rc;
> >  }
> >
> > -static long get_pfn(unsigned long user_addr, unsigned long access,
> > -                 unsigned long *pfn)
> > -{
> > -     struct vm_area_struct *vma;
> > -     long ret;
> > -
> > -     mmap_read_lock(current->mm);
> > -     ret = -EINVAL;
> > -     vma = find_vma(current->mm, user_addr);
> > -     if (!vma)
> > -             goto out;
> > -     ret = -EACCES;
> > -     if (!(vma->vm_flags & access))
> > -             goto out;
> > -     ret = follow_pfn(vma, user_addr, pfn);
> > -out:
> > -     mmap_read_unlock(current->mm);
> > -     return ret;
> > -}
> > -
> >  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >               const void __user *, user_buffer, size_t, length)
> >  {
> >       u8 local_buf[64];
> >       void __iomem *io_addr;
> >       void *buf;
> > -     unsigned long pfn;
> > +     struct vm_area_struct *vma;
> > +     pte_t *ptep;
> > +     spinlock_t *ptl;
> >       long ret;
> >
> >       if (!zpci_is_enabled())
> > @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >        * We only support write access to MIO capable devices if we are on
> >        * a MIO enabled system. Otherwise we would have to check for every
> >        * address if it is a special ZPCI_ADDR and would have to do
> > -      * a get_pfn() which we don't need for MIO capable devices.  Currently
> > +      * a pfn lookup which we don't need for MIO capable devices.  Currently
> >        * ISM devices are the only devices without MIO support and there is no
> >        * known need for accessing these from userspace.
> >        */
> > @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >       } else
> >               buf = local_buf;
> >
> > -     ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
> > +     ret = -EFAULT;
> > +     if (copy_from_user(buf, user_buffer, length))
> > +             goto out_free;
> > +
> > +     mmap_read_lock(current->mm);
> > +     ret = -EINVAL;
> > +     vma = find_vma(current->mm, mmio_addr);
> > +     if (!vma)
> > +             goto out_unlock_mmap;
> > +     ret = -EACCES;
> > +     if (!(vma->vm_flags & VM_WRITE))
> > +             goto out_unlock_mmap;
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             goto out_unlock_mmap;
>
> That check for VM_IO | VM_PFNMAP was previously hidden inside follow_pfn(),
> and that would have returned -EINVAL in this case. With your change, we
> now return -EACCES. Not sure how important that is, but it feels wrong.
> Maybe move the VM_IO | VM_PFNMAP check up, before the ret = -EACCES?

I tried to keep the errno unchanged, but fumbled this. Will fix in the
next round, thanks a lot for reviewing and testing.

For merging I think this one here would be best through the s390 tree,
since it can be merged without any of the others in here.

Thanks, Daniel

>
> [...]
> > @@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> >               buf = local_buf;
> >       }
> >
> > -     ret = get_pfn(mmio_addr, VM_READ, &pfn);
> > +     mmap_read_lock(current->mm);
> > +     ret = -EINVAL;
> > +     vma = find_vma(current->mm, mmio_addr);
> > +     if (!vma)
> > +             goto out_unlock_mmap;
> > +     ret = -EACCES;
> > +     if (!(vma->vm_flags & VM_WRITE))
> > +             goto out_unlock_mmap;
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             goto out_unlock_mmap;
>
> Same here with VM_IO | VM_PFNMAP and -EINVAL.



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: [PATCH 08/13] s390/pci: Remove races against pte updates
@ 2020-10-08 17:16       ` Daniel Vetter
  0 siblings, 0 replies; 166+ messages in thread
From: Daniel Vetter @ 2020-10-08 17:16 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, Niklas Schnelle, LKML,
	DRI Development, Linux MM, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Oct 8, 2020 at 6:44 PM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Wed,  7 Oct 2020 18:44:21 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> >
> > Since zpci_memcpy_from|toio seems to not do anything nefarious with
> > locks we just need to open code get_pfn and follow_pfn and make sure
> > we drop the locks only after we've done. The write function also needs
> > the copy_from_user move, since we can't take userspace faults while
> > holding the mmap sem.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > ---
> >  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
> >  1 file changed, 57 insertions(+), 41 deletions(-)
>
> Looks good, thanks. Also survived some basic function test. Only some
> minor nitpick, see below.
>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>
> >
> > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > index 401cf670a243..4d194cb09372 100644
> > --- a/arch/s390/pci/pci_mmio.c
> > +++ b/arch/s390/pci/pci_mmio.c
> > @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
> >       return rc;
> >  }
> >
> > -static long get_pfn(unsigned long user_addr, unsigned long access,
> > -                 unsigned long *pfn)
> > -{
> > -     struct vm_area_struct *vma;
> > -     long ret;
> > -
> > -     mmap_read_lock(current->mm);
> > -     ret = -EINVAL;
> > -     vma = find_vma(current->mm, user_addr);
> > -     if (!vma)
> > -             goto out;
> > -     ret = -EACCES;
> > -     if (!(vma->vm_flags & access))
> > -             goto out;
> > -     ret = follow_pfn(vma, user_addr, pfn);
> > -out:
> > -     mmap_read_unlock(current->mm);
> > -     return ret;
> > -}
> > -
> >  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >               const void __user *, user_buffer, size_t, length)
> >  {
> >       u8 local_buf[64];
> >       void __iomem *io_addr;
> >       void *buf;
> > -     unsigned long pfn;
> > +     struct vm_area_struct *vma;
> > +     pte_t *ptep;
> > +     spinlock_t *ptl;
> >       long ret;
> >
> >       if (!zpci_is_enabled())
> > @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >        * We only support write access to MIO capable devices if we are on
> >        * a MIO enabled system. Otherwise we would have to check for every
> >        * address if it is a special ZPCI_ADDR and would have to do
> > -      * a get_pfn() which we don't need for MIO capable devices.  Currently
> > +      * a pfn lookup which we don't need for MIO capable devices.  Currently
> >        * ISM devices are the only devices without MIO support and there is no
> >        * known need for accessing these from userspace.
> >        */
> > @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >       } else
> >               buf = local_buf;
> >
> > -     ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
> > +     ret = -EFAULT;
> > +     if (copy_from_user(buf, user_buffer, length))
> > +             goto out_free;
> > +
> > +     mmap_read_lock(current->mm);
> > +     ret = -EINVAL;
> > +     vma = find_vma(current->mm, mmio_addr);
> > +     if (!vma)
> > +             goto out_unlock_mmap;
> > +     ret = -EACCES;
> > +     if (!(vma->vm_flags & VM_WRITE))
> > +             goto out_unlock_mmap;
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             goto out_unlock_mmap;
>
> That check for VM_IO | VM_PFNMAP was previously hidden inside follow_pfn(),
> and that would have returned -EINVAL in this case. With your change, we
> now return -EACCES. Not sure how important that is, but it feels wrong.
> Maybe move the VM_IO | VM_PFNMAP check up, before the ret = -EACCES?

I tried to keep the errno unchanged, but fumbled this. Will fix in the
next round. Thanks a lot for reviewing and testing.
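
A hedged userspace sketch of the errno ordering discussed above. In the removed get_pfn(), a missing access bit returned -EACCES first. Only after that could follow_pfn() reject a mapping without VM_IO | VM_PFNMAP with -EINVAL. To keep that behaviour exactly, test the access bit first, then report -EINVAL for the VM_IO | VM_PFNMAP test. The struct fake_vma type, the check_mmio_vma() helper and the flag values are illustrative stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Stand-in flag values so this userspace sketch compiles; they are
 * assumed to mirror the kernel's VM_* bits but are defined here only
 * for illustration.
 */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_PFNMAP 0x00000400UL
#define VM_IO     0x00004000UL

/* Hypothetical, trimmed-down stand-in for struct vm_area_struct. */
struct fake_vma {
	unsigned long vm_flags;
};

/*
 * Hypothetical helper reproducing the pre-patch errno order:
 *   no VMA                         -> -EINVAL (find_vma() failed)
 *   VMA lacks the requested access -> -EACCES
 *   VMA is not VM_IO | VM_PFNMAP   -> -EINVAL (what follow_pfn() returned)
 */
static long check_mmio_vma(const struct fake_vma *vma, unsigned long access)
{
	if (!vma)
		return -EINVAL;
	if (!(vma->vm_flags & access))
		return -EACCES;
	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
		return -EINVAL;
	return 0;
}
```

Moving the VM_IO | VM_PFNMAP test ahead of the access test, as suggested in the review, gives the same result in every case but one: when both tests fail, it reports -EINVAL instead of -EACCES. Passing the access bit as a parameter also keeps the two syscalls symmetric. The removed read-side call was get_pfn(mmio_addr, VM_READ, &pfn), but the quoted replacement in s390_pci_mmio_read() tests VM_WRITE.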

For merging, I think this one is best taken through the s390 tree,
since it can be applied without any of the other patches in this series.

Thanks, Daniel

>
> [...]
> > @@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> >               buf = local_buf;
> >       }
> >
> > -     ret = get_pfn(mmio_addr, VM_READ, &pfn);
> > +     mmap_read_lock(current->mm);
> > +     ret = -EINVAL;
> > +     vma = find_vma(current->mm, mmio_addr);
> > +     if (!vma)
> > +             goto out_unlock_mmap;
> > +     ret = -EACCES;
> > +     if (!(vma->vm_flags & VM_WRITE))
> > +             goto out_unlock_mmap;
> > +     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +             goto out_unlock_mmap;
>
> Same here with VM_IO | VM_PFNMAP and -EINVAL.
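
The patch title ("Remove races against pte updates") names the problem: a pfn returned by a helper such as the removed get_pfn() can go stale once its locks are dropped. The new ptep and ptl locals suggest that the lookup and the MMIO access now happen under the page-table lock. The write path also stages the user data with copy_from_user() before taking mmap_read_lock(), presumably because a copy that may fault or sleep cannot run under a spinlock. Below is a minimal userspace model of that pattern, under stated assumptions: a pthread mutex stands in for the ptl spinlock, and fake_pte, fake_zap(), lookup_pfn_racy(), access_pfn_locked() and record_pfn() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one pte plus the lock protecting it (the "ptl"). */
struct fake_pte {
	pthread_mutex_t ptl;
	bool present;
	unsigned long pfn;
};

/* Stand-in for an unmap/revoke: clears the entry under the lock. */
static void fake_zap(struct fake_pte *pte)
{
	pthread_mutex_lock(&pte->ptl);
	pte->present = false;
	pthread_mutex_unlock(&pte->ptl);
}

/*
 * Racy pattern: the lock is dropped before the caller uses *pfn, so a
 * concurrent fake_zap() may invalidate the entry in between.
 */
static long lookup_pfn_racy(struct fake_pte *pte, unsigned long *pfn)
{
	long ret = -EINVAL;

	pthread_mutex_lock(&pte->ptl);
	if (pte->present) {
		*pfn = pte->pfn;
		ret = 0;
	}
	pthread_mutex_unlock(&pte->ptl);
	return ret;		/* *pfn may already be stale here */
}

/*
 * Safer pattern: perform the access while the lock is still held.
 * fn must not sleep or fault, hence staging user data beforehand.
 */
static long access_pfn_locked(struct fake_pte *pte,
			      void (*fn)(unsigned long pfn, void *arg),
			      void *arg)
{
	long ret = -EINVAL;

	pthread_mutex_lock(&pte->ptl);
	if (pte->present) {
		fn(pte->pfn, arg);
		ret = 0;
	}
	pthread_mutex_unlock(&pte->ptl);
	return ret;
}

/* Example callback: records which pfn the access was performed against. */
static void record_pfn(unsigned long pfn, void *arg)
{
	*(unsigned long *)arg = pfn;
}
```

In the locked variant, a concurrent fake_zap() either completes before the access, which then reports -EINVAL, or waits until the access finishes. It cannot clear the entry in the middle.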

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


end of thread, other threads:[~2020-10-09  7:30 UTC | newest]

Thread overview: 166+ messages
2020-10-07 16:44 [PATCH 00/13] follow_pfn and other iomap races Daniel Vetter
2020-10-07 16:44 ` [PATCH 01/13] drm/exynos: Stop using frame_vector helpers Daniel Vetter
2020-10-07 20:32   ` John Hubbard
2020-10-07 21:32     ` Daniel Vetter
2020-10-07 21:36       ` John Hubbard
2020-10-07 21:50         ` Daniel Vetter
2020-10-07 16:44 ` [PATCH 02/13] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists Daniel Vetter
2020-10-07 20:43   ` John Hubbard
2020-10-07 16:44 ` [PATCH 03/13] misc/habana: Stop using frame_vector helpers Daniel Vetter
2020-10-07 20:38   ` John Hubbard
2020-10-07 16:44 ` [PATCH 04/13] misc/habana: Use FOLL_LONGTERM for userptr Daniel Vetter
2020-10-07 20:46   ` John Hubbard
2020-10-07 16:44 ` [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
2020-10-07 16:53   ` Jason Gunthorpe
2020-10-07 17:12     ` Daniel Vetter
2020-10-07 17:33       ` Jason Gunthorpe
2020-10-07 21:13   ` John Hubbard
2020-10-07 21:30     ` Daniel Vetter
2020-10-07 16:44 ` [PATCH 06/13] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
2020-10-07 22:18   ` John Hubbard
2020-10-07 16:44 ` [PATCH 07/13] mm: close race in generic_access_phys Daniel Vetter
2020-10-07 17:27   ` Jason Gunthorpe
2020-10-07 18:01     ` Daniel Vetter
2020-10-07 23:21       ` Jason Gunthorpe
2020-10-08  0:44   ` John Hubbard
2020-10-08  7:23     ` Daniel Vetter
2020-10-07 16:44 ` [PATCH 08/13] s390/pci: Remove races against pte updates Daniel Vetter
2020-10-08 16:44   ` Gerald Schaefer
2020-10-08 17:16     ` Daniel Vetter
2020-10-07 16:44 ` [PATCH 09/13] PCI: obey iomem restrictions for procfs mmap Daniel Vetter
2020-10-07 18:46   ` Bjorn Helgaas
2020-10-07 16:44 ` [PATCH 10/13] PCI: revoke mappings like devmem Daniel Vetter
2020-10-07 18:41   ` Bjorn Helgaas
2020-10-07 19:24     ` Daniel Vetter
2020-10-07 19:33   ` Dan Williams
2020-10-07 19:47     ` Daniel Vetter
2020-10-07 22:23       ` Dan Williams
2020-10-07 22:29         ` Dan Williams
2020-10-08  8:09           ` Daniel Vetter
2020-10-07 23:24     ` Jason Gunthorpe
2020-10-08  7:31       ` Daniel Vetter
2020-10-08  7:49       ` Dan Williams
2020-10-08  8:13         ` Daniel Vetter
2020-10-08  8:35           ` Dan Williams
2020-10-08 12:41         ` Jason Gunthorpe
2020-10-07 16:44 ` [PATCH 11/13] mm: add unsafe_follow_pfn Daniel Vetter
2020-10-07 17:36   ` Jason Gunthorpe
2020-10-07 18:10     ` Daniel Vetter
2020-10-07 19:00       ` Jason Gunthorpe
2020-10-07 19:38         ` Daniel Vetter
2020-10-07 16:44 ` [PATCH 12/13] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
2020-10-07 16:44 ` [PATCH 13/13] vfio/type1: Mark follow_pfn " Daniel Vetter
2020-10-07 17:39   ` Jason Gunthorpe
2020-10-07 18:14     ` Daniel Vetter
2020-10-07 18:47       ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.